This blog post is a follow-up to my previous posts about setting up a cluster. If you haven’t read them yet, I strongly suggest reading them first.
Consul DNS service
Consul provides a DNS service on port 8600. Clients can send DNS queries to Consul to resolve the FQDN of a service to the IP address of a node running it. This allows us to run services on multiple nodes without having to update the configuration files. For example, Synapse needs a PostgreSQL database, so Synapse resolves the FQDN of the PostgreSQL job to the IP of the node:
dig @127.0.0.1 -p 8600 postgresql.service.consul
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 47726
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;postgresql.service.consul. IN A
;; ANSWER SECTION:
postgresql.service.consul. 0 IN A <IP OF THE NODE>
;; Query time: 19 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Thu Sep 24 18:30:00 CEST 2020
;; MSG SIZE rcvd: 70
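Consul serves service names in the form `<service>.service.<domain>`, with `consul` as the default domain. As a minimal sketch, the helpers below build that name and query the local agent directly. The function names `consul_fqdn` and `consul_lookup` are made up for this post, and the agent is assumed to listen on 127.0.0.1:8600.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; they are not part of Consul.

# Build the FQDN Consul serves for a service: <service>.service.<domain>
consul_fqdn() {
    service="$1"
    domain="${2:-consul}"   # "consul" is Consul's default DNS domain
    printf '%s.service.%s\n' "$service" "$domain"
}

# Ask the local Consul agent (assumed on 127.0.0.1:8600) for the A records.
consul_lookup() {
    dig +short @127.0.0.1 -p 8600 "$(consul_fqdn "$1")"
}

# Only builds the name, no DNS query is sent here.
consul_fqdn postgresql
```

Running `consul_lookup postgresql` on a node with a Consul agent should then print the IP of the node running the PostgreSQL job.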
However, the OS expects the DNS service to always run on the privileged port
53, and we don’t want to give Consul more rights just to bind to this port.
We can solve this using dnsmasq,
a lightweight DNS server which can query the Consul DNS service for
consul domains and forward all other queries to other DNS servers.
Dnsmasq is available in the Debian and Ubuntu repositories, so it can be installed by running:
sudo apt install dnsmasq
Although dnsmasq will be installed, it will not work out of the box.
The reason for this is that
systemd-resolved and NetworkManager are already
running on Debian and Ubuntu machines.
Port 53 is already bound by one of them.
To overcome this, disable systemd-resolved:
sudo systemctl disable --now systemd-resolved
And disable NetworkManager’s DNS service:
sudo vim /etc/NetworkManager/NetworkManager.conf
# Change dns=default to:
dns=none
# Restart NetworkManager
sudo systemctl restart NetworkManager
And restart the dnsmasq service:
sudo systemctl restart dnsmasq
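To double-check which process holds port 53 after these changes, one option is to filter the output of `ss`. The sketch below assumes the iproute2 `ss` tool is available and that its fifth column is the local address and port. The function name `port53_listeners` is my own.

```shell
#!/bin/sh
# Hypothetical helper: keep only sockets whose local address ends in port 53.
# Expects `ss -H -lntup` style lines on stdin, where field 5 is
# "Local Address:Port" and the last field holds the owning process
# (only visible with sufficient privileges).
port53_listeners() {
    awk '$5 ~ /:53$/ { print $1, $5, $NF }'
}

# Usage (root is needed to see processes owned by other users):
#   sudo ss -H -lntup | port53_listeners
```

If systemd-resolved still shows up in the result, it has not been disabled properly.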
However, we still haven’t told dnsmasq what needs to happen when a DNS query arrives.
Let’s change that by editing the dnsmasq configuration file:
# Listen on this specific port instead of the standard DNS port
# (53). Setting this to zero completely disables DNS function,
# leaving only DHCP and/or TFTP.
port=53
# Never forward plain names (without a dot or domain part)
domain-needed
# Never forward addresses in the non-routed address spaces.
bogus-priv
# By default, dnsmasq will send queries to any of the upstream
# servers it knows about and tries to favour servers that are known
# to be up. Uncommenting this forces dnsmasq to try each query
# with each server strictly in the order they appear in
# /etc/resolv.conf
strict-order
# Set Listen address
listen-address=172.17.0.1
# Set to Server IP for network responses
bind-interfaces
# Enable forward lookup of the 'consul' domain:
server=/consul/192.168.0.10#8600
# Uncomment and modify as appropriate to enable reverse DNS lookups for
# common netblocks found in RFC 1918, 5735, and 6598:
# Adapt this to your IP range!
rev-server=192.168.0.0/16,127.0.0.1#8600
# Accept DNS queries only from hosts whose address is on a local subnet.
local-service
And add the following to /etc/resolv.conf:
nameserver 172.17.0.1 # docker0 IP
This way, dnsmasq will:
- Listen on port 53
- Never forward plain names and unroutable addresses
- Follow the order of the servers in /etc/resolv.conf
- Only resolve for local hosts
- Listen on the IP of the docker0 interface. This is important if you want Docker containers to use this local DNS server. Docker skips all 127.0.0.X addresses when copying the /etc/resolv.conf file from the host to the Docker container. This issue is discussed in detail on StackOverflow.
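To see which nameservers would survive that loopback filtering, a small filter over /etc/resolv.conf can help. This is only a rough approximation of the behaviour described above, not Docker’s actual code, and the function name `non_loopback_nameservers` is made up for this post.

```shell
#!/bin/sh
# Hypothetical helper: list the nameservers from a resolv.conf-style file
# that are NOT loopback addresses (127.x.x.x or ::1).
# Reads the file given as an argument, or stdin when no file is given.
non_loopback_nameservers() {
    awk '$1 == "nameserver" && $2 !~ /^127\./ && $2 != "::1" { print $2 }' "$@"
}

# Usage on the host:
#   non_loopback_nameservers /etc/resolv.conf
```

With the configuration from this post, the docker0 address 172.17.0.1 should be listed, which is why dnsmasq listens there instead of on 127.0.0.1.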
Note: I couldn’t get DNS resolution in the Docker containers working with the UFW firewall enabled, so I had to disable it for now.
Update Nomad jobs with FQDNs
Now that DNS resolution using Consul works, you need to update all your
Nomad jobs from hardcoded IP addresses to FQDNs.
When you start a job, it will use the DNS service of Consul to
resolve these FQDNs.
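To find the places that still need updating, a quick search for IPv4-looking strings in the job files is usually enough. The sketch below is one way to do it. The function name `find_hardcoded_ips` and the `jobs/` directory in the usage line are assumptions, so adapt them to your own layout.

```shell
#!/bin/sh
# Hypothetical helper: print file:line:match for every IPv4-looking string
# in the given files. The pattern is deliberately loose, so four-part
# version numbers will match as well.
find_hardcoded_ips() {
    grep -HnoE '([0-9]{1,3}\.){3}[0-9]{1,3}' "$@"
}

# Usage (assuming the job files live in jobs/):
#   find_hardcoded_ips jobs/*.nomad
```

Each hit can then be replaced by the matching Consul name, for example postgresql.service.consul for the PostgreSQL job.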