This blog post is a follow-up to my previous posts about setting up a cluster. If you haven't read them yet, I strongly suggest reading them first.

In this series of blog posts, I explain how I configured my home servers as a Nomad cluster with Consul as a DNS resolver for the cluster nodes and services.

Consul DNS service

Consul provides a DNS service on port 8600. Clients can send DNS queries to Consul to resolve an FQDN to the address of a specific service. This lets us run services on any node without updating configuration files when a service moves. For example, Synapse needs a PostgreSQL database, so Synapse resolves the FQDN of the PostgreSQL job to the IP of the node running it:

dig @127.0.0.1 -p 8600 postgresql.service.consul
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 47726
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;postgresql.service.consul.	IN	A

;; ANSWER SECTION:
postgresql.service.consul. 0	IN	A	<IP OF THE NODE>

;; Query time: 19 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Thu Sep 24 18:30:00 CEST 2020
;; MSG SIZE  rcvd: 70
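
The names Consul answers for follow a fixed pattern: <name>.service.consul for services and <name>.node.consul for nodes. As a minimal sketch, the helper below builds such a name so it can be passed to dig. consul_fqdn is a hypothetical name, and it assumes the default consul domain with no tag or datacenter in the name:

```shell
# consul_fqdn KIND NAME: build the Consul DNS name for a service or a node.
# Hypothetical helper; assumes the default "consul" domain.
consul_fqdn() {
    kind=$1
    name=$2
    case "$kind" in
        service|node) printf '%s.%s.consul\n' "$name" "$kind" ;;
        *) echo "kind must be 'service' or 'node'" >&2; return 1 ;;
    esac
}

# Example (needs a running Consul agent):
#   dig @127.0.0.1 -p 8600 "$(consul_fqdn service postgresql)"
```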

However, the operating system's resolver always sends DNS queries to the privileged port 53, and we don't want to give Consul extra privileges just so it can bind to that port. We can solve this with dnsmasq, a lightweight DNS server that forwards queries for .consul domains to the Consul DNS service and forwards all other queries to other DNS servers.

Installing dnsmasq

Dnsmasq is available in the Debian and Ubuntu repositories, so it can be installed by running:

sudo apt install dnsmasq

Although dnsmasq is now installed, it will not work out of the box. The reason is that systemd-resolved and NetworkManager are already running on Debian and Ubuntu machines, and one of them has already bound port 53. To overcome this, disable systemd-resolved:

sudo systemctl disable --now systemd-resolved

And disable NetworkManager’s DNS service:

sudo vim /etc/NetworkManager/NetworkManager.conf

# In the [main] section, set (or change dns=default to):
dns=none

# Restart NetworkManager
sudo systemctl restart NetworkManager

And restart the dnsmasq service:

sudo systemctl restart dnsmasq
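
To see which process actually holds port 53 before and after these steps, the listening sockets can be inspected with ss. The sketch below assumes the iproute2 version of ss and its usual column layout (local address in the fifth column, process in the seventh). port53_listeners is a hypothetical helper name:

```shell
# port53_listeners: read `ss -H -lntup` output on stdin and print the
# protocol, local address, and owning process of every socket on port 53.
# Assumes the column layout of iproute2's ss.
port53_listeners() {
    awk '$5 ~ /:53$/ { print $1, $5, $7 }'
}

# Example (root is needed to see the process names of other users):
#   sudo ss -H -lntup | port53_listeners
```

If a process other than dnsmasq still shows up in this list, dnsmasq will not be able to bind the port on that address.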

However, we still haven’t told dnsmasq what needs to happen when a DNS query is received. Let’s change that by editing /etc/dnsmasq.conf:

# Listen on this specific port instead of the standard DNS port
# (53). Setting this to zero completely disables DNS function,
# leaving only DHCP and/or TFTP.
port=53
# Never forward plain names (without a dot or domain part)
domain-needed
# Never forward addresses in the non-routed address spaces.
bogus-priv
# By default, dnsmasq will send queries to any of the upstream
# servers it knows about and tries to favour servers that are known
# to be up. Uncommenting this forces dnsmasq to try each query
# with each server strictly in the order they appear in
# /etc/resolv.conf
strict-order

# Set Listen address
listen-address=172.17.0.1 # Set to Server IP for network responses
bind-interfaces

# Enable forward lookup of the 'consul' domain:
server=/consul/192.168.0.10#8600

# Uncomment and modify as appropriate to enable reverse DNS lookups for
# common netblocks found in RFC 1918, 5735, and 6598:
rev-server=192.168.0.0/16,127.0.0.1#8600 # Adapt this to your IP range!

# Accept DNS queries only from hosts whose address is on a local subnet.
local-service

And add the following to /etc/resolv.conf:

# docker0 IP
nameserver 172.17.0.1
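
Since /etc/dnsmasq.conf was edited after the last restart, dnsmasq has to be restarted once more before the new settings apply. A minimal way to check the result, assuming dig is installed, is sketched below. resolves is a hypothetical helper name, and consul.service.consul is the name under which Consul registers its own servers:

```shell
# resolves NAME: succeed if the default resolver (taken from
# /etc/resolv.conf, so dnsmasq in this setup) returns at least one
# A record for NAME.
resolves() {
    [ -n "$(dig +short "$1" A)" ]
}

# Example:
#   sudo dnsmasq --test                 # syntax-check the configuration
#   sudo systemctl restart dnsmasq
#   resolves consul.service.consul && echo "forwarding to Consul works"
#   resolves debian.org && echo "upstream forwarding works"
```

The second check only passes if dnsmasq knows an upstream server for names outside the .consul domain.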

This way, dnsmasq will:

  • Listen on port 53
  • Never forward plain names or addresses in non-routed address spaces
  • Query upstream servers strictly in the order they appear in /etc/resolv.conf
  • Only answer queries from hosts on a local subnet
  • Listen on the IP of the docker0 interface. This is important if you want Docker containers to use this local DNS server: Docker skips all 127.0.0.X addresses when copying the host's /etc/resolv.conf into a container. This issue is discussed in detail on Stack Overflow.
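
The effect of Docker's loopback filtering can be approximated with a small filter over resolv.conf. This is only a sketch of the behaviour described above, not Docker's actual code, and docker_usable_nameservers is a hypothetical helper name:

```shell
# docker_usable_nameservers: read resolv.conf content on stdin and print
# the nameservers that are not loopback addresses (127.0.0.0/8 or ::1),
# which are the ones a container could still reach.
docker_usable_nameservers() {
    awk '$1 == "nameserver" && $2 !~ /^127\./ && $2 != "::1" { print $2 }'
}

# Example:
#   docker_usable_nameservers < /etc/resolv.conf
```

If this prints nothing on the host, containers will not end up using the local dnsmasq.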

Note: So far I couldn't get DNS resolution inside Docker containers to work with the UFW firewall enabled, so I had to disable UFW.

Update Nomad jobs with FQDNs

Now that DNS resolution through Consul works, you need to update all your Nomad jobs to use FQDNs instead of hardcoded IP addresses. When you start a job, it will use dnsmasq to resolve these FQDNs.
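
To find the jobs that still need updating, one option is to search the job files for anything that looks like an IPv4 address. A rough sketch, assuming the job files live in a single directory (find_hardcoded_ips and the jobs/ path are hypothetical):

```shell
# find_hardcoded_ips DIR: list file, line number, and content of every
# line under DIR that contains something shaped like an IPv4 address.
# The pattern is deliberately loose, so version strings may match too.
find_hardcoded_ips() {
    grep -rnE '([0-9]{1,3}\.){3}[0-9]{1,3}' "$1"
}

# Example:
#   find_hardcoded_ips jobs/
```

Each hit is a candidate for replacement with a name such as postgresql.service.consul.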