This blog post is a follow-up to my previous posts about setting up a cluster. If you haven't read them yet, I strongly suggest reading them first:

In this series of blog posts, I explain how I configured my home servers as a Nomad cluster, with Consul as a DNS resolver for the cluster nodes and services.

Nomad uses two protocols to communicate between its agents:

1. The gossip protocol between Nomad servers
2. The RPC/HTTP protocols between Nomad servers and clients

Nomad is insecure by default! That is fine for experiments, but not for production! Both protocols need to be secured to prevent others from misusing your cluster for their own gain.

### Gossip protocol

The gossip protocol is secured using symmetric encryption. Let’s generate a new key:

./nomad operator keygen
cg8StVXbQJ0gPvMd9o7yrg==
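As a quick sanity check: the key is random bytes, base64-encoded (the 24-character example above decodes to 16 bytes; newer Nomad/Consul versions may emit 32-byte keys). A small sketch, using the example key above:

```shell
# The gossip key must decode to a valid AES key size (16 bytes here),
# or the agent will reject it at startup.
key="cg8StVXbQJ0gPvMd9o7yrg=="
echo -n "$key" | base64 -d | wc -c   # 16 expected
```

If the count is off, the key was likely mangled while copying it between machines.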


Add this key to the server stanza of your Nomad server configuration:

server {
  ...

  # Encrypt gossip communication
  encrypt = "cg8StVXbQJ0gPvMd9o7yrg=="
}


Restart the Nomad server and you’re good to go!

### RPC/HTTP protocol

The RPC and HTTP protocols use TLS certificates to authenticate and verify nodes.

To generate TLS certificates with a private CA, install cfssl, Cloudflare's FOSS toolkit for generating certificates:

sudo apt install golang-cfssl


#### Generating the CA certificate

We first need a CA certificate which we can use to sign certificates for each Nomad agent:

cfssl print-defaults csr | cfssl gencert -initca - | cfssljson -bare nomad-ca


This will generate 3 files:

1. nomad-ca-key.pem: The CA's private key, used to sign certificates
2. nomad-ca.pem: The CA's public certificate, used to verify certificates
3. nomad-ca.csr: The certificate signing request
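You can inspect the resulting CA certificate with openssl to double-check the subject and validity window (assuming openssl is installed, which it is on most distributions):

```shell
# Print the CA certificate's subject and validity period
openssl x509 -in nomad-ca.pem -noout -subject -dates
```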

#### Generating a node certificate

Now that we have the CA certificate, we can generate and sign certificates for each Nomad agent. However, we first need to add a configuration file for the cfssl tool to extend the expiration time of the certificates (87600h = 10 years).

Create the following cfssl.json file:

{
  "signing": {
    "default": {
      "expiry": "87600h",
      "usages": ["signing", "key encipherment", "server auth", "client auth"]
    }
  }
}


Generate the certificates for:

1. 1x Nomad server; if you add more server nodes, you need to generate more of these.
2. 1x Nomad client; if you add more worker nodes, you need to generate more of these.
3. 1x Nomad CLI, otherwise you cannot use the CLI tools anymore.

# Nomad server
echo '{}' | cfssl gencert -ca=nomad-ca.pem -ca-key=nomad-ca-key.pem -config=cfssl.json \
  -hostname="server.global.nomad,localhost,127.0.0.1" - | cfssljson -bare server

# Nomad client
echo '{}' | cfssl gencert -ca=nomad-ca.pem -ca-key=nomad-ca-key.pem -config=cfssl.json \
  -hostname="client.global.nomad,localhost,127.0.0.1" - | cfssljson -bare worker1

# Nomad CLI
echo '{}' | cfssl gencert -ca=nomad-ca.pem -ca-key=nomad-ca-key.pem -config=cfssl.json \
  -profile=client - | cfssljson -bare cli


Note: *Adding localhost and 127.0.0.1 as hosts allows local communication on the node. The FQDN consists of <role>.<region>.nomad, e.g. server.global.nomad.*

sudo chown nomad:nomad *.pem
sudo chmod 400 *.pem
sudo chmod 400 *.csr


If you haven't done so already, apply the same file permissions to your job descriptions, the Nomad config file, and the Nomad binary. It's better to grant only the permissions that are needed to operate the cluster; if something goes wrong, the access an attacker gains is heavily limited.

# Job descriptions
sudo chmod 600 *.nomad

# Nomad config file
sudo chmod 600 config.hcl
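For completeness: the generated certificates still have to be referenced in the Nomad agent configuration, otherwise they are never used. A sketch of the tls stanza, assuming the file names generated above (on a client, point cert_file/key_file at the client certificate instead):

```hcl
tls {
  http = true
  rpc  = true

  ca_file   = "nomad-ca.pem"
  cert_file = "server.pem"
  key_file  = "server-key.pem"

  # Reject peers whose certificate name doesn't match their role
  verify_server_hostname = true
  verify_https_client    = true
}
```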


### Configuring ACL

Even though TLS certificates and gossip encryption are enabled, users can still control the Nomad cluster without any access control. To avoid this, we will enable ACLs (Access Control Lists), which limit what a client can do depending on its access token. Add the following stanza to the configuration of every Nomad agent:

acl {
  enabled = true
}


Then restart every Nomad agent. ACL is now enabled, but we don't have an access token yet! Bootstrap the ACL system by running:

./nomad acl bootstrap


This will generate a management token, which is the equivalent of root in Linux. Save it in a secure place; you might need it later! I use this token to access the Nomad UI, for example, since the default Nomad policy is deny-all. Without this token, the UI won't show your current jobs, servers, and clients.
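Instead of handing the management token to everyone, you can create less-privileged tokens. As a hypothetical example (the policy name and scope are my own choice, not part of the Nomad setup above), a read-only policy for browsing the UI could look like this:

```hcl
# readonly.hcl: allow read access to jobs in the default namespace
namespace "default" {
  policy = "read"
}

# Allow reading node and agent status
node {
  policy = "read"
}
agent {
  policy = "read"
}
```

Register it with `nomad acl policy apply readonly readonly.hcl` and create a token from it with `nomad acl token create -policy=readonly -type=client` (both commands require the management token).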

See HashiCorp Learn for a detailed tutorial.

## Securing a Consul cluster

A complete tutorial regarding securing your Consul cluster can be found on HashiCorp Learn: https://learn.hashicorp.com/collections/consul/security-networking

### Gossip protocol

The same approach is used as with Nomad since they share the same Gossip protocol.

./consul keygen
cg8StVXbQJ0gPvMd9o7yrg==


While Nomad only needs the encryption key in Nomad servers, Consul needs to have the key in all Consul agents (clients and servers):

"encrypt": "cg8StVXbQJ0gPvMd9o7yrg=="


Restart the Consul agent and you’re good to go!

### RPC/HTTP protocol

The same approach from Nomad is used with Consul. We will use consul tls ca create to generate the CA certificate instead of cfssl, but cfssl can be used too. A detailed guide is available at HashiCorp Learn: https://learn.hashicorp.com/tutorials/consul/tls-encryption-secure

Generate the CA certificate and key:

./consul tls ca create


Generate the server and clients certificate using the CA certificate:

# Consul server (repeat for each server), expires in 10 years
consul tls cert create -server -days=3650

# Consul client (repeat for each client), expires in 10 years
consul tls cert create -client -days=3650


Distribute the certificates among the nodes. Even though auto_encrypt exists, I didn't use it for the sake of keeping things simple. I applied the following configuration on each Consul node:

{
  ...
  "enable_script_checks": false,
  "disable_remote_exec": true,
  "verify_server_hostname": true,
  "verify_outgoing": true,
  "verify_incoming_https": false,
  "verify_incoming_rpc": true,
  "ca_file": "<CA CERT>",
  "cert_file": "<SERVER/CLIENT CERT>",
  "key_file": "<SERVER/CLIENT KEY>",
  ...
}


Note: verify_incoming_https is set to false; otherwise you would have to add the certificates to your browser to access the UI.

Restart the consul services on all nodes:

sudo systemctl restart consul


The CLI tool to manage Consul will keep working since we set verify_incoming_https to false, but we have to instruct it to use the https scheme from now on.

Add the following to your .bashrc file and source it:

# Set Consul HTTPS
export CONSUL_HTTP_SSL=true

# Source it
source ~/.bashrc


### Configuring ACL

As with the Nomad cluster, Consul access is also not restricted without ACLs. However, Consul requires some additional work in comparison to setting up ACLs in Nomad.

### Enabling ACL on agents

To enable ACL on Consul agents, we have to configure it in the Consul configuration file first and restart each agent:


"acl": {
  "enabled": true,
  "default_policy": "deny",
  "enable_token_persistence": true,
  "tokens": {
    "default": "WILL ADD THIS ONE LATER",
    "agent": "SAME HERE"
  }
}


By configuring the default policy to deny, we require everyone who interacts with Consul to supply an access token. If no token is supplied, the default token is used; Consul's DNS resolution service, for example, relies on it. We will configure these tokens a bit later in this post.

Note: Consul agents won't work properly under this deny-all policy until they are given tokens!

Bootstrapping the ACL system:

consul acl bootstrap


This will also generate a management token, just like the one we generated for the Nomad cluster. It can likewise be used to access the UI and perform management tasks using the CLI tool.

To use this token for the CLI tool, add the following to your .bashrc file:

export CONSUL_HTTP_TOKEN=<secret id of the token>


Be careful where you store this token; anyone who can access it can also control your Consul cluster!

### Generating tokens

Generating a token consists of 3 steps:

1. Create a policy.
2. Create a token for this policy; multiple tokens can be generated using the same policy.
3. Add the token to the agent or service that needs it.

A detailed tutorial can be found on HashiCorp Learn.

#### Agents

Create a file consul-agent-policy.hcl with the following content:

# consul-agent-policy.hcl

node_prefix "" {
  policy = "write"
}
service_prefix "" {
  policy = "read"
}


This policy allows a Consul agent to write node-related data and read service-related data.

Now add the policy to Consul:

consul acl policy create \
-name consul-agent \
-rules @consul-agent-policy.hcl


And generate a token for each agent:

consul acl token create -description "Consul agent token" \
-policy-name consul-agent


Add the token to each Consul agent configuration as the agent token.
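The secret ID of the freshly created token then replaces the placeholder in the acl.tokens block shown earlier; a sketch (the placeholders stand for the real secret IDs):

```json
"acl": {
  "enabled": true,
  "default_policy": "deny",
  "enable_token_persistence": true,
  "tokens": {
    "default": "<secret id of the DNS token>",
    "agent": "<secret id of the agent token>"
  }
}
```

Alternatively, the token can be set at runtime with `consul acl set-agent-token agent <secret id>`, which is persisted across restarts thanks to enable_token_persistence.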

#### Services

In my cluster setup, I use Traefik, which interacts with Consul to automatically configure reverse proxying and generate Let's Encrypt certificates.

I used the following policy file:

# traefik-policy.hcl

key "traefik" {
  policy = "write"
}

session "" {
  policy = "write"
}

service_prefix "" {
  policy = "read"
}

node_prefix "" {
  policy = "write"
}


The rest of the commands are exactly the same as with agents, except that a different name and policy file is used.
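Spelled out for Traefik, assuming the policy file above is saved as traefik-policy.hcl, the commands would be:

```shell
# Register the Traefik policy (requires CONSUL_HTTP_TOKEN to be set)
consul acl policy create \
  -name traefik \
  -rules @traefik-policy.hcl

# Create a token bound to that policy
consul acl token create -description "Traefik token" \
  -policy-name traefik
```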

#### DNS

Consul DNS only needs to read which services are up and running, so we have to allow Consul DNS to read Consul:

# dns-request-policy.hcl

node_prefix "" {