This blog post is a follow-up to my previous posts about setting up a cluster. If you haven’t read them yet, I strongly suggest reading them first:

In this series of blog posts, I will explain how I configured my home servers as a Nomad cluster, with Consul as a DNS resolver for the cluster nodes and services.

Securing a Nomad cluster

Nomad uses 2 protocols:

  1. Gossip protocol between Nomad servers
  2. RPC/HTTP protocol between Nomad servers and clients

Nomad is not secure by default! That’s fine for experiments, but not for production! Both protocols need to be secured to prevent others from misusing your cluster for their own gain!

A complete tutorial regarding securing your Nomad cluster can be found on HashiCorp Learn: https://learn.hashicorp.com/collections/nomad/transport-security

Gossip protocol

The gossip protocol is secured using symmetric encryption. Let’s generate a new key:

./nomad operator keygen
cg8StVXbQJ0gPvMd9o7yrg==

Add this key to the server stanza of your Nomad server configuration (every server needs the same key):

server {
  ...

  # Encrypt gossip communication
  encrypt = "cg8StVXbQJ0gPvMd9o7yrg=="
}

Restart the Nomad servers and you’re good to go!

RPC/HTTP protocol

The RPC and HTTP protocols use TLS certificates to authenticate and verify nodes.

To generate TLS certificates with a private CA, install cfssl, the FOSS certificate tool from Cloudflare:

sudo apt install golang-cfssl

Generating the CA certificate

We first need a CA certificate which we can use to sign certificates for each Nomad agent:

cfssl print-defaults csr | cfssl gencert -initca - | cfssljson -bare nomad-ca

This will generate 3 files:

  1. nomad-ca-key.pem: The CA’s private key to sign certificates
  2. nomad-ca.pem: The CA’s public key to verify certificates
  3. nomad-ca.csr: The certificate signing request.
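
If you want to sanity-check the result, openssl (not part of this setup, but widely available) can print the CA’s subject and validity period:

openssl x509 -in nomad-ca.pem -noout -subject -dates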

Generating a node certificate

Now that we have the CA certificates we can generate and sign certificates for each Nomad agent. However, we first need to add a configuration file for the CFSSL tool to extend the expiration time (87600 h = 10 years) of the certificates.

Create the following cfssl.json file:

{
  "signing": {
    "default": {
      "expiry": "87600h",
      "usages": ["signing", "key encipherment", "server auth", "client auth"]
    }
  }
}

Generate the certificates for:

  1. 1x Nomad server
  2. 1x Nomad client; if you add more worker nodes, you need to generate more of these.
  3. 1x Nomad CLI; otherwise you cannot use the CLI tools anymore.

# Nomad server
echo '{}' | cfssl gencert -ca=nomad-ca.pem -ca-key=nomad-ca-key.pem \
-config=cfssl.json -hostname="server.global.nomad,localhost,127.0.0.1" - | \
cfssljson -bare leader

# Nomad client
echo '{}' | cfssl gencert -ca=nomad-ca.pem -ca-key=nomad-ca-key.pem \
-config=cfssl.json -hostname="client.global.nomad,localhost,127.0.0.1" - | \
cfssljson -bare worker1

# Nomad CLI
echo '{}' | cfssl gencert -ca=nomad-ca.pem -ca-key=nomad-ca-key.pem \
-profile=client - | cfssljson -bare cli

Note: localhost and 127.0.0.1 as hosts allow local communication on the node. The FQDN consists of <role>.<region>.nomad, e.g. server.global.nomad.
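
These certificates still have to be wired into Nomad. The tutorial linked above does this via the tls stanza; here is a minimal sketch for the server node, assuming the file names generated above (on a client node, point cert_file and key_file at worker1.pem and worker1-key.pem instead):

tls {
  http = true
  rpc  = true

  ca_file   = "nomad-ca.pem"
  cert_file = "leader.pem"
  key_file  = "leader-key.pem"

  # Reject RPC peers whose certificate doesn't match <role>.<region>.nomad
  verify_server_hostname = true

  # true also demands client certificates on the HTTPS API (the cli
  # certificate above); false keeps the web UI reachable from a browser
  verify_https_client = false
}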

Also make sure that the certificates are only readable by your Nomad user:

sudo chown nomad:nomad *.pem
sudo chown nomad:nomad *.csr
sudo chmod 400 *.pem
sudo chmod 400 *.csr

If you haven’t done so already, the same applies to your job descriptions, your Nomad config file, and the Nomad binary. It’s better to grant only the permissions that are needed to operate the cluster: if something goes wrong, the access an attacker gains is heavily limited.

# Job descriptions
sudo chown nomad:nomad *.nomad
sudo chmod 600 *.nomad

# Nomad binary
sudo chown nomad:nomad nomad
sudo chmod 700 nomad

# Nomad config
sudo chown nomad:nomad config.hcl
sudo chmod 600 config.hcl

Configuring ACL

Even though TLS certificates and gossip encryption are enabled, users can still control the Nomad cluster without any access control. To avoid this, we will enable ACLs (Access Control Lists), which limit what a client can do depending on its access token.

Simply add the following stanza to your Nomad config file:

acl {
  enabled = true
}

And restart every Nomad agent (servers and clients). ACL is now enabled, but we don’t have an access token yet! Bootstrap the ACL system by running:

./nomad acl bootstrap

This will generate a management token, which is the equivalent of root in Linux. Save it in a secure place; you might need it later! I use this token to access the Nomad UI, for example, since the default Nomad policy is deny-all. Without this token, the UI won’t show your current jobs, servers, and clients.
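
To use the token with the CLI, export it through the standard NOMAD_TOKEN environment variable:

# The token is the Secret ID printed by the bootstrap command
export NOMAD_TOKEN="<secret id>"
nomad status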

See HashiCorp Learn for a detailed tutorial.

Securing a Consul cluster

A complete tutorial regarding securing your Consul cluster can be found on HashiCorp Learn: https://learn.hashicorp.com/collections/consul/security-networking

Gossip protocol

The same approach is used as with Nomad since they share the same Gossip protocol.

./consul keygen
cg8StVXbQJ0gPvMd9o7yrg==

While Nomad only needs the encryption key in Nomad servers, Consul needs to have the key in all Consul agents (clients and servers):

"encrypt": "cg8StVXbQJ0gPvMd9o7yrg=="

Restart the Consul agents and you’re good to go!

RPC/HTTP protocol

We take the same approach as with Nomad. Instead of cfssl we will use consul tls ca create to generate the CA certificate, although cfssl could be used too. A detailed guide is available at HashiCorp Learn: https://learn.hashicorp.com/tutorials/consul/tls-encryption-secure

Generate the CA certificate and key:

./consul tls ca create

Generate the server and clients certificate using the CA certificate:

# Consul server (repeat for each server), expires in 10 years
consul tls cert create -server -days=3650

# Consul client (repeat for each client), expires in 10 years
consul tls cert create -client -days=3650
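
Assuming the default datacenter name dc1, these commands produce files along the following lines (the index increments with each generated certificate):

consul-agent-ca.pem            # CA certificate (from consul tls ca create)
consul-agent-ca-key.pem        # CA private key
dc1-server-consul-0.pem        # server certificate
dc1-server-consul-0-key.pem    # server private key
dc1-client-consul-0.pem        # client certificate
dc1-client-consul-0-key.pem    # client private key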

Distribute the certificates among the nodes. Even though auto_encrypt exists, I didn’t use it, for the sake of keeping things simple. I applied the following configuration on each Consul node:

{
    ...
    "enable_script_checks": false,
    "disable_remote_exec": true,
    "verify_server_hostname": true,
    "verify_outgoing": true,
    "verify_incoming_https": false,
    "verify_incoming_rpc": true,
    "ca_file": "<CA CERT>",
    "cert_file": "<SERVER/CLIENT CERT>",
    "key_file": "<SERVER/CLIENT KEY>",
    ...
}

Note: verify_incoming_https is set to false; otherwise you would have to add the certificates to your browser in order to access the UI.

Restart the consul services on all nodes:

sudo systemctl restart consul

The CLI tool to manage Consul will just work since we set verify_incoming_https to false, but we have to instruct it to use the https scheme from now on.

Add the following to your .bashrc file and source it:

# Set Consul HTTPS (8501 is the conventional HTTPS port; adjust it to your ports configuration)
export CONSUL_HTTP_ADDR="https://<IP>:8501"

# Source it
source ~/.bashrc
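
A quick sanity check once the address is exported: any read command should now reach the agent over TLS.

# Should print the cluster members over HTTPS
consul members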

Configuring ACL

As with the Nomad cluster, Consul access is also not restricted without ACLs. However, Consul requires some additional work in comparison to setting up ACLs in Nomad.

Enabling ACL on agents

To enable ACL on Consul agents, we have to configure it in the Consul configuration file first and restart each agent:


"acl": {
  "enabled": true,
  "default_policy": "deny",
  "enable_token_persistence": true,
  "tokens": {
    "default": "WILL ADD THIS ONE LATER",
    "agent": "SAME HERE"
  }
}

By configuring the default policy to deny, we require everyone who interacts with Consul to supply an access token. For requests that don’t supply one, the default token is used; Consul’s DNS resolution service uses it, for example. We will configure these tokens later in this post.

Note: Consul agents won’t work anymore with the current policy!

Bootstrapping the ACL system:

consul acl bootstrap

This will also generate a management token, just like the one we generated for the Nomad cluster. It too can be used to access the UI and to perform management tasks using the CLI tool.

To use this token for the CLI tool, add the following to your .bashrc file:

export CONSUL_HTTP_TOKEN=<secret id of the token>

⚠️ Be careful where you store this token: everyone who can access it can also control your Consul cluster!

Generating tokens

Generating a token consists of 3 steps:

  1. Create a policy
  2. Create a token for this policy; multiple tokens can be generated from the same policy.
  3. Add the token to your agent or client

A detailed tutorial can be found on HashiCorp Learn.

Agents

Create a file consul-agent-policy.hcl with the following content:

# consul-agent-policy.hcl

node_prefix "" {
  policy = "write"
}
service_prefix "" {
  policy = "read"
}

This policy allows a Consul agent to write node related data and read service related data.

Now add the policy to Consul:

consul acl policy create \
  -name consul-agent \
  -rules @consul-agent-policy.hcl

And generate a token for each agent:

consul acl token create -description "Consul agent token" \
  -policy-name consul-agent

Add the token to each Consul agent configuration as the agent token.
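
In the JSON configuration shown earlier, this is the acl.tokens.agent field; the value is the SecretID printed by consul acl token create (placeholder below):

"acl": {
  ...
  "tokens": {
    "agent": "<SecretID of the Consul agent token>"
  }
}

Alternatively, consul acl set-agent-token agent "<SecretID>" applies the token to a running agent without editing the file.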

Services

In my cluster setup, I use Traefik, which interacts with Consul to automatically configure reverse proxying and generate Let’s Encrypt certificates.

I used the following policy file:

# traefik-policy.hcl

key "traefik" {
    policy = "write"
}

session "" {
    policy = "write"
}

service_prefix "" {
    policy = "read"
}

node_prefix "" {
    policy = "write"
}

The rest of the commands are exactly the same as with the agents, except that a different name and policy file is used, as shown below.
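
Spelled out, that amounts to the following (the name traefik is simply my choice):

consul acl policy create \
  -name traefik \
  -rules @traefik-policy.hcl

consul acl token create -description "Traefik token" \
  -policy-name traefik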

DNS

Consul DNS only needs to read which services are up and running, so we grant the DNS token read-only access to Consul:

# dns-request-policy.hcl

node_prefix "" {
  policy = "read"
}
service_prefix "" {
  policy = "read"
}

# Only needed if using prepared queries
query_prefix "" {
  policy = "read"
}
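
Creating the policy and token follows the same pattern once more (the policy name dns-requests is my own pick for this sketch):

consul acl policy create \
  -name dns-requests \
  -rules @dns-request-policy.hcl

consul acl token create -description "DNS token" \
  -policy-name dns-requests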

After generating the token, add it to the config file of each Consul agent as the default token and restart all agents.
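
This fills in the last placeholder of the acl stanza from earlier (both values are the SecretIDs of their respective tokens):

"tokens": {
  "default": "<SecretID of the DNS token>",
  "agent": "<SecretID of the Consul agent token>"
}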