Building a cluster part 4
This blog post is a follow-up to my previous post about setting up a cluster. If you haven’t read the previous ones, I strongly suggest reading them first:
In this series of blog posts, I will explain how I configured my home servers as a Nomad cluster with Consul as a DNS resolver for the cluster nodes and services.
Securing a Nomad cluster
Nomad uses 2 protocols:
- Gossip protocol between Nomad servers
- RPC/HTTP protocol between Nomad servers and clients
Nomad is insecure by default! That’s fine for experiments, but not for production! Both protocols need to be secured to prevent others from misusing your cluster for their own gain!
A complete tutorial regarding securing your Nomad cluster can be found on HashiCorp Learn: https://learn.hashicorp.com/collections/nomad/transport-security
Gossip protocol
The gossip protocol is secured using symmetric encryption. Let’s generate a new key:
./nomad operator keygen
cg8StVXbQJ0gPvMd9o7yrg==
Add this key to the server stanza of your Nomad server configuration:
server {
...
# Encrypt gossip communication
encrypt = "cg8StVXbQJ0gPvMd9o7yrg=="
}
Restart the Nomad server and you’re good to go!
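If you want to verify that the key was picked up, restart and inspect the agent’s Serf stats; on my setup the serf section of nomad agent-info reports encrypted = true (the systemd unit name nomad is an assumption):
# Restart the server, then check that gossip encryption is active
sudo systemctl restart nomad
nomad agent-info | grep -i encrypt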
RPC/HTTP protocol
The RPC and HTTP protocols use TLS certificates to authenticate and verify nodes.
To generate TLS certificates with a private CA, install cfssl, Cloudflare’s FOSS tool for generating certificates:
sudo apt install golang-cfssl
Generating the CA certificate
We first need a CA certificate which we can use to sign certificates for each Nomad agent:
cfssl print-defaults csr | cfssl gencert -initca - | cfssljson -bare nomad-ca
This will generate 3 files:
- nomad-ca-key.pem: the CA’s private key, used to sign certificates
- nomad-ca.pem: the CA’s public certificate, used to verify certificates
- nomad-ca.csr: the certificate signing request
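To double-check what was generated, openssl can print the CA’s subject and expiry date:
# Inspect the CA certificate
openssl x509 -in nomad-ca.pem -noout -subject -enddate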
Generating a node certificate
Now that we have the CA certificate, we can generate and sign certificates for each Nomad agent. However, we first need to add a configuration file for the cfssl tool to extend the expiration time of the certificates (87600h = 10 years).
Create the following cfssl.json file:
{
"signing": {
"default": {
"expiry": "87600h",
"usages": ["signing", "key encipherment", "server auth", "client auth"]
}
}
}
Generate the certificates for:
- 1x Nomad server
- 1x Nomad client; if you add more worker nodes, you need to generate more of these.
- 1x Nomad CLI; otherwise you cannot use the CLI tools anymore.
# Nomad server
echo '{}' | cfssl gencert -ca=nomad-ca.pem -ca-key=nomad-ca-key.pem \
-config=cfssl.json -hostname="server.global.nomad,localhost,127.0.0.1" - | \
cfssljson -bare leader
# Nomad client
echo '{}' | cfssl gencert -ca=nomad-ca.pem -ca-key=nomad-ca-key.pem \
-config=cfssl.json -hostname="client.global.nomad,localhost,127.0.0.1" - | \
cfssljson -bare worker1
# Nomad CLI
echo '{}' | cfssl gencert -ca=nomad-ca.pem -ca-key=nomad-ca-key.pem \
-profile=client - | cfssljson -bare cli
Note: adding localhost and 127.0.0.1 as hosts allows local communication on the node. The FQDN consists of the agent’s role and the region, in the form <role>.<region>.nomad (e.g. server.global.nomad); Nomad uses it to verify the role of the node it is talking to.
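For reference, the Nomad agents then point at these files in a tls stanza; a sketch for a server is shown below (paths are relative to the Nomad working directory here, and on a client you would swap in the worker certificate and key):
tls {
  http = true
  rpc  = true

  ca_file   = "nomad-ca.pem"
  cert_file = "leader.pem"
  key_file  = "leader-key.pem"

  # Reject nodes whose certificate does not match their role
  verify_server_hostname = true
  verify_https_client    = true
}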
Also make sure that the certificates are only readable by your Nomad user:
sudo chown nomad:nomad *.pem
sudo chown nomad:nomad *.csr
sudo chmod 400 *.pem
sudo chmod 400 *.csr
If you haven’t done it already, the same applies to your job descriptions, Nomad config file, and the Nomad binary. It’s better to grant only the permissions that are needed to operate the cluster: if something goes wrong, the access an attacker gains is heavily limited.
# Job descriptions
sudo chown nomad:nomad *.nomad
sudo chmod 600 *.nomad
# Nomad binary
sudo chown nomad:nomad nomad
sudo chmod 700 nomad
# Nomad config
sudo chown nomad:nomad config.hcl
sudo chmod 600 config.hcl
Configuring ACL
Even though TLS certificates and gossip encryption are enabled, users can still control the Nomad cluster without any access control. To avoid this, we will enable ACLs (Access Control Lists), which limit what a client can do depending on its access token.
Simply add the following stanza to your Nomad config file:
acl {
enabled = true
}
And restart every Nomad agent. ACL is now enabled, but we don’t have an access token yet! Bootstrap the ACL system by running:
./nomad acl bootstrap
This will generate a management token, which is the equivalent of root in Linux. Save it in a secure place, you might need it later! I use this token to access the Nomad UI, for example, since the default Nomad policy is deny all. Without this token, the UI won’t show your current jobs, servers, and clients.
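To use it from the CLI, export the token’s secret ID; the UI accepts the same value on its ACL Tokens page:
# Authenticate the Nomad CLI with the management token
export NOMAD_TOKEN="<secret id of the token>"
nomad status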
See HashiCorp Learn for a detailed tutorial.
Securing a Consul cluster
A complete tutorial regarding securing your Consul cluster can be found on HashiCorp Learn: https://learn.hashicorp.com/collections/consul/security-networking
Gossip protocol
The same approach is used as with Nomad since they share the same Gossip protocol.
./consul keygen
cg8StVXbQJ0gPvMd9o7yrg==
While Nomad only needs the encryption key on its servers, Consul needs the key on all Consul agents (clients and servers):
"encrypt": "cg8StVXbQJ0gPvMd9o7yrg=="
Restart the Consul agent and you’re good to go!
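Here too you can verify that encryption is active: consul info should report encrypted = true in the serf_lan section:
sudo systemctl restart consul
consul info | grep -i encrypt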
RPC/HTTP protocol
The same approach from Nomad is used with Consul. We will use consul tls ca create to generate the CA certificate instead of cfssl, but cfssl can be used too.
A detailed guide is available at HashiCorp Learn: https://learn.hashicorp.com/tutorials/consul/tls-encryption-secure
Generate the CA certificate and key:
./consul tls ca create
Generate the server and clients certificate using the CA certificate:
# Consul server (repeat for each server), expires in 10 years
consul tls cert create -server -days=3650
# Consul client (repeat for each client), expires in 10 years
consul tls cert create -client -days=3650
Distribute the certificates among the nodes.
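Something like the following works; the file names are Consul’s defaults for the first server certificate in dc1, while the user, host, and destination path are placeholders for your own setup:
# Copy the CA certificate and the node’s own cert/key pair to the node
scp consul-agent-ca.pem dc1-server-consul-0.pem dc1-server-consul-0-key.pem \
    user@server1:/etc/consul.d/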
Even though auto_encrypt exists, for the sake of keeping it simple, I didn’t use it.
I applied the following configuration on each Consul node:
{
...
"enable_script_checks": false,
"disable_remote_exec": true,
"verify_server_hostname": true,
"verify_outgoing": true,
"verify_incoming_https": false,
"verify_incoming_rpc": true,
"ca_file": "<CA CERT>",
"cert_file": "<SERVER/CLIENT CERT>",
"key_file": "<SERVER/CLIENT KEY>",
...
}
Note: verify_incoming_https is set to false; otherwise you would have to add the certificates to your browser if you want to access the UI.
Restart the consul services on all nodes:
sudo systemctl restart consul
The CLI tool to manage Consul will just work since we set verify_incoming_https to false, but we have to instruct the tool to use the https scheme from now on. Add the following to your .bashrc file and source it:
# Set Consul HTTPS
export CONSUL_HTTP_ADDR="https://<IP>:8501"
# Source it
source ~/.bashrc
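A quick consul members should confirm that the CLI reaches the agent over HTTPS:
# Lists the cluster members over the new scheme
consul members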
Configuring ACL
As with the Nomad cluster, Consul access is also not restricted without ACLs. However, Consul requires some additional work in comparison to setting up ACLs in Nomad.
Enabling ACL on agents
To enable ACL on Consul agents, we have to configure it in the Consul configuration file first and restart each agent:
"acl": {
"enabled": true,
"default_policy": "deny",
"enable_token_persistence": true,
"tokens": {
"default": "WILL ADD THIS ONE LATER",
"agent": "SAME HERE"
}
}
By configuring the default policy to deny, we require everyone who interacts with Consul to supply an access token. For those who don’t supply one, the default token is used. The default token is used, for example, by Consul’s DNS resolution service. We will configure these tokens a bit further on in this post.
Note: Consul agents won’t work anymore with the current policy!
Bootstrapping the ACL system:
consul acl bootstrap
This will also generate a management token, just like the one we generated for the Nomad cluster. It can be used to access the UI and perform management tasks using the CLI tool as well. To use this token for the CLI tool, add the following to your .bashrc file:
export CONSUL_HTTP_TOKEN=<secret id of the token>
Be careful where you store this token: everyone who can access this token can also control your Consul cluster!
Generating tokens
Generating a token consists of 3 steps:
- Create a policy
- Create a token for this policy, multiple tokens can be generated using the same policy.
- Add the token to your agent or client
A detailed tutorial can be found on HashiCorp Learn.
Agents
Create a file consul-agent-policy.hcl with the following content:
# consul-agent-policy.hcl
node_prefix "" {
policy = "write"
}
service_prefix "" {
policy = "read"
}
This policy allows a Consul agent to write node-related data and read service-related data.
Now add the policy to Consul:
consul acl policy create \
-name consul-agent \
-rules @consul-agent-policy.hcl
And generate a token for each agent:
consul acl token create -description "Consul agent token" \
-policy-name consul-agent
Add the token to each Consul agent configuration as the agent token.
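You can paste the secret ID into the tokens.agent field of the ACL stanza shown earlier, or set it at runtime with the CLI; thanks to enable_token_persistence it survives restarts:
# Register the agent token on this node (persisted across restarts)
consul acl set-agent-token agent "<secret id of the agent token>"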
Services
In my cluster setup, I use Traefik, which interacts with Consul to automatically configure reverse proxying and generate Let’s Encrypt certificates.
I used the following policy file:
# traefik-policy.hcl
key "traefik" {
policy = "write"
}
session "" {
policy = "write"
}
service_prefix "" {
policy = "read"
}
node_prefix "" {
policy = "write"
}
The rest of the commands are exactly the same as with the agents, except that a different name and policy file are used, as shown below.
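Concretely, that boils down to something like this (the policy and token names are my own choice):
consul acl policy create \
    -name traefik \
    -rules @traefik-policy.hcl

consul acl token create -description "Traefik token" \
    -policy-name traefik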
DNS
Consul DNS only needs to read which services are up and running, so we have to give it read access to Consul:
# dns-request-policy.hcl
node_prefix "" {
policy = "read"
}
service_prefix "" {
policy = "read"
}
# Only needed if using prepared queries
query_prefix "" {
policy = "read"
}
After generating the token, add it to the config file of each Consul agent as the default token and restart all agents.
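Putting those last steps together, a sketch (the policy name is my own choice):
# Create the policy and a token for DNS lookups
consul acl policy create \
    -name dns-requests \
    -rules @dns-request-policy.hcl

consul acl token create -description "DNS token" \
    -policy-name dns-requests

# On each agent: register the secret ID as the default token, then restart
consul acl set-agent-token default "<secret id of the DNS token>"
sudo systemctl restart consul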