Building a cluster part 6
This blog post is a follow-up to my previous posts about setting up a cluster. If you haven't read them yet, I strongly suggest reading them first:
In this series of blog posts, I explain how I configured my home servers as a Nomad cluster, with Consul as a DNS resolver for the cluster nodes and services.
The cluster is monitored with Prometheus and Grafana, which lets me see in detail which nodes are operational, how high their workload is, etc.
Enabling telemetry in Nomad
Nomad doesn't expose Prometheus telemetry data by default. You can enable it by editing the configuration file of each Nomad agent you want to monitor.
Add the following stanza to your configuration file:
telemetry {
  collection_interval        = "5s"
  disable_hostname           = true
  prometheus_metrics         = true
  publish_allocation_metrics = true
  publish_node_metrics       = true
}
And restart the Nomad agent:
sudo systemctl restart nomad
The metrics are now available at: https://<IP>:4646/v1/metrics?format=prometheus
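You can test it out using curl. Because the Nomad API uses TLS in this setup, add -k if your certificates are self-signed. Without the format parameter, the endpoint returns a JSON object with the measured metrics. With ?format=prometheus, it returns the same metrics in the Prometheus text format.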
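If you prefer to check the endpoint from a script, here is a minimal Python sketch that uses only the standard library. It assumes a hypothetical node address (NOMAD_METRICS_URL). Like the Prometheus job configured further on, it skips certificate verification. The parser only handles plain name{labels} value lines and is not a complete implementation of the Prometheus text format.

```python
import re
import ssl
import urllib.request

# Hypothetical node address: replace with the IP of one of your Nomad agents.
NOMAD_METRICS_URL = "https://192.0.2.10:4646/v1/metrics?format=prometheus"

# A sample line looks like: metric_name{label="value",other="x"} 42
SAMPLE_RE = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)")
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_prometheus_text(text):
    """Return (name, labels, value) tuples for the sample lines in text.

    Simplified: comments are skipped, optional timestamps are ignored,
    and escape sequences in label values are left as-is.
    """
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # blank lines, # HELP and # TYPE
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))  # float() accepts NaN and +Inf
    return samples


def fetch_metrics(url=NOMAD_METRICS_URL):
    """Download the metrics page without verifying the server certificate."""
    context = ssl.create_default_context()
    context.check_hostname = False  # must be disabled before setting CERT_NONE
    context.verify_mode = ssl.CERT_NONE  # mirrors insecure_skip_verify in Prometheus
    with urllib.request.urlopen(url, context=context, timeout=10) as response:
        return response.read().decode("utf-8")
```

For example, parse_prometheus_text(fetch_metrics()) returns one tuple per sample, which you can then filter by metric name.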
Setting up Prometheus
Download Prometheus for ARM from the Prometheus download page: https://prometheus.io/download/
Create the following configuration file:
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).
scrape_configs:
  # Prometheus self-monitoring
  - job_name: 'prometheus'
    scrape_interval: 5s
    static_configs:
      - targets: ['leader:9090']
        labels:
          group: 'production'
  - job_name: 'nomad'
    scrape_interval: 5s
    metrics_path: '/v1/metrics'
    tls_config: # Nomad uses the TLS certs we configured previously; skip certificate verification
      insecure_skip_verify: true
    scheme: https
    params:
      format: ['prometheus'] # Specify ?format=prometheus
    static_configs:
      - targets: ['<IP 1>:4646', '<IP 2>:4646'] # Specify nodes here, or discover them via Consul with consul_sd_configs
        labels:
          group: 'production'
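To make the nomad job less opaque, here is a small Python sketch of how Prometheus combines scheme, target, metrics_path and params into the URL it actually scrapes. The defaults shown (http and /metrics) are Prometheus' own. The node addresses are hypothetical stand-ins for <IP 1> and <IP 2>.

```python
from urllib.parse import urlencode, urlunsplit


def scrape_url(target, scheme="http", metrics_path="/metrics", params=None):
    """Return the URL Prometheus scrapes for one target of a scrape_config."""
    # params values are lists in the config, hence doseq=True:
    # {"format": ["prometheus"]} -> "format=prometheus"
    query = urlencode(params or {}, doseq=True)
    return urlunsplit((scheme, target, metrics_path, query, ""))


# Hypothetical node addresses standing in for <IP 1> and <IP 2>
for target in ["192.0.2.11:4646", "192.0.2.12:4646"]:
    print(scrape_url(target, scheme="https", metrics_path="/v1/metrics",
                     params={"format": ["prometheus"]}))

# The 'prometheus' self-monitoring job uses the defaults
print(scrape_url("leader:9090"))
```

The first two lines printed match the Nomad endpoint tested with curl above. The last line is http://leader:9090/metrics.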
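Add a systemd service file (/etc/systemd/system/prometheus.service) to run Prometheus at boot. The service runs Prometheus as a dedicated prometheus user and group, so make sure they exist and own <STORAGE PATH>: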
[Unit]
Description=Prometheus Time Series Collection and Processing Server
Documentation=https://prometheus.io/docs/prometheus
Wants=network-online.target
After=network-online.target
# Start rate limiting is configured in [Unit]
StartLimitBurst=3
StartLimitIntervalSec=10
[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=<INSTALL PATH>/prometheus \
    --config.file <CONFIG PATH> \
    --storage.tsdb.path <STORAGE PATH> \
    --web.console.templates=<INSTALL PATH>/consoles \
    --web.console.libraries=<INSTALL PATH>/console_libraries
KillMode=process
KillSignal=SIGINT
LimitNOFILE=infinity
LimitNPROC=infinity
Restart=on-failure
RestartSec=2
TasksMax=infinity
[Install]
WantedBy=multi-user.target
And enable it:
sudo systemctl enable --now prometheus.service
Prometheus should now be available in your browser at http://<IP>:9090.
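Besides the web UI (Status > Targets), you can check from a script whether Prometheus manages to scrape every node. The sketch below reads Prometheus' HTTP API endpoint /api/v1/targets with the Python standard library. The server address is a hypothetical placeholder.

```python
import json
import urllib.request

# Hypothetical address of the Prometheus server (<IP>:9090 in the text).
PROMETHEUS_URL = "http://192.0.2.10:9090"


def fetch_targets(base_url=PROMETHEUS_URL):
    """Return the parsed JSON of Prometheus' /api/v1/targets endpoint."""
    with urllib.request.urlopen(f"{base_url}/api/v1/targets", timeout=10) as response:
        return json.load(response)


def summarize_targets(payload):
    """List (job, instance, health, last_error) for every active target."""
    return [
        (
            target["labels"]["job"],
            target["labels"]["instance"],
            target["health"],  # "up", "down" or "unknown"
            target.get("lastError", ""),
        )
        for target in payload["data"]["activeTargets"]
    ]
```

Calling summarize_targets(fetch_targets()) lists every target of the prometheus and nomad jobs. For a target that is down, the last_error field tells you why its scrape failed.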
Setting up Grafana
Install Grafana from the official Grafana APT repository:
# Add the stable Grafana repository
sudo apt-get install -y apt-transport-https
sudo apt-get install -y software-properties-common wget
wget -q -O - https://packages.grafana.com/gpg.key | sudo apt-key add -
echo "deb https://packages.grafana.com/oss/deb stable main" | sudo tee -a /etc/apt/sources.list.d/grafana.list
# Install
sudo apt update
sudo apt install grafana
The package also installs a systemd service (grafana-server). You can edit it if you want to use a different configuration path than the default /etc/grafana/grafana.ini. The service is not started automatically, so enable and start it:
sudo systemctl enable --now grafana-server
You can find a complete installation guide in the Grafana docs.
If you browse to http://<IP>:3000, you will get the Grafana login screen.
The default login is shown below. Grafana will ask you to change the admin password after your first login:
- Username: admin
- Password: admin

Go to Settings > Data sources > Add data source. Select Prometheus as the data source type and enter the URL of your Prometheus server, for example http://<IP>:9090.
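The same data source can also be created through Grafana's HTTP API (POST /api/datasources), which is handy if you reinstall Grafana often. This is a minimal Python sketch using only the standard library. Both addresses and the credentials are hypothetical, and the function only builds the request; it does not send it.

```python
import base64
import json
import urllib.request

GRAFANA_URL = "http://192.0.2.20:3000"     # hypothetical Grafana address
PROMETHEUS_URL = "http://192.0.2.10:9090"  # hypothetical Prometheus address


def datasource_request(grafana_url, prometheus_url, user, password):
    """Build a POST /api/datasources request that registers Prometheus in Grafana."""
    payload = {
        "name": "Prometheus",
        "type": "prometheus",
        "url": prometheus_url,
        "access": "proxy",  # the Grafana server, not the browser, queries Prometheus
    }
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        f"{grafana_url}/api/datasources",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json", "Authorization": f"Basic {token}"},
        method="POST",
    )
```

To send it, pass the request to urllib.request.urlopen(), e.g. urllib.request.urlopen(datasource_request(GRAFANA_URL, PROMETHEUS_URL, "admin", "<your new password>")).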

Now you can play around and add dashboards, panels, etc. You can find more information here: https://grafana.com/docs/grafana/latest/getting-started/getting-started/
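Every Grafana panel is backed by a PromQL query, and you can try such queries directly against Prometheus' HTTP API before putting them in a panel. This Python sketch runs up{job="nomad"} through /api/v1/query. The built-in up metric is 1 for targets that were scraped successfully and 0 otherwise, which makes it a simple way to see which nodes are operational. The Prometheus address is hypothetical.

```python
import json
import urllib.request
from urllib.parse import urlencode

PROMETHEUS_URL = "http://192.0.2.10:9090"  # hypothetical Prometheus address


def query_url(base_url, promql):
    """Build an instant-query URL for the Prometheus HTTP API."""
    return f"{base_url}/api/v1/query?{urlencode({'query': promql})}"


def values_by_instance(payload):
    """Map each series' instance label to its value in a vector result."""
    return {
        series["metric"].get("instance", ""): float(series["value"][1])  # value = [timestamp, "string"]
        for series in payload["data"]["result"]
    }


def run_query(promql, base_url=PROMETHEUS_URL):
    """Run an instant query and return the parsed JSON response."""
    with urllib.request.urlopen(query_url(base_url, promql), timeout=10) as response:
        return json.load(response)
```

For example, values_by_instance(run_query('up{job="nomad"}')) maps each Nomad node to 1.0 or 0.0.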
My dashboard looks like this:

Configuring Grafana email alerts
Now that you have your dashboard running, you can also add Grafana alerts to graphs and send alerts to your inbox when something goes wrong.
To enable email alerts, we have to configure an SMTP server for Grafana. You can do this by adding the following to the Grafana configuration file:
[smtp]
enabled = true
host = <SMTP SERVER IP>:<SMTP SERVER PORT>
user = <USERNAME>
password = <PASSWORD>
from_address = <EMAIL TO USE>
from_name = <NAME SENDER>
ehlo_identity = <HOSTNAME TO USE IN EHLO>
startTLS_policy = <OpportunisticStartTLS | MandatoryStartTLS | NoStartTLS>
Examples and explanation of each configuration parameter can be found in the documentation of Grafana: https://grafana.com/docs/grafana/latest/administration/configuration/#smtp
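If alerts don't arrive, it helps to check the SMTP settings outside Grafana first. This Python sketch sends a test mail with the standard library's smtplib, using the same host, port, user and password as the [smtp] section. It assumes your server supports STARTTLS, and all addresses are placeholders.

```python
import smtplib
from email.message import EmailMessage
from email.utils import formataddr


def build_test_message(from_address, from_name, to_address):
    """Create a small test mail with the same sender as Grafana's from_address/from_name."""
    msg = EmailMessage()
    msg["Subject"] = "SMTP test for Grafana alerts"
    msg["From"] = formataddr((from_name, from_address))
    msg["To"] = to_address
    msg.set_content("If you can read this, the SMTP settings work.")
    return msg


def send_test_message(host, port, user, password, msg, starttls=True):
    """Connect, optionally upgrade to TLS with STARTTLS, log in and send msg."""
    with smtplib.SMTP(host, port, timeout=10) as smtp:
        if starttls:
            smtp.starttls()
        smtp.login(user, password)
        smtp.send_message(msg)
```

If send_test_message() raises smtplib.SMTPAuthenticationError, check the user and password before looking at Grafana itself.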
Restart Grafana to apply the SMTP settings:
sudo systemctl restart grafana-server
Now go to Alerts > Notification channels and create a new notification channel of type Email. Add the email addresses to which Grafana must send your alerts. You can also send a test alert from this page to make sure that the configuration works.
If you now open a panel that uses the Graph visualization, you can click the Alert tab and add a new alert. Currently, Grafana can only add alerts to Graph panels, and only one alert per panel.
Once you have configured the trigger rule and the notification message, click Apply and wait a bit. By default, Grafana only sends a notification if the rule has been triggered for more than 5 minutes. You can change this with the rule's 'For' parameter.
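To make the 'For' parameter more concrete, here is a deliberately simplified Python model of how a rule moves between the OK, Pending and Alerting states. It is not Grafana's implementation. The one-minute evaluation times used with it are hypothetical.

```python
from datetime import timedelta


def alert_states(evaluations, for_duration=timedelta(minutes=5)):
    """Simplified 'For' model: return OK/Pending/Alerting for each evaluation.

    evaluations: chronological list of (timestamp, condition_is_true) pairs.
    The rule only becomes Alerting once the condition has been true without
    interruption for more than for_duration; any OK evaluation resets it.
    """
    states = []
    pending_since = None
    for timestamp, condition_is_true in evaluations:
        if not condition_is_true:
            pending_since = None  # condition cleared: start over
            states.append("OK")
            continue
        if pending_since is None:
            pending_since = timestamp  # first evaluation where the rule fires
        if timestamp - pending_since > for_duration:
            states.append("Alerting")
        else:
            states.append("Pending")
    return states
```

With a rule that is true at every one-minute evaluation, alert_states reports Pending for the first six evaluations. It switches to Alerting at the seventh, once the condition has held for more than five minutes.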

