This blogpost is a follow-up to my previous post about setting up a cluster; if you haven’t read it yet, you can find it here.

In this series of blogposts, I will explain how I configured my homeservers as a Nomad cluster with Consul as a DNS resolver for the cluster nodes and services.

Configuring a master node

Instead of an Odroid HC2, I used a Raspberry Pi 3B+ for the master node. Ideally, you use the same hardware for all nodes so you don’t end up with multiple procedures for configuring the nodes of your cluster. However, I wanted to avoid buying more hardware, and I like a challenge :smile:

Install OS

Most SBCs use a microSD card as the OS disk. It’s advised to use at least a UHS-1 class 10 microSD card. Use GParted or GNOME Disks to format the microSD card as EXT4 with an MS-DOS partition table:

GParted for formatting the microSD card

I picked Ubuntu Server as the OS for the master node because Armbian doesn’t support the Raspberry Pi. This means some additional configuration to reduce the wear on the microSD card. Since this Raspberry Pi only has to manage the cluster and provide monitoring (see a later blogpost), we don’t need to enable zRAM.
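One example of such a tweak (a sketch, not necessarily the exact configuration used here): mount the root filesystem with noatime so that reading files doesn’t trigger extra metadata writes. The ‘writable’ label is what the official Ubuntu Raspberry Pi image uses for its root partition; check yours with lsblk -f.

# /etc/fstab - add noatime to the root mount options to reduce writes to the microSD card
LABEL=writable  /  ext4  defaults,noatime  0  1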

Download the latest Ubuntu Server release for your SBC, in my case, I downloaded the official Ubuntu Server image for the Raspberry Pi 3B+ from Canonical.

The official Ubuntu Server image for the Raspberry Pi 3B+ from Canonical

Flash Ubuntu Server using dd (see the command-line example below) or GNOME Disks.

  1. Click on your microSD card in GNOME disks
  2. Under the 3-dots button, you can click on ‘Restore image’
  3. Select the Ubuntu Server image and click on ‘Restore’

Grab a cup of coffee, this can take some time :smile:

GNOME disks for flashing an Ubuntu Server image on the microSD card
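If you prefer the command line over GNOME Disks, a dd invocation looks roughly like this. The device name /dev/sdX and the image file name are assumptions; double-check the device with lsblk before writing, because writing to the wrong disk destroys its data.

# Identify the microSD card first
lsblk

# Decompress and write the image, then flush caches (adjust the file and device names to yours)
xzcat ubuntu-20.04-preinstalled-server-arm64+raspi.img.xz | sudo dd of=/dev/sdX bs=4M status=progress conv=fsync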

Login over SSH

Properly eject the microSD card and put it into the SBC. Let it boot for a while and try to log in over SSH:

ssh ubuntu@<IP>

If you don’t know the IP of the SBC, you can use arp-scan:

sudo arp-scan --localnet
Starting arp-scan 1.9.7 with 256 hosts (https://github.com/royhills/arp-scan)
<IP>            <MAC>                   <Ethernet interface name of the device>

Verify that you can access the Raspberry Pi over SSH and log out. Now copy the SSH key of your machine to the SBC:

ssh-copy-id ubuntu@<IP>

Try to log in again as the ‘ubuntu’ user; you should not get an SSH password prompt if your SSH key is unlocked:

ssh ubuntu@<IP>

Install the UFW firewall

Most Linux distributions do not have a firewall installed and enabled by default. I am a big fan of UFW (Uncomplicated FireWall) because it is so easy to configure :smile:

sudo apt install ufw  # Install UFW from the repositories
sudo ufw allow ssh  # Allow SSH access
sudo ufw enable  # Enable firewall

Configure Ubuntu Server

Ubuntu Server for the Raspberry Pi does not come with raspi-config, a utility to configure the Raspberry Pi similar to the one Armbian provides. So we’re on our own here and have to configure the Raspberry Pi manually…

Disable root login over SSH and require SSH key authentication:

sudo vim /etc/ssh/sshd_config
# Disallow password login
PasswordAuthentication no

# Disallow root login
PermitRootLogin no

# Limit SSH users to the default user:
AllowUsers <USER>

# Restart sshd
sudo systemctl restart sshd

Configure a static IP using the new Netplan syntax of Ubuntu:

# Remove cloud-init to avoid generating a dynamic IP on boot
sudo apt remove --purge cloud-init
sudo apt autoremove --purge cloud-init

# Configure netplan
sudo vim /etc/netplan/01-static-ip.yaml

# Change the YAML file to:
network:
  ethernets:
    eth0:
      dhcp4: no
      addresses:
        - <STATIC IP>/24
      gateway4: <GATEWAY>
      nameservers:
        addresses: [84.200.69.80, 84.200.70.40] # DNS servers
  version: 2
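Apply the new network configuration. Note that your SSH session will drop if the static IP differs from the current one; netplan try can be used instead of apply to automatically revert after a timeout if something goes wrong.

# Apply the Netplan configuration
sudo netplan apply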

Set the hostname of the Raspberry Pi:

sudo vim /etc/hostname
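Alternatively, hostnamectl sets the hostname in one go without a reboot (the name master-1 is just an example):

# Set the hostname immediately (example name)
sudo hostnamectl set-hostname master-1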

Fine-tune the motd login greeter (optional)

If you want the same login greeter look as Armbian on your Raspberry Pi, you have to configure motd by replacing /etc/update-motd.d/00-header with the one from Armbian.

Ubuntu Server does not install the toilet binary and fonts by default. This is used to print the name of the board as ASCII art. To install this dependency, run:

# Install toilet
sudo apt install toilet

# Copy the font from an Armbian node to the Raspberry Pi (run this on the Raspberry Pi)
sudo scp <USER>@<ARMBIAN NODE IP>:/usr/share/figlet/standard.flf /usr/share/figlet/

I adapted the Armbian motd files to the following script for the Raspberry Pi:

#!/bin/bash
#
# Copyright (c) Authors: http://www.armbian.com/authors and Dylan Van Assche (2020)
#
# This file is licensed under the terms of the GNU General Public
# License version 2. This program is licensed "as is" without any
# warranty of any kind, whether express or implied.
#

# Pretty display
function display() {
        # $1=name $2=value $3=red_limit $4=minimal_show_limit $5=unit $6=after $7=asc/desc
        # battery red color is opposite, lower number
        if [[ "$1" == "Battery" ]]; then local great="<"; else local great=">"; fi
        if [[ -n "$2" && "$2" > "0" && (( "${2%.*}" -ge "$4" )) ]]; then
        printf "%-14s%s" "$1:"
                if awk "BEGIN{exit ! ($2 $great $3)}"; then echo -ne "\e[0;91m $2"; else echo -ne "\e[0;92m $2"; fi
                printf "%-1s%s\x1B[0m" "$5"
                printf "%-11s%s\t" "$6"
                return 1
        fi
}


# IP address
# Interface name pattern; Armbian defines this in /etc/default/armbian-motd, so set a default here
SHOW_IP_PATTERN="${SHOW_IP_PATTERN:-^[ewr].*|^br.*|^lt.*|^umts.*}"
function get_ip_addresses() {
        local ips=()
        for f in /sys/class/net/*; do
                local intf=$(basename $f)
                # match only interface names starting with e (Ethernet), br (bridge), w (wireless), r (some Ralink drivers use ra<number> format)
                if [[ $intf =~ $SHOW_IP_PATTERN ]]; then
                        local tmp=$(ip -4 addr show dev $intf | awk '/inet/ {print $2}' | cut -d'/' -f1)
                        # add both name and IP - can be informative but becomes ugly with long persistent/predictable device names
                        #[[ -n $tmp ]] && ips+=("$intf: $tmp")
                        # add IP only
                        [[ -n $tmp ]] && ips+=("$tmp")
                fi
        done
        echo "${ips[@]}"
}

# Storage
# Extra storage mount point to report; adjust to your setup (Armbian uses /dev/sda1 by default)
STORAGE=${STORAGE:-/dev/sda1}
function storage_info() {
        # storage info
        RootInfo=$(df -h /)
        root_usage=$(awk '/\// {print $(NF-1)}' <<<${RootInfo} | sed 's/%//g')
        root_total=$(awk '/\// {print $(NF-4)}' <<<${RootInfo})
        StorageInfo=$(df -h $STORAGE 2>/dev/null | grep $STORAGE)
        if [[ -n "${StorageInfo}" && ${RootInfo} != *$STORAGE* ]]; then
                storage_usage=$(awk '/\// {print $(NF-1)}' <<<${StorageInfo} | sed 's/%//g')
                storage_total=$(awk '/\// {print $(NF-4)}' <<<${StorageInfo})
        fi
}

# CPU critical load
critical_load=$(( 1 + $(grep -c processor /proc/cpuinfo) / 2 ))

#####################################################################################################################

# Header
BOARD_NAME="RPi 3B+"
toilet -f standard -F metal "${BOARD_NAME}"

# OS and kernel release
. /etc/os-release
KERNELID=$(uname -r)
echo -e "Welcome to \e[0;91m${PRETTY_NAME}\x1B[0m with \e[0;91mLinux $KERNELID\x1B[0m\n"

# System status
ip_address=$(get_ip_addresses &)
storage_info
# get uptime, logged in users and load in one take
UPTIME=$(LC_ALL=C uptime)
UPT1=${UPTIME#*'up '}
UPT2=${UPT1%'user'*}
users=${UPT2//*','}
users=${users//' '}
time=${UPT2%','*}
time=${time//','}
load=${UPTIME#*'load average: '}
load=${load//','}

# memory and swap
mem_info=$(LC_ALL=C free -w 2>/dev/null | grep "^Mem" || LC_ALL=C free | grep "^Mem")
memory_usage=$(awk '{printf("%.0f",(($2-($4+$6+$7))/$2) * 100)}' <<<${mem_info})
memory_total=$(awk '{printf("%d",$2/1024)}' <<<${mem_info})
swap_info=$(LC_ALL=C free -m | grep "^Swap")
swap_usage=$( (awk '/Swap/ { printf("%3.0f", $3/$2*100) }' <<<${swap_info} 2>/dev/null || echo 0) | tr -c -d '[:digit:]')
swap_total=$(awk '{print $(2)}' <<<${swap_info})

display "System load" "${load%% *}" "${critical_load}" "0" "" "${load#* }"
printf "Up time:       \x1B[92m%s\x1B[0m\t\t" "$time"
display "Local users" "${users##* }" "3" "2" ""
echo "" # fixed newline
display "Memory usage" "$memory_usage" "70" "0" " %" " of ${memory_total}MB"
display "Zram usage" "$swap_usage" "75" "0" " %" " of $swap_total""Mb"
printf "IP:            "
printf "\x1B[92m%s\x1B[0m" "$ip_address"
echo "" # fixed newline
display "Usage of /" "$root_usage" "90" "1" "%" " of $root_total"
display "storage/" "$storage_usage" "90" "1" "%" " of $storage_total"
echo ""
echo ""
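Save the script as /etc/update-motd.d/00-header (replacing the original) and make sure it is executable, otherwise update-motd will skip it:

# The motd scripts are only run if they are executable
sudo chmod +x /etc/update-motd.d/00-header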

Test it out by logging out and back in; the login greeter should now look like the one from Armbian :smiley:

Install cluster software

The cluster is operated by Nomad and Consul, an alternative to Kubernetes.

Consul

Consul is responsible for resolving the FQDNs of services and nodes. Consul provides a DNS service on port 8600 and a web UI on port 8500. It doesn’t matter on which node you access the UI, since the nodes act as a cluster.
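Once the cluster is up, you can query that DNS service directly. For example, resolving the Consul servers themselves; consul.service.consul follows Consul’s standard <service>.service.<domain> naming scheme:

# Ask the Consul DNS interface on a node for the registered consul servers
dig @<IP> -p 8600 consul.service.consul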

Create a user to run Consul and become that user:

sudo useradd consul -m --shell=/bin/bash
sudo su consul

Download Consul for the Raspberry Pi 3B+:

wget https://releases.hashicorp.com/consul/1.8.4/consul_1.8.4_linux_arm64.zip
unzip consul_1.8.4_linux_arm64.zip
rm consul_1.8.4_linux_arm64.zip

Verify the consul binary:

./consul -v
Consul v1.8.4
Revision 12b16df32

For high availability of the Consul master nodes, we will run 3 server instances: 1 leader and 2 followers. If the leader crashes, one of the followers takes over. This way, downtime is avoided when Consul crashes.

Create a config file /home/consul/config.json for a master node:

{
    "bootstrap_expect": 3,
    "client_addr": "<IP>",
    "datacenter": "<DATACENTER>",
    "data_dir": "<STORAGE LOCATION>",
    "domain": "consul",
    "dns_config": {
        "enable_truncate": true,
        "only_passing": true
    },
    "encrypt": "<ENCRYPTION KEY>",
    "leave_on_terminate": true,
    "log_level": "ERROR",
    "rejoin_after_leave": true,
    "server": true,
    "ui": true
}
  • IP: IP address of the node
  • DATACENTER: Name of the datacenter to join
  • STORAGE LOCATION: Location where Consul may write its data
  • ENCRYPTION KEY: A symmetric key used by the Consul agents to encrypt their traffic. You have to generate one (see the example below).
  • IP CONSUL MASTER NODE: The IP address of the Consul master node; nodes join the Consul cluster by contacting the master node.
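The gossip encryption key can be generated with the consul binary itself; the command prints a base64-encoded key that you paste into the encrypt field:

# Generate a gossip encryption key (run from the consul user's home directory)
./consul keygen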

Now that Consul is ready to go, we can install Consul as a systemd service by creating a new service:

sudo vim /etc/systemd/system/consul.service

And add the following content, with <IP> the IP address of the node and <CONFIG FILE> the path to the Consul config file.

[Unit]
Description=Consul cluster leader
Documentation=https://consul.io/docs/
Wants=network-online.target
After=network-online.target

[Service]
User=consul
Group=consul
ExecReload=/bin/kill -HUP $MAINPID
ExecStart=/home/consul/consul agent -bind <IP> -config-file <CONFIG FILE>
KillMode=process
KillSignal=SIGINT
LimitNOFILE=infinity
LimitNPROC=infinity
Restart=on-failure
RestartSec=2
StartLimitBurst=3
StartLimitIntervalSec=10
TasksMax=infinity

[Install]
WantedBy=multi-user.target

Exit back to your default user, then enable and start the service:

sudo systemctl enable consul
sudo systemctl start consul
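To check that the agent came up, you can list the members it knows about. Because client_addr is set to the node’s IP instead of localhost, the CLI has to be pointed at it explicitly:

# List the cluster members known to this agent
/home/consul/consul members -http-addr=http://<IP>:8500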

Nomad

Nomad can run Docker containers, Java VMs and scripts as cluster jobs. It monitors jobs, assigns them to workers and registers everything with Consul. No extra configuration is needed to access the services when the Consul integration is enabled. The UI is available on port 4646; it doesn’t matter from which node you access the UI, since the nodes act as a cluster.

First, as your default user, create a user to run Nomad and become that user:

sudo useradd nomad -m --shell=/bin/bash
sudo su nomad

The installation of Nomad is almost the same as for Consul: download the binary and verify it:

wget https://releases.hashicorp.com/nomad/0.12.4/nomad_0.12.4_linux_arm64.zip
unzip nomad_0.12.4_linux_arm64.zip
rm nomad_0.12.4_linux_arm64.zip
./nomad -v
Nomad v0.12.4 (8efaee4ba5e9727ab323aaba2ac91c2d7b572d84)

To make sure that the Nomad cluster leader has no downtime, we again run 3 server instances: 1 leader and 2 followers.

Create a Nomad config /home/nomad/config.hcl and add the following content, with <IP> the IP address of the node, <STORAGE LOCATION> the location where Nomad may write its data, and <DATACENTER> the name of the datacenter.

# Log level (only errors)
log_level = "ERROR"

# Setup data dir
data_dir = "<STORAGE LOCATION>"

# Datacenter name
datacenter = "<DATACENTER>"

# Let the server gracefully exit after a SIGTERM
leave_on_terminate = true

# Enable the server
server {
    enabled = true

    # Self-elect, should be 3 or 5 for production
    bootstrap_expect = 3
}

# Prometheus configuration
telemetry {
    collection_interval = "5s"
    disable_hostname = true
    prometheus_metrics = true
    publish_allocation_metrics = true
    publish_node_metrics = true
}

# Consul configuration
consul {
  address             = "<IP>:8500"
}

Now that Nomad is ready to go, we can install Nomad as a systemd service by creating a new service:

sudo vim /etc/systemd/system/nomad.service

And add the following content, with <CONFIG> the path to the Nomad config file.

[Unit]
Description=Nomad cluster leader
Documentation=https://nomadproject.io/docs/
Wants=network-online.target
After=network-online.target

[Service]
User=nomad
Group=nomad
ExecReload=/bin/kill -HUP $MAINPID
ExecStart=/home/nomad/nomad agent -config <CONFIG>
KillMode=process
KillSignal=SIGINT
LimitNOFILE=infinity
LimitNPROC=infinity
Restart=on-failure
RestartSec=2
StartLimitBurst=3
StartLimitIntervalSec=10
TasksMax=infinity

[Install]
WantedBy=multi-user.target

Exit back to your default user, then enable and start the service:

sudo systemctl enable nomad
sudo systemctl start nomad
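Similar to Consul, you can verify that the Nomad server is running. The -address flag is needed here because the agent binds to the node’s IP rather than localhost (assumed from the setup above):

# List the Nomad servers that have joined the cluster
/home/nomad/nomad server members -address=http://<IP>:4646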

Bringing the cluster online

Now that we have both 1 master and 1 worker node, we can bring up the cluster.

  1. Make sure that all SBCs are turned off: sudo poweroff
  2. Connect all Ethernet cables with the Ethernet switch
  3. Boot all SBCs; the order doesn’t matter since the worker nodes keep trying to find the master node every couple of seconds.
Our test setup: 1 worker and 1 master node. The 2nd worker will be added later on.

The cluster won’t work yet because the UFW firewall blocks the cluster traffic, so let’s change that:

# Nomad traffic
sudo ufw allow 4647
sudo ufw allow 4648

# Nomad UI
sudo ufw allow 4646

# Consul traffic
sudo ufw allow 8300
sudo ufw allow 8301
sudo ufw allow 8302
sudo ufw allow 8600

# Consul UI
sudo ufw allow 8500
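You can verify that all rules are in place with:

# Show the active firewall rules
sudo ufw status numbered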

If everything goes well, you should be able to reach the Nomad and Consul UIs by opening the following links in your browser:

  • Nomad UI: <IP>:4646
  • Consul UI: <IP>:8500
Nomad UI showing the status of clients (worker nodes) and servers (master nodes)
Consul UI showing the status of the services and the nodes