Overcomplicated

May 16, 2023 - 7 minute read - homelab

I finally got around to posting something to my site, and it is all about how I did everything the hard way…

So, for my day job I poke at computers with sticks a lot. (Normally they don’t poke back too much…) At home, I also have a few servers in a homelab environment, as well as a few public cloud servers. Additionally, I relocated a while back and have environments at my old and new places, connected by VPN. All of these devices (including Docker containers) are controlled by a series of Ansible playbooks.

Some of the hardware I have:

  • Edgerouter firewalls
  • Unifi switches and APs
  • IP cameras at both locations
  • A main fileserver in each location
  • SFF mini PCs (with minimal local storage)
  • Miscellaneous wired and wireless devices

Some of the miscellaneous scripts I have running (most written from scratch):

  • Archive and rotate camera video files
  • Back up critical data
  • Integrate with Home Assistant to enable/disable my internal security cameras when I leave/return home
  • A Makefile that is used to build Hugo static sites (like this one)
  • Munin plugins that report details on UPS status, printer toner levels, and Unifi networking environment
  • Build the BIND and DHCPD configuration files for both locations from a main template file that contains all MAC address static IP assignments, accounting for the wifi subnet local to each site (see the sketch after this list)
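
To give a rough idea of the DHCPD side of that template, here is a minimal Jinja2 sketch; the hosts variable and its fields are hypothetical stand-ins for my actual data layout:

  {# hypothetical input: hosts = [{name: "nas1", mac: "aa:bb:cc:dd:ee:01", ip: "10.0.1.10"}, ...] #}
  {% for host in hosts %}
  host {{ host.name }} {
    hardware ethernet {{ host.mac }};
    fixed-address {{ host.ip }};
  }
  {% endfor %}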

Various Ansible playbooks currently being used:

  • Configure iSCSI targets (as well as Samba and NFS) on fileservers
  • Configure iSCSI initiators and filesystem mounts on Docker servers for each Docker container that needs one
  • Configure Docker compose files on each Docker server (and associated application configuration files, with credentials/settings/etc. pulled from Ansible variables)
  • Base server OS configurations (both public and private servers)
  • Deploy web sites and core services (both public and private servers)
  • Deploy SSL certificates for all services that need them (router web UIs, internal sites, public sites, etc.)

My iSCSI targets and initiators are configured via Ansible variables:

  unifi_controller:
    login: unifi
    lvmname: docker_unifi
    password: <RANDOM_PASSWORD>
    fs_label: docker_unifi
    fs_folder: unifi
    user: unifi
    group: unifi
    sites: [ sjc ]

This defines everything needed to mount the data volume used by that Docker container. These variables drive both the iSCSI target configuration (depending on whether that container should be running in that location) and the iSCSI initiator configuration and filesystem mount (depending on whether that container should be running on that Docker host).
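
On the initiator side, the flow looks roughly like this. It is a simplified sketch rather than my actual play; the portal variable, IQN format, and filesystem type are placeholders:

  - name: Log into the iSCSI target for the unifi container
    community.general.open_iscsi:
      portal: "{{ fileserver_ip }}"                               # hypothetical variable
      target: "iqn.2023-05.lan:{{ unifi_controller.lvmname }}"    # placeholder IQN format
      node_auth: CHAP
      node_user: "{{ unifi_controller.login }}"
      node_pass: "{{ unifi_controller.password }}"
      login: true

  - name: Mount the filesystem where the compose file expects it
    ansible.posix.mount:
      src: "LABEL={{ unifi_controller.fs_label }}"
      path: "/srv/docker/volumes/{{ unifi_controller.fs_folder }}"
      fstype: ext4                                                # assumption; any labeled filesystem works
      state: mounted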

One issue with this is that if you run Ansible in check mode (--check) and call the open_iscsi module with discovery enabled (discover=yes), it will actually blow away the existing credentials in your iSCSI initiator configuration. However, if you don’t perform a discovery before trying to log into the iSCSI targets, the login will fail. My workaround is an extra variable passed on the command line (--extra-vars iscsi_discover=true) that guards the discovery parts of my playbook.
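
In practice the guard is just a when: condition on the discovery task, something like this (again simplified, with a placeholder portal variable):

  - name: Discover iSCSI targets (only when explicitly requested)
    community.general.open_iscsi:
      portal: "{{ fileserver_ip }}"   # hypothetical variable
      discover: true
    when: iscsi_discover | default(false) | bool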

Most configuration files are templated with Jinja and controlled by Ansible, with conditionals driven by Ansible host and group variables. These variables control which Ansible plays are activated based on host, host group, or location (depending on need), as well as which parts of templates get written to configuration files. For example, there are lots of if statements in the docker-compose.yaml template file; every container definition is surrounded by control blocks:

  {% if enable_unifi_controller | bool %}
    unifi-controller:
      container_name: unifi-controller
      environment:
        - PUID={{ uid_map.unifi }}
        - PGID={{ uid_map.unifi }}
        - MEM_LIMIT=2048
      image: linuxserver/unifi-controller
      ports:
        - 3478:3478/udp
        - 10001:10001/udp
        - 8080:8080
        - {{ port_map['unifi-controller'] }}:8443
        - 1900:1900/udp
        - 5514:5514
      restart: always
      volumes:
        - type: bind
          source: /srv/docker/volumes/unifi
          target: /config
        - type: bind
          source: /dev/log
          target: /dev/log
  {% endif %}

In this case, enable_unifi_controller is set to either true or false based on Ansible group variables, so only the container definitions (and any required variables or credentials) needed for that specific Docker host are written to docker-compose.yaml:

  enable_unifi_controller: "{{ vars['containers'][site_name]['unifi_controller'] is defined and vars['containers'][site_name]['unifi_controller'] == inventory_hostname }}"

In other plays and template files, the conditional can depend on combinations of various Ansible host and group variables.
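
For example, a service might be gated on both the site and a host group, along these lines (the camera_hosts group name is hypothetical):

  enable_motion: "{{ site_name == 'sjc' and 'camera_hosts' in group_names }}"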

Something else I did was make sure every Docker host has the same set of user and group IDs, so I can move a container to another host while maintaining the local user mapping. Those UIDs/GIDs are stored in the uid_map Ansible group variable. Services with a web frontend are also tracked in the port_map Ansible group variable, so my nginx container can forward web traffic to them.
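
Both of those are plain group variable maps, roughly like this (the values shown are made up, not my real assignments):

  uid_map:
    unifi: 1001
    plex: 1002

  port_map:
    unifi-controller: 8443
    owncloud: 8444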

Various Docker containers in use:

  • Local Docker registry in each location
  • Miscellaneous services (homeassistant, mariadb, owncloud, plex, unifi-controller, etc.)
  • A common Docker container that includes many of the local services I use (bind, dhcp, motion, samba, munin, nginx, etc.), launched by a common init script

My “common” Docker container bundles a number of basic services, mainly so I don’t have to keep up with a dozen different upstream containers for easily configured/installed services. In most cases the distro-provided version of the software is sufficient; in others I pull from upstream repos. It is built by a ridiculously long RUN command:

  RUN export DEBIAN_FRONTEND=noninteractive; \
      export DEBCONF_NONINTERACTIVE_SEEN=true; \
      # everything that the distro provides gets installed in one shot
      apt update -qqy && apt install -qqy --option Dpkg::Options::="--force-confnew" \
      apcupsd autoconf automake autopoint bind9 bind9-host binutils build-essential ca-certificates cron curl dnsutils \
      ffmpeg gettext git imagemagick iputils-ping isc-dhcp-server less libavcodec-dev libavdevice-dev libavformat-dev \
      libavutil-dev libjpeg-dev libmicrohttpd-dev libswscale-dev libtool libwebp-dev libzip-dev lsb-release lsof munin \
      nano nfs-kernel-server pkgconf procps samba spawn-fcgi tftpd-hpa tzdata wget x264 && \
      # redis and nginx come from their upstream repos instead
      curl -fsSL https://packages.redis.io/gpg | gpg --dearmor -o /usr/share/keyrings/redis-archive-keyring.gpg && \
      echo "deb [signed-by=/usr/share/keyrings/redis-archive-keyring.gpg] https://packages.redis.io/deb $(lsb_release -cs) main" > /etc/apt/sources.list.d/redis.list && \
      curl -fsSL https://nginx.org/keys/nginx_signing.key | gpg --dearmor -o /usr/share/keyrings/nginx-archive-keyring.gpg && \
      echo "deb [signed-by=/usr/share/keyrings/nginx-archive-keyring.gpg] http://nginx.org/packages/debian $(lsb_release -cs) nginx" > /etc/apt/sources.list.d/nginx.list && \
      apt update -qqy && apt install -qqy nginx redis-server && \
      # clean up to keep the image size down
      apt -qqy autoremove && \
      apt -qqy clean && \
      rm -rf /var/lib/apt/lists/*

Additionally, I call my Ansible roles using a bash function with auto-completion (it parses the Ansible inventory for targets and the roles directory for roles), so both roles and targets can be tab-completed:

  function ansible_role () {
    local ROLE="$1"; shift
    local NODES="$1"; shift
    local COMMAND="ansible -m include_role -a name=${ROLE}"
    # any remaining arguments are optional flags
    for i in "$@"; do
      case $i in
        --iscsi) COMMAND+=" --extra-vars iscsi_discover=true";;
        --diff) COMMAND+=" --diff";;
        --check) COMMAND+=" --check";;
      esac
    done
    COMMAND+=" ${NODES}"
    ANSIBLE_CONFIG=/srv/ansible/ansible.cfg $COMMAND
  }

  function _ansible_autocomplete () {
    # role names come from the role directories, targets from the inventory file
    local roles targets
    roles=$(/bin/ls -d /srv/ansible/roles/* | xargs basename -a | tr "\n" " ")
    targets=$(grep -vE "#|children|hosts|^-" /srv/ansible/hosts | sed -e 's/[^a-z0-9-]//g')
    case $COMP_CWORD in
      1) COMPREPLY=($(compgen -W "${roles}" -- "${COMP_WORDS[COMP_CWORD]}")) ;;
      2) COMPREPLY=($(compgen -W "${targets}" -- "${COMP_WORDS[COMP_CWORD]}")) ;;
    esac
  }

  complete -F _ansible_autocomplete ansible_role
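
With that in place, running a role against a host or group looks like this (the role and host names are just examples):

  ansible_role docker_compose sjc-docker1 --check --diff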

Why no Kubernetes? I didn’t start off with enough servers to make it feasible. My foray into Docker started with a used Dell PowerEdge R610 many years ago and slowly progressed into fileservers backing SFF mini PCs running Docker (mainly to keep excessive heat generation in check). Also, I used SaltStack at a former job, so the way I create, group, and call my Ansible roles is similar to how I did it in SaltStack (e.g., I never reference individual playbooks; it is always roles that include other roles and playbooks, and host groups that include other hosts and host groups).
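
Ansible supports that style of composition natively via role dependencies; a hypothetical sketch of the layout (these role names are illustrative, not my actual tree):

  # roles/docker_host/meta/main.yml
  dependencies:
    - role: base_os
    - role: iscsi_initiator
    - role: docker_compose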

Finally, I bundle a lot of core maintenance tasks into a single script:

  # /srv/scripts/bin/task_runner
  
  Please specify a command:
  
  Options:

    --debug         Do not run commands, just display them
    --quiet         Do not print any output at all unless there are errors
    --verbose       Increase verbosity

    --backup        Back up data
    --build         Rebuild an updated Docker image
    --camcheck      Check home/away status for launching internal cameras
    --camsync       Copy newly-recorded camera videos to permanent storage
    --camcleanup    Purge older camera videos from temporary and long-term storage
    --sitecron      Announce public IP to public servers
    --update        Rebuild an updated Docker image and apply all updates to all servers
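
Internally it is just a bash option parser that maps flags to commands. A stripped-down sketch of the pattern (the helper paths are hypothetical, and the real script does far more):

  #!/bin/bash
  DEBUG=""
  for arg in "$@"; do
    case $arg in
      --debug)   DEBUG="echo" ;;                                # print commands instead of running them
      --backup)  $DEBUG /srv/scripts/bin/backup_data ;;         # hypothetical helper
      --camsync) $DEBUG /srv/scripts/bin/sync_camera_videos ;;  # hypothetical helper
      *)         echo "Unknown option: $arg" >&2; exit 1 ;;
    esac
  done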

What’s next?

  • Still need to fully automate Let’s Encrypt (mainly need to automate the deployment of individual BIND zone file updates)
  • Experimenting with self-hosted password managers
  • Add centralized logging for everything
  • Additional metrics gathering
  • Still have not figured out how to fully automate the replacement of the SSL certificate on my Brother laser printer
  • Autodetection of IP address changes for each site (for maintaining site-to-site as well as external VPN access)
  • Need to automatically unmount and/or remove iSCSI targets that are no longer configured
  • Experiment with Kubernetes in some form
  • An annoyingly long to-do list that somehow never actually shrinks…

This really does seem like a whole lot of work for just a few computers, cloud servers, firewalls, and wifi…