homelab-planning
It starts with a big git repo.
terraform → proxmox → ansible & nix/nixos → docker compose → tailscale → fun???
hardware
storage - truenas host
Intel. Lots of storage, lots of ram. Power efficient.
Dedicated “storage” for the system. Currently a random 3-disk zfs pool w/ ~6TB. Kinda slow, but perfect. 10G link to switch.
compute - proxmox host
AMD/Nvidia - Lots of ram - dedicated graphics card (2070S)
Proxmox is the main compute workhorse. Runs several docker-enabled LXC containers and a couple VMs.
LXC containers each have a docker compose “stack” - monitoring, srv (proxy, git server, file sync, etc), media (file streaming, photo upload), arr (archiver) - and some others
VMs include:
- Proxmox Backup Server - backs up all important LXC/VM volumes (this data eventually gets rsynced to the storage box)
- Home assistant
- Anything else that requires a full vm
backups — 3,2,1
- All devices make backups locally (nixos configurations, docker volumes, dotfiles in a git repo)
- All data is sent to the storage box in
  - `[pool]/archives/[service]`
    - rsynced to
      - RaspberryPI (Offsite)
  - `[pool]/library/Backups/[service]/*` and `[pool]/library/Photos/*`
    - rsynced to
      - RaspberryPI (Offsite)
      - Backblaze B2 (Cloud)
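A rough sketch of the archive leg of that fan-out, assuming plain rsync over ssh. The pool path `/mnt/tank`, the offsite host `backup-pi`, and its destination directory are made-up placeholders, not values from these notes.

```python
import shlex

# Hypothetical values for illustration only (not from the notes).
POOL = "/mnt/tank"
OFFSITE = "backup-pi:/mnt/backups"


def rsync_command(source: str, dest: str) -> list[str]:
    """Build an rsync argv that mirrors source into dest."""
    # -a keeps permissions and timestamps; --delete removes files on the
    # destination that no longer exist at the source.
    return ["rsync", "-a", "--delete", source, dest]


def archive_jobs(services: list[str]) -> list[list[str]]:
    """One job per service: [pool]/archives/[service] -> offsite box."""
    return [
        rsync_command(f"{POOL}/archives/{name}/", f"{OFFSITE}/archives/{name}/")
        for name in services
    ]


if __name__ == "__main__":
    # Print the commands instead of running them, so this is safe to try.
    for job in archive_jobs(["gitea", "vaultwarden"]):
        print(shlex.join(job))
```

Building the argv as a list keeps the commands easy to inspect before handing them to `subprocess.run`.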
software
tailscale
Tailscale is used as the VPN of choice. It lets me connect all hosts to a private network and lets me access them from the internet, pretty easily, pretty reliably.
nixos
NixOS is a declarative OS. We’ll be utilizing flakes (NOT home-manager) to declare the OS and any package dependencies, yada yada.
This will be the main OS for any virtualized guest running a service, because we can declare the config and guarantee that it’s reproducible.
packer
Packer will build images - nixos, debian, centos to use as proxmox templates
terraform
Terraform to configure hardware as much as possible.
Starting with proxmox module. (eventually, routers, switches, etc)
terraform will write out n LXC/VMs within proxmox. Most will run nixOS.
Then there’s one of each for testing/building/whatever. They don’t boot with the machine tho.
- ubuntu (debian)
- fedora (rhel)
- arch
- windows
- macos
The nixos guests will each run a stack (docker compose? docker swarm? k3s?) of services (ctrl, srv, stats, media, arr, nvidia, etc).
proxmox
Proxmox, as a hypervisor, runs the guests, backs them up, and handles networking, configs, and other things I don’t want to manage myself.
proxmox backup server
In one of the VMs, we run proxmox backup server. This will back up the other LXC/VMs, using a minimal amount of space thanks to its deduplication.
docker compose
Docker compose is responsible for running our services.
Each “stack” has a git repo with a compose.yml file defined. This compose.yml imports other compose files to build a complete “stack”.
You’ll often find an “agents.yml” file as well, which contains some common containers for monitoring or other agent duties depending on the host: cadvisor, watchtower, portainer/gitea agent.
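One way to picture the compose.yml plus agents.yml layering is as repeated `-f` flags to `docker compose`, which merges the files in order. This swaps in `-f` merging for whatever import mechanism the stacks actually use, and the stack directory below is a hypothetical path.

```python
from pathlib import PurePosixPath


def compose_command(stack_dir: str, extra_files: list[str], action: list[str]) -> list[str]:
    """Build a `docker compose` argv that layers extra files over compose.yml."""
    base = PurePosixPath(stack_dir)
    argv = ["docker", "compose"]
    # Later files override or extend earlier ones, so compose.yml goes first.
    for name in ["compose.yml", *extra_files]:
        argv += ["-f", str(base / name)]
    return argv + action


if __name__ == "__main__":
    # /opt/stacks/srv is a made-up location for the "srv" stack repo.
    print(" ".join(compose_command("/opt/stacks/srv", ["agents.yml"], ["up", "-d"])))
```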
traefik
Traefik is the reverse proxy of choice. Any containers running on the same host utilize labels to receive a domain, otherwise a config file is made per-service.
We also use traefik to proxy to services outside of docker, e.g. the truenas server at nas.[domain].
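For the label-driven case, a minimal sketch of the labels a container would carry, using Traefik's docker provider label names. The service name, domain, and port are placeholders, and the one-router-per-service naming is just an assumption for the example.

```python
def traefik_labels(service: str, domain: str, port: int) -> dict[str, str]:
    """Return docker labels that route <service>.<domain> to the container port."""
    host = f"{service}.{domain}"
    return {
        # Opt this container in to Traefik's docker provider.
        "traefik.enable": "true",
        # Router rule: match requests by Host header.
        f"traefik.http.routers.{service}.rule": f"Host(`{host}`)",
        # Tell Traefik which container port to forward to.
        f"traefik.http.services.{service}.loadbalancer.server.port": str(port),
    }


if __name__ == "__main__":
    for key, value in traefik_labels("gitea", "example.com", 3000).items():
        print(f"{key}={value}")
```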
gitea
Gitea is the internal git server of choice, offering actions and agents to power most of this setup.
Everything is mirrored to a cloud provider for … reasons (github? gitlab? sourceforge)
ollama & friends
Ollama (and open-webui, comfyui) are run on an nvidia-enabled guest for running local, OpenAI API-compatible models!
misc
- owncloud - file sync
- flame - dashboard
- coolify - internal dev/deployment/staging pipeline
- coder - internal remote dev containers
- vaultwarden - password manager server
- jellyfin & friends - media streaming
- arr & friends - internet archiving
- grafana & friends - monitoring + stats
- other misc containers that I should probably get rid of
homelab-troubleo
So, after homelab-planning, I’ve run into a couple issues, and some awesome stuff, then more issues.
I have no ssh access
Immediately after provisioning an LXC/VM with proxmox, I don’t have ssh access.
So this step, in between terraform and ansible, is still manual.
There are a couple of different ways to solve it, though:
- SSH → SSH proxy through the proxmox host into the lxc (ahh, bad memory, i thought that the host always has access)
- Cloud-init images → nada
- Packer → Seems like a lot, but all of this is a lot sooo….
- NixOS → Some way of creating custom images and uploading to proxmox
Terraform → Ansible
All of my hosts are statically defined in ansible. So, when terraform is done, ansible kinda “assumes” that everything is a-okay.
Plus, the two run entirely independently. This is alright since I don’t have ssh access, but it’d be nice for terraform to run `ansible-playbook -i containers:[list-of-ips] sync --tasks core,docker` or something similar.
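A small glue script is one way to bridge the two: read `terraform output -json`, then hand the IPs to `ansible-playbook` as an inline inventory. This is only a sketch. The output name `container_ips` and the playbook name `sync.yml` are hypothetical, and it uses `--tags`, which is the real ansible-playbook flag for picking tagged tasks.

```python
import json


def hosts_from_terraform(output_json: str, key: str = "container_ips") -> list[str]:
    """Pull a list of IPs out of the text printed by `terraform output -json`."""
    outputs = json.loads(output_json)
    # Each output is an object with "sensitive", "type" and "value" fields.
    return list(outputs[key]["value"])


def ansible_command(hosts: list[str], playbook: str, tags: list[str]) -> list[str]:
    """Build an ansible-playbook argv with an inline, comma-separated inventory."""
    # The trailing comma tells ansible this is a host list, not a file path.
    inventory = ",".join(hosts) + ","
    return ["ansible-playbook", "-i", inventory, playbook, "--tags", ",".join(tags)]


if __name__ == "__main__":
    sample = '{"container_ips": {"sensitive": false, "type": ["list", "string"], "value": ["10.0.0.11", "10.0.0.12"]}}'
    print(ansible_command(hosts_from_terraform(sample), "sync.yml", ["core", "docker"]))
```

In practice the JSON would come from running `terraform output -json` with `subprocess.run(..., capture_output=True)` after the apply finishes.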
Docker Mounts
Most of my current services have a docker volume structure like this:
- [srv]-data - runtime data
- [srv]-config - config data
- ./[misc] - various configs i wanted saved with the stack
For example, traefik:
- traefik-data - docker volume with runtime data
- traefik-config - docker volume with config data
- ./_certs - bind mounts with random stuff
Some static mounts are fine; these can be committed to the stack.
My original assumption was that docker volumes would be easiest to move/backup. This is not true. Files are files. Everything is a file and there’s nothing simpler than moving files around.
For the future, my plan is for each host to have a “docker” mountpoint. Inside of this directory, each service will have a config and data folder, respectively.
In the stack, we’ll give an env variable like $STATIC_MOUNT for each service to use in defining volumes.
Having the volumes separated like this allows the hosts to be spun up/down as needed and just attach the storage.
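To make the layout concrete, here is a sketch of how per-service paths could be derived from `$STATIC_MOUNT`. The mountpoint `/mnt/docker` and the container-side paths are assumptions for the example only.

```python
from pathlib import PurePosixPath


def service_mounts(static_mount: str, service: str) -> dict[str, str]:
    """Return the host-side config and data directories for one service."""
    base = PurePosixPath(static_mount) / service
    return {"config": str(base / "config"), "data": str(base / "data")}


def volume_specs(static_mount: str, service: str,
                 container_config: str, container_data: str) -> list[str]:
    """Build host:container bind-mount strings, as used under `volumes:` in compose."""
    mounts = service_mounts(static_mount, service)
    return [
        f"{mounts['config']}:{container_config}",
        f"{mounts['data']}:{container_data}",
    ]


if __name__ == "__main__":
    # /mnt/docker stands in for whatever $STATIC_MOUNT is set to on the host.
    for spec in volume_specs("/mnt/docker", "traefik", "/etc/traefik", "/data"):
        print(spec)
```

On a real host the first argument would come from `os.environ["STATIC_MOUNT"]`.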
This leads me to the next semi-issue…
My storage paths are a kinda….
In the nas, I have a few main datasets:
- library - music, movie, books, photos and everything else
- archive - backups and whatnot (synced to B2 and the RPI)
- shared - per-user home directories
Proxmox mounts the library and archive datasets. The archive dataset is used for backups, holding isos, and whatever else proxmox needs.
Docker hosts will mount library, services will bind-mount as needed.
To fix the docker problem, I can create a new dataset (services, or something). This will house each service’s config and data dirs. This new dataset will be mounted on proxmox as nfs, then the service host can mount that and the service can bind-mount from it. Nice.
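The NFS leg of that chain boils down to one fstab entry per host. A tiny sketch, where the server name, export path, and mountpoint are all placeholders:

```python
def nfs_fstab_line(server: str, export: str, mountpoint: str,
                   options: str = "defaults,_netdev") -> str:
    """Format an /etc/fstab entry for an NFS export."""
    # _netdev delays the mount until the network is up.
    # The trailing "0 0" disables dump and fsck for this entry.
    return f"{server}:{export} {mountpoint} nfs {options} 0 0"


if __name__ == "__main__":
    # nas.example.internal and /mnt/tank/services are made-up names.
    print(nfs_fstab_line("nas.example.internal", "/mnt/tank/services", "/mnt/docker"))
```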
The goal
is to have a system where I define some resources (LXC/VM) in terraform that deploys to proxmox. Once the resources are up, I then run ansible to configure them, install the stacks, and boot them. When the stacks run, they have a predefined config and maybe data.
I can scale this up/down in terraform as needed.
More Homelabbing
SSH
is awesome, when you understand it. Currently it’s the bane of my existence (not really).
The problem is that these isos/lxc don’t have ssh enabled by default. There are a couple ways to solve this.
- Prebuilt isos - build your own isos/lxc-template with ssh enabled by default
- cloud-init - set up cloud-init to enable ssh with specific keys
- manually - go in the pve dashboard and enable ssh (HELL NO!)
So most likely, the best thing is a combination of the first two options.
A custom prebuilt iso with packages, users, ssh w/ public-keys enabled. Also, enable cloud-init for further configuration during provisioning.
Customize cloud-init during provisioning to further add ssh-keys and specifically set ip addresses.
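The cloud-init half could be as small as a generated user-data file. This sketch emits a `#cloud-config` document with one user and their public keys. The user name and key below are placeholders, and note that listing `users:` explicitly like this replaces the image's default user.

```python
def cloud_config(user: str, public_keys: list[str]) -> str:
    """Render minimal cloud-init user-data that enables key-only ssh for one user."""
    lines = [
        "#cloud-config",
        # Disable password logins over ssh; keys only.
        "ssh_pwauth: false",
        "users:",
        f"  - name: {user}",
        "    ssh_authorized_keys:",
    ]
    # Quote each key so spaces inside it stay part of one YAML string.
    lines += [f'      - "{key}"' for key in public_keys]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder key, not a real one.
    print(cloud_config("admin", ["ssh-ed25519 AAAAEXAMPLE admin@laptop"]), end="")
```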
IP Addresses
Currently, we have no control over the ip addressing in a vm. This is okayish —
- dhcp - i tried enabling dhcp but the problem is that now i don’t have access to the ip
- static - i set it, it doesn’t respect it — likely resolved by the ssh issue.
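For the static case, Proxmox's cloud-init `ipconfig0` option takes a string of the form `ip=<address>/<prefix>,gw=<gateway>`. A sketch that builds and sanity-checks one, with example addresses that are not from this network:

```python
import ipaddress


def static_ipconfig(cidr: str, gateway: str) -> str:
    """Build an `ip=...,gw=...` string after checking the gateway is on-subnet."""
    iface = ipaddress.ip_interface(cidr)
    gw = ipaddress.ip_address(gateway)
    # A gateway outside the guest's subnet is almost always a typo.
    if gw not in iface.network:
        raise ValueError(f"gateway {gw} is not inside {iface.network}")
    return f"ip={iface.with_prefixlen},gw={gw}"


if __name__ == "__main__":
    print(static_ipconfig("192.168.1.50/24", "192.168.1.1"))
```

Validating up front catches the "i set it, it doesn’t respect it" class of mistakes that come from a bad address rather than from provisioning.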
The plan
- Get rid of the router.
- Turn it into another docker node, (services like monitoring and backups)
- Get rid of proxmox.
- Move to docker swarm / stacks on bare metal with resource controls
- Expose an NFS share that can be accessed from docker containers and on the network.
- Sell the temporary NAS.
- Rely on shared storage from the server and cloud / offsite backups
- (Optional) Integrate a cloud VPS
Networking
- Cloudflare for domain / dns management
- Traefik for SSL termination and reverse proxy