Leveling Up My Homelab

(cweagans.net)

95 points | by cweagans 5 days ago

76 comments

  • tennysont 19 hours ago

    For those not keeping count, total hardware spend is in the 13k-20k USD ballpark, by my count.

    The thing that I like about this post is that it touches on many of the difficulties of running a homelab with many physical hosts. You might not need all or most of this setup, but at least you have an idea of how this particular design (a decent one!) scales after reading this.

    - Array of off-the-shelf compute servers with console access + networking + power

    - ArgoCD + GitOps makes the K8s cluster declarative

    - Talos makes the physical hosts that back the K8s cluster declarative (sketch below)

    - Dedicated machines for storage, control plane, and networking isolate the infrequently-changing stateful parts
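
    To make the "declarative hosts" point concrete, bootstrapping a Talos node boils down to a few talosctl calls. A minimal sketch (cluster name and IPs are made up, not OP's actual config):

      talosctl gen config homelab https://192.168.1.10:6443      # writes controlplane.yaml, worker.yaml, talosconfig
      talosctl apply-config --insecure --nodes 192.168.1.10 --file controlplane.yaml
      talosctl apply-config --insecure --nodes 192.168.1.21 --file worker.yaml
      talosctl bootstrap --nodes 192.168.1.10 --endpoints 192.168.1.10 --talosconfig talosconfig
      talosctl kubeconfig --nodes 192.168.1.10 --endpoints 192.168.1.10 --talosconfig talosconfig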

    This homelab does seem compute-focused, which might be right for OP but is normally a mistake people make when they build their first homelab. I'm wondering what OP's home internet bandwidth is. It seems odd to have so much compute behind a network bottleneck unless OP has high-compute-small-output workloads (ML training, data searching/analysis, NOT video encoding).

    A single machine with a lot of CPU and a LOT of memory/storage is typically what people want, so that projects they're setting up are fast and having lots of old/idling projects is fine. My old recommendation was that a mini-ITX build with 128 GB of RAM and a modern AMD CPU should take most people very far. Perhaps a smaller NUC/Beelink computer if you're not storage hungry.

    However, keep in mind that a single machine will make it hard to tinker with large parts of the stack. It's harder to play with kernel module settings if you need to constantly reboot all your projects and possibly nuke your lab. It's harder to test Podman vs Docker if it involves turning off all your containers. A more complicated homelab gives you more surface area to tinker with. That's both more fun and makes you a better engineer. Of course, you can get most of this experience for far less money if your budget isn't quite as generous.

    I personally prefer a digital nomad aesthetic, so I focus on small & simple on-prem hosts paired with cloud stacks. I'm willing to pay a premium on compute to have less commitment and physical footprint. I've been considering setting up a K8s cluster on Hetzner dedicated machines. In my case, that mini-ITX box is actually a storage-optimized ATX build for backing up my laptop (daily driver) and media server.

    • Aurornis 15 hours ago

      > This homelab does seem compute focused, which might be right for OP but is normally a mistake that people make when they build their first homelab.

      I kept waiting for the description of what it would be used for, but there was only a passing reference to learning how to run AI workloads.

      For some people, buying and assembling hardware is the hobby. This gets old fast, especially when you add up how much was spent on hardware that's now sitting idle while it becomes more outdated year over year.

      I agree that for typical learning cases the best solution is a single, cheap consumer CPU paired with a lot of RAM. For the $8000 spent on those 8 mini PCs, you could build a 256GB RAM box with 2 or even 3 Nvidia 5090 GPUs and be in a different league of performance. It's also much easier to resell big Nvidia consumer GPUs and recoup some of your money for the next upgrade.

      It does look fun to assemble all of this into a rack and make it all work together. However, it's an extremely expensive means to an end. If you just want to experiment with distributed systems you can pair 128GB of RAM with a 16-core consumer CPU and run dozens or even 100 small VMs without issue. If you want to do GPU work you can even use PCIe passthrough to assign GPUs to VMs.
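
      If you go the passthrough route, the first sanity check is whether the GPU sits in its own IOMMU group. A read-only sketch (hypervisor-agnostic; the actual passthrough config depends on your setup):

        # Run on a host with IOMMU (VT-d / AMD-Vi) enabled; ideally the GPU and its
        # audio function are alone in their group.
        for group in /sys/kernel/iommu_groups/*; do
          echo "IOMMU group ${group##*/}"
          for dev in "$group"/devices/*; do
            lspci -nns "${dev##*/}"
          done
        done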

      • cweagans 10 hours ago

        > I kept waiting for the description of what it would be used for, but there was only a passing reference to learning how to run AI workloads.

        Future posts will address some of this. :)

    • cweagans 10 hours ago

      > For those not keeping count, total hardware spend is in the 13k-20k USD ballpark, by my count.

      Yep! Right around the $13.5k mark.

      > This homelab does seem compute focused, which might be right for OP but is normally a mistake that people make when they build their first homelab.

      Very compute focused for the specific projects that I intend to work on.

      > I'm wondering what OP's home internet bandwidth is. It seems odd to have so much compute behind a network bottleneck unless OP has high-compute-small-output workloads (ml training, data searching/analysis, NOT video encoding)

      1gbps symmetric + a failover Starlink connection. Not a ton of large output workloads at the moment.

      > However, keep in mind that a single machine will make it hard to tinker with large parts of the stack.

      Very much in agreement here. This is one of the reasons I went with multiple machines.

      > I'm willing to pay a premium on compute to have less commitment and physical footprint.

      I also like this mindset, but my other hobbies are piano (and I'm very sensitive to the way that the keys are weighted, so I prefer playing a real grand piano vs a portable/mini electronic piano) and woodworking (even more bulky equipment), so I'm already pretty committed to putting down roots.

    • scuff3d 16 hours ago

      Sorry if this is a naive question, but with a bunch of power packed into a single device, couldn't you do a lot of experimentation in VMs?

    • rcarmo 18 hours ago

      The SER 9 workers in particular feel like they're overly beefy. I run 4-5 VMs inside Proxmox on a single equivalent box...

  • buran77 a day ago

    > Storage UniFi UNAS PRO: (replacing the Synology NAS) 7x 8TB Seagate IronWolf

    With the way Ubiquiti has treated their software stack on the network side in the past years (major bugs, regressions, and updates that had to be reissued multiple times), I wouldn't trust them with all my data. Ubiquiti's QA was outsourced to the customers, and a NAS is the last place where I want to risk bad updates, no matter how many backups I have.

    • tecleandor a day ago

      Same with Synology. I would directly jump to a TrueNAS install, or a BSD/Linux install with all the needed software.

      • buran77 18 hours ago

        Synology has had shady business practices more recently and an outdated tech stack since forever, but I haven't heard anything particularly bad about reliability. For a NAS, the safety of the data is the highest priority. Anything that endangers the data isn't just a drawback, or a "con", it's an instant elimination. Right now I wouldn't trust a Ubiquiti NAS to store my recycle bin; I need to see a long track record of reliability and a commitment to quality.

    • cweagans 11 hours ago

      This is definitely one of the purchasing decisions that I regret. My backups are robust and trustworthy enough that I don't have data loss concerns, but the software is atrocious and the customizability is extremely limited.

      e.g. I wanted to serve TFTP directly from the NAS. I can log in and `apt install tftpd-hpa`, but that package has to be reinstalled every time the NAS updates.
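
      One possible band-aid (untested, and assuming the firmware keeps a Debian-ish userland and cron across updates) is a re-install check on boot:

        # e.g. from a cron @reboot entry: put tftpd-hpa back if an update wiped it
        if ! dpkg -s tftpd-hpa >/dev/null 2>&1; then
          apt-get update && apt-get install -y tftpd-hpa
          systemctl enable --now tftpd-hpa
        fi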

      I'll be replacing this in the medium term, but I'm not buying more hardware for a little while lol

  • grim_io a day ago

    This exceeds the computing requirements of most IT-related companies, by far.

    Cool, but nothing a single compute machine wouldn't get done with a bunch of VMs if learning and home workloads are the focus.

    This thing probably idles at a couple hundred watts.

    • spicyusername 21 hours ago

      Yea, but, like owning a car you soup up in your garage, it isn't about what you _need_, it's about what's fun and what's enough to give you something to do in your free time.

      • grim_io 20 hours ago

        All good, as long as there is self-awareness that it's not built on a "need", but on a "want" basis.

        It should be obvious to the reader that this is very much overkill, even for the stated goals of expandability and learning.

        • cweagans 11 hours ago

          How do you know if it's overkill without knowing anything about what I am/will be running on it? Certainly, there are projects for which this setup might be appropriately sized, wouldn't you think?

          Incidentally, this setup does leave me some significant headroom in terms of compute resources, but that's by design.

          • grim_io 9 hours ago

            People don't usually buy this much compute for running a mail server or a media transcoder, so... probably AI? :)

            If it's inference, don't expect great performance or cost-effectiveness.

            But if you learn a lot during it, I wish you all the best!

      • Aurornis 15 hours ago

        If we're going with the car analogy, this is like buying 8 Miatas to keep in your driveway instead of combining all of the money into a single much faster car.

        If the goal is to have a lot of something so you can play with many different things, this gets the job done. If the goal is high performance and maximum efficiency for learning or compute, a setup with dozens of smaller computers like this is usually not the optimal choice.

  • NoiseBert69 a day ago

    I down-leveled my homelab due to energy costs.

    It now only consists of an Intel N100 with a big SSD and 32GB RAM running Proxmox. These Chinese Topton boxes with their 5x Intel i226-V network cards are great and can be passively cooled.

    Every night Proxmox makes a backup onto a Raspberry Pi that runs Proxmox Backup Server.

    • rcarmo 18 hours ago

      Yup. Mine idles at 80W, but it has _two_ NAS boxes, a pair of N150s, a Ryzen APU Steam server, and a beefy i7 with 128GB RAM and a 3060 (which of course is the one burning the most watts in use). Most of it is running Proxmox and backing up VMs to the Synology (which I've demoted to dumb storage with a few Docker containers since they started burning bridges with their customers).

      • NoiseBert69 14 hours ago

        A friend of mine has one of these enterprise CPUs with many, many cores. He sends the entire box to sleep after CI/CD jobs and wakes it up with WoL.
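
        The pattern is roughly this (MAC address, hostname, and the job script are placeholders):

          wakeonlan AA:BB:CC:DD:EE:FF                                    # magic packet to the big box
          until ping -c1 -W1 buildbox >/dev/null 2>&1; do sleep 2; done  # wait for it to boot
          ssh buildbox './run-ci-job.sh && sudo systemctl suspend'       # run the job, then back to sleep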

        • rcarmo 13 hours ago

          That is exactly what I do with both the GPU and Ryzen boxes (pro tip: Steam Link will do WoL and wake up your gaming box remotely from both Android and iOS clients - works great on Bazzite)

    • n4bz0r 20 hours ago

      I've read that PBS requires fast (NVMe-fast) storage and a decent CPU to handle incremental backups efficiently. What speeds do you get when restoring backups?

      • mysteria 17 hours ago

        HDDs work fine for me with PBS, and I get regular ~120 MB/s HDD speeds when restoring backups. Honestly, how often do you restore backups anyways?

        • prmoustache 16 hours ago

          Well you should do that frequently, at least to make sure you have restorable ones.

      • NoiseBert69 14 hours ago

        Using BTRFS.

        Just reading off the logs:

        - Backup duration: 3.21GiB in 36s

        - Restoring a Snapshot: feels like <<1 minute

    • mindrunner 20 hours ago

      This. The energy costs have made me downscale to micro PCs.

    • sgc 20 hours ago

      Is that also running a router like opnsense, or do you have a dedicated box for that? Curious about specs for networking gear...

      • NoiseBert69 14 hours ago

        A shabby 10-year-old Fritzbox is my home LAN. It plays DHCP server, VDSL modem, and firewall.

        Next step will most likely be a VDSL modem + something from Ubiquiti, as the new Fritz! product portfolio is... a weird mess.

    • iamshs 20 hours ago

      Space and energy costs are a factor for me too. My current Mac Mini SMB setup is really good, but the DAS consumes a lot of power. Ideally, I would really love the next iteration of MikroTik's RB5009UPr+S+IN to have antennas and a 4-bay Rose Data Server merged in. Gateway, switch, access point, PoE for CCTV, and NAS, all in one.

  • evanreichard 12 hours ago

    This year I redid my whole lab as well and prioritized declarative everything so it'd be easy to rebuild from scratch if needed. So I've got Nix to create RKE2 images with cloud-init for XCP-ng templates, Terraform with the XenOrchestra provider, and ArgoCD for the K8s cluster.

    I've got two R640s so I can live migrate, and an R720XD with TrueNAS (democratic-csi for K8s persistence). QSFP (40Gb) for the TrueNAS / R720XD, and SFP+ (10Gb) for the R640s, linked to a Brocade ICX 6610.

    So I can update the hosts and K8s nodes with zero downtime. Do I need it? No, but I learned a lot and had / have fun deploying and maintaining it.

  • tietjens 20 hours ago

    Enjoyed this. Lots of large homelabs like this are built to stream video or run local LLMs, and that usually leaves me feeling a bit left out because I've been building my own but have no interest in either of those things.

    Some services I am interested in are hosting my own RSS feed reader, an ebook library, and a password manager. But I'm always looking for more if there are any suggestions.

    • danparsonson 19 hours ago

      Well there's loads - Nextcloud/Syncthing to replace Dropbox, Forgejo to replace GitHub, Joplin on the back of Nextcloud or similar for note taking... personal wiki, todo list, email with webmail (Roundcube), home video camera management, etc etc

      Check out https://github.com/awesome-selfhosted/awesome-selfhosted

    • marcuskaz 19 hours ago

      > I am interested in are hosting my own RSS feed reader, an ebook library, and a password manager

      You can do that on a Raspberry Pi Zero for $15, and for $12 you can get a 128GB microSD card, plenty of storage. It'll draw minimal power and fit in an Altoids tin.

    • rcarmo 18 hours ago

      You can host all of that on an N100 mini-PC in oh... around 4GB RAM, based on the equivalent footprint of the Docker/LXC combos I use for those.

  • parkuman a day ago

    I love the astronomical jump in compute. Mac Mini and mini PC not enough? GET 8. Love it!

    I’m in the middle of my first homelab journey with just a single mini PC with 8GB of RAM. Excited to one day get to this level.

    • wltr a day ago

      Yeah, I started with an obsolete PC a decade ago, and now I’m on a Raspberry Pi 2 and a couple of Orange Pis (zero, 1st gen). Weirdly, it’s enough for me now. I self-host not too many things as of now. So I guess I’d grow at least a couple of Mac minis over time too. However, I’m trying to go slowly there. Funny thing is that despite me admitting that I’m more likely to arrive there too, each time I see a superlab, I wonder ‘what do you guys do with all this?!’

  • Arn_Thor a day ago

    There are certain tells when anyone with a homelab starts making real money off the work they do on it

    • cweagans 11 hours ago

      Regrettably, I have not made money from my home lab.

      Yet. :)

  • MASNeo 21 hours ago

    This is awesome and I wish more of this were happening. Hardware home labs are the best way to learn. I gained most of my Linux/FreeBSD skills at home.

    It feels like, with cloud computing, a generation of computer scientists kind of missed out on the experience.

    • tietjens 20 hours ago

      Cloud by day, home lab by night.

  • ocharles a day ago

    I'm curious how much this costs to run. I.e., how much are you paying for electricity?

    • cweagans 11 hours ago

      I'm currently sitting at < 200W, but I expect that to go up with a higher workload. The SER9s idle at 5-7W, but they can run at 50-60W sustained without thermal throttling. Some reviewers have claimed that they can run at 75-80W sustained for 10-15 minutes, but I think that's pretty unlikely.

      We have pretty reasonable power rates here (https://www.idahopower.com/accounts-service/understand-your-...), so ~$12-20 per month depending on tier.

    • chasely 14 hours ago

      Not sure about this one specifically, but assuming most of the time the system is idle at ~200W, you'd be looking at ~$25/mo in most US states.

      Peak draw could probably be 2kW for a beefy system so electricity costs could really skyrocket depending on usage patterns.

      • cweagans 11 hours ago

        Peak draw for the entire cabinet with everything running full bore cannot/will not exceed 1800W (by design).

  • nathan_douglas 12 hours ago

    Oh hai Cameron!

    I'm rebuilding my homelab [1] too, actually, but deprioritizing it while I stave off a wee spot o' burnout.

    I'm excited to see that you're building on Talos; I am too! I used to use CoreOS back in the day, like 8-9 years ago or smth, on PXE-booted VMWare VMs, and I've always missed that cleanliness.

    That's a large part of why I'm rebuilding right now - I based everything around Ansible + Terraform, and that's workable of course but working iteratively on a homelab leaves so much cruft around, can lead to incidental complexity, etc etc etc.

    Anyway, I'm pumped to keep reading!

    [1] https://clog.goldentooth.net/

    • cweagans 11 hours ago

      Hey hey! Good to see you here!

      Talos is absolutely incredible. There's a learning curve to it, but it's not as steep as it seems.

      I started with Ansible, but found myself getting really annoyed at the amount of YAML I was shoveling around for basic things, so I ended up writing a series of bash scripts that rsync files out to where they need to go, run actions when a particular file changes, etc. Provisioning scripts are bundled up and piped to the remote machine over SSH. It's been pretty nice. I'm thinking about building that out into a separate project somewhere.
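
      The core of it is roughly this (host, paths, and the restart action here are made up for illustration, not my actual scripts):

        # Push files, then react to whatever actually changed on the far end.
        changed=$(rsync -az --itemize-changes files/ "root@node01:/")
        if grep -q 'etc/myservice.conf' <<<"$changed"; then
          ssh root@node01 'systemctl restart myservice'
        fi
        # Provisioning script bundled up and piped over SSH.
        ssh root@node01 'bash -s' < provision/base.sh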

      I'd love to check out what you're working on! The link seems to be broken though.

      • nathan_douglas 8 hours ago

        Talos (and Talhelper) seem pretty reasonable so far. Digging Sops too for managing secrets. I was using Ansible Vault before which worked but was weirdly cumbersome to automate (go figure), and Sops seems to Just Work™.

        > The link seems to be broken though.

        Yeah, I'm a world-class infra engineer. smdh. Changed how the DNS record was created but didn't push my changes so they were reverted by a scheduled job facepalm

        Think it's back now...

  • mysteria 18 hours ago

    It might be fun to have so many machines, but in reality it's simpler and cheaper to virtualize everything on two or three powerful hosts. Considering that you're using a soundproof rack already you might as well go with used rack servers with lots of memory and compute. Those also come with goodies like BMCs, dual PSUs, and ECC.

    Personally I have two Xeon rack servers running in a Proxmox cluster with a SBC Qdevice. It has more than enough memory and compute for my needs and it also serves as my virtualized router and NAS. The whole setup only takes up 4U of space (servers + switch + modem/qdevice) with a single UPS on the floor, and idle power is around 150W.

    • cweagans 11 hours ago

      We may be optimizing for different needs. For instance, while I was able to get a significant amount of extra height, I didn't have a lot of cabinet depth to work with, which is somewhat limiting for traditional server hardware. There are short-depth options out there, but I also wanted at least some GPU capability. The integrated GPUs in the SER9s are not top of the line by any means, but they're more than capable for what I want to be working on.

  • bkummel a day ago

    If you ask me, this is not a Homelab anymore.

  • thelastgallon 19 hours ago

    I'd like to replicate this (a smaller version) but with ECC RAM. It is hard to find mini PCs (or laptops) with ECC RAM support.

    • fmajid 18 hours ago

      HP and Lenovo have SFF or UCFF workstations with ECC support like the Z2 Mini G9 or P3 Ultra SFF.

    • wltr 14 hours ago

      Serious question: why would you need one for a home usage? I mean, you can easily reboot it nightly, and nobody will ever notice. Unless your personal server is used by people across the continents, of course. But even then, my SBC reboots within 20 seconds or something. Not really noticeable.

  • master_crab 14 hours ago

    Did I read that right, one control plane node and 8x workers for the K8s cluster? What happens when the control plane flakes?

    I run 4 thin clients and only one of them is a worker. The rest are untainted control plane nodes in HA.

    • cweagans 11 hours ago

      You read that right. Currently, the DR plan is "replace/repair the machine and bring everything back up from a backup". It's not a good plan, but it's also only the short term plan. Longer term, I'll likely add another control plane node or two.

      • master_crab 11 hours ago

        Run three. etcd needs a majority to keep quorum, so three control plane nodes let you lose one and keep the cluster up.

  • globular-toast a day ago

    Seeing the final picture made me think of something: one of my "life hacks" is to not accept cables that are too long. I used to think the longer the better, and would just coil them up or something. But long cables just get in the way and collect dust etc.

    If something is going to be considered permanent, cut that cable down to length. Either buy shorter moulded cables, or learn to wire cables yourself. Too often have I left a cable long "just in case" only for it to get in the way for years to come.

    For patch cables it's easiest and best to buy moulded cables that fit your rack. For things like power cables (extension leads etc.) it's easiest to wire them yourself (at least, in the UK where our plugs are designed to be wired by anyone).

    • cweagans 11 hours ago

      I am very much on board with this line of thinking. Because things are still somewhat in flux, it was much easier to plan for excess cabling and have a place for that cabling to live in the rack so that things can be moved if needed. I'll probably re-cable it with cut-to-length cables in the future.

      One thing that I haven't found a solution for though: I have a lot of USB and HDMI cable coiled up behind the Beelink boxes (for KVM connectivity). I've found the normal length cables (1', 3', 6', etc), but I haven't been able to find custom length cables for those specific connections. Do you happen to know anywhere I can find those?

    • wltr 14 hours ago

      On the other hand, I did this, cut my cables, and when I later needed to reorganise things slightly, it was very difficult; even a centimetre was a luxury. Also, when I need to move a computer for some reason, there's no room at all. These days, I'm trying to leave at least some extra cm (usually an inch or two, depending on the location) for that. I'd do a very tight cable cut only when I'm super sure nothing will ever move. Even then, I'd rather leave an extra inch, just in case.

  • 000ooo000 9 hours ago

    How are you approaching heat management inside the soundproof closet?

  • Havoc 18 hours ago

    Fancy & pretty high end!

    Confused as to why 10x nodes but a single control node and no HA?

    Control nodes can be pretty light - even down to Raspberry Pi 4s

  • timwis a day ago

    Wow, a beast. Looking forward to the next post!

  • bediger4000 4 days ago

    That looks and sounds great. Good for you!

  • Mistletoe 20 hours ago

    What do people do with homelabs? I’ve always wondered. Is it anything of consequence or is it just like making a roadster in your garage?

    • import 19 hours ago

      I have two machines and a NAS. Running around 50 Docker containers, including Home Assistant, AdGuard, WireGuard, VS Code server, an RSS reader, changedetection, RomM, MeTube, Vaultwarden, Immich, a CalDAV/CardDAV server, and a bunch of custom applications I am working on. (Also hosting some websites.)

      It’s so much fun and helps me to own my data.

    • ivanjermakov 19 hours ago

      I run a home server, but it is a single post-lease Dell mini PC I got for $50. The only times it goes above 5% CPU/GPU usage are when I'm building a project to deploy or transcoding video.

      I'm sure nobody needs this much compute for personal use (24/7), so roadster in a garage is a good analogy.

  • PeterStuer 14 hours ago

    Honestly, I have had 'has a homelab' as one of my mental checkboxes for great hires for a long time.

  • louwrentius a day ago

    It’s always fun to see how people build their home labs but I went exactly the opposite route.

    I focused on energy consumption, because of cost and - gasp - wanting to be mindful about it given the current predicament.

    Anything that needs to be on 24/7 is on a Pi, and anything that consumes more power is turned off by default (remote power-on is possible).

    For me at home there is zero need for redundancy, and I use a cluster of four TinyMiniMicro 1L PCs for my lab work. These are also turned off by default and are also low-power.

    • bionsystem 21 hours ago

      That's what I want to do as well: Pis running a VPN and some wake-on-LAN scripts for a couple of old boxes (I will probably run SmartOS on those). Anything to share about your setup? (blogs/gits that you wrote or followed)

      I also got some cloud credits from my employer, but I'm a bit paranoid about putting my data there (although most of it isn't sensitive).

    • g-clef 21 hours ago

      It's funny, I started with rPis for the same reason, but I'm about to replace them. I bought 20 rPi 4Bs for my homelab, and I just couldn't get them to do what I needed. I was looking to run a home k8s cluster and the Pis were just not suited to it at all (don't use SD storage for k8s 'cause it'll burn out the card with writes, booting off USB was unstable even with powered USB hubs, netboot turned into an enormous pain in the neck).

      • emilburzo 19 hours ago

        If you have another machine with an SSD or at least a fast-ish HDD and want to give it another go, you could try running k3s with an external datastore (e.g. Postgres).

        That's the setup I've been using on 3 x rPi since 2021 and I'm super happy with it as I can host all my own personal projects, some OSS ones (changedetection, n8n, etc), even longhorn for distributed storage and still have capacity left -- and this with just microSDs (to be fair, A1s, but still).
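
        The install is basically one command per node (the connection string and token below are placeholders):

          curl -sfL https://get.k3s.io | sh -s - server \
            --datastore-endpoint="postgres://k3s:secret@10.0.0.5:5432/k3s" \
            --token="shared-cluster-token"
          # Repeat on each Pi; they all point at the same external Postgres datastore.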

      • louwrentius 18 hours ago

        Yes, I moved to 1L PCs for lab work.

        https://news.ycombinator.com/item?id=40697831