“Kubernetes is overkill for a homelab.”
I hear this often from engineers who haven’t run production infrastructure at home. After creating and maintaining a K8s cluster that serves my family’s digital life and supports my ML research, I’ve come to realize that the question isn’t whether K8s is overkill—it’s whether you want to think like a platform engineer or remain a container user.

Why This Matters#
At work, I architect ML pipelines for medical imaging systems, manage multi-region AWS deployments, and make infrastructure decisions that affect product reliability. My homelab isn’t separate from this work—it’s where I validate architectural patterns, test failure scenarios, and develop the operational intuition that informs production decisions.
The gap between engineers who understand distributed systems and those who don’t isn’t knowledge; it’s operational experience. You can read about failure modes, but until you’ve debugged a late-night outage because your users can’t access their photos, you don’t truly internalize how these systems behave under stress.
The Real Value Proposition#
Most engineers approach homelabs as learning environments. I think about mine as a production system that happens to run at home. This mindset shift changes everything.
When you treat your homelab as production, you implement monitoring not because a tutorial told you to, but because you need to know when services degrade. You design for failure recovery not as an academic exercise, but because restoring hundreds of thousands of photos and videos from backup is painful enough that you’ll architect it correctly the first time.
This operational rigor translates directly to work. The infrastructure patterns I validate at home—GitOps workflows, secrets management, storage orchestration—are the same patterns I implement in production medical imaging systems where downtime affects patient care. The confidence that comes from running these systems 24/7 makes me a better architect.
Beyond Container Orchestration#
K8s isn’t about running containers efficiently. It’s about building platforms that abstract infrastructure concerns from application concerns. This is the senior engineer perspective that separates tactical from strategic thinking.
In my homelab, I don’t think about deploying Plex or Immich or my MLflow tracking server. I think about how the platform handles persistent storage, how Cloudflare Tunnels route traffic securely, how backup systems protect state. Applications become declarative intent; the platform handles execution.
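To make “declarative intent” concrete: handing an application to the platform is a short manifest stating what should exist, and the scheduling, storage, and networking layers figure out the rest. A minimal sketch (names, namespace, and image tag here are illustrative, not my actual config):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: immich-server
  namespace: media          # illustrative namespace
spec:
  replicas: 1
  selector:
    matchLabels:
      app: immich-server
  template:
    metadata:
      labels:
        app: immich-server
    spec:
      containers:
        - name: server
          image: ghcr.io/immich-app/immich-server:release
          ports:
            - containerPort: 2283   # Immich's server port
```

Nothing in that manifest says which node runs it, where the disk lives, or how traffic reaches it. Those are platform concerns, which is exactly the point.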
This platform thinking has fundamentally changed how I design systems at work. When architecting our medical imaging pipeline, I focus on building abstractions that let engineers deploy models without understanding ECS task definitions or IAM policies. The platform handles it—the same philosophy I’ve refined through years of homelab iteration.
Production Patterns at Home#
My homelab runs services that people depend on. Plex serves high-definition content to several active users. Immich replaced Google Photos for multiple users across my family and friends, managing hundreds of thousands of photos and videos. My MLflow instance tracks experiments for ML research. Development environments, GitLab runners, and various internal tools round out the cluster.
These aren’t toy applications; they’re production services with real users and real consequences for failure. With multiple users relying on my infrastructure, I’ve had to implement the same reliability patterns I use at work: automated backups with tested restore procedures, monitoring and alerting before users notice degradation, and high-availability configurations that survive node failures. When something breaks, I can’t file a support ticket. I have to understand the system deeply enough to diagnose and fix it quickly.
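As a sketch of what “alerting before users notice” means in practice, here is the kind of rule I have in mind, assuming the kube-prometheus-stack operator is installed (names and thresholds are illustrative):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: storage-capacity
  namespace: monitoring
spec:
  groups:
    - name: storage
      rules:
        - alert: VolumeAlmostFull
          # Fires when a PVC has under 10% free space for 15 minutes
          expr: |
            kubelet_volume_stats_available_bytes
              / kubelet_volume_stats_capacity_bytes < 0.10
          for: 15m
          labels:
            severity: warning
          annotations:
            summary: "PVC {{ $labels.persistentvolumeclaim }} is over 90% full"
```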
This forces you to think about reliability engineering. Immich stores irreplaceable family memories, so I run regular automated backups to NFS storage and periodically verify them with test restores. Plex needs consistent uptime for family movie nights, so I’ve configured pod anti-affinity to spread replicas across nodes. Monitoring alerts me to storage capacity issues and service degradation before anyone complains. These aren’t academic exercises; they’re operational requirements.
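The anti-affinity piece is only a few lines. An illustrative excerpt from a Deployment’s pod template that forbids two replicas from landing on the same node (the label is a placeholder):

```yaml
# Part of spec.template.spec in a Deployment
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: plex
        topologyKey: kubernetes.io/hostname   # at most one replica per node
```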
Over the years, I’ve dealt with storage controller failures, network partition scenarios, and countless other failure modes. Each incident built operational muscle that’s impossible to develop in managed cloud environments where someone else handles the hard parts.
The K3s Choice#
K3s maintains full Kubernetes API compatibility at a fraction of the resource footprint. It can run in as little as 512 MB of RAM where full K8s typically needs 2 GB or more per node. Installation is a single binary with no kubeadm complexity. It ships with a local storage provider out of the box and integrates cleanly with external ingress solutions.
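The cluster itself can be declared too: K3s reads a config file whose keys mirror its CLI flags, so server settings live in version control rather than shell history. A minimal sketch with illustrative values:

```yaml
# /etc/rancher/k3s/config.yaml -- keys mirror K3s CLI flags
write-kubeconfig-mode: "0644"
disable:
  - traefik            # swap the bundled ingress for an external solution
tls-san:
  - k3s.lab.local      # hypothetical extra SAN for the API server cert
```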
More importantly, K3s forces architectural discipline. With limited resources, you can’t be sloppy about requests and limits. You think carefully about what runs where. These constraints teach you to build efficient systems—lessons that apply even when you have unlimited cloud resources.
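In practice, that discipline means honest requests and limits on every container. An illustrative fragment of a container spec:

```yaml
resources:
  requests:            # what the scheduler reserves for this container
    cpu: 100m
    memory: 256Mi
  limits:              # hard ceiling; exceeding memory gets the pod OOM-killed
    memory: 512Mi
```

Requests drive scheduling decisions and limits protect the node’s other tenants; on small machines, getting either one wrong shows up fast.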
I’ve run both full K8s and K3s in different contexts. For homelab use, K3s is the right choice unless you specifically need to practice kubeadm for certification or your work uses distributions like OpenShift that diverge from upstream.
What You Actually Learn#
The technical skills are obvious—deploying multi-tier applications, configuring ingress and service mesh, managing persistent storage with PV/PVC and CSI drivers, implementing RBAC and network policies, setting up Prometheus and Grafana for observability. You learn Terraform for infrastructure, Helm for packaging, GitOps for deployment workflows.
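To pick one example from that list: persistent storage reduces to a claim against a StorageClass, and the CSI driver handles the rest. A sketch, where nfs-csi is a hypothetical class name backed by an NFS CSI driver:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: immich-library
  namespace: media
spec:
  accessModes:
    - ReadWriteMany           # NFS-backed volumes can be mounted by many pods
  storageClassName: nfs-csi   # hypothetical CSI-backed class
  resources:
    requests:
      storage: 500Gi
```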
But the deeper value is operational maturity. You develop intuition about system behavior. You understand resource contention not from documentation but from debugging performance issues. You internalize how networking layers interact because you’ve traced packet flows through CNI plugins, service meshes, and ingress controllers.
You learn to think in layers of abstraction. Applications sit on Deployments sit on ReplicaSets sit on Pods sit on container runtimes sit on nodes sit on hypervisors. When something breaks, you know which layer to investigate. This systems thinking is what distinguishes senior engineers from those who only understand the layer they work in.
The Investment Analysis#
My current setup consists of three PCs with more than enough compute to run my workloads comfortably. I use a compact GeeekPi server rack that’s surprisingly elegant for a homelab. The rack is tiny enough to fit in any room without looking like a datacenter, yet it’s modular and extendable for future growth. As my needs evolve, I can scale up to a genuinely powerful cluster without replacing the infrastructure. I’ll cover the hardware selection and rack setup in detail in a future post.
Compare this to managed Kubernetes on AWS, which would cost significantly more just for the control plane and modest worker nodes. Plus you don’t actually own the infrastructure, so there’s no learning at the hypervisor or networking layers.
Time investment is trickier to quantify. Initial setup took me several weeks of evenings, but I was building documentation and automation as I went. Maintenance typically averages a few hours per month—cluster upgrades, application updates, occasional troubleshooting. Experiments and improvements take as much time as I want to invest.
The return isn’t measured in dollars—it’s measured in architectural capability and operational confidence. I make better infrastructure decisions at work because I’ve validated the patterns at home. I’m more effective in incident response because I’ve debugged similar issues in my homelab. I mentor engineers more effectively because I can speak from experience, not theory.
Who Benefits From This#
You should run production infrastructure at home if you architect systems for a living or want to. Platform engineers, SREs, infrastructure architects, ML engineers who deploy their own models—anyone whose work requires understanding how distributed systems behave in production.
This isn’t for everyone. If you’re purely focused on application development and don’t care about the infrastructure layer, Docker Compose is simpler and perfectly adequate. If you prefer managed services and don’t want to think about cluster operations, that’s a valid choice.
But if you’re making architectural decisions that affect reliability, scalability, or operational cost—or you want to grow into that role—running production infrastructure at home builds capabilities you can’t develop any other way.
Practical Guidance for Starting#
Start with a single node running K3s. Pick one real application that you or your family will actually use—not a demo app. Deploy it, monitor it, and keep it running reliably. Only after you’ve mastered single-node operations should you expand to multi-node for HA.
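“Keep it running reliably” starts with telling the platform how to judge health, so it can restart a wedged container and hold traffic from one that isn’t ready. An illustrative container-spec fragment; the paths and port are placeholders for whatever your application exposes:

```yaml
livenessProbe:           # restart the container if this starts failing
  httpGet:
    path: /health        # placeholder endpoint
    port: 8080
  initialDelaySeconds: 30
  periodSeconds: 10
readinessProbe:          # stop routing traffic until this succeeds
  httpGet:
    path: /ready         # placeholder endpoint
    port: 8080
  periodSeconds: 5
```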
Don’t try to implement everything at once. I’ve watched engineers burn out trying to set up service mesh, observability stack, GitOps tooling, and multiple applications simultaneously. Build incrementally. Get something working and reliable, then add complexity as you need it.
Document everything as infrastructure as code. Terraform for the cluster itself, Helm charts or raw manifests for applications, GitOps workflows for deployment. If you can’t recreate your cluster from git clone and a few commands, you’re doing it wrong. This discipline pays dividends when you inevitably need to rebuild.
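With ArgoCD, the GitOps piece is itself just one more manifest: an Application pointing the cluster at your repository. A sketch with a hypothetical repo URL and path:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: media-stack
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example/homelab   # hypothetical repo
    targetRevision: main
    path: apps/media
  destination:
    server: https://kubernetes.default.svc
    namespace: media
  syncPolicy:
    automated:
      prune: true      # delete resources removed from git
      selfHeal: true   # revert manual drift back to the declared state
```

From there, recreating the cluster really is a clone and a handful of applies.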
Join communities where people run serious homelabs. Reddit’s r/homelab and r/kubernetes, the CNCF Slack, local Kubernetes meetups. Learn from people who’ve solved problems you haven’t encountered yet.
What’s Coming Next#
I’m documenting my entire setup as a reference for others. The upcoming content series covers:
Infrastructure Foundation: Proxmox hypervisor configuration with cloud-init templates, Terraform automation for VM provisioning, and K3s cluster deployment using Ansible. I’ll show how to make everything reproducible from code, including the networking setup that lets the entire rack work as a self-contained unit—plug it in anywhere in the world and it functions identically.
Storage Deep-Dive: Comparison of NFS, local storage, and Ceph for different workload patterns. I’ll cover when to use each, performance characteristics, and how to configure Synology NFS with Kubernetes CSI drivers for persistent volumes.
Production Applications: Detailed guides for deploying Plex with hardware transcoding acceleration and HA configurations, Immich with secure remote access via Cloudflare Tunnels, and MLflow for experiment tracking. Each guide includes reliability patterns, backup strategies, and monitoring.
Observability Stack: The Prometheus and Grafana monitoring setup that alerts me to issues before users notice them, including storage capacity tracking and service health checks.
GitOps Workflow: How I manage cluster state entirely through Git with ArgoCD for automated sync and drift detection. Every change goes through version control, making the entire infrastructure auditable and recoverable.
AI/ML Workloads: Running high-performance LLM inference with vLLM and Modular’s stack on Kubernetes, GPU passthrough in Proxmox VMs, and considerations for multi-GPU distributed workloads in a homelab environment.
Network Infrastructure: Setting up pfSense for routing and firewalling, Pi-hole for network-wide ad blocking and DNS management, and WireGuard for secure VPN access.
Home Automation: Integrating Home Assistant for smart home orchestration, running alongside production services in the same Kubernetes cluster, with automation dashboards and device management.
Hardware Guide: Detailed breakdown of cost versus performance tradeoffs, the GeeekPi rack setup, mini PC selection criteria, and scaling strategies as your needs grow.
All of this will be available at mi-homes.org, because the best way to solidify knowledge is to share it with others.
The Core Insight#
Running Kubernetes at home isn’t about the technology—it’s about developing the operational mindset that separates engineers who understand systems from engineers who use systems.
You can’t learn this from courses or certifications. You have to run something that matters, face real failures with real consequences, and build the muscle memory that only comes from repeated operational cycles. When your monitoring alerts you to a problem before your users notice, when you recover from failure efficiently because you’ve practiced it, when you make architectural decisions based on operational experience rather than documentation—that’s when you’ve internalized what it means to run production infrastructure.
My homelab has made me a better engineer. Not because I learned Kubernetes, but because I developed operational maturity that only comes from running production systems. That’s the real value proposition.
