Building Your UK Private Cloud: Proxmox Architecture and Setup
A practical guide to Proxmox architecture decisions: KVM vs LXC, Ceph storage configuration, and a proven Ansible and Terraform IaC approach for UK private cloud deployments.
Parts 1 and 2 of this series covered what Proxmox is and why UK businesses choose it over public cloud. If you are reading this article, you have probably made the decision to proceed. Now comes the harder question: how do you actually architect it?
The choices you make at this stage - KVM or LXC, Ceph or ZFS, Ansible alone or with Terraform - determine whether your platform runs smoothly for years or becomes a maintenance burden that consumes more time than the public cloud bill it replaced.
Most Proxmox guides either stop at "install and click through the GUI" or assume you already know the answers. There is a gap between "Proxmox exists" and "here is a production architecture that works." This article fills that gap.
What follows is opinionated architecture guidance drawn from production Proxmox deployments. We cover five key decisions every technical team faces when building a Proxmox private cloud: which virtualisation type to use and when, how to design your storage architecture, how to separate network traffic, how to size hardware appropriately, and how to manage everything with infrastructure as code. Each section gives you a clear recommendation with rationale, not a menu of options.
KVM vs LXC: A Production Decision Framework
Part 1 introduced KVM and LXC as distinct virtualisation technologies. This section moves past the theoretical comparison into a concrete decision framework based on actual production deployment patterns.
The core principle is straightforward: default to LXC for Linux workloads; reserve KVM for specific use cases.
In production deployments, the vast majority of workloads run in LXC containers. A typical 3-node cluster might run 20 or more LXC containers alongside just two or three KVM VMs. This is not a compromise - it is deliberate architecture. LXC containers start in seconds rather than minutes, consume substantially less RAM than equivalent VMs, and are simpler to manage at scale. Community estimates suggest VM overhead runs at 5–10% versus 1–3% for LXC containers[src], though results vary by workload.
The density advantage translates directly to hardware cost. Web service containers typically need 512 MB–2 GB RAM, whilst equivalent KVM VMs need 2–8 GB just to run the operating system alongside the service. On a node with 128 GB RAM, that gap matters.
The Decision Checklist
The question is not "which technology is better." It is "which technology does this specific workload require." If a workload needs any of the capabilities listed below, use KVM. Otherwise, use LXC.
In production e-commerce deployments we have worked on, the pattern holds consistently: LXC containers for web frontends, application servers, databases, APIs, monitoring, logging, and utility services; KVM VMs only for the handful of workloads with kernel-level or isolation requirements detailed below.
In practice, most workloads map clearly to one technology. LXC is the right choice for web frontends (Nginx, Apache), PHP-FPM application servers, databases (MySQL, PostgreSQL), caching layers (Redis, Memcached), monitoring stacks (Prometheus, Grafana), and ingress tunnels (Cloudflare). These workloads have no kernel requirements and benefit from the density and fast startup that LXC provides.
KVM is reserved for specific cases: WordPress instances running third-party plugins that cannot be fully audited (untrusted code requires stronger isolation), FTP servers such as vsftpd or ProFTPD (kernel networking conflicts with LXC's shared kernel), NFS servers (kernel filesystem modules), Windows application servers (non-Linux OS with no LXC option), and GPU workloads requiring hardware passthrough.
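To make the split concrete, here is a minimal sketch of creating one of each from the Proxmox CLI. The VMIDs, template file name, storage IDs, and bridge/VLAN values are placeholders - adjust them to your cluster.

```bash
# LXC: an unprivileged web frontend container - small footprint, fast start
pct create 110 local:vztmpl/rockylinux-9-default_20240528_amd64.tar.xz \
  --hostname web-01 \
  --unprivileged 1 \
  --cores 2 --memory 1024 --swap 512 \
  --rootfs storage-standard:8 \
  --net0 name=eth0,bridge=vmbr0,tag=30,ip=dhcp

# KVM: a VM for a workload that needs stronger isolation (e.g. unaudited plugins)
qm create 210 \
  --name wp-plugins-01 \
  --cores 4 --memory 4096 \
  --scsi0 storage-standard:32 \
  --net0 virtio,bridge=vmbr0,tag=30 \
  --ostype l26
```

The gap in allocated resources between the two commands reflects the overhead difference discussed above.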
One practical note: starting LXC-first and migrating a workload to KVM later is straightforward. The reverse - discovering you over-specified KVM across the board - is a harder problem to fix once those VMs are in production.
Storage Architecture: Ceph, ZFS, or Both
Storage architecture is where many first-time Proxmox deployments go wrong. The platform supports a wide range of storage backends, which creates analysis paralysis. Production deployments converge on a clearer pattern: a tiered approach that combines Ceph for distributed shared storage with ZFS for local high-performance storage.
Two-Tier Storage Design
In practice, production clusters typically define at least two storage tiers. A common pattern defines storage.standard (SSD-backed, for general workloads) and storage.fast (NVMe-backed, for latency-sensitive workloads). Database containers and high-traffic web frontends land on fast storage; backup containers, monitoring stacks, and utility services land on standard.
This is not over-engineering - it is avoiding the alternative, which is everything competing for the same I/O budget. A backup job running at 3 am should not impact your database's read latency.
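As a rough sketch of what the two tiers look like once registered with Proxmox, the commands below add two RBD-backed storage entries. They assume a working Ceph cluster with pools named fast and standard (pool creation is shown in the Ceph section below); the hyphenated storage IDs stand in for the storage.fast and storage.standard tiers described above.

```bash
# NVMe-backed Ceph pool for latency-sensitive workloads
pvesm add rbd storage-fast --pool fast --content images,rootdir

# SSD-backed Ceph pool for general workloads
pvesm add rbd storage-standard --pool standard --content images,rootdir
```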
Ceph for Shared, ZFS for Local
Ceph is essential for live migration and high availability. When Proxmox needs to move a VM or container from a failing node to a healthy one, it must access the workload's storage from the destination node. Locally-stored workloads cannot migrate without first copying the disk image, which takes time. Ceph distributes storage across all nodes, so any node can access any workload's storage directly, with no copy step.
ZFS is excellent for local storage on each node: the OS partition, local scratch space, and workloads that do not need live migration. ZFS's built-in checksumming, compression, and snapshots make it far more reliable than raw ext4 or xfs for production use.
Many production clusters use both. Ceph for VM and container disks that need mobility across nodes; ZFS for the host OS and local-first workloads.
Ceph Sizing That Works
The Proxmox official documentation specifies that a hyper-converged Ceph cluster requires at least three identical servers[src]. Community consensus recommends five nodes for production Ceph, with three nodes suitable for testing and development environments[src].
For memory, configure OSDs with at least 8 GiB of memory for good performance; the OSD daemon itself requires 4 GiB as a baseline[src]. Plan your RAM budget accordingly - eight OSDs across a three-node cluster means 64 GiB of RAM reserved just for storage services.
Network bandwidth matters significantly for Ceph. The official documentation recommends a minimum of 10 Gbps for Ceph traffic, with high-performance setups requiring 25 Gbps or more for internal cluster replication traffic[src].
OSD Placement and Pool Configuration
Distribute OSDs evenly across nodes. With a three-node cluster and 12 OSDs total, that is four OSDs per node - a balanced starting point. Avoid RAID controllers sitting between Proxmox and your disks; use HBA pass-through mode so Ceph manages disk health directly. RAID controllers interfere with Ceph's own error detection and recovery mechanisms.
For pool configuration, use the PG Autoscaler rather than calculating PG counts manually. The autoscaler adjusts placement group counts as you add OSDs and data, removing a common source of misconfiguration.
The default replication settings in Proxmox - three copies with a minimum of two copies for I/O (size=3, min_size=2) - are the correct production defaults[src]. Do not reduce these without understanding the consequences.
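Putting that together, here is a hedged sketch of the OSD and pool setup. Device paths and pool names are examples, and the flag names should be checked against the pveceph manual for your Proxmox version.

```bash
# On each node: create OSDs on raw disks presented via HBA pass-through
pveceph osd create /dev/nvme0n1   # device class "nvme" is detected automatically
pveceph osd create /dev/sdb       # device class "ssd"
pveceph osd create /dev/sdc
pveceph osd create /dev/sdd

# CRUSH rules that pin a pool to a device class, so the fast and standard
# pools land on NVMe and SSD OSDs respectively
ceph osd crush rule create-replicated rule-nvme default host nvme
ceph osd crush rule create-replicated rule-ssd  default host ssd

# Pools with the production defaults; the autoscaler manages PG counts
pveceph pool create fast     --size 3 --min_size 2 --pg_autoscale_mode on --crush_rule rule-nvme
pveceph pool create standard --size 3 --min_size 2 --pg_autoscale_mode on --crush_rule rule-ssd
```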
Network Design: Separating Traffic for Reliability
A production Proxmox cluster requires a minimum of three network segments. Running Corosync heartbeat traffic, Ceph replication, and VM/container traffic over the same network interface creates both reliability risks and performance bottlenecks. When a backup job saturates the storage network, you do not want Corosync to start missing heartbeats and trigger unnecessary failover.
The Proxmox official documentation is explicit: storage communication should never share a network with Corosync[src]. This is not a recommendation - it is a requirement for reliable cluster operation.
Three-Network Minimum
Management VLAN (MTU 1500) carries web UI access, SSH, API calls, and Corosync heartbeat traffic. This network is intentionally low-bandwidth - management traffic is minimal - but it must be reliable. Corosync uses this network to determine whether nodes are alive. If Corosync cannot reach a node, it assumes the node has failed and triggers HA failover. False positives here are disruptive.
Storage VLAN (MTU 9000, jumbo frames) carries Ceph OSD replication and data traffic. This is your highest-bandwidth network. With a three-node cluster running active write workloads, Ceph will replicate every write three times across this network. Jumbo frames reduce per-packet overhead and improve throughput - enable MTU 9000 end-to-end, which means your switches must also support it.
Guest VLAN(s) carry VM and container network traffic, both internal cluster communication and external-facing traffic. In practice, you often need multiple guest VLANs - one for internal services, one for internet-facing workloads, perhaps one for database traffic. Proxmox handles multiple guest VLANs cleanly.
A typical production VLAN layout looks like this:
- VLAN 10 (management and Corosync): MTU 1500, minimum 1 GbE - carries web UI access, SSH, and cluster heartbeat.
- VLAN 20 (Ceph storage replication): MTU 9000 (jumbo frames required), minimum 10 GbE, 25 GbE recommended for high-performance workloads.
- VLAN 30 (internal guest traffic, container-to-container): MTU 1500, 10 GbE.
- VLAN 40 (external-facing workload traffic): a separate segment at the same specification as VLAN 30.
Practical Implementation
VLAN-aware bridges in Proxmox let a single physical NIC pair carry multiple VLANs. For small clusters, 2×10 GbE per node bonded for redundancy handles management, storage, and guest traffic on separate VLANs. For Ceph-heavy clusters where storage traffic is sustained and high-volume, dedicate a 25 GbE link to the storage VLAN.
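A hedged sketch of the per-node network configuration for that layout follows, using the ifupdown2 syntax shipped with Proxmox. Interface names, VLAN IDs and addresses are illustrative; in practice you would edit /etc/network/interfaces deliberately (or use SDN, covered next) rather than appending blindly.

```bash
# Sketch of /etc/network/interfaces stanzas for a node with 2x10 GbE bonded.
# Interface names, VLAN IDs and addresses are examples only.
cat >> /etc/network/interfaces <<'EOF'
auto bond0
iface bond0 inet manual
    bond-slaves enp1s0f0 enp1s0f1
    bond-mode 802.3ad
    bond-xmit-hash-policy layer3+4
    mtu 9000

auto vmbr0
iface vmbr0 inet manual
    bridge-ports bond0
    bridge-stp off
    bridge-fd 0
    bridge-vlan-aware yes
    bridge-vids 10,20,30,40
    mtu 9000

# Management / Corosync (VLAN 10) - standard MTU
auto vmbr0.10
iface vmbr0.10 inet static
    address 10.0.10.11/24
    gateway 10.0.10.1
    mtu 1500

# Ceph storage (VLAN 20) - jumbo frames end-to-end
auto vmbr0.20
iface vmbr0.20 inet static
    address 10.0.20.11/24
    mtu 9000
EOF
ifreload -a
```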
Proxmox Software-Defined Networking (SDN) enables cluster-wide VLAN management from a single interface. You define zones and virtual networks once, and Proxmox propagates the configuration to all nodes. This eliminates per-node manual network configuration and the configuration drift that follows from it over time. For any cluster beyond a single node, SDN is the right approach.
Hardware Sizing for UK Production Workloads
The two most common hardware sizing mistakes are opposite in direction: over-specifying (wasting budget on capacity that will never be used) and under-specifying (hitting a resource ceiling on day one). These recommendations are based on production deployments and the official Proxmox hardware requirements.
CPU: Cores Over Clock Speed
Virtualisation is parallel by nature. More cores serve you better than faster cores. Intel Xeon Scalable and AMD EPYC processors are the production standard - both offer high core counts, ECC memory support, and server-grade reliability features unavailable in desktop processors.
A common guideline is to keep total vCPU allocation at 2–3 times the physical core count for typical business workloads[src]. Business workloads are rarely CPU-bound simultaneously, so moderate overcommitment works well in practice. Reserve at least one physical core per Ceph OSD service to prevent storage operations from starving your virtualised workloads.
RAM: The Binding Constraint
RAM is almost always the resource that runs out first on Proxmox clusters. It is constrained from multiple directions simultaneously. The official Proxmox hardware requirements recommend 64 GB or more for production nodes[src].
Budget RAM across four categories:
- Host OS: 4–8 GB for Proxmox itself
- ZFS ARC cache: approximately 1 GB per TB of local storage[src]
- Ceph OSDs: 8 GB per OSD service running on the node[src]
- VM and container allocations: everything your workloads actually need
ECC (error-correcting) memory is strongly recommended for production. A single uncorrected bit flip can corrupt data across multiple VMs simultaneously - this is not theoretical risk; it is a documented failure mode on non-ECC memory under load. Plan for N+1 capacity so workloads can fully migrate off a failed node without the surviving nodes running out of RAM.
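As a worked example of that budget, take a hypothetical node with 128 GB RAM, 2 TB of local ZFS storage, and four Ceph OSDs:

```bash
# Rough RAM budget for a hypothetical 128 GB node (4 OSDs, 2 TB local ZFS)
host_os=8                 # GB - Proxmox VE itself (upper end of 4-8 GB)
zfs_arc=2                 # GB - roughly 1 GB per TB of local ZFS storage
ceph_osds=$((4 * 8))      # GB - 8 GB per OSD service
workload_budget=$((128 - host_os - zfs_arc - ceph_osds))
echo "RAM left for workloads: ${workload_budget} GB"   # 86 GB on this node
```

The N+1 rule then applies to that 86 GB: across the cluster, leave enough headroom that the surviving nodes can absorb a failed node's allocations.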
Node Count and UK Data Centre Options
The minimum for Proxmox clustering is three nodes (needed for quorum). The minimum for production Ceph is five nodes per community consensus. Start with three nodes if budget constrains you, but design the architecture to scale to five - add nodes later rather than redesigning your storage architecture.
A 3-node cluster fits comfortably in a quarter-rack. Established UK co-location providers offer quarter-rack and half-rack options that suit this scale. Edmonds Commerce works with trusted data centre partners and can recommend suitable facilities based on your location and connectivity requirements. Budget for dual power supplies on each node (both live feeds connected) and IPMI/iDRAC for out-of-band remote management - essential for a cluster you do not have physical hands on.
Infrastructure as Code: The Hybrid Ansible + Terraform Approach
This is the section that separates a Proxmox cluster that "works" from one that you can actually manage reliably at scale. Manual configuration through the web GUI is appropriate for initial setup and exploration. Beyond that, infrastructure as code becomes essential.
"When dealing with infrastructure at a larger scale, full automation is not just a nice-to-have - it's an absolute must. No one wants to manually install every single node, set IP addresses by hand, initialise clusters, and join each additional node one by one."[src]
The question is not whether to use IaC - it is which combination of tools to use.
Why Both Tools
Terraform and Ansible are frequently positioned as alternatives. In practice, they solve different problems and work best together.
Terraform excels at declarative resource lifecycle: "This container should exist with 4 cores, 4 GB RAM, on node-2, connected to the internal VLAN, with 40 GB on fast storage." Terraform tracks state - it knows what exists, can detect drift, and can create, update, or destroy resources consistently.
Ansible excels at imperative configuration: "Install nginx, configure PHP-FPM with these pool settings, deploy the application, set up cron jobs, create the systemd service." Ansible does not care about resource lifecycle; it applies configuration to whatever already exists.
Using Terraform for configuration management is fighting the tool's design. Using Ansible for provisioning lifecycle is equally awkward. The separation is clean when you respect it. As one practitioner summarised it: "Terraform builds the house, and Ansible furnishes it."[src]
The BPG Provider
If you are planning new Proxmox Terraform deployments, you need to know about the provider situation. The community has moved from the Telmate provider (telmate/proxmox) to the BPG provider (bpg/proxmox) as the recommended Terraform provider for Proxmox.
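If you are starting fresh, a minimal provider pin looks like the sketch below, written out as a heredoc for illustration. The endpoint and version constraint are placeholders, and the API token is assumed to be supplied via a Terraform variable.

```bash
# Minimal bpg/proxmox provider pin - endpoint and version are placeholders
cat > providers.tf <<'EOF'
terraform {
  required_providers {
    proxmox = {
      source  = "bpg/proxmox"
      version = "~> 0.60"   # pin whichever release you have validated
    }
  }
}

variable "proxmox_api_token" {
  type      = string
  sensitive = true
}

provider "proxmox" {
  endpoint  = "https://pve-1.example.internal:8006/"
  api_token = var.proxmox_api_token
  insecure  = false
}
EOF
terraform init
```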
Template Image Pipeline
Production deployments benefit from a base image strategy. Rather than provisioning containers from a raw OS image every time, build a pipeline of layered templates: a minimal Rocky Linux 9 base, then a base-with-common-packages template, then specialised templates for PHP, database, and web server stacks. Terraform provisions new containers from the appropriate template.
This reduces provisioning time substantially and enforces consistency - every PHP container starts from an identical known state rather than depending on a playbook running correctly from scratch each time. When you update a template, rebuilt containers are immediately on the new base.
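A hedged sketch of the template build, using the Proxmox CLI directly - the template file name, VMIDs, and storage IDs are placeholders:

```bash
# Pull a base image into the template store (the exact file name will differ)
pveam update
pveam download local rockylinux-9-default_20240528_amd64.tar.xz

# Build the base container, configure it (e.g. run the common-packages
# playbook against it), then freeze it as a template
pct create 9000 local:vztmpl/rockylinux-9-default_20240528_amd64.tar.xz \
  --hostname tpl-base --cores 2 --memory 1024 \
  --rootfs storage-standard:8 \
  --net0 name=eth0,bridge=vmbr0,ip=dhcp
pct start 9000
# ... apply common packages / hardening here ...
pct stop 9000
pct template 9000

# New services are full clones of the appropriate template
pct clone 9000 130 --hostname php-app-01 --full --storage storage-fast
```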
When Pure Ansible Is Enough
Not every deployment needs Terraform. For smaller clusters with a relatively stable container set - perhaps 3 to 5 nodes, containers that do not frequently change shape or count - pure Ansible handles both provisioning and configuration without adding Terraform's state management complexity.
The trigger for introducing Terraform is when you need frequent container lifecycle changes (creating, destroying, resizing containers) across multiple environments (development, staging, production), or when the container inventory is large enough that manual state tracking becomes error-prone. If you are building a deployment where developers regularly spin up and tear down environments, Terraform's declarative state model pays for itself quickly. If you have 20 containers that rarely change, pure Ansible is simpler.
Regardless of whether you use Terraform alongside Ansible or Ansible alone, a clean approach uses two separate Ansible subprojects within a monorepo: one for cluster infrastructure (Proxmox node configuration, network setup, storage initialisation) and one for service deployment (container creation, application configuration, service startup). This separation limits the blast radius of changes - an application deployment playbook cannot accidentally reconfigure cluster networking. The pattern works equally well whether Terraform or Ansible handles the provisioning layer.
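One possible monorepo layout for that split - the directory names are illustrative, not a standard:

```bash
# Illustrative monorepo skeleton - directory names are one possible convention
mkdir -p infrastructure/terraform                    # container/VM lifecycle (if used)
mkdir -p infrastructure/cluster-ansible/{inventories,roles,playbooks}
mkdir -p infrastructure/services-ansible/{inventories,roles,playbooks}
# cluster-ansible:  Proxmox node configuration, network setup, storage initialisation
# services-ansible: container creation, application configuration, service startup
```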
The Terraform + Ansible Handoff
When you do combine the tools, the integration pattern is clean. Terraform creates the LXC container and registers it in Ansible's dynamic inventory via the ansible_host resource. Ansible's dynamic inventory plugin reads from Terraform state. Ansible playbooks target group-based configuration - all hosts in the "php" group receive PHP-FPM installation and configuration, all hosts in the "database" group receive MySQL and replication configuration.
The key principle in this structure: provision.yml runs once after Terraform creates a new container; deploy.yml runs on every deployment. Separating these means routine deployments do not re-run slow idempotent provisioning tasks.
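In day-to-day use that split looks something like this - the playbook names follow the pattern above, and the inventory path, group, and host names are placeholders:

```bash
# One-off provisioning after Terraform creates a new container
ansible-playbook -i inventories/production provision.yml --limit php-app-01

# Routine deployment - run as often as needed, without the slow provisioning tasks
ansible-playbook -i inventories/production deploy.yml --limit php
```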
Putting It All Together: A Reference Architecture
The five architecture decisions above combine into a coherent reference architecture for a production UK private cloud.
Compute layer: Three to five Proxmox nodes, each running Proxmox VE on bare metal. LXC containers for the majority of workloads. KVM VMs reserved for the specific cases the decision checklist identifies. Container and VM allocation spread across nodes with Proxmox HA groups ensuring no single node failure takes down production workloads.
Storage layer: Ceph distributed across all nodes with a minimum replication factor of 3 (size=3, min_size=2). Two storage pools: NVMe-backed for latency-sensitive workloads, SSD-backed for general use. ZFS on each node for the Proxmox OS and local scratch storage.
Network layer: Three VLANs minimum - management/Corosync on MTU 1500, Ceph storage on MTU 9000 with 10 GbE or faster, guest traffic on separate VLANs per security zone. Proxmox SDN managing VLAN configuration cluster-wide.
IaC layer: Terraform with the BPG provider defining container and VM inventory, storage allocation, network assignment, and HA group membership. Ansible configuring the software stack inside every container and VM. Git version-controlling both Terraform and Ansible projects, so the entire infrastructure definition is auditable and reproducible.
The result is an infrastructure where provisioning a new service involves adding a Terraform resource block and running the deployment pipeline - Terraform creates the container, registers it in Ansible inventory, Ansible configures the software stack, and the service comes online.
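To illustrate what "adding a Terraform resource block" means in practice, here is a hedged sketch of an LXC resource using the BPG provider, written as a heredoc. Attribute names follow the bpg/proxmox documentation at the time of writing and should be verified against the provider version you pin; the node, VMID, datastore, template, and bridge values are placeholders.

```bash
# Sketch of an LXC container resource with bpg/proxmox - values are placeholders
cat > php-app.tf <<'EOF'
resource "proxmox_virtual_environment_container" "php_app_01" {
  node_name = "pve-2"
  vm_id     = 130

  cpu {
    cores = 4
  }

  memory {
    dedicated = 4096
  }

  disk {
    datastore_id = "storage-fast"
    size         = 40
  }

  operating_system {
    template_file_id = "local:vztmpl/rockylinux-9-default_20240528_amd64.tar.xz"
    type             = "centos"
  }

  network_interface {
    name   = "eth0"
    bridge = "vmbr0"
  }

  initialization {
    hostname = "php-app-01"
    ip_config {
      ipv4 {
        address = "dhcp"
      }
    }
  }
}
EOF
terraform plan
```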
Conclusion
Good architecture decisions made at this stage prevent operational pain for years. The five decisions covered in this article each have a clear right answer for most production deployments.
Default to LXC containers and use KVM only for the specific workloads that genuinely require it - the density and simplicity advantages are real. Design storage as two tiers (fast NVMe and standard SSD) with Ceph for shared mobility and ZFS for local reliability; never set min_size=1. Separate management, storage, and guest traffic onto distinct VLANs - Corosync and Ceph must not share a network. Size RAM conservatively accounting for all four consumers (host OS, ZFS ARC, Ceph OSDs, workload allocations) with ECC memory throughout. Use Terraform with the BPG provider for resource lifecycle and Ansible for configuration - together, not as alternatives.
The difference between a private cloud that runs itself and one that demands constant attention is the quality of the initial design. These patterns are derived from production deployments, not theoretical best practice - they reflect what actually works when a cluster is handling real workloads.
If you are at the architecture stage and want to discuss your specific requirements - node count, workload profiles, storage design, or IaC strategy - we design and implement Proxmox private cloud infrastructure for UK businesses and are happy to talk through the options. Get in touch.