Virtualization: Hypervisors, VMs, and Infrastructure
Create, configure, and manage virtual machines across hypervisors - from single-node Proxmox setups to multi-node clusters with HA, live migration, and GPU passthrough. The goal is production-ready VM infrastructure with correct storage, memory, and CPU config that won't bite you at 3 AM.
Target versions (verified May 2026):
| Tool | Version | Release date | Notes |
|---|---|---|---|
| Proxmox VE | 9.1 | Nov 2025 | Debian 13.2 (trixie), kernel 6.17.2, QEMU 10.1.2 |
| Proxmox Backup Server | 4.1 | Nov 2025 | Dedup, incremental, prune policies |
| bpg/proxmox (Terraform) | 0.100.0 | Apr 2026 | Primary Proxmox IaC provider |
| QEMU | 11.0.1 | Apr 2026 | Stable (11.0 series; GA Apr 22, 2026) |
| libvirt | 12.0.0 | Jan 2026 | Hypervisor abstraction layer |
| XCP-ng | 8.3 LTS | Oct 2024 | Xen-based, LTS since Jun 2025, EOL Nov 2028 |
| VMware ESXi | 8.0 U3i | Feb 2026 | Broadcom-owned, licensing upheaval |
| VirtualBox | 7.2.6 | Jan 2026 | Dev/testing only |
| Packer | 1.15.1 | Mar 2026 | Image builder, multi-platform |
| cloud-init | 26.1 | Feb 2026 | Instance initialization standard |
When to use
- Creating or configuring VMs on Proxmox, libvirt/KVM, XCP-ng, or VMware
- Provisioning Proxmox VMs with Terraform (bpg/proxmox provider)
- Building VM templates with Packer and cloud-init
- Configuring PCI/GPU passthrough for compute or display GPUs
- Managing storage backends (LVM-thin, ZFS, Ceph, NFS)
- Setting up Proxmox clustering, HA, and live migration
- Troubleshooting VM performance (disk I/O, memory, CPU)
- Planning backup strategies (Proxmox Backup Server, snapshots)
- Tuning disk performance (virtio-scsi, iothread, discard/fstrim)
- Memory management (ballooning, NUMA topology, hugepages)
When NOT to use
- Kubernetes manifests, Helm charts, container orchestration (use kubernetes)
- General Terraform/OpenTofu HCL patterns, state, modules (use terraform)
- Network config not hypervisor-specific: DNS, VPNs, reverse proxies (use networking)
- Ansible playbooks and configuration management (use ansible)
- Docker/container image optimization (use docker)
- OPNsense/pfSense firewall management (use firewall-appliance)
AI Self-Check
AI tools consistently produce the same VM configuration mistakes. Before returning any generated VM config, Terraform HCL, or Packer template, verify against this list:
- No hardcoded IPs, passwords, or SSH keys - use variables or cloud-init injection
- Disk interface is virtio (scsi0 with virtio-scsi controller), not IDE, unless the guest is a legacy OS without virtio drivers
- `iothread = true` on virtio-scsi disks for SSD-backed storage
- `ssd = true` emulation enabled when backing store is SSD (enables guest TRIM)
- `discard = on` on QEMU disk config for thin-provisioned storage (fstrim passthrough)
- Memory ballooning disabled unless tested on the specific guest OS (Alpine, some BSDs can't hotplug DIMMs - balloon changes need full power-cycle, not reboot)
- In bpg/proxmox Terraform: ballooning is disabled by setting the provider's minimum-memory/balloon field to 0 (verify exact key in the bpg/proxmox docs - commonly `memory_min_mb` or `balloon`; do not invent a field name without checking)
- CPU type is `host` for production (full feature passthrough), not `kvm64`/`qemu64`
- NUMA enabled for multi-socket or large-memory VMs
- QEMU guest agent enabled (cloud-init installs it, but verify)
- Cloud-init interface specified (bpg/proxmox defaults to ide2 when null)
- Terraform lifecycle: `prevent_destroy` on VMs, `ignore_changes` on `disk` and `node_name`
- No disk resize via Terraform - use `qm resize` on host, then update Terraform var to match
- PCI passthrough: `pcie = false` for standard passthrough, `xvga = false` unless display GPU
- PCI passthrough: machine type is `q35` when `pcie = true` is needed
- GPU passthrough: AMD GPUs are prone to reset bugs (vendor-reset kernel module or `pcie_port_pm=off` may be required); NVIDIA generally resets cleanly but verify with your card model before production use
- BIOS type matches use case: `seabios` default, `ovmf` for UEFI/Secure Boot/Windows 11
- Backup retention configured (not unlimited snapshots eating storage)
- Network device uses `virtio` model, not `e1000` or `rtl8139`
- `fstrim.timer` enabled in guest for thin-provisioned storage (completes the discard chain)
- SCSI controller explicitly set (`virtio-scsi-single` for high IOPS, `virtio-scsi-pci` default)
- Machine type matches BIOS: `i440fx` with `seabios`, `q35` with `ovmf` (UEFI). Mixing `i440fx` + `ovmf` causes boot failures. `q35` + `seabios` works but wastes q35 features.
- VGA type matches use case: `serial0` for headless cloud images, `virtio` for GUI VMs, omit for PCI passthrough display GPUs (`x-vga=1` replaces the virtual display)
- Current source checked: dated versions, CLI flags, API names, and support windows are verified against primary docs before repeating them
- Hidden state identified: local config, credentials, caches, contexts, branches, cluster targets, or previous runs are made explicit before acting
- Verification is real: final checks exercise the actual runtime, parser, service, or integration point instead of only linting prose or happy paths
- Routing overlap checked: overlapping skills, trigger terms, and "When NOT to use" boundaries are checked before returning guidance
- Spec claims verified: claims about tool behavior, output contracts, or repo conventions are checked against current docs, scripts, or skill files
- Hypervisor/version checked: Proxmox, QEMU/KVM, libvirt, XCP-ng, vSphere, and cloud-init advice matches the target platform
- Storage risk gated: disk format, snapshot, passthrough, and migration commands preserve data and leave a rollback path
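Part of the checklist above can be applied mechanically. The following is a minimal sketch, assuming the input is the `key: value` text printed by `qm config <vmid>` on a Proxmox host. The function names (`parse_qm_config`, `split_options`, `check_vm`) and the finding messages are hypothetical, and only a handful of checklist items are covered; SSD-specific options such as `iothread` and `ssd` are left out because the script cannot know what backs the storage.

```python
import re

DISK_KEY = re.compile(r"^(ide|sata|scsi|virtio)\d+$")
NET_KEY = re.compile(r"^net\d+$")


def parse_qm_config(text):
    """Parse the 'key: value' lines printed by `qm config <vmid>`."""
    config = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep:
            config[key.strip()] = value.strip()
    return config


def split_options(value):
    """Turn 'vol,discard=on,ssd=1' into {'discard': 'on', 'ssd': '1'}."""
    return dict(p.split("=", 1) for p in value.split(",") if "=" in p)


def check_vm(config):
    """Return human-readable findings; an empty list means no hits."""
    findings = []
    if config.get("cpu", "").split(",")[0].removeprefix("cputype=") != "host":
        findings.append("cpu: type is not host")
    if config.get("balloon") != "0":
        findings.append("balloon: ballooning is on; confirm it was tested on this guest")
    agent = config.get("agent", "0")
    if not (agent.startswith("1") or "enabled=1" in agent):
        findings.append("agent: QEMU guest agent is not enabled")
    if "scsihw" not in config:
        findings.append("scsihw: SCSI controller is not set explicitly")
    if config.get("bios", "seabios") == "ovmf" and "q35" not in config.get("machine", ""):
        findings.append("machine: ovmf is paired with a non-q35 machine type")
    for key, value in config.items():
        if DISK_KEY.match(key):
            opts = split_options(value)
            if opts.get("media") == "cdrom":
                continue  # ISO and cloud-init drives are not data disks
            if key.startswith("ide"):
                findings.append(f"{key}: data disk is attached via IDE")
            if opts.get("discard") != "on":
                findings.append(f"{key}: discard=on is missing")
        elif NET_KEY.match(key):
            model = value.split(",")[0].split("=")[0]
            if model != "virtio":
                findings.append(f"{key}: NIC model is {model}, not virtio")
    return findings


# Illustrative config text; the VM ID, MAC address, and storage name are made up.
SAMPLE = """\
agent: 1
balloon: 0
bios: seabios
cpu: kvm64
ide2: local-lvm:vm-100-cloudinit,media=cdrom
net0: e1000=BC:24:11:00:00:01,bridge=vmbr0
scsi0: local-lvm:vm-100-disk-0,iothread=1,size=32G
scsihw: virtio-scsi-single
"""

if __name__ == "__main__":
    for finding in check_vm(parse_qm_config(SAMPLE)):
        print(finding)
```

A script like this only catches what is visible in the VM config. Items such as `fstrim.timer` in the guest or backup retention still need a manual check.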
Performance
- Choose storage format and cache mode based on workload: latency, snapshots, thin provisioning, and backup behavior differ.
- Right-size vCPU, NUMA, memory ballooning, and I/O queues from measured host pressure.
- Use templates and cloud-init for repeatable VM creation instead of manual clones that drift over time.
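One way to measure host pressure before resizing VMs is the Linux pressure stall information (PSI) interface under `/proc/pressure`. The sketch below assumes the host kernel has PSI enabled. The helper names and the 10 percent threshold are placeholders, not recommended values.

```python
from pathlib import Path

RESOURCES = ("cpu", "memory", "io")


def parse_psi(text):
    """Parse one PSI file into {'some': {'avg10': 1.5, ...}, 'full': {...}}."""
    parsed = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        kind, *fields = line.split()  # e.g. 'some', ['avg10=0.00', ...]
        parsed[kind] = {k: float(v) for k, v in (f.split("=", 1) for f in fields)}
    return parsed


def read_host_pressure(root="/proc/pressure"):
    """Read cpu, memory, and io pressure from a PSI-enabled Linux host."""
    return {res: parse_psi(Path(root, res).read_text()) for res in RESOURCES}


def stalled(pressure, threshold_pct=10.0):
    """Return resources whose 'some' avg60 exceeds threshold_pct (placeholder value)."""
    return [res for res, psi in pressure.items()
            if psi.get("some", {}).get("avg60", 0.0) > threshold_pct]


if __name__ == "__main__":
    print(stalled(read_host_pressure()))
```

Sustained I/O or memory stalls on the host are a reason to revisit disk cache mode, I/O threads, or memory sizing before adding more vCPUs.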
Best Practices
- Snapshot before guest-agent, disk, boot, or passthrough changes and before hypervisor upgrades, but do not treat snapshots as backups.
- Keep host, guest, and storage backups independently restorable.
- Document PCI/GPU passthrough bindings so kernel updates do not strand the host.
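Passthrough bindings are easier to document when they are captured from the host itself. This minimal sketch reads the Linux sysfs layout (`/sys/kernel/iommu_groups` and `/sys/bus/pci/devices`) and assumes IOMMU is enabled on the host. The function names are hypothetical.

```python
from pathlib import Path


def iommu_groups(root="/sys/kernel/iommu_groups"):
    """Map IOMMU group number -> sorted PCI addresses in that group."""
    base = Path(root)
    if not base.is_dir():
        return {}  # IOMMU disabled, or not a Linux host
    groups = {}
    for group_dir in base.iterdir():
        devices = group_dir / "devices"
        if group_dir.name.isdigit() and devices.is_dir():
            groups[int(group_dir.name)] = sorted(d.name for d in devices.iterdir())
    return dict(sorted(groups.items()))


def bound_driver(pci_addr, root="/sys/bus/pci/devices"):
    """Return the kernel driver bound to a PCI device, or None if unbound."""
    link = Path(root, pci_addr, "driver")  # symlink to the driver directory
    return link.resolve().name if link.exists() else None


if __name__ == "__main__":
    for group, addrs in iommu_groups().items():
        for addr in addrs:
            print(f"group {group}: {addr} driver={bound_driver(addr)}")
```

Saving this output before and after a kernel update makes it easy to see whether a device moved to a different group or lost its `vfio-pci` binding.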
Workflow
Step 1: Identify the task
| Task | Start with | Reference |
|---|---|---|
| Proxmox VM creation | CLI (qm) or API (pvesh), cloud-init | references/proxmox.md |
| Terraform provisioning | bpg/proxmox provider, lifecycle rules | references/proxmox.md (Terraform section) |
| Image building | Packer + cloud-init templates | references/image-building.md |
| libvirt/KVM management | virsh, XML domain definitions | references/libvirt-qemu-kvm.md |
| GPU/PCI passthrough | IOMMU groups, vfio-pci | references/proxmox.md (PCI section) |
| Performance tuning | Disk, memory, CPU config | This file + references |
| Migration to Proxmox | From VMware, XCP-ng, or bare metal | references/proxmox.md |
Step 2: Gather requirements
Before creating or modifying VMs:
- Hypervisor and version - Proxmox VE 9.x? libvirt? VMware migration?
- Guest OS - Linux distro, Windows, BSD? (affects virtio drivers, ballooning, agent)
- CPU - core count, type (host vs emulated), pinning needs, NUMA topology
- Memory - dedicated amount, ballooning (usually: don't), hugepages for databases
- Storage - backend (LVM-thin, ZFS, Ceph, NFS), disk size, format (raw vs qcow2)
- Network - bridge, VLAN tag, virtio, firewall
- Passthrough - GPU/PCI devices, USB, serial ports
- Provisioning method - manual, Terraform, Packer template
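The requirements above can be captured as data and turned into a reviewable command. The following is a sketch, assuming a Proxmox host with the `qm` CLI. `VmRequirements` and `qm_create_args` are hypothetical names, the defaults follow the checklist earlier in this file, and the output should be checked against `qm help create` on the target version before use. A UEFI guest normally also needs an EFI disk, which is not shown here.

```python
import shlex
from dataclasses import dataclass
from typing import Optional


@dataclass
class VmRequirements:
    vmid: int
    name: str
    cores: int
    memory_mb: int
    storage: str                    # storage ID, e.g. "local-lvm"
    disk_gb: int
    bridge: str = "vmbr0"
    vlan_tag: Optional[int] = None
    uefi: bool = False
    numa: bool = False


def qm_create_args(req):
    """Build an argv list for `qm create`; nothing is executed here."""
    net = f"virtio,bridge={req.bridge}"
    if req.vlan_tag is not None:
        net += f",tag={req.vlan_tag}"
    args = [
        "qm", "create", str(req.vmid),
        "--name", req.name,
        "--cores", str(req.cores),
        "--cpu", "host",
        "--memory", str(req.memory_mb),
        "--balloon", "0",           # ballooning off unless tested on the guest
        "--numa", "1" if req.numa else "0",
        "--scsihw", "virtio-scsi-single",
        # "<storage>:<size in GiB>" asks Proxmox to allocate a new volume
        "--scsi0", f"{req.storage}:{req.disk_gb},discard=on,iothread=1,ssd=1",
        "--net0", net,
        "--agent", "enabled=1",
        "--bios", "ovmf" if req.uefi else "seabios",
    ]
    if req.uefi:
        args += ["--machine", "q35"]  # pair ovmf with q35
    return args


if __name__ == "__main__":
    # Example values are illustrative only.
    req = VmRequirements(vmid=120, name="db01", cores=4, memory_mb=8192,
                         storage="local-lvm", disk_gb=64, vlan_tag=30)
    print(shlex.join(qm_create_args(req)))
```

Printing the command instead of running it keeps the sketch safe to try anywhere and gives a reviewer one line to approve. The `ssd=1` and `iothread=1` options assume SSD-backed storage and should be dropped otherwise.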