Ironhorse is a scalable AI inference platform built on off‑the‑shelf hardware, designed to run the largest language models efficiently. Leveraging Opencode and a modular multi‑GPU architecture, it can grow from a modest dual‑GPU setup to a full six‑GPU powerhouse, scaling compute capacity to match your workload demands.
Bill of Materials
Core Components
| Component | Part | Est. Cost | Notes |
|---|---|---|---|
| Motherboard | Supermicro M12SWA-TF-O (E-ATX) | $700 | sWRX8 socket, 6x PCIe 4.0 x16, 8-ch DDR4 |
| CPU | Threadripper Pro 5955WX | $660 | 16C/32T, 4.0/4.5GHz, 128 PCIe 4.0 lanes |
| RAM | OWC 512GB (8x64GB) DDR4-3200 ECC RDIMM | $3,423 | 2Rx4 288-pin 1.2V ECC Registered |
| GPUs | 2x NVIDIA RTX 3090 | $1,600 | 48GB VRAM total (24GB each) |
| PSU | Corsair HX1500i 80+ Platinum | $400 | 1500W single PSU (not dual) |
| PSU Adapter | ADD2PSU-S Dual PSU Adapter | $17 | For future dual PSU setup |
| Frame | Bomeiqee Mining Rig Frame | $65 | 12-GPU frame with fans |
| Risers | 6x LINKUP PCIE 5.0 Riser Cable 40cm | $714 | Black v2, true x16 bandwidth |
| CPU Cooler | Noctua NH-U14S TR4-SP3 | $110 | Threadripper compatible cooler |
| Case Fans | 4x Noctua NF-A12x25 PWM | $152 | 120mm high-performance fans |
Total: ~$7,875 (estimated prices based on actual purchases)
Actual spent to date (with tax): ~$5,618 (the two additional RTX 3090s and the second PSU haven't been purchased yet)
Critical Component Selection Criteria
Motherboard: Supermicro M12SWA-TF
Chosen specifically for its 6x PCIe 4.0 x16 slots at full bandwidth. No bifurcation or sharing - each GPU gets the full 16 lanes. This board supports both 3000WX and 5000WX Threadripper Pro CPUs with 8-channel memory configuration.
CPU: Threadripper Pro 5955WX
The 5955WX provides 16 cores/32 threads with 4.0/4.5GHz boost speeds. For GPU-bound inference workloads, this is more than adequate - the key advantage is all Threadripper Pro CPUs provide 128 PCIe 4.0 lanes regardless of core count. The 5955WX delivers excellent value compared to higher core count models for LLM inference workloads.
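Those 128 lanes are what make the six-GPU endgame possible without bifurcation. Here's a rough lane-budget sketch; the NVMe and chipset allowances are my assumptions, not values from the board manual.

```python
# Rough PCIe 4.0 lane budget for the 6-GPU endgame, as a sanity check.
# Per-device lane counts below are assumptions based on typical wiring,
# not measured values from this specific board.
LANES_AVAILABLE = 128  # all Threadripper Pro CPUs expose 128 lanes

consumers = {
    "6x GPU @ x16": 6 * 16,   # one full x16 slot per GPU, no bifurcation
    "NVMe storage": 4,        # a single x4 M.2 drive (assumed)
    "chipset/misc": 8,        # 10GbE, USB, SATA uplink (rough allowance)
}

used = sum(consumers.values())
print(f"lanes used: {used} / {LANES_AVAILABLE} "
      f"(headroom: {LANES_AVAILABLE - used})")
```

Even fully populated, the budget leaves headroom for storage and networking, which is exactly why a consumer platform with 20-24 lanes can't do this.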
Power Requirements
| Component | Power Draw |
|---|---|
| 2x RTX 3090 @ load | ~700W |
| Threadripper Pro 5955WX | ~280W |
| RAM, storage, fans | ~50W |
| Total Peak Load | ~1,030W |
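A quick sanity check of the table above against the HX1500i's rating; the 20% headroom rule of thumb is my assumption, not a PSU spec.

```python
# Sanity-check the peak-load table against the HX1500i's 1500W rating.
# The 20% headroom rule of thumb is an assumption, not a PSU spec.
draws_w = {
    "2x RTX 3090 @ load": 700,
    "Threadripper Pro 5955WX": 280,
    "RAM, storage, fans": 50,
}
peak = sum(draws_w.values())
psu_w = 1500
print(f"peak load: {peak}W, PSU headroom: {psu_w - peak}W "
      f"({(psu_w - peak) / psu_w:.0%} of rated capacity)")
```

At ~1,030W peak the single 1500W unit has comfortable margin for the dual-head configuration; the second PSU only becomes necessary as more heads come online.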
Assembly Process & Lessons Learned
PCIe Riser Cable Reality Check
Cheap mining risers won't work. Those USB-style risers are x1-to-x16 adapters with only x1 bandwidth (~4 GB/s). For LLM inference, you need true x16-to-x16 extension cables. The LINKUP Ultra cables deliver full 32 GB/s bandwidth with no measurable performance loss.
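The 32 GB/s figure follows from the PCIe 4.0 signaling rate; a minimal per-direction sketch, before protocol overhead:

```python
# Per-direction PCIe throughput for a given generation and lane count.
# PCIe 4.0 runs 16 GT/s per lane with 128b/130b encoding.
def pcie_gbps(gt_per_s: float, lanes: int, enc: float = 128 / 130) -> float:
    """Usable GB/s in one direction, before protocol overhead."""
    return gt_per_s * enc / 8 * lanes

x16 = pcie_gbps(16.0, 16)  # a true x16 riser on this board
x1 = pcie_gbps(16.0, 1)    # a USB-style mining riser's single lane
print(f"gen4 x16: {x16:.1f} GB/s, gen4 x1: {x1:.1f} GB/s")
```

The 16x gap is why mining risers are fine for hashing (weights stay resident) but crippling for multi-GPU inference, where tensors move across the bus every forward pass.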
CPU Lock Warning
Some Threadripper Pro CPUs pulled from Lenovo P620 workstations are firmware-locked to that platform. Always verify the seller confirms "unlocked" or "retail/OEM tray" before purchase. A locked CPU won't POST in aftermarket boards.
RAM Configuration
ECC RDIMM is required for the 512GB configuration: 64GB DIMMs are only available as registered modules (RDIMMs), while standard UDIMMs max out at 256GB (8x32GB). The 8-channel configuration provides ~200 GB/s of theoretical memory bandwidth.
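The ~200 GB/s figure is just the channel math; a quick check:

```python
# Theoretical peak bandwidth for the 8-channel DDR4-3200 configuration.
mt_per_s = 3200          # DDR4-3200 transfer rate (megatransfers/s)
bytes_per_transfer = 8   # 64-bit channel width
channels = 8
bw_gbs = mt_per_s * bytes_per_transfer * channels / 1000
print(f"peak memory bandwidth: {bw_gbs:.1f} GB/s")
```

This matters for MoE offload: expert layers that spill out of VRAM stream from system RAM at this rate, so it sets the floor on offloaded-layer throughput.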
Thermal Management
Open-air frame with directional airflow across GPUs works well. RTX 3090s run hot (~80-85C under load). Consider undervolting if thermals become problematic. Noctua fans provide excellent airflow without excessive noise.
Capability Overview
Ironhorse Configuration: Dual-Head Mode
- 48GB VRAM + 512GB system RAM (2 active heads)
- ~1.9 TB/s aggregate VRAM bandwidth
- Runs medium models: Qwen3-32B-Q5_K_M, deepseek-r1:32b, Nemotron-3-Nano-30B-A3B-Q6_K
- Context window: under 65K tokens at usable speeds
- Power draw: ~1,030W (2 heads + backbone)
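To gauge whether a quantized model fits the dual-head budget, here's a back-of-the-envelope estimator. The bits-per-weight figures and the 20% KV-cache allowance are ballpark assumptions, not measured values.

```python
# Rough check of whether a quantized model's weights fit in VRAM.
# Bits-per-weight values and the KV-cache/activation allowance are
# ballpark assumptions, not measured figures.
def weights_gb(params_b: float, bits_per_weight: float) -> float:
    return params_b * bits_per_weight / 8  # B params * bits -> GB

vram_gb = 48  # dual-head configuration
for name, params_b, bpw in [
    ("Qwen3-32B Q5_K_M", 32, 5.7),       # ~5.7 bits/weight (assumed)
    ("32B-class model @ Q4_K_M", 32, 4.8),
]:
    need = weights_gb(params_b, bpw)
    # leave ~20% of VRAM for KV cache and activations (assumption)
    fits = need <= vram_gb * 0.8
    print(f"{name}: ~{need:.0f} GB weights, fits in {vram_gb}GB: {fits}")
```

A 32B model at Q5_K_M lands around 23 GB of weights, which is why this class of model is the dual-head sweet spot: it fits with room left for a meaningful context window.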
Head Growth: 4-Head Configuration
Expanding Ironhorse to 4 heads (~$1,600) with a second PSU unlocks greater parallel processing:
- 96GB VRAM + 512GB system RAM (4 active heads)
- ~3.7 TB/s aggregate VRAM bandwidth
- Enables larger MoE models: GLM-4.5 Air, GLM-4.6/4.7 (MoE offload), MiniMax M2.1 Q2
- Total power draw: ~1,730W (4 heads + backbone)
Full Ironhorse: 6-Head Configuration
Maximum expansion (~$2,400 for 4 additional heads + second PSU) delivers:
- 144GB VRAM + 512GB system RAM (6 active heads)
- ~5.6 TB/s aggregate VRAM bandwidth
- Full capability for premier MoE models: DeepSeek V3 Q4 (MoE offload), MiniMax M2.1 Q4, GLM-4.6/4.7
- Total power draw: ~2,430W (6 heads + backbone)
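The aggregate-bandwidth figures across all three configurations are simply the RTX 3090's ~936 GB/s GDDR6X bandwidth multiplied by head count:

```python
# Aggregate VRAM bandwidth per configuration: the RTX 3090's
# ~936 GB/s GDDR6X bandwidth times the number of active heads.
BW_3090_GBS = 936
for heads in (2, 4, 6):
    agg_tbs = heads * BW_3090_GBS / 1000
    print(f"{heads} heads: ~{agg_tbs:.1f} TB/s aggregate VRAM bandwidth")
```

Note this is aggregate, not unified: each head reads its own VRAM at ~936 GB/s, so the benefit comes from splitting model layers across heads rather than any single head getting faster.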
Key Takeaways
Ironhorse Design Philosophy
- Modular Growth: Start with 2 heads (GPUs), expand to 6 as needs grow
- Neural Architecture: Each head works independently while sharing the same nervous system
- Quality Interconnects: True x16 risers ensure each head communicates at full bandwidth
- Future-Proof Foundation: Threadripper Pro provides the computational backbone for expansion
Cost vs. Reality
So far, this build delivers ~48GB of VRAM for ~$5,600 of actual spend, which is not especially economical on its own. However, I think the design choices - full PCIe lane allocation and the flexibility afforded by 512GB of system RAM - will pay off when we look to cram as much model as we can into the system. I've not seen used A100 40GB cards for less than $3,000, so I think we are doing better than buying server-grade gear. And while a Mac Studio has decent performance and supports larger models with greater context length, the 256GB configuration will cost you at least $7,500, and the 512GB model over $12,000. If we can find a way to avoid memory bandwidth bottlenecks (~200 GB/s system RAM vs. the Mac's ~800 GB/s unified memory), the GPUs should offer faster inference speeds.
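For a rough dollars-per-GB view of the options above (prices are this section's estimates; VRAM and unified memory aren't directly comparable, so treat this as indicative only):

```python
# Cost per GB of model-addressable memory across the options discussed.
# Prices are the estimates from this section; VRAM vs. unified memory
# is an apples-to-oranges comparison, so this is indicative only.
options = {
    "Ironhorse 2-head (48GB VRAM)": (5618, 48),
    "Mac Studio 256GB": (7500, 256),
    "Mac Studio 512GB": (12000, 512),
}
for name, (cost, gb) in options.items():
    print(f"{name}: ${cost / gb:.0f}/GB")
```

Per GB, the Mac wins handily; the bet this build makes is that GPU memory bandwidth (and expansion headroom) buys faster tokens per dollar, not cheaper gigabytes.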
Ironhorse's backbone (motherboard, CPU, RAM, frame) is established. Once the first two heads (RTX 3090s) are transplanted from the previous system, I'll begin benchmarking and testing this multi-headed architecture. As workload demands increase, additional heads can be activated - Ironhorse adapts to your computational needs.