Builder's Notes
- Scaling up to five RTX 3090s, the new MiniMax M2.5 as daily driver, and the refactored website (February 2026)
- Getting the massive machine up and running: motherboard issues, RAM troubleshooting, and power solutions (January 2026)
- Starting the engineering journal: from hobbyist hacker to building LLM inference systems with off-the-shelf components (January 2026)
Hardware Builds
- A scalable multi-GPU inference system designed for expansion from 2 to 6 GPUs with Threadripper backbone (January 2026)
- Technical analysis of PCIe lanes, CPU selection, and the workstation motherboard that makes it all possible (January 2026)
Software
- Vector DB RAG for your coding agents: a cross-platform memory system built with PostgreSQL + pgvector and MCP (March 2026)
Model Reviews
- 6-bit quantization meets 122B parameters: high-quality MoE inference at 40+ tokens/sec (March 2026)
- 1-bit quantization meets 397B parameters: the largest model we've run yet (February 2026)
- MiniMax M2.1 (IQ2_M): slumming it with lower quant precision (February 2026)
- Performance and memory analysis of an 80B MoE model with Q5_K_XL quantization (February 2026)
- Performance and memory analysis of a 30B MoE model with Q5_K_M quantization (January 2026)
- Extreme quantization test: 358B model at 1-bit vs 30B model at 5-bit (February 2026)
- MoE CPU offloading to fit a 120B parameter model on 48GB VRAM + 64GB RAM (January 2026)
Performance & Experiments
- Discovering 7x speedup with Mixture of Experts models: Nemotron-3-Nano vs Qwen3-32B benchmark (January 2026)
- Optimizing multi-GPU setups, testing context windows, and workarounds for 65K token limits (January 2026)
Efficiency & Optimization
- Running MiniMax M2.5 through a power limit sweep (100W-350W per GPU) to understand efficiency and performance tradeoffs (February 2026)
- How capping RTX 3090 power consumption to 200W cuts electricity costs by ~43%, reduces heat, and extends GPU lifespan (February 2026)
Technical Deep Dives
- Quantization techniques and memory optimizations to support large context windows on limited VRAM (January 2026)
- Real-world llama.cpp server setup, framework comparison, and troubleshooting journey (January 2026)
- Troubleshooting memory management, port conflicts, and common server failure patterns (January 2026)
- Lightweight tool for measuring LLM inference performance across configurations (January 2026)