This benchmark compares RTX 3090 performance at different power limits across various context lengths. Results show the impact of power capping on inference throughput.
## Test Configuration
- GPUs: 2x RTX 3090
- Model: Qwen3-32B Q5_K_M
- Framework: llama.cpp server
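A minimal sketch of how a run like this could be set up. The model path and port are placeholders, and `nvidia-smi -pl` requires root privileges (persistence mode keeps the limit applied between runs):

```shell
# Enable persistence mode so the power limit sticks (requires root)
sudo nvidia-smi -pm 1

# Cap both RTX 3090s at 100 W (repeat with -pl 350 for the full-power run)
sudo nvidia-smi -i 0,1 -pl 100

# Verify the active limit
nvidia-smi -q -d POWER | grep "Power Limit"

# Launch the llama.cpp server with all layers offloaded to the GPUs
# (model path is a placeholder; -c sets the context window under test)
llama-server -m ./Qwen3-32B-Q5_K_M.gguf -ngl 99 -c 32768 --port 8080
```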
## Results
| Context Length | 100 W Limit (t/s) | 350 W Limit (t/s) | 100 W vs 350 W |
|---|---|---|---|
| 1K tokens | 15 | 18 | 17% slower |
| 8K tokens | 12 | 16 | 25% slower |
| 32K tokens | 8 | 14 | 43% slower |
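The rightmost column is the relative slowdown of the capped run, `(1 - capped/full) * 100`. For the 32K-token row, for example:

```shell
# Relative slowdown of the 100 W run vs. the 350 W run at 32K context:
# 8 t/s capped vs. 14 t/s at full power
awk 'BEGIN { printf "%.0f%%\n", (1 - 8/14) * 100 }'
```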
## Analysis
The throughput penalty from the 100 W cap grows with context length: roughly 17% at 1K tokens, rising to 43% at 32K. A likely explanation is that attention over the growing KV cache adds compute per generated token, so the reduced clocks imposed by the power cap bite harder as the context fills.
## Recommendation
For maximum throughput, run at full power (350 W). If electricity cost or heat output is a concern, an intermediate limit such as 200 W is a reasonable compromise, though that setting was not measured in this test.