LLM Garage

Home Engineer's AI Hardware Journal


GPU Power Limit Benchmark: 100W vs 350W

How power limits affect token generation throughput across context lengths on dual RTX 3090s
February 2026

This benchmark measures token-generation throughput on dual RTX 3090s at two power limits (100W and 350W per GPU) across three context lengths, to quantify how much inference speed a power cap actually costs.

Test Configuration

- GPUs: 2× NVIDIA GeForce RTX 3090
- Power limits tested: 100W and 350W per GPU
- Context lengths: 1K, 8K, and 32K tokens
- Metric: token generation throughput (tokens/second)
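For reference, per-GPU power limits like these can be set with nvidia-smi. A minimal sketch (requires root; the cap resets on reboot unless reapplied, and the exact GPU indices depend on your system):

```shell
# Cap both GPUs (indices 0 and 1) at 100W.
sudo nvidia-smi -i 0 -pl 100
sudo nvidia-smi -i 1 -pl 100

# Verify the current, requested, and default power limits.
nvidia-smi -q -d POWER
```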

Results

Context Length | 100W Limit | 350W Limit | Slowdown at 100W
1K tokens      | 15 t/s     | 18 t/s     | 17%
8K tokens      | 12 t/s     | 16 t/s     | 25%
32K tokens     | 8 t/s      | 14 t/s     | 43%
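The slowdown column follows directly from the two throughput columns; a quick script to recompute it from the raw numbers in the table above:

```python
# Throughput from the benchmark table: context -> (t/s at 100W, t/s at 350W).
results = {
    "1K": (15, 18),
    "8K": (12, 16),
    "32K": (8, 14),
}

def slowdown_pct(capped, full):
    """Percent throughput lost at the capped power limit, rounded to whole %."""
    return round((1 - capped / full) * 100)

for ctx, (t100, t350) in results.items():
    print(f"{ctx} tokens: {slowdown_pct(t100, t350)}% slower at 100W")
# 1K tokens: 17% slower at 100W
# 8K tokens: 25% slower at 100W
# 32K tokens: 43% slower at 100W
```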

Analysis

At short contexts the cap costs relatively little: a 100W limit loses only 17% of throughput at 1K tokens. The penalty grows with context length, reaching 43% at 32K tokens, likely because attention over long contexts keeps the GPUs compute-bound, so they have more to gain from the extra power headroom.
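There is a flip side worth noting: even with the 43% slowdown, the capped configuration uses less energy per generated token. A back-of-the-envelope sketch from the table above, assuming both GPUs draw at their cap during generation and ignoring CPU/system power (so these are rough upper bounds):

```python
# Rough energy-per-token estimate for the dual-GPU setup.
# Assumes both GPUs sit at their power cap while generating;
# CPU and the rest of the system are ignored.
NUM_GPUS = 2

def joules_per_token(power_limit_w, tokens_per_s):
    return NUM_GPUS * power_limit_w / tokens_per_s

# 32K-token context row: 8 t/s at 100W, 14 t/s at 350W.
print(joules_per_token(100, 8))   # 25.0 J/token
print(joules_per_token(350, 14))  # 50.0 J/token
```

Under these assumptions, the 100W cap roughly halves the energy cost per token at 32K context, even though it is 43% slower in wall-clock terms.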

Recommendation

For maximum throughput, run at the full 350W. To cut electricity costs and heat, a middle setting such as 200W is likely a reasonable compromise, though only the 100W and 350W endpoints were benchmarked here.