Neural Inference | AI & Hardware

A prototype build simulating the next generation of consumer AI hardware, with a focus on high-bandwidth memory and neural-specific instruction sets. **Update (02/14/26):** Upgraded from an RTX 5060 (8GB) to an RTX 5070 (12GB) to test models that need more VRAM.

GPU: NVIDIA RTX 5070 (12GB)

VRAM: 12GB GDDR7

CPU: Intel Core Ultra 5

RAM: 96GB DDR5

Storage: 2TB NVMe Gen5

OS: Ubuntu

Performance Log

Date	Model	Hardware	Quant	Context	Prompt Eval	Token Gen
2026-02-15	Gemma-3-4.3B	RTX 5070 (12GB)	Q4_K_M	4096 tk	8944.5 t/s	141.8 t/s
2026-02-15	Gemma-3-4.3B (32k Context)	RTX 5070 (12GB)	Q4_K_M	32768 tk	7953.6 t/s	122.7 t/s
2026-02-15	Gemma-3-4.3B (64k Context)	RTX 5070 (12GB)	Q4_K_M	65536 tk	6968.6 t/s	109.5 t/s
2026-01-29	Gemma-3-4.3B	RTX 5060 (8GB)	Q4_K_M	4096 tk	5545.6 t/s	115.4 t/s
2026-01-29	Gemma-3-4.3B (32k Context)	RTX 5060 (8GB)	Q4_K_M	32768 tk	4809.1 t/s	95.6 t/s
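
To put the upgrade in numbers, the logged figures can be compared at matching context lengths. The following is a minimal Python sketch that hard-codes the values from the table above. The `runs` dictionary and the `speedup` helper are illustrative names, not part of any benchmarking tool used for these runs.

```python
# Throughput figures copied from the Performance Log above (tokens per second).
# Keys are (hardware, context length); values are (prompt eval, token gen).
runs = {
    ("RTX 5070", 4096): (8944.5, 141.8),
    ("RTX 5070", 32768): (7953.6, 122.7),
    ("RTX 5060", 4096): (5545.6, 115.4),
    ("RTX 5060", 32768): (4809.1, 95.6),
}


def speedup(new, old):
    """Return the percentage gain of `new` over `old`."""
    return (new / old - 1.0) * 100.0


# Compare the two cards only at context lengths that both were tested at.
for context in (4096, 32768):
    new_prompt, new_gen = runs[("RTX 5070", context)]
    old_prompt, old_gen = runs[("RTX 5060", context)]
    print(
        f"{context:>6} tk: "
        f"prompt eval +{speedup(new_prompt, old_prompt):.1f}%, "
        f"token gen +{speedup(new_gen, old_gen):.1f}%"
    )
```

On the logged figures this works out to roughly 61% faster prompt eval and 23% faster token gen at 4k context, and roughly 65% and 28% at 32k. The 64k run has no RTX 5060 counterpart in the log, so it is left out of the comparison.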

Performance Analysis

Historical Archive

Date	Model	Quant	Context	VRAM	Prompt (t/s)	Gen (t/s)
2026-02-15	Gemma-3-4.3B	Q4_K_M	4,096	-	8944.5	141.8
2026-02-15	Gemma-3-4.3B (32k Context)	Q4_K_M	32,768	-	7953.6	122.7
2026-02-15	Gemma-3-4.3B (64k Context)	Q4_K_M	65,536	-	6968.6	109.5
2026-01-29	Gemma-3-4.3B	Q4_K_M	4,096	-	5545.6	115.4
2026-01-29	Gemma-3-4.3B (32k Context)	Q4_K_M	32,768	-	4809.1	95.6
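
The three 2026-02-15 rows (the RTX 5070 runs) also show how throughput falls as the context window grows. As a rough sketch, again in Python with hard-coded values and an illustrative `retention` helper, each long-context result can be expressed as a share of the 4,096-token baseline:

```python
# RTX 5070 runs from 2026-02-15: context length -> (prompt t/s, gen t/s).
rtx5070 = {
    4096: (8944.5, 141.8),
    32768: (7953.6, 122.7),
    65536: (6968.6, 109.5),
}


def retention(value, baseline):
    """Return `value` as a percentage of `baseline`."""
    return value / baseline * 100.0


# Use the shortest context as the reference point for both metrics.
base_prompt, base_gen = rtx5070[4096]

for context, (prompt, gen) in rtx5070.items():
    print(
        f"{context:>6} tk: "
        f"prompt eval {retention(prompt, base_prompt):.1f}% of baseline, "
        f"token gen {retention(gen, base_gen):.1f}% of baseline"
    )
```

With these figures, the 32k run keeps about 89% of baseline prompt eval speed and about 87% of token gen speed, while the 64k run keeps about 78% and 77%.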