What Happens When You 'Lobotomize' an LLM?
Tomasz Kolinko rewrites LLM inference with the Effort Engine, which ranks matrix multiplications by impact and runs at only 30–50% effort, for speedups of up to 3x with near-full quality on 70B models. Watch the One-Shot episode for live inference on a MacBook and heat-map visuals that show how outputs shift as effort changes.
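
To make the "effort" idea concrete, here is a minimal toy sketch in plain Python. It is not the Effort Engine's actual code: the function name `effort_matvec`, the `effort` parameter, and the choice to rank individual scalar products by absolute size are all assumptions made for illustration. It also computes every product before ranking, so it shows only the quality trade-off, not the speedup.

```python
def effort_matvec(weights, x, effort=1.0):
    """Approximate the matrix-vector product weights @ x.

    Hypothetical illustration only: keeps the `effort` fraction of
    scalar products with the largest absolute value and drops the rest.
    """
    if not 0.0 < effort <= 1.0:
        raise ValueError("effort must be in (0, 1]")

    # Every scalar product a full matrix-vector multiply would compute,
    # tagged with the output row it contributes to.
    products = [
        (row_idx, w * x_j)
        for row_idx, row in enumerate(weights)
        for w, x_j in zip(row, x)
    ]

    # Rank by absolute contribution, largest first.
    products.sort(key=lambda item: abs(item[1]), reverse=True)

    # Keep only the top fraction (at least one product).
    keep = max(1, round(effort * len(products)))

    out = [0.0] * len(weights)
    for row_idx, value in products[:keep]:
        out[row_idx] += value
    return out


weights = [[4.0, 0.1], [0.2, -3.0]]
x = [1.0, 2.0]
print(effort_matvec(weights, x, effort=1.0))  # full computation
print(effort_matvec(weights, x, effort=0.5))  # two largest products only
```

In this small example, the 50% setting keeps the two dominant products and drops the two small ones, so the result stays close to the full answer. A real implementation would need to pick the important products without computing all of them first, which is where any speedup would have to come from.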