Rebuilding GPT-2 in a Spreadsheet — Ishaan Anand on Opening the Black Box
Some people treat transformers as a “black box” you have to trust.
Ishaan Anand takes the opposite approach — cracking them open so anyone can see exactly how they work.
In this episode, Ishaan walks us through his project to rebuild GPT-2 from scratch inside a spreadsheet, showing every matrix, every operation, and every parameter along the way. The result: a way to see attention and transformer mechanics in action — and understand the foundation that powers modern LLMs.
Why this matters
Modern LLMs, from GPT-4 to Claude, Gemini, and LLaMA, descend from the same transformer architecture that GPT-2 helped popularize in 2019 (the transformer itself was introduced in 2017).
If you understand GPT-2, you have the foundation for understanding all of them, and for modifying them to suit your own needs.
- Transparency pain: Most devs never see what happens between prompt and output.
- Learning pain: Reading papers is slow and running code can feel abstract, but seeing the math laid out visually often makes it click.
- Control pain: Without intuition for how LLMs process tokens, it’s hard to fine-tune, debug, or optimize them.
Ishaan’s approach makes the invisible visible, so you can reason about models instead of guessing.
“If you understand GPT-2, you understand the DNA of every model that came after it.”
– Ishaan Anand
Full GPT-2 in a spreadsheet
- Every weight, bias, and operation exposed.
- Visualize tokenization, embedding, and transformer blocks step by step.
- Change inputs and watch attention patterns update in real time.
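To make the attention step concrete, here is a minimal single-head sketch in plain Python (no NumPy) of the kind of computation the spreadsheet lays out for one layer: GPT-2-style input vectors formed by adding a token embedding to a learned position embedding, followed by causal scaled dot-product attention. The helper names (`embed`, `causal_attention`) and the tiny 2-dimensional toy weights are illustrative assumptions, not taken from the spreadsheet. GPT-2 small actually uses 768-dimensional vectors and 12 heads per layer, plus an output projection, layer norm, and an MLP, all omitted here.

```python
import math

def softmax(xs):
    # Subtract the max before exponentiating for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def matvec(W, x):
    # Multiply matrix W (a list of rows) by vector x.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def embed(token_ids, wte, wpe):
    # GPT-2-style input: token embedding plus learned position embedding.
    return [[a + b for a, b in zip(wte[tok], wpe[pos])] for pos, tok in enumerate(token_ids)]

def causal_attention(xs, Wq, Wk, Wv):
    """Single-head causal self-attention (hypothetical helper).

    Returns (outputs, weights); weights[i][j] is how strongly position i
    attends to position j, and is 0.0 for every j > i (the causal mask).
    """
    qs = [matvec(Wq, x) for x in xs]
    ks = [matvec(Wk, x) for x in xs]
    vs = [matvec(Wv, x) for x in xs]
    d = len(qs[0])
    outputs, weights = [], []
    for i, q in enumerate(qs):
        # Scaled dot-product scores against positions 0..i only.
        scores = [sum(a * b for a, b in zip(q, ks[j])) / math.sqrt(d) for j in range(i + 1)]
        w = softmax(scores) + [0.0] * (len(xs) - i - 1)
        weights.append(w)
        # Output is the attention-weighted sum of value vectors.
        outputs.append([sum(w[j] * vs[j][c] for j in range(len(xs))) for c in range(len(vs[0]))])
    return outputs, weights

# Toy example: vocabulary of 3 tokens, 3 positions, 2-dimensional vectors.
wte = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
wpe = [[0.1, 0.0], [0.0, 0.1], [0.1, 0.1]]
identity = [[1.0, 0.0], [0.0, 1.0]]
xs = embed([0, 2, 1], wte, wpe)
out, attn = causal_attention(xs, identity, identity, identity)
for row in attn:
    print([round(w, 3) for w in row])
```

Each row of `attn` is one position's attention pattern. The zeros above the diagonal are the causal mask, which stops a token from attending to later tokens. Changing a token id passed to `embed` changes these weights, which is the kind of live update the spreadsheet makes visible.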
Three innovations since GPT-2
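- Mixture of Experts: Conditional computation. A router sends each token to a few expert sub-networks, so only a fraction of the parameters run per token, making inference faster and cheaper for a given model size.
- RoPE (Rotary Position Embeddings): Encodes position by rotating query and key vectors, so attention scores depend on relative position, which helps with longer contexts.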
- Training advances: Scaling laws, curriculum learning, and better optimization.
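The first two innovations above can be sketched in a few lines of plain Python. `rope` rotates each consecutive pair of dimensions by an angle proportional to the token's position, using the base of 10000 and the interleaved pairing from the original RoPE paper (some implementations pair the first and second halves of the vector instead). `top_k_route` and `moe_layer` show Mixtral-style top-k gating: a softmax over the k highest router scores, with only the chosen experts executed. All function names, the toy experts, and the router weights are hypothetical. Production MoE layers add details such as load-balancing losses and capacity limits that are not shown.

```python
import math

def rope(x, pos, base=10000.0):
    """Rotate pairs (x[2i], x[2i+1]) of an even-length vector by pos * base**(-2i/d)."""
    d = len(x)
    out = []
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)  # i already equals 2 * (pair index)
        c, s = math.cos(theta), math.sin(theta)
        out += [x[i] * c - x[i + 1] * s, x[i] * s + x[i + 1] * c]
    return out

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def top_k_route(logits, k=2):
    """Keep the k highest router logits and softmax over just those experts."""
    top = sorted(range(len(logits)), key=lambda e: logits[e], reverse=True)[:k]
    m = max(logits[e] for e in top)
    exps = {e: math.exp(logits[e] - m) for e in top}
    z = sum(exps.values())
    return {e: v / z for e, v in exps.items()}

def moe_layer(x, router_w, experts, k=2):
    """Route x to its top-k experts and return the gate-weighted sum of their outputs."""
    logits = [dot(row, x) for row in router_w]
    gates = top_k_route(logits, k)
    out = [0.0] * len(x)
    for e, g in gates.items():  # unselected experts are never called
        out = [o + g * y for o, y in zip(out, experts[e](x))]
    return out

# RoPE: the q.k score depends only on the relative offset (5 - 3 == 12 - 10).
q, k_vec = [0.3, -1.2, 0.7, 0.5], [1.0, 0.4, -0.6, 0.9]
print(round(dot(rope(q, 5), rope(k_vec, 3)), 6), round(dot(rope(q, 12), rope(k_vec, 10)), 6))

# MoE: three toy experts, but only two run for this input.
experts = [lambda v: [2 * a for a in v], lambda v: [a + 1 for a in v], lambda v: [0.0 for _ in v]]
router_w = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
print(top_k_route([dot(r, [1.0, 0.5]) for r in router_w]), moe_layer([1.0, 0.5], router_w, experts))
```

The two printed RoPE scores match because rotating the query by angle mθ and the key by nθ leaves a dot product that depends only on n − m. The MoE half shows why compute scales with k rather than with the total number of experts.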
Hands-on experimentation
- ~600 lines of Python replicate GPT-2 behavior.
- Adjustable temperature, top-k, and top-p settings for controlling generation.
- Compare raw GPT-2 output with RLHF-aligned models to see the impact of human feedback.
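The generation controls listed above can be sketched as a single sampling function. This is a generic illustration that assumes raw next-token logits as input. The function name and the order of operations (temperature, then top-k, then top-p) are assumptions modeled on common practice, not taken from the episode's code.

```python
import math
import random

def sample_next_token(logits, temperature=1.0, top_k=None, top_p=None, rng=random):
    """Pick a token index from raw logits after temperature, top-k, and top-p filtering."""
    if temperature <= 0:
        # Treat temperature 0 as greedy decoding: always take the highest logit.
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [x / temperature for x in logits]
    # Candidate token ids, most likely first.
    order = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)
    if top_k is not None:
        order = order[:top_k]
    # Softmax over the surviving candidates (max subtracted for stability).
    m = scaled[order[0]]
    exps = [math.exp(scaled[i] - m) for i in order]
    z = sum(exps)
    probs = [e / z for e in exps]
    if top_p is not None:
        # Nucleus sampling: keep the smallest prefix whose cumulative mass reaches top_p.
        cum, cutoff = 0.0, len(probs)
        for n, p in enumerate(probs):
            cum += p
            if cum >= top_p:
                cutoff = n + 1
                break
        order = order[:cutoff]
        probs = probs[:cutoff]  # random.choices accepts unnormalized weights
    return rng.choices(order, weights=probs, k=1)[0]

logits = [2.0, 1.0, 0.1, -1.0]
rng = random.Random(0)
print(sample_next_token(logits, temperature=0))  # → 0 (greedy)
print([sample_next_token(logits, temperature=0.7, top_k=2, rng=rng) for _ in range(5)])
```

Lower temperatures sharpen the distribution toward the highest logit. Top-k caps the candidate set at a fixed size, while top-p (nucleus sampling) sizes it adaptively by cumulative probability.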
Where this fits in the fast-moving landscape
- Back to basics: Even as 400B+ parameter models emerge, small, interpretable models are where new builders should start.
- Education as leverage: Teams that understand internals make better architectural and product decisions.
- Open ecosystem: This method is fully reproducible without API costs — perfect for learning, teaching, and rapid prototyping.
Enjoy!