Rebuilding GPT-2 in a Spreadsheet — Ishaan Anand on Opening the Black Box

Joe Heitzeberg — AI Tinkerers - "One-Shot"
August 14, 2025

Some people treat transformers as a “black box” you have to trust.
Ishaan Anand takes the opposite approach — cracking them open so anyone can see exactly how they work.

In this episode, Ishaan walks us through his project to rebuild GPT-2 from scratch inside a spreadsheet, showing every matrix, every operation, and every parameter along the way. The result: a way to see attention and transformer mechanics in action — and understand the foundation that powers modern LLMs.

Why this matters

Every modern LLM, from GPT-4 to Claude and from Gemini to LLaMA, builds on the transformer architecture, which was introduced in 2017 and which GPT-2 brought to wide attention in 2019.
If you understand GPT-2, you have the foundation for understanding all of them, and for modifying them for your own needs.
• Transparency pain: Most devs never see what happens between prompt and output.
• Learning pain: Reading papers is slow and running code is abstract, but seeing the math laid out visually makes it click.
• Control pain: Without intuition for how LLMs process tokens, it’s hard to fine-tune, debug, or optimize them.

Ishaan’s approach makes the invisible visible, so you can reason about models instead of guessing.

“If you understand GPT-2, you understand the DNA of every model that came after it.”
– Ishaan Anand

Full GPT-2 in a spreadsheet

  • Every weight, bias, and operation exposed.
  • Visualize tokenization, embedding, and transformer blocks step-by-step.
  • Change inputs and watch attention patterns update in real time.
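To make the attention step concrete, here is a minimal sketch of single-head causal self-attention in plain Python. It is not taken from Ishaan's spreadsheet or code: the function names, the toy embeddings, and the identity projection matrices are all made up for illustration, and real GPT-2 uses learned weights, multiple heads, and many stacked blocks.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def causal_attention(x, w_q, w_k, w_v):
    """Single-head causal self-attention (illustrative sketch).

    x: (tokens x d_model) embeddings.
    w_q, w_k, w_v: (d_model x d_head) projection matrices.
    Returns (output, attention_weights).
    """
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_head = len(q[0])
    weights = []
    for i, q_i in enumerate(q):
        # Causal mask: token i only scores positions 0..i.
        scores = [
            sum(a * b for a, b in zip(q_i, k[j])) / math.sqrt(d_head)
            for j in range(i + 1)
        ]
        # Future positions get exactly zero weight.
        weights.append(softmax(scores) + [0.0] * (len(x) - i - 1))
    return matmul(weights, v), weights

# Toy, made-up inputs: 3 tokens, d_model = d_head = 2.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
out, attn = causal_attention(x, identity, identity, identity)

for row in attn:
    print([round(w, 3) for w in row])
```

Each printed row is one token's attention pattern: it sums to 1 and is zero for every later position. Editing a value in `x` and rerunning changes the rows, which is the same kind of feedback loop the spreadsheet gives you.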

Three innovations since GPT-2

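  • Mixture of Experts: Conditional computation, where only a subset of expert sub-networks runs for each token, for faster, cheaper inference.
  • RoPE (Rotary Position Embeddings): Position encoded by rotating query and key vectors, for better handling of longer contexts.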
  • Training advances: Scaling laws, curriculum learning, and better optimization.
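As a rough illustration of the RoPE item above, the sketch below rotates consecutive pairs of a vector by an angle that depends on the token position. The function names and toy vectors are hypothetical, and the base of 10000 and the consecutive-pair layout are assumptions that follow the common formulation; some implementations pair the first half of the vector with the second half instead.

```python
import math

def rope(vec, position, base=10000.0):
    """Apply a rotary position embedding to one vector (illustrative sketch).

    Each consecutive pair (vec[i], vec[i+1]) is rotated by an angle of
    position * base ** (-i / d), so earlier pairs rotate faster.
    """
    d = len(vec)
    if d % 2 != 0:
        raise ValueError("vector length must be even")
    out = []
    for i in range(0, d, 2):
        theta = position * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        x0, x1 = vec[i], vec[i + 1]
        out.extend([x0 * c - x1 * s, x0 * s + x1 * c])
    return out

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

# Toy, made-up query and key vectors.
q = [1.0, 2.0, 3.0, 4.0]
k = [0.5, -1.0, 2.0, 0.0]

# Same relative offset (3 tokens apart) at two different absolute positions.
score_a = dot(rope(q, 5), rope(k, 2))
score_b = dot(rope(q, 105), rope(k, 102))
print(score_a, score_b)
```

The two scores agree up to floating-point error, because the rotated dot product depends only on the distance between the two positions. That relative-position property is the usual argument for why this scheme behaves better on longer contexts than learned absolute position embeddings of the kind GPT-2 uses.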

Hands-on experimentation

  • ~600 lines of Python replicate GPT-2 behavior.
  • Adjustable temperature, top-k, top-p for controlling generation.
  • Compare raw GPT-2 output with RLHF-aligned models to see the impact of human feedback.
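The three generation controls listed above can be sketched in a few lines. This is a simplified illustration, not the code from the episode: `sample_next_token` and the toy logits are hypothetical, and the top-p cutoff here uses the original probabilities rather than renormalizing after top-k, which some libraries do differently.

```python
import math
import random

def sample_next_token(logits, temperature=1.0, top_k=0, top_p=1.0, rng=random):
    """Sample a token index from raw logits (illustrative sketch).

    temperature: values below 1 sharpen the distribution, above 1 flatten it.
    top_k: keep only the k most likely tokens (0 disables the filter).
    top_p: keep the smallest set of tokens whose cumulative probability
           reaches top_p (1.0 disables the filter).
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")

    # Temperature-scaled softmax, shifted by the max for numerical stability.
    scaled = [value / temperature for value in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Token indices ordered from most to least likely.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)

    if top_k > 0:
        ranked = ranked[:top_k]

    if top_p < 1.0:
        kept, cumulative = [], 0.0
        for i in ranked:
            kept.append(i)
            cumulative += probs[i]
            if cumulative >= top_p:
                break
        ranked = kept

    # random.choices renormalizes the surviving weights for us.
    weights = [probs[i] for i in ranked]
    return rng.choices(ranked, weights=weights, k=1)[0]

# Toy, made-up logits over a 4-token vocabulary.
logits = [2.0, 1.0, 0.1, -1.0]
rng = random.Random(0)
print([sample_next_token(logits, temperature=0.7, top_k=2, rng=rng) for _ in range(10)])
```

With `top_k=1` the sampler always returns the most likely token, which is greedy decoding. Raising the temperature or loosening `top_k` and `top_p` lets less likely tokens through, which is where the variety in generated text comes from.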

Where this fits in the fast-moving landscape

  • Back to basics: Even as 400B+ parameter models emerge, small, interpretable models are where new builders should start.
  • Education as leverage: Teams that understand internals make better architectural and product decisions.
  • Open ecosystem: This method is fully reproducible without API costs — perfect for learning, teaching, and rapid prototyping.

Enjoy!
