Presenting an early outline of SITP at Toronto School of Foundation Modeling Season 1 (November 2025)

Preface

The Structure and Interpretation of Tensor Programs

This book is aspirationally titled The Structure and Interpretation of Tensor Programs (hereafter abbreviated as SITP), as its goal is to serve a similar role for software 2.0 as The Structure and Interpretation of Computer Programs (hereafter abbreviated as SICP) did for software 1.0. Written by Harold Abelson and Gerald Sussman with Julie Sussman, SICP took learners on a whimsical whirlwind tour through the essence of computation, starting with the elements of programs (functional programming, higher-order functions, data abstraction, streams) and ending with programming their own programming languages with interpreters, compilers, and register machines.

My alma mater was among those that took the SICP approach, and as intended, for someone coming into first-year college with high school computer science, it blew my mind. After graduating college in 2022, I followed my curiosity for diving deeper into the souls of our machines by going on to develop industrial languages and runtimes. “There is only one project, architecture, operating system and languages, compiler, it’s only one project. It’s all together.” – Boris Babayan. Particularly, I hacked on domain-specific cloud compilers, cloud provisioners, and cloud garbage collectors. At the end of 2022, though, when OpenAI released ChatGPT, my mind was blown twice more. As someone who had been programming since high school, I could not believe it at all. After two more years of hacking on cloud languages and runtimes, I started my transition from domain-specific cloud compilers to domain-specific tensor compilers.

The transition started with a tweet showcasing the beginnings of a tensor library evaluating the forward pass of a feedforward network from Andrej Karpathy’s Neural Networks: Zero to Hero course. While it was illuminating to start implementing each individual torch call that the nets from makemore were making, my knowledge felt quite fragmented, as I had forgotten a lot of the foundational mathematics I saw in a single semester, and I wasn’t sure how to bridge myself to industrial deep learning systems like tinygrad, torch, jax, vllm, and sglang. Colloquially speaking, I was a neural network script kiddie.

Shortly after, I decided to take the plunge and started drinking from the firehose of all the mathematical foundations I had since forgotten. While revisiting preliminary foundations like Strang (1988), Nocedal, Wright (1999), Boyd, Vandenberghe (2004) and reading the deep learning canon like Russell, Norvig (1995), Sutton, Barto (1992), Hastie, Tibshirani (2001), Goodfellow, Bengio, Courville (2016), Murphy (2022), the one thought I could not get out of my head was: where is the SICP for software 2.0? While I found two excellent resources on building your own torch-like autograd by Tianqi Chen at Carnegie Mellon and Sasha Rush at Cornell, I personally would have really enjoyed a unified resource that took me from math, to deep learning, to deep learning systems in a single unbroken sequence of thought, and perhaps others would feel similarly. That is the genesis story for this book, whose central research question is the following: What should the SICP for Deep Learning look like?

The Structure and Interpretation of the AI Curriculum

goal: teach software 1.0 programmers software 2.0 with software 3.0

constraints

  • sicp style
  • runs nanochat
  • consolidates gpu mode lectures
  • compiles a subset of tinygrad IR

methods

  • curriculum
  • pedagogy
  • language

In retrospect, this work turned out to be the seed of SITP’s core, Part II. Neural Networks, which covers the 2012-2020 “era of research” and consists of two chapters:

So in Part I. Elements of Networks, readers learn the preliminaries for “pre-historic” machine learning:

And in Part III. Scaling Networks, readers learn about the 2020-2025 era of scaling:

Acknowledgements

Thank you to Lambda Labs for the Lambda Labs Research Grant. Thank you to Cloud-V, 10X Engineers, and RISC-V Labs.