Presenting an early outline of SITP at the Toronto School of Foundation Modeling, Season 1 (November 2025)
Preface
The Structure and Interpretation of Tensor Programs
This book is aspirationally titled The Structure and Interpretation of Tensor Programs (abbreviated from here on as SITP), as its goal is to serve a similar role for software 2.0 as The Structure and Interpretation of Computer Programs (abbreviated from here on as SICP) did for software 1.0. Written by Harold Abelson and Gerald Sussman with Julie Sussman, SICP took learners on a whimsical whirlwind tour through the essence of computation. It started with the elements of programs (functional programming, higher-order functions, data abstraction, and streams) and ended with readers programming their own programming languages with interpreters, compilers, and register machines.
My alma mater was among those that took the SICP approach, and as intended, it blew my mind as someone entering first-year college with only high school computer science. After graduating in 2022, I followed my curiosity deeper into the soul of our machines by going on to develop industrial languages and runtimes:
“There is only one project, architecture, operating system and languages, compiler, it’s only one project. It’s all together.” – Boris Babayan
In particular, I hacked on domain-specific cloud compilers, cloud provisioners, and cloud garbage collectors. At the end of 2022, though, when OpenAI released ChatGPT, my mind was blown yet again. As someone who had been programming since high school, I could not believe it. After two more years of hacking on cloud languages and runtimes, I began my transition from domain-specific cloud compilers to domain-specific tensor compilers.
1.5k lines of rust and 100 commits later, we can now inference the FFN neural language model from (Bengio et al. 2003) straight from Karpathy's Zero to Hero. all you have to do is replace the single "import torch" line with "import picograd" 😎 https://t.co/8paCERz3ry pic.twitter.com/iVKOCsg0zC
— Jeffrey Zhang (@j4orz) April 2, 2025
The transition started with a tweet showcasing the beginnings of a tensor library evaluating the forward pass of a feedforward network from Andrej Karpathy’s Neural Networks: Zero to Hero course. While it was illuminating to implement each individual torch call that the nets from makemore were making, my knowledge felt quite fragmented. I had forgotten much of the foundational mathematics I had seen in a single semester, and I wasn’t sure how to bridge myself to industrial deep learning systems like tinygrad, torch, jax, vllm, and sglang. Colloquially speaking, I was a neural network script kiddie.
Shortly after, I decided to take the plunge and started drinking from the firehose of all the mathematical foundations I had since forgotten. I revisited preliminary foundations like Strang (1988), Nocedal and Wright (1999), and Boyd and Vandenberghe (2004). I also read the machine learning canon, such as Russell and Norvig (1995), Sutton and Barto (1998), Hastie, Tibshirani, and Friedman (2001), Goodfellow, Bengio, and Courville (2016), and Murphy (2022). Throughout, the one thought I could not get out of my head was: where is the SICP for software 2.0? I found two excellent resources on building your own torch-like autograd, by Tianqi Chen at Carnegie Mellon and Sasha Rush at Cornell. Still, I personally would have really enjoyed a unified resource that took me from math, to deep learning, to deep learning systems in a single unbroken sequence of thought, and perhaps others would feel similarly. That is the genesis of this book, whose central research question is the following: what should the SICP for deep learning look like?
We really could use a SICP for DL. We have the Little Lisper for DL (https://t.co/su31hFJeUe) but that's a different type of book entirely.
— Shriram Krishnamurthi (primary: Bluesky) (@ShriramKMurthi) May 3, 2026
The Structure and Interpretation of the AI Curriculum
goal: teach software 1.0 programmers software 2.0 with software 3.0
An interesting data point is that Codex 5.5 cannot be trusted to design good data structures purely from behavioral prompting. (I'm sure it can come up with good ideas if you prompt it, but not if it's incidental.)
— difficultyang (@difficultyang) May 15, 2026
This post was prompted by Codex coming up with a terrible internal data representation for an autograd tape with some special checkpointing behavior
— difficultyang (@difficultyang) May 15, 2026
Developing GPT is also highly non-trivial, but being able to develop PyTorch requires knowledge of a lot of math and science: calculus, linear algebra, statistics, optimization theory, neural network architecture, electrical engineering, software design, hardware programming,…
— Sebastian Raschka (@rasbt) November 15, 2025
constraints
- sicp style
- runs nanochat
- consolidates gpu mode lectures
- compiles a subset of tinygrad IR
methods
- curriculum
- pedagogy
- language
In retrospect, this work turned out to be the seed of SITP’s core, Part II. Neural Networks, which covers the 2012-2020 “era of research” and consists of two chapters:
- Chapter 4. Learning Sequences from Data with Deep Neural Networks
- Chapter 5. Accelerating Sequence Models on GPU
In Part I. Elements of Networks, readers learn the preliminaries of “pre-historic” machine learning:
- Chapter 1. Representing Data with High Dimensional Stochasticity
- Chapter 2. Learning Functions from Data with Parameter Estimation
- Chapter 3. Accelerating Functions and Data on CPU
And in Part III. Scaling Networks, readers learn about the 2020-2025 era of scaling:
- Chapter 6. Large Language Models
- Chapter 7. Reasoning Models
- Chapter 8. Fusion Compilers
- Chapter 9. Inference Engines
Acknowledgements
Thank you to Lambda Labs for the Lambda Labs Research Grant. Thank you to Cloud-V and 10X Engineers (RISC-V Labs).