Tower of Babel, Genesis 11:1–9

Prologue

In some sense, the 21st century truly began only some 20 years after the close of the second millennium, for it was not until the creation of ChatGPT that humanity traded in its so-called bicycles of the mind for motorcycle upgrades. From 2020 to 2025, programmers discovered the Scaling Laws (for some popular accounts, see Gwern and Patel, Leech (2025)), whereby pouring internet-scale data into the weights of transformer neural networks with massively parallel and distributed compute produces large language models. In turn, these marvelous machines have finally enabled communication between natural organisms and artificial machines at a higher level of abstraction than before, by means of natural language. It’s not an overstatement to claim that the hottest new programming language is English (a tweet by Karpathy, Jan 24 2023).

Talking to machines with natural language, as opposed to more formal programming languages, has been a long-standing dream in the science of the mind we call artificial intelligence, which is one of the oldest and arguably most important projects of mankind. Although the methods to build an artificial intelligence will take the remainder of this book to exposit, the goal of the discipline can be put simply: building machines that can think.

As with most intellectual ideas, many characteristics of artificial intelligence can be traced back to Aristotle. The philosophical sketches of the field proper, however, arguably started with Descartes, were extended by La Mettrie’s Man a Machine, were given a formal program by Leibniz’s Universal Calculus, and were applied computationally with Wittgenstein’s Tractatus Logico-Philosophicus and Philosophical Investigations.

The story of artificial intelligence is tightly interconnected with computation, given that the field as we know it today started in earnest in the 20th century at the 1956 Dartmouth Summer Research Project on Artificial Intelligence. There, a group of researchers interested in the science of the mind went on to establish the field of “artificial intelligence”: a rebranding of the discipline (tainted by the hermeneuticism of psychoanalysis) with computational methods (operationalizing the notion of computationalism) rather than the correlative methods of neuroscience (see Could a Neuroscientist Understand a Microprocessor? (Jonas, Kording 2017)) and the observational methods of psychology’s behaviorism. Their reasoning (roughly and reductively) consists of the following:

Since Gödel’s Incompleteness Theorems state that any consistent formal system expressive enough for arithmetic contains propositions it cannot prove, and the Church-Turing Thesis (which SITP’s spiritual predecessor SICP smuggles in through a footnote in section 4.1, The Metacircular Evaluator) states that all general-purpose languages implementable with computation have the same expressivity, if we ever want to physically realize non-biological artificial intelligence, then the constructive, stateful mathematics of Turing Machines, implemented with von Neumann architectures via electricity and semiconductors, is the correct substrate on which to conduct the science of the mind, as opposed to classical stateless mathematics.

Although united by the idea of using computation to mechanize the mind and thus serve as the basis of artificial intelligence, they were divided over which exact computational techniques to employ. The two prominent camps at the time (others include embodimentalists, dynamicalists, and self-organizers, which arguably are the next frontier of the discipline, i.e. robotics and self-reproducing robotics) were the symbolists and the connectionists, which in some sense are the primordial “software 1.0” and “software 2.0” we know today. In fact, this distinction had already been explored by our mathematical cousins, when physicists transitioned from using the deterministic logic of Aristotle to the stochastic one of Laplace. More technically, reasoning with Bayes’ Theorem generalizes Aristotelian logic, which is recovered as the corner case in which the probability of some belief given the evidence is 1.
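
The corner-case claim can be made concrete with a minimal Python sketch. The function name `posterior` and the example numbers are illustrative assumptions, not taken from the text. When the evidence cannot occur unless the hypothesis holds, i.e. P(E | not H) = 0, Bayes’ Theorem returns a posterior of exactly 1 for any nonzero prior, which is the certainty of a deductive inference; softer likelihoods give graded belief instead.

```python
def posterior(prior, lik_h, lik_not_h):
    """Bayes' Theorem for a binary hypothesis H and observed evidence E.

    prior     = P(H)
    lik_h     = P(E | H)
    lik_not_h = P(E | not H)
    returns   P(H | E)
    """
    # P(E) by the law of total probability.
    evidence = lik_h * prior + lik_not_h * (1.0 - prior)
    return lik_h * prior / evidence


# Stochastic case: E is merely more likely under H than under not-H,
# so observing E raises belief in H without making it certain.
graded = posterior(prior=0.5, lik_h=0.9, lik_not_h=0.1)

# Deterministic corner case: E cannot occur unless H is true,
# so observing E forces belief in H to 1 whatever the (nonzero) prior.
certain = posterior(prior=0.3, lik_h=0.8, lik_not_h=0.0)

print(graded, certain)
```

Changing the prior in the second call leaves `certain` at 1, while changing it in the first call moves `graded`; that contrast is what separates the deterministic corner case from the stochastic one.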

As the field played out, the logical approach to artificial intelligence with symbolic techniques started out as the favored school of thought (thus known as “classical” AI or “good-old-fashioned” AI, GOFAI) as opposed to the probabilistic approach, a preference largely summarized by, and arguably due to, the 1969 book Perceptrons by Marvin Minsky and Seymour Papert. The logical approach used tools from logicians (like Frege, Tarski, Brouwer, Gentzen, Curry, Howard, Martin-Löf, Girard, and so on) to create rule-based programs such as ELIZA and, later, expert systems. Over time, however, largely due to the continually increasing capability of hardware, the probabilistic approach with machine learning techniques started to see more success, and to empirically validate the idea that GOFAI techniques do not describe large combinatorial phenomena well. (todo: wittgenstein’s tractatus vs PI predicted GOFAI not being able to represent phenomena with too many parts to count)

Ironically enough, although the people who gathered under the umbrella of “connectionists” were united in the idea of using probabilistic learning methods (following the same principle of modeling phenomena with mathematical models in which the system minimizes free energy, a principle that saw great success in the 19th century when physicists such as Helmholtz, Gibbs, and Boltzmann modeled energy, enthalpy, and entropy), they themselves were divided over which exact models $y = f_{θ}(x)$ to employ. The three prominent ones were Gaussian processes, kernel machines, and neural networks. Theoretically, (kernel machines todo), but practically, neural networks, when made deep, are able to learn representations. The true watershed moment came in 2012, when a neural network named AlexNet, trained on parallel graphics processors, was released, scoring a top-5 error on the ImageNet dataset more than 10.8 percentage points lower than that of the runner-up.
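
As a minimal sketch of what choosing a model $y = f_{θ}(x)$ and learning its parameters means, the following pure-Python example fits the simplest parametric family, a line, by gradient descent on mean squared error. The names `f` and `fit`, the learning rate, the step count, and the toy data are all illustrative assumptions; a neural network replaces the line with a far richer family of functions.

```python
def f(theta, x):
    """The model y = f_theta(x); here theta = (w, b) parameterizes a line."""
    w, b = theta
    return w * x + b


def fit(xs, ys, lr=0.05, steps=2000):
    """Learn theta by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Residuals of the current model on every data point.
        errs = [f((w, b), x) - y for x, y in zip(xs, ys)]
        # Gradients of mean((f(x) - y)^2) with respect to w and b.
        grad_w = sum(2.0 * e * x for e, x in zip(errs, xs)) / n
        grad_b = sum(2.0 * e for e in errs) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Toy data generated by y = 2x + 1 (an assumption for illustration).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
theta = fit(xs, ys)
print(theta)
```

The learned `theta` approaches the slope and intercept that generated the data; the data alone, rather than hand-written rules, determines the final function.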

The 2012-2019 period in deep learning is now becoming known as the era of research, as diverse inductive biases were explored by means of network architectures, resulting in different neural networks such as feedforward neural networks, convolutional neural networks, recurrent neural networks, long short-term memory neural networks, and so on, up until scaling the attention mechanism and feedforward nets inside the transformer architecture started gaining dominance in the 2020-2025 period, now known as the era of scaling, which brings us back to the present day.


The Structure and Interpretation of Tensor Programs
