This article is not intended just for those in the tech world. It is written for anyone who wants to learn more about what’s behind the curtains of the technology driving humanity forward. Share it with your kids, your grandma, maybe even your dog.
Perhaps you have heard the phrase “software is eating the world,” coined in 2011 by Marc Andreessen, co-creator of the first widely used graphical web browser, co-founder of one of the first cloud computing companies, and one of the most successful venture capitalists of the 21st century. It refers to the idea that some of the most important business practices and technologies of the 19th and 20th centuries have been replaced with or improved by software.
Why is that? Well, it comes down to two main things: cost and speed. If you’ve ever seen the movie Hidden Figures, you’ll know that humans landed on the moon thanks to calculations done by hand by hundreds of people. Software performs those same calculations virtually, thousands of times faster than any human ever could. Instead of paying all those people salaries for years, one person can download the code that performs those calculations from the internet, for free, and run it in minutes on a computer that costs half as much as a typewriter did in 1961. Crazy, right?
And then the internet made sending information easier and faster than ever. Imagine if Paul Revere’s midnight ride had been just a text, an entire conversation delivered in seconds:
Paul Revere: “Hey”
Samuel Adams: “Wassup”
PR: “The British are coming”
SA: “Where?”
PR: “They’re on their way to seize our weapons in Concord. They’re traveling through Lexington.”
SA: “Ok, I’ll text the others in the militia leadership groupchat.”
Perhaps history would be a little less lively, perhaps not. Similarly, instead of searching through a library, flipping through the Yellow Pages, or making dozens of phone calls, a simple Google search on any topic takes seconds and returns millions of pages of easily accessible, free information.
The core idea of software is that it doesn’t fundamentally change what gets done; it makes the world a better place by making existing processes dramatically more efficient.
I’m sure you’ve seen quite a bit of buzz recently about artificial intelligence (AI). Some have even said that software ate the world, and now AI is going to eat software. But what is it? How is it used? Is it free? How does it work? Why is everyone so excited about it? To answer those, let’s first dig into the basics of AI.
- AI has been an academic field for almost 70 years, but only in the past ~15 years has it received widespread attention for its commercial applications.
- Its gist is that computers can simulate being “intelligent.” The nuances of that word can get quite hairy, but the idea is that AIs can do human-like things without human input, just code.
Machine Learning (ML) is the study of programs that can automatically improve their performance on a given task. ML programs are called models, and an agent is anything (human or machine) that perceives the world and takes action. A model’s architecture is the way it is structured and built. The process of creating a useful model is called training. Keep these terms in mind for later. Nearly all AI today uses ML models, which come in a few main forms:
- Unsupervised Learning: Give it data and it finds trends without any human input.
- Supervised Learning: A human first classifies data into several categories, then the model makes predictions about new, similar data that isn’t classified (there’s a short code sketch of this right after this list).
- Reinforcement Learning: A model is “rewarded” for “good” responses and “punished” for “bad” ones.
- Deep Learning: This is a subset of ML that uses biologically inspired “neurons” (instead of more traditional statistical methods) to perform the tasks above. These models are called artificial neural networks. Other commonly used model types include decision trees, support vector machines, regression analysis, Bayesian networks, and Gaussian processes.
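To make the Supervised Learning idea a bit more concrete, here is a minimal sketch in Python. It assumes the scikit-learn library; the dataset (measurements of iris flowers that botanists have already labeled by species) and the model (a decision tree) are just illustrative choices.

```python
# A minimal supervised-learning sketch; the dataset and classifier are
# illustrative choices, not the only (or best) options.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Flower measurements that humans have already labeled with the correct
# species -- that labeling is the "supervision."
X, y = load_iris(return_X_y=True)

# Hold some labeled examples back so we can test the model on data it hasn't seen.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# "Training": the decision tree learns rules that map measurements to labels.
model = DecisionTreeClassifier().fit(X_train, y_train)

print(model.predict(X_test[:5]))   # predicted species for five unseen flowers
print(model.score(X_test, y_test)) # fraction it gets right on unseen data
```

From spam filters to medical imaging, supervised learning follows this same train-then-predict rhythm; only the data and the models get bigger.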
AI research (and applications) does not have any one focus. Here are a few of the most common branches, which you may have heard of:
- Natural Language Processing: This is the type of AI you are likely most familiar with (such as ChatGPT by OpenAI). NLP allows a computer to read, write, and communicate in human languages. Its key strength is extracting not just the grammatical rules of writing, but the meaning of words, sentences, and entire bodies of text (see the short sketch after this list).
- Perception: This is the ability of machines to analyze data from sensors (cameras, radar, lidar, microphones, etc.) to deduce aspects of the real world. Computer vision is the subset of this which deals with visual input.
- General Intelligence: Artificial General Intelligence (AGI) is the north star of many AI organizations. It’s the point at which a machine can learn and solve the same range of problems a human can.
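Circling back to the NLP bullet, here is a hedged sketch of what “extracting meaning” looks like in practice, assuming Python and the Hugging Face transformers library (the pre-trained model it downloads by default is just an example, and your exact output will vary).

```python
# Assumes the "transformers" library is installed; the default pre-trained
# model it fetches is just an example choice.
from transformers import pipeline

# Downloads a small pre-trained English sentiment model the first time it runs.
classifier = pipeline("sentiment-analysis")

# The model isn't matching keywords; it has learned what the words mean together.
print(classifier("This article makes AI surprisingly easy to understand!"))
# something like: [{'label': 'POSITIVE', 'score': 0.99...}]
```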
This is all pretty neat!… and abstract. There’s one more important thing to know, and it’s equally thorny. In 2017, researchers at Google published a paper called “Attention Is All You Need,” which introduced a model architecture called the Transformer, now the basis for most modern ML models. The Transformer brought several key advancements: compared to its predecessors, it is much more efficient, much faster, produces better results, and, most importantly, is very easily scalable. That last part is particularly useful because it allows very large models to be trained, such as OpenAI’s GPT-3, 3.5, and 4.
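If you’re curious what that architecture actually boils down to, here is a tiny sketch of the Transformer’s central operation, scaled dot-product attention, written in plain Python with NumPy. The shapes and values are made up, and real models add learned parameters and stack thousands of these layers, but the arithmetic is the same.

```python
import numpy as np

def attention(Q, K, V):
    # Each word's "query" is scored against every other word's "key"...
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    # ...the scores are turned into weights that sum to 1 (a softmax)...
    weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)
    # ...and each word's output is a weighted blend of all the "values."
    return weights @ V

# Four words, each represented as a vector of 8 numbers (random stand-ins;
# a real Transformer would first multiply these by learned matrices).
words = np.random.rand(4, 8)
print(attention(words, words, words).shape)   # -> (4, 8): one updated vector per word
```

Notice that nearly everything in there is matrix multiplication, math that splits into many independent pieces and therefore scales across lots of hardware.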
However, that scalability comes at a price. To understand why, we have to go into the hardware that runs these models.
Computers today are made of a variety of different components, but two are particularly important: CPUs (Central Processing Units) and GPUs (Graphics Processing Units).
CPUs are the main processors used to execute instructions (i.e. run programs) on a computer. CPUs of the kind used today have been around since the 1950s. They are designed to execute tasks sequentially, since the next “step” in a program often requires the output from the previous one. Newer CPUs can do this extremely quickly, and take advantage of multiple “cores” (which are more or less subdivisions of the processor) to improve speed even more. The base-model M3 chip in the 2023 MacBook Pro has 8 CPU cores, for example.
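Here’s a quick, hypothetical illustration in Python of what “sequential” means: compound interest on a made-up savings account, where each year’s result feeds the next.

```python
# Inherently sequential work, the kind CPUs are built for: each step needs the
# previous step's answer, so the steps cannot all run at once. (Hypothetical
# account: $100 growing at 5% per year.)
balance = 100.0
for year in range(10):
    balance = balance * 1.05   # this year's balance depends on last year's
print(round(balance, 2))       # 162.89
```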
By contrast, GPUs are much newer and execute instructions differently. Originally designed to render graphics and accelerate image processing for programs like video games, they don’t work the way CPUs do: instead of performing sequential (serial) tasks incredibly quickly, they perform many simultaneous (parallel) tasks at once. Graphics need this because, well, there are a lot of pixels on screens today: a 120Hz 4K monitor displays about a billion pixels every second! Less widely known is that GPUs are also particularly useful for Machine Learning. The reason is that training and running models boils down to enormous numbers of calculations that don’t depend on each other’s results, so they can all be run at the same time, in parallel.
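To get a feel for why this matters, here’s a small experiment you can run in Python with NumPy. NumPy runs on the CPU, so it only hints at the effect; a GPU takes the same “do it all at once” idea much further. The sizes are arbitrary.

```python
import time
import numpy as np

a = np.random.rand(1000, 1000)
b = np.random.rand(1000, 1000)

# One at a time: walk through every row/column pair in order
# (about a billion multiplications, paced by a serial Python loop).
start = time.time()
slow = np.zeros((1000, 1000))
for i in range(1000):
    for j in range(1000):
        slow[i, j] = a[i, :] @ b[:, j]
print("one at a time:", round(time.time() - start, 2), "seconds")

# All at once: hand the whole job to optimized, parallel-friendly matrix math.
start = time.time()
fast = a @ b
print("all at once:  ", round(time.time() - start, 4), "seconds")
```

Same answer both ways; the only difference is how much of the work is allowed to happen simultaneously.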
In Machine Learning, smaller models can be trained on a single computer, because the amount of computation and memory they need is manageable. With larger models, however, no single GPU is powerful enough to train the model at any useful speed.
Most machine learning isn’t actually done on a single computer, but remotely via the cloud, with the hardware running in data centers (because it’s usually cheaper to rent equipment by the minute than to buy it all outright). Because of this, it’s not actually all that important for GPUs to look or function like the kind you may see in gaming computers. In a data center, you’ll see aisle after aisle of thousands of GPUs packed right next to each other, instead of each sitting inside its own dedicated computer. Unconstrained by size requirements, GPUs designed for data centers have physically grown larger and larger to pack in as many transistors as possible (generally, more transistors = more power). But they can only get so big and so powerful: the machines that make microchips can only handle silicon wafers up to a certain size and can only make transistors down to a certain size. As a result, per-GPU performance is limited.
To compensate for this, NVIDIA, the world’s leading AI chipmaker, has created two important technologies: NVLink and NVSwitch. NVLink is a technology that allows GPUs to communicate much faster with CPUs (i.e. allows a GPU to receive instructions really fast). However, it’s not the real star of the show. NVSwitch, introduced in 2018, is a technology that allows GPUs to communicate directly with each other. Think of it as a really fancy internet cable. This means that instead of having 30 GPUs running separately, they can run as if they were one mega-chip. This key technology has enabled programmers to train gigantic ML models.
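What does “running as one mega-chip” look like from a programmer’s chair? Here’s a hedged sketch assuming Python with the PyTorch library on a machine that has more than one GPU; the model and batch sizes are made-up stand-ins, and PyTorch’s DataParallel is just one simple way to spread work out. The fast GPU-to-GPU communication underneath it is exactly what interconnects like NVLink and NVSwitch provide.

```python
import torch
import torch.nn as nn

# A small stand-in model; real large models have billions of parameters.
model = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU(), nn.Linear(4096, 10))

if torch.cuda.device_count() > 1:
    # Split each batch of data across every visible GPU; the GPUs exchange
    # results over whatever interconnect they share (PCIe, or NVLink/NVSwitch
    # in data-center machines), behaving roughly like one bigger chip.
    model = nn.DataParallel(model)

device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)

# A fake batch of 256 examples, just to show the model running end to end.
batch = torch.randn(256, 1024, device=device)
print(model(batch).shape)   # -> torch.Size([256, 10])
```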