This February I decided to take a month off from Fiber in order to catch up with the AI wave. This is a log of what I read and watched, with notes that might help other engineers doing the same.
I didn't follow any existing curriculum. I knew that I wanted to invest one month and cover as much as I could across the entire LLM pipeline, from pre-training to optimization. [1] I ended up spending my time roughly like so:
It might help to start with what I knew about AI before February.
From the top: my first exposure to AI was back in high school, when I was part of a study group for AIMA. We jumped around parts of the book for a semester and covered a bunch of search algorithms and a bit of neural networks. We even saw the basics of RL, IIRC.
The summer after freshman year of college, I took Andrew Ng's Coursera course on Deep Learning. The course was just starting to become famous at the time. I then watched some of Ng's CS229 lectures (this old version here). These two courses gave me a good introduction to both traditional machine learning (eg. SVMs) and the hot-new-thing of the time, deep learning.
That summer I also implemented a minimalistic backpropagation library in C++, which I recently rediscovered: https://github.com/felipap/nndl-cpp. (I'm not sure it works?)
Later in college, I took a proper Intro to AI course, which focused heavily on NLP. I didn't see much appeal in NLP back then. I was underwhelmed by what language models seemed capable of doing. And I would've never expected "small" modifications to those models to start sounding intelligent.
In 2018, I discovered Kaggle and got hooked on predictive analytics. I spent a lot of time playing with XGBoost and doing feature engineering with pandas. It felt magical to train models to find patterns in data and learn to predict the future.
I graduated college in May 2020 and decided to go work with AI. I started a little data science shop to sell predictive solutions to retailers, usually e-commerce companies trying to understand which of their customers would churn. I ran that company for 9 months, and learned a lot about the best practices and traps of using machine learning in production.
These experiences helped me follow the conversation around AI these past couple of years, but my understanding was still superficial. I felt the strongest need to catch up with everything I had missed since 2020.
I kicked off the first week by watching Karpathy's video "Deep Dive into LLMs like ChatGPT", which had just come out a couple days earlier. This is a longer, updated version of another introduction he did in late 2023, which I decided to watch second.
I like to start diving into new subjects with a view from the top. Knowing the landscape of a field helps me connect the dots later, when I'm deep into the material. Karpathy's deep dives are perfect for this. In just a couple hours, he walks us through the entire pipeline of coding and training a conversational LLM.
Another great introduction to AI is a series by 3Blue1Brown called Deep Learning:
The first video of this series came out in 2017, years before the LLM boom. Back then, "Deep Learning" was the go-to term for the AI frontier. I had already seen most of these videos over the years but I wanted a refresher, so I rewatched them after Karpathy's deep dives.
I highly recommend Chapters 6 and 7, which explain the mechanisms behind the attention head and the MLP block within the transformer. Chapter 6 was what finally gave me an intuition for how QKV works. In fact, I kept coming back to these videos for the rest of the month.
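If it helps, here's the QKV mechanic from Chapter 6 boiled down to a toy, single-head PyTorch sketch (my own code, with made-up sizes):

```python
import torch
import torch.nn.functional as F

T, d_model, d_head = 8, 32, 16          # sequence length, embedding size, head size
x = torch.randn(T, d_model)             # one sequence of token embeddings

W_q = torch.randn(d_model, d_head)      # learned projection matrices in a real model
W_k = torch.randn(d_model, d_head)
W_v = torch.randn(d_model, d_head)

Q, K, V = x @ W_q, x @ W_k, x @ W_v     # queries, keys, values: (T, d_head)
scores = Q @ K.T / d_head ** 0.5        # how relevant each token is to every other
mask = torch.tril(torch.ones(T, T))     # causal mask: no peeking at future tokens
scores = scores.masked_fill(mask == 0, float('-inf'))
weights = F.softmax(scores, dim=-1)     # each row sums to 1
out = weights @ V                       # weighted sum of values: (T, d_head)
```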
After the intros, I moved on to Zero To Hero. This is Karpathy's course on building and training a GPT-2 model from scratch.
Each video runs 2-3 hours, but some took me a full day of work between the lesson and the exercises. I watched most videos twice: first to follow along every step, then a second time to solidify what I had learned and check my understanding.
This is probably the best course on LLMs on YouTube, and the best way to use it is to code along with Karpathy and do the exercises as they come. I used a Google Colab notebook and forced myself to type every line of code, instead of copying from the GitHub repo.
In terms of prerequisites, you'll need a basic grasp of linear algebra and neural networks. I found Karpathy moves very fast through the basics. Even for backpropagation, which is the topic of the first video, you'll be much better off having seen it implemented before.
My takeaway from part 3 is that deep neural networks rarely learn well on their own. They need a lot of supervision and steering. Even with good data and an optimal architecture, a lot can go wrong during training and cause deep networks to stop learning (example). Researchers have found different solutions to this problem, such as batch norm, ELUs, Kaiming initialization etc, which are covered in this lecture.
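In PyTorch terms, two of the most common fixes look roughly like this (a sketch of my own, not code from the lecture):

```python
import torch
import torch.nn as nn

# Kaiming initialization scales weights to the layer's fan-in so activations
# don't shrink or explode as the network gets deeper.
layer = nn.Linear(512, 512)
nn.init.kaiming_normal_(layer.weight, nonlinearity='relu')

# BatchNorm re-normalizes activations between layers during training.
block = nn.Sequential(layer, nn.BatchNorm1d(512), nn.ReLU())

x = torch.randn(64, 512)     # a batch of 64 examples
print(block(x).std())        # activations stay reasonably scaled
```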
Part 4 returns to backpropagation and we're asked to do the backward pass operations by hand. Most of the lesson was exercises, which took me a long time to get through.
I found Karpathy's explanation of the backprop code a bit shallow, so I decided to derive the math myself. I spent a long time trying to retrofit what I remembered from matrix calculus to rigorously arrive at Karpathy's solutions, but I couldn't make it work.
Much of the difficulty of part 4 stems from "broadcasting": the rules for how operations work between tensors of different shapes. I eventually learned, with the help of ChatGPT, that broadcasting isn't covered by matrix calculus at all. It's just an implementation detail (of numpy, PyTorch etc).
This realization was a big unlock for me. It means that most of the operations on the high-dimensional tensors are still element-wise or matrix-wise operations. The extra dimensions are there only to help us train on multiple nodes and examples at the same time. So the code isn't as complicated as it may look.
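Here's the kind of thing that tripped me up, as a toy NumPy example of my own:

```python
import numpy as np

x = np.random.randn(4, 3)   # 4 examples, 3 features
b = np.random.randn(3)      # one bias per feature
y = x + b                   # broadcasting: b is implicitly stretched across the 4 rows

# Consequence for backprop: because every row reused b in the forward pass,
# the gradient flowing back to b has to be summed over the broadcast dimension.
dy = np.ones_like(y)        # pretend upstream gradient
db = dy.sum(axis=0)         # shape (3,), matching b
```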
The last video focuses on training efficiency and model performance. Karpathy shows us how to take the previous GPT model and train it to be as good as GPT-2. He uses several optimization techniques and leaves the model training on real GPUs for several hours. I wasn't in the mood to wait that long, so I just watched this one.
After Zero To Hero, I was curious to see how far I could get with the GPT papers from OpenAI. First I read Attention Is All You Need, the 2017 paper that introduced the transformer:
Then I read the GPT-1 paper:
At this point I looked for resources to supplement my understanding of residual networks. I was having a hard time understanding why they work. This is the best video I found:
Professor Bryce: Residual Networks and Skip Connections (DL 15)
Then I read GPT-2:
then GPT-3, which is a scaled-up version of GPT-2:
This was it for week one. It lasted 9 days, from a Friday to Sunday. I covered much of what I wanted to learn about the pre-training phase. On the last day I was able to open a Google Colab and rewrite 95% of the GPT code from memory, which felt great. ✌️
Next I sought to understand the post-training phase. I was feeling cocky after blowing past the GPT papers, so I jumped straight to an important RL paper: Fine-Tuning Language Models from Human Preferences. This was the first work published by OpenAI that applied Reinforcement Learning from Human Feedback (RLHF) to improve a language model (eg. to make it safe):
This paper really stumped me. I tried hard to read it but didn't have the necessary foundation. So I had to change course. [3]
First I backtracked to an earlier paper, which applied RLHF to Atari games. I also tried reading the paper that describes PPO, the algorithm most commonly used by OpenAI for RLHF. I spent the next couple of days bouncing between these three papers.
Proximal Policy Optimization (PPO) was designed at OpenAI in 2017 and eventually became the industry standard for applying RLHF to language models (at least until DeepSeek's GRPO came along). The best source I found on PPO was Spinning Up. Part 3 was particularly helpful for understanding the math.
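The heart of PPO is the clipped surrogate objective. Here's that one equation as a PyTorch sketch (my own naming; negate it if you want a loss to minimize):

```python
import torch

def ppo_clip_objective(logp_new, logp_old, advantages, eps=0.2):
    """Clipped surrogate objective from the PPO paper (to be maximized)."""
    ratio = torch.exp(logp_new - logp_old)          # pi_new(a|s) / pi_old(a|s)
    clipped = torch.clamp(ratio, 1 - eps, 1 + eps)  # keep the update close to the old policy
    return torch.min(ratio * advantages, clipped * advantages).mean()
```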
The rest of the learning came from YouTube. It's impressive how much high-quality material on AI you can find there! The clearest and most complete video I found on PPO was the following, from Umar Jamil. He goes over many of the equations from Spinning Up:
At this point, I still couldn't make complete sense of the 2019 RLHF paper. I felt like the paper glossed over all the important implementation details. So I turned to code for help.
Code for the paper had been published to the OpenAI GitHub, and I decided to use it to fill in the gaps. I spent the better part of a day just trying to get the code running on Google Colab, with little luck. The main issue was that it had been written in TensorFlow 1.0, which isn't compatible with the modern versions of Python supported by Colab. I tried running it locally but also ran into issues.
Compared to PyTorch, TF 1.0 feels hard to write and nearly impossible to read. The worst thing about it (and the cause for all the hate) is that functions exist only to build computational graphs. That means you can't debug TF 1.0 using print statements, because the real work doesn't happen when the function executes.
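A toy example of the pattern, written against the compat API so it runs on modern TensorFlow:

```python
import tensorflow.compat.v1 as tf
tf.disable_eager_execution()

x = tf.placeholder(tf.float32, shape=[None, 3])   # a symbolic input node
y = x * 2                                         # adds a node to the graph; computes nothing yet
print(y)                                          # prints a Tensor description, not numbers

with tf.Session() as sess:                        # the graph only runs inside a session
    print(sess.run(y, feed_dict={x: [[1.0, 2.0, 3.0]]}))   # now you get [[2. 4. 6.]]
```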
I tried painstakingly updating the repo code to use newer versions of TensorFlow, but it was too complicated. I hit too many issues I didn't understand, so I gave up.
What finally helped me understand the 2019 RLHF paper was this article by the Hugging Face team:
Another incredible resource is the TRL implementation of PPOTrainer:
Some other videos that helped me learn RL, from the dozens I tried:
At one point I was very confused about why, in PPO, the policy and value models tend to be two heads over the same network. I couldn't find an answer anywhere except this Reddit post.
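For what it's worth, "two heads over the same network" just means something like this (an illustrative PyTorch sketch, not code from any paper):

```python
import torch.nn as nn

class ActorCritic(nn.Module):
    """Shared trunk with separate policy and value heads."""
    def __init__(self, obs_dim, n_actions, hidden=128):
        super().__init__()
        self.trunk = nn.Sequential(                      # features computed once, used by both heads
            nn.Linear(obs_dim, hidden), nn.Tanh(),
        )
        self.policy_head = nn.Linear(hidden, n_actions)  # logits over actions
        self.value_head = nn.Linear(hidden, 1)           # scalar state-value estimate

    def forward(self, obs):
        h = self.trunk(obs)
        return self.policy_head(h), self.value_head(h)
```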
My studies so far had unlocked other important papers on fine-tuning LLMs, including instructGPT, which is a version of GPT-3 fine-tuned for helpfulness/harmlessness:
and a paper on RLHF for summarization tasks:
Next I read Constitutional AI, perhaps my favorite paper I read this entire month:
Finally, I decided to learn about Direct Preference Optimization (DPO). I read somewhere that DPO was gaining ground as an alternative to PPO because it was easier to implement and train.
But I skipped most of the paper.

Note I left on the DPO PDF
Again on DPO, Umar Jamil's channel was very helpful:
I wrapped up the week with this CS229 guest lecture by Yann Dubois, which has some good explainers on everything I had seen so far. It also introduced me to a lot of what I'd see in week 3; I highly recommend it.
I like the moment at 1:27:30 when a student asks "why didn't we start with PPO instead of going straight to DPO?" and the answer is "idk".
Two weeks in, I wanted to keep pushing toward state-of-the-art models. First I tried reading the GPT-4 technical report but didn't like it much. The paper is heavy on benchmarks and light on implementation details. So I pivoted to open-source models, starting with Llama.
The Llama 1 and 2 papers were published in 2023. I noticed a clear shift in their focus compared to GPT-3. The concern with viability ("can we train LLMs to be generally intelligent?") gave way to a concern with efficiency ("how can we train the smartest model with the least amount of data/compute?"). This is clear from all the tweaks that Llama 1, and especially Llama 2, made to the original transformer, and to the post-training of instructGPT. As I read the papers, I took time to study these modifications, as I list below:
Llama 2 introduces a technique called "Ghost Attention" (GAtt), which was used during post-training to help models remember instructions over lengthy dialogues. The paper's explanation of GAtt is unnecessarily short and left me with many questions. But I found very little information online. The only article I can recommend is: LLaMA-2 from the Ground Up.
Another change I studied from Llama 2 was replacing the masked Multi-Head Attention (MHA) of GPT with the more efficient Grouped-Query Attention (GQA). In GQA, each attention head shares key and value vectors with at least one other head. This reduces the memory requirements and makes it possible to train larger models. (Or did it make it easier to shard compute? I don't remember.)
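Mechanically, the sharing boils down to something like this (a rough PyTorch sketch of one attention layer, not the Llama code):

```python
import torch

n_heads, n_kv_heads, T, d_head = 8, 2, 16, 64   # 8 query heads share 2 key/value heads
q = torch.randn(n_heads, T, d_head)
k = torch.randn(n_kv_heads, T, d_head)          # far fewer K/V projections to compute and cache
v = torch.randn(n_kv_heads, T, d_head)

# Each group of 4 query heads reuses the same K/V head.
k = k.repeat_interleave(n_heads // n_kv_heads, dim=0)   # back to (n_heads, T, d_head)
v = v.repeat_interleave(n_heads // n_kv_heads, dim=0)

scores = q @ k.transpose(-2, -1) / d_head ** 0.5
out = torch.softmax(scores, dim=-1) @ v
# With n_kv_heads == n_heads this is just regular multi-head attention.
```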
There is also MQA, a more extreme version of GQA where keys and values are shared across all heads. I read the papers for both:
I liked this explainer on MHA vs GQA vs MQA:
I also studied context extension. The Llama authors wanted to extend the context window for inference without degrading the model's performance (see needle in a haystack). One solution they used was Rotary Position Embeddings (RoPE), which make it easier for the attention heads to use tokens' positions relative to one another, rather than relying on absolute positions.
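To get a feel for it, here's a stripped-down sketch of the rotation (my own code; the exact channel-pairing convention varies between implementations):

```python
import torch

def rope(x, base=10000.0):
    """Rotate pairs of channels of x (shape: seq_len, d) by position-dependent angles."""
    T, d = x.shape
    pos = torch.arange(T, dtype=torch.float32).unsqueeze(1)            # (T, 1)
    freqs = base ** (-torch.arange(0, d, 2, dtype=torch.float32) / d)  # (d/2,) frequencies
    angles = pos * freqs                                               # angle grows with position
    x1, x2 = x[:, :d // 2], x[:, d // 2:]                              # channel pairs to rotate
    return torch.cat([x1 * angles.cos() - x2 * angles.sin(),
                      x1 * angles.sin() + x2 * angles.cos()], dim=-1)

# Because queries and keys are each rotated by their own position, their dot
# product ends up depending only on the *difference* between positions.
```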
I skimmed the RoPE paper below but decided that the math would take too long to understand. So I turned to YouTube instead. Links are below.
I also read this earlier paper on encoding relative positions:
Overall, the Llama 2 paper is long but worth the read. They made a bunch of other changes which I didn't have time to go into. I recommend this survey video by Umar Jamil:
The Llama papers introduced me to scaling laws, which I had heard of but never really understood. I decided to read two seminal papers:
This Wikipedia page is also great:
Scaling laws are more about the economics of AI than the computer science of AI. They explain the billion-dollar bets that tech companies were making in 2022-2024. They gave CEOs the confidence to spend on compute, "knowing" that they'd see returns to intelligence. Glad I took some time to dig into this.
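For the curious, the scaling law from the Chinchilla paper models the final loss as a simple function of parameter count N and training tokens D, roughly:

```latex
L(N, D) \approx E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}
```

E is the irreducible loss of the data, and the two power-law terms shrink as you add parameters or tokens, which is what lets labs trade model size against data for a fixed compute budget.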
Next, I read two papers by the Mistral team: Mistral 7B and Mixtral of Experts.
Mixtral popularized the use of "mixture of experts" ensembles inside of LLMs, which we now see in many state-of-the-art models.
The Llama 3 paper came out in July 2024 and continued the march towards efficiency and better post-training techniques. (The paper has a whopping 200+ contributors, and hundreds of others get partial credit. You can really feel the weight of a trillion-dollar company behind it.)
Sections 3.3 and 6 cover infrastructure for training and inference, which were entirely new to me. They sent me down a rabbit hole of GPUs, CUDA, optimization etc. But my descent down the hole was a bit disorganized.
Back in the first week, I had spent some time watching George Hotz (geohot) live-code the early versions of tinygrad, which is like a smaller version of PyTorch. In those videos he wrote some kernels for matrix operations using OpenCL, which I hadn't seen before. That was my first exposure to GPU programming.
Now in week three, I went back to CUDA to learn more about optimizing LLMs for training and inference. This is probably one of the first videos I watched, and where I'd also recommend you to start:
This blog post on PyTorch internals was praised by lots of people online:
The best way to learn about how efficient matrix multiplications are implemented in GPUs is this blog post by Simon Boehm. His entire blog is worth a read but I'd start here:
I wanted some hands-on experience with CUDA to solidify my understanding. I used a Google Colab notebook with PyCUDA to implement the algorithms from Boehm's post. I also used PyCUDA on Colab to try out different kernels for image processing. (The conventions and nomenclature of GPU programming made a lot more sense once I applied them to a graphics problem.)

Using CUDA to turn Stanley Tucci black and white lol
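The grayscale experiment was roughly this shape (reconstructed from memory with PyCUDA, so treat it as a sketch):

```python
import numpy as np
import pycuda.autoinit                  # initializes a CUDA context
import pycuda.driver as cuda
from pycuda.compiler import SourceModule

# One thread per pixel: average the three RGB channels.
mod = SourceModule("""
__global__ void to_gray(unsigned char *rgb, unsigned char *gray, int n_pixels) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n_pixels) {
        gray[i] = (rgb[3*i] + rgb[3*i + 1] + rgb[3*i + 2]) / 3;
    }
}
""")
to_gray = mod.get_function("to_gray")

rgb = np.random.randint(0, 256, (512, 512, 3), dtype=np.uint8)  # stand-in for the photo
gray = np.empty((512, 512), dtype=np.uint8)
n = np.int32(rgb.shape[0] * rgb.shape[1])

threads = 256
blocks = (int(n) + threads - 1) // threads
to_gray(cuda.In(rgb), cuda.Out(gray), n, block=(threads, 1, 1), grid=(blocks, 1))
```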
I bounced between a bunch of other videos. The next great resource I found was the Ultrascale Playbook, which had just come out days earlier.
Here's a walkthrough with one of the authors, Nouamane Tazi:
Week three ended with lots of open threads to dig into. I was running out of time.
At some point in the first couple of weeks I set myself the target of being able to read — and understand — the recent DeepSeek r1 paper by the end of the month. r1 was released mid-January and for a couple weeks it was all anyone talked about on Twitter. So that's where I went next.
r1 is a conversational model like ChatGPT, but built on DeepSeek v3, a base model released in late 2024. I decided to read both papers, starting with r1:
r1 is cool but v3 is even cooler. The v3 paper goes into meticulous detail about model architecture, training infrastructure, optimization etc. It's a level of transparency I never saw in the papers from American labs.
One of the key contributions from DeepSeek was replacing PPO with a new RL algorithm called Group Relative Policy Optimization (GRPO). DeepSeek published GRPO a year ago, in a paper that I decided to skip. But I watched parts of this explainer by Yannic:
In the v3 paper, I had a hard time with the topic of quantization. I thought I understood FP8 after reading the Llama 3 paper, but then I noticed some gaps in my knowledge. I watched a bunch of videos, but this one from Umar Jamil is the best introduction I found:
I watched a bunch of other breakdowns of r1 and v3, including:
The rest of the week was a mishmash of lots of different topics. I had a list of outstanding questions I wanted to answer before the month was over. Things like: What is LoRA? What is prefix tuning? How does Flash Attention v2 work? I set off to tackle as many of these as I could.
Here's some of what I read and watched:
For months I kept hearing about inference-time compute but never really understood what it meant. So I decided to read the original papers on chain-of-thought:
I also read the original RAG paper. (Did you know it does backpropagation through to the embedding model?)
I really enjoyed this interview with Tim Dettmers, who's a big name in open-source and optimization:
And to close the month, one of the last videos I watched was this talk by Karpathy, explaining his latest project llm.c:
I'm very happy with how the month turned out. I was able to cover a lot in these four weeks. I have a much deeper understanding of AI today compared to when I started.
I had been meaning to take some time to catch up with everything happening in AI for a while. I wish I had done this last year but that would've been hard given my responsibilities at Fiber. The product wasn't stable enough to leave it running in the background for existing customers. I was spending a lot of time building.
I don't think I could've learned all of this part-time. There's just too much to learn. I estimate I spent between 80 and 100 hours per week on this project, reading papers, watching lessons, writing code etc. It required my undivided attention.
I used a massive Notion document to keep track of everything I did this month. I logged every piece of content I went through, and kept a queue of links to go through later. It would've been impossible to write this post without that doc.
Of course, there's still so much that I didn't have time to cover. There are about 28 papers and 22 other links that I was hoping to get to. And there are whole topics within AI that I had to skip. For example, interpretability seems like a fascinating area of research and I wish I'd had time to study it. But that will have to wait; it's time to get back to work.
I considered using this, which is supposed to be a list of papers that Ilya Sutskever recommended to John Carmack in 2023 when asked how to understand the state-of-the-art in AI. There are multiple versions of this list online but I couldn't verify the authenticity of any of them.
Another learning principle: start with the easy content. It took me years to learn this. As a teenager, I'd tackle a new subject by finding the most exhaustive resource—usually a thick textbook—and reading it cover to cover. If I got stuck, I'd simply reread the chapters. I thought this was rigorous. It was, in fact, the slowest way to learn.
Nowadays, I begin with an ELI5 explanation, then an 'ELI10,' and gradually dig deeper. If I get stuck, I switch resources.
For example, to learn Scala today, I'd watch Fireship's Scala in 100 Seconds. Then I'd find a good 15-30 minute introduction, followed by a hands-on tutorial. My younger self would have jumped straight into an 11-hour course and felt bad for not finishing it.