Show HN: I built a deep learning engine from scratch in Python
29 points | 2 days ago | 1 comment | github.com
I’ve spent the last few months building a deep learning engine completely from scratch in Python, using only the standard-library math and random modules.

What started as a basic linear algebra calculator project grew into a symbolic tensor system with autodiff, custom matrix ops, attention mechanisms, LayerNorm, GELU, and even a text generation demo trained on the Brown corpus.
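
As a rough illustration of what the "only math and random" constraint looks like in practice, here is a sketch of GELU and LayerNorm written against plain Python lists. This is just an illustrative example, not the engine's actual code or API:

    import math

    def gelu(x):
        # tanh approximation of GELU, as used in GPT-style models
        return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))

    def layer_norm(xs, eps=1e-5):
        # normalize a list of floats to zero mean and unit variance
        mean = sum(xs) / len(xs)
        var = sum((v - mean) ** 2 for v in xs) / len(xs)
        return [(v - mean) / math.sqrt(var + eps) for v in xs]

    print(gelu(1.0))                         # ~0.841
    print(layer_norm([1.0, 2.0, 3.0, 4.0]))  # ~[-1.34, -0.45, 0.45, 1.34]

The real version works on tensors and tracks gradients; only the forward math is shown here.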

I'm still an undergrad, so my main goal is to deeply understand how deep learning actually works under the hood - gradients, attention, backpropagation, optimizers - by building it step-by-step with full visibility into everything, and without relying on big frameworks or libraries.

It’s not fast or production-ready, but that’s not the point. Right now it’s aimed at exploration and understanding: I wanted to learn how deep learning actually works by building it up from first principles.

It’s still a work in progress (lots to learn and improve in terms of structure, docs, and performance), but I figured it was worth sharing.

I’d love any feedback, questions, ideas, or even just thoughts about what you’d add, change, or do differently. Thanks for reading!

benliong78
1 day ago
That’s fantastic. In your opinion, what are some of the best books/resources you used to build this kind of understanding of LLMs and the underlying deep learning algorithms?
gmwhitebox_dev
15 hours ago
Thank you!

To approach it from first principles, I didn’t follow any one specific tutorial or course; I just started from the bottom up. Starting with my Tensor module, I familiarized myself with the math operations, backprop, computation graphs, and autodiff. That was honestly the hardest part, but it set the foundation for everything else in the system.
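
To give a feel for what that bottom-up part involves, here is a minimal micrograd-style scalar autodiff sketch. The names and structure are purely illustrative, not my actual Tensor module:

    class Value:
        # minimal scalar autodiff node: stores data, grad, and a backward rule
        def __init__(self, data, parents=()):
            self.data = data
            self.grad = 0.0
            self._parents = parents
            self._backward = lambda: None

        def __add__(self, other):
            other = other if isinstance(other, Value) else Value(other)
            out = Value(self.data + other.data, (self, other))
            def _backward():
                self.grad += out.grad
                other.grad += out.grad
            out._backward = _backward
            return out

        def __mul__(self, other):
            other = other if isinstance(other, Value) else Value(other)
            out = Value(self.data * other.data, (self, other))
            def _backward():
                self.grad += other.data * out.grad
                other.grad += self.data * out.grad
            out._backward = _backward
            return out

        def backward(self):
            # topologically sort the computation graph, then apply the chain rule
            topo, visited = [], set()
            def build(v):
                if v not in visited:
                    visited.add(v)
                    for p in v._parents:
                        build(p)
                    topo.append(v)
            build(self)
            self.grad = 1.0
            for v in reversed(topo):
                v._backward()

    # d(x*y + x)/dx = y + 1 = 4,  d(x*y + x)/dy = x = 2
    x, y = Value(2.0), Value(3.0)
    z = x * y + x
    z.backward()
    print(x.grad, y.grad)  # 4.0 2.0

The engine’s Tensor module applies the same idea at the matrix level.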

Once I had that working, the rest (activations, loss functions, optimizers, layers, transformers) started to make a lot more sense. Writing it all myself gave me full control, removed the abstraction, and helped me to really internalize how each part of the system fits together, and why it works - not just how.
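
For example, once gradients flow, an optimizer is just a small update rule on top. Here is a bare-bones SGD sketch on a toy one-weight problem (the names and setup are hypothetical, not the project’s optimizer API):

    import random

    def sgd_step(params, grads, lr=0.01):
        # plain stochastic gradient descent: p <- p - lr * dL/dp
        return [p - lr * g for p, g in zip(params, grads)]

    # toy problem: fit y = 2x with a single weight w and squared-error loss
    random.seed(0)
    w = random.uniform(-1.0, 1.0)
    for step in range(200):
        x = random.uniform(-1.0, 1.0)
        y_true = 2.0 * x
        y_pred = w * x
        grad_w = 2.0 * (y_pred - y_true) * x   # dL/dw for L = (y_pred - y_true)^2
        (w,) = sgd_step([w], [grad_w], lr=0.1)
    print(round(w, 3))  # converges to ~2.0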

Here are some resources I found helpful; I also link to additional resources in the project README:

- Deep Learning: Foundations and Concepts, a book by Christopher Bishop. Mainly covers theory and statistical ML.

- Natural Language Processing with Transformers: Book by Lewis Tunstall, Leandro von Werra, and Thomas Wolf (Hugging Face). Good for understanding real-world NLP and LLMs.

- UvA Deep Learning Tutorials: Website for building and understanding DL modules, has a lot of project-based notebooks (tutorial 7 on GNNs was very helpful).

- Deep Learning: Book by Goodfellow, Bengio & Courville. Covers a lot of foundational theory and math.

- Stanford’s CS231n course: This is fully available online, with lecture videos and coding walk-throughs. Super helpful for learning about backprop, CNNs, deep nets, etc.

- The Annotated Transformer: Website by Harvard NLP. This goes over the ‘Attention Is All You Need’ paper in an understandable way with example code.

- codingvidya.com: A good aggregator for finding ML books and learning resources.

- Andrej Karpathy’s YouTube channel and his micrograd repo are super helpful resources (tinygrad, George Hotz’s project in the same spirit, is also worth a look), especially when learning by building from scratch.

I’m still learning as I go, but I’m happy to share what’s worked for me so far! Hope this helps!
