Show HN: Tiny-vLLM – high performance LLM inference engine in C++ and CUDA
84 points
6 hours ago
| 6 comments
| github.com
| HN
yu3zhou4
5 hours ago
Author here. In my opinion the README is the most interesting part: I wrote it to help others build a useful mental model, so that you can recreate the project yourself without even needing to read my code.
juancn
4 hours ago
Looks interesting, it reminds me of the first llama.cpp, but better documented.
nazgulsenpai
5 hours ago
I love the documentation formatted in lessons. I can't wait to read through it.
dwa3592
3 hours ago
Very nice job on the README.

>>Physically, LLM is a file which contains a lot of float numbers.

aka atoms of the LLM.
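
To make the quoted line concrete, here is a minimal C++ sketch, assuming the simplest possible layout: a headerless file of raw 32-bit floats in native byte order. That layout and the helper names `write_raw_floats` and `read_raw_floats` are assumptions for illustration, not Tiny-vLLM's actual format; real checkpoint formats typically add metadata describing tensor names, shapes, and dtypes.

```cpp
#include <cstddef>
#include <fstream>
#include <ios>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: dump floats as raw bytes (no header, native byte order).
void write_raw_floats(const std::string& path, const std::vector<float>& values) {
    std::ofstream out(path, std::ios::binary);
    if (!out) throw std::runtime_error("cannot open for writing: " + path);
    out.write(reinterpret_cast<const char*>(values.data()),
              static_cast<std::streamsize>(values.size() * sizeof(float)));
    if (!out) throw std::runtime_error("write failed: " + path);
}

// Hypothetical helper: read the whole file back as one flat array of floats.
std::vector<float> read_raw_floats(const std::string& path) {
    // Open at the end so tellg() reports the file size in bytes.
    std::ifstream in(path, std::ios::binary | std::ios::ate);
    if (!in) throw std::runtime_error("cannot open for reading: " + path);

    const std::streamoff bytes = in.tellg();
    if (bytes < 0 || static_cast<std::size_t>(bytes) % sizeof(float) != 0) {
        throw std::runtime_error("size is not a multiple of sizeof(float): " + path);
    }

    std::vector<float> values(static_cast<std::size_t>(bytes) / sizeof(float));
    in.seekg(0, std::ios::beg);
    in.read(reinterpret_cast<char*>(values.data()),
            static_cast<std::streamsize>(bytes));
    if (!in) throw std::runtime_error("read failed: " + path);
    return values;
}
```

Writing `{0.5f, -1.25f, 3.0f}` with `write_raw_floats` and reading the file back with `read_raw_floats` should return the same three values. Everything beyond that, such as which floats belong to which layer, is bookkeeping layered on top of the raw numbers.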

cyanydeez
3 hours ago
The universe is just atomic if statements.
cookiengineer
3 hours ago
Wanted to add that the author has an amazing blog with lots of interesting papers: https://jedrzej.maczan.pl/
einpoklum
3 hours ago
It seems the author believes checking the return values of CUDA API calls is not "tiny" enough :-(