FilterHN

Show HN: Tiny-vLLM – high performance LLM inference engine in C++ and CUDA

84 points

by yu3zhou4

6 hours ago

| 6 comments

| github.com

yu3zhou4

5 hours ago

The README is, in my opinion (author here), the most interesting part. I wrote it to help others build a useful mental model, so that you can recreate the project yourself without even needing to read my code.

juancn

4 hours ago

Looks interesting. It reminds me of the first llama.cpp, but better documented.

nazgulsenpai

5 hours ago

I love the documentation formatted in lessons. I can't wait to read through it.

dwa3592

3 hours ago

Very nice job on the README.

>>Physically, LLM is a file which contains a lot of float numbers.

aka atoms of the LLM.
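
To make the quoted line concrete, here is a minimal C++ sketch of treating a weights file as nothing more than a flat run of 32-bit floats. The headerless layout and the helper names `write_floats` and `read_floats` are assumptions for illustration only; real model formats usually carry metadata as well, and this is not taken from the Tiny-vLLM code.

```cpp
#include <cstddef>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Write raw 32-bit floats to a binary file.
// Hypothetical layout: no header, just the values back to back.
void write_floats(const std::string& path, const std::vector<float>& values) {
    std::ofstream out(path, std::ios::binary);
    if (!out) throw std::runtime_error("cannot open " + path + " for writing");
    out.write(reinterpret_cast<const char*>(values.data()),
              static_cast<std::streamsize>(values.size() * sizeof(float)));
    if (!out) throw std::runtime_error("write failed: " + path);
}

// Read the whole file back and reinterpret its bytes as floats.
std::vector<float> read_floats(const std::string& path) {
    // Open at the end so tellg() reports the file size in bytes.
    std::ifstream in(path, std::ios::binary | std::ios::ate);
    if (!in) throw std::runtime_error("cannot open " + path + " for reading");

    const std::streamoff end = in.tellg();
    if (end < 0) throw std::runtime_error("cannot determine size of " + path);

    const auto bytes = static_cast<std::size_t>(end);
    if (bytes % sizeof(float) != 0)
        throw std::runtime_error("file size is not a multiple of sizeof(float)");

    std::vector<float> values(bytes / sizeof(float));
    in.seekg(0);
    in.read(reinterpret_cast<char*>(values.data()),
            static_cast<std::streamsize>(bytes));
    if (!in) throw std::runtime_error("read failed: " + path);
    return values;
}
```

Writing three values with `write_floats` and loading them with `read_floats` on the same machine gives back the same three floats, which is all "a file of float numbers" means at this level.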

cyanydeez

3 hours ago

the universe is just atomic if statements

cookiengineer

3 hours ago

Wanted to add that the author has an amazing blog with lots of interesting papers: https://jedrzej.maczan.pl/

einpoklum

3 hours ago

It seems the author believes checking the return values of CUDA API calls is not "tiny" enough :-(
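
For readers unfamiliar with the pattern being asked for, a rough C++ sketch follows. It is not the project's code, and it deliberately avoids the CUDA headers so it compiles anywhere: `Status`, `status_string`, `fake_alloc`, and `CHECK` are hypothetical stand-ins. In real CUDA code the status type would be `cudaError_t`, compared against `cudaSuccess`, with `cudaGetErrorString` supplying the message.

```cpp
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a C-style status code such as cudaError_t.
enum class Status { Ok, OutOfMemory };

// Hypothetical stand-in for a function that describes a status code.
const char* status_string(Status s) {
    switch (s) {
        case Status::Ok:          return "ok";
        case Status::OutOfMemory: return "out of memory";
    }
    return "unknown";
}

// Turn a non-Ok status into an exception that records the call site.
inline void check_status(Status s, const char* expr, const char* file, int line) {
    if (s == Status::Ok) return;
    std::ostringstream msg;
    msg << file << ':' << line << ": " << expr << " failed: " << status_string(s);
    throw std::runtime_error(msg.str());
}

// Wrap each status-returning call so the check costs one token at the call site.
#define CHECK(call) check_status((call), #call, __FILE__, __LINE__)

// Hypothetical API that fails for large requests, used only to exercise CHECK.
Status fake_alloc(std::size_t bytes) {
    return bytes > 1024 ? Status::OutOfMemory : Status::Ok;
}
```

With this in place, `CHECK(fake_alloc(16));` passes silently, while `CHECK(fake_alloc(4096));` throws a `std::runtime_error` whose message names the file, line, failing expression, and status text.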