Show HN: VLM Inference Engine in Rust
1 point
1 hour ago
| 1 comment
| mixpeek.com
storystarling
21 minutes ago
What hardware are you running this on to get 2-3s latency? A 14GB model plus KV cache seems like it would require a 24GB card (3090/4090) to avoid swapping. I've found that once you spill over to system RAM on consumer gear the performance usually falls off a cliff.
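The arithmetic behind this question can be sketched with the usual KV-cache sizing formula. This is a back-of-envelope illustration only, not figures from the linked project: it assumes a Llama-7B-like shape (32 layers, 32 KV heads, head dim 128, fp16), which is roughly what a 14 GB fp16 checkpoint implies.

```rust
/// Bytes of KV cache: 2 tensors (K and V) per layer, per token.
fn kv_cache_bytes(layers: u64, kv_heads: u64, head_dim: u64, bytes_per_elem: u64, seq_len: u64) -> u64 {
    2 * layers * kv_heads * head_dim * bytes_per_elem * seq_len
}

fn main() {
    // Assumed model shape (Llama-2-7B-like), fp16 weights ~14 GB.
    let weights_gb = 14.0;
    let kv = kv_cache_bytes(32, 32, 128, 2, 4096);
    let kv_gb = kv as f64 / (1024.0 * 1024.0 * 1024.0);
    println!("KV cache at 4096 ctx: {:.1} GB", kv_gb); // 2.0 GB
    println!("Weights + KV cache:   {:.1} GB", weights_gb + kv_gb); // 16.0 GB
}
```

At 4096 tokens of context that is already ~16 GB before activations and framework overhead, which is why a 24 GB card (3090/4090) is the comfortable consumer fit and why spilling past VRAM into system RAM over PCIe hurts so much.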