Q8 KV cache lets a 30B model fit 100K context on a 24 GB RTX 5090
2 points | 1 hour ago | 0 comments | buraak.com