FilterHN
High-Fidelity KV Cache Summarization Using Entropy and Low-Rank Reconstruction
14 points by jchandra 2 days ago | 1 comment | jchandra.com
vivahir215 2 days ago
Interesting approach. Curious about the latency tradeoff: OLS + SVD are much heavier than Top-K. Have you benchmarked end-to-end inference latency?