FilterHN

High-Fidelity KV Cache Summarization Using Entropy and Low-Rank Reconstruction

14 points | 2 days ago | 1 comment | HN

2 days ago

Interesting approach. Curious about the latency tradeoff: OLS + SVD are much heavier than Top-K. Have you benchmarked end-to-end inference latency?