16 points | 2 hours ago | 4 comments | HN
dgellow | 1 hour ago
please at least don't write your HN message with an LLM... or at the very, very minimum, make it look like something a human would want to read...
esafak | 1 hour ago
Please tell us about the chunking algorithm itself. The flowchart says you have three paths: structured, fixed, and hybrid. What happens in each one? Can you access the library from Rust?
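For readers unfamiliar with the terms, the three paths in the flowchart plausibly correspond to common chunking strategies. A minimal sketch, assuming "structured" splits on document structure, "fixed" uses fixed-size windows, and "hybrid" falls back from one to the other; all function names here are illustrative, not the library's actual API:

```python
# Illustrative sketch of three common chunking strategies; the actual
# library's behavior may differ (names here are hypothetical).

def fixed_chunks(text: str, size: int = 200) -> list[str]:
    # Fixed path: split into windows of at most `size` characters.
    return [text[i:i + size] for i in range(0, len(text), size)]

def structured_chunks(text: str) -> list[str]:
    # Structured path: split on blank lines (paragraph boundaries).
    return [p.strip() for p in text.split("\n\n") if p.strip()]

def hybrid_chunks(text: str, size: int = 200) -> list[str]:
    # Hybrid path: structure first, then a fixed-size fallback for
    # any paragraph that is still too large.
    out: list[str] = []
    for para in structured_chunks(text):
        if len(para) <= size:
            out.append(para)
        else:
            out.extend(fixed_chunks(para, size))
    return out
```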
OutOfHere | 1 hour ago
That's great if it works, but I have always been extremely skeptical of relying on chunks as a means of retrieving information. Chunking misses the surrounding context and the nuanced conditions that qualify a chunk's usage. I believe it is better to put the entire document in a large context instead, then have the LLM summarize its relevance into an accurate title and blurb. When retrieving, filter and match over the titles and blurbs, then again give the LLM the entire text.
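The whole-document approach described above can be sketched roughly as follows; `llm_summarize` and the keyword filter are stand-ins for a real LLM call and a real ranking step:

```python
# Sketch of whole-document retrieval: index an LLM-written title/blurb
# per document, match queries against those, then hand the full text of
# the matches back to the LLM (no chunking, so context is preserved).
# `llm_summarize` is a placeholder, not a real LLM call.

def llm_summarize(doc: str) -> tuple[str, str]:
    # Placeholder: a real system would ask the LLM for a title + blurb.
    first_line = doc.strip().splitlines()[0]
    return first_line[:60], doc.strip()[:200]

def build_index(docs: dict[str, str]) -> dict[str, tuple[str, str]]:
    return {doc_id: llm_summarize(text) for doc_id, text in docs.items()}

def retrieve(query: str, docs: dict[str, str],
             index: dict[str, tuple[str, str]]) -> list[str]:
    # Naive keyword filter over titles and blurbs; return full document
    # texts, not chunks, so the LLM sees all surrounding context.
    q = query.lower()
    hits = [doc_id for doc_id, (title, blurb) in index.items()
            if q in title.lower() or q in blurb.lower()]
    return [docs[doc_id] for doc_id in hits]
```

In a production system the keyword match would typically be replaced by embedding similarity or BM25 over the titles and blurbs.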
DetroitThrow | 1 hour ago
>Production-ready — 17 versions shipped, 315+ installs

In no universe would this mean production ready. That's just the bot traffic any package would get.

Brutal feedback from someone building RAG systems: nobody wants to use slop, and the commit history is 6 initial commits with thousands of LOC, followed mostly by README updates.

Also, I get junk output from a very simple PDF. How did you verify this project's claimed capabilities?
