Happy to answer any questions or take suggestions on how I can improve it!
My site, https://www.emergentmind.com, is similar, though I'm two years in :)
I've found Gemini 2.5 Flash is the best model in terms of speed/cost/quality. Pro is great as well, but probably not necessary for most chat-with-paper functionality.
I'll also add that building an AI layer on top of arXiv is a deep, deep rabbit hole, depending on how far you want to take the project. Drop me a note if you want to chat more about my experience with it.
Regardless, thanks for sharing this!
Emergent Mind, my tool, has been in the works for over two years. If that's the interface you're referring to, thank you.
Asxiv, the subject of this post, was built in a day by the OP.
EDIT: Another thought: maybe the output could also support Markdown/LaTeX rendering, like ChatGPT does.
It would be good if you added some sort of protection against these techniques. I think feeding the model images of the pages, instead of the page code itself, would help.