I've been experimenting heavily with large-context multimodal LLMs (like Gemini 3 Pro) for coding tasks. I noticed that feeding raw text files consumes a massive number of tokens and often clutters the context window.
Inspired by recent research (like the DeepSeek-OCR paper) suggesting visual encoders can be more efficient than text tokenizers for structured data, I built pixrep.
It’s a CLI tool that converts your codebase into a structured, syntax-highlighted PDF hierarchy.
Key features:

- Token Efficiency: In my local benchmark, a repo requiring ~31k text tokens compressed to ~19k tokens when fed as a visual PDF (using the `onepdf` mode).
- Semantic Minimap: Uses Python's `ast` module (and regex for other languages) to generate a UML/call-graph minimap at the top of each file.
- Linter Heatmap: Can run `ruff` or `eslint` and overlay a heatmap on the PDF to visually warn the LLM about risky lines.
- OnePDF Mode: Packs the core code into a single, ASCII-optimized PDF file for single-shot uploading.
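To give a rough idea of the minimap feature: here's a minimal sketch of the kind of `ast`-based summary it produces. This is illustrative only (the function names and output format are mine, not pixrep's actual internals):

```python
import ast
import textwrap

def minimap(source: str) -> list[str]:
    """Summarize classes/functions and the names they call,
    sketching a text version of a 'semantic minimap'."""
    tree = ast.parse(source)
    lines = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # Collect simple-name calls made inside this function
            calls = sorted({
                n.func.id
                for n in ast.walk(node)
                if isinstance(n, ast.Call) and isinstance(n.func, ast.Name)
            })
            lines.append(f"def {node.name}() -> calls: {', '.join(calls) or 'none'}")
        elif isinstance(node, ast.ClassDef):
            lines.append(f"class {node.name}")
    return lines

example = textwrap.dedent("""
    class Parser:
        def parse(self):
            tokenize()
            build_tree()
""")
print("\n".join(minimap(example)))
# prints:
# class Parser
# def parse() -> calls: build_tree, tokenize
```

The real tool renders this as a visual graph at the top of each file's page rather than plain text.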
It's written in Python and uses ReportLab for PDF generation.
I'd love to hear your thoughts on "Visual RAG" or any feedback on the implementation!