You're going to have to manage that. One thing people are starting to do is provide "search" or "context gathering" tools (subagents/agent-as-tool) so the main agent's context stays clean and only the actually relevant resources from the repo get pulled in.
Anthropic had a good post that covered this very issue. They demonstrated it with MCP servers at around 20k tokens each on average, but the same idea and the same mitigation apply generally.
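To make the "search tool" idea concrete, here is a rough sketch of what such a tool could hand back to the main agent: a handful of matching lines with their locations, not whole files. The function name `gather_context`, its parameters, and the suffix filter are all made up for illustration; a real setup would more likely wrap a subagent or a proper code search rather than a plain substring scan.

```python
from pathlib import Path


def gather_context(repo_root, query, max_hits=5, suffixes=(".py", ".md")):
    """Return up to max_hits 'path:line: text' snippets that mention query.

    Hypothetical helper: only these short snippets reach the main agent,
    so the full file contents never enter its context.
    """
    root = Path(repo_root)
    needle = query.lower()
    hits = []
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        try:
            lines = path.read_text(encoding="utf-8").splitlines()
        except (UnicodeDecodeError, OSError):
            continue  # skip binary or unreadable files
        for lineno, line in enumerate(lines, start=1):
            if needle in line.lower():
                rel = path.relative_to(root).as_posix()
                hits.append(f"{rel}:{lineno}: {line.strip()}")
                if len(hits) >= max_hits:
                    return hits  # cap the result so the reply stays small
    return hits
```

The cap on `max_hits` is the important part: whatever does the searching can read as much as it likes, but what comes back to the main conversation is bounded.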
Even before you get to this, you can provide a list or table in your instruction files that says where to find important files, types, and subsystems. Give them a table of contents instead of the whole book; they can (mostly) go find the relevant content themselves.
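As a rough illustration of the table-of-contents idea, this small sketch renders a list of entries as a Markdown table you could paste into an instruction file. The helper name `render_toc` and every path and description in the example are invented placeholders, not taken from any real repo.

```python
def render_toc(entries):
    """Render (topic, path, note) rows as a Markdown table of contents."""
    lines = ["| What | Where | Notes |", "| --- | --- | --- |"]
    for topic, path, note in entries:
        lines.append(f"| {topic} | `{path}` | {note} |")
    return "\n".join(lines)


# Hypothetical entries; replace with the real layout of your repo.
EXAMPLE_ENTRIES = [
    ("HTTP handlers", "src/api/", "one module per resource"),
    ("Shared types", "src/types.py", "dataclasses used across subsystems"),
    ("DB migrations", "migrations/", "read before changing a schema"),
]

if __name__ == "__main__":
    print(render_toc(EXAMPLE_ENTRIES))
```

A few rows like this cost very few tokens compared with loading the files themselves, and they tell the agent where to look when it actually needs something.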