I also have my gripes about the way 2-hop reasoning is presented here, with figure 3 being the canonical example of what I would consider too trivial/misleading (the exact text match of "Eric Watts" appearing in both the question and the context). It leads to the natural question of how the model compares to an LLM with a grep tool.
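To make the grep comparison concrete: when the question's entity appears verbatim in the context, a few lines of string search already retrieve the relevant passages, no attention required. A minimal sketch (the `grep_baseline` helper and toy context are hypothetical, mine, not from the paper):

```python
import re

def grep_baseline(question_entity: str, context: str, window: int = 80):
    """Return a snippet around every verbatim occurrence of the entity."""
    hits = []
    for m in re.finditer(re.escape(question_entity), context):
        start = max(0, m.start() - window)
        hits.append(context[start:m.end() + window])
    return hits

# Toy context standing in for a long document.
context = (
    "... unrelated filler ... Eric Watts joined the lab in 2014. "
    "Eric Watts later moved to the robotics team. ... more filler ..."
)
snippets = grep_baseline("Eric Watts", context)
print(len(snippets))  # both mentions found by exact match alone
```

Any benchmark question this trick can answer is arguably testing retrieval, not reasoning.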
What I would consider more interesting is practical synthesis over such a large context, where you can't just string-lookup answers. For example, maybe dumping all of Intel's x86 manuals into context and then asking an LLM to try to write assembly, or something like that.
The more we can drive towards selective attention over larger and larger sets of "working memory", the better, I think.
I suspect cleverer mechanisms of context injection/pruning/updating will yield effective memory more than endlessly increasing the context window will, regardless of what tricks we apply to distill attention over it.
There is probably a lot of low hanging fruit in this area.