By delegating to sub agents (eg for brainstorming or review), you can break out of local maxima while not using quite as many more tokens.
Additionally, when doing any sort of complex task, I do research -> plan -> implement -> review, clearing context after each stage. In that case, would I want to make 7x research docs, 7x plans, etc.? probably not. Instead, a more prudent use of tokens might be to have Claude do research+planning, and have Codex do a review of that plan prior to implementation.
And how does one compare the results in a way that is easy to parse? 7 models producing 1 PR each is one way, but it doesn’t feel very easy to compare such.