I think the intuition lots of people jumped to early, that "specs are the new code", was always correct, but at the same time it was absolutely nuts to think specs can be represented well with natural language and bullet lists in markdown. We need a chain-of-spec that leverages something semi-formal and then iterates on that representation, probably with feedback from other layers. Natural language provides constraints, and guess-and-check code generation operates at the implementation level, but neither is actually the specification, which is the heart of the issue. A good intermediate language will probably end up being something pretty familiar that leverages and/or combines known formal methods: model checkers, logic, games, discrete simulations, graphs, UML, etc. Why? It's very hard to beat this stuff for compression, and compression is what all the "context compaction" work is really groping toward anyway. See also the wisdom about "programming is theory building" and so on.
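To make "semi-formal and compressible" concrete, here's a minimal sketch, purely illustrative, of a spec written as an explicit transition system that can be exhaustively checked. The turnstile domain, state names, and invariant are all my own toy assumptions, not anything from a real system:

```python
# Hedged sketch: a "semi-formal spec" as an explicit transition system
# that a model checker (here, a trivial BFS) can exhaustively verify.
# A few lines like this pin down behavior that would take paragraphs
# of markdown bullets to describe, and worse, to check.

from collections import deque

# Toy spec for a turnstile: (state, event) -> next state.
SPEC = {
    ("locked", "coin"): "unlocked",
    ("locked", "push"): "locked",
    ("unlocked", "push"): "locked",
    ("unlocked", "coin"): "unlocked",
}

def reachable(start="locked"):
    """Enumerate every state reachable under the spec."""
    seen, frontier = {start}, deque([start])
    while frontier:
        state = frontier.popleft()
        for (src, _event), nxt in SPEC.items():
            if src == state and nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return seen

# Invariant check: no undefined "jammed" state is ever reachable.
assert "jammed" not in reachable()
```

The point isn't this particular encoding; it's that a representation like this is both machine-checkable and far denser than prose.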
I think if/when something like that starts getting really useful, you probably won't hear much about it, and there won't be a lot of talk about the success of hybrid LLM+symbolic systems. Industry giants would have a huge vested interest in keeping a useful intermediate representation/language as secret sauce. Why? Well, they can keep pretending they're doing something semi-magical with scale and sufficiently deep chain-of-thought, and bill for the extra tokens. That would tend to preserve the appearance of a big-data and big-compute moat for training and inference even as it gradually dries up.
https://www.reddit.com/r/ChatGPT/comments/14sqcg8/anyone_els...
If you compare the output from a CoT prompt against a control prompt, the output will contain the reasoning steps either way for the current generation of models.
Afaik PaLM (Google's OG big models) tried this trick, but it didn't work for them. I think it's because PAL used descriptive inline comments plus meaningful variable names. Compare the following:
```python
# calculate the remaining apples
apples_left = apples_bought - apples_eaten
```
vs.
```python
x = y - z
```
We have ablations in https://arxiv.org/abs/2211.10435 showing that both are indeed useful (see "Crafting prompts for PAL").
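To show what that buys you end to end, here's a hedged sketch of the PAL-style loop: the model emits a short program whose descriptive names and comments carry the reasoning, and the final answer comes from executing the code rather than from free-text generation. The "generated" snippet below is illustrative, not an actual model output:

```python
# Hedged sketch of the PAL idea: run the model's emitted program
# and read the answer out of its variables, instead of trusting
# the model to do the arithmetic in text.

generated_program = """
# Q: I bought 23 apples and ate 7. How many are left?
apples_bought = 23
apples_eaten = 7
# calculate the remaining apples
apples_left = apples_bought - apples_eaten
"""

namespace = {}
exec(generated_program, namespace)  # execute the "reasoning"
print(namespace["apples_left"])  # -> 16
```

The descriptive names and comments aren't decoration here; per the ablations above, they are part of what makes the generated program likely to be correct in the first place.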
See "Programmatic Tool Calling"
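For anyone who hasn't seen it, the gist of programmatic tool calling is that instead of emitting one JSON tool call per step, the model writes a small program that composes tool functions directly, so loops and comparisons over intermediate results happen in code rather than costing a model round trip each. A minimal sketch with hypothetical tool names (nothing here is a real API):

```python
# Hedged sketch of programmatic tool calling. The tool functions are
# hypothetical stand-ins for real API-backed tools.

def search_flights(origin: str, dest: str) -> list[dict]:
    # stand-in for a real flight-search tool; returns canned data
    return [{"id": "F1", "price": 320}, {"id": "F2", "price": 275}]

def book(flight_id: str) -> str:
    # stand-in for a real booking tool
    return f"booked:{flight_id}"

# "Model-generated" orchestration: the comparison over results runs
# in the program, not via an extra tool-call round trip per flight.
flights = search_flights("SFO", "JFK")
cheapest = min(flights, key=lambda f: f["price"])
print(book(cheapest["id"]))  # -> booked:F2
```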
And there was an AI productivity startup called Lutra AI doing this, although they've since pivoted to some kind of MCP infra thing: https://lutra.ai/