FilterHN

The ways LLMs fail (and the techniques you have to use to account for that) have more in common than physical engineering disciplines than software engineering does!

▲

cadamsdotcom

51 minutes ago

[-]

Yep. Consider woodworking - the wood you use might warp over time, or maybe part of it ends up in the sun or the thing you’ll make gets partly exposed to water.

Can you make a thing that’ll serve its purpose and look good for years under those constraints? A professional carpenter can.

We have it easy in software.

▲

mpalmer

50 minutes ago

[-]

Physical engineers might scoff good-naturedly at an attempt by project managers to refer to work scheduling as "logistics engineering".

But they really shouldn't because obviously scheduling and logistics is difficult, involving a lot of uncertainty and tolerances.

▲

timr

45 minutes ago

[-]

Uncertainty and tolerance implies that you have a predictable distribution in the first place.

Engineers are not just dealing with a world of total chaos, observing the output of the chaos, and cargo culting incantations that seem to work for right now [1]…oh wait nevermind we’re doing a different thing today! Have you tried paying for a different tool, because all of the real engineers are using Qwghlm v5 Dystopic now?

There’s actually real engineering going on in the training and refining of these models, but I personally wouldn’t include the prompting fad of the week to fall under that umbrella.

[1] I hesitate to write that sentence because there was a period where, say, bridges and buildings were constructed in this manner. They fell down a lot, and eventually we made predictable, consistent theoretical models that guide actual engineering, as it is practiced today. Will LLM stuff eventually get there? Maybe! But right now we’re still plainly in the phase of trying random shit and seeing what falls down.

▲

scuff3d

57 minutes ago

[-]

Lol. This has to be a troll. No way someone seriously wrote this and meant it.

▲

simonw

42 minutes ago

[-]

Little bit of both.

▲

voidhorse

27 minutes ago

[-]

I completely agree that much of software engineering is not engineering, and building systems around LLMs is no better in this sense.

When the central component of your system is a black box that you cannot reason about, have no theory around, and have essentially no control over (a model update can completely change your system behavior) engineering is basically impossible from the start.

Practices like using autoscorers to try and constrain behaviors helps, but this doesn't make the enterprise any more engineering because of the black box problem. Traditional engineering disciplines are able to call themselves engineering only because they are built on sophisticated physical theories that give them a precise understanding of the behaviors of materials under specified conditions. No such precision is possible with LLMs, as far as I have seen.

The determinism of traditional computing isn't really relevant here and targets the wrong logical level. We engineer systems, not programs.

▲

timr

1 hour ago

[-]

lol. who is “we”? I honestly can’t tell if you’re being serious.

I’m going to start a second career in lottery “engineering”, since that’s a stochastic process too.

▲

simonw

43 minutes ago

[-]

The "we" was a tongue-in-cheek reference to the "we" in the original question:

> Are we still calling this things engineering?

▲

timr

37 minutes ago

[-]

Yeah, I understand the symmetry, but…it begs the question.

▲

j45

41 minutes ago

[-]

Engineering how to engineer things might be engineering in some ways.

▲

skeeter2020

3 hours ago

[-]

"professionally trained & legally responsible for the results" is definitely not the same thing as what we used to just call "good at googling".

▲

aeve890

2 hours ago

[-]

I'd say this shit is even worse that "good at googling". Literal incantation for stochastic machines is like just two notches above checking the horoscope.

▲

calebkaiser

2 hours ago

[-]

Based on the comments, I expected this to be slop listing a bunch of random prompt snippets from the author's personal collection.

I'm honestly a bit confused at the negativity here. The article is incredibly benign and reasonable. Maybe a bit surface level and not incredibly in depth, but at a glance, it gives fair and generally accurate summaries of the actual mechanisms behind inference. The examples it gives for "context engineering patterns" are actual systems that you'd need to implement (RAG, structured output, tool calling, etc.), not just a random prompt, and they're all subject to pretty thorough investigation from the research community.

The article even echoes your sentiments about "prompt engineering," down to the use of the word "incantation". From the piece:

> This was the birth of so-called "prompt engineering", though in practice there was often far less "engineering" than trial-and-error guesswork. This could often feel closer to uttering mystical incantations and hoping for magic to happen, rather than the deliberate construction and rigorous application of systems thinking that epitomises true engineering.

▲

timr

47 minutes ago

[-]

There’s nothing particularly wrong with the article - it’s a superficial summary of stuff that has historically happened in the world of LLM context windows.

The problem is - and it’s a problem common to AI right now - you can’t generalize anything from it. The next thing that drives LLMs forward could be an extension of what you read about here, or it could be a totally random other thing. There are a million monkeys tapping on keyboards, and the hope is that someone taps out Shakespeare’s brain.

▲

sgt101

3 hours ago

[-]

Why would I believe that any of this works? This is just some blokes idea of what people should do.

There is no evidence offered. No attempt to measure the benefits.

▲

calebkaiser

1 hour ago

[-]

Most of the inference techniques (what the author calls context engineering design patterns) listed here originally came from the research community, and there are tons of benchmarks measuring their effectiveness, as well as a great deal of research behind what is happening mechanistically with each.

As the author points out, many of the patterns are fundamentally about in-context learning, and this in particular has been subject to a ton of research from the mechanistic interpretability crew. If you're curious, I think this line of research is fascinating: https://transformer-circuits.pub/2022/in-context-learning-an...

▲

elteto

5 hours ago

[-]

Are there any open source examples of good context engineering or agent systems?

▲

calebkaiser

1 hour ago

[-]

Any of the "design patterns" listed in the article will have a ton of popular open source implementations. For structured generation, I think outlines is a particularly cool library, especially if you want to poke around at how constrained decoding works under the hood: https://github.com/dottxt-ai/outlines

▲

alecco

1 hour ago

[-]

This looks AI generated slop.

▲

grigio

4 hours ago

[-]

I'd like a RSS feed of this blog..

▲

vladsanchez

2 hours ago

[-]

It's available, https://buttondown.com/chrisloy/rss but it's not in sync with the blog, just a single 2024 entry found. :shrug:

▲

voidhorse

5 hours ago

[-]

There is nothing precise about crafting prompts and context—it's just that, a craft. Even if you do the right thing and check some fuzzy boundary conditions using autoscorers, the model can still change out from beneath you at any point and totally alter the behavior of your system. There is no formal language here. After all, mathematics exists because natural language is notoriously imprecise.

The article has some good practical tips and it's not on the author but man I really wish we'd stop abusing the term "engineering" in a desperate attempt to stroke our own egos and or convince people to give us money. It's pathetic. Coming up with good inputs to LLMs is more art than science and it's a craft. Call a spade a spade.

▲

calebkaiser

1 hour ago

[-]

I think it's fair to question the use of the term "engineering" throughout a lot of the software industry. But to be fair to the author, his focus in the piece is on design patterns that require what we'd commonly call software engineering to implement.

For example, his first listed design pattern is RAG. To implement such a system from scratch, you'd need to construct a data layer (commonly a vector database), retrieval logic, etc.

In fact I think the author largely agrees with you re: crafting prompts. He has a whole section admonishing "prompt engineering" as magical incantations, which he differentiates from his focus here (software which needs to be built around an LLM).

I understand the general uneasiness around using "engineering" when discussing a stochastic model, but I think it's worth pointing out that there is a lot of engineering work required to build the software systems around these models. Writing software to parse context-free grammars into masks to be applied at inference, for example, is as much "engineering" as any other common software engineering project.

▲

amonks

32 minutes ago

[-]

long shot, apropos of nothing, just recognized your name:

If you are the cincinnatian poet Caleb Kaiser, we went to college together and I’d love to catch up. Email in profile.

If you aren’t, disregard this. Sorry to derail the thread.

▲

qrios

4 hours ago

[-]

I agree with you one hundred percent.

But: Interestingly, the behavior of LLMs in different contexts is also the subject of scientific research.

▲

chrisweekly

3 hours ago

[-]

"Context crafting", ok, sure. I think a lot of expert researchers (like simonw) would agree.

▲

satisfice

4 hours ago

[-]

My thoughts exactly. The author is saying we should think strategically about the use of context. Sure. Yes. But for that to qualify as engineering we need solid theory about how context works.

We don’t have that, yet. For instance experiments show that not all parts of the context window are equally well attended. Imagine trying to engineer a bridge when no one really knows how strong steel is.

▲

skeeter2020

3 hours ago

[-]

or how wide the river is year round