FilterHN

Why XML Tags Are So Fundamental to Claude

91 points

by glth

5 hours ago

| 18 comments

| glthr.com

▲

Lerc

3 minutes ago

[-]

I am unconvinced.

To me it seems like handling symbols that start and end sequences that could contain further start and end symbols is a difficult case.

Humans can't do this very well either; we use visual aids such as indentation and syntax highlighting, or resort to just plain counting of levels.

Obviously it's easy to throw parameters and training at the problem, you can easily synthetically generate all the XML training data you want.

I can't help but think that training data should have a metadata token per content token. A way to encode the known information about each token that is not represented in the literal text.

Especially tagging tokens explicitly as fiction, code, code from a known working project, something generated by itself, something provided by the user.

While it might be fighting the bitter lesson, I think for explicitly structured data there should be benefits. I'd even go as far as to suggest the metadata could handle nesting if it contained dimensions that performed RoPE operations to keep track of the depth.

If you had such a metadata stream per token there's also the possibility of fine tuning instruction models to only follow instructions with a 'said by user' metadata, and then at inference time filter out that particular metadata signal from all other inputs.

It seems like that would make prompt injection much harder.

▲

RadiozRadioz

39 minutes ago

[-]

> a contrast between Claude’s modern approach [...] XML, a technology dating back to 1998

Are we really at the point where some people see XML as a spooky old technology? The phrasing dotted around this article makes me feel that way. I find this quite strange.

▲

theowaway213456

11 minutes ago

[-]

The evidence suggests that XML was never that popular though for the general audience, you have to admit.

For Web markup, as an industry we tried XHTML (HTML that was strictly XML) for a while, and that didn't stick, and now we have HTML5 which is much more lenient as it doesn't even require closing tags in some cases.

For data exchange, people vastly prefer JSON as an exchange format for its simplicity, or protobuf and friends for their efficiency.

As a configuration format, it has been vastly overtaken by YAML, TOML, and INI, due to their content-forward syntax.

Having said all this I know there are some popular tools that use XML like ClickHouse, Apple's launchd, ROS, etc. but these are relatively niche compared to (e.g.) HTML

▲

coldtea

15 minutes ago

[-]

XML has been "spooky old technology" for over a decade now. Its heyday was something like 2002.

Nobody dares advertise the XML capabilities of their product (which back then everybody did), nobody considers it either a hot new thing (like back then) or mature - just obsolete enterprise shit.

It's about as popular now as J2EE, except to people that think "10 years ago" means 1999.

▲

intrasight

22 minutes ago

[-]

Yup. Kids these days...

▲

ryanschneider

2 minutes ago

[-]

Wait am I in the minority talking to Claude in markdown? I just assumed everyone does that, or at least all developers. It seems to work really well.

▲

strongpigeon

5 minutes ago

[-]

This seems like an actual good use for XML. Using it as a serialization format always rubbed me the wrong way (it’s super verbose, the named closing tags are unnecessary grammar-wise, the attribute-or-child question, etc.). But to mark up and structure LLM prompts and responses it feels better than markdown (which doesn’t stream that well).

▲

kid64

2 hours ago

[-]

The thesis here seems to be that delimiters provide important context for Claude, and for that purpose we should use XML.

The article even references English's built-in delimiter, the quotation mark, which is represented as a token for Claude, part of its training data.

So are we sure the lesson isn't simply to leverage delimiters, such as quotation marks, in prompts, period? The article doesn't identify any way in which XML is superior to quotation marks in scenarios requiring the type of disambiguation quotation marks provide.

Rather, the example XML tags shown seem to be serving as a shorthand for notating sections of the prompt ("treat this part of the prompt in this particular way"). That's useful, but seems to be addressing concerns that are separate from those contemplated by the author.

▲

jinushaun

1 hour ago

[-]

Except quotation marks look like regular text. I regularly use quotes in prompts for, ya know, quotes.

▲

wolttam

1 hour ago

[-]

The GP isn't suggesting to literally use quotes as the delimiter when prompting LLMs. They're pointing out that we humans already use delimiters in our natural language (quotation marks to delimit quotes). They're suggesting that delimiters of any kind may be helpful in the context of LLM prompting, which to me makes intuitive sense. That Claude is using XML is merely a convention.
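
A minimal Python sketch of the delimiter point in this subthread. The `wrap` helper and the `document` tag name are made up for illustration and are not part of any Anthropic API. It only shows why a named open/close pair stays unambiguous when the delimited text itself contains quotation marks.

```python
def wrap(tag: str, text: str) -> str:
    """Wrap text in a named open/close pair so its boundaries are explicit."""
    return f"<{tag}>\n{text}\n</{tag}>"


# Text that already contains quotation marks of its own.
quoted = 'She said "ship it" and left.'

# Quotation marks as the delimiter: the inner quotes blur where the text ends.
with_quotes = f'Summarize "{quoted}"'

# A named tag as the delimiter: the inner quotes no longer matter.
with_tags = "Summarize the document.\n" + wrap("document", quoted)

print(with_quotes)
print(with_tags)
```

Any delimiter that is unlikely to appear in the wrapped text would do the same job; the tag form is just one convention.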

▲

TutleCpt

12 minutes ago

[-]

I think this article is 100% relevant to you today. Anthropic put out a training video a number of months ago saying that XML should be highly encouraged for prompts. See https://m.youtube.com/watch?v=ysPbXH0LpIE

▲

michaelcampbell

2 hours ago

[-]

Total tangent, but what vagary of HTML (or the Brave Browser, which I'm using here) causes words to be split in very odd places? The "inspect" devtools certainly didn't show anything unusual to me. (Edit: Chrome, MS Edge, and Firefox do the same thing. I also notice they're all links; wonder if that has something to do with it.)

https://i.imgur.com/HGa0i3m.png

▲

werdnapk

2 hours ago

[-]

CSS on the <a> tags:

word-break: break-all;

▲

knallfrosch

59 minutes ago

[-]

It's an error in the site's CSS. CSS has way better methods, like splitting words correctly depending on the language and hyphenating them.

Although I can never remember the correct incantation, should be easy for LLMs.

▲

fancy_pantser

2 hours ago

[-]

CSS word-break property

▲

rosstex

34 minutes ago

[-]

Ask Claude?

▲

apwheele

2 hours ago

[-]

I think XML is good to know for prompting (similar to how <think></think> was popular for outputs, you can do that for other sections). But I have had much better experience just writing JSON and using line breaks, colons, etc. to demarcate sections.

E.g. instead of

    <examples>
      <ex1>
        <input>....</input>
        <output>.....</output>
      </ex1>
      <ex2>....</ex2>
      ...
    </examples>
    <instructions>....</instructions>
    <input>{actual input}</input>

Just doing something like:

    ...instructions...
    input: ....
    output: {..json here}
    ...maybe further instructions...
    input: {actual input}

For my use case, document processing/extraction (with both Haiku and OpenAI models), the latter example works much better than the XML.

N of 1 anecdote anyway for one use case.
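
For anyone who wants to compare the two layouts above on their own data, here is a rough Python sketch that renders the same examples both ways. The sample invoice strings, the instruction text, and the function names are invented for illustration; the layouts just follow the two snippets in this comment.

```python
import json

# Invented sample data, only for illustration.
examples = [
    {"input": "Invoice 17, total $40", "output": {"invoice": 17, "total": 40}},
]
instructions = "Extract the fields as JSON."
actual_input = "Invoice 18, total $55"


def xml_prompt() -> str:
    """Render the examples in the nested-tag layout."""
    parts = ["<examples>"]
    for i, ex in enumerate(examples, start=1):
        parts.append(f"  <ex{i}>")
        parts.append(f"    <input>{ex['input']}</input>")
        parts.append(f"    <output>{json.dumps(ex['output'])}</output>")
        parts.append(f"  </ex{i}>")
    parts.append("</examples>")
    parts.append(f"<instructions>{instructions}</instructions>")
    parts.append(f"<input>{actual_input}</input>")
    return "\n".join(parts)


def plain_prompt() -> str:
    """Render the same content with line breaks and 'input:'/'output:' labels."""
    lines = [instructions]
    for ex in examples:
        lines.append(f"input: {ex['input']}")
        lines.append(f"output: {json.dumps(ex['output'])}")
    lines.append(f"input: {actual_input}")
    return "\n".join(lines)


print(xml_prompt())
print()
print(plain_prompt())
```

Which layout works better is an empirical question per model and task; the sketch only makes the A/B comparison easy to run.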

▲

galaxyLogic

10 minutes ago

[-]

XML helps because it a) lets you describe structures and b) makes a clear context change, which makes it clear you are not "talking in XML", you are "talking about XML".

I assume you are right too, JSON is a less verbose format which allows you to express any structure you can express in XML, and should be as easy for AI to parse. Although that probably depends on the training data too.

I recently asked AI why .md files are so prevalent with agentic AI and the answer is ... because .md files also express structure, like headers and lists.

Again, depends on what the AI has been trained on.

I would go with JSON, or some version of it which would also allow comments.

▲

ekjhgkejhgk

2 hours ago

[-]

Could you clarify: do those tags need to be tags which already exist, so that we need to learn about them and how to use them? Or can we put whatever we want inside them and, just by virtue of being tags, Claude understands them in a special way?

▲

ezfe

2 hours ago

[-]

They probably don’t need to be specific values. The model is fine tuned to see the tags as signals and then interprets them

▲

galaxyLogic

4 minutes ago

[-]

If it walks like a duck ... AI thinks it is something like a duck.

▲

apwheele

2 hours ago

[-]

All the major foundation models will understand them implicitly, so it was popular to use <think>, but you could also use <reason> or <thinkhard> and the model would still go through the same process.
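
On the harness side, the tag name is just a string, so pulling out whatever section you asked for is the same code regardless of the name. A small Python sketch, assuming the model wrapped its reasoning in the tag you requested; `extract` is a hypothetical helper and the `reply` string is invented.

```python
import re


def extract(tag: str, text: str) -> list[str]:
    """Return the contents of every <tag>...</tag> section in text.

    Assumes sections with the same name are not nested inside each other.
    """
    name = re.escape(tag)
    pattern = re.compile(rf"<{name}>(.*?)</{name}>", re.DOTALL)
    return [match.strip() for match in pattern.findall(text)]


# Invented model reply, for illustration only.
reply = "<reason>2 + 2 is 4.</reason>\nThe answer is 4."

print(extract("reason", reply))
print(extract("think", reply))
```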

▲

marxisttemp

17 minutes ago

[-]

XML is much more readable than JSON, especially if your data has characters that are meaningful JSON syntax
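
A quick Python illustration of the escaping point, using only the standard library. The `snippet` string is invented; the comparison is between `json.dumps` and `xml.sax.saxutils.escape`.

```python
import json
from xml.sax.saxutils import escape

# Invented payload containing characters that are JSON syntax.
snippet = 'say("hi")\nd = {"k": 1}'

# JSON must escape every double quote and the newline inside the string.
as_json = json.dumps({"code": snippet})

# XML text content only needs &, < and > escaped, so this payload is unchanged.
as_xml = f"<code>{escape(snippet)}</code>"

print(as_json)
print(as_xml)

# The trade-off flips for payloads full of < and &.
print(escape("a < b && c"))
```

So which format reads better depends on which characters dominate the payload.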

▲

galaxyLogic

7 minutes ago

[-]

I think readability is in the eye of the reader. JSON is less verbose, no ending tags everywhere, which I think makes it more readable than XML.

But I'd be happy to hear about studies that show evidence for XML being more readable than JSON.

▲

imglorp

3 hours ago

[-]

A very minor porcelain on some of the agent input UX could present this structure for you. Instead of a single chat window, have four: task, context, constraints, output format.

And while we're at it, instead of wall-of-text, I also feel like outputs could be structured at least into thinking and content, maybe other sections.
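
A rough sketch of what such a four-box input could do behind the scenes, in Python. The `PromptForm` class and the tag names are hypothetical, taken directly from the four boxes named above; this is not how any existing agent UI works as far as I know.

```python
from dataclasses import dataclass, fields


@dataclass
class PromptForm:
    """One field per input box in the hypothetical UI."""
    task: str
    context: str
    constraints: str
    output_format: str

    def render(self) -> str:
        """Wrap each non-empty box in a tag named after its field."""
        sections = []
        for f in fields(self):
            value = getattr(self, f.name).strip()
            if value:  # skip boxes the user left empty
                sections.append(f"<{f.name}>\n{value}\n</{f.name}>")
        return "\n".join(sections)


form = PromptForm(
    task="Summarize the thread.",
    context="",
    constraints="Max 3 bullets.",
    output_format="",
)
print(form.render())
```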

▲

ixxie

35 minutes ago

[-]

How about other frontier models, and smaller models?

▲

alansaber

2 hours ago

[-]

Sounds like 1. XML is the cleanest/best quality training data (especially compared to PDF/HTML), and 2. it follows that a user providing semantic tags in XML format can get the best training alignment (hence best results). Shame they haven't quantified this assertion here.

▲

lsc4719

1 hour ago

[-]

Makes sense

▲

twoodfin

2 hours ago

[-]

This isn’t surprising: XML’s core purpose was to simplify SGML for a wider breadth of applications on the web.

HTML also descended from SGML, and it’s hard to imagine a more deeply grooved structure in these models, given their training data.

So if you want to annotate text with semantics in a way models will understand…

▲

tingletech

2 hours ago

[-]

XML and HTML are SGMLs

▲

ChrisSD

1 hour ago

[-]

HTML diverged from SGML pretty early on. Various standards over the years have attempted to specify it as an application of SGML but in practice almost nobody properly conformed to those standards. HTML5 gave up the pretence entirely.

▲

TheJoeMan

4 hours ago

[-]

That first image, “Structure Prompts with XML”, just screams AI-written. The bullet lists don’t line up, the numbering starts at (2), random bolding. Why would anyone trust hallucinated documentation for prompting? At least with AI-generated software documentation, the context is the code itself, being regurgitated into bulleted English. But for instructions on using the LLM itself, it seems pretty lazy to not hand-type the preferred usage and human-learned tips.

▲

rafram

4 hours ago

[-]

No, it’s two screenshots from Anthropic documentation, stitched together: https://platform.claude.com/docs/en/build-with-claude/prompt...

The post even links to that page, although there’s a typo in the link.

▲

glth

3 hours ago

[-]

Author here: I have just fixed the typo. Thank you.

And yes, these are screenshots from Anthropic’s documentation.

▲

dmd

3 hours ago

[-]

They're not even stitched together; there's just no padding between the two images.

▲

Calavar

4 hours ago

[-]

It looks like a screenshot from the Claude desktop app, so I don't think the author is trying to disguise the AI origin of the material.

▲

croes

2 hours ago

[-]

You just hallucinated the content is AI generated.

▲

michaelcampbell

2 hours ago

[-]

"This is AI" is the new "This is 'shopped, I can tell by the pixels."

▲

tingletech

2 hours ago

[-]

I can tell by the em dashes

▲

doctorpangloss

1 hour ago

[-]

There must be an OpenClaw YouTube video helping people post to hacker news, or something, because the front page is overrun with AI slop like this article, that makes no sense anyway. The author literally has no idea what any of this stuff means.

▲

wolttam

4 hours ago

[-]

Anthropic’s tool calling was exposed as XML tags at the beginning, before they introduced the JSON API. I expect they’re still templating those tool calls into XML before passing to the model’s context
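
To make the idea concrete, here is a purely hypothetical Python sketch of templating a JSON-style tool call into XML. The element names (`tool_call`, `name`, `parameters`) and the `get_weather` call are invented; this is not Anthropic's actual template, which is not described in this thread.

```python
import xml.etree.ElementTree as ET


def tool_call_to_xml(call: dict) -> str:
    """Render a {'name': ..., 'input': {...}} dict as a flat XML string."""
    root = ET.Element("tool_call")
    ET.SubElement(root, "name").text = call["name"]
    params = ET.SubElement(root, "parameters")
    for key, value in call["input"].items():
        # Each parameter becomes a child element named after its key.
        ET.SubElement(params, key).text = str(value)
    return ET.tostring(root, encoding="unicode")


# Invented tool call, for illustration only.
call = {"name": "get_weather", "input": {"city": "Paris", "unit": "celsius"}}
print(tool_call_to_xml(call))
```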

▲

pocketarc

3 hours ago

[-]

Yeah like I remember prior to reasoning models, their guidance was to use <think> tags to give models space for reasoning prior to an answer (incidentally, also the reason I didn't quite understand the fuss with reasoning models at first). It's always been XML with Anthropic.

▲

wolttam

3 hours ago

[-]

Exactly the same story here. I still use a tool that just asks them to use <think> instead of enabling native reasoning support, which has worked well back to Sonnet 3.0 (their first model with 'native' reasoning support was Sonnet 3.7)

▲

Zebfross

3 hours ago

[-]

I thought the goal was minimal instruction to let Claude determine the best way to solve the problem. Not adding this to my workflow anytime soon.

▲

TheLNL

2 hours ago

[-]

It is not for the end user, it is more for things like wrappers and automation scripts.

Nobody expects the end user to prompt the AI using a structured language like xml

▲

CactusBlue

1 hour ago

[-]

I think the main advantage of the XML here is that the model is expected to have a matching end tag that is balanced, which reduces the likelihood of malformed outputs.
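
A wrapper script can check that property cheaply before trusting an output. A minimal Python sketch using a stack; `is_balanced` is a hypothetical helper, and it only recognizes simple tags without attributes, so it is a sanity check rather than an XML parser.

```python
import re

# Matches <name> and </name>; tags with attributes are deliberately ignored.
TAG = re.compile(r"<(/?)([A-Za-z_][\w.-]*)\s*>")


def is_balanced(text: str) -> bool:
    """Return True if every opening tag is closed in the right order."""
    stack = []
    for closing, name in TAG.findall(text):
        if closing:
            # A closing tag must match the most recently opened one.
            if not stack or stack.pop() != name:
                return False
        else:
            stack.append(name)
    return not stack  # leftover open tags mean the output was cut off


print(is_balanced("<a><b>x</b></a>"))
print(is_balanced("<a><b>x</a></b>"))
print(is_balanced("<a>truncated"))
```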

▲

esafak

3 hours ago

[-]

This sounds like something for harnesses, not end users. Are they really expecting us to format prompts as XML??

▲

Eric_WVGG

1 hour ago

[-]

bemused by how competently designed this is, compared to enshittified blogs and whatnot

To be realistic, this design needs more weirdly sexual etsy garbage, “one weird tip,” and “punch the monkey”