First, there is modeling ambiguity: too many ways to represent the same data structure. Which means you can’t parse into native structs; instead you parse into a heavy DOM object, and it sucks to interact with it.
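To make that concrete, a hypothetical illustration (element and field names invented): the same record can be encoded as attributes or as child elements, and every consumer has to know which convention the producer picked.

    import xml.etree.ElementTree as ET

    # Two equally valid encodings of the same record.
    as_attributes = ET.fromstring('<user id="42" name="Ada"/>')
    as_elements = ET.fromstring('<user><id>42</id><name>Ada</name></user>')

    # The access code differs depending on the producer's choice.
    print(as_attributes.get("name"))     # Ada
    print(as_elements.findtext("name"))  # Ada

    # The JSON equivalent has essentially one obvious shape:
    # {"id": 42, "name": "Ada"}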
Then, schemas sound great, until you run into DTD, XSD, and RELAX NG. RELAX NG only exists because XSD is pretty much incomprehensible.
Then let’s talk about entity escaping and CDATA. And how you break entire parsers because CDATA is a separate incantation on the DOM.
And in practice, XML is always over-engineered. It’s the AbstractFactoryProxyBuilder of data formats. SOAP and WSDL are great examples of this, vs looking at a JSON response and simply understanding what it is.
I worked with XML and all the tooling around it for a long time. Zero interest in going back. It’s not the angle brackets or the serialization efficiency. It’s all of the above brain damage.
OTOH namespaces, XSD, XSLT were great, modulo the noisy tags. XSLT was the first purely functional language that enjoyed mass adoption in the industry. (It was also homoiconic, like Lisp, amenable to metaprogramming.) Namespaces were a lifesaver when multiple XML documents from different sources had to be combined. XPath was also quite nice for querying.
XML is noisy because of the closing tags, but it also guarantees a level of integrity, and LZ-type compressors, even gzip, are excellent at compacting repeated strings.
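A rough sketch of that effect (document and sizes made up purely for illustration): the repeated tag names all but vanish after gzip.

    import gzip

    # A document that is almost entirely repeated tag names.
    doc = ("<items>" +
           "<item><name>widget</name><qty>1</qty></item>" * 1000 +
           "</items>").encode("utf-8")

    print(len(doc), len(gzip.compress(doc)))  # the tag overhead compresses away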
Importantly, XML is a relatively human-friendly format. It has comments, requires no quoting, no commas between list items, etc.
Complexity killed XML. JSON was stupid simple, and thus contained far fewer footguns, which was a very welcome change. It was devised as a serialization format, a bit human-hostile, but mapped ideally to bag-of-named-values structures found in basically any modern language.
Now we see XML's tools adapted to JSON: JSON Schema, JSONPath, etc. JSON5 (as used in e.g. VSCode) allows for comments, trailing commas and other creature comforts. With tools like that, and dovetailing tools like Pydantic, XML lost any practical edge over JSON it might ever have had.
What's missing is a widespread replacement for XSLT. Could be a fun project.
XSLT was cool. Too bad XSL and Apache-FOP never took off.
I say "the ditt-ka-pow" for The Dumbest Thing That Could Possibly Work (DTTCPW).
The contrast with only JSON is far too simplistic; XML got dropped from places where JSON isn't even involved. Like, why use a relational database when you can have an XML database??? Config files on unix are for the most part still not-XML and not-JSON. And there are various flavors of markdown which do not give you the semi-mythical semantic web, but which can be banged out easily enough in vi or whatever, don't require schemas and validation or libraries with far too many security problems, and I wouldn't write my documentation (these days) using S-expressions anyhow.
This being said there probably are places where something that validates strictly is optimal, maybe financial transactions (EDIFACT and XML are different hells, I guess), at least until some cheeky git points out that data can be leaked by encoding with tabs and spaces between the elements. Hopefully your fancy and expensive XML security layer normalizes or removes that whitespace?
Hence why, in 2026, I still hang around programming stacks like Java and .NET, where XML tooling is great, instead of having to fight with YAML formatting errors, the Norway problem, or JSON without basic stuff like comments.
While they equal each other in complexity, YAML does not even have namespaces )
I don’t get this argument. There exist streaming APIs with convenient mapping. Yes, there can exist schemas with weird structure, but in practice they are uncommon. I have seen a lot of integration formats in XML, never had the need to parse to DOM first.
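For example, with Python's stdlib you can stream elements straight into native dicts without ever materializing a DOM; a minimal sketch with invented element names:

    import io
    import xml.etree.ElementTree as ET

    feed = io.BytesIO(
        b'<orders>'
        b'<order id="1"><total>9.99</total></order>'
        b'<order id="2"><total>24.50</total></order>'
        b'</orders>'
    )

    # iterparse yields each element as its end tag arrives; map it to a
    # plain dict and clear it so memory use stays flat.
    for event, elem in ET.iterparse(feed, events=("end",)):
        if elem.tag == "order":
            print({"id": elem.get("id"), "total": float(elem.findtext("total"))})
            elem.clear()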
But it didn't take long before XML might as well have been a binary format, for all the difference it made to us humans looking at it, parsing it, dealing with it.
JSON came along and its simplicity was baked in. Anyone can argue it's not a great format, but it forcefully maintains the simplicity that XML lost quite quickly.
Our AOLServer-like clone in 2000 used them to great effect in our widget component library.
SOAP was terrible everywhere, not just in Nigeria as OP insinuates. And while the idea of XML sounds good, the tools that developed on top of it were mostly atrocious. Good riddance.
I remember a decade ago seeing job ads that explicitly requested XML skills. The fact that being able to do something with XML was considered a full time job requiring a specialist says everything there is to be said about XML.
Between markdown and HTML, there is no need for XML in that domain anymore either.
Source: https://learn.microsoft.com/en-us/office/open-xml/word/worki...
JSON wasn't even designed for anything. It's literally the necessary and sufficient part of JavaScript that you could pass to an eval() to get a data structure out. It required zero tooling, not even a third-party module, to hit the ground running.
These items make XML deeply tedious and annoying to ingest and manipulate. Plus, some major XML libraries, like lxml in Python, are extremely unintuitive in their implementation of DOM structures and manipulation. If ingesting and manipulating your markup language feels like an endless trudge through a fiery wasteland then don't be surprised when a simpler, more ergonomic alternative wins, even if its feature set is strictly inferior. And that's exactly what happened.
I say this having spent the last 10 years struggling with lxml specifically, and my entire 25 year career dealing with XML in some shape or form. I still routinely throw up my hands in frustration when having to use Python tooling to do what feels like what should be even the most basic XML task.
Though xpath is nice.
lxml, or more specifically its inspiration ElementTree, is specifically not a (W3C) DOM or DOM-style API. It was designed for what it called “data-style” XML documents, where elements hold either text or sub-elements but not both, which is why mixed-content interactions are a chore (lxml augments the API by adding more traversal axes, but ElementTree does not even have that; it’s a literal tree of elements). effbot.org used to have a page explaining its simplified infoset before Fredrik passed and the registration lapsed; it can be accessed through archive.org.
That means lxml is, by design, not the right tool to interact with mixed-content documents. But of course the issue is there isn’t really a right tool for that, as to my knowledge nobody has bothered building a fast DOM-style library for Python.
If you approach lxml as what ElementTree was designed as, it’s very intuitive: an element is a sequence of sub-elements, with a mapping of attributes. It’s a very straightforward model, works great for data documents, and fits great within the language. But of course that breaks down for mixed-content documents, as your text nodes get relegated to `tail` attributes (and ElementTree straight up discards comments and PIs, though lxml reverted that).
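A minimal sketch of both sides of that, using the stdlib ElementTree (lxml behaves the same way for this part):

    import xml.etree.ElementTree as ET

    # Data-style document: the element/attribute model fits perfectly.
    row = ET.fromstring('<row id="1"><name>Ada</name></row>')
    print(row.get("id"), row.findtext("name"))  # 1 Ada

    # Mixed content: the text following <b> is not a child node; it hangs
    # off the <b> element's .tail attribute.
    para = ET.fromstring('<p>Hello <b>brave</b> new world</p>')
    print(repr(para.text))     # 'Hello '
    print(repr(para[0].text))  # 'brave'
    print(repr(para[0].tail))  # ' new world'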
and often having fewer bizarre and overly complex features is a feature in itself
XML did some good things for its day, but no, we abandoned it for very good reasons.
DSSSL was the Scheme-based, domain-specific "Document Style Semantics and Specification Language".
The syntax change came in the era of general Lisp syntax bashing.
But to XML syntax? Really? That was so surreal to me.
The industry abandoned both in favor of JSON and RPC for speed and perceived DX improvements, and because for a period of time everyone was in fact building only against their own servers.
There are plenty of examples over the last two decades of us having to reinvent solutions to the same problems that REST solved way back then though. MCP is the latest iteration of trying to shoehorn schemas and self-documenting APIs into a sea of JSON RPC.
There are virtually zero scenarios where anyone at all ever said "This thing we're using JSON for would be easier if we just used XML".
JSON was the undisputed winner of a competition that never was, in great part because of the vast improvements in DX. I remind you that JSON is the necessary and sufficient subset of JavaScript needed to define data, and to parse it all anyone had to do was pipe it to a very standard and ubiquitous eval(). No tooling, no third-party module, no framework. Nothing. There is no competition at all.
You seem to be arguing that REST lost because if you look around today you will only find RPC. I agree. My point wasn't that REST won. Part of my point, though, was that REST lost and the industry has tried multiple times to bolt on JSON RPC solutions to the same problems REST already addressed. If you would like to see some of those examples, just look up Swagger, OpenAPI, or MCP.
I agree JSON won, and I agree that it was picked based on arguments over DX. I'm not sure where you and k disagree here.
There exist plenty of people actually using REST. It can reduce complexity of SPAs.
> This is not engineering. This is fashion masquerading as technical judgment.
The boring explanation is that AI wrote this. The more interesting theory is that folks are beginning to adopt the writing quirks of AI en masse.
Maybe this is okay if you know your schema beforehand and are willing to write an XSD. My use case relied on not knowing the schema. Despite my excitement to use a SAX-style parser, I tucked my tail between my legs and switched back to JSONL. Was I missing something?
XML is extensible markup, i.e. it's like HTML that can be applied to tasks outside of representing web pages. It's designed to be written by hand. It has comments! A good use for XML would be declaring a native UI: it's not HTML but it's like HTML.
JSON is a plain text serialization format. It's designed to be generated and consumed by computers whilst being readable by humans.
Neither is a configuration language but both have been abused as one.
Are you sure about that? I've heard XML gurus say the exact opposite.
This is a very good example of why I detest the phrase “use the right tool for the job.” People say this as an appeal to reason, as if there weren't an obvious follow-up question that different people might answer very differently.
This assertion is comically out of touch with reality, particularly when trying to describe JSON as something that is merely "readable by humans". You could not do anything at all with XML without having to employ half a dozen frameworks and tools and modules.
The complexity about XML comes from the many additional languages and tools built on top of it.
Many are too complex and bloated, but JSON has little to nothing comparable, so it's only simple because it doesn't support what XML does.
Get me right: JSON is superior in many aspects; XML is utterly over-engineered.
But XML absolutely was _meant_ for data exchange, machine to machine.
Here are the bullet points from that, verbatim:
The design goals for XML are:
XML shall be straightforwardly usable over the Internet.
XML shall support a wide variety of applications.
XML shall be compatible with SGML.
It shall be easy to write programs which process XML documents.
The number of optional features in XML is to be kept to the absolute minimum, ideally zero.
XML documents should be human-legible and reasonably clear.
The XML design should be prepared quickly.
The design of XML shall be formal and concise.
XML documents shall be easy to create.
Terseness in XML markup is of minimal importance.
Or heck, even more concisely from the abstract: "The Extensible Markup Language (XML) is a subset of SGML that is completely described in this document. Its goal is to enable generic SGML to be served, received, and processed on the Web in the way that is now possible with HTML. XML has been designed for ease of implementation and for interoperability with both SGML and HTML."

It's always talking about documents. It was a way to serve up marked-up documents that didn't depend on using the specific HTML tag vocabulary. Everything else happened to it later, and was a bad idea.
And RELAX NG is a human-friendly schema syntax that has transformers to and from XSD.
Then, if there were any problems in my XML, I'd be left trying to decipher horrible errors to determine what I did wrong.
The docs sucked and were "enterprise grade", the examples sucked (either too complicated or too simple), and the tooling sucked.
I suspect it would be fine nowadays with LLMs to help, but back when it existed, XML was a huge hassle.
I once worked on a robotics project where a full 50% of the CPU was used for XML serialization and parsing. Made it hard to actually have the robot do anything. XML is violently wordy and parsing strings is expensive.
Also "worse is better". Many developer still prefer to use something that is similar to notepad.exe, instead of actual tools that understand the formats on a deeper level.
Although ironically there are fewer production-time human mistakes when editing an XML file that is properly validated with an XSD than a YAML file, because Norway.
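For anyone who hasn't been bitten yet, a minimal sketch, assuming PyYAML (which still follows YAML 1.1's boolean rules):

    import yaml  # PyYAML

    print(yaml.safe_load("country: no"))    # {'country': False} -- hello, Norway
    print(yaml.safe_load("country: 'no'"))  # {'country': 'no'} -- only quoting saves you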
I remember XML proponents back then argued that it allows semantics -- although it was never clear how a non-human would understand and process it.
The funny thing about namespaces is that the prefix, by the XML docs, should be meaningless -- instead you should look at the URL of the namespace. It's like if we read a doc with snake:front-left-paw, and ask how come does a snake have paws? -- Because it's actually a bear -- see the definition of snake in the URL! It feels like mathematical concepts -- coordinate spaces, numeric spaces with different number 1 and base space vectors -- applied to HTML. It may be useful in rare cases. But few can wrap their heads around it, and right from the start, most tools worked only with exactly named prefixes, and everyone had to follow this way.
JSON does not have very much or very good data types either, but (unlike XML) at least JSON has data types. ASN.1 has more data types (although standard ASN.1 lacks one data type that JSON has (key/value list), ASN.1X includes it), and if DER or another BER-related format is used then all types use the same framing, unlike JSON. One thing JSON lacks is octet string type, so instead you must use hex or base64, and must be converted after it has been read rather than during reading because it is not a proper binary data type.
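Concretely, it ends up looking something like this trivial sketch: the bytes have to be re-encoded as text on the way out and decoded again in a separate step after parsing.

    import base64
    import json

    payload = bytes([0x00, 0xff, 0x10, 0x20])  # raw octets

    # JSON has no octet-string type, so encode as base64 text...
    wire = json.dumps({"blob": base64.b64encode(payload).decode("ascii")})

    # ...and convert back only after the JSON itself has been parsed.
    assert base64.b64decode(json.loads(wire)["blob"]) == payload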
> The funny thing about namespaces is that the prefix, by the XML docs, should be meaningless -- instead you should look at the URL of the namespace. It's like if we read a doc with snake:front-left-paw, and ask how come does a snake have paws? -- Because it's actually a bear -- see the definition of snake in the URL!
This is true of any format that you can import with your own names though, and since the names might otherwise conflict, it can also be necessary. This issue is not only XML (and JSON does not have namespaces at all, although some application formats that use it try to add them in some ways).
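A small sketch of the prefix-vs-URI point (names and URI invented): two documents with different prefixes resolve to the same qualified name, and that qualified name is all a conforming parser cares about.

    import xml.etree.ElementTree as ET

    a = ET.fromstring('<snake:paw xmlns:snake="http://example.com/bear"/>')
    b = ET.fromstring('<bear:paw xmlns:bear="http://example.com/bear"/>')

    # After parsing, the prefix is gone; only the namespace URI remains.
    print(a.tag)            # {http://example.com/bear}paw
    print(a.tag == b.tag)   # True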
So, for example, <b> and <i> have perfectly clear semantics, while <article> not so much. What does the browser do with an <article>? Or maybe it is there for an indexing engine? I myself have no idea (not that I've investigated, I admit).
But all that was misunderstood, very much like XML itself.
What tools? Namespaces being defined by their urls is sure not the reason XML is complex, and the tools I remember running into supported it well
I've seen a bunch of times where an API returns invalid XML that has to be manipulated before parsing but never that for JSON.
I think that's the real sell for JSON. A lot of APIs are terrible, and JSON being simpler, terrible JSON beats terrible XML.
I think a simplified Haskell-ish script host (à la Elm) with a smattering of debugging capabilities would have been amazing.
There’s one improvement XML had over JSON, and that’s comments.
The author laments features and functionality that were largely broken, or implemented in ways that ran counter to their documentation. There were very few industries that actually wrote good interfaces and ensured documentation matched implementation, but they were nearly always electrical engineers who’d re-trained as software engineers through the early to late 90s.
Generally speaking namespaces were a frequent source of bugs and convoluted codepaths. Schemas, much like WSDL’s or docs, were largely unimplemented or ultimately dropped to allow for faster service changes. They’re from the bygone era of waterfall development, and they’re most definitely not coming back.
Then there’s the insane XML import functionality, or recursive parsing, which even today results in legacy systems being breached.
Then again, I said “author” at the start of this comment, but it’s probably disingenuous to call an LLM an author. This is 2026 equivalent of blogspam, but even HN seems to be falling for it these days.
The AI seems to also be missing one of the most important points: the migration to smaller interfaces, more meaningful data models, and services that were actually built to be used by engineers - not just a necessary deliverable as part of the original system implementation. API specs in the early 2000s were a fucking mess of bloated, Rube-Goldbergesque interdependent specs, often ready to return validation errors with no meaningful explanation.
The implementation of XML was such a mess it spawned an entire ecosystem of tooling to support it: SoapUI, parsers like Jackson and SAX (and later StAX), LINQ to XML, xmlstarlet, Jing, Saxon...
Was some of this hugely effective and useful? Yes. Was it mostly an unhinged level of abstraction, or a resulting implementation by engineers who themselves didn’t understand the overly complex features? The majority of the time.
<aaaa bbbb="bbbb" cccc="cccc"/>
{"bbbb":"bbbb","cccc":"cccc"}
See that the difference is only two characters? Yet XML also has a four-character element name, which JSON lacks. And JSON is packed to the limit, while XML is written naturally and is actually more readable than JSON.

> the various XML-based "standards" spawned by enterprise committees are monuments to over-engineering. But the core format (elements, attributes, schemas, namespaces) remains sound. We threw out the mechanism along with its abuses.
It's mostly only arguing for using the basic XML in place of the basic JSON.
I largely agree with that, although I wouldn't consider the schemas part of its core; go read the Schema specifications and tell me when you come out.
But I agree that a good part of XML's downfall was due to its enterprise committees: no iteration, and few incentives to make things lean and their specifications simple; a lot of the companies designing them had an interest in making them hard to implement.
XML only has text data (although other kinds can be represented, it isn't very good at doing so), and the structure is named blocks which can have named attributes and plain text inside; and is limited to a single character set (and many uses require this character set to be Unicode).
XML does not require a schema, although it can use one, which is a benefit, and like they say does work better than JSON schema. Some ASN.1 formats (such as DER) can also be used without a schema, although it can also use a schema.
My own nonstandard TER format (for ASN.1 data) does have comments, although the comments are discarded when being converted to DER.
Namespaces are another benefit in XML, that JSON does not have. ASN.1 has OIDs, which have some of this capability, although not as much as XML (although some of my enhancements to ASN.1 improve this a bit). However, there is a problem with using URIs as namespaces which is that the domain name might later be assigned to someone else (ASN.1 uses OIDs which avoids this problem).
My nonstandard ASN1_IDENTIFIED_DATA type allows a ASN.1X data file to declare its own schema, and also has other benefits in some circumstances. (Unlike XML and unlike standard ASN.1, you can declare that it conforms with multiple formats at once, you can declare conformance with something that requires parameters for this declaration, and you can add key/value pairs (identified by OIDs) which are independent of the data according to the format it is declared as.)
(I have other nonstandard types as well, such as a key/value list type (called ASN1_KEY_VALUE_LIST in my implementation in C).)
XSLT is a benefit with XML as well, although it would also be possible to make a similar thing with other formats (for databases, there is SQL (and Tutorial D); there is not one for ASN.1 as far as I know but I had wanted such a thing, and I have some ideas about it).
The format XML is also messy and complicated (and so is YAML), compared with JSON or DER (although there are many types in DER (and I added several more), the framing is consistent for all of them, and you do not have to use all of the types, and DER is a canonical form which avoids much of the messiness of BER; these things make it simpler than what it might seem to some people).
Any text format (XML, JSON, TER, YAML, etc) will need escaping to properly represent text; binary formats don't, although they have their own advantages and disadvantages as well. As mentioned in the article, there are some binary XML formats as well; it seems to say that EXI requires a schema (which is helpful if you have a schema, although there are sometimes reasons to use the format without a schema; this is also possible with ASN.1, e.g. PER requires a schema but DER does not).
Data of any format is not necessarily fully self-descriptive, because although some parts may be self-described, it cannot describe everything without the documentation. The schema also cannot describe everything (although different schema formats might have different capabilities, they never describe everything).
> When we discarded XML, we lost: ...
As I had mentioned, other formats are capable of this too.
> What we gained: Native parsing in JavaScript
If they mean JSON, then, JSON was made from the syntax of JavaScript, although before JSON.parse was added into standard JavaScript they might have used eval and caused many kinds of problems with that. Also, if you are using JavaScript then the data model is what JavaScript does, although that is a bit messy. Although JavaScript now has an integer type, it did not have one at the time that JSON was made up, so JSON cannot use the integer type.
> I am tired of lobotomized formats like JSON being treated as the default, as the modern choice, as the obviously correct solution. They are none of these things.
I agree and I do not like JSON either, but usually XML is not good either. I would use ASN.1 (although some things do not need structured data at all, in which case ASN.1 is not necessary either).
(Also, XML, JSON, and ASN.1 are all often badly used; even if a format is better does not mean that the schema for the specific application will be good; it can also be badly designed, and in my experience it often is.)
Even with zipped payloads, it's just way unnecessarily chatty without being more readable.
I remember the arguments largely revolving around verbosity and the prevalence of JSON use in browsers.
That doesn't mean bandwidth wasn't a consideration, but I mostly remember hearing devs complain about how verbose or difficult to work with XML was.
I didn't see any.
My main point is that the very purpose of XML is not to transfer data between machines. XML use case is to transfer data between humans and machines.
Look at the schemas. They are all grammatical. DTD is a textbook grammar. Each term has a unique definition. XSD is much more powerful: here a term may change definition depending on the context: 'name' in 'human/name' may be defined differently than 'name' in 'pet/name' or 'ship/name'. But within a single context the definition stays. As far as I know Relax NG is even more powerful and can express even finer distinctions, but I don't know it too well to elaborate.
Machines do not need all that to talk to each other. It is pure overhead. A perfect form to exchange data between machines is a dump of a relational structure in whatever format is convenient, with pretty straightforward metadata about types. But humans cannot author data in the relational form; anything more complex than a toy example will drive a human crazy. Yet humans can produce grammatical sequences in spades. To make it useful for a machine that grammatical drive needs only a formal definition and XML gives you exactly that.
So the use case for XML is to make NOTATIONS. Formal in the sense they will be processed by a machine, but otherwise they can be pretty informal, that is have no DTD or XSD. It is actually a power of XML that I can just start writing it and invent a notation as I go. Later I may want to add formal validation to it, but it is totally optional and manifests as a need only when the notation matures and needs to turn into a product.
What makes one XML a notation and another not a notation? Notations are about forming phrases. For example:
<func name="open">
<data type="int"/>
<args>
<addr mode="c">
<data type="char"/>
</addr>
<data type="int"/>
<varg/>
</args>
</func>
This is a description of a C function, 'open'. Of course, a conventional description is much more compact: int open(char const*, int, ...)
But let's ignore the verbosity for a moment and stay with XML a bit longer. What is grammatical about this form? 'func' has '@name' and contains 'data' and 'args'. 'data' is the result type, 'args' are the parameters. Either or both can be omitted, resulting in what C calls "void". Either can be 'data' or 'addr'. 'data' is final and has '@type'; 'addr' may be final (point to unknown, 'void') or non-final and may point to 'data', 'func' or another 'addr', as deep as necessary. 'addr' has '@mode' that is a combination of 'c', 'v', 'r' to indicate 'const', 'volatile', 'restrict'. The last child of 'args' may be 'varg', indicating variable parameters.

Do you see that these terms are used as words in a mechanically composed phrase? Change a word; omit a word; link words into a tree-like structure? This is the natural form of XML: the result is phrase-like, not data-like. It can, of course, be data-like when necessary, but this is not using the strong side of XML. The power of XML comes when items start to interact with each other, like commands in Vim. Another example:
    <aaaa>
        <bbbb/>
    </aaaa>
This would be some data. Now assume I want to describe changes to that data:

    <aaaa>
        <drop>
            <bbbb/>
        </drop>
        <make>
            <cccc/>
        </make>
    </aaaa>
See those 'make' and 'drop'? Is it clear that they can enclose arbitrary parts of the tree? Again, what we do is write a phrase: we add a modifier, 'make' or 'drop', and the contents inside it get a different meaning.

This only makes sense if XML is composed by hand. For machine-to-machine exchange all this is pure overhead. It is about as convenient as if programs talked to each other via shell commands. It is much more convenient to load a library and use it programmatically than to compose a command-line call.
But all this verbosity? Yes, it is more verbose. This is a no-go for code you write 8 hours a day. But for code that you write occasionally it may be fine. E.g. a build script. An interface specification. A diagram. (It is also perfect for anything that has human-readable text, such as documentation. This use is fine even for an 8-hour workday.)

And all of these will be compatible. All XML dialects can be processed with the same tools, merged, reconciled, whatever. This is powerful. They require no parsing. Parsing may appear a solved problem, but to build a parser you still must at least describe the grammar for a parser generator, and that is not so simple. And all that this description gives you is that the parser will take a short form and convert it into an AST, which is exactly what XML starts with. The rest of the processing is still up to you.

With XML you can build the grammar bottom up and experiment with it. Wrote a lot of XML in some grammar and then found a better way? Well, write a script to transform the old XML into the new grammar and continue. The transformer is a part of the common toolset.
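For instance, migrating the C-function notation above from the `<varg/>` marker to, say, a hypothetical `variadic` attribute is a few lines against the generic tree, with no parser to rewrite (a sketch of the idea, with an invented target grammar):

    import xml.etree.ElementTree as ET

    old = ET.fromstring(
        '<func name="open"><data type="int"/><args>'
        '<addr mode="c"><data type="char"/></addr>'
        '<data type="int"/><varg/></args></func>'
    )

    # Rewrite the old convention (a <varg/> child) into a new one
    # (a variadic="true" attribute on <args>).
    for args in old.iter("args"):
        varg = args.find("varg")
        if varg is not None:
            args.remove(varg)
            args.set("variadic", "true")

    print(ET.tostring(old, encoding="unicode"))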