Relax NG is a schema language for XML (2014)
34 points
8 hours ago
| 2 comments
| relaxng.org
| HN
jitl
6 hours ago
[-]
the compact non-xml syntax is neat: https://relaxng.org/compact-tutorial-20030326.html#id2814005

it reminds me of TypeScript.
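For a taste of that resemblance, the address-book example from the linked tutorial looks roughly like this in the compact syntax:

```rnc
# Toy schema adapted from the Relax NG compact-syntax tutorial:
# an addressBook contains zero or more cards, each with a name
# and an email.
element addressBook {
  element card {
    element name { text },
    element email { text }
  }*
}
```

The `{ ... }` nesting and postfix `*` read a lot like a structural type declaration.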

As for XML itself, it seems like it was a huge buzzword fad in the late 90s/early 2000s, but it must not have lived up to the hype or we’d actually be using it today instead of JSON and Protobuf. I got to computer programming around when the web gave up on XHTML, so i’m not really sure what to make of the XML cultural moment. The vibe i get is of focus on precise data semantics for its own sake, very Cathedral, effort that didn’t end up delivering benefit to humans. What do you think?

reply
brainwipe
6 hours ago
[-]
XML was much better than what came before - which was a different standard for every endpoint, and often no structure at all.

XML allowed you to use tools to build automatically. We have better tools now, but back then it was like magic. Download an XSD (the more common option than Relax NG, though not superior IMO), point a pre-built tool at it, and it would build strongly typed model classes and validation checkers. Then, when you called the service, chances are it would work first time. The schema could also serve as the specification. That was unheard of before: often the spec you'd get for a service was different from what the endpoint served, because keeping documentation up to date was not a priority.

XML then got a little overplayed. For example, XSL transforms allowed you to turn one XML into another and because XHTML existed you have people building entire front ends in it (not recommended). You ended up in a weird hinterland where XML wasn't just for representing structured data but it had built in functionality too. It was not the right tool for that job!

I've not needed it in a long time as I prefer lighter weight formats and I don't miss it.

Just my take, others will have their own!

reply
riffraff
6 hours ago
[-]
XML as a document markup language was neat imvho.

Like, I remember working with DocBook XML[0] and it was fine. And the idea of being able to use different namespaces in a document (think MathML and SVG in XHTML) was neat too.

The problems arose from the fact that it was adopted for everything, even where it largely didn't make much sense. So people came to hate it because e.g. "a functional language to transform XML into other formats" is neat, but "a functional language written in XML tags" is a terrible idea[1].

Likewise, "define a configuration in XML" seems a good idea, but "a build system based on XML plus interpolation you're supposed to edit by hand" is not great[2].

So people threw away all of the baby XML with the bathwater, only to keep reinventing the same things over and over, e.g. SOAP+WSDL became a hodgepodge of badly documented REST APIs, swagger yaml definitions and json schemas, plus the actual ad-hoc encoding.

And I mean, it's not like SOAP+WSDL actually worked well either, it was always unreliable. And even the "mix up namespaces" idea didn't work out, because clients never really parsed more than one thing at a time, so it was pointless (with notable small exceptions). XML-RPC[3] did work, but you still needed to have the application model somewhere else anyway.

Still, JSON has seen just as much abuse as a "serialization" format which ended up abused as configuration, schema definitions, rules language... It's the circle of life.

[0] https://docbook.org/

[1] https://developer.mozilla.org/en-US/docs/Web/XML/XSLT

[2] https://ant.apache.org/manual/using.html

[3] https://en.wikipedia.org/wiki/XML-RPC

reply
chriswarbo
31 minutes ago
[-]
I think XML fit well into the turn-of-the-millennium zeitgeist: GUIs would hide the verbosity; the proliferation of bespoke tags would map cleanly to OOP representations; middleware could manipulate and transform data generically, allowing anything to plug into anything else (even over the Internet!).

Whilst lots of impressive things were built, the overall dream was always just out of reach. Domain-specific tooling is expensive to produce and maintain, and often gives something that's not quite what we want (as an extreme example, think of (X)HTML generated by Dreamweaver or FrontPage); generic XML processors/editors don't offer much beyond avoiding syntax/nesting errors; so often it was simplest to interact directly with the markup, where the verbosity, namespacing, normalisation, etc. wouldn't be automated-away.

XML's tree model was also leaky: I've worked with many data formats which look like XML, but actually require various (bespoke!) amounts of preprocessing, templating, dereferencing, etc. which either don't fit the XML model (e.g. graphs or DAGs), or just avoid it (e.g. sprinkling custom grammar like `${foo.bar}` in their text, rather than XML elements like `<ref name="foo.bar" />`). Of course, it was hard to predict how those systems would interact with XML features like namespaces, comments, etc. which made generic processing/transforming middleware less plug-and-play. That, plus billion-laughs mitigations, etc. contributed to a downward spiral of quality, where software would not bother supporting the full generality of XML, and only allowed its own particular subset of functionality, written in one specific way that it expected. That made the processors/transformers even less useful; and so on until eventually we just had a bunch of bespoke, incompatible formats again. At which point, many just threw up their hands and switched to JSON, since at least that was simpler, less verbose and easier to parse... depending on whether you support comments... and maybe trailing commas...; or better yet, just stick to YAML. Or TOML.....

(My favourite example: at an old job, I maintained an old server that sent orders from our ecommerce site to third party systems, using a standard "cXML" format. Another team built a replacement for it, I helped them by providing real example documents to test with, and eventually the switch was made. Shortly after, customers were receiving dozens of times what they ordered! It turned out that a third-party was including an XML declaration like `<?xml>` at the start of their response, which caused the new system to give a parse failure: it treated that as an error, assumed the order had failed, and retried; over and over again!)

reply
imtringued
5 hours ago
[-]
>And I mean, it's not like SOAP+WSDL actually worked well either, it was always unreliable.

I don't think it ever worked. See this [0]. It's pretty crazy that people built one of the most complex and verbose data exchange formats in the world, and then it turned out that duplicating the tag name in the open and close tags and including the parameter name and type in the attributes bought you nothing, because implementations were treating your SOAP request as an array of strings.

[0] https://snook.ca/archives/other/soap_request_pa

reply
rapnie
1 hour ago
[-]
> it's not like SOAP+WSDL actually worked well either, it was always unreliable

This is comparable to saying that "multiplayer distributed architecture at scale" never worked well and was unreliable. It all depends on what your needs are and how the design and implementation satisfy them. SOAP+WSDL were part of a larger technology vision of Service Oriented Architecture (SOA), with all the complexities of distributed architecture. And the attempt was to make all of that open-standards based.

I worked in Print at the time at one of the largest companies (now gone bust), and can confidently say that SOAP+WSDL worked perfectly for us. It made it way more reliable to tie together all this very specialized printing equipment with archaic languages and interfaces, increasing the productivity and efficiency of the entire print process.

reply
arethuza
5 hours ago
[-]
SOAP always seemed to mostly work, but if something did fail it was an utter nightmare to work out what the problem was - WSDL really wasn't much fun to read.

Whereas when REST APIs came out (using JSON or XML) they were much easier to dive into at the command line with curl and work out how to get things started and diagnose problems when they inevitably came up.

reply
lolive
1 hour ago
[-]
I still cannot tell which one I hate the most: CSV or JSON. For data exchange, these really are hacks that should never have gotten the world's attention.
reply
riffraff
5 hours ago
[-]
That seems like a particularly bad implementation :) IME things worked OK 70% of the time, but I do recall big matrices of "does client library X work with server Y" with a lot of red cells.
reply
vbezhenar
3 hours ago
[-]
XML is fundamentally incompatible with commonly used programming data structures, namely lists/arrays and structs/maps. That fundamental mismatch caused a lot of friction when people used XML for data exchange between programs. JSON is the clear winner here.

XML is absolutely fine for data that maps naturally to it, for example text markup. While HTML technically is not XML, it's very close, and XHTML is still a thing. Probably most people wouldn't enjoy using JSON to encode HTML pages.
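A tiny sketch of that mismatch (the `<order>`/`<item>` names here are invented for illustration):

```python
import json
import xml.etree.ElementTree as ET

# JSON has a native list type, so "one item" and "many items"
# have the same unambiguous shape:
one = json.loads('{"items": ["a"]}')
assert isinstance(one["items"], list)

# XML has no list type: a naive element-to-dict mapping cannot
# tell a single repeated child from a scalar field.
doc = ET.fromstring("<order><item>a</item></order>")
items = doc.findall("item")  # you must already *know* <item> repeats
print(len(items))            # 1 -- looks just like a lone scalar field
```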

reply
embedding-shape
3 hours ago
[-]
> Probably most people wouldn't enjoy using JSON to encode HTML pages.

Probably yeah, but also, people don't know how enjoyable it is to just use lists and elements within it to represent HTML. Hiccup is a great example of this where you can do something like:

  [:h1 "Hey"
    [:strong
      {:style {:color "red"}}
      "there"]]

And it just makes sense. I'm sure we've yet to find the best way of writing HTML.
reply
tannhaeuser
3 hours ago
[-]
> XML is fundamentally incompatible with commonly used programming data structures, namely lists/arrays and structs/maps.

Another way to say this is XML is a grammar formalism that deals purely with serialisation rather than higher-level structures that might be serialised such as co-inductive data structures.

> While HTML technically is not XML, it's very close to it and XHTML still is a thing.

XML and HTML-as-serialisation-format are both subsets of SGML.

reply
cess11
19 minutes ago
[-]
"XML is fundamentally incompatible with commonly used programming data structures, namely lists/arrays and structs/maps. That fundamental mismatch caused a lot of friction when people use XML for data exchange between programs. JSON is clear winner here."

I'm not so sure about this. When you have a schema it becomes possible to generate your object code, and then your only immediate interface with the XML file is a rather simple instruction to unmarshal it; the rest of the time you're working within your programming language of choice.
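A sketch of what that buys you (the `Order` class and element names here are made up; in practice a binding tool such as JAXB's xjc for Java, or xsdata for Python, generates the classes from the schema):

```python
from dataclasses import dataclass
import xml.etree.ElementTree as ET

# Stand-in for a schema-generated class: a real binding tool would
# emit this (plus validation) from the XSD, not by hand.
@dataclass
class Order:
    id: str
    quantity: int

def unmarshal(xml: str) -> Order:
    """The one place the program touches raw XML."""
    root = ET.fromstring(xml)
    return Order(id=root.findtext("id"),
                 quantity=int(root.findtext("quantity")))

order = unmarshal("<order><id>A1</id><quantity>3</quantity></order>")
print(order.quantity)  # from here on it's plain typed objects
```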

reply
hnlmorg
3 hours ago
[-]
It isn’t incompatible. It’s just a massive superset of what is needed.

JSON offers simplicity

YAML offers readability

XML offers a massive feature set.

For what we need 99% of the time, simplicity and/or readability is a much higher requirement.

As for TOML, I honestly don’t understand why anyone likes that.

reply
dvdkon
2 hours ago
[-]
I don't think it's a superset. You can represent any structs-and-arrays data in XML, but you have to make non-trivial mappings to make it work.

The obvious way is to use elements for everything, but then you're mapping both structs and their fields (very different concepts in e.g. C) to elements. Attributes map nicely to struct fields, but they only admit string values.
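A quick sketch of those two mappings (hypothetical `<point>` vocabulary, Python stdlib):

```python
import xml.etree.ElementTree as ET

# The same struct {x: int, y: int}, mapped two common ways:
as_elements = "<point><x>1</x><y>2</y></point>"  # fields as child elements
as_attrs = '<point x="1" y="2"/>'                # fields as attributes

# Attributes line up nicely with struct fields, but only admit strings:
p = ET.fromstring(as_attrs)
x = p.get("x")
print(type(x).__name__)  # str -- the caller must convert to int itself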

reply
hnlmorg
2 hours ago
[-]
That’s why it’s a superset ;)

You can map anything to it, but that flexibility means you then need to define schemas et al to ensure compliance.

The schema thing isn't actually unique to XML either; you can do the same thing with JSON and YAML too. But in my opinion, if you get into the realm of needing schemas in JSON, then you're at the point where you shouldn't be using JSON any longer, since you're now violating the "simplicity" argument which is JSON's only real advantage over other formats.

reply
lolive
1 hour ago
[-]
Look at how dict2xml or xml2dict handle automatic JSON-to-XML mapping. Both formats carry 99% of the same structural info in their respective serializations.
reply
tclancy
3 hours ago
[-]
Huh? I always felt some of the failure/bad reputation of XML came from how it got tortured by devs who did not understand database normalization. If you "get" third normal form, XML works fine for the relations you describe, unless I am missing something.

To be clear, I am not being snide and would be interested in the cases you’re thinking of.

reply
tannhaeuser
5 hours ago
[-]
The XML spec starts like this:

> The Extensible Markup Language (XML) is a subset of SGML that is completely described in this document. Its goal is to enable generic SGML to be served, received, and processed on the Web in the way that is now possible with HTML.

Where "generic SGML" refers to markup beyond the basic HTML vocabulary hardcoded into browsers, such as SVG and MathML. XML was specifically designed such that mere parsing doesn't require element-specific rules such as SGML-derived HTML tag omission/inference, empty elements, and attribute shortforms, by excluding these features from the XML subset of SGML. Original SGML always required a DTD schema to inform the parser about these things that HTML has to this day, and not just for legacy reasons either ie. new elements and attributes making use of these features are introduced all the time (cf. [1]).

Now XML Schema (W3C's XML schema language, and by far the most used one) isn't very beautiful, but is carefully crafted to be upwards compatible with DTDs in that it uses the same notion of automaton construction to decide admissibility of content models (XSD's Unique Particle Attribution rule), rooted in SGML's zero-lookahead design rationale that is also required for tag inference. Relax NG does away with this constraint, allowing a larger class of markup content models but only working with fully tagged XML markup.

XML became very popular for a while and, like JSON afterwards, was misused for all kinds of things: service payloads in machine-to-machine communication, configuration files, etc., but these non-use cases shouldn't be held against its design. As a markup language, while XML makes a reasonable delivery or archival language, it's a failure as an authoring language due to its rigidity/redundancy and verbosity, as is evident from the massive use of markdown and other HTML short syntaxes supported by SGML but not XML.

[1]: https://sgmljs.sgml.net/docs/html5.html

reply
Devasta
4 hours ago
[-]
It is heavily used today; it's everywhere in finance and banking. It's just not on the web, so browser and web devs are endlessly forced to re-implement shitty versions of things they could have had out of the box decades earlier.
reply
IshKebab
5 hours ago
[-]
I still think the reason XML failed is largely because it's a document markup language, not an object serialisation language, and 99% of the time you really want the latter.

You don't need attributes, you probably don't need namespaces, you probably do want at least basic types.

Look at this for example: https://docs.rs/serde-xml-rs/0.8.2/serde_xml_rs/#caveats

JSON solves all of that for serialisation. The only problem with JSON is it has ended up being used for configuration, and then you really need at least comments. I wish JSON5 was as well supported as JSON is.
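For instance, a strict JSON parser (Python's stdlib shown here; the config snippet is made up) rejects comments outright:

```python
import json

# Hypothetical config snippet -- the comment makes it invalid JSON:
config = '{"retries": 3, // how many attempts\n "timeout": 5}'

try:
    json.loads(config)
except json.JSONDecodeError:
    print("comments are not valid JSON")
```

JSON5 accepts both `//` comments and trailing commas, which is exactly what configuration files tend to need.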

reply
masklinn
6 hours ago
[-]
I've become ambivalent about Relax NG. I used it a bunch because I like(d) the model, the "compact" syntax is really quite readable, and it's a lot simpler than XML Schema.

However the error messages, at least when doing RNG validation via libxml2, are absolutely useless, so when you have a schema error, finding out why tends to be quite difficult. I also recall that allowing foreign content inside your document without validating it, while still validating your own vocabulary, is a bit of a hassle.

reply