I also wish people would stop calling every element-specific behavior HTML parsers exhibit "liberal and tag-soup"-like. Yes, WHATWG HTML does define error recovery rules, and HTML did introduce historic blunders to accommodate inline CSS and inline JS, but almost always what's being complained about is just SGML empty elements (aka HTML void elements) or tag omission (as described above) by folks not doing their homework.
[1]: https://sgmljs.sgml.net/docs/html5.html#tag-omission (see also XML Prague 2017 proceedings pp. 101ff)
I just wish browsers weren't so anal about making you load things from http://localhost instead of file:// directly. Someone ought to look into fixing the security issues of file:// URLs so browsers can relax about that.
Opus and I have made a couple of really cool internal tools for work. It's really great.
Apparently JavaScript got grandfathered in as ok for direct access!
https://marketplace.visualstudio.com/items?itemName=carsho.h...
So <div id="hello"> becomes accessible as window["hello"], which means you can just directly write hello.innerText = "Hi!".
Since this may conflict with any of the hundreds of other properties on window, it's generally not something that should be used.
Historically it wasn't too uncommon to see it, but since it doesn't work well with TypeScript, it's very rare now.
The DOM API may have been very messy at creation, but it is also very handy and powerful, especially for binding to a live programming visual environment with instant remote update capabilities.
If you mean full sandboxing of applications with a usable capability system, then yeah, someone ought to do that. But I wouldn't hold my breath; there's a reason why nobody has yet.
I think every dev should have a tools.TheirDomain.zzz where they put different tools they create. You can make so many static tools and I feel like everyone creates these from time to time when they are prototyping things. There are so many free options for static hosting and you can write bash deploy scripts so quickly with AI, so it's literally just ./deploy.sh to deploy. (I also recommend writing some reusable logic for saving to local storage/IndexedDB so it's even nicer.)
Mine for example is https://tools.carsho.dev (100% offline/static tools, no monetization)
fetch("file:///C:/Users/You/Documents/secrets.txt")
$ python -m http.server
This could easily be solved by some simple contract like "webgame.html can only access files in a webpage/ subdirectory," but the powers that be deemed such a thing not worth the trouble.
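For reference, the `python -m http.server` workaround can also be scripted so that only one directory is exposed. This is a minimal sketch, assuming the page lives in a `webpage/` subdirectory as in the hypothetical contract above; the helper name `serve_directory` and the default port are illustrative choices, not anything from the thread.

```python
import functools
import http.server
import threading


def serve_directory(directory: str, port: int = 8000) -> http.server.ThreadingHTTPServer:
    """Serve the files under `directory` on http://127.0.0.1:<port>/."""
    # SimpleHTTPRequestHandler accepts a `directory` keyword (Python 3.7+),
    # so requests are resolved relative to that directory.
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    # Bind to the loopback interface only; port 0 asks the OS for a free port.
    server = http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)
    # Serve from a background thread so the caller keeps control.
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Calling `serve_directory("webpage")` and opening http://127.0.0.1:8000/webgame.html gives the page an http:// origin; `server.shutdown()` stops it again.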
https://nvd.nist.gov/vuln/detail/CVE-2020-26870
https://sirre.al/2025/08/06/safe-json-in-script-tags-how-not...
https://bughunters.google.com/blog/5038742869770240/escaping...
None of those problems exist in XHTML.
> It probably didn't help that XHTML did not offer any new features over tag-soup HTML syntax.
which unfortunately reeks of exactly the kind of roundabout HTML criticism that is not so helpful IMO. We have to face the possibility that most HTML documents have already been written at this point, at least if you value text by humans.
The CVEs you're referencing are due to said historic blunders allowing inline JS or otherwise tunneling foreign syntax in markup constructs (mutation XSSs are only triggered by serialising and reparsing HTML as part of bogus sanitizer libs anyway).
If you look at past comments of mine, you'll notice I'm staunchly criticizing inline JS and CSS (should always be placed in external "resources") and go as far as saying CSS or other ad-hoc item-value syntax should not even exist when attributes already serve this purpose.
The remaining CVE is made possible by Hickson's overly liberal rules for what's allowed or needs escaping in attributes vs SGML's much stricter rules.
I like the flexibility of being able to make one file HTML apps with inline resources when I'm not generating code. But there should be better protections against including inline scripts in generated code unintentionally.
Since then I always write <body> explicitly even though it is optional.
I learned HTML quite late, when HTML 5 was already all the rage, and I never understood why the more strict rules of XML for HTML never took off. They seem so much saner than whatever soup of special rules and exceptions we currently have. HTML 5 was an opportunity to make a clear cut between legacy HTML and the future of HTML. Even though I don't have to, I strive to adhere to the stricter rules of closing all tags, closing self-closing tags and only using lower-case tag names.
Internet Explorer failing to support XHTML at all (which also forced everyone to serve XHTML with the HTML media type and avoid incompatible syntaxes like self-closing <script />), Firefox at first failing to support progressive rendering of XHTML, a dearth of tooling to emit well-formed XHTML (remember, those were the days of PHP emitting markup by string concatenation) and the resulting fear of pages entirely failing to render (the so-called Yellow Screen of Death), and a side helping of the WHATWG cartel^W organization declaring XHTML "obsolete". It probably didn't help that XHTML did not offer any new features over tag-soup HTML syntax.
I think most of those are actually no longer relevant, so I still kind of hope that XHTML could have a resurgence, and that the tag-soup syntax could be finally discarded. It's long overdue.
Meanwhile, in any other formal language (including JS and CSS!), the standard assumption is that syntax errors are fatal, the responsibility for fixing lies with the page author, but also that fixing those errors is not a difficult problem.
Why is this a problem for HTML - and only HTML?
The web owes its success to having low barriers to entry and very quickly became a mixture of pages hand coded by people who weren't programmers, content produced by CMS systems which included stuff the content author didn't directly control and weren't necessarily reliable at putting tags into the right place, and third party widgets activated by pasting in whatever code the third party had given you. And browsers became really good at attempting to render erroneous and ambiguous markup (and for that matter were usually out of date or plain bad at rigidly implementing standards).
There was a movement to serve XHTML as XML via the application/xhtml+xml MIME type but it never took off because browsers didn't do anything with it except loading a user-hostile error page if a closing tag was missed (or refusing to load it at all in the case of IE6 and older browsers), and if you wanted to do clever transformation of your source data, there were ways to achieve that other than formatting the markup sent to the browser as a subset of XML.
Your premise is not correct because you're not aware that other data formats also have parsers that accept malformed content. Examples:
- pdf files: many files with errors can be read by Adobe Acrobat. And PDF libraries for developers often replicate this behavior so they too can open the same invalid pdf files.
- zip files. 7-Zip and WinRAR can open some malformed zip files that don't follow the official PKZIP specification. E.g. 7-Zip has extra defensive code that looks for a bad 2-byte sequence that shouldn't be there and skips over it.
- csv files. MS Excel can read some malformed csv files.
- SMTP email headers: Mozilla Thunderbird, MS Outlook, etc can parse fields that don't exactly comply with RFC 822 -- make some guesses -- and then successfully display the email content to the user
The common theme to the above, including HTML... the Raw Content is more important than a perfectly standards-compliant file format. That's why parsers across various domains make best efforts to load the file even when it's not 100% free of syntax errors.
>Meanwhile, in any other formal language (including JS and CSS!), the standard assumption is that syntax errors are fatal,
Parsing invalid CSS is not a fatal error. Example of validating HTML/CSS in a job listings webpage at Monster.com : https://validator.w3.org/nu/?doc=https%3A%2F%2Fwww.monster.c...
It has CSS errors such as:
Error: CSS: background-color: none is not a background-color value. From line 276, column 212; to line 276, column 215
Error: CSS: padding: 8x is not a padding value.
Job hunters in the real world want to see the jobs because the goal is to get a paycheck. Therefore, a web browser that didn't show the webpage just because the author mistakenly wrote CSS "none" instead of "transparent" and "8x" instead of "8px" -- would be user hostile software.
At work we have to parse CSV files which often have mixed encoding (Latin-1 with UTF-8 in random fields on random rows), occasionally have partial lines (remainder of line just missing) and other interesting errors.
We also have to parse fixed-width flat files where fields occasionally aren't fixed-width after all, with no discernible pattern. Customer can't fix the broken proprietary system that spits this out so we have to deal with it.
And of course, XML files with encoding mismatch (because that header is just a fixed string that bears no meaning on the rest of the content, right?) or even mixed encoding. That's just par for the course.
Just some examples of how fun parsing can be.
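One way to cope with the mixed Latin-1/UTF-8 case is to decode field by field rather than line by line. This is a rough sketch, assuming fields never contain a comma or newline inside quotes (real files may well violate that); `decode_field` and `parse_mixed_csv` are hypothetical names, not from any library.

```python
def decode_field(raw: bytes) -> str:
    """Decode as UTF-8 when valid, otherwise fall back to Latin-1."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Latin-1 maps every byte to a code point, so this never fails.
        return raw.decode("latin-1")


def parse_mixed_csv(data: bytes, expected_fields: int) -> tuple[list[list[str]], list[int]]:
    """Return (rows, line numbers of lines that were too short)."""
    rows: list[list[str]] = []
    partial: list[int] = []
    for lineno, line in enumerate(data.splitlines(), start=1):
        if not line:
            continue
        # Split on the raw comma byte first, then decode each field on its own.
        fields = [decode_field(f) for f in line.split(b",")]
        if len(fields) < expected_fields:
            partial.append(lineno)  # remainder of the line is missing
            continue
        rows.append(fields)
    return rows, partial
```

The fallback is a guess: a Latin-1 byte sequence that happens to be valid UTF-8 will be decoded as UTF-8, so the result still needs spot checks.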
We could be more strict for new content, but why bother if you have to include the legacy parser anyway? And the HTML5 algorithm brings us most of the benefits (deterministic parsing) of a stricter syntax while still allowing the looseness.
Try going to any 1998 web page in a modern browser... It's generally so broken as to be unusable.
As well as every page telling me to install flash, most links are dead, most scripts don't run properly (vbscript!?), tls versions now incompatible, etc.
We shouldn't put much effort into backwards compatibility if it doesn't work in practice. The best bet to open a 1998 web page is to install IE6 in a VM, and everything works wonderfully.
In CSS, a syntax error isn't fatal. Most of the time, an unrecognized property causes that selector and all its properties to be ignored.
:is() and :where() support forgiving selector list [1].
Only the erroneous properties are ignored; the rest work normally.
[1]: https://drafts.csswg.org/selectors-4/#typedef-forgiving-sele...
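The "only the bad declaration is dropped" behaviour can be modelled roughly as follows. This is a toy sketch, not how any browser is implemented: the validators cover a handful of properties and single values only, and every name here (`VALIDATORS`, `parse_declarations`, and so on) is made up for illustration.

```python
import re

# Deliberately tiny, hypothetical validators; real engines check the full CSS grammar.
NAMED_COLORS = {"transparent", "black", "white", "red", "green", "blue"}


def is_color(value: str) -> bool:
    return value in NAMED_COLORS or re.fullmatch(r"#[0-9a-fA-F]{3}([0-9a-fA-F]{3})?", value) is not None


def is_length(value: str) -> bool:
    return re.fullmatch(r"0|\d+(\.\d+)?(px|em|rem|%)", value) is not None


VALIDATORS = {
    "background-color": is_color,
    "color": is_color,
    "padding": is_length,
    "margin": is_length,
}


def parse_declarations(block: str) -> tuple[dict[str, str], list[str]]:
    """Keep valid declarations, drop invalid ones, and never abort the whole block."""
    kept: dict[str, str] = {}
    dropped: list[str] = []
    for decl in block.split(";"):
        decl = decl.strip()
        if not decl:
            continue
        name, sep, value = decl.partition(":")
        name, value = name.strip().lower(), value.strip()
        check = VALIDATORS.get(name)
        if sep and check is not None and check(value):
            kept[name] = value
        else:
            dropped.append(decl)  # ignored; parsing continues with the next one
    return kept, dropped
```

Fed the two errors from the validator report plus two valid declarations, `parse_declarations("background-color: none; padding: 8x; color: red; margin: 4px")` keeps `color` and `margin` and drops the other two.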
Because HTML is a content language, and at any given time the main purpose of the main engines using it will be to access a large array of content that is older than the newest revision of the language, and anything that creates significant incompatibilities or forces complete rewrites of large bodies of work to incorporate new features in a standard is simply not going to be implemented as specified by the major implementers (it will either not be implemented at all, or will be modified), because it is hostile to what the implementations are used for.
Netscape started this. NCSA was in favor of XML style rules over SGML, but Netscape embraced SGML leniency fully and several tools of that era generated web pages that only rendered properly in Netscape. So people voted with their feet and went to the panderers. If I had a dollar for every time someone told me, “well it works in Netscape” I’d be retired by now.
Well, this is not entirely true: XML namespaces enabled attaching arbitrary data to XHTML elements in a much more elegant, orthogonal way than the half-assed solution HTML5 ended up with (the data-* attribute set), and embedding other XML applications like XForms, SVG and MathML (though I am not sure how widely supported this was at the time; some of this was backported into HTML5 anyway, in a way that later led to CVEs). But this is rather niche.
Original SGML was actually closer to markdown. It had various options to shorten and simplify the syntax, making it easy to write and edit by hand, while still having an unambiguous structure.
The verbose and explicit structure of xhtml makes it easier to process by tools, but more tedious for humans.
It’s kind of a huge deal that I can give a Markdown file of plain text content to somebody non-technical and they aren’t overwhelmed by it in raw form.
HTML fails that same test.
Previously its popularity was somewhat similar to RST.
People were just using other markup languages like rST.
Other attempts had already proven HTML to be a bad language for rough documentation. Someone then just needed to write a spec that was easy to implement and Markdown was that.
Especially for casual users of HTML.
“Always close your tags” is a simpler rule (and fewer rules, depending how you count) than “Close your tags, except possibly in situations A, B, C…”.
And markdown tables are harder to write than HTML tables. However, they are generally easier to read, unless there's a multi-line cell.
Because of the vast quantity of legacy HTML content, largely.
> HTML 5 was an opportunity to make a clear cut between legacy HTML and the future of HTML.
WHATWG and its living standard that W3C took various versions of and made changes to and called it HTML 5, 5.1, etc., to pretend that they were still relevant in HTML, before finally giving up on that entirely, was a direct result of the failure of XHTML and the idea of a clear cut between legacy HTML and the future of HTML. It was a direct reaction against the “clear cut” approach based on experience, not an opportunity to repeat its mistakes. (Instead of a clear break, HTML incorporated the “more strict rules of XML” via the XML serialization for HTML; for the applications where that approach offers value, it is available and supported and has an object model 100% compatible with the more common form, and they are maintained together rather than competing.)
Besides, at this point technologies like tree-sitter make editor integration a moot point: once tree-sitter knows how to parse it, the editor does too.
A p or li tag, at least when used and nested properly, logically ends where either the next one begins or the enclosing block ends. Closing li also creates the opportunity for nonsensical content inside of a list but not in any list item. Of course all of these corner cases are now well specified because people did close their tags sometimes.
While this is true I’ve never liked it.
<p>blah<p>blah2</p>
Implies a closing </p> in the middle. But
<p>blah<span>blah2</p>
Does not. Obviously with the knowledge of the difference between what span and p represent I understand why but in terms of pure markup it’s always left a bad taste in my mouth. I’ll always close tags whenever relevant even if it’s not necessary.
In practice, modern HTML splits the difference with rigorous and well defined but not necessarily intuitive semantics.
So we'll add another syntax for browsers to handle.
But.
The future of HTML will forever contain content that was first handtyped in Notepad++ in 2001 or created in Wordpress in 2008. It's the right move for the browser to stay forgiving, even if you have rules in your personal styleguide.
XHTML came out at a time when Internet Explorer, the most popular browser, was essentially frozen apart from security fixes because Microsoft knew that if the web took off as a viable application platform it would threaten Windows' dominance. XHTML 1.0 Transitional was essentially HTML 4.01 except that if it wasn't also valid XML, the spec required the browser to display a yellow "parsing error" page rather than display the content. This meant that any "working" XHTML site might not display because the page author didn't test in your browser. It also meant that any XHTML site might break at any time because a content writer used a noncompliant browser like IE 6 to write an article, or because the developers missed an edge case that causes invalid syntax.
XHTML 2.0 was a far more radical design. Because IE 6 was frozen, XHTML 2.0 was written with the expectation that no current web browser would implement it, and instead was a ground-up redesign of the web written "the right way" that would eventually entirely replace all existing web browsers. For example, forms were gone, frames were gone, and all presentational elements like <b> and <i> were gone in favor of semantic elements like <strong> and <samp> that made it possible for a page to be reasoned about automatically by a program. This required different processing from existing HTML and XHTML documents, but there was no way to differentiate between "old" and "new" documents, meaning no thought was given to adding XHTML 2.0 support to browsers that supported existing web technologies. Even by the mid-2000s, asking everyone to restart the web from scratch was obviously unrealistic compared to incrementally improving it. See here for a good overview of XHTML 2.0's failure from a web browser implementor's perspective: https://dbaron.org/log/20090707-ex-html
It's still a little annoying to put <p> before each paragraph, but not by that much. By contrast, once you start adding closing tags, you're much closer to computer code.
I'm not sure if that makes sense but it's the way I think about it.
Any time I have to write Markdown I have to open a cheat sheet for reference. With HTML, which I have used for years, I just write it.
> On void elements, [the trailing slash] does not mark the start tag as self-closing but instead is unnecessary and has no effect of any kind. For such void elements, it should be used only with caution — especially since, if directly preceded by an unquoted attribute value, it becomes part of the attribute value rather than being discarded by the parser.
It was mainly added to HTML5 to make it easier to convert XHTML pages to HTML5. IMO using the trailing slash in new pages is a mistake. It makes it appear as though the slash is what closes the element when in reality it does nothing and the element is self-closing because it's part of a hardcoded set of void elements. See here for more information: https://github.com/validator/validator/wiki/Markup-%C2%BB-Vo...
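The "hardcoded set of void elements" point can be illustrated with a small model. This is a sketch only: it uses Python's `html.parser` as a tokenizer and a hand-written nesting rule, it ignores SVG/MathML foreign content (where the slash does count) and all the implied-end-tag rules, and `OutlineBuilder` and `outline` are hypothetical names.

```python
from html.parser import HTMLParser

# Void elements as listed in the HTML living standard.
VOID_ELEMENTS = {
    "area", "base", "br", "col", "embed", "hr", "img", "input",
    "link", "meta", "source", "track", "wbr",
}


class OutlineBuilder(HTMLParser):
    """Toy tree builder: nesting is decided by the void set, never by '/>'."""

    def __init__(self) -> None:
        super().__init__()
        self.open_elements: list[str] = []
        self.outline: list[tuple[int, str]] = []  # (depth, tag) in document order

    def handle_starttag(self, tag, attrs):
        self.outline.append((len(self.open_elements), tag))
        if tag not in VOID_ELEMENTS:
            self.open_elements.append(tag)

    def handle_startendtag(self, tag, attrs):
        # html.parser reports "<div />" here; the HTML syntax ignores the slash,
        # so treat it exactly like a plain start tag.
        self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        if tag in self.open_elements:
            # Pop back to (and including) the matching open element.
            while self.open_elements.pop() != tag:
                pass


def outline(markup: str) -> list[tuple[int, str]]:
    builder = OutlineBuilder()
    builder.feed(markup)
    builder.close()
    return builder.outline
```

Under this model `outline("<div /><img />")` puts the `img` one level inside the `div`, because `div` is not in the void set and the slash changed nothing.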
The third way of a bare tag is where the confusion comes from.
Now, we can discuss whether we should optimize for the unfamiliar reader, and whether the illusion of actual meaning that the trailing slash gives in HTML5 can be harmful.
I would note that exactly like trailing slashes, indentation doesn't mean anything for the parser in C-like languages and can be written misleadingly, yet we do systematically use it, even when no unfamiliar reader is expected.
At this point, writing a slash or not and closing all the tags is a coding style discussion.
Now, maybe someone writing almost-XHTML (closing all tags, putting trailing slashes, quoting all the attributes) should go all the way and write actual XHTML with the actual XHTML content type and benefit from the strict parser catching potential errors that can backfire and that nobody would have noticed with the HTML 5 parser.
Because browsers close some tags automatically. And if your closing tag is wrong, it'll generate an empty element instead of being ignored. Without even emitting a warning in the developer console. So by closing tags you're risking introducing very subtle DOM bugs.
If you want to close tags, make sure that your building or testing pipeline ensures strict validation of produced HTML.
In that example, the image could be part of the first paragraph, as it is there, or if I moved the second <p> before the <img> it would be part of the second. But if I want neither, do I not have to close the first paragraph?
Here is a demo of what I mean on this random html paste site: https://htmlbin.online/closetagdemo
I don't know what "not required" means, but it makes a difference with <p> at least in my opinion. I think the author meant that if the succeeding element is of the same type, you don't need to close the previous one.
But even then, this is not a good feature. Browsers aren't the only things processing HTML content; any number of tools, or even human readers, can get confused.
Netscape Navigator did, in fact, reject invalid HTML. Then along came Internet Explorer and chose “render invalid HTML dwim” as a strategy. People, my young naive self included, moaned about NN being too strict. NN eventually switched to the tag soup approach. XHTML 1.0 arrived in 2000, attempting to reform HTML by recasting it as an XML application. The idea was to impose XML’s strict parsing rules: well-formed documents only, close all your tags, lowercase element names, quote all attributes, and if the document is malformed, the parser must stop and display an error rather than guess. XHTML was abandoned in 2009. When HTML5 was being drafted in 2004-onwards, the WHATWG actually had to formally specify how browsers should handle malformed markup, essentially codifying IE’s error-recovery heuristics as the standard.
Leaving out closing tags is possible when the parsing is unambiguous. E.g. <p>foo<p>bar is unambiguous because p elements do not nest, so they are closed automatically by the next p.
The question about invalid HTML is a separate issue. E.g. you can’t nest a p inside an i according to the spec, so how does a browser render that? Or a lexical error like illegal characters in a non-quoted attribute value.
This is where it gets tricky. Render anyway, skip the invalid HTML, or stop rendering with an error message? HTML did not specify what to do with invalid input, so either is legal. Browsers chose to go with the “render anyway” approach, but this led to different outputs in different browsers, since it wasn’t agreed upon how to render invalid HTML.
The difference between Netscape and IE was that Netscape in more cases would skip rendering invalid HTML, where IE would always render the content.
The oldest public HTML documentation there is, from 1991, demonstrates that <li>, <dt>, and <dd> tags don't need to be closed! And the oldest HTML DTD, from 1992, explicitly specifies that these, as well as <p>, don't need closing. Remember, HTML is derived from SGML, not XML; and SGML, unlike XML, allows for the possibility of tags with optional close. The attempt to make HTML more XML-like didn't come until later.
This is clear in Tim Berners-Lee's seminal, pre-Netscape "HTML Tags" document [0], through HTML 4 [4] and (as you point out) through the current living standard [5].
[0] https://www.w3.org/History/19921103-hypertext/hypertext/WWW/...
[4] https://www.w3.org/TR/html401/intro/sgmltut.html#h-3.2.1
[5] https://html.spec.whatwg.org/multipage/syntax.html#optional-...
Because table layout was common, a missing </table> was a common error that resulted in a blank page in NN. That was a completely unintentional bug.
Optional closing tags were inherited from SGML, and were always part of HTML. They're not even an error.
Around 2000, I was meeting with Tim Berners-Lee, and I mentioned I'd been writing a bunch of Web utility code. He wanted to see, so I handed him some printed API docs I had with me. (He talked and read fast.)
Then I realized he was reading the editorializing in my permissive parser docs, about how browser vendors should've put a big error/warning message on the window for invalid HTML.
Which suddenly felt presumptuous of me, to be having opinions about Web standards, right in front of Tim Berners-Lee at the time.
(My thinking with the prominent warning message that every visitor would see, in mid/late-'90s, was that it would've been compelling social pressure at the time. It would imply that this gold rush dotcom or aspiring developer wasn't good at Web. Everyone was getting money in the belief that they knew anything at all about Web, with little way to evaluate how much they knew.)
<p>
text1
<p>
text2
</p>
</p>
edit: Indeed, it creates three: the </p> seems to create an empty paragraph tag. Not the first time I've been surprised by tag soup rules.
Why?
> You may think that's invalid HTML, but browser will parse it and won't indicate any kind of error.
It isn’t an opinion, it literally is invalid HTML.
What you’re responding to is an assumption that I was suggesting browsers couldn’t render that. Which isn’t what I claimed at all. I know full well that browsers will gracefully handle incorrect HTML, but that doesn’t mean that the source is magically compliant with the HTML specification.
Because the second open p-tag closes the first p-tag and then the last closing p has no matching starting p-tag and creates one thus resulting in 3 p-elements.
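The two rules described here can be modelled in a few lines. This is a toy sketch that only tracks <p> (no other elements, scopes, or nesting contexts), using Python's `html.parser` purely as a tokenizer; `ParagraphCounter` and `count_paragraphs` are hypothetical names, and a real HTML5 parser does far more.

```python
from html.parser import HTMLParser


class ParagraphCounter(HTMLParser):
    """Toy model of two tree-construction rules for <p>; not a full parser."""

    def __init__(self) -> None:
        super().__init__()
        self.p_open = False              # is a <p> element currently open?
        self.paragraphs: list[str] = []  # text of each <p> element created

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            # A <p> start tag implicitly closes a <p> that is still open,
            # then a new element is created.
            self.p_open = True
            self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag == "p":
            if not self.p_open:
                # A stray </p> behaves as if "<p></p>" had been written.
                self.paragraphs.append("")
            self.p_open = False

    def handle_data(self, data):
        if self.p_open:
            self.paragraphs[-1] += data.strip()


def count_paragraphs(markup: str) -> list[str]:
    parser = ParagraphCounter()
    parser.feed(markup)
    parser.close()
    return parser.paragraphs
```

For the markup in question, `count_paragraphs("<p>\ntext1\n<p>\ntext2\n</p>\n</p>")` yields three entries: the two text paragraphs and one empty paragraph produced by the final `</p>`.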
> It isn’t an opinion, it literally is invalid HTML.
the only "invalid" part is the last closing p.
I don't know why. Try it out. That's the way browsers are coded.
> It isn’t an opinion, it literally is invalid HTML.
It matters not. You're writing HTML for browser to consume, not for validator to accept. And most webpages are invalid HTML. This very HN page contains 412 errors and warnings according to the W3C validator, so the whole point of HTML validity is moot.
I’m not saying you’re wrong, but I’d need more than that to be convinced. Sorry.
> It matters not. You're writing HTML for browser to consume, not for validator to accept.
It matters because you’re making a strawman argument.
We weren’t discussing what a browser can render. We were discussing the source code.
So your comment wasn’t a rebuttal of mine. It was a related tangent or addition.
So basically my point is:
1. You can avoid closing some tags, letting the browser close tags for you. It won't do any harm.
2. You can choose to explicitly close all tags. It won't do anything for valid HTML, but it'll introduce subtle and hard to find DOM bugs by adding empty elements.
So you're trying to improve HTML source readability at the risk of introducing subtle bugs.
If you want to do that, I'd recommend implementing HTML validation in your build or test pipeline at least.
Another alternative is to use HTML comments to close tags, as this closing tag is supposed to be documentation-only and won't be used by the browser in proper code.
You posted a terse comment with some HTML. I responded specifically about that comment and HTML. And you’re now elaborating on things as a rebuttal to my comment despite the fact that wasn’t the original scope of my comment.
Another example of that is how you’ve quoted my reply to the 2 vs 3 elements, and then answered a completely different question (one I didn’t even ask).
I don’t think you’re being intentionally obtuse but it’s still a very disingenuous way to handle a discussion.
I'm not a web programmer, but shouldn't one program against the specified interface instead of some edge case behavior of an implementation?
It doesn't make the code valid according to the specifications.
So I think your argument here is tough to take at face value. It feels a lot more like you’re arguing personal preference as fact.
Though if a linter is formatting the whole codebase on its own in a homogeneous way, and someone else will deal with the added parsing complexity, that might feel okayish to me too.
Generally speaking, the less clutter the better. A bit like with a js codebase which is semicolon free where possible.
For a pleasant reading and writing experience, HTML in a simple text editor is very low quality. Pug, for example, brings far less clutter, though mandatory space indentation could be avoided with some alternative syntactic choices.
There being nesting is just implied by the closing tags and indentation. But it is not actually there. I think this is the point of the example: Adding the closing tags just confuses the reader, by implying nesting that is not actually there, and even introduces a third empty paragraph. It might be better left out entirely.
The syntax is invalid, but that's because the final </p> has no opening <p> that it can close.
I think this is the point of the example, afaiui: The closing tags don’t clarify anything, quite the contrary, actually. They serve only to confuse the reader.
<large><li></large> item text
to get a bigger bullet on a list item which worked fine in Netscape but broke other browsers (and since I was on OS/2 at the time, it was an issue for me).
Really, in 2025 people should just write XHTML and better yet, shouldn’t be generating HTML by hand at all except for borderline cases not handled by their tools.
As for generating all HTML, that's simply not possible given the current state of (open-source, at least) WYSIWYG HTML editors.
Some tags do require ending tags, others do not. Personally I find it hard to remember which ones, so I just close things out of caution. That way you’re always spec-correct.
That said, your linter is going to drive you crazy if you don't close tags, no?
I write virtually zero HTML anymore, but the one time this sort of thing comes up is in writing PR descriptions in GitHub using Markdown. Sometimes I want to add a <br> or two for space. I guess I’ve never stopped to notice that I never close those tags after adding them, or wondered why in my head it makes sense not to!
For example, I generate numbered lists of URLs something like
<li><a href=http://example.com>http://example.com</a>
This is for a text-only browser.
If I am viewing in a graphical browser I wrap the lists in <pre> tags.
I don't think I've ever closed <br> tags
Correction: there was also the issue of Ä and Ö. Those were &Auml; and &Ouml; I think.
Payload size is a moot point given gzip.
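The gzip claim is easy to measure. Here is a quick sketch on a synthetic 1000-item list (made-up content, so real pages will give different numbers) comparing raw and compressed sizes with and without `</li>`:

```python
import gzip


def sizes(markup: str) -> tuple[int, int]:
    """Return (raw byte count, gzip-compressed byte count)."""
    raw = markup.encode("utf-8")
    return len(raw), len(gzip.compress(raw))


# Synthetic content: 1000 list items that differ only in their number.
items = [f"item number {i}" for i in range(1000)]
with_close = "<ul>\n" + "".join(f"<li>{t}</li>\n" for t in items) + "</ul>\n"
without_close = "<ul>\n" + "".join(f"<li>{t}\n" for t in items) + "</ul>\n"

raw_a, gz_a = sizes(with_close)
raw_b, gz_b = sizes(without_close)
print(f"raw:  {raw_a} vs {raw_b} bytes (difference {raw_a - raw_b})")
print(f"gzip: {gz_a} vs {gz_b} bytes (difference {gz_a - gz_b})")
```

The raw difference is exactly 5 bytes per item; the compressed difference is what actually goes over the wire, and repeated closing tags are about the most compressible input there is.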
But honestly no answer to "what does the browser do with this sort of thing" fits into an HN comment anymore. I'm glad there's a standard, but there's a better branch of the multiverse where the specification of what to do with bad HTML was written from the beginning and is much, much simpler.
https://developer.mozilla.org/en-US/docs/Web/HTML/Reference/...
I hand write my HTML sometimes, and in those cases it’s often very basic documents consisting of maybe an outer container div, a header and a nav with a ul of li for the navigation items and then an inner container div and maybe an article element, and then the contents are mostly p and figure elements and various level headings.
In this case, there is no mental overhead of omitting closing li and closing p at the end of the line, and I omit them because I am allowed to and it’s still readable and fine.
You should close your tags. It’s good hygiene. It helps IDEs help you. But. Trust me, you do not want the browser enforcing it at runtime, unless your idea of fun is end users getting helpful error messages like an otherwise blank screen saying “Invalid syntax”.
For fun, imagine that various browsers are not 100.00% compatible (“Inconceivable!”), so that it wasn’t possible to write HTML that every browser agreed was completely valid. Now it’s guaranteed that some of your users will get the error page, even when you’re sure your page is valid.
Conceptually, XHTML and its analogs are better. In practice, they’re much, much worse.
Also it annoys me when people are still closing tags with '/>'.
Some rules of thumb, perhaps:
— Do not omit if it is a template and another piece of HTML is included in or after this tag. (The key fact, as always, is that we all make errors sometimes—and omitting a closing tag can make an otherwise small markup error turn your tree into an unrecognisable mess.)
— Remember, the goal in the first place is readability and improved SNR. Use it only if you already respect legibility in other ways, especially the lower-hanging fruit like consistent use of indentation.
— Do not omit if it takes more than a split-second to get it. (Going off the HTML spec, as an example, you could have <a> and <p> as siblings in one container, and in that case if you don’t close some <p> it may be non-obvious if an <a> is phrasing or flow content.)
The last thing you want is to require the reader of your code to be more of an HTML parser than they already have to be.
For me personally this makes omitting closing tags OK only in simpler hand-coded cases with a lot of repetition, like tables, lists, definition lists (often forgotten), and obviously void elements.
<nav id=main-nav>
<ul>
<li><a href="/">Home</a>
<li><a href="/hamburgers/">Hamburgers</a>
<li><a href="/sausages/">Sausages</a>
</ul>
</nav>
Laziness doesn't play a role. This isn't XML, where you need to repeat yourself over and over again, nor is it abusing a bug in the rendering logic; it's following the definition of the markup language you're writing content in.
If you're not too familiar with the HTML language then it's always a safe bet to close your tags, of course.
If you don't close your <p> and <li> tags, you risk accidentally having content in the wrong place.
It's something to avoid because it can have bad consequences, not because it (somehow?) makes you a bad person.
(This is especially relevant with "void" tags. E.g. if someone wrote "<img> hello </img>" then the "hello" is not contained in the tag. You could use the self closing syntax to make this more obvious -- Edit: That's bad advice, see below.)
e.g. how do you think a browser will interpret this markup?
<div />
<img />
A lot of people think it ends up like this (especially because JSX works this way):
<div></div>
<img>
but it's actually equal to this:
<div>
<img>
</div>
<table>
<tr> <td> A1 <td> B1 <td> C1
<tr> <td> A2 <td> B2 <td> C2
<tr> <td> A3 <td> B3 <td> C3
</table>
is valid and reads better than if the row and data elements were closed (and put on separate lines, because it would be too much noise otherwise). Of course the whitespace is different, if that matters for some reason. For a 3x3 table, that is 5 lines vs ~15 lines.
The problem is when you have long cells that you’d normally word wrap inside the cell: everything else ends up misaligned in your markup. Or when you need to add styling to text in a cell, and suddenly it’s unreadable again. Or when there are more than a handful of columns, causing each row to word wrap inside your IDE, etc.
I think it makes far more sense to just acknowledge that tables are going to be ugly, compose them elsewhere, and then export them to your markup language following that language’s specification strictly.
In fact, most web browsers now automatically insert these closing tags for the user.
This feature has been around for many years now.
However, I have found that many organizations still require that the closing tags be included explicitly.
I am curious how other organizations decide when "the spec allows it" is a good enough reason to leave out the closing tags.
At what point does this cross from something developers merely allow to something considered technical debt?
Have you ever used a feature of the specification that later caused you a problem?
<p><p></p>
Should the second <p> be nested or not?
Just because it worked on the one browser you tested it on doesn't mean it has always worked that way, or that it will always work that way in the future...
Every browser treats HTML etc. differently... I've run into CSS issues before on Chrome for Android, because I was using Chrome for desktop as a reference.
You'd think they should be the same because they come from the same heritage, but no...
All browsers have worked this way for decades. It’s standard HTML that has been in widespread use since the beginning of the web. The further back you go, the more normal it was to write HTML in this style. You can see in this specification from 1992 that <p> and <li> don’t have closing tags at all:
https://info.cern.ch/hypertext/WWW/MarkUp/Tags.html
Maybe there were obscure browsers that had bugs relating to this back in the mid 90s, but I don’t recall any from the late 90s onwards. Can you name a browser released this millennium that doesn’t understand optional closing tags?
Literally saving four bytes.
aaaaa<b>aaaa<i>aaaaa</b>aaaa</i>aaaaa
I just tried this!
<table>
<tr><td>aaaaa<td>bbb
<tr><td>ccccccccccc<td>dddddddddd
<tr><td>eeeeeeeeeeeeeeeee
</table>
Which works and is much cleaner than the usual table tag soup. It makes sense, as <td> and <tr> tags have no meaning on their own.
(But it might be better if you make a habit of doing so.)
Or am I pointing out that closing tags is a human social issue, with aspects ranging from practical & reasonable, to ridiculous & widely exploited?
> XHTML, being based on XML as opposed to SGML, is notorious for being author-unfriendly due to its strictness
This strictness is a moot point. Most editors will autocomplete the closing tag for you, so it's hardly "unfriendly". Besides, if anything, closing tags are reader-friendly (which includes the author), since they make it clear when an element ends. In languages that don't have this, authors often add a comment like `// end of ...` to clarify this. The article author even acknowledges this in some of their examples ("explicit end tags added for clarity").
But there were other potential benefits of XHTML that never came to pass. A strict markup language would make documents easier to parse, and we wouldn't have ended up with the insanity of parsing modern HTML, which became standardized. This, in turn, would have made it easier to expand the language, and integrate different processors into the pipeline. Technologies like XSLT would have been adopted and improved, and perhaps we would have already had proper HTML modules, instead of the half-baked Web Components we have today. All because browser authors were reluctant to force website authors to fix their broken markup. It was a terrible tradeoff, if you ask me.
So, sure, feel free to not close HTML tags if you prefer not to, and to "educate" everyone that they shouldn't either. Just keep it away from any codebases I maintain, thank you very much.
To be fair, I don't mind not closing empty elements, such as `<img>` or `<br>`. But not closing `<p>` or `<div>` is hostile behavior, for no actual gain.
<br/>
This non-closing talisman means that <div/> or <script/> are not closed, and will mess up the nesting of elements.
> if the element is one of the void elements, or if the element is a foreign element, then there may be a single U+002F SOLIDUS character (/)
If you're going to be pedantic, at least be correct about it.
[1]: https://html.spec.whatwg.org/multipage/syntax.html#start-tag...
> On void elements, it does not mark the start tag as self-closing but instead is unnecessary and has no effect of any kind. For such void elements, it should be used only with caution — especially since, if directly preceded by an unquoted attribute value, it becomes part of the attribute value rather than being discarded by the parser.
(The void elements are listed here: https://developer.mozilla.org/en-US/docs/Glossary/Void_eleme... )
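To make the slash behaviour concrete, here is a rough Python sketch. It is a toy, not a browser: it uses the standard library's html.parser only as a tokenizer, and the class name SlashIgnoringOutline and the helper slash_outline are made up for this comment. By default html.parser treats '<x />' XML-style (start plus end); the override below drops that so the slash is ignored, as the quoted spec text describes for HTML elements. Foreign content (SVG, MathML), where the slash does count, is not modelled.

```python
from html.parser import HTMLParser

# Void elements per the HTML standard: no children, no end tag.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class SlashIgnoringOutline(HTMLParser):
    """Toy outline builder that ignores a trailing '/' in start tags."""

    def __init__(self):
        super().__init__()
        self.depth = 0   # how many non-void elements are currently open
        self.lines = []  # indented outline of the elements seen

    def handle_starttag(self, tag, attrs):
        self.lines.append("  " * self.depth + f"<{tag}>")
        if tag not in VOID:
            self.depth += 1  # stays open until an explicit end tag

    def handle_startendtag(self, tag, attrs):
        # html.parser reports '<x />' here; treat it as a plain start tag,
        # i.e. the slash has no effect.
        self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        if tag not in VOID and self.depth > 0:
            self.depth -= 1


def slash_outline(markup):
    """Return the indented element outline for a piece of markup."""
    parser = SlashIgnoringOutline()
    parser.feed(markup)
    parser.close()
    return parser.lines
```

With this, slash_outline("<div />\n<img />") gives ['<div>', '  <img>'] (the img ends up inside the still-open div), while slash_outline("<div></div>\n<img>") gives ['<div>', '<img>'].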
If that's a comment you get, write better code. It does not matter to me whether closing p-tags is mandatory or optional. If you don't do it, I don't want you working on the same code base as me.
This kind of knowledge makes for fun blog posts, but if people direct this kind of comment at me, they're obviously using their knowledge just to patronize and lecture people.
This may have been relevant 9 years ago, but today, just pick an auto-formatter like prettierjs and have it close these tags for you.
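1. The self-closing syntax does not exist for HTML elements in HTML5: a trailing slash in a start tag is ignored there (it only has an effect on foreign elements such as SVG and MathML, and directly after an unquoted attribute value it becomes part of that value). It's therefore recommended to avoid this syntax, i.e. write <br> instead of <br />. For details and a list of void elements, see https://developer.mozilla.org/en-US/docs/Glossary/Void_eleme...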
2. It's not mandatory to close tags when the parser can infer where they end. E.g. a paragraph cannot contain a block-level element, so <p>a<div>b</div> is the same as <p>a</p><div>b</div>. It depends on the context, but writing an explicit end tag is usually less error-prone.
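For point 2, a small Python sketch of how implied end tags can be modelled. This is a heavily simplified toy, not the tree-construction algorithm from the HTML standard: the CLOSES and BOUNDARY tables, the ToyTreeBuilder class and the implied_outline helper are all invented here, cover only a handful of elements, and skip things a real parser does (such as inserting <tbody> or handling misnested inline tags).

```python
from html.parser import HTMLParser

# A start tag (key) implicitly closes any of these open elements (value).
# Small illustrative subset only.
CLOSES = {
    "p": {"p"}, "div": {"p"}, "ul": {"p"}, "ol": {"p"}, "table": {"p"},
    "li": {"li"},
    "td": {"td", "th"}, "th": {"td", "th"},
    "tr": {"td", "th", "tr"},
}
# The search for something to close stops at these containers.
BOUNDARY = {"ul", "ol", "table", "td", "th"}
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class ToyTreeBuilder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.open = []   # stack of currently open tag names
        self.lines = []  # indented outline of the resulting tree

    def _emit(self, text):
        self.lines.append("  " * len(self.open) + text)

    def _close_implied(self, tag):
        closable = CLOSES.get(tag, set())
        found = True
        while found:  # repeat: <tr> may close a <td> and then a <tr>
            found = False
            for i in range(len(self.open) - 1, -1, -1):
                if self.open[i] in closable:
                    del self.open[i:]  # close it and everything inside it
                    found = True
                    break
                if self.open[i] in BOUNDARY:
                    break

    def handle_starttag(self, tag, attrs):
        self._close_implied(tag)
        self._emit(f"<{tag}>")
        if tag not in VOID:
            self.open.append(tag)

    def handle_endtag(self, tag):
        if tag in self.open:
            # Close the innermost matching element and its descendants.
            idx = len(self.open) - 1 - self.open[::-1].index(tag)
            del self.open[idx:]

    def handle_data(self, data):
        if data.strip():
            self._emit(data.strip())


def implied_outline(markup):
    """Return an indented outline of the tree the toy builder produces."""
    builder = ToyTreeBuilder()
    builder.feed(markup)
    builder.close()
    return builder.lines
```

Under these simplified rules, implied_outline("<p>a<div>b</div>") and implied_outline("<p>a</p><div>b</div>") both return ['<p>', '  a', '<div>', '  b'], and implied_outline("<p><p></p>") returns ['<p>', '<p>']: two sibling paragraphs, not a nested one.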