I ran into this because I have a little userscript I inject everywhere that helps me copy text in hovered elements (not just links). It does:
[...document.querySelectorAll(":hover")].at(-1)
to grab the innermost hovered element. It works fine on standards-mode pages, but it's flaky on quirks-mode pages.
Question: is there any straightforward & clean way as a user to force a quirks-mode page to render in standards mode? I know you can do something like:
document.write("<!DOCTYPE html>" + document.documentElement.innerHTML);
but that blows away the entire document & introduces a ton of problems. Is there a cleaner trick?
At quick glance, it looks like they're still using the same CSS that was made public ~13 years ago:
https://github.com/wting/hackernews/blob/5a3296417d23d1ecc90...
Modern design trends are going backwards. Tons of spacing around everything, super low information density, designed for touch first (i.e. giant hit-targets), and tons of other things that were considered bad practice just ten years ago.
So HN has its quirks, but I'd take what it is over what most 20-something designers would turn it into. See old.reddit vs. new.reddit, or even their app.
perhaps try using a user agent that remembers your settings? e.g. firefox
To test, try setting your browser's font size larger or smaller and note which websites update and which do not. And besides helping to support different user preferences, it's very useful for accessibility.
[1] After testing, it looks like the "Reply" and "Help" links respect large browser font sizes.
None of the "content needs white space and large fonts to breathe" stuff or having to click to see a reply like on other sites. That just complicates interactions.
And I am posting this on an iPhone SE while my sight has started to degrade from age.
HN is the only site I have to increase the zoom level, and others below are doing the same thing as me. But it must be us with the issues. Obviously PG knew best in 2006 for decades to come.
16px is just massive.
HN has a good amount of white space. Much more would be too much, much less would be not enough.
1920x1080 24" screen here, .274mm pitch which is just about 100dpi. Standard text size in HN is also about 2mm across, measured by the simple method of holding a ruler up to the screen and guessing.
If you can't read this, you maybe need to get your eyes checked. It's likely you need reading glasses. The need for reading glasses kind of crept up on me because I either work on kind of Landrover-engine-scale components, or grain-of-sugar-scale components, the latter viewed down a binocular microscope on my SMD rework bench and the former big enough to see quite easily ;-)
On what devices (or browsers?) does it render "insanely small" for you? CSS pixels are not physical pixels; they're scaled to 1/96th of an inch on desktop computers, and for smartphones etc. the scaling takes into account the shorter typical distance between your eyes and the screen (to make the angular size roughly the same), so one CSS pixel can span multiple physical pixels on a high-PPI display. A font size specified in px should look about the same on various devices. HN's font size feels the same to me on my 32" 4k display (137 PPI), my 24" display with 94 PPI, and on my smartphone (416 PPI).
It has been changed since then for sure though. A couple of years ago the mobile experience was way worse than what it is today, so something has clearly changed. I think also some infamous "non-wrapping inline code" bug in the CSS was fixed, but can't remember if that was months, years or decades ago.
On another note, they're very receptive to emails, and if you have specific things you want fixed, and maybe even ideas on how to do in a good and proper way, you can email them (hn@ycombinator.com) and they'll respond relatively fast, either with a "thanks, good idea" or "probably not, here's why". That has been my experience at least.
On mobile it’s fine, on Mac with a Retina display it’s fine; the only one where it isn’t is a 4K display rendering at native resolution - for that, I have my browser set to 110% zoom, which is perfect for me.
So I have a workaround that’s trivial, but I can see the benefit of not needing to do that.
There is no such thing as a reasonable default size if we stop calibrating to physical dimensions. If you choose to use your phone at a scaling where what is supposed to be 1" is 0.75" then that's on you, not on the website to up the font size for everyone.
There's a trend to make fonts bigger but I never understood why. Do people really have trouble reading it?
I prefer seeing more information at the same time, when I used Discord (on PC), I even switched to IRC mode and made the font smaller so that more text would fit.
I'm not asking for some major, crazy redesign. 16px is the browser default and most websites aren't using tiny, small font sizes like 12px any longer.
The only reason HN is using it is because `pg` made it that in 2006, at a time when it was normal and made sense.
Maybe the issue is not scaling according to DPI?
OTOH, people with 30+ inch screens probably sit a bit further away to be able to see everything without moving their head so it makes sense that even sites which take DPI into account use larger fonts because it's not really about how large something is physically on the screen but about the angular size relative to the eye.
I don't really have to do the same elsewhere, so I think the 12px font might be just a bit too small for modern 4k devices.
The better option is to create and hold a reference to the old nodes (as easy as `var old = document.documentElement`) and then after blowing everything away with document.write (with an empty* html element; don't serialize the whole tree), re-insert them under the new document.documentElement.
* Note that your approach doesn't preserve the attributes on the html element; you can fix this by either pro-actively removing the child nodes before the document.write call and rely on document.documentElement.outerHTML to serialize the attributes just as in the original, or you can iterate through the old element's attributes and re-set them one-by-one.
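A minimal sketch of that re-parenting trick, assuming a userscript context (untested across every engine; document.open() resets the same Document object, so the old nodes can be moved back without adoption):

  const oldRoot = document.documentElement;

  // Rewrite the document with a doctype and an *empty* html element,
  // which re-opens it in standards mode.
  document.open();
  document.write("<!DOCTYPE html><html></html>");
  document.close();

  const newRoot = document.documentElement;

  // Re-apply the attributes from the original <html> element one by one.
  for (const attr of Array.from(oldRoot.attributes)) {
    newRoot.setAttribute(attr.name, attr.value);
  }

  // Move the original <head>/<body> (and everything in them) under the new root.
  newRoot.replaceChildren(...oldRoot.childNodes);

  console.log(document.compatMode); // "CSS1Compat" = standards mode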
No need to have the default be compatible with a dead browser.
Further thoughts: I just read the MDN quirks page, and perhaps I will start shipping Content-Type: application/xhtml+xml, as I don't really like putting the doctype in. It is the one screwball tag and requires special casing in my otherwise elegant HTML output engine.
<div id="root"></div>
<script src="bundle.js"></script>
but I feel there is a last tag missing: <main>...</main>
that will ensure screen readers skip all your page "chrome" and make life much easier for a lot of folks. As a bonus, mark any navigation elements inside main using <nav> (or role="navigation"). Navigation should come early in the document and in tab order.

Screen readers have shortcuts for quickly jumping around the page and skipping things; it's a normal part of the user experience. Some screen readers and settings de-prioritize navigation elements in favor of reading headings quickly, so if you don't hear the navigation right away, it's not necessarily a bug, and there's a shortcut to get to it.

The most important thing to test is whether the screen reader says what you expect it to for dynamic and complex components, such as buttons and forms, e.g. does it communicate progress, errors, and success? It's usually pretty easy to implement, but this is where many apps mess up.
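Putting the <main>/<nav> advice together, a bare-bones sketch of that page skeleton (the skip link and ids are illustrative extras, not something the parent prescribed):

  <body>
    <!-- skip link lands keyboard users directly on the content -->
    <a href="#content" class="skip-link">Skip to content</a>
    <nav aria-label="Main">
      <!-- site navigation, early in document and tab order -->
    </nav>
    <main id="content">
      <!-- the actual page content; screen readers can jump straight here -->
    </main>
  </body>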
Correction: each screen reader + os + browser combo does things a bit differently, especially on multilanguage React sites. It is a full time job to test web sites on screen readers.
If only there was a tool that would comprehensively test all combos on all navigation styles (mouse, touch, tabbing, screen reader controls, sip and puff joysticks, chin joysticks, eye trackers, Braille terminals, etc)… but there isn’t one.
I remember HTML has a way to create global shortcuts inside a page, so you press a key combination and the cursor moves directly to a pre-defined place. If I remember right, it's recommended to add some of those pointing to the menu, the main content, and whatever other relevant areas you have.
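That sounds like the accesskey global attribute; a small sketch (the keys and the tabindex hack are illustrative, and the activating modifier differs per browser/OS, e.g. Alt+Shift in Firefox on Windows):

  <!-- pressing the browser's accesskey modifier plus the key moves focus here -->
  <nav accesskey="n" tabindex="-1">…</nav>
  <main accesskey="c" tabindex="-1">…</main>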
I'm…missing the joke – could someone explain, please? Thank you.
This is now "standard" but breaks any browser that doesn't (or can't) support javascript. It's also a nightmare for SEO, accessibility and many other things (like your memory, cpu and battery usage).
But hey, it's "modern"!
Edit: In the `minify-html` Rust crate you can specify "enable_possibly_noncompliant", which leads to such things. They are exploiting the fact that HTML parsers have to accept this per the (parsing) spec even though it's not valid HTML according to the (authoring) spec.
[0] https://html.spec.whatwg.org/multipage/parsing.html#parse-er...
Ah, that's what I was missing. Thanks! The relevant part of the spec:
> user agents, while parsing an HTML document, may abort the parser at the first parse error that they encounter for which they do not wish to apply the rules described in this specification.
(https://html.spec.whatwg.org/multipage/parsing.html#parse-er...)
https://github.com/h5bp/html5-boilerplate/blob/main/dist/ind...
Maybe the name was never about the Metaverse at all...
https://en.wikipedia.org/wiki/Technical_standard
> A technical standard may be developed privately or unilaterally, for example by a corporation, regulatory body, military, etc.
PDF is now an international standard (ISO 32000) but it was invented by Adobe. HTML was invented at the CERN and is now controlled by W3C (a private consortium). OpenGL was created by SGI and is maintained by the Khronos Group.
All had different "ownership" paths and yet I'd say all of them are standards.
I probably should not admit this, but I have been using Lit Elements with raw JavaScript code, because I stopped using autocomplete a while ago.
I guess not using TypeScript at this point is basically the equivalent for many people these days of saying that I use punch cards.
[0]: https://basecamp.com/ [1]: https://stimulus.hotwired.dev/
It's also more complex to do JS builds in Ruby when Ruby isn't up to the task of doing builds performantly and the only good option is calling out to other binaries. That can also be viewed from the outside as "we painted ourselves into a corner, and now we will discuss the virtues of standing in corners". Compared to Bun, this feels like a dated perspective.
DHH has a lot of opinions; he's not wrong about many things, but he's also not universally right for all scenarios either, and the world moved past him back around 2010.
But regardless, I didn't mean to make any argument for or against this, I'm saying this was one of the points DHH made at some point.
I'm old enough to have first-hand experience of building a Flash website that had to load a couple hundred tiny XML files for configuration, only to find out that some ~300 KB took a couple of minutes to load because of the limited connection pool in old HTTP.
Back then, bundling and overly complicated build steps had not been invented yet, so instead of serving one large XML (which would have worked out of the box, as there was a root XML whose nodes, instead of holding data, linked to external files), I quickly decided to implement zip compression and bundle the package that way.
Fast forward to 2025, when most devs need an external library to check if a number isEven and the simplest project needs a toolchain that's more complicated than the whole Apollo project.
I very much enjoy writing no-build, plain vanilla JS for the sake of simplicity and the ability to simply launch a project by dragging an HTML file onto a browser. Not to mention the power of making changes with Notepad instead of needing a whole toolchain on your system.
Yes! Not only that, but without Shadow DOM as well.
Even with frameworks that have more dependencies, bundling/vendoring just your dependencies at package-upgrade time and using an import map to load them is a good experience.
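For example (the module name and paths are illustrative; the vendored file would be copied in at upgrade time):

  <script type="importmap">
  {
    "imports": {
      "date-utils": "/vendor/date-utils/index.js"
    }
  }
  </script>
  <script type="module">
    import { formatDate } from "date-utils"; // hypothetical vendored module
    document.body.append(formatDate(new Date()));
  </script>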
I'm not giving up TypeScript at this point, but TypeScript configured with modern `"target"` options, where it is doing mostly just type stripping, is a good experience, especially in a simple `--watch` loop.
I could never.
I was introduced to this decision from the Lex Fridman/DHH podcast. He talked a lot about typescript making meta programming very hard. I can see how that would be the case but I don't really understand what sort of meta programming you can do with JS. The general dynamic-ness of it I get.
My browser window is 2560x1487. 80% of the screen is blank. I have to zoom in to 170% to read the content. With older blogs I don't have this issue; it just works. Is it on purpose, or is it bad CSS? Given the title of the post, I think this is somewhat relevant.
From a functional standpoint: having to scan your eyes a long distance from left to right makes reading more uncomfortable. Of course, you could debate this and I'm sure there are user preferences, but this is the idea behind limiting the content width.
From a stylistic standpoint: it just looks "bad" if text goes all the way from left to right, because the paragraph looks "too thin", like "not enough volume" and "too much whitespace." It's about achieving a ratio of background to text that's visually pleasing. With really wide widths, paragraphs can end really early on the left, leaving their last lines really "naked", where you see all this whitespace inconsistently following some paragraphs. I can't really explain why this looks bad any further, though. It's kind of like picking color combinations: the deciding factor isn't any rule, it's just "does it look pretty?"
In the case of the site in question, the content width is really small. However, if you notice, each paragraph has very few words so it may have been tightened up for style reasons. I would have made the same choice.
That said, if you have to zoom in 170% to read the content and everything else is not also tiny on your screen, it may be bad CSS.
I get not having to read all the way to the end and back, I even get having margins, but it should be relative to the screen size. Fixed width is the issue, I think. To avoid paragraphs looking too thin, maybe increasing the font relative to the screen size makes sense? I'd think there is a way to specify a reference screen resolution to the browser so that it can auto-increase the font sizes and/or adjust the div's width.
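What the parent describes is roughly what fluid typography with viewport units already allows; a minimal CSS sketch (values are illustrative):

  :root {
    /* grows from 18px on narrow viewports to 22px on wide ones */
    font-size: clamp(1.125rem, 1rem + 0.5vw, 1.375rem);
  }
  article {
    /* cap the measure in characters rather than pixels, so it tracks the font size */
    max-width: 70ch;
    margin-inline: auto;
  }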
--step-0: clamp(1.125rem, 1.0739rem + 0.2273vw, 1.25rem);
Taken from https://utopia.fyi/type/calculator?c=360,18,1.2,1240,20,1.25...

Another method I like to use is to adjust the amount of words per paragraph depending on the medium. I will literally write more or less just to attain my personal favorite of 3-6 visual lines per paragraph.
Or sometimes I will more readily join paragraphs or split them more often in a text just to achieve my target.
Decreasing width is actually just really easy and also works really well when the type of content can vary.
All of this seems like some serious overkill attention to detail I know, but I guess it's a big deal for some people. For example, most people don't really care about dressing nice regularly anymore when they get older and marry because it frankly doesn't matter anymore (and they're totally right), but people who like fashion still care up until the end.
This wouldn't be the first thing I'm just weird about. Similarly, I find reading justified text to be just horrible, as I constantly lose track of what line I'm on. This one I believe has been debunked and raised as a genuine accessibility concern, but not all parts of the world have gotten around to recognising that. I'm also not a fan of serifed fonts, even in books. I'm not sure if there have been any studies made about that, as the serifs are supposed to be there to aid reading when printed on paper, but I consistently find a good sans-serif font to be better in all cases.
If Wikipedia had 70 characters per line I would never read it.
It's rare to see a site as popular as HN which has made almost zero changes to the UI over its entire history.
Kinda like how Hacker News is: it's centered and doesn't scale to the full width of my monitor.
You would think browsers themselves would handle the rest, if the website simply specified "center the content div with 60% width" or something like that.
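That particular wish is only a couple of CSS declarations today (the class name is illustrative):

  .content {
    width: 60%;
    margin-inline: auto;
  }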
<meta name="viewport" content="width=device-width,initial-scale=1.0">
width=device-width is actually redundant and cargo culting. All you need is initial-scale. I explain in a bit more detail here: https://news.ycombinator.com/item?id=36112889

Where do the standards say it ought to work?
<meta name="color-scheme" content="light dark">
which gives you a nice automatic dark theme "for free"

s/lange/lang/
> <meta name="viewport" content="width=device-width,initial-scale=1.0">
Don’t need the “.0”. In fact, the atrocious incomplete spec of this stuff <https://www.w3.org/TR/css-viewport-1/> specifies using strtod to parse the number, which is locale dependent, so in theory on a locale that uses a different decimal separator (e.g. French), the “.0” will be ignored.
I have yet to test whether <meta name="viewport" content="width=device-width,initial-scale=1.5"> misbehaves (parsing as 1 instead of 1½) with LC_NUMERIC=fr_FR.UTF-8 on any user agents.
And it's really irritating when you have the computer read something out to you that contains numbers. 53.1 km reads like you expect but 53,1 km becomes "fifty-three (long pause) one kilometer".
This makes a lot of sense when you recognize that Excel formulas, unlike proper programming languages, aren't necessarily written by people with a sufficient grasp of the English language, especially when it comes to more abstract mathematical concepts, which aren't taught in secondary English language classes at school, but in their native-language mathematics classes.
The day I found out that IntelliJ has a built-in tabular CSV editor and viewer was the best day.
[0] https://en.wikipedia.org/wiki/Decimal_separator#Conventions_...
The author might consider instead:
`<html lang="en-US">`
Additionally, it's kind of crazy that we are not able to write any language with any keyboard, as nowadays we just don't know which language the person who sits behind the keyboard needs.
[1] https://slate.com/human-interest/2011/05/logical-punctuation...
Windows distributes ISOs labelled English (en-US) and English International (en-GB) along this divide.
It’s also a valuable divide for reasons beyond language, because the USA really does have a habit of doing its own thing, even when pretty much the rest of the world has agreed on something different. Your US English locale can default to Fahrenheit, miles, pounds, Letter, and their bizarre middle-endian date format, while International English can default to Celsius, kilometres, kilograms, A4, and DD/MM/YYYY. It doesn’t sort out everything, but it gives a much better starting point. Not every non-American prefers DD/MM/YYYY, but even if they’d prefer something like DD.MM.YY or YYYY-MM-DD, DD/MM/YYYY is a whole lot better than MM/DD/YYYY.
A dedicated one for International English, or heck, even just EU-English, would be great.
The EU websites just use en from what I can tell, but they also just use de, fr, sv, rather than specifying country (except pt-PT, which makes sense, since pt-BR is very common, but not relevant for the EU).
From what I can tell this allows some screen readers to select specific accents. Also the browser can select the appropriate spell checker (US English vs British English).
Also, wrapping the <head> tags in an actual <head></head> is optional.
You also don't need the quotes as long the attribute doesn't have spaces or the like; <html lang=en> is OK.
(kind of pointless as the average website fetches a bazillion bytes of javascript for every page load nowadays, but sometimes slimming things down as much as possible can be fun and satisfying)
What this achieves is making the syntax more irregular and harder to parse. I wish all these tolerances didn't exist in HTML5 and browsers simply showed an error instead of being lenient. It would greatly simplify browser code and the HTML spec.
> I wish all these tolerances wouldn't exist in HTML5 and browsers simply showed an error, instead of being lenient.
They (W3C) tried that with XHTML. It was soundly rejected by webpage authors and by browser vendors. Nobody wants the Yellow Screen of Death. https://en.wikipedia.org/wiki/File:Yellow_screen_of_death.pn...
Well, for machines parsing it, yes, but for humans writing and reading it they are helpful. For example, if you have

<p> foo
<p> bar

and change it to

<div> foo
<div> bar

suddenly you've got a syntax error (or some quirks-mode rendering with nested divs). The "redundancy" of closing the tags acts basically like a checksum protecting against the "background radiation" of human editing. And if you're writing raw HTML without an editor that can autocomplete the closing tags, you're doing it wrong anyway. Yes, that used to be common before, and yes, it's a useful backwards-compatibility / newbie-friendly feature of the language, but that doesn't mean you should use it if you know what you're doing.
But my summary is that the reason it doesn't work is that strict document specs are too strict for humans. And at a time when there was legitimate browser competition, the one that made a "best effort" to render invalid content was the winner.
> And at a time when there was legitimate browser competition, the one that made a "best effort" to render invalid content was the winner.
Yes, my point is that there is no reason to still write "invalid" code just because it's supported for backwards compatibility reasons. It sounds like you ignored 90% of my comment, or perhaps you replied to the wrong guy?
Close tags for <script> are required. But if people start treating it like XML, they write <script src="…" />. That fails, because the script element requires closure, and that slash has no meaning in HTML.
I think validity matters, but you have to measure validity according to the actual spec, not what you wish it was, or should have been. There's no substitute for actually knowing the real rules.
Including closing tags as a general rule might make readers think that they can rely on their presence. Also, in some cases they are prohibited. So you can't achieve a simple evenly applied rule anyway.
And I do think there's an evenly applied rule, namely: always explicitly close all non-void elements. There are only 14 void elements anyway, so it's not too much to expect readers to know them. In your own words "there's no substitute for actually knowing the real rules".
I mean, your approach requires memorizing for which 15 elements the closing tag can be omitted anyway (otherwise you'll mentally parse the document wrong (i.e. thinking a br tag needs to be closed is equally likely as thinking p tags can be nested)).
The risk that somebody might be expecting a closing tag for an hr element seems minuscule and is a small price to pay for conveniences such as (as I explained above) being able to find and replace a p tag or a li tag to a div tag.
I'm not opposed to closing <li> tags as a general practice. But I don't think it provides as much benefit as you're implying. Valid HTML has a number of special rules like this. Like different content parsing rules for <textarea> and <script>. Like "foreign content".
If you try to write lint-passing HTML in the hopes that you could change <li> to <div> easily, you still have to contend with the fact that such a change cannot be valid, except possibly as a direct descendant of <template>.
In contrast, paragraphs and lists do enclose content, so IMO they should have clear delineations - if nothing else, to make visually understanding the code more clear.
I’m also sure that someone will now reference another HTML attribute I didn’t think about that breaks my analogy.
It was actually the XHTML 2.0 specification [1], which discarded backwards compatibility with HTML 4, that was the straw that broke the camel's back. No more forms as we knew them, for example; we were supposed to use XForms.
That's when WHATWG was formed and broke with the W3C and created HTML5.
Thank goodness.
XHTML 2.0 didn't even really discard backwards-compatibility that much: it had its compatibility story baked in with XML Namespaces. You could embed XHTML 1.0 in an XHTML 2.0 document just as you can still embed SVG or MathML in HTML 5. XForms was expected to take a few more years and people were expecting to still embed XHTML 1.0 forms for a while into XHTML 2.0's life.
At least from my outside observer perspective, the formation of WHATWG was more a proxy war between the view of the web as a document platform versus the view of the web as an app platform. XHTML 2.0 wanted a stronger document-oriented web.
(Also, XForms had some good ideas, too. Some of what people want in "forms helpers" when they are asking for something like HTMX to standardized in browsers were a part of XForms such as JS-less fetch/XHR with in-place refresh for form submits. Some of what HTML 5 slowly added in terms of INPUT tag validation are also sort of "backports" from XForms, albeit with no dependency on XSD.)
That said, actually writing HTML that can be parsed via an XML parser is generally a good, neighborly thing to do, as it allows for easier scraping and parsing through browsers and non-browser applications alike. For that matter, I will also add additional data-* attributes to elements just to make testing (and scraping) easier.
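For instance (the attribute names here are hypothetical, not any particular framework's convention):

  <button class="btn btn--primary" data-testid="checkout-submit" data-sku="A-1234">
    Buy now
  </button>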
The fact XHTML didn't gain traction is a mistake we've been paying off for decades.
Browser engines could've been simpler; web development tools could've been more robust and powerful much earlier; we would be able to rely on XSLT and invent other ways of processing and consuming web content; we would have proper XHTML modules, instead of the half-baked Web Components we have today. Etc.
Instead, we got standards built on poorly specified conventions, and we still have to rely on 3rd-party frameworks to build anything beyond a toy web site.
Stricter web documents wouldn't have fixed all our problems, but they would have certainly made a big impact for the better.
And I'd add:
Yes, there were some initial usability quirks, but those could've been ironed out over time. Trading the potential of a strict markup standard for what we have today was a colossal mistake.
Consider JSON and CSV. Both have formal specs. But in the wild, most parsers are more lenient than the spec.
I doubt it would make a dent - e.g. in the "skipping <head>" case, you'd be replacing the error recovery mechanism of "jump to the next insertion mode" with "display an error", but a) you'd still need the code path to handle it, b) now you're in the business of producing good error messages which is notoriously difficult.
Something that would actually make the parser a lot simpler is removing document.write, which has been obsolete ever since the introduction of the DOM and whose main remaining real world use-case seems to be ad delivery. (If it's not clear why this would help, consider that document.write can write scripts that call document.write, etc.)
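For illustration, a tiny page showing that re-entrancy (the string splitting only keeps the outer script element from being terminated early):

  <script>
    document.write('<scr' + 'ipt>document.write("<p>written by a written script</p>")</scr' + 'ipt>');
  </script>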
Who would want to use a browser which would prevent many currently valid pages from being shown?
Also obviously that's unfortunately not the case today in our real world. Doesn't mean I cannot wish things were different.
You might want to always consistently terminate all tags and such for aesthetic or human-centered (reduced cognitive load, easier scanning) reasons, though; I'd accept that.
You monster.
<title>Shortest valid doc</title>
<p>Body text following here
(cf. explainer slides at [1] for the exact tag inferences SGML/HTML does to arrive at the fully tagged doc)

[1]: https://sgmljs.sgml.net/docs/html5-dtd-slides-wrapper.html (linked from https://sgmljs.sgml.net/blog/blog1701.html)
`<thead>` and `<tfoot>`, too, if they're needed. I try to use all the free stuff that HTML gives you without needing to reach for JS. It's a surprising amount. Coupled with CSS and you can get pretty far without needing anything. Even just having `<template>` with minimal JS enables a ton of 'interactivity'.
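For example, stamping out list items from a `<template>` with a few lines of JS (ids are illustrative):

  <template id="row-tpl">
    <li class="row"></li>
  </template>
  <ul id="list"></ul>
  <script>
    const tpl = document.getElementById("row-tpl");
    for (const label of ["Ada", "Grace", "Linus"]) {
      const row = tpl.content.cloneNode(true);      // deep-clone the inert template content
      row.querySelector(".row").textContent = label;
      document.getElementById("list").append(row);
    }
  </script>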
I almost always use thead.
<bgsound src="test.mid" loop=3>
code, pre, tt, kbd, samp {
  /* the doubled keyword opts out of the browser's smaller default size for purely monospace text */
  font-family: monospace, monospace;
}
But I vaguely remember there are other broken CSS defaults for links, img tags, and other stuff. An HTML 5 boilerplate guide should include that too, but I don't know of any that do.

There's also this short reset: https://www.joshwcomeau.com/css/custom-css-reset/
If a site won't update itself, you can use a user stylesheet or extension to fix things like font sizes and colors without waiting for the maintainer.
But for scripts that rely on CSS behaviors there is a simple check: test document.compatMode and bail when it's not what you expect. Sometimes adding a wrapper element and extracting the contents with a Range keeps the page intact.
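That check is tiny; a sketch for a userscript (wrapped in an IIFE, as userscripts usually are):

  (function () {
    // "CSS1Compat" = standards mode, "BackCompat" = quirks mode
    if (document.compatMode !== "CSS1Compat") return;
    // ...logic that relies on standards-mode CSS goes here...
  })();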
Also, adding semantic elements and ARIA roles goes a long way for accessibility; it costs little and helps screen readers navigate.
Would love to see more community hacks that improve usability without rewriting the whole thing.
If only we had UTF-8 as a default encoding in HTML5 specs too.
I’ve had my default encoding set to UTF-8 for probably 20 years at this point, so I often miss some encoding bugs, but then hit others.
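Until that default changes, pages still need the explicit declaration, which must appear within the first 1024 bytes of the document:

  <meta charset="utf-8">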
And <!DOCTYPE html> if you want polyglot (X)HTML.
But in the case of modern compression algorithms, some come with a pre-defined dictionary for websites. These usually contain the common stuff like <!DOCTYPE html> in its most-used form, so doing it like everybody else might make the compression even more effective.
This should be fixed, though.
Another funny thing here is that they say “but not limited to” (the listed encodings), but then say “must not support other encodings” (than the listed ones).
> the encodings defined in Encoding, including, but not limited to
where "Encoding" refers to https://encoding.spec.whatwg.org (probably that should be a link.) So it just means "the other spec defines at least these, but maybe others too." (e.g. EUC-JP is included in Encoding but not listed in HTML.)
*, *:before, *:after {
box-sizing: border-box;
}

What if the page has mixed language content?
E.g. on the /r/france subreddit: the page says lang="en" because every subreddit shares the same template, but the actual content was generated by French-speaking users.
https://developer.mozilla.org/en-US/docs/Web/HTML/Reference/...
For example, you can have CSS generate the appropriate quotation marks even in nested contexts so you can painlessly use <q> tags to markup scholarly articles even if the site itself is translated and thus would have different nested quotation marks for, say, the French version embedding an English quote including a French quote or vice versa.
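A sketch of that (modern browsers' default of quotes: auto already does much of it; the pairs below are just an illustrative subset):

  q:lang(en) { quotes: "“" "”" "‘" "’"; }
  q:lang(fr) { quotes: "« " " »" "“" "”"; }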
In your Reddit example, the top level page should be in the user’s preferred site language with individual posts or other elements using author’s language: <html lang=en>…<div lang=fr>
https://developer.mozilla.org/en-US/docs/Web/HTML/Reference/...
But again, there's the mixed-language issue.
Or do users even bother to choose the correct lang?
> Browsers, search engines, assistive technologies, etc. can leverage it to:
> - Get pronunciation and voice right for screen readers
> - Improve indexing and translation accuracy
> - Apply locale-specific tools (e.g. spell-checking)
1) Pronunciation is either solved by a) automatic language detection, or b) doesn't matter. If I am reading a book and I see text in a language I recognize, I will pronounce it correctly, just like the screen reader will. If I see text in a language I don't recognize, I won't pronounce it correctly, and neither will the screen reader. There's no benefit to my screen reader pronouncing Hungarian correctly to me, a person who doesn't speak Hungarian. On the off chance that the screen reader gets it wrong, even though I do speak Hungarian, I can certainly tell that I'm hearing English-pronounced Hungarian. But there's no reason that the screen reader will get it wrong, because "Mit csináljunk, hogy boldogok legyünk?" isn't ambiguous. It's just simply Hungarian, and if I have a Hungarian screen reader installed, it's trivial to figure that out.
2) Again, if you can translate it, you already know what language it is in. If you don't know what language it is in, then you can't read it from a book, either.
3) See above. Locale is mildly useful, but the example linked in the article was strictly about language, and spell checking will either a) fail, in the case of en-US/en-GB, or b) be obvious, in the case of 1) above.
The lang attribute adds nothing to the process.
Even if language identification was very simple, you're still putting the burden on the user's tools to identify something the writer already knew.
And the fact is that the author of the web page doesn’t know the language of the content, if there’s anything user contributed. Should you have to label every comment on HN as “English”? That’s a huge burden on literally every internet user. Other written language has never specified its language. Imposing data-entry requirements on humans to satisfy a computer is never the ideal solution.
Do you have any reference for that, or are you implying we shouldn't support the other thousands[0] of languages in use just because they don't have a big enough user base?
> And the fact is that the author of the web page doesn’t know the language of the content, if there’s anything user contributed. Should you have to label every comment on HN as “English”? That’s a huge burden on literally every internet user.
In the case of Hacker News or other pages with user submitted and multi-language content, you can just mark the comments' lang attribute to the empty string, which means unknown and falls back to detection. Alternatively, it's possible to let the user select the language (defaulting to their last used or an auto-detected one), Mastodon and BlueSky do that. For single language forums and sites with no user-generated content, it's fine to leave everything as the site language.
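Roughly (the comment markup is hypothetical):

  <article class="comment" lang="">
    <!-- lang="" marks the language as explicitly unknown; tools fall back to detection -->
    …
  </article>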
> Other written language has never specified its language. Imposing data-entry requirements on humans to satisfy a computer is never the ideal solution.
There's also no "screen reader" nor "auto translation" in other written language. Setting the content language helps to improve accessibility features that do not exist without computers.
> It states the cargo culted reasons, but not the actual truth
This dismisses existing explanations without engaging with the mentioned reasons. The following text then doesn't provide any arguments for this.
> Pronunciation is either solved by a) automatic language detection, or b) doesn't matter.
There are more possibilities than a and b. For example, it may matter for other things than pronunciation only. Also it may improve automatic detection or make automatic detection superfluous.
> If I am reading a book [...] I will pronounce it correctly, just like the screen reader will. If I see text in a language I don't recognize, I won't pronounce it correctly, and neither will the screen reader.
A generalization of your own experience to all users and systems. Screen readers aim to convey information accessibly, not mirror human ignorance.
> There's no reason that the screen reader will get it wrong, because <hungarian sentence> isn't ambiguous
This is circular reasoning. The statement is based on the assumption that automatic detection is always accurate - which is precisely what is under debate.
> If you can translate it, you already know what language it is in.
This is a non sequitur. Even if someone can translate text, that doesn't mean software or search engines can automatically identify that language.
> The lang attribute adds nothing to the process.
This absolute claim adds nothing to the logic.
I know what you’re thinking, I forgot the most important snippet of them all for writing HTML:
<div id="root"></div> <script src="bundle.js"></script>
Lol.
-> Ok, thanx, now I do feel like I'm missing an inside joke.
To quote the alt text: "Saying 'what kind of an idiot doesn't know about the Yellowstone supervolcano' is so much more boring than telling someone about the Yellowstone supervolcano for the first time."
I had a teacher who became angry when a question was asked about a subject he felt students should already be knowledgeable about. "YOU ARE IN xTH GRADE AND STILL DON'T KNOW THIS?!" (intentional shouting uppercase). The fact that you learned it yesterday doesn't mean all humans in the world also learned it yesterday. Ask questions, always. Explain, always.