I'm all for Graham's hierarchy of disagreement: we should focus on the core argument, rather than superfluous things like tone, or character, or capitalisation.
But this is too much for me personally. I just realised I consider the complete lack of capitalisation on a piece of public intellectual work to be obnoxious. Sorry, it's impractical, distracting and generates unnecessary cognitive load for everyone else.
You're the top comment right now, and it's not about the content of the article at all, which is a real shame. All the wasted thought cycles across so many people :(
It's the new black turtleneck that everyone is wearing, but will swear upon their mother's life isn't because they're copying Steve Jobs.
Sam still uses capitalization in all of his essays, as do most people (including young people). In essays, like this one, it's distracting without it. I predict in 10 years the vast majority of people will all-lowercase on places like Twitter but almost no one will do it for essays.
Half true. In SMS it was just easier, but in IM it was mostly a thing because the IM client's message boundaries acted as markers for the beginning and end of a sentence, making the formatting unnecessary. That's why using correct capitalization and periods for single sentences came to be associated with a more formal/serious tone: since it was unnecessary, including it meant you wanted to emphasize it.
Even back then we'd use regular formatting outside of IM or when sending multiple sentences in a single message.
> He's adopting the native culture rather than setting a trend.
If this was the intent, it's really coming off as that "Hello, fellow kids" meme, rather than genuine.
Guess what… the people who used AIM in 1999 are now middle aged…
Just looked at the algorithmic feed on Twitter to make sure trends haven't shifted overnight, and zero people in that sample of hundreds of tweets used all lowercase. Not in science. Not in AI. Not in maths or politics or entertainment or media.
Sam is trying to bE dIFFERENT. He isn't adopting a norm but instead he's trying to make one. It looks ridiculous.
That is the professional Twitter class and is not at all representative of the norm. Click through to the replies. Sentence case is probably in the vast minority.
I promise you that it is neither a fad nor something that started with the AI bro.
wasn't aware that this makes me a steve jobs copier :(
EDIT: people are seriously so emotionally invested in capitalization that i get downvoted into minus, jeez.
I think there are legitimate reasons to struggle with things like capital letters, and you've named a few: non-native language and interface device limitations. There are other accessibility reasons too; for example, I have some dyslexic family members who use less capitalisation than most. Also, in direct or casual communication with individuals, the impact of the extra cognitive load is minimal - 1 or 2 people - so again, no real issue.
The problem I have with this piece is that it's clearly meant to be an intellectual or academic-adjacent piece, and it's clearly meant to be public/read by many people - that's why we're reading it on Hacker News. The author is not putting in the extra few seconds required to fix the problem when writing, and as a result, many thousands of people lose a few seconds each when reading. I feel there must be a point where the cost of the extra reading time to humanity outweighs the benefits of the intellectual contribution - I can't really tell, because even if I overlook the capitalisation, I'm not smart enough to understand it anyway.
does it make my comment so hard to read just because i don't start my sentences with big letters and don't capitalize myself (i)? really don't get the fuss.
of course i capitalize letters in "official" texts, but we're in a comment section.
i find it doubly funny because english doesn't capitalize lots of things, anyways.
i type in multiple languages constantly and all of these helpers constantly default to english usage. plus it would be weird to me if every sentence started with a capital letter but the rest was left as is. seems like such an arbitrary solution.
I find it weird that you would be surprised that people care about the quality of textual communication
i don't see it as a "i don't agree with this comment"-button. opinions differ, i guess :)
I know this is true, but does anyone understand why they do it? It is actually cognitively disruptive when reading content, because many of us are trained to simultaneously proofread while reading.
So I also consider it a type of cognitive attack vector and it annoys me extremely as well.
I'm a bit confused about this. Do people turn off auto capitalisation on their phones? I very rarely have to press shift on my phone
Using the chat/IM style outside of that context just doesn't work and looks really odd, like it's obviously someone who didn't learn those norms and is now mimicking them without understanding them.
Or another example: "Call me" is a just a regular "let's chat about something", but "Call me." is "something bad happened I need to tell you about, so prepare yourself".
Interestingly, you're actually partially doing what I described on 2 of your 3 messages in this chain - you left out the last period because HN formatting makes it obvious where the sentence ends. So even if this norm did apply here (it doesn't really), you're not using the serious tone of voice.
For me, and I guess most people I communicate with on e.g. WhatsApp, "Call me." is normal, expected, everything is fine, just need a phone call. "call me" is more like something has gone so horribly wrong (or someone is so incredibly pissed off) that they've lost the ability to communicate normally. I wouldn't be offended, more like concerned.
What does this mean?
I can literally feel it assaulting my reading speed.
If people wanted to read formal-looking formatted text, the author has linked to one in the second paragraph:
https://arxiv.org/abs/1511.07916 - Natural Language Understanding with Distributed Representation
I 100% agree lowercase in longform essays is ridiculous, but I think for everything aside from essays, articles, papers, long emails, and some percentage of multi-paragraph site comments, lowercase is absolutely going to be the default online in 20 years.
That’s already the only stuff worth reading and always has been. No loss then
call me old-fashioned, but two spaces after a period will solve this problem if people insist on all-lower-case. this also helps distinguish between abbreviations such as st. martin's and the ends of sentences.
i'll bet that the linguistics experimentalists have metrics that quantify reading speed measurements as determined by eye tracking experiments, and can verify this.
( or alternatively use nested sexp to delineate paragraphs, square brackets for parentheticals [( this turned out to be an utterly cursed idea, for the record )] )
You appear to be trolling for the sake of trolling, but for reference: reading speed is determined by familiarity with the style of the text. Diverging from whatever people are used to will make them slower.
There is no such thing as "two spaces" in HTML, so good luck with that.
Code point 160 followed by 32. In other words ` ` will do it.
edit: well I tried to give an example, but hn seems to replace it with regular space. Here's a copy paste version: https://unicode-explorer.com/c/3000
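Since HN ate the literal characters, here's the same trick spelled out in escaped form (a minimal sketch):

    # U+00A0 (no-break space, code point 160) followed by U+0020 (regular space, 32):
    # HTML collapses runs of ordinary whitespace, but not no-break spaces,
    # so the pair renders as two visible spaces.
    print("First sentence.\u00a0 Second sentence.")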
But...the vertical dimensions don't scale so well, at least in my browser. It causes a slight downward shift.
From what I recall, the size of a typical interword space is ⅓ of an em, and traditionally a bit more than this is inserted between sentences (but less than ⅔ of an em). The period itself introduces a fair amount of negative space, and only a skosh more is needed if any.
at least for now, maybe this is the best way to tell if a text is written by an llm or a person. an llm will capitalize!
It's a better equilibrium this way and one of the main reasons I don't care much for transhumanism.
I'll likely continue using Capitalization as a preference, and because we use it to express conventions in programming, but I totally understand the movement to drop it, and frankly it's logical enough.
This is merely showing off your personal style which, when writing a technical article, I don't care about.
Interestingly programming is the one place where I ditch it almost entirely (at least in my personal code bases).
In contrast, softmax has a very deep grounding in statistical physics - where it is called the Boltzmann distribution. In fact, this connection between statistical physics and machine learning was so fundamental that it was a key part of the 2024 Nobel Prize in Physics awarded to Hopfield and Hinton.
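To see the correspondence concretely, here's a minimal sketch (the function names and example numbers are mine): softmax over logits z at temperature T is exactly a Boltzmann distribution over energies E = -z.

    import numpy as np

    def softmax(z, T=1.0):
        z = np.asarray(z, dtype=float) / T
        z = z - z.max()              # subtract the max for numerical stability
        w = np.exp(z)
        return w / w.sum()

    def boltzmann(E, T=1.0):
        # Boltzmann distribution: p_i ∝ exp(-E_i / T), with k_B folded into T
        return softmax(-np.asarray(E, dtype=float), T)

    print(softmax([2.0, 1.0, 0.1]))       # ≈ [0.659, 0.242, 0.099]
    print(boltzmann([-2.0, -1.0, -0.1]))  # identical: logits are negative energies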
Thermodynamics can absolutely be studied through both a statistical mechanics and an information theory lens, and many physicists have found this to be quite productive and enlightening. Especially when it gets to tricky cases involving entropy, like Maxwell's Demon and Landauer's Eraser, one struggles not to do so.
Note: I am the author
The author gives a really clean explanation for why that’s hard for a network to learn, starting from first principles.
E.g., using Bayesian stats: if i assume a uniform prior (pretend i have no assumptions about how biased it is) and i see a coin flip heads 4 times in a row, what's the probability that the next flip is heads?
Via a long-winded proof using the Dirichlet distribution, Bayesian stats will say "add one to the top and two to the bottom". Here we saw 4/4 heads, so we guess a 5/6 chance of heads the next time (+1 to the top, +2 to the bottom), or a 1/6 chance of tails. This reflects the statistical model allowing for some bias in the coin.
That's normalized as a probability (it sums to 1), which is what we want. It works for multiple probabilities as well: you add to the bottom as many different outcomes as you have. The Dirichlet distribution allows for real-numbered counts, and you can support that too. If you feel this gives too much weight to the possibility of the coin being biased, you can simply add more to the top and bottom, which is the same as accounting for this in your prior, e.g. add 100 to the top and 200 to the bottom instead.
Now, this has a lot of differences in outcomes compared to softmax. It actually keeps everything at a non-negligible chance, rather than behaving like the classic sigmoid activation function that softmax has underneath, which pushes things to almost absolute 0 or 1. But... other distributions like this are very helpful in many circumstances. Do you actually think the chance of tails becomes 0 if you see heads flipped 100 times in a row? Of course not.
So anyway the softmax function fits things to a particular type of distribution but you can actually fit pretty much anything to any distribution with good old statistics. Choose the right one for your use case.
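A minimal sketch of the above in code (the example numbers are mine): the "add one to the top, add the number of outcomes to the bottom" rule is just a symmetric Dirichlet prior plus counting.

    import numpy as np

    def laplace_estimate(counts, alpha=1.0):
        # posterior predictive under a symmetric Dirichlet(alpha) prior:
        # add alpha to each outcome's count, then normalize
        counts = np.asarray(counts, dtype=float)
        return (counts + alpha) / (counts.sum() + alpha * len(counts))

    print(laplace_estimate([4, 0]))             # [5/6, 1/6]: 4/4 heads, 0 tails
    print(laplace_estimate([4, 0], alpha=100))  # ≈ [0.51, 0.49]: stronger prior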
E[next sample lands in region i] = (1 + # samples observed in region i) / (n + k)
But the indicator variable "next sample lands in region i" is always either zero or one, so its expectation must equal the probability!
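If you want to sanity-check that identity numerically, here's a quick sketch (the numbers are mine, not the parent's): draw the coin's bias from the uniform prior, keep only the draws that reproduce 4/4 heads, and see how often the next flip comes up heads.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 2_000_000

    p = rng.uniform(size=n)              # uniform prior over the coin's bias
    heads = rng.binomial(4, p)           # 4 flips for each simulated coin
    next_flip = rng.uniform(size=n) < p  # one more flip per coin

    print(next_flip[heads == 4].mean())  # ≈ 5/6 ≈ 0.833, matching the formula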
In particular, the assumption that |a_k| ≈ 0 initially is incorrect, since in the original paper https://arxiv.org/abs/2502.01628 the a_k are distances from one vector to multiple other vectors, and they're unlikely to be initialized in such a way that the distance is anywhere close to zero. So while the gradient divergence near 0 could certainly be a problem, it doesn't have to be as fatal as the author seems to think it is.
But in machine learning, it has no significance at all. In particular, to fix the average weight you need to vary the temperature depending on the individual weights, but machine learning practitioners typically fix the temperature instead, so the average weight varies wildly.
So softmax weights (logits) are just one particular way to parameterize a categorical distribution, and there's nothing precluding another parameterization from working just as well or better.
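For illustration (my example, not anything from the paper): normalizing squared parameters also yields a perfectly valid categorical distribution, just with different behavior, e.g. it can assign exactly zero probability, which softmax never does.

    import numpy as np

    def softmax(z):
        w = np.exp(z - np.max(z))
        return w / w.sum()

    def sq_norm(z):
        # alternative parameterization: normalize squared values
        w = np.square(z)
        return w / w.sum()   # assumes z is not all zeros

    z = np.array([2.0, 1.0, 0.0])
    print(softmax(z))  # ≈ [0.665, 0.245, 0.090] -- strictly positive everywhere
    print(sq_norm(z))  # [0.8, 0.2, 0.0] -- a category can be exactly zero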
For details, the keyword is Lagrange multiplier [0]. The specific application here is maximizing f (the entropy) subject to the constraint g (a fixed expectation value).
If you're like me at all, the above will be a nice short rabbit hole to go down!
[0]:https://tutorial.math.lamar.edu/classes/calciii/lagrangemult...
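Sketched out (the standard derivation, condensed; β comes out as the inverse temperature):

    \text{maximize}\quad S = -\sum_i p_i \ln p_i
    \quad\text{subject to}\quad \sum_i p_i = 1,\qquad \sum_i p_i E_i = \langle E \rangle

    \mathcal{L} = -\sum_i p_i \ln p_i - \lambda\Big(\sum_i p_i - 1\Big) - \beta\Big(\sum_i p_i E_i - \langle E \rangle\Big)

    \frac{\partial \mathcal{L}}{\partial p_i} = -\ln p_i - 1 - \lambda - \beta E_i = 0
    \quad\Longrightarrow\quad
    p_i = \frac{e^{-\beta E_i}}{Z},\qquad Z = \sum_j e^{-\beta E_j}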
(After I wrote this I saw the sibling comment from xelxebar which is a better way of saying the same thing.)
But it's 2025, and HTML and Word and the APA and MLA and basically everyone agree that times and style guides have changed.
I agree that not capitalizing the first letter in a sentence is a step too far.
For a counter-example, I personally don't care whether they use the proper em-dash, en-dash, or hyphen--I don't even know when or how to insert the right one with my keyboard. I'm sure there are enthusiasts who care very deeply about using the right ones, and feel that my lack of concern for using the right dash is lazy and unrefined. Culture is changing as more and more communication happens on phone touchscreens, and I have to ask myself - am I out of touch? No, it's the children who are wrong. /s
But I strongly disagree that the author should pass everything they write through Grammarly or worse, through ChatGPT.
/s