I'm all for Graham's hierarchy of disagreement: we should focus on the core argument, rather than superfluous things like tone, or character, or capitalisation.
But this is too much for me personally. I just realised I consider the complete lack of capitalisation on a piece of public intellectual work to be obnoxious. Sorry, it's impractical, distracting and generates unnecessary cognitive load for everyone else.
You're the top comment right now, and it's not about the content of the article at all, which is a real shame. All the wasted thought cycles across so many people :(
It's the new black turtleneck that everyone is wearing, but will swear upon their mother's life isn't because they're copying Steve Jobs.
Sam still uses capitalization in all of his essays, as do most people (including young people). In essays, like this one, it's distracting without it. I predict in 10 years the vast majority of people will all-lowercase on places like Twitter but almost no one will do it for essays.
Half true. In SMS it was just easier, but in IM it was mostly a thing because the IM client's message boundaries acted as markers for the beginning and end of a sentence, making the formatting unnecessary. That's why using correct capitalization and periods for single sentences came to be associated with a more formal/serious tone: since it was unnecessary, including it meant you wanted to emphasize it.
Even back then we'd use regular formatting outside of IM or when sending multiple sentences in a single message.
> He's adopting the native culture rather than setting a trend.
If this was the intent, it's really coming off as that "Hello, fellow kids" meme, rather than genuine.
Guess what… the people who used AIM in 1999 are now middle aged…
Just looked at the algorithmic feed on Twitter to make sure trends haven't shifted overnight, and zero people in that sample of hundreds of tweets used all lowercase. Not in science. Not in AI. Not in maths or politics or entertainment or media.
Sam is trying to bE dIFFERENT. He isn't adopting a norm but instead he's trying to make one. It looks ridiculous.
That is the professional Twitter class and is not at all representative of the norm. Click through to the replies. Sentence case is probably in the vast minority.
I promise you that it is neither a fad nor something that started with the AI bro.
wasn't aware that this makes me a steve jobs copier :(
EDIT: people are seriously so emotionally invested in capitalization that i get downvoted into minus, jeez.
I think there are legitimate reasons to struggle with things like capital letters, and you've named a few: non-native language and interface device limitations. There are other accessibility reasons too; for example, I have some dyslexic family members who use less capitalisation than most. Also, in direct or casual communication with individuals, the impact of the extra cognitive load is minimal - 1 or 2 people - so again, no real issue.
The problem I have with this piece is that it's clearly meant to be an intellectual or academic-adjacent piece, and it's clearly meant to be public/read by many people - that's why we're reading it on Hacker News. The author is not putting in the extra few seconds required to fix the problem when writing, and as a result, many thousands of people lose a few seconds each when reading. I feel there must be a point where the cost of the extra reading time to humanity outweighs the benefits of the intellectual contribution - I can't really tell, because even if I overlook the capitalisation, I'm not smart enough to understand it anyway.
does it make my comment so hard to read just because i don't start my sentences with big letters and don't capitalize myself (i)? really don't get the fuss.
of course i capitalize letters in "official" texts, but we're in a comment section.
i find it doubly funny because english doesn't capitalize lots of things, anyways.
i type in multiple languages constantly and all of these helpers constantly default to english usage. plus it would be weird to me if every sentence started with a capital letter but the rest was left as is. seems like such an arbitrary solution.
I find it weird that you would be surprised that people care about the quality of textual communication
i don't see it as a "i don't agree with this comment"-button. opinions differ, i guess :)
I know this is true, but does anyone understand why they do it? It is actually cognitively disruptive when reading content, because many of us are trained to simultaneously proofread while reading.
So I also consider it a type of cognitive attack vector and it annoys me extremely as well.
I'm a bit confused about this. Do people turn off auto capitalisation on their phones? I very rarely have to press shift on my phone
Using the chat/IM style outside of that context just doesn't work and looks really odd, like it's obviously someone who didn't learn those norms and is now mimicking them without understanding them.
Or another example: "Call me" is a just a regular "let's chat about something", but "Call me." is "something bad happened I need to tell you about, so prepare yourself".
Interestingly, you're actually partially doing what I described on 2 of your 3 messages in this chain - you left out the last period because HN formatting makes it obvious where the sentence ends. So even if this norm did apply here (it doesn't really), you're not using the serious tone of voice.
For me, and I guess most people I communicate with on e.g. WhatsApp, "Call me." is normal, expected, everything is fine, just need a phone call. "call me" is more like something has gone so horribly wrong (or someone is so incredibly pissed off) that they've lost the ability to communicate normally. I wouldn't be offended, more like concerned.
What does this mean?
I can literally feel it assaulting my reading speed.
If people wanted to read formal-looking formatted text, the author has linked to one in the second paragraph:
https://arxiv.org/abs/1511.07916 - Natural Language Understanding with Distributed Representation
I 100% agree lowercase in longform essays is ridiculous, but I think for everything aside from essays, articles, papers, long emails, and some percentage of multi-paragraph site comments, lowercase is absolutely going to be the default online in 20 years.
That’s already the only stuff worth reading and always has been. No loss then
call me old-fashioned, but two spaces after a period will solve this problem if people insist on all-lower-case. this also helps distinguish between abbreviations such as st. martin's and the ends of sentences.
i'll bet that the linguistics experimentalists have metrics that quantify reading speed measurements as determined by eye tracking experiments, and can verify this.
( or alternatively use nested sexp to delineate paragraphs, square brackets for parentheticals [( this turned out to be an utterly cursed idea, for the record )] )
You appear to be trolling for the sake of trolling, but for reference: reading speed is determined by familiarity with the style of the text. Diverging from whatever people are used to will make them slower.
There is no such thing as "two spaces" in HTML, so good luck with that.
Code point 160 followed by 32. In other words ` ` will do it.
edit: well I tried to give an example, but hn seems to replace it with regular space. Here's a copy paste version: https://unicode-explorer.com/c/3000
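Since HN ate the literal characters, here's the same trick spelled out in escaped form (a minimal sketch):

    # U+00A0 (no-break space, code point 160) followed by U+0020 (regular space, 32):
    # HTML collapses runs of ordinary whitespace, but not no-break spaces,
    # so the pair renders as two visible spaces.
    print("First sentence.\u00a0 Second sentence.")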
But...the vertical dimensions don't scale so well, at least in my browser. It causes a slight downward shift.
From what I recall, the size of a typical interword space is ⅓ of an em, and traditionally a bit more than this is inserted between sentences (but less than ⅔ of an em). The period itself introduces a fair amount of negative space, and only a skosh more is needed if any.
at least for now, maybe this is the best way to tell if a text is written by an llm or a person. an llm will capitalize!
It's a better equilibrium this way and one of the main reasons I don't care much for transhumanism.
I'll likely continue using Capitalization as a preference, and because we use it to express conventions in programming, but I totally understand the movement to drop it, and frankly it's logical enough.
This is merely showing off your personal style which, when writing a technical article, I don't care about.
Interestingly programming is the one place where I ditch it almost entirely (at least in my personal code bases).
In contrast, softmax has a very deep grounding in statistical physics - where it is called the Boltzmann distribution. In fact, this connection between statistical physics and machine learning was so fundamental that it was a key part of the 2024 Nobel Prize in Physics awarded to Hopfield and Hinton.
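To see the correspondence concretely, here's a minimal sketch (the function names and example numbers are mine): softmax over logits z at temperature T is exactly a Boltzmann distribution over energies E = -z.

    import numpy as np

    def softmax(z, T=1.0):
        z = np.asarray(z, dtype=float) / T
        z = z - z.max()              # subtract the max for numerical stability
        w = np.exp(z)
        return w / w.sum()

    def boltzmann(E, T=1.0):
        # Boltzmann distribution: p_i ∝ exp(-E_i / T), with k_B folded into T
        return softmax(-np.asarray(E, dtype=float), T)

    print(softmax([2.0, 1.0, 0.1]))       # ≈ [0.659, 0.242, 0.099]
    print(boltzmann([-2.0, -1.0, -0.1]))  # identical: logits are negative energies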
Thermodynamics can absolutely be studied through both a statistical mechanics and an information theory lens, and many physicists have found this to be quite productive and enlightening. Especially when it gets to tricky cases involving entropy, like Maxwell's Demon and Landauer's Eraser, one struggles not to do so.
Note: I am the author
The author gives a really clean explanation for why that’s hard for a network to learn, starting from first principles.
E.g., using Bayesian stats: if i assume a uniform prior (pretend i have no assumptions about how biased it is) and i see a coin flip heads 4 times in a row, what's the probability that the next flip is heads?
Via a long-winded proof using the Dirichlet distribution, Bayesian stats will say "add one to the top and two to the bottom". Here we saw 4/4 heads, so we guess a 5/6 chance of heads the next time (+1 to the top, +2 to the bottom), or a 1/6 chance of tails. This reflects the statistical model allowing for some bias in the coin.
That's normalized as a probability (it sums to 1), which is what we want. It works for multiple probabilities as well: you add to the bottom as many different outcomes as you have. The Dirichlet distribution allows for real-numbered counts, and you can support that too. If you feel this gives too much weight to the possibility of the coin being biased, you can simply add more to the top and bottom, which is the same as accounting for this in your prior, e.g. add 100 to the top and 200 to the bottom instead.
Now, this has a lot of differences in outcomes compared to softmax. It actually keeps everything at a non-negligible chance, rather than behaving like the classic sigmoid activation function that softmax has underneath, which pushes things to almost absolute 0 or 1. But... other distributions like this are very helpful in many circumstances. Do you actually think the chance of tails becomes 0 if you see heads flipped 100 times in a row? Of course not.
So anyway the softmax function fits things to a particular type of distribution but you can actually fit pretty much anything to any distribution with good old statistics. Choose the right one for your use case.
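A minimal sketch of the above in code (the example numbers are mine): the "add one to the top, add the number of outcomes to the bottom" rule is just a symmetric Dirichlet prior plus counting.

    import numpy as np

    def laplace_estimate(counts, alpha=1.0):
        # posterior predictive under a symmetric Dirichlet(alpha) prior:
        # add alpha to each outcome's count, then normalize
        counts = np.asarray(counts, dtype=float)
        return (counts + alpha) / (counts.sum() + alpha * len(counts))

    print(laplace_estimate([4, 0]))             # [5/6, 1/6]: 4/4 heads, 0 tails
    print(laplace_estimate([4, 0], alpha=100))  # ≈ [0.51, 0.49]: stronger prior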
E[next sample lands in region i] = (1 + # samples observed in region i) / (n + k)
But the indicator variable "next sample lands in region i" is always either zero or one, so its expectation must equal the probability!
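If you want to sanity-check that identity numerically, here's a quick sketch (the numbers are mine, not the parent's): draw the coin's bias from the uniform prior, keep only the draws that reproduce 4/4 heads, and see how often the next flip comes up heads.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 2_000_000

    p = rng.uniform(size=n)              # uniform prior over the coin's bias
    heads = rng.binomial(4, p)           # 4 flips for each simulated coin
    next_flip = rng.uniform(size=n) < p  # one more flip per coin

    print(next_flip[heads == 4].mean())  # ≈ 5/6 ≈ 0.833, matching the formula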
In particular, the assumption that |a_k| ≈ 0 initially is incorrect, since in the original paper https://arxiv.org/abs/2502.01628 the a_k are distances from one vector to multiple other vectors, and they're unlikely to be initialized in such a way that the distance is anywhere close to zero. So while the gradient divergence near 0 could certainly be a problem, it doesn't have to be as fatal as the author seems to think it is.
But in machine learning, it has no significance at all. In particular, to fix the average weight you need to vary the temperature depending on the individual weights, but machine learning practitioners typically fix the temperature instead, so the average weight varies wildly.
So softmax weights (logits) are just one particular way to parameterize a categorical distribution, and there's nothing precluding another parameterization from working just as well or better.
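For illustration (my example, not anything from the paper): normalizing squared parameters also yields a perfectly valid categorical distribution, just with different behavior, e.g. it can assign exactly zero probability, which softmax never does.

    import numpy as np

    def softmax(z):
        w = np.exp(z - np.max(z))
        return w / w.sum()

    def sq_norm(z):
        # alternative parameterization: normalize squared values
        w = np.square(z)
        return w / w.sum()   # assumes z is not all zeros

    z = np.array([2.0, 1.0, 0.0])
    print(softmax(z))  # ≈ [0.665, 0.245, 0.090] -- strictly positive everywhere
    print(sq_norm(z))  # [0.8, 0.2, 0.0] -- a category can be exactly zero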
For details, the keyword is Lagrange multiplier [0]. The specific application here is maximizing f (the entropy) subject to the constraint g (a fixed expectation value).
If you're like me at all, the above will be a nice short rabbit hole to go down!
[0]:https://tutorial.math.lamar.edu/classes/calciii/lagrangemult...
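Sketched out (the standard derivation, condensed; β comes out as the inverse temperature):

    \text{maximize}\quad S = -\sum_i p_i \ln p_i
    \quad\text{subject to}\quad \sum_i p_i = 1,\qquad \sum_i p_i E_i = \langle E \rangle

    \mathcal{L} = -\sum_i p_i \ln p_i - \lambda\Big(\sum_i p_i - 1\Big) - \beta\Big(\sum_i p_i E_i - \langle E \rangle\Big)

    \frac{\partial \mathcal{L}}{\partial p_i} = -\ln p_i - 1 - \lambda - \beta E_i = 0
    \quad\Longrightarrow\quad
    p_i = \frac{e^{-\beta E_i}}{Z},\qquad Z = \sum_j e^{-\beta E_j}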
(After I wrote this I saw the sibling comment from xelxebar which is a better way of saying the same thing.)
But it's 2025, and HTML and Word and the APA and MLA and basically everyone agree that times and style guides have changed.
I agree that not capitalizing the first letter in a sentence is a step too far.
For a counter-example, I personally don't care whether they use the proper em-dash, en-dash, or hyphen--I don't even know when or how to insert the right one with my keyboard. I'm sure there are enthusiasts who care very deeply about using the right ones, and feel that my lack of concern for using the right dash is lazy and unrefined. Culture is changing as more and more communication happens on phone touchscreens, and I have to ask myself - am I out of touch? No, it's the children who are wrong. /s
But I strongly disagree that the author should pass everything they write through Grammarly or worse, through ChatGPT.
/s