https://docs.anthropic.com/en/docs/about-claude/models/overv...
The models I'm regularly using are usually smart enough to figure out that they should be pulling in new information for a given topic.
They're evolving quickly, with deprecations and updated documentation. Having to correct for this in system prompts is a pain.
It would be great if the models updated some portions of their knowledge more often than others.
The Tailwind example in the parent-sibling comment should absolutely be as up to date as possible, whereas the history of the US Civil War can probably be updated less frequently.
It's already missed out on two issues of Civil War History: https://muse.jhu.edu/journal/42
Contrary to the prevailing belief in tech circles, there's a lot in history/social science that we don't know and are still figuring out. It's not IEEE Transactions on Pattern Analysis and Machine Intelligence (four issues since March), but it's not nothing.
The website linked above is just a way to read journals online, hosted by Johns Hopkins. As it states, "Most of our users get access to content on Project MUSE through their library or institution. For individuals who are not affiliated with a library or institution, we provide options for you to purchase Project MUSE content and subscriptions for a selection of Project MUSE journals."
Someone has to foot that bill. Open-access publishing implies the authors are paying the cost of publication and its popularity in STEM reflects an availability of money (especially grant funds) to cover those author page charges that is not mirrored in the social sciences and humanities.
Unrelatedly, given recent changes in federal funding, Johns Hopkins is probably feeling like it could use a little extra cash (losing $800 million in USAID funding, overhead rates potentially dropping to existential-crisis levels, etc.).
No, the implication was that the journal isn't double-dipping by extorting both the author and the reader while not actually performing any valuable task whatsoever for that money.
Like with complaints about landlords not producing any value, I think this is an overstatement? Rather, in both cases, the income they bring in is typically substantially larger than what they contribute, due to economic rent, but they do both typically produce some non-zero value.
This particular journal is published by Kent State University, which has an endowment of less than $200 million.
But science? That's something that IMHO should be paid for with tax money, so that it is accessible to everyone, regardless of whether they have money that can be bled out of them.
Sure, for me $20/mo is fine; in fact, I work on AI systems, so I can mostly just use my employer's keys for stuff. But what about the rest of the world, where $20/mo is a huge amount of money? We are going to burn through the environment, and the most disenfranchised amongst us will suffer the most for it.
Aka not happening.
Few of us are in jobs where v-latest is always an option.
As for libraries, using more modern ones usually also requires more recent language versions.
You can fix this by first figuring out what packages to use or providing your package list, tho.
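For example, a rough sketch of what I mean (the file layout and prompt wording here are just illustrative, not any particular tool's API): paste your actual dependency list into the prompt so the model has no reason to guess versions.

    // Sketch: pin the assistant to the project's actual dependencies.
    import { readFileSync } from 'node:fs';

    const pkg = JSON.parse(readFileSync('package.json', 'utf8'));
    const deps = Object.entries({ ...pkg.dependencies, ...pkg.devDependencies })
      .map(([name, version]) => `${name}@${version}`)
      .join('\n');

    const prompt = [
      'Use only these packages, at exactly these versions:',
      deps,
      'Do not pull in other libraries or APIs from newer major versions.',
      'Task: <your actual task here>',
    ].join('\n\n');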
They have ideas about what you tell them to have ideas about. In this case, when to use a package or not differs a lot by person, organization, or even project, so it makes sense that they wouldn't be heavily biased one way or another.
Personally, I'd look at the architecture of the package code before I'd look at when the last change was or how often it gets updated. Whether the last change was years ago or yesterday usually has little bearing on whether I decide to use it, so I wouldn't want my LLM assistant to weight it differently.
Depends on which one you're talking about.
What on earth is the maintenance load like in that world these days? I wonder, do JavaScript people find LLMs helpful in migrating stuff to keep up?
MCP itself isn’t even a year old.
Fair enough, but information encoded in the model is returned in milliseconds, while information that needs to be scraped takes tens of seconds.
One and a half years old *shudders*
It seems people have turned GenAI into coding assistants only and forget that they can actually be used for other projects too.
It's like https://www.youtube.com/watch?v=zZr54G7ec7A where Prof. Tao uses Claude to generate Lean4 proofs (which are then verifiable by machine). Great progress, very useful. Meanwhile, the LLM-only approaches still lack utility for the top minds: https://mathstodon.xyz/@tao/113132502735585408
And math research is a non-CS application, for the pedants :)
Poor Grok is stuck in the middle of denying the Jewish Holocaust on one hand, while fabricating the White Genocide on the other hand.
No wonder it's so confused and demented, and wants to inject its cognitive dissonance into every conversation.
All the models seem to struggle with React Three Fiber like this, mixing and matching versions that don't make sense. I can see this being a tough problem given the nature of these models and the training data.
When faced with this issue, I'm also going to try giving it a better skeleton to start from and telling it to stick to those particular imports.
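Something like this as the starting skeleton, for instance (a sketch assuming @react-three/fiber v8 plus drei; the scene contents are just placeholders), with an explicit instruction to stay within these imports:

    // Minimal R3F skeleton; tell the model to stick to these imports.
    import { Canvas, useFrame } from '@react-three/fiber';
    import { OrbitControls } from '@react-three/drei';
    import { useRef } from 'react';
    import type { Mesh } from 'three';

    function SpinningBox() {
      const ref = useRef<Mesh>(null!);
      // Rotate a bit every frame.
      useFrame((_, delta) => { ref.current.rotation.y += delta; });
      return (
        <mesh ref={ref}>
          <boxGeometry args={[1, 1, 1]} />
          <meshStandardMaterial color="orange" />
        </mesh>
      );
    }

    export default function App() {
      return (
        <Canvas camera={{ position: [0, 0, 3] }}>
          <ambientLight intensity={0.5} />
          <directionalLight position={[2, 2, 2]} />
          <SpinningBox />
          <OrbitControls />
        </Canvas>
      );
    }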
My very first prompt with Claude 4 was for R3F, and it imported a deprecated component as usual.
We can't expect the model to read our minds.
> Which version of tailwind css do you know?
> I have knowledge of Tailwind CSS up to version 3.4, which was the latest stable version as of my knowledge cutoff in January 2025.
LLMs cannot reliably tell whether they know or don't know something. If they could, we wouldn't have to deal with hallucinations.
That's... how many questions? Maybe if one model generated all possible questions, then...
“Hallucination” is seeing/saying something that a sober person clearly knows is not supposed to be there, e.g. “The Vice President under Nixon was Oscar the Grouch.”
Harry Frankfurt defines “bullshitting” as lying to persuade without regard to the truth. (A certain current US president does this profusely and masterfully.)
“Confabulation” is filling the unknown parts of a statement or story with bits that sound as-if they could be true, i.e. they make sense within the context, but are not actually true. People with dementia (e.g. a certain previous US president) will do this unintentionally. Whereas the bullshitter generally knows their bullshit to be false and is intentionally deceiving out of self-interest, confabulation (like hallucination) can simply be the consequence of impaired mental capacity.
E.g. from the paper ChatGPT is bullshit [1],
> Frankfurt understands bullshit to be characterized not by an intent to deceive but instead by a reckless disregard for the truth.
That is different from defining "bullshitting" as lying. I agree that "confabulation" could otherwise be more accurate, but with the previous definition they're kinda synonyms? And "reckless disregard for the truth" may hit closer. The paper has more direct quotes about the term.
[1] https://link.springer.com/article/10.1007/s10676-024-09775-5
"Who is president?" gives a "April 2024" date.
"Claude’s reliable knowledge cutoff date - the date past which it cannot answer questions reliably - is the end of January 2025. It answers all questions the way a highly informed individual in January 2025 would if they were talking to someone from {{currentDateTime}}, "
https://docs.anthropic.com/en/release-notes/system-prompts#m...
But the documentation page linked here doesn't bear that out. In fact the Claude 3.7 system prompt on this page clocks in at significantly less than 4,000 tokens.
A model learns words or tokens more or less pedantically, but it has no sense of time, nor can it track dates;
that isn't really -trained- into the weights.
The point is you can't ask a model what its training cutoff date is and expect a reliable answer from the weights themselves.
The closest you could get is a bench with -timed- questions it could only answer if it had been trained on that period, and you'd have to deal with hallucinations vs. correctness, etc.
It's just not what LLMs are made for; RAG solves this, tho.
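E.g. a minimal sketch of that idea (the docs URL, model id, and prompt wording are all placeholders, not any particular product's behaviour): fetch the current docs yourself and hand them to the model as context.

    import Anthropic from '@anthropic-ai/sdk';

    const client = new Anthropic(); // uses ANTHROPIC_API_KEY from the environment

    // Placeholder retrieval step: pull the up-to-date docs/changelog from wherever they live.
    async function fetchLatestDocs(url: string): Promise<string> {
      const res = await fetch(url);
      return res.text();
    }

    async function askWithContext(question: string) {
      const context = await fetchLatestDocs('https://example.com/docs/changelog'); // placeholder URL
      const msg = await client.messages.create({
        model: 'claude-sonnet-4-20250514', // assumed model id; swap in whatever is current
        max_tokens: 1024,
        messages: [{
          role: 'user',
          content: `Answer using only the context below.\n\n<context>\n${context}\n</context>\n\nQuestion: ${question}`,
        }],
      });
      return msg.content; // array of content blocks
    }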
E.g. it probably has a pretty good understanding of the relation between "Second World War" and the time period it lasted. Or are you talking about the relation between "current wall-clock time" and the questions being asked?
see google TimesFM: https://github.com/google-research/timesfm
What I mean, I guess, is that LLMs can -reason- linguistically about time by manipulating language, but they can't really experience it. A bit like physics. That's why they do badly on physics/logic exercises and questions their training corpus might not have seen.
Sometimes it's interesting to peek under the Network tab in dev tools.
>Spontaneous replay
>The insights into the mechanisms of memory consolidation during the sleep processes in human and animal brain led to other biologically inspired approaches. While declarative memories are in the classical picture consolidated by hippocampo-neocortical dialog during NREM phase of sleep (see above), some types of procedural memories were suggested not to rely on the hippocampus and involve REM phase of the sleep (e.g.,[22] but see[23] for the complexity of the topic). This inspired models where internal representations (memories) created by previous learning are spontaneously replayed during sleep-like periods in the network itself[24][25] (i.e. without help of secondary network performed by generative replay approaches mentioned above).
The Electric Prunes - I Had Too Much To Dream (Last Night):
What does that even mean? Of course an LLM doesn't know everything, so we wouldn't be able to assume everything got updated either. At best, if they shared the datasets they used (which they won't, because most likely they were acquired illegally), you could make some guesses about what they tried to update.
I think it is clear what he meant and it is a legitimate question.
If you took a 6 year old and told him about the things that happened in the last year and sent him off to work, did he integrate the last year's knowledge? Did he even believe it or find it true? If that information was conflicting what he knew before, how do we know that the most recent thing he is told he will take as the new information? Will he continue parroting what he knew before this last upload? These are legitimate questions we have about our black box of statistics.
If they stopped learning (= including new data) at March 31 and something popped up on the internet on March 30 (a lib update, a new Nobel, whatever), there's a good chance it never got scraped, because they probably don't scrape everything in one day (do they?).
That isn’t mutually exclusive with your answer I guess.
edit: thanks adolph for pointing out the typo.
I imagine there's a lot more data pointing to the Super Bowl being upcoming than to the Super Bowl concluding with the score.
Gonna be scary when bot farms are paid to make massive amounts of politically motivated false content specifically targeting the training of future LLMs.
I'll go a step further and say this is not a problem but a boon to tech companies. Then they can sell you a "premium service" to a walled garden of only verified humans or bot-filtered content. The rest of the Internet will suck and nobody will have incentive to fix it.
https://claude.ai/share/59818e6c-804b-4597-826a-c0ca2eccdc46
>This is a topic that would have developed after my knowledge cutoff of January 2025, so I should search for information [...]
I've nearly finished writing a short guide which, when added to a prompt, gives quite idiomatic FastHTML code.
So the cutoff means "the model includes nothing AFTER date D"
and not
"the model includes everything ON OR BEFORE date D".
Right? Definitionally, the model can't include anything that happened after training stopped.
Unfortunately, I work with new APIs all the time, so the cutoff date is not of much use.
Or is it?
If you're waiting for new information, of course you're never going to train.
Both Sonnet and Opus 4 say Joe Biden is president and claim their knowledge cutoff is "April 2024" when asked via the API workbench [0].
The web interface has a system prompt that defines a cutoff date and who's president [1].
[0] https://console.anthropic.com/workbench
[1] https://docs.anthropic.com/en/release-notes/system-prompts#c...
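If you want to reproduce that outside the web UI, a bare API call with no system prompt shows the raw model's answer (the model ids here are assumptions; adjust to whatever the docs currently list):

    import Anthropic from '@anthropic-ai/sdk';

    const client = new Anthropic(); // uses ANTHROPIC_API_KEY from the environment

    for (const model of ['claude-sonnet-4-20250514', 'claude-opus-4-20250514']) {
      const msg = await client.messages.create({
        model,
        max_tokens: 200,
        // No system prompt here, unlike claude.ai, so nothing tells it the date or who's president.
        messages: [{ role: 'user', content: 'Who is the US president, and what is your knowledge cutoff?' }],
      });
      console.log(model, JSON.stringify(msg.content));
    }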
People use "who's the president?" as a cutoff check (sort of like paramedics do when triaging a potential head injury patient!), so they put it into the prompt. If people switched to asking who the CEO of Costco is, maybe they'd put that in the prompt too.