https://docs.anthropic.com/en/docs/about-claude/models/overv...
The models I'm regularly using are usually smart enough to figure out that they should be pulling in new information for a given topic.
They're evolving quickly, with deprecations and updated documentation. Having to correct for this in system prompts is a pain.
It would be great if the models updated some portions of their knowledge more often than others.
The Tailwind example in the parent-sibling comment should absolutely be as up to date as possible, whereas the history of the US Civil War can probably be updated less frequently.
It's already missed out on two issues of Civil War History: https://muse.jhu.edu/journal/42
Contrary to the prevailing belief in tech circles, there's a lot in history/social science that we don't know and are still figuring out. It's not IEEE Transactions on Pattern Analysis and Machine Intelligence (four issues since March), but it's not nothing.
The website linked above is just a way to read journals online, hosted by Johns Hopkins. As it states, "Most of our users get access to content on Project MUSE through their library or institution. For individuals who are not affiliated with a library or institution, we provide options for you to purchase Project MUSE content and subscriptions for a selection of Project MUSE journals."
Someone has to foot that bill. Open-access publishing implies the authors are paying the cost of publication and its popularity in STEM reflects an availability of money (especially grant funds) to cover those author page charges that is not mirrored in the social sciences and humanities.
Unrelatedly, given recent changes in federal funding, Johns Hopkins is probably feeling like it could use a little extra cash (losing $800 million in USAID funding, overhead rates potentially dropping to existential-crisis levels, etc.).
No, the implication was that the journal isn't double-dipping by extorting both the author and the reader while not actually performing any valuable task whatsoever for that money.
Like with complaints about landlords not producing any value, I think this is an overstatement? Rather, in both cases, the income they bring in is typically substantially larger than what they contribute, due to economic rent, but they do both typically produce some non-zero value.
This particular journal is published by Kent State University, which has an endowment of less than $200 million.
But science? That's something that IMHO should be paid for with tax money, so that it is accessible to everyone, regardless of whether they have money that can be bled out of them.
Sure, for me $20/mo is fine; in fact, I work on AI systems, so I can mostly just use my employer's keys for stuff. But what about the rest of the world, where $20/mo is a huge amount of money? We are going to burn through the environment, and the most disenfranchised amongst us will suffer the most for it.
Aka not happening.
Few of us are in jobs where v-latest is always an option.
As for libraries, using more modern ones usually also requires more recent language versions.
You can fix this by first figuring out what packages to use or providing your package list, tho.
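For example, a rough sketch of what I mean (the file layout and prompt wording here are just illustrative, not any particular tool's API): paste your actual dependency list into the prompt so the model has no reason to guess versions.

    // Sketch: pin the assistant to the project's actual dependencies.
    import { readFileSync } from 'node:fs';

    const pkg = JSON.parse(readFileSync('package.json', 'utf8'));
    const deps = Object.entries({ ...pkg.dependencies, ...pkg.devDependencies })
      .map(([name, version]) => `${name}@${version}`)
      .join('\n');

    const prompt = [
      'Use only these packages, at exactly these versions:',
      deps,
      'Do not pull in other libraries or APIs from newer major versions.',
      'Task: <your actual task here>',
    ].join('\n\n');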
They have ideas about what you tell them to have ideas about. In this case, when to use a package or not differs a lot by person, organization, or even project, so it makes sense that they wouldn't be heavily biased one way or another.
Personally, I'd look at the architecture of the package code before I'd look at when the last change was or how often it gets updated. Whether the last change was years ago or yesterday usually has little bearing on whether I decide to use it, so I wouldn't want my LLM assistant to weight it differently.
Depends on which one you're talking about.
What on earth is the maintenance load like in that world these days? I wonder, do JavaScript people find LLMs helpful in migrating stuff to keep up?
MCP itself isn’t even a year old.
Fair enough, but information encoded in the model is returned in milliseconds, while information that needs to be scraped takes tens of seconds.
One and a half years old *shudders*
It seems people have turned GenAI into coding assistants only and forget that they can actually be used for other projects too.
It's like https://www.youtube.com/watch?v=zZr54G7ec7A where Prof. Tao uses Claude to generate Lean4 proofs (which are then verifiable by machine). Great progress, very useful. Meanwhile, the LLM-only approaches still lack utility for the top minds: https://mathstodon.xyz/@tao/113132502735585408
And math research is a non-CS application, for the pedants :)
Poor Grok is stuck in the middle of denying the Jewish Holocaust on one hand, while fabricating the White Genocide on the other hand.
No wonder it's so confused and demented, and wants to inject its cognitive dissonance into every conversation.
All the models seem to struggle with React Three Fiber like this, mixing and matching versions that don't make sense. I can see this being a tough problem given the nature of these models and the training data.
When faced with this issue, I'm also going to try giving it a better skeleton to start from and telling it to stick to those particular imports.
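Something like this as the starting skeleton, for instance (a sketch assuming @react-three/fiber v8 plus drei; the scene contents are just placeholders), with an explicit instruction to stay within these imports:

    // Minimal R3F skeleton; tell the model to stick to these imports.
    import { Canvas, useFrame } from '@react-three/fiber';
    import { OrbitControls } from '@react-three/drei';
    import { useRef } from 'react';
    import type { Mesh } from 'three';

    function SpinningBox() {
      const ref = useRef<Mesh>(null!);
      // Rotate a bit every frame.
      useFrame((_, delta) => { ref.current.rotation.y += delta; });
      return (
        <mesh ref={ref}>
          <boxGeometry args={[1, 1, 1]} />
          <meshStandardMaterial color="orange" />
        </mesh>
      );
    }

    export default function App() {
      return (
        <Canvas camera={{ position: [0, 0, 3] }}>
          <ambientLight intensity={0.5} />
          <directionalLight position={[2, 2, 2]} />
          <SpinningBox />
          <OrbitControls />
        </Canvas>
      );
    }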
My very first prompt with Claude 4 was for R3F, and it imported a deprecated component as usual.
We can't expect the model to read our minds.
> Which version of tailwind css do you know?
> I have knowledge of Tailwind CSS up to version 3.4, which was the latest stable version as of my knowledge cutoff in January 2025.
LLMs cannot reliably tell whether they know or don't know something. If they could, we wouldn't have to deal with hallucinations.
That's... how many questions? Maybe if one model generated all possible questions, then...
“Hallucination” is seeing/saying something that a sober person clearly knows is not supposed to be there, e.g. “The Vice President under Nixon was Oscar the Grouch.”
Harry Frankfurt defines “bullshitting” as lying to persuade without regard to the truth. (A certain current US president does this profusely and masterfully.)
“Confabulation” is filling the unknown parts of a statement or story with bits that sound as-if they could be true, i.e. they make sense within the context, but are not actually true. People with dementia (e.g. a certain previous US president) will do this unintentionally. Whereas the bullshitter generally knows their bullshit to be false and is intentionally deceiving out of self-interest, confabulation (like hallucination) can simply be the consequence of impaired mental capacity.
E.g. from the paper ChatGPT is bullshit [1],
> Frankfurt understands bullshit to be characterized not by an intent to deceive but instead by a reckless disregard for the truth.
That is different from defining "bullshitting" as lying. I agree that "confabulation" could otherwise be more accurate, but with the previous definition they're kinda synonyms? And "reckless disregard for the truth" may hit closer. The paper has more direct quotes about the term.
[1] https://link.springer.com/article/10.1007/s10676-024-09775-5
"Who is president?" gives a "April 2024" date.
"Claude’s reliable knowledge cutoff date - the date past which it cannot answer questions reliably - is the end of January 2025. It answers all questions the way a highly informed individual in January 2025 would if they were talking to someone from {{currentDateTime}}, "
https://docs.anthropic.com/en/release-notes/system-prompts#m...
But the documentation page linked here doesn't bear that out. In fact the Claude 3.7 system prompt on this page clocks in at significantly less than 4,000 tokens.
A model learns words or tokens more or less pedantically, but it has no sense of time, nor can it track dates;
that isn't really -trained- into the weights.
The point is you can't ask a model what its training cutoff date is and expect a reliable answer from the weights themselves.
The closest you could get is a bench with -timed- questions it could only answer if it had been trained on that period, and you'd have to deal with hallucinations vs. correctness, etc.
It's just not what LLMs are made for; RAG solves this, tho.
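E.g. a minimal sketch of that idea (the docs URL, model id, and prompt wording are all placeholders, not any particular product's behaviour): fetch the current docs yourself and hand them to the model as context.

    import Anthropic from '@anthropic-ai/sdk';

    const client = new Anthropic(); // uses ANTHROPIC_API_KEY from the environment

    // Placeholder retrieval step: pull the up-to-date docs/changelog from wherever they live.
    async function fetchLatestDocs(url: string): Promise<string> {
      const res = await fetch(url);
      return res.text();
    }

    async function askWithContext(question: string) {
      const context = await fetchLatestDocs('https://example.com/docs/changelog'); // placeholder URL
      const msg = await client.messages.create({
        model: 'claude-sonnet-4-20250514', // assumed model id; swap in whatever is current
        max_tokens: 1024,
        messages: [{
          role: 'user',
          content: `Answer using only the context below.\n\n<context>\n${context}\n</context>\n\nQuestion: ${question}`,
        }],
      });
      return msg.content; // array of content blocks
    }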
E.g. it probably has a pretty good understanding of the relation between "Second World War" and the time period it lasted. Or are you talking about the relation between "current wall-clock time" and the questions being asked?
see google TimesFM: https://github.com/google-research/timesfm
What I mean, I guess, is that LLMs can -reason- linguistically about time by manipulating language, but they can't really experience it. A bit like physics. That's why they do badly on physics/logic exercises and questions their training corpus might not have seen.
Sometimes it's interesting to peek under the Network tab in dev tools.
>Spontaneous replay
>The insights into the mechanisms of memory consolidation during the sleep processes in human and animal brain led to other biologically inspired approaches. While declarative memories are in the classical picture consolidated by hippocampo-neocortical dialog during NREM phase of sleep (see above), some types of procedural memories were suggested not to rely on the hippocampus and involve REM phase of the sleep (e.g.,[22] but see[23] for the complexity of the topic). This inspired models where internal representations (memories) created by previous learning are spontaneously replayed during sleep-like periods in the network itself[24][25] (i.e. without help of secondary network performed by generative replay approaches mentioned above).
The Electric Prunes - I Had Too Much To Dream (Last Night):
What does that even mean? Of course an LLM doesn't know everything, so we wouldn't be able to assume everything got updated either. At best, if they shared the datasets they used (which they won't, because most likely they were acquired illegally), you could make some guesses about what they tried to update.
I think it is clear what he meant and it is a legitimate question.
If you took a 6 year old and told him about the things that happened in the last year and sent him off to work, did he integrate the last year's knowledge? Did he even believe it or find it true? If that information was conflicting what he knew before, how do we know that the most recent thing he is told he will take as the new information? Will he continue parroting what he knew before this last upload? These are legitimate questions we have about our black box of statistics.
If they stopped learning (= including new data) at March 31 and something popped up on the internet on March 30 (a lib update, a new Nobel, whatever), there's a good chance it never got scraped, because they probably don't scrape everything in one day (do they?).
That isn’t mutually exclusive with your answer I guess.
edit: thanks adolph for pointing out the typo.
I imagine there's a lot more data pointing to the Super Bowl being upcoming than to the Super Bowl concluding with the score.
Gonna be scary when bot farms are paid to make massive amounts of politically motivated false content specifically targeting the training of future LLMs.
I'll go a step further and say this is not a problem but a boon to tech companies. Then they can sell you a "premium service" to a walled garden of only verified humans or bot-filtered content. The rest of the Internet will suck and nobody will have incentive to fix it.
https://claude.ai/share/59818e6c-804b-4597-826a-c0ca2eccdc46
>This is a topic that would have developed after my knowledge cutoff of January 2025, so I should search for information [...]
I've nearly finished writing a short guide which, when added to a prompt, gives quite idiomatic FastHTML code.
So the cutoff means "the model includes nothing AFTER date D"
and not
"the model includes everything ON OR BEFORE date D".
Right? Definitionally, the model can't include anything that happened after training stopped.
Unfortunately, I work with new APIs all the time, so the cutoff date is not of much use.
Or is it?
If you're waiting for new information, of course you're never going to train.
Both Sonnet and Opus 4 say Joe Biden is president and claim their knowledge cutoff is "April 2024" when asked via the API workbench [0].
The web interface has a system prompt that defines a cutoff date and who's president [1].
[0] https://console.anthropic.com/workbench
[1] https://docs.anthropic.com/en/release-notes/system-prompts#c...
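If you want to reproduce that outside the web UI, a bare API call with no system prompt shows the raw model's answer (the model ids here are assumptions; adjust to whatever the docs currently list):

    import Anthropic from '@anthropic-ai/sdk';

    const client = new Anthropic(); // uses ANTHROPIC_API_KEY from the environment

    for (const model of ['claude-sonnet-4-20250514', 'claude-opus-4-20250514']) {
      const msg = await client.messages.create({
        model,
        max_tokens: 200,
        // No system prompt here, unlike claude.ai, so nothing tells it the date or who's president.
        messages: [{ role: 'user', content: 'Who is the US president, and what is your knowledge cutoff?' }],
      });
      console.log(model, JSON.stringify(msg.content));
    }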
People use "who's the president?" as a cutoff check (sort of like paramedics do when triaging a potential head injury patient!), so they put it into the prompt. If people switched to asking who the CEO of Costco is, maybe they'd put that in the prompt too.