Changes in the system prompt between Claude Opus 4.6 and 4.7
193 points
14 hours ago
| 16 comments
| simonwillison.net
| HN
embedding-shape
13 hours ago
[-]
> The new <acting_vs_clarifying> section includes: When a request leaves minor details unspecified, the person typically wants Claude to make a reasonable attempt now, not to be interviewed first.

Uff, I've tried stuff like this in my prompts, and the results are never good. I much prefer the agent to prompt me upfront to resolve that before it "attempts" whatever it wants. Kind of surprised to see that they added that.

reply
gck1
5 hours ago
[-]
I even have a specific, non-negotiable phase in the process where the model MUST interview me and create an interview file with everything captured. The plan file it produces must always include this file as an artifact, and the interview takes the highest precedence.

Otherwise, the intent gets lost somewhere in the chat transcript.

reply
alsetmusic
9 hours ago
[-]
I've recently started adding something along the lines of "if you can't find or don't know something, don't assume. Ask me." It's helped cut down on me having to tell it to undo or redo things a fair amount. I also have used something like, "Other agents have made mistakes with this. You have to explain what you think we're doing so I can approve." It's kind of stupid to have to do this, but it really increases the quality of the output when you make it explain, correct mistakes, and iterate until it tells you the right outcome before it operates.

Edit: forgot "don't assume"

reply
niobe
38 minutes ago
[-]
Having to "unprompt" behaviour I want that Anthropic thinks I don't want is getting out of hand. My system prompts always try to get Claude to clarify _more_.
reply
tuetuopay
2 hours ago
[-]
Dammit, that’s why I could never get it to not try to one-shot answers; it’s in the god damn system prompt… and it explains why no amount of user "system" prompt could fix this behavior.
reply
ikari_pl
2 hours ago
[-]
I usually need to remind it 5 times to do the opposite, because it makes decisions that I don't like or that are harmful to the project. So if it lands in Claude Code too, I have hard times ahead.

I try to explicitly request Claude to ask me follow-up questions, especially multiple-choice ones (it explains possible paths nicely), but if I don't, or when it decides to ignore the instructions (which happens a lot), the results are either bad... or plain dangerous.

reply
lishuaiJing03
2 hours ago
[-]
It is a big problem that many people I know face every day. Sometimes we wonder whether we're the dumb ones, since the demos show everything just working.
reply
ignoramous
4 hours ago
[-]
> I've tried stuff like these in my prompts, and the results are never good

I've found that Google AI Mode & Gemini are pretty good at "figuring it out". My queries are oftentimes just keywords.

reply
PunchyHamster
2 hours ago
[-]
well, clarifying means burning more tokens...
reply
naasking
11 hours ago
[-]
Seriously, when you're conversing with a person would you prefer they start rambling on their own interpretation or would you prefer they ask you to clarify? The latter seems pretty natural and obvious.

Edit: That said, it's entirely possible that large and sophisticated LLMs can invent some pretty bizarre but technically possible interpretations, so maybe this is to curb that tendency.

reply
adw
2 hours ago
[-]
When you’re staffing work to a junior, though, often it’s the opposite.
reply
PunchyHamster
2 hours ago
[-]
So you are saying they are trying for the whole Artificial Intern vibe?
reply
embedding-shape
11 hours ago
[-]
> The latter seems pretty natural and obvious.

To me too. If something is ambiguous or unclear when I'm getting something to do from someone, I need to ask them to clarify; anything else would be borderline insane in my world.

But I know so many people whose approach is basically "Well, you didn't clearly state/say X, so clearly that was up to me to interpret however I wanted, usually the easiest/shortest way for me", which is exactly how LLMs seem to take prompts with ambiguity too, unless you strongly prompt them away from making a "reasonable attempt now" without asking questions.

reply
gck1
5 hours ago
[-]
I have a fun little agent in my tmux agent orchestration system - Socratic agent that has no access to codebase, can't read any files, can only send/receive messages to/from the controlling agent and can only ask questions.

When I task my primary agent with anything, it has to launch the Socratic agent and give it an overview of what we're working on, what our goals are, and what it plans to do.

This works better than any thinking tokens for me so far. It usually gets the model to write an almost perfectly balanced plan that is neither over- nor under-engineered.
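For flavor, the shape of that interface could be as small as this (a toy sketch; the real system presumably wires messages through tmux and an actual model, and `SocraticAgent` plus its canned question are my own invention):

```python
class SocraticAgent:
    """A reviewer that can only exchange messages and ask questions:
    no codebase access, no file reads, just the overview it is handed."""

    def __init__(self):
        self.transcript = []

    def ask(self, overview: str) -> str:
        # In the real setup a model would generate the question from the
        # overview; here we just record the exchange and return a probe.
        self.transcript.append(("primary", overview))
        question = ("What would make this plan over-engineered, "
                    "and what would make it under-engineered?")
        self.transcript.append(("socratic", question))
        return question
```

The constraint does the work: because the agent can't read files, the primary agent is forced to articulate its plan well enough to be questioned.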

reply
fragmede
1 hour ago
[-]
Sounds pretty neat! Is there a written agent.md for that you could share?
reply
eastbound
3 hours ago
[-]
—So what would theoretically happen if we flipped that big red switch?

—Claude Code: FLIPS THE SWITCH, does not answer the question.

Claude does that in React, constantly starting a wrong refactor. I’ve been using Claude for only 4 weeks, but for the last 10 days I’ve been getting anger issues at the new nerfing.

reply
tobyhinloopen
3 hours ago
[-]
Yeah, this happens to me all the time! I have a separate session for discussing and only apply edits in worktrees / subagents to clearly separate discussion from work, and it still does it
reply
Havoc
25 minutes ago
[-]
>“If a user indicates they are ready to end the conversation, Claude does not request that the user stay in the interaction or try to elicit another turn and instead respects the user’s request to stop.”

Seems like a good idea. Don't think I've ever had any of those follow up suggestions from a chatbot be actually useful to me

reply
walthamstow
11 hours ago
[-]
The eating disorder section is kind of crazy. Are we going to incrementally add sections for every 'bad' human behaviour as time goes on?
reply
embedding-shape
11 hours ago
[-]
Even better, adding it to the system prompt is a temporary fix, then they'll work it into post-training, so next model release will probably remove it from the system prompt. At least when it's in the system prompt we get some visibility into what's being censored, once it's in the model it'll be a lot harder to understand why "How many calories does 100g of Pasta have?" only returns "Sorry, I cannot divulge that information".
reply
gchamonlive
11 hours ago
[-]
Just assume each model iteration incorporates all the censorship prompts from before, and compile the possible list from the system prompt history. To validate it, design an adversarial test against the items in the compiled list.
reply
jeffrwells
4 hours ago
[-]
Another way to think about it: every single user of Claude is paying an extra tax in every single request
reply
teaearlgraycold
2 hours ago
[-]
Well the system prompt is probably permanently cached.
reply
dymk
59 minutes ago
[-]
Takes up a portion of the context window, though
reply
whateveracct
27 minutes ago
[-]
And the beginning of the context window gets more attention, right?
reply
wongarsu
51 minutes ago
[-]
On API pricing you still pay 10% of the input token price on cache reads. Not sure if the subscription limits count this though.

And of course all conversations now have to compact 80 tokens earlier, and are marginally worse (since results get worse the more stuff is in the context)
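As a rough sketch of that pricing point (the ~10% cache-read rate matches Anthropic's published API pricing; the per-million-token price below is a placeholder, not a real quote):

```python
def input_cost_usd(tokens: int, price_per_mtok: float, cached: bool) -> float:
    """Input-token cost; cache reads bill at ~10% of the normal rate."""
    rate = price_per_mtok * (0.10 if cached else 1.0)
    return tokens / 1_000_000 * rate

# An ~80-token system-prompt addition, re-read on every cached request,
# still costs something on each turn (illustrative price):
extra_per_request = input_cost_usd(80, price_per_mtok=15.0, cached=True)
```

Small per request, but multiplied across every request every user sends.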

reply
ikari_pl
2 hours ago
[-]
Are the prompts used both by the desktop app, like typical chatbot interfaces, and Claude Code?

Because it's a waste of my money to check whether my Object Pascal compiler doesn't develop eating disorders, on every turn.

reply
zozbot234
3 hours ago
[-]
That part of the system prompt just states that telling someone who has an actual eating disorder to start counting calories or micromanage their eating in other ways (a suggestion the model might well give to an average person for the sake of clear argument, which would then be understood sensibly and taken with a grain of salt) is likely to make them worse off, not better off. This seems like a common-sense addition. It should not trigger any excess refusals on its own.
reply
MoltenMan
2 hours ago
[-]
The problem is that this is an incredibly niche / small issue (i.e. <<1% of users, let alone prompts, need this clarification), and if you add a section for every single small thing like this, you end up with a massively bloated prompt. Notice that every single user of Claude is paying for this paragraph now! This single paragraph is going to legitimately cost Anthropic at least 4, maybe 5 digits.

At some point you just have to accept that LLMs, like people, make mistakes, and that's ok!

reply
alwillis
1 hour ago
[-]
>The problem is that this is an incredibly niche / small issue (i.e. <<1% of users, let alone prompts

It's not a niche issue at all. 29 million people in the US are struggling with an eating disorder [1].

> This single paragraph is going to legitimately cost anthropic at least 4, maybe 5 digits.

It's 59 out of 3,791 words total in the system prompt. That's about 1.6%. Relax.

It should go without saying, but Anthropic has the usage data; they must be seeing a significant increase in the number of times eating disorders come up in conversations with Claude. I'm sure Anthropic takes what goes into the system prompt very seriously.

[1]: from https://www.southdenvertherapy.com/blog/eating-disorder-stat...

The trajectory is troubling. Eating disorder prevalence has more than doubled globally since 2000, with a 124% increase according to World Health Organization data. The United States has seen similar trends, with hospitalization rates climbing steadily year over year.

reply
zozbot234
2 hours ago
[-]
It's not "incredibly niche" when you consider the kinds of questions that average everyday users might submit to these AIs. Diet is definitely up there, given how unintuitive it is for many.

> At some point you just have to accept that llm's, like people, make mistakes, and that's ok!

Except that's not the way many everyday users view LLM's. The carwash prompt went viral because it showed the LLM making a blatant mistake, and many seem to have found this genuinely surprising.

reply
mudkipdev
1 hour ago
[-]
The Claude prompt is already quite bloated, around 7,000 tokens excluding tools.
reply
layer8
2 hours ago
[-]
If it’s common sense, shouldn’t the model know it already?
reply
zozbot234
2 hours ago
[-]
Shouldn't the model "know" that if I have to wash my car at the carwash, I can't just go there on foot? It's not that simple!
reply
WarmWash
11 hours ago
[-]
When you are worth hundreds of billions, people start falling over themselves running to file lawsuits against you. We're already seeing this happen.

So spending $50M to fund a team to weed out "food for crazies" becomes a no-brainer.

reply
goosejuice
3 hours ago
[-]
It is a no-brainer. If a company of any size put out a product that caused cancer, we wouldn't think twice about suing them. Why should mental health disorders be any different?
reply
bojan
3 hours ago
[-]
There are many, many companies out there putting out products that cause cancer. Think about alcohol, tobacco, internal combustion engines, just to name a few most obvious examples.
reply
fineIllregister
2 hours ago
[-]
> alcohol, tobacco, internal combustion engine

Yes, the companies providing these products are sued a lot and are heavily regulated, too.

reply
ChadNauseam
57 minutes ago
[-]
If you get cancer from drinking alcohol, smoking cigarettes or breathing particles emitted by ICE engines in their standard course of operation, you generally can't sue the manufacturer.
reply
arcanemachiner
2 hours ago
[-]
Why stop there? We could jam up the system prompt with all kinds of irrelevant guardrails to prevent harm to groups X, Y, and Z!
reply
echelon
4 hours ago
[-]
It's so shameful.

We let people buy kitchen knives. But because the kitchen knife companies don't have billions of dollars, we don't go after them.

We go after the LLM that might have given someone bad diet advice or made them feel sad.

Nevermind the huge marketing budget spent on making people feel inadequate, ugly, old, etc. That does way more harm than tricking an LLM into telling you you can cook with glue.

reply
gmac
4 hours ago
[-]
I don’t feel like that’s a reasonable analogy. Kitchen knives don’t purport to give advice. But if a kitchen knife came with a label that said ‘ideal for murdering people’, I expect people would go after the manufacturer.
reply
mattjoyce
3 hours ago
[-]
Ad companies prompt injecting consumers. LLM companies countering with guardrails.
reply
rzmmm
10 hours ago
[-]
The alignment favors supporting healthy behaviors so it can be a thin line. I see the system prompt as "plan B" when they can't achieve good results in the training itself.

It's a particularly sensitive issue so they are just probably being cautious.

reply
echelon
4 hours ago
[-]
I want a hyperscaler LLM I can fine tune and neuter. Not a platform or product. Raw weights hooked up to pure tools.

This era of locked hyperscaler dominance needs to end.

If a third tier LLM company made their weights available and they were within 80% of Opus, and they forced you to use their platform to deploy or license if you ran elsewhere, I'd be fine with that. As long as you can access and download the full raw weights and lobotomize as you see fit.

reply
mohamedkoubaa
2 hours ago
[-]
Starting to feel like a "we were promised flying cars but all we got" kind of moment
reply
seba_dos1
3 hours ago
[-]
It feels like half of AI research is math, and the other half is coming up with yet another way to state "please don't do bad things" in the prompt that will sure work this time I promise.
reply
l5870uoo9y
3 hours ago
[-]
Could be that Claude has particularly controversial opinions on eating disorders.
reply
rcfox
50 minutes ago
[-]
There are communities of people who publicly blog about their eating disorders. I wouldn't be surprised if the laymen's discourse is over-represented in the LLM's training data compared to the scientific papers.
reply
dwaltrip
3 hours ago
[-]
LLMs have been trained to eagerly answer a user’s query.

They don’t reliably have the judgment to pause and proceed carefully if a delicate topic comes up. Hence these bandaids in the system prompt.

reply
newZWhoDis
2 hours ago
[-]
>the year is 2028
>5M of your 10M context window is the system prompt
reply
felixgallo
11 hours ago
[-]
I mean, that's what humans have always done with our morals, ethics, and laws, so what alternative improvement do you have to make here?
reply
idiotsecant
11 hours ago
[-]
Imagine the kind of human that never adapts their moral standpoints. Ever. They believe what they believed when they were 12 years old.

Letting the system improve over time is fine. The system prompt is an inefficient place to do it, but it's just a patch until the model can be updated.

reply
ls612
3 hours ago
[-]
Yup. Anyone who is surprised by this has not been paying attention to the centralization of power on the internet in the past 10 years.
reply
jwpapi
2 hours ago
[-]
I feel like we are at the point where improvements in one area diminish functionality in others. I see some things better in 4.7 and some in 4.6. I assume they’ll split into different characters soon.
reply
ikari_pl
2 hours ago
[-]
> Claude keeps its responses focused and concise so as to avoid potentially overwhelming the user with overly-long responses. Even if an answer has disclaimers or caveats, Claude discloses them briefly and keeps the majority of its response focused on its main answer.

I am strongly opinionated against this. I use Claude in some low-level projects where these answers are saving me from making really silly things, as well as serving as learning material along the way.

This should not be Anthropic's hardcoded choice to make. It should be an option, building the system prompt modularly.

reply
jwpapi
2 hours ago
[-]
agree!

For low-level work I recommend running tests as early as you can and verifying whatever information you get as you learn, to build a fundamental understanding

reply
jwpapi
2 hours ago
[-]
For me, 4.7 always gave a lot of options, even when there’s a clear winner, inviting decision fatigue
reply
sams99
4 hours ago
[-]
I did a follow-on analysis with GPT-5.4 and Opus 4.7: https://wasnotwas.com/writing/claude-opus-4-7-s-system-promp...
reply
cfcf14
13 hours ago
[-]
I'm curious as to why 4.7 seems obsessed with avoiding any actions that could help the user create or enhance malware. The system prompts seem similar on the matter, so I wonder if this is an early attempt by Anthropic to use steering vector injection?

The malware paranoia is so strong that my company has had to temporarily block use of 4.7 on our IDE of choice, as the model was behaving in a concerningly unaligned way, as well as spending large amounts of token budget contemplating whether any particular code or task was related to malware development (we are a relatively boring financial services entity - the jokes write themselves).

In one case I actually encountered a situation where I felt that the model was deliberately failing to execute a particular task, and when queried, the tool output said that it was trying to abide by directives about malware. I know that model introspection reporting is of poor quality and unreliable, but in this specific case I did not 'hint' it in any way. This feels qualitatively like Claude Golden Gate Bridge territory, hence my earlier contemplation about steering vectors. I've seen many other people online complaining about the malware paranoia too, especially on reddit, so I don't think it's just me!

reply
daemonologist
12 hours ago
[-]
Note that these are the "chat" system prompts - although it's not mentioned I would assume that Claude Code gets something significantly different, which might have more language about malware refusal (other coding tools would use the API and provide their own prompts).

Of course it's also been noted that this seems to be a new base model, so the change could certainly be in the model itself.

reply
chatmasta
10 hours ago
[-]
Claude Code system prompt diffs are available here: https://cchistory.mariozechner.at/?from=2.1.98&to=2.1.112

(URL is to diff since 2.1.98 which seems to be the version that preceded the first reference to Opus 4.7)

reply
dhedlund
9 hours ago
[-]
The "Picking delaySeconds" section is quite enlightening.

I feel like this explains about a quarter to half of my token burn. It was never really clear to me whether tool calls in an agent session would keep the context hot or whether I would have to pay the entire context loading penalty after each call; from my perspective it's one request. I have Claude routinely do large numbers of sequential tool calls, or have long running processes with fairly large context windows. Ouch.

> The Anthropic prompt cache has a 5-minute TTL. Sleeping past 300 seconds means the next wake-up reads your full conversation context uncached — slower and more expensive. So the natural breakpoints:

> - *Under 5 minutes (60s–270s)*: cache stays warm. Right for active work — checking a build, polling for state that's about to change, watching a process you just started.

> - *5 minutes to 1 hour (300s–3600s)*: pay the cache miss. Right when there's no point checking sooner — waiting on something that takes minutes to change, or genuinely idle.

> *Don't pick 300s.* It's the worst-of-both: you pay the cache miss without amortizing it. If you're tempted to "wait 5 minutes," either drop to 270s (stay in cache) or commit to 1200s+ (one cache miss buys a much longer wait). Don't think in round-number minutes — think in cache windows.

> For idle ticks with no specific signal to watch, default to *1200s–1800s* (20–30 min). The loop checks back, you don't burn cache 12× per hour for nothing, and the user can always interrupt if they need you sooner.

> Think about what you're actually waiting for, not just "how long should I sleep." If you kicked off an 8-minute build, sleeping 60s burns the cache 8 times before it finishes — sleep ~270s twice instead.

> The runtime clamps to [60, 3600], so you don't need to clamp yourself.

Definitely not clear if you're only used to the subscription plan that every single interaction triggers a full context load. It's all one session to most people. So long as they keep replying quickly, or queue up a long arc of work, there's probably an expectation that you wouldn't incur that much context-loading cost. But this suggests that's not at all true.
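The quoted heuristic boils down to a simple decision rule. A rough sketch (the function name and the 270s/1200s breakpoints are lifted from the quoted guidance; the 5-minute TTL is Anthropic's prompt-cache TTL as quoted above):

```python
CACHE_TTL = 300            # Anthropic prompt-cache TTL, per the quoted prompt
CLAMP_MIN, CLAMP_MAX = 60, 3600  # runtime clamps to this range

def pick_delay_seconds(desired: int) -> int:
    """Snap a desired sleep to a cache-aware breakpoint."""
    desired = max(CLAMP_MIN, min(CLAMP_MAX, desired))
    if desired <= 270:
        return desired     # cache stays warm
    if desired < 1200:
        # worst-of-both zone: you'd pay the cache miss without
        # amortizing it, so drop back under the TTL instead
        return 270
    return desired         # one cache miss buys a much longer wait
```

So "wait 5 minutes" (300s) snaps down to 270s, exactly the case the prompt calls out as the worst-of-both choice.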

reply
wongarsu
45 minutes ago
[-]
They really should have just set the cache window to 5:30 or some other slightly odd number instead of using all those tokens to tell Claude not to pick one of the most common timeout values
reply
ianberdin
4 hours ago
[-]
No, you underestimate how huge the malware problem is right now. People try to publish fake download landing pages for shell scripts, or even Claude Code, on https://playcode.io every day. They pay $$$ for Google ads to take the top position. How does Google Ads allow this? They can’t verify every shell script.

No, I am not joking. Every time you install something, there is a risk you clicked a wrong page with the exact same design.

reply
jeffrwells
3 hours ago
[-]
He's not talking about malware awareness. He's talking about a bug I've seen too, where Claude adds extra malware-awareness justification turns for *every* tool call. Like every file read of the repo we've been working on
reply
sensanaty
1 hour ago
[-]
Their marketing is going overtime into selling the image that their models are capable of creating uber sophisticated malware, so every single thing they do from here on out is going to have this fear mongering built in.

Every statement they make, hell even the models themselves are going to be doing this theater of "Ooooh scary uber h4xx0r AI, you can only beat it if you use our Super Giga Pro 40x Plan!!". In a month or two they'll move onto some other thing as they always do.

reply
dandaka
13 hours ago
[-]
I started to notice this malware paranoia in 4.6; Boris was surprised to hear that in the comments. Probably a bug
reply
greenchair
4 hours ago
[-]
more likely the paranoia behavior was backported. current gen is already being used for bug bounties.
reply
ricardobeat
7 hours ago
[-]
Presumably because it has become extremely good at writing software, and if it succeeds at helping someone spread malware, especially one that could use Claude itself (via local user's plans) to self-modify and "stay alive", it would be nearly impossible to put back in the bottle.
reply
lionkor
2 hours ago
[-]
That would put itself back in the bottle by running killall to fix a stuck task, or deleting all core logic and replacing it with a to-do to fix a test.
reply
Grimblewald
1 hour ago
[-]
I miss 4.5. It was gold.
reply
sigmoid10
12 hours ago
[-]
I knew these system prompts were getting big, but holy fuck. More than 60,000 words. With the 3/4 words per token rule of thumb, that's ~80k tokens. Even with 1M context window, that is approaching 10% and you haven't even had any user input yet. And it gets churned by every single request they receive. No wonder their infra costs keep ballooning. And most of it seems to be stable between claude version iterations too. Why wouldn't they try to bake this into the weights during training? Sure it's cheaper from a dev standpoint, but it is neither more secure nor more efficient from a deployment perspective.
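The 3/4-words-per-token rule of thumb in that estimate is just one division (the 0.75 ratio is the commenter's stated heuristic, not an exact tokenizer measurement):

```python
def estimate_tokens(words: int, words_per_token: float = 0.75) -> int:
    # rule of thumb: ~3/4 of a word per token, so tokens ≈ words / 0.75
    return round(words / words_per_token)

# 60,000 words comes out to roughly 80k tokens, as the comment says:
approx = estimate_tokens(60_000)
```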
reply
an0malous
12 hours ago
[-]
I’m just surprised this works at all. When I was building AI automations for a startup in January, even 1,000 word system prompts would cause the model to start losing track of some of the rules. You could even have something simple like “never do X” and it would still sometimes do X.
reply
embedding-shape
11 hours ago
[-]
Two things: the model and runtime matter a lot; smaller/quantized models are basically useless at strict instruction following, compared to SOTA models. The second thing is that "never do X" doesn't work that well; if you want it to "never do X" you need to adjust the harness and/or steer it with "positive prompting" instead. Don't say "Never use uppercase" but instead "Always use lowercase only", as a silly example; you'll get a lot better results. If you've trained dogs ("positive reinforcement training") before, this will come easier to you.
reply
jug
1 hour ago
[-]
It's interesting to note here that Anthropic indeed don't use "do not X" in the Opus system prompts. However, "Claude does not X" is very common.
reply
wongarsu
43 minutes ago
[-]
I suspect that lets the model "roleplay" as Claude, promoting reasoning like "would Claude do X?" or "what would Claude do in this situation?"
reply
dataviz1000
11 hours ago
[-]
I created a test evaluation (they friggin' stole the word "harness") that runs a changed prompt, comparing pass/fail, the number of tokens, and the time for any change. It is an easy thing to do. The best part is I set up an orchestration pattern where one agent iterates on updating the target agent's prompts. Not only can it evaluate the outcome after the changes, it can update and rerun, self-healing and fixing itself.
reply
mysterydip
12 hours ago
[-]
I assume the reason it’s not baked in is so they can “hotfix” it after release. But surely that many things don’t need updates afterwards. There are novels that are shorter.
reply
sigmoid10
12 hours ago
[-]
Yeah that was the original idea of system prompts. Change global behaviour without retraining and with higher authority than users. But this has slowly turned into a complete mess, at least for Anthropic. I'd love to see OpenAI's and Google's system prompts for comparison though. Would be interesting to know if they are just more compute rich or more efficient.
reply
jatora
12 hours ago
[-]
There are different sections in the markdown for different models. It is only 3-4000 words
reply
winwang
12 hours ago
[-]
That's usually not how these things work. Only parts of the prompt are actually loaded at any given moment. For example, "system prompt" warnings about intellectual property are effectively alerts that the model gets. ...Though I have to ask in case I'm assuming something dumb: what are you referring to when you said "more than 60,000 words"?
reply
bavell
11 hours ago
[-]
The system prompt is always loaded in its entirety IIUC. It's technically possible to modify it during a conversation but that would invalidate the prefill cache for the big model providers.
reply
sigmoid10
12 hours ago
[-]
What you're describing is not how these things usually work. And all I did was a wc on the .md file.
reply
formerly_proven
12 hours ago
[-]
Surely the system prompt is cached across accounts?
reply
sigmoid10
12 hours ago
[-]
You can cache K and V matrices, but for such huge matrices you'll still pay a ton of compute to calculate attention in the end even if the user just adds a five word question.
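A back-of-the-envelope sketch of that point (a toy FLOP count of my own, not any provider's real numbers): with the prefix's K/V cached you skip recomputing them, but every new query token still attends over every cached position, so per-turn attention cost scales with prefix length.

```python
def attention_flops(prefix_len: int, new_len: int, d_head: int = 128) -> int:
    """Rough FLOPs for the new tokens' attention: QK^T scores plus the
    weighted sum over V, ~2*d_head multiply-adds each, per (query, key) pair."""
    keys = prefix_len + new_len  # cached prefix K/V plus the new tokens' own
    return 2 * (2 * d_head) * new_len * keys

# A 5-token question after an ~80k-token cached system prompt still
# attends over the whole prefix:
with_prompt = attention_flops(80_000, 5)
without_prompt = attention_flops(0, 5)
```

In this toy model the cached 80k prefix makes the 5-token turn about 16,000x more expensive in attention than the same question with no prefix.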
reply
cfcf14
12 hours ago
[-]
I would assume so too, so the costs would not be so substantial to Anthropic.
reply
cma
11 hours ago
[-]
> And it gets churned by every single request they receive

It gets pretty efficiently cached, but does eat the context window and RAM.

reply
SoKamil
12 hours ago
[-]
New knowledge cutoff date means this is a new foundation model?
reply
lkbm
11 hours ago
[-]
Yes, but doesn't the tokenizer change mean that?
reply
clickety_clack
2 hours ago
[-]
You can train a tokenizer on old data just like you can train a model on old data.
reply
wongarsu
42 minutes ago
[-]
But you can't use an old model with a new tokenizer. Changing the tokenizer implies you trained the model from scratch
reply
mwexler
10 hours ago
[-]
Interesting that it's not a direct "you should" but an omniscient 3rd person perspective "Claude should".

Also full of "can" and "should" phrases: it feels both passive and subjunctive, wishes vs. strict commands (I guess these are better termed “modals”, but I'm not an expert)

reply
zmmmmm
1 hour ago
[-]
Yes, I was interested in that too. It suggests that in writing our own guidance we should follow a similar style, but I rarely if ever see people doing that. Most people still stick to "You" or an abstract voice: "There is ...", "Never do ...", etc.

It must be that they are training the sense of identity as Claude very deeply into the model. Which makes me wonder how it works when it is asked to assume a different identity: "You are Bob, a plumber who specialises in advising on the design of water systems for hospitals". Now what? Is it confused? Is it still going to think all the verbiage about what "Claude" does applies?

reply
KolenCh
2 hours ago
[-]
“Claude” is more specific than “you”; why rely on attention to figure out who the subject is? Also, people at Anthropic believe that rule-based alignment won't work, which is why they wrote the soul document as “something like you’d write to your child to show them how they should behave in the world” (I paraphrase). I guess the system prompt should be similar in this aspect.
reply
saagarjha
3 hours ago
[-]
That’s because Anthropic does not consider their model as having personality but rather that it simulates the experience of an abstract entity named Claude.
reply
akdor1154
2 hours ago
[-]
That sounds really interesting, but my google-fu is not up to task here, I'm getting pages and pages of nonsense asking if Claude is conscious. Can you elaborate?
reply
saagarjha
1 hour ago
[-]
I actually think this is pretty straightforward if you think of it something like

  class Claude {
      void greet() { System.out.println("Hi, I'm Claude."); }
  }

  Claude anthropicInstance = new Claude();
  anthropicInstance.greet();
Just like a "Cat" object in Java is supposed to behave like a cat, but is not a cat, and there is no way for Cat@439f5b3d to "be" a cat. However, it is supposed to act like a cat. When Anthropic spins up a model and "runs" it they are asking the matrix multipliers to simulate the concept of a person named Claude. It is not conscious, but it is supposed to simulate a person who is conscious. At least that is how they view it, anyway.
reply
EMM_386
1 hour ago
[-]
You can read the latest Claude Constitution plus more info here:

https://www.anthropic.com/news/claude-new-constitution

reply
dmk
13 hours ago
[-]
The acting_vs_clarifying change is the one I notice most as a heavy user. Older Claude would ask 3 clarifying questions before doing anything. Now it just picks the most reasonable interpretation and goes. Way less friction in practice.
reply
bavell
11 hours ago
[-]
Haven't had a chance to test 4.7 much but one of my pet peeves with 4.6 is how eager it is to jump into implementation. Though maybe the 4.7 is smarter about this now.
reply
poszlem
3 hours ago
[-]
I have the opposite experience. It now picks the most inane interpretation or make wild assumptions and I have to keep interrupting it more than ever.
reply
sersi
10 hours ago
[-]
I really hate that change; it's now regularly picking a bad interpretation instead of asking.
reply
verve_rat
4 hours ago
[-]
Yeah, that really feels like a choice that should be user preference.
reply
ikidd
11 hours ago
[-]
I had seen reports that it was clamping down on security research and things like web-scraping projects were getting caught up in that and not able to use the model very easily anymore. But I don't see any changes mentioned in the prompt that seem likely to have affected that, which is where I would think such changes would have been implemented.
reply
embedding-shape
11 hours ago
[-]
I think it depends on how badly they want to avoid it. Stuff that is "We prefer if the model didn't do these things when the model is used here" goes into the system prompt, meanwhile stuff that is "We really need to avoid this ever being in any outputs, regardless of when/where the model is used" goes into post-training.

So I'm guessing they want none of the model's users (web UI + API) to be able to do those things, rather than blocking them just in the web UI. The changes mentioned in the submission are just for claude.ai AFAIK, not API users, so the "disordered eating" stuff will only be prevented for API users who prompt against it in their own system prompts; it's not required.

reply
kaoD
11 hours ago
[-]
I wonder if the child safety section "leaks" behavior into other risky topics, like malware analysis. I see overlap in how the reports mention that once the safety has been tripped it becomes even more reluctant to work, which seems to match the instructions here for child safety.
reply
bakugo
10 hours ago
[-]
It's built into the model, not part of the system prompt. You'll get the same refusals via the API.
reply
varispeed
13 hours ago
[-]
Before Opus 4.7, 4.6 had become pretty much unusable, as it was flagging normal data-analysis scripts it wrote itself as a cybersecurity risk. I got several sessions blocked and was unable to finish research with it, and had to switch to GPT-5.4, which has its own problems but at least isn't eager to interfere in legitimate work.

edit: to be fair Anthropic should be giving money back for sessions terminated this way.

reply
ceejayoz
12 hours ago
[-]
> edit: to be fair Anthropic should be giving money back for sessions terminated this way.

I asked it for one and it told me to file a Github issue.

Which I interpreted as "fuck off".

reply
slashdave
48 minutes ago
[-]
You asked the agent directly for a refund?
reply
mannanj
11 hours ago
[-]
Personally, as someone who has been lucky enough to completely cure "incurable" diseases with diet, self experimentation and learning from experts who disagreed with the common societal beliefs at the time - I'm concerned that an AI model and an AI company is planting beliefs and limiting what people can and can't learn through their own will and agency.

My concern is that these models revert all medical, scientific, and personal inquiry to the norm and averages of what's socially acceptable. That's very anti-scientific in my opinion and feels dystopian.

reply
gausswho
7 hours ago
[-]
While I share your concern about a winners-take-all model getting bent, I am optimistic that models we've never heard of will plug away at challenging conclusions in the medical canon. We will have both popular vaccine-denying AND vaccine-authoring models.
reply