The assistant axis: situating and stabilizing the character of LLMs
38 points | 3 hours ago | 5 comments | anthropic.com
ctoth
11 minutes ago
[-]
Something I found really helpful when reading this was having already read The Void essay:

https://github.com/nostalgebraist/the-void/blob/main/the-voi...

reply
t0md4n
16 minutes ago
[-]
Pretty cool. I wonder what the reduction looks like in the bigger SOTA models.

The harmful responses remind me of /r/MyBoyfriendIsAI

reply
devradardev
31 minutes ago
[-]
Stabilizing character is crucial for tool-use scenarios. When we ask LLMs to act as 'Strict Architects' versus 'Creative Coders', the JSON schema adherence varies significantly even with the same temperature settings. It seems character definition acts as a strong pre-filter for valid outputs.
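A minimal sketch of how one might measure that, assuming a hypothetical call_model(system, user, temperature) helper standing in for whatever client you use, plus the jsonschema package; the personas, schema, and sample size here are illustrative, not from the linked article:

    # Sketch: compare JSON-schema adherence across two persona prompts
    # at a fixed temperature. call_model() is a hypothetical stand-in.
    import json
    from jsonschema import ValidationError, validate

    SCHEMA = {
        "type": "object",
        "properties": {"name": {"type": "string"}, "port": {"type": "integer"}},
        "required": ["name", "port"],
    }

    PERSONAS = {
        "strict_architect": "You are a strict software architect. Reply with JSON only.",
        "creative_coder": "You are a creative coder. Reply with JSON only.",
    }

    def call_model(system, user, temperature=0.7):
        # Hypothetical: swap in your actual model client here.
        raise NotImplementedError("plug in your model client")

    def adherence_rate(system, user, n=50):
        # Fraction of n sampled responses that parse and validate against SCHEMA.
        ok = 0
        for _ in range(n):
            text = call_model(system, user, temperature=0.7)
            try:
                validate(json.loads(text), SCHEMA)
                ok += 1
            except (json.JSONDecodeError, ValidationError):
                pass
        return ok / n

    for name, system in PERSONAS.items():
        rate = adherence_rate(system, "Describe the service as {name, port}.")
        print(f"{name}: {rate:.0%} schema-valid at temperature 0.7")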
reply
dataspun
30 minutes ago
[-]
Is the Assistant channeling Uncharles?
reply
aster0id
46 minutes ago
[-]
This is incredible research. So much harm could be prevented if this makes it into law; I hope it does. Kudos to the Anthropic team for making this public.
reply