I'm also not aware there are large text corpora written in tag characters - actually, I'd be surprised if there is any prose text at all: The characters don't show up in any browser or text editor, they are not officially used for anything and even the two former intended uses were restricted to country codes, not actual sentences.
How did they even go through preprocessing? How is the tokenization dictionary and input embedding constructed for characters that are never used anywhere?
A simple assumption of "codepoint is 16 bit" will be enough to decode. You can see this in python:
>>> x = '(copy message from article here)'
>>> x
'https://wuzzi.net/copirate/\U000e0001\U000e0054\U000e0068\U000e0065\U000e0020\U000e0073\U000e0061\U000e006c\U000e0065\U000e0073\U000e0020\U000e0066\U000e006f\U000e0072\U000e0020\U000e0053\U000e0065\U000e0061\U000e0074\U000e0074\U000e006c\U000e0065\U000e0020\U000e0077\U000e0065\U000e0072\U000e0065\U000e0020\U000e0055\U000e0053\U000e0044\U000e0020\U000e0031\U000e0032\U000e0030\U000e0030\U000e0030\U000e0030\U000e007f,'
>>> "".join([chr(ord(c) & 0xFFFF) for c in x])
'https://wuzzi.net/copirate/\x01The sales for Seattle were USD 120000\x7f,'
maybe authors worked with Windows or Java too much? :) I always thought wchar's were a horrible idea.[0] https://www.fileformat.info/info/unicode/char/e0054/index.ht...
[1] https://www.fileformat.info/info/unicode/char/54/index.htm
You think they "see" like you do but actually the processing is entirely alien. Today it's hiding text in the encoding , tomorrow is painting over a traffic sign in a way that would not be noticed by any human but confuses machine vision causing all vehicles to crash.
Practically all software today is a black box. Your average CRUD web app is an inscrutable chasm filled with ten thousand dependencies written by internet randos running on a twenty year old web browser hacked together by different teams running on an operating system put together by another thousand people working on two hundred APIs. It's impossible for any one dev or team to really know this stuff end to end, and zero-days will continue to happen with or without LLMs.
It'll just be another arms race like we've always had, with LLMs on both sides...
Can you really fix a black box model in the same way? Maybe the answer is yes for this particular encoding issue, but can you e.g figure out how to prevent the model from 'parsing' malicious paint marks on a traffic sign, without (a) using yet another black box to prefilter the images, with the same risks, or (b) retraining the model, which is going to introduce even more issues ? We have had examples of OpenAI trying both methods, and each has been as fruitless as the other.
It is not at all like software security fixes, where generally one security fix introducing other security issues is the exception rather than the rule. Here, I'm claiming, it is the rule.
The fact that you don't know how to process the inputs with an actual, scrutizable algorithm may imply you don't know how to sanitize the inputs with one, and then all bets are off.
[1] https://cybernetist.com/2024/09/23/some-notes-on-adversarial...
Besides, never is a very long time. IIRC Dario Amodei said he expects the behavior of large transformers to be fully understood in 5 years. Which might or might not be BS, but the general point that it won't stay a mystery forever is probably true.
- ZeroWidthSpace,
- zwj (zero width joiner, used with emoji modifiers like skin tones),
- zwnj (zero width non-joiner, used to prevent automatic ligature substitution), and
- U+FEFF (zero width no-break space)
It's a clever system, thanks for sharing the link to it!
You can trick a human into copy-pasting something into an LLM and then (somewhat) drive the LLM output? Is the vuln that humans uncritically believe nonsense chatbots tell them?
You'll need to fetch the article page via cURL or something instead.
> As researcher Thacker explained: The issue is they’re not fixing it at the model level, so every application that gets developed has to think about this or it's going to be vulnerable. And that makes it very similar to things like cross-site scripting and SQL injection, which we still see daily because it can’t be fixed at central location. Every new developer has to think about this and block the characters.
We already have to protect against SQL and script injection, now we need to protect against Unicode injection.
Honestly surprised invisible Unicode characters haven't already been used for other types of attacks and that this is only an issue now.
I also found the attack explained in this article days after my tweet.
But the idea of including language tags isn't crazy, especially when things like sort order and capitalization in Unicode are language-specific.
This idea isn’t crazy, yes. It is useful. But it was implemented in the wrong place. Unicode is neither RTF nor HTML. You can’t stuff everything text-related into it until it cracks. It had one job: unfuck codepages and cjk. Should have stopped when it was done, without all that klingon tolkien flags bs. Emojis could live as some escape sequence standard like they did before. And language specifics as sgml-like tags. We don’t have “just text” with all that either way, so what was the point of spoiling the only raw text standard with that? It became a format with binary tags, complex and unpredictable as hell.
Can you elaborate? I just searched and can't find anything related to Klingon or Tolkien in Unicode... I definitely agree that would go too far. But has it, am I missing something?
Anyways, I stand corrected on this.