Unicode started with the mission to encode all characters needed for written communication in the world. That was already a broad mission, but not an unusual one for its time. Unlike Wikipedia, Unicode never went through a battle between inclusionists and deletionists. Moreover, with Han Unification it strayed from its core mission to "encode *all* characters needed for written communication in the world" (emphasis mine).
Instead it ended up as a fancy clip-art library that every piece of software somehow has to support, yet with no way to implement the standard in its entirety.
And people were already doing emojis with phpBB, MSN Messenger, etc. The alternative to Unicode emojis would not be "no emojis", but "every platform with their own proprietary incompatible implementation".
Han Unification has been discussed a million times already. Originally Unicode had only 2 bytes, i.e. about 65k code points, to work with. Maybe it was a mistake, maybe not – I don't speak these languages, and those who do often disagree on this as well. However, changing it now would probably introduce more pain than it solves.
Yes, and that's a good thing.
"The notion that emoji support impedes that is simply not true."
If it does not, why are there so many unresolved issues and shortcomings that have been lingering for years?
It's not that the issues around Han Unification will go away by ignoring them. There are related issues in western languages, like the umlaut/trema distinction. Pushing these topics, which are core to Unicode's original mission, into OpenType is not a solution.
Why do we continue adding pictures of random everyday objects, like disco balls, when not even characters common in ordinary books can be represented?
Don't you think reallocating resources from emoji work to more serious issues would make sense?
I have no idea which "characters common in ordinary books" are missing; the explicit goal of Unicode is to encode exactly that sort of thing.
Han Unification fails this mission and that is not a matter of opinion.
"I have no idea which "characters common in ordinary books" are missing; the explicit goal of Unicode is to encode exactly that sort of thing."
In
«Günther a souligné l’ambigüité de son discours.»
there is an umlaut and a dieresis.
They are different characters with different functions. In traditional book printing they used to look different, and quality fonts still have both. Unfortunately Unicode does not encode both of them.
You can disagree with how Unicode does this (or how other encodings do it, for that matter), but this is just an utterly disingenuous thing to say. I no longer believe you are engaging in good faith. You have either not understood Unicode or you're intentionally misrepresenting it. Goodbye.
I did not. In every book printed before 1950, and in every quality book printed now, the two characters actually look different. This is not about rendering variations but about different characters (linguistically and functionally, e.g. with respect to collation) that coincidentally look similar and that Unicode conflates.
Here is a source from DIN (Deutsches Institut für Normung) with more background:
https://www.unicode.org/L2/L2003/03215-n2593-umlaut-trema.pd...
If you think it's just crazy Germans arguing a moot point, Yannis Haralambous has a paragraph specifically about the umlaut/trema issue in his O'Reilly book "Fonts & Encodings".
(Not saying I see this as a good thing or anything: it is way beyond my expertise. I can definitely see the motivation for introducing as many variants in the Unicode registry as there are in the real world.)
Isn't the umlaut vs. trema/diaeresis distinction in a similar situation?
[1] made me test it and cobble together a demo. (Sadly, I don't speak any of these languages, so I cannot verify it is correct; I just wanted to see the difference in practice.)
data:text/html;charset=utf-8,<style>
@import url("https://fonts.googleapis.com/css2?family=Noto+Sans:ital@0;1");
body { font-family: 'Noto Sans'; }
/* hover switches italics off for comparison: the code points are
   identical, but lang="sr" should select the Serbian italic
   letterforms while lang="ru" gets the Russian ones */
dl:hover i { font-style: normal; }
</style>
<dl>
<dt>lang="ru"
<dd lang="ru"><i>грипп, практика, график, типа</i>
<dt>lang="sr"
<dd lang="sr"><i>грипп, практика, график, типа</i>
</dl>
Arguably, depending on such a wide ecosystem (physical text ↔ specific font ↔ rendering agent) feels quite fragile, but I cannot tell if there is any better alternative for this particular case.
https://myfonj.github.io/sandbox.html#%3C!doctype%20html%3E%...
It's not just rendering variations. While they are etymologically related, they are written with different strokes, and it is incorrect to substitute one for the other.
Technically Unicode has variation selectors that can be used to choose between variants of a character, but these do not have sufficient adoption. So in practice pretty much all text has to be annotated with the language it is written in to be rendered correctly; otherwise the system has to guess from the user's locale settings which forms they likely want to see.
MSN, Skype, etc. used emoticons, not emoji, and there wasn't a standard that I'm aware of.
[MSN emoticons] vs [Yahoo emoticons]
:-* "Secret telling" vs "kiss"
8-| "Nerd" vs "rolling eyes"
:| "Disappointed" vs "straight face"
[MSN emoticons] https://web.archive.org/web/20031206095746/http://messenger....
[Yahoo emoticons] https://web.archive.org/web/20080408053458/http://messenger....
N.B. some of them are animated. Animated fonts coming when?
1F52B PISTOL = handgun, revolver
https://www.unicode.org/charts/PDF/U1F300.pdf
That's often described as a flaw, but to err is human; it's what we do. Some degree of chaos can help with efficient problem-solving.
Based on history, we may never get a perfect encoding for historical Earthlings. E.g., what about the following list looks well-planned and coordinated for the future?: ASCII, ISO 8859-1 (Latin-1), ISO 8859-2 (Latin-2), ISO 8859-3, ISO 8859-4, ISO 8859-5 (Cyrillic), ISO 8859-6 (Arabic), ISO 8859-7 (Greek), ISO 8859-8 (Hebrew), ISO 8859-8-I, ISO 8859-10, ISO 8859-13, ISO 8859-14, ISO 8859-15, ISO 8859-16, Windows-1250, Windows-1251, Windows-1252, Windows-1253, Windows-1254, Windows-1255, Windows-1256, Windows-1257, Windows-1258, KOI8-R, KOI8-U, KOI8-RU, Shift_JIS, EUC-JP, EUC-KR, GB2312, GBK, Big5, HZ-GB-2312, TIS-620, MacRoman, MacCyrillic, UTF-8, UTF-16 (BE/LE), UTF-32 (BE/LE), CESU-8, UTF-7, IBM866, IBM437, IBM850, IBM852, IBM855, IBM857, IBM862, IBM864, KZ1048, IBM874 (TIS-620), VNI, Windows-874, Mac Thai, Mac Central European.
Why do you say that? Because Unicode now has become balkanised between various CJK regions?
Han Unification is just the most obvious case but the issues do not stop there. I'll give you a western example. In the sentence
«Günther a souligné l’ambigüité de son discours.»
there is an umlaut and a dieresis.
They are different characters with different functions. In traditional book printing they used to look different.
With Unicode all this cultural nuance is lost. The characters necessary to communicate precisely simply have never been encoded, because Unicode forgot about its core mission.
Fixing things like that is where I want to see efforts go.
The iconic decoration reflects light in all directions and transforms every room - no matter how big its size - into a glamorous space in which people can dance or dream.
I never thought about dreaming in a room with a disco ball in it. I think the informality of emoji proposals is really special!
I'd be curious to know how the actual usage stats aligned with their expectations.
Because it's a Japan-only thing.
But even when you do know exactly who you're addressing, they might be a very diverse group.
Agreed. Do the kids even still know about him?
I also appreciate melting face, dotted outline face, and face with salute. Low battery & ID card.
Name 1 bad thing that came from the invention of emojis that is comparable to the others
𓂸 Out for harambe.
https://news.ycombinator.com/item?id=23659248
It's just that most people can't deal with them responsibly, so it has not been made easy.