Can text be made to sound more than just its words? (2022)
40 points
by tobr
13 days ago
| 9 comments
| arxiv.org
andai
8 hours ago
Many moons ago I became quite obsessed with analyzing spectrograms on my computer.

I would load up audio files in Audacity and look at them to see how the audio "looked": how intense each frequency is over time.

You can even set a track to spectrogram view while recording, which lets you see the sound in real time.

Music tends to look very beautiful in a spectrogram! So does birdsong. Sometimes I would see a bird in the spectrogram first, and only afterwards notice it by ear.

While analyzing a podcast, I noticed that I had begun to recognize common words like "you". I could also easily distinguish between different people's voices.

I had to wonder: if I were deaf, or became deaf, I would suddenly have a strong motivation to learn to read these things, and to develop some kind of device that would show them to me 24 hours a day.

I have not done this, but the project has remained in the back of my mind for over a decade.

Does anyone else know more about this? Does such a device exist?

I think only some linguists learn to read spectrograms. But wouldn't it be extremely useful to any hearing-impaired person?

As for the article, I think one could quickly learn to read them fluently (e.g. as subtitles, perhaps overlaid on real life), and of course you get the tonal information built in for free: that's what a spectrogram is!
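For anyone curious what a spectrogram view is computing: a spectrogram is typically built with a short-time Fourier transform (STFT). You slice the audio into overlapping frames, taper each frame with a window, and take the magnitude of its Fourier transform, which gives one column of frequency intensities per moment in time. Below is a minimal, deliberately naive sketch in plain Python; the frame size, hop length, Hann window, and the synthetic two-tone signal are assumptions for illustration, not Audacity's actual settings.

```python
import cmath
import math


def spectrogram(samples, frame_size=64, hop=32):
    """Naive STFT: return one list of bin magnitudes per frame.

    Each frame holds frame_size // 2 + 1 magnitudes, from 0 Hz up to the
    Nyquist frequency. The inner DFT is O(frame_size**2) per frame, which is
    fine for illustration but far too slow for real audio.
    """
    # Periodic Hann window to reduce leakage between neighbouring bins.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_size)
              for n in range(frame_size)]
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        chunk = [samples[start + n] * window[n] for n in range(frame_size)]
        magnitudes = []
        for k in range(frame_size // 2 + 1):
            # Discrete Fourier transform coefficient for bin k.
            coeff = sum(chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                        for n in range(frame_size))
            magnitudes.append(abs(coeff))
        frames.append(magnitudes)
    return frames


# Synthetic test signal (assumed): 0.1 s at 8 kHz, a 1000 Hz tone that
# switches to 2000 Hz halfway through.
SAMPLE_RATE = 8000
signal = [math.sin(2 * math.pi * (1000 if n < 400 else 2000) * n / SAMPLE_RATE)
          for n in range(800)]

frames = spectrogram(signal)
bin_hz = SAMPLE_RATE / 64  # width of one frequency bin in Hz
for i, frame in enumerate(frames):
    loudest = max(range(len(frame)), key=frame.__getitem__)
    print(f"frame {i:2d}: loudest at {loudest * bin_hz:.0f} Hz")
```

The loudest bin jumps from 1000 Hz to 2000 Hz partway through, which is what a pitch change "looks like". A real tool would use an FFT (for example NumPy's `numpy.fft.rfft`) instead of this O(N²) loop; the naive version just keeps every step visible.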

kiicia
3 hours ago
There was a guy who was able to recognize music just by looking at the grooves of vinyl records: https://en.wikipedia.org/wiki/Arthur_Lintgen
AndrewOMartin
7 hours ago
You're on the fringe of an area that academics call sensory substitution. Simplified, it means experiencing one of the five senses through a different sense organ than usual. Classic examples are video cameras that render their image as a grid of vibrations on the subject's skin, or as sound.
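As a rough illustration of the camera-to-sound idea, here is one simple mapping, assumed for this sketch rather than taken from any particular device: the image is scanned column by column from left to right, each row gets its own sine tone (higher rows get higher pitches), and each pixel's brightness sets how loud its row's tone is while that column plays. The function name, frequency range, and timing values are hypothetical.

```python
import math


def image_to_sound(image, sample_rate=8000, column_seconds=0.05,
                   low_hz=200.0, high_hz=3000.0):
    """Turn a grayscale image into mono audio samples (hypothetical mapping).

    image: list of rows, top row first, brightness values in [0, 1].
    Columns play left to right, each for column_seconds. Every row is a
    sine tone; rows nearer the top get a higher pitch, and a pixel's
    brightness sets how loud its row's tone is in that column.
    """
    rows, cols = len(image), len(image[0])
    # Linear frequency spacing from high_hz (top row) down to low_hz (bottom row).
    step = (high_hz - low_hz) / max(rows - 1, 1)
    freqs = [high_hz - step * r for r in range(rows)]
    per_col = int(sample_rate * column_seconds)
    samples = []
    for c in range(cols):
        for i in range(per_col):
            t = (c * per_col + i) / sample_rate  # absolute time in seconds
            mix = sum(image[r][c] * math.sin(2 * math.pi * freqs[r] * t)
                      for r in range(rows))
            samples.append(mix / rows)  # dividing by the row count keeps |sample| <= 1
    return samples


# A bright diagonal from bottom-left to top-right should play as rising pitch.
rising = [
    [0, 0, 1],
    [0, 1, 0],
    [1, 0, 0],
]
audio = image_to_sound(rising)
print(len(audio), "samples")
```

In effect this runs a spectrogram backwards: the picture is treated as the time-frequency plot of the sound, so the diagonal above comes out as a tone sweeping upward.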
wincy
5 hours ago
I knew a blind guy who did a trial where he could “see” using his tongue. Pretty neat!

https://news.wisc.edu/a-taste-of-vision-device-translates-fr...

foofoo12
10 hours ago
Very interesting idea. I remember reading that in face-to-face spoken communication, only 20% is carried by the actual words. The rest is tone of voice, body language, context, emphasis, expressions, ... all that stuff.

I don't know if 20% is correct, but I feel it's very close. I also think a lot of internet arguments are a direct result of miscommunication. Emojis are great, but they get abused to the point that HN filters them out. Perhaps let readers toggle whether they see emojis?

Isognoviastoma
10 hours ago
Easy to check: try talking with someone who speaks a foreign language you don't know, and estimate what percentage of what they said you understood from tone of voice etc. I would guess it's less than 80%.
foofoo12
7 hours ago
That's very easy and very wrong. Let's say you have a 100-page book. Page 1 contains fundamental knowledge that allows you to understand the rest of it. If you skip page 1, you won't understand the other 99.

How much of the book will you understand if you only read page 1?

ethmarks
41 minutes ago
But tonal information can be parsed without lexical understanding and vice versa.

Somebody cursing in French can still be interpreted as anger even if you don't understand French, and written profanity can still be interpreted as anger even if you didn't hear it spoken.

Tone and language do complement each other, but neither is a prerequisite for the other, as your book analogy suggests.

foofoo12
21 minutes ago
> but tonal information can be parsed without lexical understanding

Parsed perhaps, but it's so context-sensitive that it's not useful, except at the extremes. The same tone of voice can have many meanings depending on what's actually being said, and yet another once you add context.

cenamus
8 hours ago
Maybe also control for cultural similarity, but I definitely agree
eszed
3 hours ago
There's an acting exercise (from Joan Littlewood, via Clive Barker) where students speak "gibberish": language-like sounds that aren't words. Once they drop their terror of doing it, this almost automatically opens them up to all of those other avenues of communication. Later, you can switch students back and forth between the script and gibberish, and it becomes plain that if you can't play a scene as clearly in gibberish as you can with words (clearly to those in the scene, not to the audience), then you don't fully understand it.
shomp
10 hours ago
The book Understanding Comics by Scott McCloud is a tremendous study in this area; McCloud shows how you can add abstract meanings to words and pictures through illustration.
failrate
7 hours ago
Comic books already use changes in the font, weight, and size of text, and in the shape of the word balloon, to indicate tone and expression.
OisinMoran
10 hours ago
Something like this would be great for karaoke! Especially for the long held notes https://x.com/TheOisinMoran/status/1614435041764859907
pimlottc
9 hours ago
Another thing to look at would be how games like Rock Band and Guitar Hero show lyrics
mati365
11 hours ago
Consider learning Polish. Kurwa sounds exactly as it looks.
58937928709622
2 hours ago
może morze rzeka rzeka [Polish wordplay: "może" ("maybe") and "morze" ("sea") sound identical; "rzeka" means "river"]
voxleone
11 hours ago
Emojis absolutely have their place here. They can add tone, nuance, and a bit of humanity where plain text can feel flat.
embedding-shape
11 hours ago
I feel like emojis are the lazy person's way of adding tone, nuance and humanity when you don't know how to do so through writing alone. I don't mean that's wrong; it's valid to be lazy, especially when it comes to improving communication. But I find myself thinking "How can I make sure this comes across as the joke it is?", and after a minute or two I just end up slapping a wink emoji at the end without rewriting the text at all, lazy person that I am.
pnut
3 hours ago
An idea compressed down into a single character is elegant and efficient.
jonplackett
11 hours ago
When you only want to write back a single word plus an emoji, though, there’s not a lot of space to add tone!
realty_geek
11 hours ago
I've always wondered about this.

In Akan languages, it's easy to see how the same word can be written in different ways to convey an extra dimension of meaning.

Anyone who speaks an Akan language will understand that each of the words below means "good", but with a slightly different emphasis.

papa papaaapa papapapapapa

What is the linguistic term for this concept?

pegasus
11 hours ago
Apparently, it's called partial reduplication or emphatic doubling.
realty_geek
10 hours ago
Thanks, that is helpful.

ChatGPT also explained the concept of ideophones, which was helpful:

https://chatgpt.com/share/69187b3e-7948-8001-9fea-2b4412d5a7...

beepbooptheory
8 hours ago
Reminds me of how the captions were done in Tony Scott's Man on Fire (2004). It's a pretty great movie too.