He has a PDF of his book about human hearing on his website: https://dicklyon.com/hmh/Lyon_Hearing_book_01jan2018_smaller...
Because cochlear implants rely only on stimulating the places in the cochlea associated with particular frequencies, and (for reasons unknown) do not play the actual frequencies themselves, people with cochlear implants can detect frequency differences but lose appreciation for music.
(I thought this was discussed at some point in Lyon's book but it's admittedly been many years since I read it, so I can't remember for sure.)
That author details how the "dawn chorus" is composed of a vast number of species making noise, which are nonetheless able to pick out mating calls and other signals because they have evolved their vocalizations into unique sonic niches.
It's quite interesting but also a bit depressing as he documents the decline in intensity of this phenomenon with habitat destruction etc.
Maybe we don't have sonic variation, but temporal variation instead.
Even so, the populations of birds that are not adapting to the city are being forcibly adapted in other ways: if the reach is too big, they die.
This is how evolution works, and has always worked. The world shifts, and those who can handle it thrive, while those who can't, suffer. It's the reason mammals are running the planet today when it was dinosaurs some 66 million years ago.
Roughly half of the shifts in the last 11 evolutionary periods, over the last 500 million years, were caused by changes that occurred over anywhere from a few hours to a few thousand years, with 75% to 90% of species lost.
Evolution did not fail to work then.
Perhaps the ear does something more vaguely analogous to discrete Fourier transforms on samples of data, which is what we do in a lot of signal processing.
In signal processing, we take windowed samples, and do discrete transforms on these. These do give us some temporal precision.
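A minimal sketch of that windowed-DFT pipeline (a short-time Fourier transform), in plain Python so it runs without extra libraries. The Hann window, frame length, hop size, and test tones are illustrative choices, not anything from the article, and a real implementation would use an FFT library instead of this O(N²) DFT.

```python
import cmath
import math


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def dft(frame):
    """Naive O(N^2) discrete Fourier transform of one frame."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def stft(signal, frame_len, hop):
    """Short-time Fourier transform: window successive frames and DFT each one.

    Returns one list of magnitudes (bins 0..frame_len//2) per frame.
    """
    window = hann(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + i] * window[i] for i in range(frame_len)]
        spectrum = dft(frame)
        frames.append([abs(c) for c in spectrum[: frame_len // 2 + 1]])
    return frames


# Illustrative input: 1 kHz for the first half, 2 kHz for the second, at 8 kHz.
fs = 8000
signal = [math.sin(2 * math.pi * (1000 if n < 512 else 2000) * n / fs)
          for n in range(1024)]
frames = stft(signal, frame_len=64, hop=32)
for i in (0, len(frames) - 1):
    peak_bin = max(range(len(frames[i])), key=frames[i].__getitem__)
    print(f"frame {i}: peak at bin {peak_bin} ({peak_bin * fs / 64:.0f} Hz)")
```

With 64-sample frames at 8 kHz each bin is 125 Hz wide, so early frames peak at bin 8 (1 kHz) and late frames at bin 16 (2 kHz): the frame index carries the timing, the bin index the frequency.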
There is a trade-off there between frequency and temporal precision, analogous to the Pauli exclusion principle in quantum mechanics. The better we know a frequency, the less precisely we know the timing. Only an infinite, periodic signal has a single precise frequency (or precise set of harmonics), which appears as infinitely narrow spikes in the frequency domain.
The continuous Fourier transform is defined over the entire domain: we transform an entire function like sin(x) over all x. If that domain is interpreted as time, we are including all of eternity, so to speak, from negative infinite time to positive infinite time.
Sure, and the FFT isn't inherently biased towards one vs the other. If you take an FFT over a long time window (narrowband spectrogram) then you get good frequency resolution at the cost of time resolution, and vice versa for a short time window (wideband spectrogram).
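The trade-off in its simplest arithmetic form, assuming a 16 kHz sample rate and using plain DFT bin spacing as the measure of frequency resolution (actual resolution also depends on the window shape):

```python
def resolution(fs, frame_len):
    """Return (window duration in ms, DFT bin spacing in Hz) for a frame of
    frame_len samples at sample rate fs."""
    duration_ms = 1000.0 * frame_len / fs
    bin_spacing_hz = fs / frame_len
    return duration_ms, bin_spacing_hz


fs = 16000  # illustrative sample rate
for frame_len in (64, 256, 1024):  # short (wideband) -> long (narrowband)
    duration_ms, spacing_hz = resolution(fs, frame_len)
    print(f"{frame_len:5d} samples: {duration_ms:5.1f} ms window, "
          f"{spacing_hz:6.1f} Hz between bins")
```

Whatever the frame length, duration times bin spacing stays fixed (here 1000 ms·Hz, i.e. exactly 1): buying finer frequency bins costs a proportionally longer window.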
For speech recognition ideally you'd want to use both since they are detecting different things. TFA is saying that this is in fact what our cochlea filter bank is doing, using different types of filter at different frequency ranges - better frequency resolution at lower frequencies where the formants are (carrying articulatory information), and better time resolution at the high frequencies generated by fricatives where frequency doesn't matter but accurate onset detection is useful for detecting plosives.
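The thread gives no numbers, so as an outside reference point: one widely used psychoacoustic approximation of auditory-filter bandwidth is the equivalent rectangular bandwidth (ERB) formula of Glasberg and Moore (1990). The sketch evaluates it at a few centre frequencies; the 1/bandwidth "time scale" is only a rough rule of thumb for how long a filter of that width rings, not a claim from the article.

```python
def erb_hz(f_hz):
    """Equivalent rectangular bandwidth of the auditory filter centred at f_hz,
    using the Glasberg & Moore (1990) approximation."""
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)


for f in (100, 500, 1000, 4000, 8000):
    bw = erb_hz(f)
    # Rough rule of thumb: a filter of bandwidth B rings for about 1/B seconds.
    print(f"{f:5d} Hz: ERB ~ {bw:6.1f} Hz, time scale ~ {1000.0 / bw:5.2f} ms")
```

Narrow filters at low frequencies resolve nearby formant harmonics but respond slowly; the much wider high-frequency filters respond within a couple of milliseconds, which suits onset detection.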
Did you mean the Heisenberg Uncertainty Principle instead? Or is there actually some connection of the Pauli Exclusion Principle to conjugate transforms that I wasn’t aware of?
(I had put “sampling” in quotes as they’re actually “integration periods” in this context of continuous-time integration, though that term would be less immediately evocative of the concept people are colloquially familiar with. If we further impose a constraint of finite temporal resolution, so that it is honest-to-god “sampling”, then it becomes the Discrete Fourier Transform, of which the Fast Fourier Transform is one implementation.)
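To make the DFT/FFT relationship concrete: a textbook DFT next to a recursive radix-2 Cooley-Tukey FFT, a minimal sketch assuming a power-of-two length. Both produce the same numbers; the FFT only reorganizes the arithmetic (O(N log N) instead of O(N²)), which is why it is an implementation of the DFT rather than a different transform.

```python
import cmath
import math


def dft(x):
    """Direct evaluation of the DFT definition, O(N^2)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])  # DFT of even-indexed samples
    odd = fft(x[1::2])   # DFT of odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        twiddled = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddled
        out[k + n // 2] = even[k] - twiddled
    return out


x = [math.sin(0.3 * i) + 0.5 * math.cos(1.1 * i) for i in range(16)]
max_err = max(abs(a - b) for a, b in zip(dft(x), fft(x)))
print(f"largest difference between DFT and FFT outputs: {max_err:.2e}")
```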
It is this strict definition that the article title is rebuking, but it’s not quite what the colloquial usage loosely evokes in most people’s minds when we usually say Fourier Transform as an analysis tool.
So this article should have been comparing to Fourier Series analysis rather than Fourier Transform in the pedantic sense, albeit that’ll be a bit less provocative.
Regardless, it doesn’t at all take away from the salient points of this excellent article, which offer a really interesting reframing of the concepts: what the ear does mechanistically is apply a temporal “weighting function” (filter), so it’s somewhere between a Fourier series and a Fourier transform. This article hits the nail on the head in presenting the sliding scale of conjugate-domain trade-offs (think: Heisenberg).
But yeah there is a strict vs colloquial collision here.
As the article briefly mentions, it's a tempting hypothesis that there is a relationship between the acoustic properties of human speech and the physical/neural structure of the auditory system. It's hard to get clear evidence on this but a lot of people have a hunch that there was some coevolution involved, with the ear's filter functions favoring the frequency ranges used by speech sounds.
This seems trivially true in the sense that human speech is intelligible by humans; there are many sounds that humans cannot hear and/or distinguish, and speech does not involve those.
The article also describes a theory that human speech evolved to occupy an unoccupied space in frequency vs. envelope duration space. It makes no explicit connection between that fact and the type of transform the ear does—but one would suspect that the specific characteristics of the human cochlea might be tuned to human speech while still being able to process environmental and animal sounds sufficiently well.
A more complicated hypothesis off the top of my head: the location of human speech in frequency/envelope is a tradeoff between (1) occupying an unfilled niche in sound space; (2) optimal information density taking brain processing speed into account; and (3) evolutionary constraints on physiology of sound production and hearing.
Nobody who knows anything about signal processing has ever suggested that the ear performs a Fourier transform across infinite time.
But the ear does perform something very much akin to the FFT (fast Fourier transform), turning discrete samples into intensities at frequencies -- which is, of course, what any reasonable person means when they say the ear does a Fourier transform.
This article suggests it's accomplished by something between wavelet and Gabor. Which, yes, is not exactly a Fourier transform -- but it's producing something that is about 95-99% the same in the end.
And again, nobody would ever suggest the ear was performing the exact math that the FFT does, down to the last decimal point. But these filters still work essentially the same way as the FFT in terms of how they respond to a given frequency, it's really just how they're windowed.
So if anyone just wants a simple explanation, I would say yes the ear does a Fourier transform. A discrete one with windowing.
First, I think when you say FFT, you mean DFT. A Fourier transform is both non-discrete and infinite in time. A DTFT (discrete-time Fourier transform) is discrete, i.e. using samples, but infinite. A DFT (discrete Fourier transform) is both finite (the analyzed data has a start and an end) and discrete. An FFT is effectively an implementation of a DFT, and nothing indicates to me that hearing is in any way specifically related to how the FFT computes a DFT.
But more importantly, I'm not sure DFT fits at all? This is an analog, real-world physical process, so where is it discrete, i.e. how does the ear capture samples?
I think, purely based upon its "mode", what's happening is more akin to a Fourier series, which is the missing fourth category completing (FT, DTFT, DFT): Continuous (non-discrete), but finite or rather periodic in time.
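For reference, the four members of that family written in one common convention (ordinary frequency for the FT and Fourier series, angular frequency in radians per sample for the DTFT; other texts normalize differently). Note which have continuous time and which have a finite or periodic time axis:

```latex
\begin{aligned}
\text{FT (continuous, infinite):}\quad
  & X(f) = \int_{-\infty}^{\infty} x(t)\, e^{-i 2\pi f t}\, dt \\
\text{Fourier series (continuous, period } T\text{):}\quad
  & c_k = \frac{1}{T} \int_{0}^{T} x(t)\, e^{-i 2\pi k t / T}\, dt \\
\text{DTFT (discrete, infinite):}\quad
  & X(\omega) = \sum_{n=-\infty}^{\infty} x[n]\, e^{-i \omega n} \\
\text{DFT (discrete, finite length } N\text{):}\quad
  & X[k] = \sum_{n=0}^{N-1} x[n]\, e^{-i 2\pi k n / N}, \quad k = 0, \dots, N-1
\end{aligned}
```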
But secondly, unlike Gabor transforms, wavelet transforms are specifically not just windowed Fourier anythings (whether FT/FS/DFT/DTFT). Those would commonly be called "short-time Fourier transforms" (STFT, existing again in discrete and non-discrete variants), and the article straight up mentions that they don't fit either in its footnotes.
Wavelet transforms use an entirely different shape (e.g. a Haar wavelet) that is shifted and stretched for analysis, instead of windowed sinusoids over a windowed signal.
And I think those distinctions are what the article actually wanted to touch upon.
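A minimal sketch of the simplest case, the Haar wavelet, assuming an orthonormal multilevel transform on a power-of-two length. Instead of multiplying by sinusoids, each level takes scaled sums and differences of neighbouring pairs (the same box-shaped wavelet shifted along, then stretched by two at the next level), so an impulse stays localized in a few detail coefficients rather than spreading across every bin as it would in a DFT.

```python
import math


def haar_dwt(x):
    """Orthonormal multilevel Haar DWT; len(x) must be a power of two.

    Returns [final approximation] + detail coefficients, coarsest level first.
    """
    s = 1.0 / math.sqrt(2.0)
    approx = list(x)
    details = []
    while len(approx) > 1:
        pairs = range(0, len(approx), 2)
        level = [(approx[i] - approx[i + 1]) * s for i in pairs]   # differences: detail
        approx = [(approx[i] + approx[i + 1]) * s for i in pairs]  # sums: coarser view
        details = level + details  # prepend so coarser levels come first
    return approx + details


impulse = [0, 0, 0, 0, 0, 1, 0, 0]
coeffs = haar_dwt(impulse)
print("finest-scale details:", [round(c, 3) for c in coeffs[4:]])
print("energy preserved:", round(sum(c * c for c in coeffs), 12))
```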
And the basilar membrane seems like a pretty un-discrete (in time, not in frequency) process to me. But I'm not 100% sure.
Sure, if you go small enough, you end up with discrete structures sooner or later (molecules, atoms, and, if you go far enough down, quantum effects where everything breaks apart anyway), but without knowing anything more, the sensitivity of this whole process still seems better modeled as continuous rather than discrete; the scale at which discreteness appears just seems too small to me.
Yes, many neurons fire at discrete intervals set by their morphology. In fact, this DFT/FFT/Infinite-FT/whatever-FT is all the hell over neuroscience. Many neurons don't really 'communicate' in just a single action potential. They are mostly firing at each other all the time, and the rate of firing is what communicates information. So neuron A is always popping at neuron B, but the tone/rate of that popping is what carries change/information.
Now, this is not true of every single neuron-neuron interaction. Some do use a single action potential (your patellar knee-jerk reflex), some communicate with hundreds of other neurons (pyramidal cells in your cortex), some inhibit the firing of other neurons (gap/dendrite junction/axon interactions), and some transmit information in opposite ways. It's a giant mess, and you have to specify the exact subsystem to get a handle on things.
Also, you get whole-brain wave activity during different periods of the sleep and wake cycles. So neurons in certain areas will sync up their firing rates when you're dreaming or taking an SAT or something. And yes, you can influence mass cyclic firing with powerful magnets (TMS).
For the cochlea here, these hair cells are mostly firing all the time, and when a sound/frequency that they are 'tuned' to is heard, their firing pattern changes and that information is transmitted toward the temporal lobes. To be clear, there are a lot of other brain structures in the way before the info gets to a place where you can be conscious of it: things like the medial nuclei, the trapezoid body, the calyx of Held, etc. Most of these areas are for discriminating sounds and the location of sounds in space. So when your fan has been on for a long while and you no longer hear it, that's because of the other structures.
this is believed to come from the shape of the cochlea, which is often modeled as a filterbank that can express this non-linearity in an intuitive way.
This description applies equally well to the discrete wavelet, discrete Gabor, and maybe even Hadamard transforms, which are definitely not, as you assert, "95–99% the same in the end" (how would you even measure such similarity?) So it is not something any reasonable person has ever meant by "the Fourier transform" or even "the discrete Fourier transform".
Also, you seem to be confused about what "discrete" means in the context of the Fourier transform. The ear functions in continuous time and does not take discrete samples.
this is the time-frequency uncertainty principle. intuitively it can be understood by thinking about wavelength. the more stretched out the waveform is in time, the more of it you need to see in order to have a good representation of its frequency, but the more of it you see, the less precise you can be about where exactly it is.
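One standard way to make that intuition quantitative is the Gabor limit (the signal-processing form of the uncertainty relation), where the spreads are standard deviations of the normalized energy densities in time and frequency:

```latex
\sigma_t \, \sigma_f \;\ge\; \frac{1}{4\pi},
\qquad
\sigma_t^2 = \frac{\int (t - \bar{t})^2 \, |x(t)|^2 \, dt}{\int |x(t)|^2 \, dt},
\qquad
\sigma_f^2 = \frac{\int (f - \bar{f})^2 \, |X(f)|^2 \, df}{\int |X(f)|^2 \, df}
```

Equality holds only for a Gaussian-shaped pulse, which is why Gabor atoms are the usual reference point; in angular frequency the same bound reads σ_t σ_ω ≥ 1/2.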
> but it does do a time-localized frequency-domain transform akin to wavelets
maybe easier to conceive of first as an arbitrarily defined filter bank based on physiological results rather than trying to jump directly to some neatly defined set of orthogonal basis functions. additionally, orthogonal basis functions cannot, by definition, capture things like masking effects.
> A more complicated hypothesis off the top of my head: the location of human speech in frequency/envelope is a tradeoff between (1) occupying an unfilled niche in sound space; (2) optimal information density taking brain processing speed into account; and (3) evolutionary constraints on physiology of sound production and hearing.
(4) size of the animal.
notably: some smaller creatures have ultrasonic vocalization and sensory capability; sometimes this is hypothesized to complement visual perception for avoiding predators, but it also could just have a lot to do with the fact that, well, they have tiny articulators and tiny vocalizations!
Now I'm imagining some alien shrew with vocal-cords (or syrinx, or whatever) that runs the entire length of its body, just so that it can emit lower-frequency noises for some reason.
Though of course, nature has plenty of other tricks, like how koalas can go down to ~27 Hz. [2]
[0] https://acousticalengineer.com/fundamental-frequency-calcula...
I wonder if these could be used to better master movies and television audio such that the dialogue is easier to hear.
It's called the short-time Fourier transform (STFT).
Nobody who knows literally anything about signal processing thought the ear was doing a Fourier transform. Is it doing something like an STFT? Obviously yes, and this article doesn't go against that.
The article is pretty much "cows aren't actually spheres guys".
But the title instead makes it sound (pun unintended) that what the ear does is not about frequency decomposition at all.
"We call this tonotopic organization, which is a mapping from frequency to space. This type of organization also exists in the cortex for other senses in addition to audition, such as retinotopy for vision and somatotopy for touch."
So the cochlea does frequency decomposition but not by performing a FT (https://en.wikipedia.org/wiki/Fourier_transform), but rather by a biomechanical process involving numerous sensors that are sensitive to different frequency ranges ... similar to how we have different kinds (only 3, or in birds and rare humans 4) of cones in the retina that are sensitive to different frequency ranges.
The claim that the title makes it sound like what the ear does is not about frequency decomposition at all is simply false ... that's not what it says, at all.
What would it mean for a sound to not be localized in time?
Zooming in to cartoonish levels might drive the point home a bit. Suppose you have sound waves
  |---------|---------|---------|
Let's zoom out a bit. What's the frequency over a longer period of time, capturing a few peaks?
Well...if you know there is only one frequency then you can do some math to figure it out, but as soon as you might be describing a mix of frequencies you suddenly, again, potentially don't have enough information.
That lack of information manifests in a few ways. The exact math (Shannon's theorems?) suggests some things, but the language involved mismatches with human perception sufficiently that people get burned trying to apply it too directly. E.g., a bass beat with a bit of clock skew is very different from a bass beat as far as a careless decomposition is concerned, but it's likely not observable by a human listener.
Not being localized in time means* you look at longer horizons, considering more and more of those interactions. Instead of the beat of a 4/4 song meaning that the frequency changes at discrete intervals, it means that there's a larger, over-arching pattern capturing "the frequency distribution" of the entire song.
*Truly time-nonlocalized sound is of course impossible, so I'm giving some reasonable interpretation.
Are you talking about a discrete signal or a continuous signal?
Of course, none of these are completely nonlocalized in time. Sooner or later there will be a blackout and the transformer will go silent. But it's a lot less localized than the chirp of a bird.
Imagine the dissonant sound of hitting a trashcan.
Now imagine the sound of pressing down all 88 keys on a piano simultaneously.
Do they sound similar in your head?
The localization occurs where the phases of all frequency components align coherently and construct a pulse, while further away in time their phases are misaligned and cancel each other out.
We can make a short-time Fourier transform or a wavelet transform in the same way, either by:
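A small illustration, assuming 20 unit-amplitude harmonics of 1 Hz: with all phases equal they add up to a sharp peak of 20 at t = 0, while quadratic (Schroeder-style) phases, an illustrative choice, spread the same energy across the whole period. Both versions have the same magnitude spectrum and the same mean-square level; only the phases differ.

```python
import math


def sum_of_cosines(t, freqs, phases):
    """Unit-amplitude cosines at the given frequencies (Hz) and phases (rad)."""
    return sum(math.cos(2 * math.pi * f * t + p) for f, p in zip(freqs, phases))


N = 20
freqs = list(range(1, N + 1))                  # harmonics of 1 Hz
aligned = [0.0] * N                            # all phases equal
spread = [math.pi * k * k / N for k in freqs]  # quadratic phases

ts = [i / 1000 for i in range(1000)]           # one period, 1 ms steps
for name, phases in (("aligned", aligned), ("quadratic", spread)):
    y = [sum_of_cosines(t, freqs, phases) for t in ts]
    peak = max(abs(v) for v in y)
    mean_sq = sum(v * v for v in y) / len(y)
    print(f"{name:9s}: peak {peak:5.2f}, mean square {mean_sq:5.2f}")
```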
- filterbank approach integrating signals in time
- take fourier transform of time slices, integrating in frequency
The same machinery just with different filters.
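A minimal sketch of that equivalence for the STFT case, assuming one Hann window shared by all channels (window, signal, and bin choices are illustrative): DFT bin k of a windowed slice is exactly the output of an FIR band-pass filter whose impulse response is the time-reversed window modulated to bin k. A wavelet or constant-Q bank would use a different window length per channel, which is the "different filters" part.

```python
import cmath
import math


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def slice_then_transform(x, start, k, window):
    """'Fourier transform of a time slice': bin k of the windowed frame at `start`."""
    L = len(window)
    return sum(x[start + n] * window[n] * cmath.exp(-2j * math.pi * k * n / L)
               for n in range(L))


def filterbank_channel(x, k, window):
    """'Filterbank': convolve x with an FIR band-pass filter centred on bin k.

    The impulse response is the time-reversed window modulated to bin k, so
    the output at time start + L - 1 equals slice_then_transform(x, start, k, window).
    """
    L = len(window)
    h = [window[L - 1 - j] * cmath.exp(-2j * math.pi * k * (L - 1 - j) / L)
         for j in range(L)]
    return [sum(h[j] * x[t - j] for j in range(L) if t - j >= 0)
            for t in range(len(x))]


x = [math.sin(0.37 * n) + 0.2 * math.cos(2.1 * n) for n in range(100)]
w = hann(16)
for k in (0, 3, 7):
    y = filterbank_channel(x, k, w)
    for start in (0, 10, 50):
        a = slice_then_transform(x, start, k, w)
        b = y[start + len(w) - 1]
        print(f"k={k} start={start}: |difference| = {abs(a - b):.1e}")
```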
Well from an evolutionary perspective, this would be unsurprising, considering any other forms of language would have been ill-fitted for purpose and died out. This is really just a flavor of the anthropic principle.
Why do we need a summary in a post that adds nothing new to the conversation?
I'm no expert in these matters just speculating...
I spent a while reading up on that stuff because I was trying to figure out what causes my tinnitus. My best guess is that if the hairs overbend, that stuff can break and an ion channel gets stuck open, causing the cell to fire continually.
Another fun ear fact is they incorporate active amplification. You can hook an electrical signal to the loudspeaker type cell to make it vibrate around https://youtu.be/pij8a8aNpWQ
Most people in physics only know sines, and maybe sometimes rectangles, as a basis for transformations, but mathematically you could use a lot of other functions, maybe very similar to sines, but different.
Ironic for a video about hearing.
The poor man's conversion of finite to equivalent infinite time is to assume an infinite signal in which the initial finite one is repeated infinitely into the past and the future.
I know of vocoders in military hardware that encode voices to resemble something simpler (a low-tone male voice) for compression: smaller packets that take less bandwidth. The ear must likewise have evolved together with our vocal cords and mouth to occupy available frequencies for transmission and reception, for optimal communication.
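The DFT effectively makes that assumption: it treats N samples as one period of an infinitely repeating signal. A small sketch of the consequence, assuming a 32-point frame: a cosine with a whole number of cycles repeats seamlessly and lands in exactly two bins (positive and negative frequency), while one with 4.5 cycles has a jump at the wrap-around point and its energy spreads across the spectrum ("spectral leakage").

```python
import cmath
import math


def dft_magnitudes(x):
    """Magnitudes of a naive O(N^2) DFT."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n)]


N = 32
whole = [math.cos(2 * math.pi * 4.0 * n / N) for n in range(N)]  # 4 full cycles
half = [math.cos(2 * math.pi * 4.5 * n / N) for n in range(N)]   # 4.5 cycles: jump at wrap-around

for name, x in (("4 cycles", whole), ("4.5 cycles", half)):
    mags = dft_magnitudes(x)
    occupied = sum(1 for m in mags if m > 1e-6)
    print(f"{name:10s}: {occupied} of {N} bins non-zero, largest {max(mags):.2f}")
```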
The parallels with waveforms don't end there. Waveforms are also optimized for different terrains (urban, jungle).
Are languages organic waveforms optimized to ethnicity and terrain?
Cool article indeed.
It gave me a much better intuition than my math course.
People love to go on about how brilliant it is and they're probably right but that's how I understand it.
Phase matters for some wideband signals, but most folks struggle to tell apart audio from Hilbert-90-degree-shifted audio.
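A small sketch of that experiment, assuming a periodic 64-sample test signal made up for illustration: the discrete Hilbert transform (done here via the DFT, one standard construction) turns each cosine into the matching sine, so the waveform visibly changes while the magnitude spectrum stays identical.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive DFT (or inverse DFT, including the 1/N factor)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out


def hilbert_shift(x):
    """Shift every frequency component of a real signal by -90 degrees.

    Positive-frequency bins are multiplied by -i and negative-frequency bins
    by +i; the DC and Nyquist bins are zeroed (they have no 90-degree twin).
    """
    n = len(x)
    X = dft(x)
    for k in range(n):
        if 0 < k < n / 2:
            X[k] *= -1j
        elif k > n / 2:
            X[k] *= 1j
        else:
            X[k] = 0
    return [v.real for v in dft(X, inverse=True)]


N = 64
x = [math.cos(2 * math.pi * 3 * n / N) + 0.5 * math.cos(2 * math.pi * 5 * n / N + 0.3)
     + 0.25 * math.cos(2 * math.pi * 9 * n / N + 1.0) for n in range(N)]
y = hilbert_shift(x)  # every cosine becomes the matching sine

waveform_diff = max(abs(a - b) for a, b in zip(x, y))
spectrum_diff = max(abs(abs(a) - abs(b)) for a, b in zip(dft(x), dft(y)))
print(f"max waveform difference {waveform_diff:.3f}, "
      f"max magnitude-spectrum difference {spectrum_diff:.1e}")
```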
It's very comprehensive, but it's also very well written and walks you through the mechanics of Fourier transforms in a way that makes them intuitive.
This video does a great job explaining what it is and how it works to the layman. 3blue1brown - https://www.youtube.com/watch?v=spUNpyF58BY
Perhaps finally I should learn too…
I have been told that reversing the process — creating a time-based waveform — will not resemble (visually) the original due to this phase loss in the round-tripping. But then our brain never paid phase any mind so it will sound the same to our ears. (Yay, MP3!)
Also, pedantic nit: phase would be the imaginary exponent of the spectrum rather than the imaginary part directly, i.e., you take the logarithm of the complex amplitude to get log-magnitude (real) plus phase (imag)
That being said, round-tripping works just fine, axiomatically so, until you go out of your way to discard the imaginary component.
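A small sketch of both points, assuming a 32-sample test signal chosen for illustration: the full DFT round trip is exact up to rounding; the complex log of a bin splits into log-magnitude (real part) and phase (imaginary part); and discarding the phase before inverting yields a different waveform with an identical magnitude spectrum.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive DFT (or inverse DFT, including the 1/N factor)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out


N = 32
x = [math.cos(2 * math.pi * 3 * n / N + 0.7) + 0.5 * math.sin(2 * math.pi * 7 * n / N)
     for n in range(N)]
X = dft(x)

# 1. Full round trip: exact up to floating-point rounding.
back = [v.real for v in dft(X, inverse=True)]
print("round-trip error:", max(abs(a - b) for a, b in zip(x, back)))

# 2. Complex log of one bin: real part = log-magnitude, imaginary part = phase.
z = X[3]
print(cmath.log(z), math.log(abs(z)), cmath.phase(z))

# 3. Discard the phase (keep |X| only) and invert: a different waveform...
zero_phase = [v.real for v in dft([abs(c) for c in X], inverse=True)]
print("max waveform difference:", max(abs(a - b) for a, b in zip(x, zero_phase)))
# ...with an identical magnitude spectrum.
print("max magnitude difference:",
      max(abs(abs(a) - abs(b)) for a, b in zip(X, dft(zero_phase))))
```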
https://en.wikipedia.org/wiki/Cepstrum
It’s literally a "backwards spectrum", and the authors in 1963 were having such jolly fun they reversed the words too: quefrency => frequency, saphe => phase, alanysis => analysis, liftering => filtering
The cepstrum is the "spectrum of a log spectrum," where taking the complex logarithm turns multiplicative spectral features into additive ones, laying the foundation of cepstral alanysis, and later, the physiologically tuned Mel-frequency cepstrum used in audio compression and speech recognition.
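A minimal sketch of the simplest variant, the real cepstrum (inverse DFT of the log-magnitude spectrum; the complex cepstrum would also keep the phase). An echo multiplies the spectrum by a ripple, the log turns that into an additive term, and the cepstrum shows it as a peak at the echo delay. The decaying pulse, 20-sample delay, and 0.5 echo gain are illustrative assumptions; the echo is circular so the DFT arithmetic is exact.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive DFT (or inverse DFT, including the 1/N factor)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out


def real_cepstrum(x):
    """Inverse DFT of the log-magnitude spectrum (the 'real' cepstrum)."""
    log_mag = [math.log(abs(c)) for c in dft(x)]
    return [v.real for v in dft(log_mag, inverse=True)]


N, delay, gain = 128, 20, 0.5
pulse = [0.8 ** n for n in range(N)]                                   # smooth-spectrum source
echoed = [pulse[n] + gain * pulse[(n - delay) % N] for n in range(N)]  # circular echo

c = real_cepstrum(echoed)
peak = max(range(5, N // 2), key=lambda q: c[q])  # skip the low-quefrency part
print(f"cepstral peak at quefrency {peak} samples, height {c[peak]:.3f}")
```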
https://en.wikipedia.org/wiki/Mel_scale
>The mel scale (after the word melody)[1] is a perceptual scale of pitches judged by listeners to be equal in distance from one another. [...] Use of the mel scale is believed to weigh the data in a way appropriate to human perception.
As Tukey might say: once you start doing cepstral alanysis, there’s no turning back, except inversely.
Skeptics said he was just going through a backwards phase, but it turned out to work! ;)
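For the mel scale quoted above, one widely used formula (several variants exist, so treat the constants as one convention rather than the definition) maps 1000 Hz to roughly 1000 mel. Spacing band centres equally in mel, as an MFCC front end does, makes them progressively wider apart in hertz; the ten bands and 8 kHz ceiling are illustrative.

```python
import math


def hz_to_mel(f_hz):
    """One common mel-scale formula; other variants exist."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


# Ten band centres equally spaced in mel between 0 Hz and 8 kHz.
top = hz_to_mel(8000.0)
centres = [mel_to_hz(top * i / 11) for i in range(1, 11)]
print([round(c) for c in centres])
```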
https://news.ycombinator.com/item?id=24386845
DonHopkins on Sept 5, 2020, on: Mathematicians should stop naming things after eac...
I love how they named the inverse spectrum the cepstrum, which uses quefrency, saphe, alanysis, and liftering, instead of frequency, phase, analysis and filtering. It should not be confused with the earlier concept of the kepstrum, of course! ;)
https://en.wikipedia.org/wiki/Cepstrum
>References to the Bogert paper, in a bibliography, are often edited incorrectly. The terms "quefrency", "alanysis", "cepstrum" and "saphe" were invented by the authors by rearranging some letters in frequency, analysis, spectrum and phase. The new invented terms are defined by analogies to the older terms.
>Thus: The name cepstrum was derived by reversing the first four letters of "spectrum". Operations on cepstra are labelled quefrency analysis (aka quefrency alanysis[1]), liftering, or cepstral analysis. It may be pronounced in the two ways given, the second having the advantage of avoiding confusion with "kepstrum", which also exists (see below). [...]
>The kepstrum, which stands for "Kolmogorov-equation power-series time response", is similar to the cepstrum and has the same relation to it as expected value has to statistical average, i.e. cepstrum is the empirically measured quantity, while kepstrum is the theoretical quantity. It was in use before the cepstrum.[12][13]
https://news.ycombinator.com/item?id=43341806
DonHopkins, 7 months ago, on: What makes code hard to read: Visual patterns of c...
Speaking of filters and clear ergonomic abstractions, if you like programming languages with keyword pairs like if/fi, for/rof, while/elihw, goto/otog, you will LOVE the cabkwards covabulary of cepstral quefrency alanysis, invented in 1963 by B. P. Bogert, M. J. Healy, and J. W. Tukey:
cepstrum: inverse spectrum
lifter: inverse filter
saphe: inverse phase
quefrency alanysis: inverse frequency analysis
gisnal orpcessing: inverse signal processing
https://en.wikipedia.org/wiki/Cepstrum
https://news.ycombinator.com/item?id=44062022
DonHopkins, 5 months ago, on: The scientific “unit” we call the decibel
At least the Mel-frequency cepstrum is honest about being a perceptual scale anchored to human hearing, rather than posing as a universally-applicable physical unit.
https://en.wikipedia.org/wiki/Mel-frequency_cepstrum
>Mel-frequency cepstral coefficients (MFCCs) are coefficients that collectively make up an MFC. They are derived from a type of cepstral representation of the audio clip (a nonlinear "spectrum-of-a-spectrum"). The difference between the cepstrum and the mel-frequency cepstrum is that in the MFC, the frequency bands are equally spaced on the mel scale, which approximates the human auditory system's response more closely than the linearly-spaced frequency bands used in the normal spectrum. This frequency warping can allow for better representation of sound, for example, in audio compression that might potentially reduce the transmission bandwidth and the storage requirements of audio signals.
https://en.wikipedia.org/wiki/Psychoacoustics
>Psychoacoustics is the branch of psychophysics involving the scientific study of the perception of sound by the human auditory system. It is the branch of science studying the psychological responses associated with sound including noise, speech, and music. Psychoacoustics is an interdisciplinary field including psychology, acoustics, electronic engineering, physics, biology, physiology, and computer science.
I found this quite interesting, as I have noticed that I can detect voices in high-noise environments, e.g. HF radio, where noise is almost constant if you don't use a digital mode.
Neuroanatomy, Auditory Pathway
https://www.ncbi.nlm.nih.gov/books/NBK532311/
Cochlear nerve and central auditory pathways
https://www.britannica.com/science/ear/Cochlear-nerve-and-ce...
Molecular Aspects of the Development and Function of Auditory Neurons
neural signaling by action potentials is also a representation of intensity by frequency.
the cochlea is where you can begin to talk about bio-FT phenomenon.
however the format "changes" along the signal path, whenever a synapse occurs.
Cutting the hearing nerve does not cure tinnitus.
It develops when destruction of hearing cells leads the brain to upregulate gain to catch a weak/absent signal, when the deprivation pattern is just right. (No tinnitus develops when the hearing nerve is cut, so the deprivation pattern matters.)
Are you perhaps experiencing some high frequency hearing loss?
In the middle range (say, A2 through A6) neither of these issues applies, so it is - by far - the easiest to tune.
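For orientation, that range in hertz, assuming twelve-tone equal temperament with A4 = 440 Hz (the MIDI note numbering is an illustrative convention, not from the comment): A2 is 110 Hz and A6 is 1760 Hz, four octaves apart.

```python
def note_hz(midi_note, a4_hz=440.0):
    """Equal-tempered frequency for a MIDI note number (A4 = 69)."""
    return a4_hz * 2.0 ** ((midi_note - 69) / 12.0)


for name, midi in (("A2", 45), ("A4", 69), ("A6", 93)):
    print(f"{name}: {note_hz(midi):7.1f} Hz")
```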
Which is why we can hear individual instruments in a mix.
And this ability to separate sources can be trained. Just as pitch perception can be trained, with varying results from increased acuity up to full perfect pitch.
A component near the bottom of all that is range-based perception of consonance and dissonance, based on the relationships between beat frequencies and fundamentals.
Instead of a vanilla Fourier transform, frequencies are divided into multiple critical bands (q.v.) with different properties and effects.
What's interesting is that the critical bands seem to be dynamic, so they can be tuned to some extent depending on what's being heard.
Most audio theory has a vanilla EE take on all of this, with concepts like SNR, dynamic range, and frequency resolution.
But the experience of audio is hugely more complex. The brain-ear system is an intelligent system which actively classifies, models, and predicts sounds, speech, and music as they're being heard, at various perceptual levels, all in real time.
That's a side note, the rest of what you wrote was very informative!
The computer does not do a Fourier transform (FFT computes the discrete Fourier transform)
Spectroscopes don't do a Fourier transform (they actually do a short-time FT)
The only thing that actually does Fourier transform is a mathematician, with a pen and some paper.
But, to the vast majority who don't really know or care about the math, "Fourier Transform" is, at best, a totem for the entire concept space of "frequency domain", "spectral decomposition", etc.
They are not making fine distinctions of tradeoffs among different methods. I'm not sure I'd even call it disinformation to tell this hand-wavy story and pique someone's interest in a topic they otherwise never thought about...
there appears to be no software for this; it's all hardware, and the signal format flips as it travels through the anatomy.
Owls use asymmetric skull structure which helps them in spatial perception of sound.
neurosynaptically, there is no phase; there is frequency shift corresponding to presynaptic intensity, and there is spatio-temporal integration of these signals. temporal integration is where "phase" matters
it's all a mix of "digital" all-or-nothing "gates" and analog frequency-shift propagation of the "gate" output.
it's all made nebulous by the adaptive and hysteretic nature of the elements in neural "circuitry"
As higher-order, statistically transparent abstract nudges of providence existing outside the confines of causality? Metaphysically interesting but philosophically futile.
The content is generally good but I'd argue that the ear is indeed doing very Fourier-y things.
On one corner of the square, you have Fourier Transforms, which are essentially continuous and infinite. On the opposite corner, you have the DFT, which is both finite (or periodic) and discrete. Hearing is more akin to a Fourier Series, which is finite/periodic but continuous. That's probably not what the article aims at addressing, though.
But then wavelet transforms are different from Fourier Series again, because you have shifted and stretched shapes (some of them quite weird) instead of sinusoids.
But yeah, colloquially, I agree, the ear is indeed doing very Fourier-y things.