Hints on Speech Inverse Filtering of Fricative Phonemes

Is it possible to invert fricatives by using Childers’ Toolboxes?

At first sight, I think the answer is no. IIRC, Childers’ toolbox allowed for inversion of the sentence “we were away a year ago”. But that’s a very convenient sentence to invert: it is voiced throughout, so most of its relevant acoustic information shows up clearly in a formant analysis. That’s not the case for fricatives (and nasals, for instance, pose interesting problems of their own).

For my thesis, I developed my own inversion toolbox. But no matter the toolbox, you need a “source” of information for inversion. That information may be the spectral energy distribution, formants, etc. For fricatives, formants are out of the question. As you know, the spectrum of a fricative differs substantially from that of a voiced phoneme. When we utter a fricative, the oral tract naturally adopts a specific “constriction” configuration, and that configuration does have a formant structure of its own. The problem is that the turbulence generated in the oral tract masks those resonances, which is why formant tracking is misleading in such cases.

My approach, then, was to use another type of information. I analyzed the spectrum of each fricative and located the frequencies with the highest concentration of energy. During inversion, I favored articulatory configurations that would yield a high concentration of energy in those detected “zones”. Acoustic energy follows some fairly evident patterns. For example, /f/ exhibits a relatively uniform distribution of energy, whereas /s/ shows an energy peak around 5 kHz.
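As a rough illustration of the “zones of energy” idea, here is a minimal sketch in Python with NumPy. It is not the code from my toolbox: the function names, the 1 kHz band width, the 20% threshold and the synthetic band-limited noise standing in for a recorded fricative are all assumptions made for the example. It splits the power spectrum into bands, marks the bands holding most of the energy, and scores a candidate spectrum by how much of its energy falls in those bands.

```python
import numpy as np


def band_energy_fractions(signal, sample_rate, band_width_hz=1000.0):
    """Split the power spectrum into equal-width bands.

    Returns (band_start_hz, fraction_of_total_energy_per_band).
    """
    signal = np.asarray(signal, dtype=float)
    # A Hann window limits leakage between neighbouring bands.
    windowed = signal * np.hanning(len(signal))
    power = np.abs(np.fft.rfft(windowed)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)

    n_bands = int(np.ceil((sample_rate / 2.0) / band_width_hz))
    # Map every FFT bin to a band; the Nyquist bin goes into the last band.
    band_index = np.minimum((freqs // band_width_hz).astype(int), n_bands - 1)
    energy = np.bincount(band_index, weights=power, minlength=n_bands)
    band_start_hz = np.arange(n_bands) * band_width_hz
    return band_start_hz, energy / energy.sum()


def energy_zones(fractions, threshold=0.2):
    """Boolean mask of the bands holding at least `threshold` of the energy."""
    return np.asarray(fractions) >= threshold


def zone_score(candidate_fractions, target_zones):
    """Share of a candidate spectrum's energy that falls in the target zones."""
    return float(np.asarray(candidate_fractions)[target_zones].sum())


def band_limited_noise(low_hz, high_hz, sample_rate, n_samples, seed=0):
    """Synthetic stand-in for a fricative: white noise kept only in one band."""
    rng = np.random.default_rng(seed)
    spectrum = np.fft.rfft(rng.standard_normal(n_samples))
    freqs = np.fft.rfftfreq(n_samples, d=1.0 / sample_rate)
    spectrum[(freqs < low_hz) | (freqs >= high_hz)] = 0.0
    return np.fft.irfft(spectrum, n=n_samples)


if __name__ == "__main__":
    sr = 32000
    # Hypothetical /s/-like target: noise concentrated between 5 and 6 kHz.
    target = band_limited_noise(5000, 6000, sr, 4096)
    starts, fractions = band_energy_fractions(target, sr)
    zones = energy_zones(fractions)
    print("energy zones start at (Hz):", starts[zones])

    # Two hypothetical candidates, as an articulatory model might produce.
    for name, (lo, hi) in {"front": (5000, 6000), "back": (1000, 2000)}.items():
        candidate = band_limited_noise(lo, hi, sr, 4096, seed=1)
        _, cand_fractions = band_energy_fractions(candidate, sr)
        print(name, "score:", round(zone_score(cand_fractions, zones), 3))
```

In a real inversion, the candidate spectra would come from the articulatory model rather than from filtered noise, and the score would be just one term of the cost being optimized.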

You could improve inversion further by exploiting a simple observation. Remember that short acoustic tubes have their resonances at higher frequencies. You could tell your articulatory model to prefer configurations with a short frontal acoustic tube (which should lead the model to adopt an “anterior constriction” configuration). The only problem with this approach is that you’re giving the inversion method some “extra” information. Whether that is acceptable depends on the goals of your research. If you don’t have to invert from information obtained by analyzing “real” utterances (which is rarely the case, though), you may simply instruct your inversion algorithm to target a frequency of 5 kHz for /s/ and of 18 kHz for /f/. That is a cheap inversion, but it would confirm your model’s relative suitability for inverting such fricative phonemes.
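To put a number on “short tubes resonate higher”, here is a small sketch that treats the cavity in front of the constriction as an idealized uniform tube, closed at the constriction and open at the lips (a quarter-wavelength resonator). That tube model, the 350 m/s speed of sound and the function names are assumptions for illustration, not something taken from Childers’ toolbox or from mine.

```python
# Assumed speed of sound in warm, humid air inside the vocal tract (cm/s).
SPEED_OF_SOUND_CM_S = 35000.0


def quarter_wave_resonance_hz(length_cm, mode=1):
    """Resonance of a uniform tube closed at one end and open at the other.

    f_n = (2n - 1) * c / (4 * L), with n = 1 for the lowest resonance.
    """
    if length_cm <= 0 or mode < 1:
        raise ValueError("length_cm must be positive and mode must be >= 1")
    return (2 * mode - 1) * SPEED_OF_SOUND_CM_S / (4.0 * length_cm)


def front_cavity_length_cm(target_hz):
    """Tube length whose lowest quarter-wave resonance sits at target_hz."""
    if target_hz <= 0:
        raise ValueError("target_hz must be positive")
    return SPEED_OF_SOUND_CM_S / (4.0 * target_hz)


if __name__ == "__main__":
    for length in (17.5, 5.0, 2.0, 1.0):
        print(f"{length:5.1f} cm -> {quarter_wave_resonance_hz(length):8.1f} Hz")
    print("length for a 5 kHz peak (cm):", front_cavity_length_cm(5000.0))
```

Under these assumptions, a 5 kHz peak corresponds to a front cavity of about 1.75 cm, which is consistent with asking the model for an anterior constriction.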
