Abstract
Speech is highly nonlinear. Efforts to propose nonlinear models of its dynamics are worthwhile but difficult to implement, since nonlinearity is not easily handled from either an engineering or a mathematical point of view. This paper attempts to make the notion of nonlinearity in speech accessible to non-specialists by reviewing several nonlinear speech phenomena and the engineering efforts to model them.
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Esposito, A., Marinaro, M. (2005). Some Notes on Nonlinearities of Speech. In: Chollet, G., Esposito, A., Faundez-Zanuy, M., Marinaro, M. (eds) Nonlinear Speech Modeling and Applications. NN 2004. Lecture Notes in Computer Science, vol 3445. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11520153_1
DOI: https://doi.org/10.1007/11520153_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-27441-4
Online ISBN: 978-3-540-31886-6
eBook Packages: Computer Science; Computer Science (R0); Springer Nature Proceedings Computer Science