Estimating “Good” variability in Speech Production using Invertible Neural Networks



Variability is inherent in skilled human motor movements. Playing the piano or riding a bicycle requires coordinating motor elements, such as the arms and legs, to achieve a motor goal. Even when the movements are skillful, the positions of the motor elements are never exactly the same, no matter how many times the movement is repeated (“repetition without repetition”; Bernstein, 1967). This variability across repeated limb movements was previously disregarded as noise, but it can be understood as an informative biological feature of the human motor system because of its underlying structure and regularity (Latash et al., 2002; Riley & Turvey, 2002; Sternad, 2018; Whalen & Chen, 2019). One such structure is that skilled motor movements are highly synergistic and flexibly organized when their variability is decomposed into “good” and “bad” parts (i.e., the uncontrolled manifold approach, or UCM; Latash et al., 2002; Scholz & Schöner, 1999, 2014). Whether variability in speech production can be decomposed according to the same principle, however, has rarely been examined to date. This project focuses on the “good” part of variability in speech production and explores invertible neural networks as a quantitative approach to understanding how “good” variability is structured and how it can be learned by these neural network models.
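To make the “good” versus “bad” split concrete, here is a minimal sketch of the classical linearized UCM decomposition. It is not the invertible-network approach explored in this project, only an illustration of the underlying idea. Deviations from the mean configuration are projected onto the null space of the task Jacobian (the “good” part, which leaves the task variable unchanged) and onto its orthogonal complement (the “bad” part, which changes it). The function name `ucm_decompose`, the toy task (two elements whose sum is the task variable), and the noise levels are all assumptions made for illustration.

```python
import numpy as np


def ucm_decompose(X, J, tol=1e-10):
    """Split trial-to-trial variance into UCM ("good") and orthogonal ("bad") parts.

    X : (n_trials, n_elements) elemental variables, one row per repetition.
    J : (n_task, n_elements) Jacobian of the task variable with respect to the
        elements, linearized at the mean configuration.
    Assumes a redundant system (rank of J smaller than n_elements).
    Returns (v_ucm, v_ort), each normalized per degree of freedom.
    """
    X = np.asarray(X, dtype=float)
    J = np.atleast_2d(np.asarray(J, dtype=float))
    n_trials, n_elem = X.shape

    # Deviation of each trial from the mean configuration.
    dev = X - X.mean(axis=0)

    # Rows of Vt beyond the rank of J form an orthonormal basis of its null space.
    _, s, Vt = np.linalg.svd(J)
    rank = int(np.sum(s > tol))
    null_basis = Vt[rank:].T  # shape: (n_elem, n_elem - rank)

    # Project deviations onto the null space; the remainder is orthogonal to it.
    dev_ucm = dev @ null_basis @ null_basis.T
    dev_ort = dev - dev_ucm

    v_ucm = np.sum(dev_ucm ** 2) / ((n_elem - rank) * n_trials)
    v_ort = np.sum(dev_ort ** 2) / (rank * n_trials)
    return v_ucm, v_ort


# Toy example (hypothetical data): the task variable is x1 + x2, so J = [[1, 1]].
rng = np.random.default_rng(0)
t = rng.normal(0.0, 1.0, 200)   # large variation that trades off between elements
e = rng.normal(0.0, 0.1, 200)   # small variation that shifts the sum
X_demo = np.column_stack([5.0 + t + e, 5.0 - t + e])

v_good, v_bad = ucm_decompose(X_demo, [[1.0, 1.0]])
print(f"V_UCM (good) = {v_good:.3f}, V_ORT (bad) = {v_bad:.3f}")
```

In this toy data most of the variance lies along the direction in which the two elements compensate for each other, so the “good” component comes out much larger than the “bad” one, which is the signature of a synergy in UCM terms.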


International Seminar on Speech Production (ISSP) Project Website

Below is an iframe of the webpage.