 Universität Stuttgart 
 Institut für Maschinelle Sprachverarbeitung 
 Bernd Möbius 
 

Graduiertenkolleg

Research Program

"Prosodic Representations in Language and Speech"
 

Research Goals

My primary research interests are relevant for the research program of the Graduiertenkolleg in several respects. The common denominator for this link between my individual and the Kolleg's joint research programs may be formulated as the investigation of symbolic, acoustic, and cognitive representations of prosody in both language and speech.

One of my major goals in the Graduiertenkolleg is to establish a research paradigm that will facilitate the study of prosody by providing direct conceptual links between its manifestations and representations in speech production, speech acoustics, and speech perception. The DFG project "A computational model of target oriented production of prosody" (2001-2004, jointly coordinated by Grzegorz Dogil and myself) has aimed to provide experimental evidence for the validity of extending Guenther and Perkell's speech production model (Guenther, 1995; Guenther et al., 1998; Perkell et al., 2000; Guenther, 2003) to the domain of prosody; a certain degree of synergy from the partial overlap between this project and the present research program is expected.

My second main research goal is to investigate linguistic/symbolic representations of prosody, with the ultimate goal of building models that can be implemented in the linguistic components of natural language and speech systems. Such computational models will allow us to empirically verify our interpretations of prosodic representations. In his "blueprint of the speaker", Levelt (1999) has outlined an agenda that calls for the design and implementation of empirically viable working models of the processing components involved in the blueprint. The Prosody Generator is among the least elaborated modules in the blueprint, despite, or because of, the generally recognized important integrating function of prosody in the organization and production of speech (Levelt, 1989; Dogil, 2000). This part of my research program will overlap to some extent with Grzegorz Dogil's research, and in fact we expect to carry out some of the pertinent research jointly.

Finally, and in extension of the second goal outlined above, I am interested in the design of interface(s) between semantic, syntactic, phonological and phonetic representations of prosody. The representational design is meant to have its motivation in linguistic and phonetic theory, and its validity should be verifiable in an end-to-end generation system, such as a concept-to-speech system.

I intend to contribute to the general research program of the Graduiertenkolleg in the following ways:

  • Theoretical basis for the production and perception of prosody, and consideration of the symbolic, acoustic, and cognitive representation of prosody.
  • Theoretical competence as well as empirical and experimental tools for the study of prosodic representations.
  • Methodological guidance for the study of prosodic representations and their interpretations in the context of human and machine processing of prosody.

Research Topics

1. Theoretical Framework for the Study of Prosody Acquisition, Perception, and Production

I am interested in investigating Exemplar Theory as a common theoretical framework for the study of prosody acquisition, perception, and production by humans, as well as prosody generation by machine. Whereas Pierrehumbert (2001) discusses primarily segmental evidence, I will argue that the key ingredients of the theory can be generalized and extended to the prosodic domain.

2. Prosody Production and Generation

The DFG project "A computational model of target oriented production of prosody" (Dogil and Möbius, 2001c; see also Dogil and Möbius, 2001a, 2001b) builds on the speech production model of Guenther, Perkell and colleagues (Guenther, 1995; Guenther et al., 1998; Perkell et al., 2000). I intend to explore to what extent Exemplar Theory is an additional relevant theoretical framework for the study of prosody in speech production. More concretely, the theory might help explain how the multidimensional target regions of speech events are established and maintained in an individual speaker (production) and across speakers (perception). I predict that frequency effects, entrenchment, and exemplar cloud updates are crucial ingredients of such an explanation.

Exemplar Theory also provides a motivational link between human speech production and automatic speech generation. In speech production a particular exemplar is selected to realize a given category; Pierrehumbert (2001) discusses three models, with increasing complexity, of how the selection is performed. Currently, none of these models takes into account the context in which the target event is embedded. I suggest that the segmental, prosodic and positional contexts be considered explicitly in a model of exemplar selection in speech production.
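
One possible reading of the suggestion above, that context be considered explicitly during exemplar selection, can be sketched in a few lines of Python. Everything in the sketch is an assumption made for illustration: the feature names (`stress`, `position`), the `strength` and `duration_ms` fields, and the rule that an exemplar's selection weight is its strength multiplied by the share of matching context features. The strength-weighted random draw is only a placeholder for the selection step and is not claimed to reproduce any of the three models discussed by Pierrehumbert (2001).

```python
import random

def context_match(exemplar_context, target_context):
    """Share of target context features (hypothetical keys) the exemplar agrees with."""
    keys = list(target_context)
    hits = sum(exemplar_context.get(k) == target_context[k] for k in keys)
    return hits / len(keys)

def select_exemplar(cloud, target_context, rng):
    """Draw one exemplar with weight = strength * context match (assumed rule)."""
    weights = [e["strength"] * context_match(e["context"], target_context)
               for e in cloud]
    if sum(weights) == 0:
        # No exemplar matches the context at all: fall back to strength alone.
        weights = [e["strength"] for e in cloud]
    return rng.choices(cloud, weights=weights, k=1)[0]

# Invented toy cloud for one syllable category; values are not measured data.
cloud = [
    {"duration_ms": 210, "strength": 1.0,
     "context": {"stress": "stressed", "position": "final"}},
    {"duration_ms": 140, "strength": 3.0,
     "context": {"stress": "unstressed", "position": "medial"}},
]
target = {"stress": "stressed", "position": "final"}
chosen = select_exemplar(cloud, target, random.Random(0))
```

In this toy cloud the stronger exemplar does not match the target context at all, so its weight is zero and the weaker but contextually appropriate exemplar is selected.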

Pierrehumbert's notion of an abstract, idealized exemplar prototype in speech production (Pierrehumbert, 2001) is compatible with the notion of an ideal point in a multidimensional target region as it is currently applied in automatic speech synthesis (Möbius, 2001a). Whereas prosody certainly contributes to the multidimensionality of target specifications, it is considered as secondary information in state-of-the-art unit selection synthesis systems (e.g., Balestri et al., 1999; Beutnagel et al., 1999; Taylor and Black, 1999), and one of my research goals is to exploit more explicitly the selectional restrictions imposed by appropriate prosodic representations.

3. Internal Representations of Prosody

Phonemic settings and the internal models that they represent are learned in the process of language and speech acquisition. Postural settings, in contrast, rely on continuous auditory monitoring and tend to break down quickly if this monitoring process is inhibited during speech production. Evidence presented in the literature indicates that stable internal models are mostly associated with segmental phonemic targets (Perkell et al., 2000), whereas prosodic features often display postural characteristics. I will argue that the dichotomy of phonemic and postural settings applies not only to segmental properties of speech but to prosodic features as well.

When compared to segmental characteristics of speech, which are best subserved by strong and stable internal representations (Perkell et al., 2000), prosodic properties may rely more strongly on a balanced mixture of continuous, auditory feedback-based update and learned internal models (Jones and Munhall, 2000). Based on evidence reported in the literature (e.g., Jilka, 2000) and on theoretical considerations I intend to test two hypotheses: first, that the relative importance of acquired internal models of phonemic targets, on the one hand, and of immediate adjustments of postural settings, on the other hand, is flexible and depends on the actual communicative and situative conditions; and second, that the speaker may have access to several internal models, each representing the most appropriate balance of phonemic and postural settings for a prototypical communicative and situative context (Möbius and Dogil, 2002).

4. Acquisition of Prosody

The DIVA model (Guenther, 1995) provides a simulation of the acquisition of internal phonemic models. An exemplar-based interpretation of the target regions also needs to explain how such internal models emerge during language and speech acquisition. In the DFG project "Ein exemplartheoretisches Modell zum Erwerb der akustischen Korrelate der Betonung" ("An exemplar-theoretic model of the acquisition of the acoustic correlates of stress", 2004-2006) I have started to study the acquisition of syllabic stress. The project will investigate when children start perceiving stress contrasts, when and how they start realizing syllabic stress, and whether they tend to produce stress by actively using the same acoustic correlates of stress as their parents do, as would be predicted by exemplar theory. The acquisition of prosodic targets should be investigated beyond the specific topic of syllabic stress.

The parts of my research program outlined above (1.-4.) will be carried out in close collaboration with Grzegorz Dogil.

5. Concept-to-Speech Generation

Concept-to-speech (CTS) systems (Alter et al., 1997; Teich et al., 1997) provide a direct link between language generation and acoustic-prosodic components (Möbius, 2001a; Batliner and Möbius, 2002). But to fully exploit the improvement to synthesized prosody potentially available in a CTS system, the optimal granularity of information needs to be defined. On the one hand, the acoustic-prosodic components will have to specify exactly which pieces of linguistic information are required to produce natural-sounding prosody; on the other hand, this specification must be synchronized with the type of information that a language generation component can be reasonably expected to provide. The research challenge can thus be formulated in short-hand as the design of an optimal semantics/syntax-prosody interface.

This part of my research program is expected to benefit from collaboration with the semantics and discourse specialists in the Kolleg, in particular Hans Kamp and Uwe Reyle.

6. Methodological Issues

My approach to addressing the research topics outlined above is generally theory-driven but may be characterized additionally by the following methodological considerations:

Corpus-based methods. Language and speech corpora provide access to real data with realistic frequency distributions, and relevant features can be detected, learned and modeled by means of appropriate empirical and statistical methods, including machine learning (e.g., Prescher, 2002).

Statistical methods. I recognize the necessity to apply sophisticated statistical methods (e.g., van Santen, 1993; Baayen, 2001; Evert and Lüdeling, 2001) that can handle extremely uneven frequency distributions of language and speech events and the resulting sparse data problem (Möbius, 2001b; Möbius, 2003a).

Probabilistic models. I expect probabilistic information to be relevant at virtually all levels of prosodic description: entrenchment (acquisition), productive morphological and phonological processes (production, generation), well-formedness judgments (perception), unit selection (speech synthesis).

Computational models. Quantitative computational models will facilitate the evaluation of hypotheses and assumptions, by way of being implemented and integrated in natural language systems.
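
The extremely uneven frequency distributions mentioned under statistical methods above can be made concrete with a small sketch. The function name `frequency_profile` and the toy syllable list are invented for illustration and are not taken from any corpus; the point is only that a few types account for a large share of the tokens while many types occur once, which is the sparse data situation.

```python
from collections import Counter

def frequency_profile(tokens):
    """Summarize how unevenly tokens are distributed over types."""
    counts = Counter(tokens)
    n_tokens = sum(counts.values())
    n_types = len(counts)
    # Types observed exactly once: the rare events that cause sparse data.
    hapaxes = sum(1 for c in counts.values() if c == 1)
    top_count = counts.most_common(1)[0][1]
    return {
        "tokens": n_tokens,
        "types": n_types,
        "hapax_share_of_types": hapaxes / n_types,
        "top_type_share_of_tokens": top_count / n_tokens,
    }

# Invented toy sample of syllable tokens.
syllables = ["de", "de", "de", "de", "ge", "ge",
             "ter", "schaft", "zung", "pflicht"]
profile = frequency_profile(syllables)
```

Even in this tiny sample, one type covers 40 percent of the tokens while four of the six types are seen only once.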
 

State of the Art

Exemplar Theory was first introduced in psychology as a model of perception and categorization. Only with some delay was it extended to speech sounds by Johnson (1997) and Lacerda (in press) (cf. related work by Hintzman, 1986, and Goldinger, 1996). Pierrehumbert (2001) demonstrates how Exemplar Theory can provide a way to formalize the detailed phonetic knowledge that native speakers have about the categories of their language. The acquisition of this knowledge can be regarded as the acquisition of a large number of memory traces of speech-based experiences. Pierrehumbert discusses primarily segmental evidence, but there are good reasons to assume that the theory can be generalized and extended to the prosodic domain. For instance, it is posited that
  • each speech-related category is represented in memory by a large cloud of remembered tokens of that category;
  • these memories are organized in a cognitive map;
  • memories decay over time;
  • the parameter space in which the exemplars are represented is granularized (e.g., by "just noticeable differences (JND)");
  • an individual exemplar does not correspond to a single perceptual experience but to an equivalence class of perceptual experiences;
  • each exemplar has an associated strength, or a resting activation level;
  • systematic biases and entrenchment (i.e., decreasing variance by practice) may explain sound changes over time;
  • exemplar clouds are updated when communication is successful.
These assumptions are either compatible with or complementary to the speech production model proposed by Guenther and Perkell and colleagues (Guenther, 1995; Guenther et al., 1998; Perkell et al., 2000; Guenther, 2003). The DFG project "A computational model of target oriented production of prosody" (Dogil and Möbius, 2001c) provides a new paradigm for prosody research. It has aimed at generalizing Guenther and Perkell's speech production model by extending it from a predominantly segmental perspective to a new theory of the production of prosody (Dogil and Möbius, 2001a, 2001b). Speech movements in the prosodic domain are interpreted as intonational gestures that are planned to reach and traverse perceptual target regions. These perceptual targets can be approximately represented by regions in a multidimensional acoustic-temporal space. It is further posited that segmental, spectral, temporal, and prosodic structure are co-produced in such a way as to support and enhance the perceptual targets. The multidimensional target regions may be interpreted to be related to the exemplar clouds posited by Exemplar Theory (Schweitzer and Möbius, 2003, 2004).
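
Several of the exemplar-theoretic assumptions listed above (clouds of remembered tokens per category, decay over time, a granularized parameter space, and a strength per exemplar) can be illustrated with a toy memory model. The sketch below is a simplification under stated assumptions: a single acoustic dimension, exponential decay, and classification by summed strength in a fixed window. The class name `ExemplarCloud`, its parameters, and the F0-like numbers are hypothetical and do not come from the cited work.

```python
import math
from collections import defaultdict

class ExemplarCloud:
    """Toy one-dimensional exemplar memory (illustrative only)."""

    def __init__(self, bin_width=1.0, decay_time=100.0):
        self.bin_width = bin_width    # granularity of the parameter space
        self.decay_time = decay_time  # time constant of memory decay
        # category -> {bin index -> list of storage times}
        self.traces = defaultdict(lambda: defaultdict(list))

    def _bin(self, value):
        # All experiences falling into one bin form one equivalence class.
        return math.floor(value / self.bin_width)

    def store(self, category, value, time):
        self.traces[category][self._bin(value)].append(time)

    def strength(self, category, value, now):
        # Strength of one exemplar: sum of its exponentially decayed traces.
        times = self.traces[category].get(self._bin(value), [])
        return sum(math.exp(-(now - t) / self.decay_time) for t in times)

    def classify(self, value, now, window=2):
        # Label a new token by the category with most nearby strength.
        scores = {
            category: sum(
                self.strength(category, value + k * self.bin_width, now)
                for k in range(-window, window + 1)
            )
            for category in self.traces
        }
        return max(scores, key=scores.get)

# Invented values standing in for F0 measurements of two accent categories.
memory = ExemplarCloud(bin_width=5.0, decay_time=50.0)
for t, f0 in enumerate([118, 121, 124, 119]):
    memory.store("low", f0, time=t)
for t, f0 in enumerate([178, 183, 181, 186]):
    memory.store("high", f0, time=t)
label = memory.classify(180, now=10)
```

Entrenchment, systematic biases, and the condition that clouds are updated only after successful communication are left out of this sketch.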

Pierrehumbert (2001) explains that the optimal location of a given exemplar prototype may not always be actually represented by an existing exemplar token. Optimal locations may thus represent idealized, abstract prototypes. In my own work on speech synthesis (Möbius, 2001a) I have elaborated the concept of an "ideal point" (originally introduced by van Santen and Möbius; cf. Sproat, 1998), the center of a multidimensional region of pre-defined size. The ideal point serves either as the reference target for the online selection of speech units at synthesis runtime or as a reference for the optimal location of cut and concatenation points in offline acoustic unit inventory construction. In either scenario the size of the region represents the limits of (acoustically, perceptually) acceptable deviations of unit candidates from the ideal target. Selection criteria are multidimensional too: they comprise both spectral and prosodic features and, accordingly, the unit candidates in the speech corpus are annotated with segmental and prosodic feature vectors. Note that prosody is considered as secondary information in state-of-the-art unit selection systems (e.g., Balestri et al., 1999; Beutnagel et al., 1999; Taylor and Black, 1999).
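
The role of the ideal point and its surrounding region in unit selection can be sketched as follows. This is a minimal illustration, not the selection procedure of any particular synthesis system: the weighted Euclidean distance, the three-dimensional feature vectors, the unit names, and the convention that the third dimension stands for a prosodic feature are all assumptions introduced here.

```python
import math

def distance(candidate, ideal, weights):
    """Weighted Euclidean distance between a candidate and the ideal point."""
    return math.sqrt(sum(w * (c - i) ** 2
                         for c, i, w in zip(candidate, ideal, weights)))

def select_unit(candidates, ideal, weights, radius):
    """Return the name of the acceptable candidate closest to the ideal point.

    candidates: dict mapping unit name -> feature vector
    radius: size of the target region; candidates outside it are rejected
    Returns None when no candidate lies inside the region.
    """
    scored = [(distance(vec, ideal, weights), name)
              for name, vec in candidates.items()]
    acceptable = [(d, name) for d, name in scored if d <= radius]
    return min(acceptable)[1] if acceptable else None

# Invented, normalized deviations from the ideal point:
# (spectral feature 1, spectral feature 2, prosodic feature).
ideal = (0.0, 0.0, 0.0)
candidates = {
    "unit_a": (0.5, 0.2, 0.1),
    "unit_b": (0.1, 0.1, 0.3),
    "unit_c": (2.0, 1.5, 0.0),
}
equal_weights = select_unit(candidates, ideal, (1.0, 1.0, 1.0), radius=1.0)
prosody_heavy = select_unit(candidates, ideal, (1.0, 1.0, 10.0), radius=1.0)
```

With equal weights the spectrally closest candidate wins; raising the weight of the prosodic dimension changes the choice, which is one simple way of letting prosodic restrictions act as more than secondary information.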

Another interesting property of the speech production model (Guenther et al., 1998) is the dichotomy of phonemic settings and postural settings. In mature speech production auditory feedback has two functions (Boutsen and Christman, 2001). First, it helps maintain phonemic settings, i.e. parameters of phonemic distinctions; second, it assures intelligibility by monitoring the acoustic environment and accommodating the baseline postural settings of the respiratory, laryngeal, and supraglottal systems appropriately. We have suggested that the dichotomy of phonemic and postural settings applies not only to segmental properties of speech but to prosodic features as well (Möbius and Dogil, 2002).

Finally, one answer to the research question of an appropriate representation of prosody at the interface between categorical (symbolic) and continuous (parametric) levels of description may be found by developing computational models of prosody in the framework of natural language systems. Certain speech output generation strategies beyond the classical text-to-speech (TTS) scenario offer rather immediate interfaces between symbolic and acoustic representations of prosody (Batliner and Möbius, 2002). Concept-to-speech (CTS) systems, in particular, provide a direct link between language generation and acoustic-prosodic components. A CTS system has access to the complete linguistic structure of the sentence that is being generated; the system "knows" what to say, and how to render it. The degree of potential improvement to synthesized prosody can be illustrated by manually marking up the text or by providing access to semantic and discourse representations (Prevost and Steedman, 1994).

But even in a TTS scenario it has been demonstrated that models which use rich and detailed prosodic information, for instance accent type labels in addition to accent location alone, can generate intonation contours that are perceptually more acceptable than models which use accent location alone (Syrdal et al., 1998). The problem is that computing from text such detailed prosodic features as accent type is difficult and unreliable, but they may be more readily accessible in different speech generation strategies such as concept-to-speech.

Yet, in CTS systems it is still necessary to specify the mapping from semantic to symbolic features and from categorical symbolic features to continuous acoustic parameters. The issue of how much, and what kind of, information the language generation component should deliver to optimize the two mapping steps (i.e., the definition of a semantics/syntax-prosody interface) is an urgent research topic. Once the two mapping steps are optimized, we may even be able to advance one step further and get rid of the intermediate, phonological representation of prosody (cf. Batliner and Möbius, 2002).
 

Own Work

Prosody: speech production and internal representations:
- Dogil and Möbius (2001a,b,c), Möbius and Dogil (2002), Schweitzer and Möbius (2003a,b, 2004)

Speech generation and synthesis:
- Möbius (2004b, 2003a, 2001a), Bailly et al. (2003), Campbell et al. (2001)
- Schweitzer et al. (to appear, 2004, 2003)

Prosodic modeling:
- Botinis et al. (2001)
- Batliner and Möbius (in press), Batliner et al. (2001)

Methodological issues:
- Möbius (2003a)
- Müller et al. (2000a,b)
- Lee et al. (in press), Möbius (2004a)
 

Publications in the previous 3 years

  • Bailly, Gerard, Nick Campbell and Bernd Möbius (2003): "ISCA Special Session: Hot topics in speech synthesis". Proceedings of the European Conference on Speech Communication and Technology (Geneva, Switzerland), 37-40.
  • Batliner, Anton and Bernd Möbius (in press): "Prosodic models, automatic speech understanding, and speech synthesis: Towards the common ground?" In William Barry, Wim van Dommelen (eds.), Integration of Phonetic Knowledge in Speech Technology (Kluwer).
  • Botinis, Antonis, Björn Granström and Bernd Möbius (2001): "Developments and paradigms in intonation research". Speech Communication 33 (4), 263-296.
  • Campbell, Nick, Wolfgang Hess, Bernd Möbius and Jan van Santen (2001): "The ISCA Special Interest Group on Speech Synthesis". Proceedings of the European Conference on Speech Communication and Technology (Aalborg, Denmark), vol. 2, 1149-1152.
  • Dogil, Grzegorz and Bernd Möbius (2001a): "Towards a model of target oriented production of prosody". Proceedings of the European Conference on Speech Communication and Technology (Aalborg, Denmark), vol. 1, 665-668.
  • Dogil, Grzegorz and Bernd Möbius (2001b): "Toward a perception based model of the production of prosody". Journal of the Acoustical Society of America 110 (5, Pt. 2), 2737.
  • Dogil, Grzegorz and Bernd Möbius (2001c): "A computational model of target oriented production of prosody". DFG Project.
  • Lee, Minkyu, Jan van Santen, Bernd Möbius and Joe Olive (to appear): "Formant tracking using context dependent phonemic information". IEEE Transactions on Speech and Audio Processing.
  • Möbius, Bernd (2004a): "Corpus-based investigations on the phonetics of consonant voicing". Folia Linguistica 38 (1-2), 5-26.
  • Möbius, Bernd (2004b): "Sprachsynthesesysteme". In Kai-Uwe Carstensen et al. (eds.), Computerlinguistik und Sprachtechnologie: Eine Einführung (Spektrum Akademischer Verlag, Heidelberg), 2nd edition (1st edition 2001), 517-523.
  • Möbius, Bernd (2003a): "Rare events and closed domains: Two delicate concepts in speech synthesis". International Journal of Speech Technology 6 (1), 57-71.
  • Möbius, Bernd (2003b): "Gestalt psychology meets phonetics - An early experimental study of intrinsic F0 and intensity". Proceedings of the 15th International Congress of Phonetic Sciences (Barcelona), 2677-2680.
  • Möbius, Bernd (2001a): German and Multilingual Speech Synthesis. Arbeitspapiere des Instituts für Maschinelle Sprachverarbeitung (Univ. Stuttgart), AIMS 7 (4).
  • Möbius, Bernd and Grzegorz Dogil (2002): "Phonemic and postural effects on the production of prosody". In Bernard Bel and Isabelle Marlien (eds.), Proceedings of the Speech Prosody 2002 Conference (Aix-en-Provence, Laboratoire Parole et Langage), 523-526.
  • Schweitzer, Antje, Norbert Braunschweiler, Grzegorz Dogil, Tanja Klankert, Bernd Möbius, Gregor Möhler, Edmilson Morais, Bettina Säuberlich and Matthias Thomae (to appear): "Multimodal speech synthesis". In Wolfgang Wahlster (ed.), SmartKom - Foundations of Multimodal Dialogue Systems (Springer).
  • Schweitzer, Antje, Norbert Braunschweiler, Grzegorz Dogil and Bernd Möbius (2004): "Assessing the acceptability of the SmartKom speech synthesis voices". Proceedings of the 5th ISCA Speech Synthesis Workshop (Pittsburgh, PA), 1-6.
  • Schweitzer, Antje and Bernd Möbius (2004): "Exemplar-based production of prosody: Evidence from segment and syllable durations". Proceedings of the Speech Prosody 2004 Conference (Nara, Japan), 459-462.
  • Schweitzer, Antje and Bernd Möbius (2003a): "Temporal constraints on the production of frequent and infrequent syllables". 6th International Seminar on Speech Production (Sydney).
  • Schweitzer, Antje and Bernd Möbius (2003b): "On the structure of internal prosodic models". Proceedings of the 15th International Congress of Phonetic Sciences (Barcelona), 1301-1304.
  • Schweitzer, Antje, Norbert Braunschweiler, Tanja Klankert, Bettina Säuberlich and Bernd Möbius (2003): "Restricted unlimited domain synthesis". Proceedings of the European Conference on Speech Communication and Technology (Geneva, Switzerland), 1321-1324.

Planned PhD Dissertation Projects

  • Exemplar-based production of prosody
  • Acquisition of prosody
  • Perception of prosody
  • Phonemic vs. postural effects in prosody
  • Temporal representations and structures in prosody
  • Generation and synthesis of prosody in a multimodal dialog system
  • Granularity of prosodic information in a concept-to-speech system

Links to Other Parts of the Graduiertenkolleg

Thematic relations within the Kolleg:
  • Dogil (Prosody, Phonetics, Speech)
  • Heid (Morphosyntax, Productivity)
  • von Heusinger (Information Structure)
  • Kamp (Semantics, Generation)
  • Reyle (Semantics, Discourse)
  • Rohrer (Syntax, Lexicon)
  • Schütze (Exemplar Theory)
Interaction with external scientists:
  • Anton Batliner (Univ. Erlangen-Nürnberg; Prosodic models)
  • David House (KTH Stockholm; Prosody perception)
  • Chilin Shih (Bell Labs; Prosody production)
  • Jan van Santen (OGI/CSLU; Prosodic models)
  • Doug Whalen (Haskins Labs; Internal representations of prosody)

References

  • Alter, Kai, Hannes Pirker and Wolfgang Finkler (eds., 1997): Concept to Speech Generation Systems - Proceedings of a Workshop in conjunction with 35th Annual Meeting of the Association for Computational Linguistics (Madrid, Spain).
  • Baayen, Harald (2001): Word Frequency Distributions (Kluwer, Dordrecht).
  • Balestri, Marcello, Alberto Pacchiotti, Silvia Quazza, Pier Luigi Salza and Stefano Sandri (1999): "Choose the best to modify the least: a new generation concatenative synthesis system". Proceedings of the European Conference on Speech Communication and Technology (Budapest, Hungary), vol. 5, 2291-2294.
  • Beutnagel, Mark, Mehryar Mohri and Michael Riley (1999): "Rapid unit selection from a large speech corpus for concatenative speech synthesis". Proceedings of the European Conference on Speech Communication and Technology (Budapest, Hungary), vol. 2, 607-610.
  • Boutsen, Frank R. and Sarah S. Christman (2001): "Aprosodia: whether, where and why". In Maassen, Ben, Wouter Hulstijn, Ray D. Kent, Herman F.M. Peters and Pascal H.M.M. van Lieshout (eds.), 4th International Speech Motor Conference (Nijmegen), 232-236.
  • Dogil, Grzegorz (2000): "Understanding prosody". In Rickheit, Gert, T. Hermann and W. Deutsch (eds.), Psycholinguistics - An International Handbook (de Gruyter, Berlin).
  • Evert, Stefan and Anke Lüdeling (2001): "Measuring morphological productivity: Is automatic preprocessing sufficient?" Proceedings of Corpus Linguistics 2001 (Lancaster, UK).
  • Goldinger, S.D. (1996): "Words and voices: Episodic traces in spoken word identification and recognition memory". Journal of Experimental Psychology: Learning, Memory, and Cognition 22, 1166-1183.
  • Guenther, Frank H. (1995): "A modeling framework for speech motor development and kinematic articulator control". Proceedings of the 13th International Congress of Phonetic Sciences (Stockholm), vol. 2, 92-99.
  • Guenther, Frank H. (2003): "Neural control of speech movements". In Schiller, Niels O. and Meyer, Antje S. (eds.), Phonetics and Phonology in Language Comprehension and Production (Mouton de Gruyter, Berlin), 209-239.
  • Guenther, Frank H., Michelle Hampson and Dave Johnson (1998): "A theoretical investigation of reference frames for the planning of speech movements". Psychological Review 105, 611-633.
  • Hintzman, D.L. (1986): "'Schema abstraction' in a multiple-trace memory model". Psychological Review 93, 328-338.
  • Jilka, Matthias (2000): The contribution of intonation to the perception of foreign accent. Arbeitspapiere des Instituts für Maschinelle Sprachverarbeitung (Univ. Stuttgart), AIMS 6 (3).
  • Johnson, Keith (1997): "Speech perception without speaker normalization: An exemplar model". In Johnson, Keith and John W. Mullennix (eds.), Talker Variability in Speech Processing (Academic Press, San Diego), 145-166.
  • Jones, Jeffery A. and Kevin G. Munhall (2000): "Perceptual calibration of F0 production: Evidence from feedback perturbation". Journal of the Acoustical Society of America 108 (3), 1246-1251.
  • Lacerda, Francisco (in press): "Distributed memory representations generate the perceptual-magnet effect". Journal of the Acoustical Society of America.
  • Levelt, Willem J.M. (1989): Speaking: From Intention to Articulation (MIT Press, Cambridge, MA).
  • Levelt, Willem J.M. (1999): "Producing spoken language: a blueprint of the speaker". In Brown, Colin M. and Peter Hagoort (eds.), The Neurocognition of Language (Oxford University Press, Oxford), 83-122.
  • Perkell, Joseph S., Frank H. Guenther, Harlan Lane, Melanie L. Matthies, Pascal Perrier, Jennell Vick, Reiner Wilhelms-Tricarico and Majid Zandipour (2000): "A theory of speech motor control and supporting data from speakers with normal hearing and with profound hearing loss". Journal of Phonetics 28 (3), 233-272.
  • Pierrehumbert, Janet (2001): "Exemplar dynamics: Word frequency, lenition and contrast". In Bybee, Joan and Paul Hopper (eds.), Frequency and the Emergence of Linguistic Structure (Benjamins, Amsterdam).
  • Prescher, Detlef (2002): EM-basierte maschinelle Lernverfahren für natürliche Sprachen. Arbeitspapiere des Instituts für Maschinelle Sprachverarbeitung (Univ. Stuttgart), AIMS 8 (2).
  • Prevost, Scott and Mark Steedman (1994): "Specifying intonation from context for speech synthesis". Speech Communication 15 (1-2), 139-153.
  • Sproat, Richard (ed.) (1998): Multilingual Text-to-Speech Synthesis: The Bell Labs Approach (Kluwer, Dordrecht).
  • Syrdal, Ann, Gregor Möhler, Kurt Dusterhoff, Alistair Conkie and Alan Black (1998): "Three methods of intonation modeling". Proceedings of the Third International Workshop on Speech Synthesis (Jenolan Caves, Australia), 305-310.
  • Taylor, Paul and Alan W. Black (1999): "Speech synthesis by phonological structure matching". Proceedings of the European Conference on Speech Communication and Technology (Budapest, Hungary), vol. 2, 623-626.
  • Teich, Elke, Eli Hagen, Brigitte Grote and John Bateman (1997): "From communicative context to speech: Integrating dialogue processing, speech production, and natural language generation". Speech Communication 21, 73-99.
  • van Santen, Jan P.H. (1993): "Exploring N-way tables with sums-of-products models". Journal of Mathematical Psychology 37 (3), 327-371.
Last modified: 22.12.2004 (bm)