Malgorzata Cavar

  • mcavar@indiana.edu
  • Home Website
  • Associate Professor
    Linguistics

Research interests

  • Slavic and low-resourced languages
  • Theoretical phonology
  • Second language phonology
  • Laboratory phonology
  • Speech corpora

Representative publications

Palatalization in Polish: An Interaction of Articulatory and Perceptual Factors (2004)
Malgorzata Ewa Ćavar

The present dissertation studies palatalization as an effect of the interaction of a set of articulatory and auditory factors. A functional approach has been adopted with its basic claims that the shape of a language is determined by two tendencies: first, to minimize the effort of the speaker, that is, to simplify the articulation, and second, to minimize the effort of the listener, i.e. to maximize the distinctiveness of the units of language, cf. Passy (1891), Martinet (1955), Lindblom (1986), Flemming (1995), Boersma (1998). The attempt was to identify different articulatory and auditory factors in palatalization processes within the system of one language, that is, Polish, and to offer an explanatory account of the processes in Polish. The other goal was to offer adequate formal means for such an analysis.

Endangered language documentation: Bootstrapping a Chatino speech corpus, forced aligner, ASR (2016)
Malgorzata Cavar, Damir Ćavar and Hilaria Cruz
4004-4011

This project approaches the problem of language documentation and revitalization from a rather untraditional angle. To improve and facilitate language documentation of endangered languages, we attempt to use corpus linguistic methods and speech and language technologies to reduce the time needed for transcription and annotation of audio and video language recordings. The paper demonstrates this approach on the example of the endangered and seriously under-resourced variety of Eastern Chatino (CTP). We show how initial speech corpora can be created that can facilitate the development of speech and language technologies for under-resourced languages by utilizing Forced Alignment tools to time align transcriptions. Time-aligned transcriptions can be used to train speech corpora and utilize automatic speech recognition tools for the transcription and annotation of untranscribed data. Speech technologies can be used to reduce the time and effort necessary for transcription and annotation of large collections of audio and video recordings in digital language archives, addressing the transcription bottleneck problem that most language archives and many under-documented languages are confronted with. This approach can increase the availability of language resources from low-resourced and endangered languages to speech and language technology research and development.

Allophonic variation of Polish vowels in the context of prepalatal consonants (2017)
Malgorzata E Cavar, Steven M Lulich and Max Nelson
Proceedings of Meetings on Acoustics 173EAA, 30 (1), 60007

Phonetic studies of Polish mention allophonic variation in Polish vowels: there is a systematic effect of fronting and/or raising in the prepalatal consonant context (Sawicka 1995, Wiśniewski 1997). Additionally, phonemic [i] is excluded after non-palatalized consonants, and the phonemic [ɨ] does not occur after prepalatals. In other words, a subset of vowel realizations is bound to the context of prepalatal consonants. No X-ray images are available of the contextual variants of vowels adjacent to prepalatal consonants. In this study, we present 3-D tongue shapes of the vowels in the neutral and prepalatal context. All participants in the study show, indeed, variation in the vowel articulation: one can observe some degree of fronting and/or raising of the tongue body combined with the consistent advancement of the tongue root for the vowels in the context of prepalatals - as opposed to the vowels in the neutral (non …

[ATR] in Polish (2007)
Małgorzata E Ćavar
Journal of Slavic Linguistics, 207-228

The feature [ATR] is usually used exclusively for the description of vowels. In this article, it is argued that phonotactic constraints in Polish indicate that [ATR] may be a useful dimension in the description of consonants. Under this assumption we are able to offer a straightforward and phonetically motivated account of the discussed phonotactic constraints and relate them to palatalization processes in Polish. The consequence of the assumption that [ATR] is a consonantal dimension is a reanalysis of some palatalization processes in terms of [ATR] and the identification of the need for a new typology of palatalization processes.

Language-specific differences in the weighting of perceptual cues for labiodentals (2010)
Silke Hamann, Paul Boersma and Małgorzata Ćavar
Proceedings of New Sounds 2010, 167-172

Cross-language perception provides insight into the use of perceptual cues to native segments and their application to segments in a different language. In the present study we test the perception of the three Dutch labiodentals /f, v, ʋ/ by listeners of German, English, Croatian and Polish in a forced-choice identification task. We test whether the perceptual boundaries on the auditory dimensions of harmonics-to-noise ratio and duration are more similar for listeners from the same language family (German and English versus Croatian and Polish) or whether these boundaries are more similar for listeners with the same number of labial categories in their native languages (German and Croatian with four labials versus English and Polish with five). Our findings show that the same number of labial categories results in similar perceptual boundaries along the two auditory dimensions, and that language family does not influence the location of the boundaries.

Alternating environment in the analysis of derived environment effects (2005)
Małgorzata Ćavar
IULC Working Papers, 5 (1)

In this article it is argued that the so-called derived environment effects can be explained in terms of the alternating environment without reference to the notion of derivation. The notion of the alternating environment refers to the surface sequences of segments across the paradigm, and goes back to Timberlake (1972). In the article, earlier OT approaches are discussed. It is argued that the solution based on the assumption that the stem-particular faithfulness constraints are ranked over general faithfulness constraints, as proposed for example in Pater (1999), cannot be applied to Polish because it is stem consonants that undergo palatalization (and are unfaithful). Another approach, by Lubowicz (1998, 2002) relying on the violation of the local conjunction of the constraint inducing palatalization with Anchor constraint (correspondence of the right-most segment of the stem in the input to the rightmost segment of a syllable in the output) does not cover the whole range of palatalization effects in Polish (as well as in Slovak and Basque). Thus, it is argued that the solution in terms of the alternating environment offers a more general solution to the derived environment effects.

Three-dimensional ultrasound images of Polish high front vowels (2017)
Steven M Lulich, Malgorzata E Cavar and Max Nelson
Proceedings of Meetings on Acoustics 173EAA, 30 (1), 60006

The 3-D ultrasound method has been applied to collect data on Polish high front vowels. In particular, Polish has one unambiguous high front vowel and another one that in the phonological literature is variously referred to as high central or back unrounded and transcribed as [ɨ]. While there exists a sizeable body of articulatory research on Polish, including X-rays from as early as the 50s and 60s, the ultrasound data reveal more detail about the position of the tongue center and tongue root. The data evaluated so far support the view that the vowel transcribed as [ɨ] is a front vowel. The two front vowels differ in the position of the tongue root, relative raising of the tongue, and extent of lip gesture, but do not differ substantially with regard to tongue body advancement on the front-back axis. The data also capture the temporal aspect of speech, and together with time-aligned audio recordings and video recordings of …

Merger of the place contrast in the posterior sibilants in Croatian (2011)
Małgorzata Ćavar

The paper investigates the reasons behind the merger of the place contrast in posterior sibilants in Croatian, i.e. of /č/ and /ć/, and on the other hand, of /dž/ and /đ/. It is argued that systemic factors such as inventory density are not a sufficient trigger for merger. On the other hand, acoustic variation in the realization of the categories may lead to merger. This approach is formalized in terms of Functional Phonology (Boersma 1998).

Phonemes, features and allophones in L2 phonology. Polish sibilants in Croatian ears (and brains) (2011)
Malgorzata Ewa Ćavar and Silke Hamann
Cambridge Scholars Publishing.

The article presents the preliminary results of a study of the perception of foreign phonemes, in particular, Polish phonemes by naïve Croatian native speakers. Our experiments have shown that listeners identify new, unknown L2 phonemes without prior training if the phonemes utilize the phonological distinctions (distinctive features) present in other phonemes of L1. The results indicate that subjects can extrapolate the knowledge of phonological distinction across different phonemes and that the basic unit of the acquisition of L2 phonology is the feature rather than the phoneme.

Phonemic IPA transcription and syllabification for Croatian (2010)
Damir Ćavar and Malgorzata Ćavar

Statistics of bibliographic data on projects, scientists and scientific institutions.

Auditory factors in the emergence of prepalatal affricates in Polish (2004)
Malgorzata Cavar
Working papers in Slavic studies: Proceedings of the first graduate colloquium on Slavic linguistics, 3, 1-20

The idea that auditory factors play a role in phonology besides articulation has gained more and more attention in recent years (cf. Steriade, e.g. 1997, 2001, Flemming, 1995/2002, Boersma 1998, Padgett 2001a, 2001b, Hume and Johnson 2001, and references therein; NiChiosain and Padgett 2001, Ćavar 2003, and many others). The standard assumption for Polish so far has been that the emergence of prepalatal affricates is of articulatory nature and accounts have been offered in terms of articulatory features (e.g. Rubach 1984, Szpyra 1995, Ćavar 1997). I will argue that, though articulatory factors may play here some role, the driving force is of auditory background. Two parameters will be investigated separately, namely, place of articulation, and stridency. Arguments from the typology of consonantal inventories, and from the phonology of standard Polish and Polish dialects will be presented to support this view.

Croatian language corpus Riznica 0.1 (2018)
Dunja Brozović Rončević, Damir Ćavar, Małgorzata Ćavar, Tomislav Stojanov, Kristina Štrkalj Despot, Nikola Ljubešić ...
Institute of Croatian Language and Linguistics.

The Croatian Language Corpus was built between 2007 and 2011 at the Institute of Croatian Language and Linguistics in the scope of the research programme "Hrvatska jezična riznica" as a reference corpus of Croatian language to serve various lexicographic and other linguistic and language technology projects. The corpus consists of 28% of fiction texts and 72% of specialized texts. In 2017, the corpus was segmented, part-of-speech tagged and lemmatized inside the MREŽNIK project to be used for the development of the first Croatian corpus-based dictionary.

Generating a Yiddish Speech Corpus, Forced Aligner and Basic ASR System for the AHEYM Project (2016)
Malgorzata Cavar, Damir Ćavar, Dov-Ber Kerler and Anya Quilitzsch
4688-4693

To create automatic transcription and annotation tools for the AHEYM corpus of recorded interviews with Yiddish speakers in Eastern Europe we develop initial Yiddish language resources that are used for adaptations of speech and language technologies. Our project aims at the development of resources and technologies that can make the entire AHEYM corpus and other Yiddish resources more accessible to not only the community of Yiddish speakers or linguists with language expertise, but also historians and experts from other disciplines or the general public. In this paper we describe the rationale behind our approach, the procedures and methods, and challenges that are not specific to the AHEYM corpus, but apply to all documentary language data that is collected in the field. To the best of our knowledge, this is the first attempt to create a speech corpus and speech technologies for Yiddish. This is also the first attempt to work out speech and language technologies to transcribe and translate a large collection of Yiddish spoken language resources.

Roadrunners and Eagles (2013)
Linda Shockey and Małgorzata E Ćavar
Research in Language, 11 (1), 97-102

Our previous research on perception of gated casual English by university students suggests that, ceteris paribus, Polish students are much more accurate than Greeks. A recent pilot study of casually-spoken Polish leads us to the conclusion that many shortcuts found in English are also common in Polish, so that similar perceptual strategies can be used in both languages, though differing in detail. Based on these preliminary results, it seems likely that perceptual strategies across languages tend towards the “eagle” approach, where a bird's-eye view of the acoustic terrain without too much emphasis on detail is found, or the “roadrunner” approach, where phonetic detail is followed closely. In the former case, perceivers adjust easily to alternation caused by casual speech phonology while in the latter, perceivers expect little variation and possibly even find it confusing. Native speakers of Greek are “roadrunners …

Inducing recursion (2007)
Damir Ćavar and Malgorzata Ewa Ćavar

In this paper we defend the hypothesis that recursion as such is a side effect of structure discovery strategies, rather than a core concept of either the FLN or FLB (Hauser et al. 2002). Independent of theoretical standpoint, e.g. nativist, connectionist, or empiricist, simple guided learning strategies have the capability of generating (left- or right-linear, or context free) rule sets that are recursive.
