Damir Cavar

  • dcavar@indiana.edu
  • Ballantine Hall, Room 850, 1020 E. Kirkwood Ave
  • Home Website
  • Associate Professor
    Computational Linguistics

Research interests

  • I have been working in linguistics for the last 15 years. My work has been in theoretical linguistics, as well as in corpus and computational linguistics, speech and language technology, and psycholinguistics. After my PhD I focused mainly on speech and language technologies, corpora, and NLP. In 2010 my wife and I took positions at the University of Konstanz, and from there we moved to Michigan, taking over the Institute for Language Information and Technology, which hosted The LINGUIST List and various research projects. In 2014 we successfully relocated to Indiana University. Most of my work on language and computation is related to Cognitive Science research on language; the computational and corpus work at least facilitates that research. My current projects include: the implementation of an LFG parser (Lexical Functional Grammar); the implementation of a Parallel Processing Parser as described in Jackendoff’s “The Architecture of the Language Faculty”; the development of a speech corpus infrastructure and speech corpora from as many languages as possible, sufficient to generate not only acoustic and language models for forced aligners and speech recognizers, but also part-of-speech tagged corpora and parsed, syntactically, semantically, and pragmatically annotated corpora (see http://gorilla.linguistlist.org/about/); and various other projects relating to the visualization of linguistic theories, the geographical mapping of language data and information, and data and technologies for low-resourced and under-documented languages.

Professional Experience

  • since 2016 Associate Professor of Computational Linguistics, Department of Linguistics, Indiana University.
  • since 2014 Co-Director of The LINGUIST List at Indiana University.
  • 2014-2016 Associate Research Scientist, Computational Linguistics, at Indiana University.
  • 2013-2014 Director of the Institute for Language Information and Technology at Eastern Michigan University.
  • 2011-2014 Assistant and Associate Professor (tenured; resigned in 2014), Computational Linguistics, at Eastern Michigan University.
  • since 2006 Adjunct Assistant Professor of Computational Linguistics in the Department of Linguistics at Indiana University.
  • 2011 Adjunct Affiliate Researcher at the Institute of Croatian Language and Linguistics (IHJJ).
  • 2010-2011 Substituting Professor of Computational Linguistics in the Linguistics Department at the University of Konstanz (Germany), temporary one-year professorship.
  • 2009-2011 Head of the branch office of the Institute of Croatian Language and Linguistics in Zadar (with its main institute in Zagreb)
  • 2009-2010 Acting Chair of the Linguistics Department at the University in Zadar
  • 2007-2008 Acting Chair of the English Department at the University in Zadar
  • 2005-2011 Assistant Professor of Computational Linguistics at the University in Zadar, Linguistics Department (English Department till 2008)
  • 2006-2009 Head of the department for theoretical and computational linguistics at the Institute of Croatian Language and Linguistics in Zagreb
  • 2003-2006 Director of the Computational Linguistics Program in the Department of Linguistics at Indiana University
  • 2003-2006 Assistant Professor of (Computational) Linguistics in the Department of Linguistics at Indiana University
  • 2002-2003 Visiting Assistant Professor in the Department of Linguistics at Indiana University
  • 2001-2002 IT Architect, Project Manager and Researcher in the Research & Innovations group of the IS-STA (Dresdner Bank AG and Allianz AG) in Frankfurt a.M. (Germany)
  • 2000-2001 Researcher (PostDoc) at the Berlin-Brandenburgian Academy of Science, Project DWDS, PI: Prof. Dr. Manfred Bierwisch and Prof. Dr. Wolfgang Klein
  • 2000 Researcher (PostDoc), Department of Computer Science at the Technical University in Berlin, KIT Group, Project Verbmobil
  • 1999 Researcher at the University of Potsdam, Linguistics Department, in the interdisciplinary project “Formale Modelle Kognitiver Komplexität” (Formal Models of Cognitive Complexity), WP PI: Prof. Dr. Gisbert Fanselow
  • 1998-1999 Researcher, Computer Science at the University in Hamburg, Natural Language Systems Group (NATS), PI: Prof. Dr. Walther von Hahn and Prof. Dr. Wolfgang Menzel, Project Verbmobil
  • 1994-1998 Researcher, Berlin-Brandenburgian Academy of Science, Project RULE (Rule Learning and Rule Knowledge), PI: Prof. Dr. Jürgen Weissenborn and Prof. Dr. Angela Friederici
  • 1994-1995 Researcher at the Innovationskolleg and Research Project “Formale Modelle Kognitiver Komplexität” (Formal Models of Cognitive Complexity), Linguistics Department, University of Potsdam, WP PI: Prof. Dr. Gisbert Fanselow

Representative publications

Distributed deletion (2002)
Gisbert Fanselow and Damir Cavar
Theoretical approaches to universals, 65-107

DPs and PPs often surface in a discontinuous manner. Standard Wh-movement extracts constituents out of DPs and PPs (1). Quantifiers may appear to the right of the DP which they modify semantically (as in (2)). According to Sportiche (1988), this construction emerges by the stranding of the quantifier when DP moves to Spec, IP. Whether “extraposition from NP” in (3) involves rightward movement depends on the status of the antisymmetry hypothesis (Kayne 1994; Chomsky 1995), but independent considerations may militate against a rightward movement explanation as well (see Culicover & Rochemont 1990). Noun incorporation also gives rise to discontinuous noun phrases, as (4) illustrates for Greenlandic. Finally, DPs and PPs may simply be ‘split’ in a considerable number of languages such as German, Croatian, Polish, Russian, Hungarian, Finnish, Latin, Ancient Greek, and Warlpiri, as (5) and (6) illustrate.

Long head movement? Verb movement and cliticization in Croatian (1994)
Chris Wilder and Damir Ćavar
Lingua, 93 (1), 1-58

Croatian, along with other Slavic and some Romance languages, has a type of verb-fronting, the ‘Long Head Movement’ (LHM) construction, in which a non-finite verb raises to C⁰ across a finite auxiliary, in apparent contravention of the Head Movement Constraint. Rivero and Roberts have each argued on the basis of LHM-data that the Minimality condition of the ECP should permit one head to cross another under certain conditions. A detailed investigation of the properties of LHM in Croatian shows that the conclusions drawn by these authors with respect to the ECP are inadequate in two respects. Firstly, the structural analysis for LHM sentences in Croatian assumed by Rivero and by Roberts is incorrect. Clitics in Croatian are syntactically enclitic, right-adjoining to C⁰. The auxiliary involved in LHM-constructions is a clitic form, hence is located in C⁰ at S-structure, and not lower down in the clause. Since the …

Remarks on the economy of pronunciation (2001)
Gisbert Fanselow and Damir Cavar
Competition in syntax, 107-150

The idea that syntactic movement is composed of two steps, a copying operation followed by a deletion operation (the C&D-theory of movement), as illustrated in (1), has again become popular with the rise of the Minimalist Program (Chomsky 1993). In one of the straightforward extensions of the C&D-approach, at least certain instances of so-called covert movement arise from the overt copying of a full phrase before SPELLOUT, followed by the deletion of the higher rather than the lower copy, an assumption that implies that spellout conventions regulate whether the target or the source position of the copying operation is realized phonetically (see, e.g., Bobaljik 1995, Groat & O'Neil 1996, Pesetsky 1997, 1998a, Roberts 1997 (for head movement), Sabel 1998, among others), as illustrated in (2) for Chinese.

Word order variation, verb movement, and economy principles (1994)
Chris Wilder and Damir Ćavar
Studia linguistica, 48 (1), 46-86

In Chomsky's Minimalist framework, word order variation reflects different movement options arising from interaction between parametrized morphological properties of functional items and invariant economy principles. In the simplest case, languages vary in whether a given movement (e.g. V-to-I-raising) feeds PF (French) or not (English). We consider more complex cases involving language-internal word order differences due to construction-specific movement (Germanic Verb-Second, Last Resort Verb fronting in Croatian), showing how they can be explained using the concept of Early Altruism latent in Chomsky's model. We further propose that (i) not only morphological, but also purely phonological properties of lexical items can trigger movement (clitics in Croatian); (ii) both finite and non-finite verbs raise to C in LF; (iii) English do-support is not a Last Resort operation: instead English ‘simple’ tenses are …

“Clitic Third” in Croatian (1993)
Damir Cavar and Chris Wilder

In this paper, we are mainly concerned with more complex environments in which clitics do not appear in second, but in third position (hence the title), or somewhere further into the clause. In the work cited, we were concerned with one particular aspect of the clitic second phenomenon: the way that it interacts with verb movement. As illustrated in the paradigm (13), a verb may precede clitics in its clause only when no other constituent precedes the clitics (throughout this paper, clitics are marked in bold type): (1) a. Ivan ga je često čitao.

Long Head Movement?: Verb-movement and Cliticization in Croatian (1994)
Damir Ćavar and Chris Wilder
Johann Wolfgang Goethe-Universität Frankfurt aM, Institut für Deutsche Sprache und Literatur II. (7),

In this paper, we investigate the syntax of clitics and verb-movement in Croatian, a language in which these two aspects of grammar interact in intriguing fashion. Our discussion assumes the principle-and-parameters model of Chomsky (1989) etc., and is set in the context of recent discussion of Long Head Movement phenomena. In a number of papers (Lema & Rivero (1989a, 1989b), Rivero (1988, 1990, 1991)), Rivero and Lema discuss a type of verb-fronting found in several Romance and Slavic languages which represents an apparent counterexample to the Head Movement Constraint (HMC). This construction, known as the Long Head

Optimal parsing: Syntactic parsing preferences and optimality theory (1999)
Gisbert Fanselow, Matthias Schlesewsky, Damir Cavar and Reinhold Kliegl

Principled accounts of syntactic parsing like the garden path theory (Frazier 1978, see Frazier & Clifton 1996 for further references) have always presupposed that the heuristic strategies that characterize the behavior of the human parser in, e.g., the case of a local ambiguity are nothing but descriptive characterizations of more profound factors that come into play in parsing. Among the candidates for such factors are the limited capacity of the working memory (e.g., Frazier 1987), interference (e.g., Lewis 1993), the limited "window" size for the initial steps in parsing (e.g., Fodor 1998), or speed differences among competing analyses (e.g., Frazier & Fodor 1978). In the tuning approach (Cuetos & Mitchell 1985), on the other hand, parsing principles have an independent status of their own, and reflect the responsiveness of the parsing algorithm to frequency differences in the input. The quite different view that heuristic parsing strategies reflect the influence of the principles of grammar (Pritchett 1992, Gorrell 1995, Phillips 1996) has received less attention and support in the past. Part of the reason for this may lie in the widespread yet incorrect conviction that the impossibility of identifying the parser with the grammar has been established in the seventies, with the failure of the 'Derivational Theory of Complexity' (see Fodor, Bever & Garrett 1974, Pritchett & Whitman 1995, Phillips 1996 for a discussion). Indeed, as we will briefly discuss below, most models of grammar cannot be applied directly in the context of left-to-right incremental parsing.

Discontinuous constituents in Slavic and Germanic languages (2000)
Damir Ćavar and Gisbert Fanselow
University of Hamburg and University of Potsdam

Children's sensitivity to word-order violations in German: Evidence for very early parameter-setting (1998)
Jürgen Weissenborn, Barbara Höhle, Dorothea Kiefer and Damir Ćavar

Split constituents in Germanic and Slavic (1997)
Damir Ćavar and Gisbert Fanselow
International Conference on Pied-Piping, Friedrich-Schiller University, Jena

On cliticization in Croatian: Syntax or prosody? (1996)
Damir Cavar

In the following paper it will be argued that the phonological approach to clitic placement in Serbian/Croatian, as proposed in Zec & Inkelas (1990), not only fails to explain the observed phenomena, but also fails at the level of descriptive adequacy. Further arguments are presented against accounts which claim that clitic placement is syntactic and which utilize a post-syntactic operation of Prosodic Inversion (PI) in order to explain certain cases of apparent split of syntactic constituents (Halpern, 1992; Schütze, 1994). It will be argued that an alternative analysis which assumes syntactic clitic placement as proposed in Wilder & Cavar (1994) and Cavar & Wilder (1994) appears to be descriptively adequate.

Aspects of the syntax-phonology interface (1999)
Damir Ćavar

In this dissertation we discuss formal theoretical and empirical concepts of the relation between syntax and phonology, focusing on empirical data that is subject to both levels of representation.

Clitic third in Croatian (1999)
Damir Ćavar and Chris Wilder
Clitics in the languages of Europe. Berlin: de Gruyter, 429-467

The rich system of clitics, and the “clitic second” effect which shows up in simple main clauses, are two conspicuous features of Croatian. In previous work

Learning Arabic morphology using statistical constraint-satisfaction models (2007)
Paul Rodrigues and Damir Cavar
Amsterdam Studies in the Theory and History of Linguistic Science, Series 4, 289, 63

Arabic words are constructed by a root and pattern-based morphological system, where the root represents a semantic field and the pattern represents syntactic information, such as voice, transitivity, or intensity. There are over 5,000 Arabic roots, which can be 3, 4, or 5 characters in length, with the shorter roots being the more common. The 3, 4, and 5 character roots each have different pattern systems. For example, McCarthy (1979) contains a table showing 72 patterns for triliteral roots, and 24 patterns for quadriliteral roots. Root morphemes occur with varying degrees of regularity. Sound roots are the most perfect, with the three radicals of the root appearing in the surface form of the word. Doubly weak roots are the least perfect,

Alignment based induction of morphology grammar and its role for bootstrapping (2004)
Damir Ćavar, Joshua Herring, Toshikazu Ikuta, Paul Rodrigues and Giancarlo Schrementi

Different Alignment Based Learning (ABL) algorithms have been proposed for unsupervised grammar induction, e.g. Zaanen (2001) and Dejean (1998), in particular for the induction of syntactic rules. However, ABL seems to be better suited for the induction of morphological rules. In this paper we show how unsupervised hypothesis generation with ABL algorithms can be used to induce a lexicon and morphological rules for various types of languages, e.g. agglutinative or polysynthetic languages. The resulting morphological rules and structures are optimized with the use of conflicting constraints on the size and statistical properties of the grammars, i.e. Minimum Description Length and Minimum Relative Entropy together with Maximum Average Mutual Information. Further, we discuss how the resulting (optimal and regular) grammar can be used for lexical clustering/classification for the induction of syntactic (context free) rules.
