Chung-chieh Shan Profile Picture

Chung-chieh Shan

  • ccshan@indiana.edu
  • Luddy Hall (700 N. Woodlawn Ave) 3018
  • (812) 856-4400
  • Home Website
  • Associate Professor
    School of Informatics and Computing

Field of study

  • Artificial Intelligence
  • Programming Languages

Education

  • Ph.D., Computer Science at Harvard University, 2005
  • BA, Mathematics at Harvard University, 1999

Professional Experience

  • Assistant Professor, School of Informatics and Computing, Indiana University (2013–2019)
  • Researcher, Department of Computer Science, University of Tsukuba (Spring 2012)
  • Visiting Assistant Professor, Department of Linguistics, Cornell University (Fall 2011)
  • Assistant Professor, Department of Computer Science and Center of Cognitive Science, Rutgers University (2005–2011)

Awards

  • Best paper (with Oleg Kiselyov) 2009, working conference on domain-specific languages
  • Beth dissertation award 2006, Association for Logic, Language and Information
  • Best paper (with Balder D. ten Cate) 2002, ESSLLI student session
  • First place (with Dylan P. Thurston) 2001, ACM ICFP programming contest

Representative publications

From high-level inference algorithms to efficient code (2019)
Rajan Walia, Praveen Narayanan, Jacques Carette, Sam Tobin-Hochstadt, and Chung-chieh Shan
Proceedings of the ACM on Programming Languages, 3 (ICFP),

Probabilistic programming languages are valuable because they allow domain experts to express probabilistic models and inference algorithms without worrying about irrelevant details. However, for decades there remained an important and popular class of probabilistic inference algorithms whose efficient implementation required manual low-level coding that is tedious and error-prone. They are algorithms whose idiomatic expression requires random array variables that are latent or whose likelihood is conjugate. Although that is how practitioners communicate and compose these algorithms on paper, executing such expressions requires eliminating the latent variables and recognizing the conjugacy by symbolic mathematics. Moreover, matching the performance of handwritten code requires speeding up loops by more than a constant factor. We show how probabilistic programs that directly and concisely express these desired inference algorithms can be compiled while maintaining efficiency. We introduce new transformations that turn high-level probabilistic programs with arrays into pure loop code. We then make great use of domain-specific invariants and norms to optimize the code, and to specialize and JIT-compile the code per execution. The resulting performance is competitive with manual implementations.

Exact Bayesian inference by symbolic disintegration (2017)
Chung-chieh Shan and Norman Ramsey
Proceedings of the 44th ACM SIGPLAN Symposium on Principles of Programming Languages, 52 (1), 130-144

Bayesian inference, of posterior knowledge from prior knowledge and observed evidence, is typically defined by Bayes's rule, which says the posterior multiplied by the probability of an observation equals a joint probability. But the observation of a continuous quantity usually has probability zero, in which case Bayes's rule says only that the unknown times zero is zero. To infer a posterior distribution from a zero-probability observation, the statistical notion of disintegration tells us to specify the observation as an expression rather than a predicate, but does not tell us how to compute the posterior. We present the first method of computing a disintegration from a probabilistic program and an expression of a quantity to be observed, even when the observation has probability zero. Because the method produces an exact posterior term and preserves a semantics in which monadic terms denote measures, it composes …

The character of quotation (2010)
Chung-chieh Shan
Linguistics and Philosophy, 33 (5), 417-443

This paper presents syntactic and semantic rules for a fragment of English with mixed quotation. The fragment shows that quotation has a recursive and compositional structure. Quoted expressions turn out to denote characters, so the semantics of quotation simulates the pragmatics of speech, including dependence on utterance contexts and reference to mental entities. The analysis also accommodates varieties of unquotation, pure quotation, and causal reference.

Finally tagless, partially evaluated: tagless staged interpreters for simpler typed languages (2009)
Chung-chieh Shan, Jacques Carette, and Oleg Kiselyov
Journal of Functional Programming, 19 (5), 509-543

We have built the first family of tagless interpretations for a higher-order typed object language in a typed metalanguage (Haskell or ML) that require no dependent types, generalized algebraic data types, or postprocessing to eliminate tags. The statically type-preserving interpretations include an evaluator, a compiler (or staged evaluator), a partial evaluator, and call-by-name and call-by-value continuation-passing style (CPS) transformers. Our principal technique is to encode de Bruijn or higher-order abstract syntax using combinator functions rather than data constructors. In other words, we represent object terms not in an initial algebra but using the coalgebraic structure of the λ-calculus. Our representation also simulates inductive maps from types to types, which are required for typed partial evaluation and CPS transformations. Our encoding of an object term abstracts uniformly over the family of ways to interpret it, yet statically assures that the interpreters never get stuck. This family of interpreters thus demonstrates again that it is useful to abstract over higher-kinded types.

Embedded probabilistic programming (2009)
Oleg Kiselyov and Chung-Chieh Shan
IFIP Working Conference on Domain-Specific Languages, 360-384

Two general techniques for implementing a domain-specific language (DSL) with less overhead are the finally-tagless embedding of object programs and the direct-style representation of side effects. We use these techniques to build a DSL for probabilistic programming, for expressing countable probabilistic models and performing exact inference and importance sampling on them. Our language is embedded as an ordinary OCaml library and represents probability distributions as ordinary OCaml programs. We use delimited continuations to reify probabilistic programs as lazy search trees, which inference algorithms may traverse without imposing any interpretive overhead on deterministic parts of a model. We thus take advantage of the existing OCaml implementation to achieve competitive performance and ease of use. Inference algorithms can easily be embedded in probabilistic programs themselves.

A static simulation of dynamic delimited control (2007)
Chung-chieh Shan
Higher-Order and Symbolic Computation, 20 (4), 371-401

We present a continuation-passing-style (CPS) transformation for some dynamic delimited-control operators, including Felleisen’s control and prompt, that extends a standard call-by-value CPS transformation. Based on this new transformation, we show how Danvy and Filinski’s static delimited-control operators shift and reset simulate dynamic operators, allaying in passing some skepticism in the literature about the existence of such a simulation. The new CPS transformation and simulation use recursive delimited continuations to avoid undelimited control and the overhead it incurs in implementation and reasoning.

Explaining crossover and superiority as left-to-right evaluation (2006)
Chung-chieh Shan and Chris Barker
Linguistics and Philosophy, 29 (1), 91-134

We present a general theory of scope and binding in which both crossover and superiority violations are ruled out by one key assumption: that natural language expressions are normally evaluated (processed) from left to right. Our theory is an extension of Shan’s (2002) account of multiple-wh questions, combining continuations (Barker, 2002) and dynamic type-shifting. Like other continuation-based analyses, but unlike most other treatments of crossover or superiority, our analysis is directly compositional (in the sense of, e.g., Jacobson, 1999). In particular, it does not postulate a level of Logical Form or any other representation distinct from surface syntax. One advantage of using continuations is that they are the standard tool for modeling order-of-evaluation in programming languages. This provides us with a natural and independently motivated characterization of what it means to evaluate expressions …

Edit your profile