A Bayesian non-parametric methodology for inferring grammar complexity

English

Séminaire Probabilités & Statistique

24/10/2019, 14:00 - Robin Ryder (Université Paris-Dauphine) - Room 106, IMAG building

Given a set of strings from a language, we wish to infer the complexity of the underlying grammar. To this end, we develop a methodology for choosing between two classes of formal grammars in the Chomsky hierarchy: simple regular grammars and more complex context-free grammars. We introduce a probabilistic context-free grammar model in the form of a Hierarchical Dirichlet Process over rules expressed in Greibach Normal Form (GNF). Compared with other representations, GNF has the advantage of nesting the regular class within the context-free class: a GNF grammar in which every rule has at most one nonterminal on its right-hand side is regular. We compare models both by exploiting this nesting and by computing Bayes factors. The model is fit using Sequential Monte Carlo (SMC), implemented in the Birch probabilistic programming language. We apply this methodology to data collected from primates, for which the complexity of the underlying grammar is a key question.
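The nesting property can be made concrete with a small sketch. The Python below is illustrative only and is not the speaker's Birch implementation: it assumes a hypothetical `Rule` type for GNF productions (a terminal followed by zero or more nonterminals), uses fixed, hand-picked rule probabilities in place of the Hierarchical Dirichlet Process prior, and omits SMC inference entirely. It shows that membership in the regular subclass is a simple check on rule shape, and that a GNF derivation emits exactly one terminal per rewrite, producing the string left to right. The names `is_regular`, `sample`, `ANBN` and `AB_PLUS` are likewise hypothetical.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass(frozen=True)
class Rule:
    """Hypothetical GNF production: lhs -> terminal nonterminal*."""
    lhs: str
    terminal: str
    nonterminals: Tuple[str, ...] = ()

# A probabilistic grammar: each nonterminal maps to (rule, probability) pairs.
# Fixed weights stand in for the HDP prior used in the talk (assumption).
Grammar = Dict[str, List[Tuple[Rule, float]]]

def is_regular(grammar: Grammar) -> bool:
    """True if every rule has at most one nonterminal (right-linear form).

    This checks the grammar's form, not whether its language is regular.
    """
    return all(len(rule.nonterminals) <= 1
               for options in grammar.values() for rule, _ in options)

def sample(grammar: Grammar, start: str, rng: random.Random,
           max_len: int = 200) -> str:
    """Leftmost derivation: each step emits one terminal, then the rule's
    nonterminals are pushed so the leftmost one is expanded next."""
    out: List[str] = []
    stack = [start]
    while stack:
        if len(out) >= max_len:
            raise RuntimeError("derivation exceeded max_len")
        rules, weights = zip(*grammar[stack.pop()])
        rule = rng.choices(rules, weights=weights, k=1)[0]
        out.append(rule.terminal)
        stack.extend(reversed(rule.nonterminals))
    return "".join(out)

# Context-free (not right-linear): generates a^n b^n, n >= 1.
ANBN: Grammar = {
    "S": [(Rule("S", "a", ("S", "B")), 0.4), (Rule("S", "a", ("B",)), 0.6)],
    "B": [(Rule("B", "b"), 1.0)],
}

# Regular (right-linear): generates (ab)^n, n >= 1.
AB_PLUS: Grammar = {
    "S": [(Rule("S", "a", ("B",)), 1.0)],
    "B": [(Rule("B", "b", ("S",)), 0.5), (Rule("B", "b"), 0.5)],
}

if __name__ == "__main__":
    rng = random.Random(0)
    print(is_regular(ANBN), is_regular(AB_PLUS))
    print([sample(ANBN, "S", rng) for _ in range(3)])
    print([sample(AB_PLUS, "S", rng) for _ in range(3)])
```

Because the regular grammars are exactly the GNF grammars that pass `is_regular`, a model over GNF rules can express both hypotheses in one parameter space, which is the nesting the abstract relies on for model comparison.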