[Thanks to Jérôme Lelong for the Belledonne image (for ever)]
# GLyS Meeting -- 19 June 2014

Grenoble-Lyon Statistics Meeting

## Overview

A one-day meeting on the theme of "dimension reduction", on the occasion of Emmanuel Candès's visit, organised by Jean-François Coeurjolly (Laboratoire Jean Kuntzmann, Grenoble) and Anne-Laure Fougères (Institut Camille Jordan, Lyon).

The goal of this day is to bring together statisticians from the Lyon and Grenoble groups in order to foster synergies and collaborations. We also wish to take advantage of Emmanuel Candès's visit to the Laboratoire Jean Kuntzmann to organise the day around one of his research topics, namely dimension reduction.

## Location

The meeting will take place in the lecture hall of the Maison Jean Kuntzmann. From the Grenoble train station, take tram B (direction Gières - Plaine des sports) and get off at the "Bibliothèques Universitaires" stop. The building is back the way you came, 200 m to the left of the tall tower (Laboratoire Jean Kuntzmann).

A detailed access map is available here: mi2s.

## Registration

The meeting is free and open to all, within the seating capacity of the lecture hall of the Maison Jean Kuntzmann (about 100 seats). Please note that the lunch buffet is offered to members of the Probability/Statistics department of the LJK and to Lyon participants (registered in advance; see the Doodle sent separately).

## Provisional programme

- 9:00-9:30 Welcome coffee
- 9:30-10:15 Vivian Viallon (IFSTTAR and ICJ, Université Lyon 1),
Generalized Fused Lasso: asymptotic properties and robustness against prior misspecification. Application to the joint modeling of multiple sparse regression models

The generalized fused lasso is an extension of the fused lasso aiming at handling features whose effects are structured according to a given network. In this talk I will present theoretical and empirical properties of the generalized fused lasso in the context of generalized linear models, especially in the "fixed p" case. I will mostly focus on the robustness of the adaptive generalized fused lasso against misspecification of the network, as well as its applicability when theoretical coefficients are not strictly equal. I will also evaluate the applicability of the generalized fused lasso for jointly modeling multiple sparse regression functions. If time permits, I will finally present preliminary results for two new l1-based methods dedicated to this joint modeling context. This talk is based on joint work with Sophie Lambert-Lacroix (LJK, Grenoble), Holger Höfling (Novartis, Basel), and Franck Picard (LBBE, Lyon) (generalized fused lasso), and with Edouard Ollier (UMRESTTE, Lyon) (joint modeling).

- 10:15-11:00 Frédérique Letué (Univ. Grenoble Alpes), The Dantzig selector in Cox's proportional hazards model.

The Dantzig selector was introduced by Candès and Tao (2007) to perform estimation in high-dimensional linear regression models with a large number of explanatory variables and a relatively small number of observations. In this talk, we address the estimation problem for Cox's proportional hazards regression models, using a framework that extends the theory, the computational advantages and the optimal asymptotic rate properties of the Dantzig selector to the class of Cox's proportional hazards models under appropriate sparsity scenarios. We perform a detailed simulation study to compare our approach to other methods, and illustrate it on a well-known microarray gene expression data set for predicting survival from gene expression.

- 11:00-12:00 Emmanuel Candès (Stanford University), The Knockoff Filter for Controlling the False Discovery Rate.

In many fields of science, we observe a response variable together with a large number of potential explanatory variables, and would like to be able to discover which variables are truly associated with the response. At the same time, we need to know that the false discovery rate (FDR), that is, the expected fraction of false discoveries among all discoveries, is not too high, in order to assure the scientist that most of the discoveries are indeed true and replicable. This talk introduces the knockoff filter, a new variable selection procedure controlling the FDR in the statistical linear model whenever there are at least as many observations as variables. This method achieves exact FDR control in finite sample settings no matter the design or covariates, the number of variables in the model, or the amplitudes of the unknown regression coefficients, and does not require any knowledge of the noise level. As the name suggests, the method operates by manufacturing knockoff variables that are cheap (their construction does not require any new data) and are designed to mimic the correlation structure found within the existing variables, in a way that allows for accurate FDR control, beyond what is possible with permutation-based methods. The method of knockoffs is very general and flexible, and can work with a broad class of test statistics. We test the method in combination with statistics from the Lasso for sparse regression, and obtain empirical results showing that the resulting method has far more power than existing selection rules when the proportion of null variables is high. We also apply the knockoff filter to HIV data with the goal of identifying those mutations associated with a form of resistance to treatment plans. This is joint work with Rina Foygel Barber.

- 12:00-14:00 Buffet (offered to members of the Probability/Statistics department of the LJK and to Lyon participants registered in advance)
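As a rough illustration of the selection step only (constructing the knockoff variables themselves is the paper's main contribution and is not shown here), one can sketch the knockoff+ thresholding rule: given statistics W_j that tend to be large and positive for true signals and symmetric around zero for nulls, pick the smallest threshold whose estimated false discovery proportion stays below the target level q. The function names below are mine, not from the paper.

```python
import numpy as np

def knockoff_threshold(W, q=0.1):
    """Knockoff+ threshold: smallest t such that
    (1 + #{j : W_j <= -t}) / max(1, #{j : W_j >= t}) <= q."""
    for t in np.sort(np.abs(W[W != 0])):
        fdp_hat = (1 + np.sum(W <= -t)) / max(1, np.sum(W >= t))
        if fdp_hat <= q:
            return t
    return np.inf  # no threshold achieves the target level: select nothing

def knockoff_select(W, q=0.1):
    """Indices of variables selected at nominal FDR level q."""
    return np.where(W >= knockoff_threshold(W, q))[0]
```

With statistics W = [5, 4, 3, -0.5, 0.2, -0.1, 2.5] and q = 0.3, the rule selects the four variables with large positive statistics; at the stricter q = 0.1 it selects none, since the "+1" correction makes the estimated FDP too large for every candidate threshold.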
- 14:00-14:45 Laurent Jacob (LBBE, Lyon), Statistical inference on large feature sets for biological sequences

Several estimation problems in computational biology involving sequences lead to considering sets of features whose size is exponential in the sequence length. These feature sets correspond to combinations of sequence elements. They are typically too large to be explicitly described and manipulated, but can be represented as sets of paths on particular graphs. We use this implicit representation to make estimation possible.

A first application is isoform deconvolution from RNA-Seq data: high-throughput sequencing technologies allow us to observe small pieces of RNA which can then be mapped onto the genome. However, several RNA molecules can correspond to the same gene, with different pieces (exons) spliced out. An important task is to estimate the set of molecules present in a sample. This task can be cast as a regression of observed reads against the set of all possible RNA molecules formed by all combinations of inclusion/exclusion of exons. This set of regressors is of exponential size in the number of exons, but we show that a sparse regression against this set can be formulated as a network flow problem on a small graph and solved in polynomial time.

Another application is to detect patterns associated with sequence phenotypes. Given a set of sequences, e.g. DNA, along with a property of interest, we seek to extract features of the sequences which are associated with the property, either to obtain a better understanding of the underlying biology, or with the objective of predicting the property efficiently on new sequences. If no prior information is available on which sequence features could be of interest, we propose to consider all possible words in the sequence alphabet, of all possible lengths and at all possible positions. This results in a very large set of covariates that cannot be explicitly enumerated, but within which sparse estimation is possible using a representation of patterns as paths in a trellis graph.
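To make the "all words at all positions" feature space concrete, here is a toy sketch that enumerates it explicitly, which is feasible only for very short sequences; the point of the talk is precisely to avoid this enumeration via graph representations. The helper names are mine, not from the authors' software.

```python
def positional_words(seq):
    """All (start position, word) features of a sequence:
    every substring, of every length, at every position."""
    return {(i, seq[i:j]) for i in range(len(seq))
            for j in range(i + 1, len(seq) + 1)}

def feature_matrix(sequences):
    """Explicit binary design matrix over the union of positional-word
    features. Its width grows exponentially with sequence length, which is
    why real instances require an implicit (graph-based) representation."""
    vocab = sorted(set().union(*(positional_words(s) for s in sequences)))
    index = {f: k for k, f in enumerate(vocab)}
    X = [[0] * len(vocab) for _ in sequences]
    for row, s in enumerate(sequences):
        for f in positional_words(s):
            X[row][index[f]] = 1
    return X, vocab
```

For the two sequences "AC" and "AG", the vocabulary already contains five positional words ((0,"A"), (0,"AC"), (0,"AG"), (1,"C"), (1,"G")), and only the shared prefix feature (0,"A") is active in both rows.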
- 14:45-15:30 Anatoli Iouditski (LJK, Univ. Grenoble Alpes), On adaptive signal denoising

We consider the problem of pointwise estimation of multi-dimensional signals from noisy indirect observations on the regular grid. The basic setting of the problem can be summarized as follows: there exists "in the nature" an oracle linear estimator of the signal (a time-invariant linear filter which may be "tuned to the signal") which recovers the signal from observations with small mean-squared error (we refer to such signals as "well-filtered"). Assuming that we do not know the filter in question (but we do know that such a filter exists), we address the following questions:

- when is it possible to construct an adaptive signal estimator that relies only upon the observations and recovers the signal with essentially the same error as the oracle estimator?

- how rich is the family of well-filtered (in the above sense) signals?

We establish the estimation framework in which we can give an affirmative answer to the first question and provide a numerically efficient construction of a nonlinear adaptive filter. Further, we establish a simple calculus of "well-filtered" signals, and show that their family is quite large: it contains, among others, sampled smooth signals, sampled modulated smooth signals and sampled harmonic functions.
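To make the oracle notion concrete, here is a minimal sketch in which the filter family is a set of centered moving averages of varying width; the oracle picks the width by comparing against the true signal, which is exactly the information the talk's adaptive estimator does without. This is an illustrative assumption, not the construction from the talk.

```python
import numpy as np

def moving_average(y, h):
    """Time-invariant linear filter: centered moving average of half-width h."""
    kernel = np.ones(2 * h + 1) / (2 * h + 1)
    return np.convolve(y, kernel, mode="same")

def oracle_width(y, x_true, widths):
    """Oracle filter selection: among a finite filter family, return the
    half-width whose output has the smallest MSE against the TRUE signal.
    (An adaptive estimator must match this error using observations alone.)"""
    errs = {h: np.mean((moving_average(y, h) - x_true) ** 2) for h in widths}
    return min(errs, key=errs.get)
```

On a noisy sampled sine wave, the oracle-filtered estimate has a strictly smaller mean-squared error than the raw observations, which is the benchmark an adaptive filter is asked to approach.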