Statistics-Probability and Data Science Seminar

Stat-Prob & Data Science Seminar @ LMNO

  • Usual day : Thursdays at 14:00
  • Organizer : Faïcel Chamroukhi
  • Contact : chamroukhi at unicaen dot fr
  • How to find us : Address: Université de Caen, Campus 2 Côte de Nacre, Boulevard du Maréchal Juin, 14000 Caen-Cedex, Bâtiment Sciences 3.
    • By train: If you come by train from Paris or elsewhere, get off at Caen station. Then take tram line "A" towards Caen - Campus 2 and get off at the last stop (Campus 2). The tram stop is right in front of the train station, and the Sciences 3 building, where the seminar takes place, is a few meters from the Campus 2 stop.
    • By car: If you come by car, take the "périphérique nord" (Caen northern ring road), exit no. 5, direction Douvres-la-Délivrande.
    • By bus: From Caen city center, you can also take bus line 10, 13 or 14 and get off at the Maréchal Juin stop, or take bus line 7 and get off at the Centre Commercial Campus 2 stop.


Programme 2016/2017 :

Wajdi Farhani

  • Date : 23 February 2017 at 14:00
  • Room : S3 124
  • Affiliation : Artfact http://www.artfact-online.fr/about.html
  • Title : Online clustering with MCMC
  • Abstract : Cluster analysis is increasingly used in modern industry, and classical algorithms cannot handle every use case. In this talk, I will present an online clustering algorithm based on MCMC methods, developed at Artfact. With each incoming data point, the algorithm aims to detect the optimal number of clusters and their geometric positions within the n-dimensional space (n being the number of variables/columns).
    I will begin with a presentation of the algorithm before discussing the challenges of its implementation. Then, I will give some concrete examples of industrial use cases where such an algorithm has been or could be used.
  • Slides : TBA

Tuyen B. Huynh

  • Date : 09 February 2017 at 14:00
  • Room : S3 124
  • Affiliation : Laboratoire de Mathématiques Nicolas Oresme (LMNO), Université de Caen
  • Webpage : 
  • Title : Statistical learning in large-scale scenarios: state of the art.
  • Abstract : TBA.
  • Slides : TBA

Faïcel Chamroukhi

  • Date : 02 February 2017 at 14:00
  • Room : S3 124
  • Affiliation : Laboratoire de Mathématiques Nicolas Oresme (LMNO), Université de Caen
  • Webpage : http://math.unicaen.fr/~chamroukhi/
  • Title : Hierarchical dynamical mixture models for high-dimensional data.
  • Abstract : The unsupervised statistical analysis of high-dimensional data, in particular functional data, is a popular topic in modern statistics and is related to the field of statistical inference for latent data models. In this talk, I will present latent data models and inference algorithms for learning from heterogeneous temporal and functional data. First, I will present hidden process regression models for modeling and segmenting non-stationary temporal data. Then, I will consider the problem of statistical modeling when the basic unit of information is a curve, that is, the framework of functional data analysis, and present hierarchical dynamical mixtures for the simultaneous clustering and segmentation of heterogeneous functional data. The presented models will be illustrated on real-world applications.
  • Slides : TBA

Antoine Channarond

  • Date : 19 January 2017 at 14:00
  • Room : S3 124
  • Affiliation : Laboratoire de Mathématiques Raphaël Salem (LMRS), Université de Rouen
  • Webpage : http://lmrs.univ-rouen.fr/Persopage/Channarond/
  • Title : Random graph model with latent positions, and statistical applications
  • Abstract : We consider the following random graph model: the nodes are randomly placed in a Euclidean space according to some nonparametric density f, and the probability of a connection between two nodes depends only on the distance between them. From a statistical point of view, the positions of the nodes are not observed: they are said to be latent. A major challenge in this setting is to obtain information about the latent space from the graph alone. The talk will address the problems of estimating distances and of clustering the nodes of the graph: clusters are defined as the connected components of the level-t set of the density f, and the goal is to infer which nodes belong to the level set, and to which cluster.
  • Slides : TBA
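
As a rough illustration of the generative model described in this abstract, the Python sketch below simulates such a graph. The standard Gaussian used for the density f, the connection function p(d) = exp(-(d/scale)^2), the function name and the `scale` parameter are all hypothetical choices made for the example; the talk considers a nonparametric f and does not fix these. In practice only the adjacency matrix A is observed, and the positions X are the latent quantities one would try to recover.

```python
import numpy as np


def sample_latent_position_graph(n, dim=2, scale=1.0, rng=None):
    """Simulate a latent position random graph (illustrative assumptions).

    Hypothetical choices, not taken from the talk:
      - latent density f: standard Gaussian on R^dim
      - connection function: p(d) = exp(-(d / scale) ** 2)
    """
    rng = np.random.default_rng(rng)
    # Latent positions X_1, ..., X_n drawn i.i.d. from f (unobserved in practice).
    X = rng.standard_normal((n, dim))
    # Pairwise Euclidean distances between latent positions.
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))
    # The connection probability depends only on the distance.
    P = np.exp(-(D / scale) ** 2)
    # Draw each edge once per unordered pair {i, j} (strict upper triangle),
    # then mirror it to get an undirected graph without self-loops.
    U = rng.random((n, n))
    upper = np.triu(U < P, k=1)
    A = (upper | upper.T).astype(int)
    return X, A


X, A = sample_latent_position_graph(200, rng=0)
# Only A (the graph) would be observed; X is the latent information.
print(A.shape, A.sum() // 2, "edges")
```

Because edges are drawn once per unordered pair and then mirrored, A is symmetric with an empty diagonal, as expected for an undirected graph without self-loops.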

Vincent Roger

  • Date : 17 November 2016 at 14:00
  • Room : S3 279
  • Affiliation : Laboratoire des Sciences de l'Information et des Systèmes (LSIS), Université de Toulon
  • Title : Unsupervised learning from large-scale bioacoustic data
  • Abstract : Understanding communication or interpreting different animal signals is
    an important topic in bioacoustics. We investigate probabilistic
    models on challenging real-world bioacoustic sound scenes. These
    problems have no ground truth and, unlike speech analysis, offer no prior
    knowledge.
    Thus, we investigate Bayesian non-parametric models to segment the
    different sounds. First, we study sequential models based on Hidden Markov
    Models (HMM), namely the Hierarchical Dirichlet Process HMM (HDP-HMM).
    Next, we study a non-sequential model: the Dirichlet Process Gaussian Mixture
    Model (DPGMM). The main problem with such approaches is the evaluation of
    the results. We give a first answer and will present the next steps we
    plan to follow.
  • Slides :
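
The Dirichlet process models mentioned in this abstract let the number of clusters (here, distinct sound types) be inferred from the data rather than fixed in advance. As a minimal sketch of that mechanism only, and not of the speaker's segmentation pipeline, the Python code below samples cluster labels from the Chinese restaurant process, the partition distribution induced by a Dirichlet process prior: each new observation joins an existing cluster with probability proportional to the cluster's size, or opens a new cluster with probability proportional to a concentration parameter alpha. The function name and the value of alpha are illustrative assumptions.

```python
import numpy as np


def chinese_restaurant_process(n, alpha, rng=None):
    """Sample cluster labels for n observations from a CRP(alpha) prior.

    Illustrative sketch: this is the prior over partitions underlying
    Dirichlet process mixtures, not a full DPGMM fit to data.
    """
    rng = np.random.default_rng(rng)
    labels = np.empty(n, dtype=int)
    counts = []  # counts[k] = number of observations already in cluster k
    for i in range(n):
        # Existing cluster k has weight counts[k]; a new cluster has weight alpha.
        weights = np.array(counts + [alpha], dtype=float)
        k = rng.choice(len(weights), p=weights / weights.sum())
        if k == len(counts):
            counts.append(1)  # open a new cluster
        else:
            counts[k] += 1
        labels[i] = k
    return labels


labels = chinese_restaurant_process(500, alpha=2.0, rng=0)
print("number of clusters:", labels.max() + 1)
```

Larger values of alpha produce more clusters, but the expected number of clusters grows only logarithmically with n (roughly alpha * log(1 + n / alpha)), which is what lets such models adapt the number of sound classes to the data.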

Charles Bouveyron

  • Date : 10 November 2016 at 14:00
  • Room : S3 263
  • Affiliation : MAP5, Université Paris Descartes
  • Webpage : http://w3.mi.parisdescartes.fr/~cbouveyr/
  • Title : The Stochastic Topic Block Model for the Clustering of Vertices in Networks with Textual Edges
  • Abstract :
    Due to the significant increase in communications between individuals via social media (Facebook, Twitter, LinkedIn) and electronic formats (email, web, e-publication) over the past two decades, network analysis has become an unavoidable discipline. Many random graph models have been proposed to extract information from networks based on person-to-person links only, without taking into account information on the contents. This paper introduces the stochastic topic block model (STBM), a probabilistic model for networks with textual edges. We address here the problem of discovering meaningful clusters of vertices that are coherent with both the network interactions and the text contents. A classification variational expectation-maximization (C-VEM) algorithm is proposed to perform inference. Simulated data sets are considered in order to assess the proposed approach and to highlight its main features. Finally, we demonstrate the effectiveness of our methodology on two real-world data sets: a directed communication network and an undirected co-authorship network.
  • Slides :  link

Emeline Perthame

  • Date : 27 October 2016 at 14:00
  • Room : S3 279
  • Affiliation : INRIA MISTIS, Grenoble
  • Webpage : https://emelineperthame.wordpress.com/
  • Title : Inverse regression approach to non-linear high-to-low dimensional mapping
  • Abstract :
    During this presentation, I will introduce a model that addresses non-linear regression issues when the number of covariates is large compared with the number of responses. In the proposed method, non-linearity is handled via a mixture of regressions. Mixture models, and paradoxically even the so-called mixture of regression models, are mostly used to handle clustering issues, and few articles use mixture models for actual regression and prediction purposes. Interestingly, it was shown in (Deleforge et al., 2015 [1]) that a prediction approach based on a mixture of regressions and an inverse regression trick in a Gaussian setting achieves low prediction errors compared with other methods in the literature. However, the method developed by these authors is not designed to perform robust regression. Indeed, under a Gaussian setting, outliers are known to affect the stability of the results and can lead to misleading predictions. Robust approaches that are tractable in high dimension are therefore needed to improve the accuracy of regression methods in the presence of outliers.

    The goal of this talk is to present how we refine the work in [1] by considering mixtures of Student distributions, which are able to handle outliers. As in [1], we propose to handle high-dimensional data by using an inverse regression trick. However, in the Student mixture context, a joint modelling approach on both responses and regressors is necessary to guarantee the tractability of the inverse regression of interest.

    This work is a collaboration with Florence Forbes (INRIA, Grenoble) and Antoine Deleforge (INRIA, Rennes). 

    [1] Deleforge, A., Forbes, F. and Horaud, R. (2015). High-dimensional regression with Gaussian mixtures and partially-latent response variables. Statistics and Computing, 25(5):893–911.
    [2] Perthame, E., Forbes, F. and Deleforge, A. (2016). Inverse regression approach to robust non-linear high-to-low dimensional mapping. Submitted, 2016.
  • Slides : Perthame.pdf