Model Selection and Approximation in High-dimensional Mixtures of Experts Models: From Theory to Practice
In this thesis, we study the approximation capabilities and the model estimation and selection properties of a rich family of mixtures of experts (MoE) models in a high-dimensional setting, including MoE models with Gaussian experts and either soft-max gating functions (SGaME) or Gaussian gating functions (GLoME). Firstly, we improve upon universal approximation results for unconditional mixture distributions, and study such capabilities for MoE models in a variety of contexts, including the approximation of conditional probability density functions (PDFs) and approximate Bayesian computation. More precisely, we prove that, to an arbitrary degree of accuracy, location-scale mixtures of a continuous PDF can approximate any continuous PDF uniformly on a compact set; and that, in Lebesgue spaces, location-scale mixtures of an essentially bounded PDF can approximate any PDF, and those of a conditional PDF can approximate any continuous conditional PDF whenever the input and output variables are both compactly supported. Next, we establish non-asymptotic model selection results in high-dimensional regression scenarios for a variety of MoE regression models, including GLoME and SGaME, based on an inverse regression strategy and a Lasso penalization, respectively. These include results for selecting the number of mixture components, as well as for jointly selecting the variables and the ranks of the covariance matrices. In particular, these results provide a strong theoretical guarantee: a finite-sample oracle inequality, satisfied by the penalized maximum likelihood estimator with a Jensen–Kullback–Leibler type loss, which supports the use of the slope heuristic criterion in a finite-sample setting, in contrast to classical asymptotic criteria. Finally, to support our theoretical results and the statistical study of non-asymptotic model selection, we perform numerical studies on simulated and real data, which illustrate the practical performance of our results, including the finite-sample oracle inequalities.
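As a concrete illustration of the model class named above, the following minimal sketch evaluates the conditional density of a soft-max-gated MoE with Gaussian experts for a scalar input and a scalar output. It is not taken from the thesis: the linear gating scores, the linear expert means, the function names, and all parameter values are assumptions chosen for illustration only.

```python
import math


def softmax(scores):
    """Numerically stable soft-max over a list of real scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gaussian_pdf(y, mean, var):
    """Univariate Gaussian density with the given mean and variance."""
    return math.exp(-0.5 * (y - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def sgame_density(y, x, gate_params, expert_params):
    """Conditional density s(y | x) of a soft-max-gated Gaussian MoE.

    gate_params:   one (intercept, slope) pair per component
                   (assumed linear gating scores).
    expert_params: one (intercept, slope, variance) triple per component
                   (assumed linear expert means).
    """
    gates = softmax([a + b * x for a, b in gate_params])
    return sum(
        g * gaussian_pdf(y, c + d * x, var)
        for g, (c, d, var) in zip(gates, expert_params)
    )


# Hypothetical two-component model; the values carry no meaning.
GATE_PARAMS = [(0.0, 1.0), (0.0, -1.0)]
EXPERT_PARAMS = [(1.0, 2.0, 0.25), (-1.0, -0.5, 1.0)]


def integrate_density(x, lo=-20.0, hi=20.0, n=4000):
    """Midpoint-rule integral of y -> s(y | x) over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(
        sgame_density(lo + (i + 0.5) * h, x, GATE_PARAMS, EXPERT_PARAMS)
        for i in range(n)
    )
```

For any fixed input, the gate weights are nonnegative and sum to one, so the function of y is a finite Gaussian mixture whose weights vary with x; `integrate_density` offers a quick numerical check that it integrates to approximately one. Under the same assumptions, a Gaussian-gated variant would replace the soft-max of linear scores with normalized weighted Gaussian densities in x.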