
Interpretable Network Selection

May 30, 2021

This project develops interpretable model and network selection methods for high-dimensional data. The motivating problem is that many predictive models achieve similar accuracy yet tell very different scientific stories. Instead of selecting a single, apparently optimal model, this line of work studies sets of models with comparable predictive performance and uses their structure to reveal stable, ambiguous, or context-dependent relationships among variables.

The SWAG framework is central to this direction. It is a wrapper method for sparse learning that emphasizes both predictive ability and interpretability. In applications to genomics and biomedical data, this perspective is especially useful because correlated biomarkers can have competing or even antagonistic roles. The resulting selected model sets can be interpreted as networks, making it possible to identify variables that are consistently active, variables whose effects depend on their context, and variables whose apparent importance is unstable across equivalent models.
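The idea of keeping a set of near-equivalent sparse models and reading it as a network can be illustrated with a small sketch. This is not the SWAG algorithm itself: it is a simplified stand-in that exhaustively scores every model of at most two variables on simulated data, and it assumes hold-out mean squared error as the performance measure and a 10% tolerance around the best score. All function names, the simulated data, and the tolerance are hypothetical choices made for illustration.

```python
import itertools
import random


def fit_ols(X, y):
    """Least-squares coefficients via the normal equations (Gauss-Jordan)."""
    k = len(X[0])
    # Augmented system [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(X, y))] for i in range(k)]
    for c in range(k):
        pivot = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[pivot] = A[pivot], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]


def holdout_mse(data, y, subset, n_train):
    """Fit on the first n_train rows, return MSE on the remaining rows."""
    design = [[1.0] + [row[j] for j in subset] for row in data]
    beta = fit_ols(design[:n_train], y[:n_train])
    errors = [(sum(b * x for b, x in zip(beta, row)) - t) ** 2
              for row, t in zip(design[n_train:], y[n_train:])]
    return sum(errors) / len(errors)


def near_optimal_models(data, y, max_size=2, tolerance=0.10, n_train=None):
    """Keep every small model whose hold-out MSE is within `tolerance` of the best."""
    n_train = n_train or 2 * len(data) // 3
    scores = {}
    for size in range(1, max_size + 1):
        for subset in itertools.combinations(range(len(data[0])), size):
            scores[subset] = holdout_mse(data, y, subset, n_train)
    best = min(scores.values())
    return {s: v for s, v in scores.items() if v <= best * (1 + tolerance)}


def variable_frequency(models):
    """Count how many retained models each variable appears in."""
    freq = {}
    for subset in models:
        for j in subset:
            freq[j] = freq.get(j, 0) + 1
    return freq


def cooccurrence_network(models):
    """Edge weights: number of retained models in which two variables appear together."""
    edges = {}
    for subset in models:
        for pair in itertools.combinations(subset, 2):
            edges[pair] = edges.get(pair, 0) + 1
    return edges


# Simulated data (assumption): x1 nearly duplicates x0, x2 carries its own
# signal, and x3 is pure noise. The response depends on x0 and x2 only.
rng = random.Random(0)
data, y = [], []
for _ in range(300):
    x0 = rng.gauss(0, 1)
    x1 = x0 + rng.gauss(0, 0.05)
    x2 = rng.gauss(0, 1)
    x3 = rng.gauss(0, 1)
    data.append([x0, x1, x2, x3])
    y.append(x0 + x2 + rng.gauss(0, 0.5))

models = near_optimal_models(data, y)
freq = variable_frequency(models)
edges = cooccurrence_network(models)
print(sorted(models), freq, edges)
```

In this toy setting one would expect variable 2 to appear in every retained model, while the correlated variables 0 and 1 substitute for each other across models. That is the kind of distinction between consistently active and interchangeable variables that the model-set view is meant to expose.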

The project has been applied to gene selection, inflammatory bowel disease, melanoma, breast cancer microRNAs, neuroimaging, stellar blend classification, sleep epigenetics, and medical decision support. Across these applications, the goal is the same: build statistical learning tools that are not only accurate, but also scientifically interpretable.

Selected related publications include:

SWAG: A Wrapper Method for Sparse Learning
A Predictive Based Regression Algorithm for Gene Network Selection
A Paradigmatic Regression Algorithm for Gene Selection Problems
Evidence of antagonistic predictive effects of miRNAs in breast cancer cohorts through data-driven networks
Is Nonmetastatic Cutaneous Melanoma Predictable Through Genomic Biomarkers?
Stellar Blend Image Classification Using Computationally Efficient Gaussian Processes
A Multi-Model Framework to Explore ADHD Diagnosis From Neuroimaging Data
Epigenetic Impact of Sleep Timing in Children: Novel DNA Methylation Signatures via SWAG Analysis

Model Selection

Roberto Molinari

Assistant Professor in Statistics

My research interests include robust statistics, signal processing, model selection and differential privacy.

Publications

Epigenetic Impact of Sleep Timing in Children: Novel DNA Methylation Signatures via SWAG Analysis

Erika Richter, Priyadarshni Patel, Yagmur Y. Ozdemir, Ukamaka V. Nnyaba, Roberto Molinari, Jeganathan R. Babu, Thangiah Geetha

October 2025 International Journal of Molecular Sciences, 26(21), 10615



Machine Learning and Explainable AI for Type-2 Diabetes Management

Claudio Mazzi, Chiara Seghieri, Roberto Molinari

January 2025 Scientific Meeting of the Italian Statistical Society, 175–180



Stellar Blend Image Classification Using Computationally Efficient Gaussian Processes

Chinedu Eleh, Yunli Zhang, Rafael Bidese, Benjamin W. Priest, Amanda L. Muyskens, Roberto Molinari, Nedret Billor

July 2024 arXiv preprint arXiv:2407.19297



A Multi-Model Framework to Explore ADHD Diagnosis From Neuroimaging Data

Yagmur Yavuz Ozdemir, Naga Chandra Padmini Nukala, Roberto Molinari, Gopikrishna Deshpande

January 2024 Journal of Data Science, 22(2)



Stellar Blend Image Classification Using Computationally Efficient Gaussian Processes (MuyGPs)

Rafael Bidese, Chinedu Eleh, Yunli Zhang, Roberto Molinari, Nedret Billor, Benjamin Priest, Imene Goumiri, Amanda Muyskens, Alec Dunton

January 2023 Symposium on Data Science and Statistics


Evidence of antagonistic predictive effects of miRNAs in breast cancer cohorts through data-driven networks

Cesare Miglioli, Gaetan Bakalli, Samuel Orso, Mucyo Karemera, Roberto Molinari, Stephane Guerrier, Nabil Mili

March 2022 Scientific Reports

This work attempts to explain some of the contradictory results reported on the oncogenic or protective roles of miRNAs in breast cancer progression.


Non applicability of validated predictive models for intensive care admission and death of COVID-19 patients in a secondary care hospital in Belgium

Nicolas Parisi, Aurore Janier-Dubry, Ester Ponzetto, Charalambos Pavlopoulos, Gaetan Bakalli, Roberto Molinari, Stéphane Guerrier, Nabil Mili

May 2021 Journal of Emergency and Critical Care Medicine

This study shows that a highly promoted predictive score for admitting patients to intensive care is not applicable in Belgium, where other predictive models are more appropriate.


Chameleon microRNAs in Breast Cancer: Their Elusive Role as Regulatory Factors in Cancer Progression

Cesare Miglioli, Gaetan Bakalli, Samuel Orso, Mucyo Karemera, Roberto Molinari, Stephane Guerrier, Nabil Mili

December 2020 bioRxiv



SWAG: A Wrapper Method for Sparse Learning

Roberto Molinari, Gaetan Bakalli, Stephane Guerrier, Cesare Miglioli, Samuel Orso, Mucyo Karemera, Olivier Scaillet

June 2020 arXiv preprint arXiv:2006.12837



Is Nonmetastatic Cutaneous Melanoma Predictable Through Genomic Biomarkers?

Mattia Branca, Samuel Orso, Roberto Molinari, Haotian Xu, Stephane Guerrier, Yuming Zhang, Nabil Mili

January 2018 Melanoma Research, 28(1), 21–29



A Predictive Based Regression Algorithm for Gene Network Selection

Stephane Guerrier, Nabil Mili, Roberto Molinari, Samuel Orso, Marco Avella-Medina, Yanyuan Ma

June 2016 Frontiers in Genetics

A new prediction-based objective function for gene selection is proposed. It enables the identification of small, interpretable models with high predictive power, and the resulting approach outperforms alternatives while offering a network of models.


Differentiating Inflammatory Bowel Diseases by Using Genomic Data: Dimension of the Problem and Network Organization

Nabil Mili, Roberto Molinari, Yanyuan Ma, Stephane Guerrier

January 2016 Human Genomics



A Paradigmatic Regression Algorithm for Gene Selection Problems

Stephane Guerrier, Nabil Mili, Roberto Molinari, Samuel Orso, Marco Avella-Medina, Yanyuan Ma

November 2015 arXiv preprint arXiv:1511.07662

