MLTox

Enhancing toxicological testing through machine learning

Started: September 1, 2021
Status: In Progress

Abstract

We use machine learning (ML) methods to predict the effects of chemicals on aquatic species. Our main goal is to use data from in vivo (whole-organism) experiments to infer the effects of chemicals on organisms when no testing data are available for the chemical, the organism, or both. In the literature, this kind of problem is known as across-chemical (and across-species) extrapolation. Extrapolation across chemicals usually relies on measures of chemical similarity, under the assumption that similar chemicals are similarly toxic to the same species. Extrapolation across species can combine measured chemical effects on some species with a measure of similarity between species: phylogenetic distance, sequence or structural similarity of the chemicals' known molecular targets (where these are known at all), or similarity in physiological traits. Given the enormous number of chemicals and potentially affected species, extrapolating chemical by chemical or species by species is a daunting task. In an interdisciplinary effort between ecotoxicologists and ML experts, we combine previously unconnected data to predict toxicity across chemicals and species. We will use a variety of data sources and types from publicly available tools and databases, combining chemical structure with data from chemical tests on different organisms.
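The chemical-similarity idea behind across-chemical extrapolation (often called read-across) can be illustrated with a minimal sketch. It assumes each chemical is encoded as a binary structural fingerprint, stored here as the set of "on" bit indices, and uses the Tanimoto coefficient as the similarity measure; this is one common choice, not necessarily the one used in this project. The toxicity of an untested chemical is predicted as the similarity-weighted mean of the values of its k most similar tested chemicals on the same species. The function names, fingerprints, and values are hypothetical placeholders, not ADORE data.

```python
def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto (Jaccard) similarity between two sets of 'on' fingerprint bits."""
    if not a and not b:
        return 1.0  # two empty fingerprints are treated as identical
    return len(a & b) / len(a | b)


def read_across(query: frozenset, tested: list, k: int = 2) -> float:
    """Hypothetical helper: predict a toxicity value for `query` as the
    similarity-weighted mean of its k most similar tested chemicals.

    `tested` is a list of (fingerprint, measured_value) pairs, all assumed
    to come from the same species.
    """
    ranked = sorted(
        ((tanimoto(query, fp), value) for fp, value in tested),
        reverse=True,
    )[:k]
    total = sum(sim for sim, _ in ranked)
    if total == 0:
        raise ValueError("query shares no fingerprint bits with the k nearest chemicals")
    return sum(sim * value for sim, value in ranked) / total


# Toy placeholder fingerprints and values (not measurements).
tested = [
    (frozenset({1, 2, 3, 4}), 1.0),
    (frozenset({1, 2, 3, 5}), 2.0),
    (frozenset({7, 8, 9}), 5.0),
]
query = frozenset({1, 2, 3})
prediction = read_across(query, tested, k=2)
```

Here the query shares three of four bits with each of the first two tested chemicals (similarity 0.75) and none with the third, so the prediction is the plain mean of the first two values, 1.5. Weighting by similarity lets closer analogues dominate when the neighbours are not equally similar.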

People

Collaborators

SDSC Team:
  • Lilian Gasser
  • Quentin Duchemin
  • Guillaume Obozinski
  • Fernando Perez-Cruz

PI | Partners:

Eawag, Environmental Toxicology:

  • Prof. Dr. Kristin Schirmer
  • Dr. Christoph Schür


Eawag, Systems Analysis, Integrated Assessment and Modelling:

  • Dr. Marco Baity Jesi



Motivation

Ecotoxicological testing requires large investments of money, workforce, and time, in addition to the suffering it causes animals in in vivo tests. Global efforts are under way to reduce or replace animal testing in human and environmental risk assessment, for both ethical and practical reasons.

Proposed Approach / Solution

As a first step, we compiled a benchmark dataset on acute mortality in three taxonomic groups (fish, crustaceans, and algae), intended for training, comparing, and benchmarking models. The ADORE dataset was compiled from several sources and contains data subsets of varying complexity, ranging from single-species data to data within a taxonomic group and data across taxonomic groups (Figure 1). It has been published as a data descriptor in Scientific Data and is openly available.
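The nested levels of complexity can be pictured as successively narrower filters over one table of test results. The sketch below assumes a hypothetical record layout (chemical, taxon, species) with placeholder chemical identifiers; the real ADORE files use their own columns, and the actual challenge definitions are those published with the dataset. It only illustrates how single-species, within-group, and across-group subsets relate to each other.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToxRecord:
    """One test result; a hypothetical layout, not the real ADORE schema."""
    chemical: str  # placeholder chemical identifier
    taxon: str     # "fish", "crustacean", or "algae"
    species: str   # scientific name of the test species


def subset(records, taxon=None, species=None):
    """Keep the records matching an optional taxonomic group and/or species."""
    return [
        r for r in records
        if (taxon is None or r.taxon == taxon)
        and (species is None or r.species == species)
    ]


def summarize(records):
    """Count records and distinct chemicals and species in a subset."""
    return {
        "records": len(records),
        "chemicals": len({r.chemical for r in records}),
        "species": len({r.species for r in records}),
    }


# Toy records with placeholder chemical IDs (no real test results).
records = [
    ToxRecord("chem_A", "fish", "Oncorhynchus mykiss"),
    ToxRecord("chem_B", "fish", "Danio rerio"),
    ToxRecord("chem_A", "crustacean", "Daphnia magna"),
    ToxRecord("chem_C", "algae", "Raphidocelis subcapitata"),
]

across_taxa = summarize(subset(records))                                   # all groups
within_fish = summarize(subset(records, taxon="fish"))                     # one group
single_species = summarize(subset(records, species="Oncorhynchus mykiss"))  # one species
```

Each narrower subset has fewer chemicals and species to generalise over, which is one intuition for why the single-species challenges are the simpler end of the range.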

Of the challenges provided in ADORE, we first focused on fish acute mortality, training standard ML models such as LASSO, random forests, and XGBoost and comparing them with more elaborate models. At the same time, we are using our models to gain a better understanding of the nonlinear relationships that connect species, chemicals, and the resulting toxicity.
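To make the "train a standard model and compare it" step concrete, here is a minimal pure-Python sketch of one of the named baselines, LASSO, fitted by cyclic coordinate descent and compared against a predict-the-mean baseline on a hold-out split. Everything here is an assumption for illustration: the synthetic features stand in for chemical and species descriptors, the function names are hypothetical, and random forests and XGBoost are not reproduced. This is not the project's pipeline.

```python
import math
import random


def soft_threshold(value, threshold):
    """Shrink `value` toward zero by `threshold` (the L1 proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_lasso(X, y, alpha, n_iter=200):
    """Minimise (1/2n)||y - Xw - b||^2 + alpha*||w||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    x_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    # Centre features and target so the intercept can be recovered at the end.
    Xc = [[row[j] - x_means[j] for j in range(p)] for row in X]
    residual = [v - y_mean for v in y]  # equals yc - Xc @ w while w is all zeros
    col_sq = [sum(Xc[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # constant feature: keep its weight at zero
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(Xc[i][j] * (residual[i] + Xc[i][j] * w[j]) for i in range(n)) / n
            new_wj = soft_threshold(rho, alpha) / col_sq[j]
            delta = new_wj - w[j]
            for i in range(n):
                residual[i] -= Xc[i][j] * delta  # keep residual = yc - Xc @ w
            w[j] = new_wj
    intercept = y_mean - sum(w[j] * x_means[j] for j in range(p))
    return w, intercept


def predict(w, intercept, X):
    return [intercept + sum(wj * xj for wj, xj in zip(w, row)) for row in X]


def rmse(y_true, y_pred):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))


# Synthetic stand-in data: only the first two of three features matter.
rng = random.Random(0)
X = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(200)]
y = [2.0 * x[0] - 1.0 * x[1] + rng.gauss(0, 0.1) for x in X]
X_train, X_test, y_train, y_test = X[:150], X[150:], y[:150], y[150:]

w, b = fit_lasso(X_train, y_train, alpha=0.01)
baseline = [sum(y_train) / len(y_train)] * len(y_test)  # always predict the training mean
lasso_rmse = rmse(y_test, predict(w, b, X_test))
baseline_rmse = rmse(y_test, baseline)
```

The L1 penalty drives the weight of the irrelevant third feature to (or very near) zero while slightly shrinking the useful ones, which is the sparsity that makes LASSO a readable reference point before moving to nonlinear models.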

Impact

We provide new means of protecting the environment from toxicants by combining ML and in vivo data. This supports the discussion among toxicologists, regulators, and data scientists on how to make progress in reducing experiments on animals.

Figure 1: The ADORE benchmark dataset consists of eleven challenges of varying complexity, ranging from data covering all taxonomic groups to single-species data.


Annexe

Publications

  • Wu, J., D’Ambrosi, S., Ammann, L., Stadnicka-Michalak, J., Schirmer, K., & Baity-Jesi, M. (2022). Predicting chemical hazard across taxa through machine learning. Environment International, 163, 107184. https://doi.org/10.1016/j.envint.2022.107184
  • Schür, C., Gasser, L., Perez-Cruz, F., Schirmer, K., & Baity-Jesi, M. (2023). A benchmark dataset for machine learning in ecotoxicology. Scientific Data, 10, 718. https://doi.org/10.1038/s41597-023-02612-2
  • Gasser, L., Schür, C., Perez-Cruz, F., Schirmer, K., & Baity-Jesi, M. (2024). Machine learning-based prediction of fish acute mortality: Implementation, interpretation, and regulatory relevance. Environmental Science: Advances. https://doi.org/10.1039/D4VA00072B
  • Dataset: ADORE, hosted on Renku
  • Poster: Schür, C., Gasser, L., Wu, J., Perez-Cruz, F., Schirmer, K., & Baity-Jesi, M. (2022). Preparation and characterization of a benchmark data set for machine learning in ecotoxicology. Swiss Society of Toxicology (SST) Annual Meeting 2022, 17 November, Basel.
  • Poster: Schür, C., Gasser, L., Perez-Cruz, F., Schirmer, K., & Baity-Jesi, M. (2023). A Benchmark Dataset for Machine Learning in Ecotoxicology. 33rd SETAC Europe Annual Meeting, Dublin, Ireland.
  • Talk: Schür, C., Gasser, L., Perez-Cruz, F., Schirmer, K., & Baity-Jesi, M. (2023). Predicting Ecotoxicity across Taxa through Machine Learning. 33rd SETAC Europe Annual Meeting, Dublin, Ireland.
  • Poster: Schür, C., Gasser, L., Wu, J., Perez-Cruz, F., Schirmer, K., & Baity-Jesi, M. (2023). Machine learning for predictive ecotoxicology in fish. Swiss Society of Toxicology (SST) Annual Meeting 2023, 16 November, Basel.

