Synthetic Data for Biomedical Applications
Before joining SDSC, Arshjot Khehra received his MSc in Artificial Intelligence from USI Lugano, where he completed his thesis on hierarchical graph reinforcement learning. Previously, he worked for 4+ years across India and Singapore, gaining data science experience in the insurance, logistics, and manufacturing sectors. He also holds a BSc in Industrial Engineering from PEC Chandigarh. Over the course of his career, Arshjot has worked on a wide array of projects, such as handwritten text recognition and generation, voice matching across phone call recordings, policy lapse rate prediction for customer retention, and automated insurance claim processing.
Matthias Galipaud obtained his PhD in evolutionary biology in 2012 from the University of Burgundy in Dijon (France), and held postdoctoral positions as a mathematical biologist at the University of Bielefeld (Germany) and the University of Zurich, where he researched the evolutionary theories of aging and mate choice. In 2020, he became a data scientist, developing machine learning solutions for startups in Switzerland and Australia before joining the SDSC Innovation Team in November 2022.
Valerio started his career working for 7 years as a particle-physics researcher at CERN. There, he used state-of-the-art techniques to extract information from data, especially to search for traces of dark matter in particle collisions. Since 2016, he has worked in consulting, applying data science in several industries. First, he joined the Quant team of Ernst & Young in Geneva. Later, he created his own company, SamurAI sàrl, providing consulting services for his clients. He also has a passion for teaching very complex subjects in simple terms. That is why he particularly enjoys offering training programs to private companies and universities. Valerio joined the SDSC in May 2022 as a Principal Data Scientist with the mission of accompanying industrial partners and other institutions through their data science journey.
Presentation
Overview
Recently, synthetic data has enjoyed growing interest from the biomedical sector. Synthetic patient data helps mitigate privacy issues. Augmenting datasets with synthetic records can also improve the performance of classification models trained on scarce health data or on data with rare minority classes (e.g. rare diseases).
During this one-day workshop, organized by CHUV and SDSC, we will review available tools for synthetic data generation and use cases in the biomedical and pharmaceutical sectors.
Details
Target Audience
Experienced professionals, executives, and data scientists in the biomedical and pharmaceutical sectors wishing to acquire hands-on knowledge on synthetic data generation and usage.
As the workshop involves hands-on sessions, prior experience with the programming language Python is required. The workshop will be held in English.
Programme
Objectives
By the end of the day, participants will:
- Have a grasp of currently available methods for synthetic tabular and image data generation.
- Have identified use cases and challenges of synthetic data in the biomedical and pharmaceutical sectors.
- Have hands-on experience with generating synthetic data with Python and evaluating its quality.
Agenda
09:00
Welcome coffee
09:30
Welcome & introduction
09:40
Synthetic data: How it works and where it is currently used
10:10
GANs, VAEs and diffusion: a deeper dive
10:55
Break
11:15
Applications in healthcare
11:45
Towards the use of synthetic data in biomedical applications: Evaluation of privacy and utility tradeoff
12:15
Lunch
13:15
Applications in the pharmaceutical industry
14:00
Hands-on (part 1): Understanding synthetic data generation (e.g. generating synthetic medical images for image classification)
14:50
Break
15:10
Hands-on (part 2): Understanding synthetic data evaluation (e.g., sharing survival data: evaluating the utility and privacy of tabular synthetic data)
16:40
Panel discussion
17:10
Concluding remarks, Apéro & Networking
Instructors
Jeremie Despraz, MS, Principal Data Scientist in Clinical AI, CHUV
Matthias Galipaud, PhD, Senior Data Scientist, SDSC, ETHZ
Beyrem Kaabachi, MS, Data Scientist in Health Data Privacy, CHUV
Arshjot Khehra, Data Scientist, SDSC, ETHZ
Jean-Louis Raisaro, PhD, Tenure-Track Assistant Professor in Biomedical data science, CHUV
Alena Simalatsar, PhD, Assistant Professor, HES-SO
Practical Information
Price
Non-members: 150/pers
Ongoing collaborations with SDSC or BDSC: free
Availability & Registration
52 registered participants - Registration closed.
Other events
ETH Industry Day 2023
Silvia holds an MSc in Computer Science from EPFL and a PhD in Computer Science from the University of York, UK. She has been a senior research fellow at the University of Trento and later at Politecnico di Milano, Italy. Here, she had the chance to work on Marie Curie and ERC projects relating to natural language processing. From 2012 to 2019, she was a Senior Manager and NLP expert at ELCA Informatique Switzerland, whose AI department she helped create and expand. Silvia joined the Swiss Data Science Center in 2019 and is currently its Chief Transformation Officer, in charge of the team leading organizations to digital transformation.
SDSC Hackathon
Sean obtained his PhD in Telecommunications Engineering from Dublin City University in 2001. Since then he has worked in large industry and startup contexts but has spent most of his time working in academic research labs with a strong applied focus spanning both Ireland and Switzerland. Sean has experience with all aspects of the research project lifecycle, ranging from project inception to proposal stage to project execution and reporting. Having a keen interest in technology trends and evolution, he strives to maintain a hands-on approach with practical experience with key technologies in the rapidly changing cloud and analytics technology landscape. Sean works on the Renku Infrastructure team, leveraging his experience with modern cloud technologies, helping to make Renku easy to deploy and manage.
Luis is originally from Spain, where he completed his bachelor's studies in electrical engineering and an MSc in signal theory and communications, both at the University of Seville. During his PhD, he started focusing on machine learning methods, more specifically message passing techniques for channel coding and Bayesian methods for channel equalization. He carried it out between the University of Seville and the University Carlos III in Madrid, also spending some time at EPFL, Switzerland, and Bell Labs, USA, where he worked on advanced techniques for optical channel coding. When he completed his PhD in 2013, he moved to the Luxembourg Center on Systems Biomedicine, where he switched his interest to neuroscience, neuroimaging, and life sciences, and to the application of machine learning techniques to these fields. During his four and a half years there as a postdoc, he worked on many different problems as a data scientist, encompassing topics such as microscopy image analysis, neuroimaging, and single-cell gene expression analysis. He joined the SDSC in April 2018. As Lead Data Scientist, Luis coordinates projects in various domains. Several projects focus on the application of natural language processing and knowledge graphs to the study of different phenomena in social and political sciences. In the domains of architecture and engineering, Luis is responsible for projects centered on the application of novel generative methods to parametric modeling. Finally, Luis also coordinates different projects in robotics, ranging from collaborative robotic construction to deformable object manipulation.
Carlos Vivar Ríos joined the SDSC in 2023, where he is part of the Open Research Data and Engagement Unit (ORDES). As a multidisciplinary data engineer, he brings a diverse background in biology, cognitive sciences, and bioinformatics from the University of Malaga. His multifaceted professional career spans several disciplines, including genomics at RIKEN in Yokohama, multidimensional image analysis in microscopy at the University of Lausanne (UNIL), and cellular biology modeling at INRIA in Lyon. Carlos has been involved in a variety of projects, such as analyzing astrocyte calcium dynamics, de novo sequencing of Solea senegalensis, drug repurposing for Alzheimer's based on GWAS studies, conducting geospatial analysis for linguistic corpora, and assessing drought through remote sensing. He is dedicated to advancing reproducible research methods and actively supports the open science movement.