Identifying Subgroups in Biomedical Datasets using Data Attribution


Date Posted: October 28, 2024

Date Recorded: October 9, 2024

Speaker(s): Djuna von Maydell, MIT

All Captioned Videos
Computational Tutorials

Description:

Understanding how training data influences model predictions ("data attribution") is an active area of machine learning research. In this tutorial, we will introduce a data attribution method (datamodels: https://gradientscience.org/datamodels-1/) and explore how it can be applied in the life sciences to identify meaningful subgroups, such as disease subtypes, in biomedical datasets. We will begin with a simple image classification example (CIFAR-10) and walk through, step by step, how the method works in practice. Because the approach involves training thousands of lightweight classifiers, we will focus on strategies for fast and efficient model training. Next, we will turn to applications in biomedical science, focusing on single-cell and genetic datasets and the biological insights gained from applying this computational approach. The tutorial will conclude with an interactive, hands-on session in Google Colab, where participants can apply the techniques themselves and explore the approach further. This session is designed to be accessible to participants of all coding and machine learning experience levels, whether you're new to machine learning or curious about its intersection with biomedical applications.
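
For readers who want a feel for the idea before opening the notebooks, the following is a minimal sketch of the datamodels recipe as described above: train many lightweight classifiers on random subsets of the training data, record each one's output on a fixed target example, then fit a linear model that predicts that output from the subset-inclusion mask. It is not the tutorial's code. The toy 2-D data, the nearest-centroid classifier, the ridge-regularized least-squares fit (swapped in here for the sparse linear regression described in the datamodels work), and every function and parameter name (`make_toy_data`, `centroid_margin`, `fit_datamodel`, `alpha`, `ridge`) are assumptions made for illustration.

```python
import numpy as np


def make_toy_data(n_train=60, seed=0):
    """Two Gaussian blobs in 2-D, standing in for a real training set (illustrative only)."""
    rng = np.random.default_rng(seed)
    y = rng.integers(0, 2, size=n_train)
    # Class 1 is centred near (1.5, 1.5), class 0 near (-1.5, -1.5).
    centers = np.where(y[:, None] == 1, 1.5, -1.5)
    X = centers + rng.normal(size=(n_train, 2))
    return X, y


def centroid_margin(X, y, mask, x_target, y_target):
    """Fit a nearest-centroid classifier on the masked subset; return the target's margin."""
    Xs, ys = X[mask], y[mask]
    c0 = Xs[ys == 0].mean(axis=0)
    c1 = Xs[ys == 1].mean(axis=0)
    d0 = np.sum((x_target - c0) ** 2)
    d1 = np.sum((x_target - c1) ** 2)
    # Positive margin: the target is closer to the centroid of its true class.
    return d0 - d1 if y_target == 1 else d1 - d0


def fit_datamodel(X, y, x_target, y_target, n_models=2000, alpha=0.5,
                  ridge=1e-3, seed=0):
    """Estimate one weight per training example for a single target example.

    Each of the n_models classifiers sees a random subset in which every
    training example is included independently with probability alpha.
    """
    rng = np.random.default_rng(seed)
    masks, margins = [], []
    while len(masks) < n_models:
        mask = rng.random(len(X)) < alpha
        if len(np.unique(y[mask])) < 2:
            continue  # skip the rare subset that is missing a class
        masks.append(mask)
        margins.append(centroid_margin(X, y, mask, x_target, y_target))
    M = np.asarray(masks, dtype=float)
    m = np.asarray(margins)
    # Centre both sides so no explicit intercept is needed, then solve ridge regression.
    Mc, mc = M - M.mean(axis=0), m - m.mean()
    weights = np.linalg.solve(Mc.T @ Mc + ridge * np.eye(M.shape[1]), Mc.T @ mc)
    return weights, M, m


X, y = make_toy_data()
x_target, y_target = np.array([2.0, 1.0]), 1
weights, masks, margins = fit_datamodel(X, y, x_target, y_target)
top = np.argsort(-np.abs(weights))[:5]
print("training indices with the largest absolute weight:", top)
```

Each entry of `weights` estimates how much including the corresponding training example shifts the classifier's margin on the target. Repeating the fit for many target examples gives one weight vector per target, and comparing or clustering those vectors is one plausible route to the subgroup analysis the tutorial describes.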

Slides: https://drive.google.com/file/d/1qGahNYBUnThba07D2D9gZTviiU_kOedF/view?u...
GitHub repository of tutorial code: https://github.com/djunamay/datamodels_tutorial
Code with outputs: https://colab.research.google.com/drive/1u2jZzWs7SVT6kj-O8rMsUphHfvyeqnH...
Code without outputs: https://colab.research.google.com/drive/1lwl7-Xsc7lg9bTg97hEEqPt54x-J1qe...
