Inferring Actionable Biomarkers from Routine Hematoxylin and Eosin-Stained Tissue.

In the last decade, clinical genomic sequencing has advanced personalized medicine by enabling targeted treatment recommendations based on somatic mutations in a patient’s tumor. Using an FDA-authorized assay, our institution has compiled a clinical sequencing database of over 60,000 patients with solid tumors. Critically, we have also assembled the H&E-stained whole-slide images (WSIs) associated with the same specimens. This cohort offers an opportunity to uncover genotype-phenotype correlations in human cancers at an unprecedented scale, and thereby to extend personalized medicine to patients beyond the reach of genomic sequencing. Several studies have established models that infer individual molecular features directly from H&E imaging, but recent advances in machine learning and dataset scale warrant further exploration of pan-cancer associations between histopathologic features and actionable variants. We hypothesize that specific genomic biomarkers can be linked to image features extracted from WSIs. In this project, we aim to [1] develop open-source, scalable tools for label-free slide preprocessing and tumor identification, [2] integrate genomic and histopathologic features to test whether clinically actionable genomic variants can be inferred from H&E slides across solid tumor histologies, and [3] develop new browsing capabilities for joint exploration of genomic, clinical, and histopathologic features using natural-language and image-based queries.