Improving Bioinformatics Methods for Analysis of Virus-Associated Cancers

During the past century, the association between viruses and cancer has remained a focal topic in cancer research due to the large number of tumors associated with a viral origin. According to the World Health Organization, an estimated 5% of cancers are caused by indirect viral mechanisms (e.g., host immunosuppression), and as many as 15% of all cancers are directly caused by viruses known as “tumor viruses”. Mechanistically, tumor viruses express viral oncoproteins that are associated with tumor development; most also insert their own genome into the host genome, and some form new viral-host fusion genes. Unfortunately, the role of viral genomic integration in tumor development has been difficult to study due to a lack of robust and generalizable viral analysis pipelines. Further, given the complexity of the novel viral-host fusion genes that are formed, functional prediction and annotation also remain a problem, again due to a lack of robust informatics toolkits. Consequently, pipelines that facilitate the detection of viral integration sites in the host genome will likely be crucial for both understanding viral-driven disease genetics and building genetics-based companion diagnostics in the future. Here we propose to address this gap in our field through the following aims: 1) to develop a pipeline capable of bridging multiple next-generation sequencing (NGS) technologies that accurately detects virus-host integration and fusion sites in virus-associated cancers, and 2) to integrate and re-analyze the public data related to virus-associated cancers in the context of viral-host integration and fusion events in order to facilitate functional annotation. Through these aims, we now have the opportunity to re-annotate publicly available data in the context of virus integration-induced alterations, leveraging public resources to improve the functional annotation of viral-host events.

Final Report

The research conducted during this funding period significantly lowered barriers to innovation, particularly in the field of viral integration analysis, by refining and expanding the functionality of short-read viral caller software packages, including SearcHPV and MCPViewer.

New R01 funding built on this project will allow for further refinement and expansion of the viral caller across additional viruses. The project led to multiple key publications and abstracts, as well as increased usage of the viral detection pipelines in their specific fields, positioning them as a valuable resource for ongoing research. This productivity, along with the new NIH R01DE032699 grant, highlights the project's role in driving forward both data generation and optimization, while also supporting continued innovation in the analysis of virus-associated cancers.

Learn More About Their Work:

MDPI.com Article (26 October, 2022):
Analysis of Human Papilloma Virus Content and Integration in Mucoepidermoid Carcinoma

Funded NIH R01, 1R01DE032699-01A1, Defining the Role of HPV Integration Structures in HNSCC Molecular Heterogeneity. Data and integration analysis tools supporting this R01 grant were developed with this ICI grant. 

AACR 2023: LB252 Abstract: HPV integration events are heterogenous, clonally selected and associated with spatially distinct transcriptomic profiles in aggressive HPV-positive oropharyngeal squamous cell carcinoma.

Viruses: Published manuscript on HPV integration in mucoepidermoid carcinoma.

ASTRO MultiD, 2024: Abstract Accepted: Tumor Mutational Burden Predicts Survival in Merkel Cell Carcinoma. 

ASTRO MultiD, 2024: Abstract Accepted: Integrative Single-Cell RNA-Seq, Single Cell ATAC-Seq, and Spatial RNA-Seq Analysis of Heterogeneity in HPV-Related Head and Neck Squamous Cell Carcinoma

Clinical Cancer Research: New Manuscript Submission: MCPV integration in Merkel cell carcinoma