This book presents protocols for identification of genetic drivers of cancer.
Author: Timothy K. Starr
Publisher: Humana Press
This book presents protocols for identification of genetic drivers of cancer. Chapters guide readers through a brief history of cancer gene discovery, in silico approaches, in vitro approaches, and in vivo approaches using forward genetic screens in mice. Written in the highly successful Methods in Molecular Biology series format, chapters include introductions to their respective topics, lists of the necessary materials and reagents, step-by-step, readily reproducible laboratory protocols, and tips on troubleshooting and avoiding known pitfalls. Authoritative and cutting-edge, Cancer Driver Genes: Methods and Protocols aims to provide protocols that will be used and adapted by cancer researchers to expand the knowledge base of molecular mechanisms contributing to initiation, progression, and metastasis of cancer.
Colorectal cancer is the third most commonly diagnosed cancer and third leading cause of cancer death in both men and women.
Author: Gurkan Bebek
Category: Colon (Anatomy)
Colorectal cancer is the third most commonly diagnosed cancer and third leading cause of cancer death in both men and women. Recent studies have led to the discovery of cancer driver genes whose mutations are mostly observed at low frequencies relative to the total number of tumors analyzed, suggesting that mutations in distinct subsets of driver genes are sufficient for tumorigenesis. We hypothesize that comparing the networks by which the driver genes contribute to tumorigenesis can reveal more meaningful functional similarities. We develop an algorithm for within-species network similarity quantification. We use this tool to compare subnetworks corresponding to colorectal cancer driver genes and to group those genes. We validate our approach with cell line experiments and show that we can predict driver gene similarities based on functional similarities. The relationships between driver genes whose mutations lead to similar phenotypic outcomes are best understood through comparison of the pathways that are dysregulated.
The work presented here shows that replication-incompetent retroviral vector insertional mutagenesis screens are a powerful approach to identify cancer driver genes in diverse cancer types.
Author: Victor Malakwen Bii
Replication-incompetent gammaRV and lentiviral (LV) vectors were then directly compared to evaluate their ability to identify driver genes that mediate androgen-independent prostate cancer (PC) progression. gammaRV and LV vectors have distinct integration profiles and genotoxicities that make them potentially complementary vectors for identifying cancer driver genes. Transduced LNCaP PC cells were injected into the prostate gland of immunodeficient mice. The mice were castrated to model androgen deprivation therapy in PC patients. Metastatic tumors that developed under androgen-independent conditions were analyzed using a high-throughput modified genomic sequencing PCR (MGS-PCR) approach. TAOK3, MBNL2, SERBP1, SLC7A1, SLC25A24, MAN1A2, PLEKHA2, SPTAN1, and ABCC1 were identified as candidate PC genes. TAOK3 and ABCC1 were validated by showing that their expression increased the clonogenic potential of PC cells. TAOK3 and ABCC1 expression predicted disease recurrence in PC patients after androgen deprivation therapy. The work presented here shows that replication-incompetent retroviral vector insertional mutagenesis screens are a powerful approach to identify cancer driver genes in diverse cancer types.
This is clearly illustrated in our marker paper that displays insights into cancer through the synthesis of findings from TCGA PanCancer Atlas [Ding et al., 2018].
Author: Matthew Hawkins Bailey
Category: Electronic dissertations
The implementation of next-generation genomic sequencing has exploded over the past dozen years. Large consortia, such as The Cancer Genome Atlas (TCGA), the International Cancer Genome Consortium (ICGC), and the Pediatric Cancer Genome Projects (PCGP), have made great strides in democratizing big data for the scientific community. These data sets provide a rich resource to build tools for somatic variant discovery and exploratory analysis. Public repositories hold the answer to many novel biological and clinical revelations, e.g., the discovery of complex indels, splice-creating mutations, alternative super-enhancer binding sites, machine learning models to predict mutation impact, and cancer subtype classification and identification. At the end of 2014, seven additional cancer types and 11 different pediatric tumor cohorts were publicly available when compared to the Ding lab's first PanCancer effort [Kandoth et al., 2013]. Motivated by the possibility of novel cancer driver gene discovery, we launched a new PanCan2 effort. We assembled sequence data from 8,018 cancer cases representing a combined 30 pediatric and adult cancer types from 8 organ systems. Analysis of the resulting data corpus identified 270 cancer-associated genes, 107 of which had not been previously reported in Pan-Cancer studies. Pediatric-enriched mutant genes (e.g., IL7R, PAX5, and H3F3A) were found in tumors from the hematopoietic and central nervous systems, consistent with their roles in early development. Distinctive mutational architectures were identified for each of the 8 organ systems, reflecting the tissue of origin and likely exposure to similar environmental factors. TP53 mutant vs. TP53 wild-type tumors had largely distinct patterns of co-occurring mutations, suggesting a pivotal role of TP53 in shaping the mutational network.
Cis-activation of receptor tyrosine kinases at mutational, expression, and phosphorylation levels, as well as trans-activation of hormone-related transcription factors, were identified through the integration of multiple data types. In the end, this effort did not result in a publication because we did not perform uniform variant calling across all samples and relied primarily on publicly available data sets. Armed with the knowledge that reviewers would require a complete reboot of the TCGA variant calls before another PanCancer paper would be considered, Dr. Li Ding thoughtfully submitted a proposal to acquire the funding necessary for recalling all TCGA exome sequencing BAMs using many different callers. This effort is referred to as the Multi-center Mutation Calling in Multiple Cancers (MC3). The TCGA cancer genomics data set includes over 10,000 tumor-normal exome pairs across 33 different cancer types; in total, >400 TB of raw data files required reanalysis. A comprehensive encyclopedia of somatic mutation calls for the TCGA data was created to enable robust cross-tumor-type analyses. Our approach accounts for variance and batch effects introduced by the rapid advancement of DNA extraction, hybridization-capture, sequencing, and analysis methods over time. We present best practices for applying an ensemble of seven mutation-calling algorithms with scoring and artifact filtering. The data set created by this analysis includes 3.5 million somatic variants and forms the basis for the PanCancer Atlas papers. The results have been made available to the research community along with the methods used to generate them. This project is the result of collaboration among a number of institutes and demonstrates how team science drives large genomics projects. Having a complete overhaul of all somatic mutations available in the TCGA, we sought to use these data for a complete TCGA PanCancer analysis.
However, instead of relying wholly on in-house algorithms, we also performed a PanSoftware analysis spanning 26 computational tools from multiple institutions to catalog driver genes and mutations. In total, across 9,423 tumor exomes (comprising all 33 TCGA projects), we identified 299 driver genes with implications regarding their anatomical sites and cancer/cell types. Sequence- and structure-based analyses identified >3,400 putative missense driver mutations supported by multiple lines of evidence. Experimental validation confirmed 60%-85% of predicted mutations as likely drivers. We found that >300 MSI tumors are associated with high PD-1/PD-L1, and 57% of tumors analyzed harbor putative clinically actionable events. Our study represents the most comprehensive discovery of cancer genes and mutations to date and will serve as a blueprint for future biological and clinical endeavors. One of many new waves in the genomics era will be the cohesive integration of multi-omics data. At present, our understanding of molecular processes in oncogenesis is governed by known-knowns. This is clearly illustrated in our marker paper that displays insights into cancer through the synthesis of findings from TCGA PanCancer Atlas [Ding et al., 2018]. In closing the final chapters of TCGA, we addressed three facets of oncogenesis: (1) somatic driver mutations, germline pathogenic variants, and their interactions in the tumor; (2) the influence of the tumor genome and epigenome on transcriptome and proteome; and (3) the relationship between tumor and the micro-environment, including implications for drugs targeting driver events and immunotherapies. These results will anchor future characterization of rare and common tumor types, primary and relapsed tumors, and cancers across ancestry groups and will guide the deployment of clinical genomic sequencing.
In quick succession, both The Cancer Genome Atlas and the International Cancer Genome Consortium provided the cancer research community with consensus somatic mutation calls: captured exome sequencing calls created by the Multi-center Mutation Calling in Multiple Cancers effort (MC3) and whole genome sequencing calls provided by the PanCancer Analysis of Whole Genomes (PCAWG). A total of 746 samples were sequenced in both MC3 and PCAWG. We found that ~80% of possible mutations in covered exomic regions matched using the two technologies. Using a statistical model, we estimated that 15-30% of the unique mutations are attributable to noise caused by variant allele fraction and clonal heterogeneity. We also observed that ~30% of the mutations uniquely identified by PCAWG could be traced to calls made by a single caller in MC3 and are not reported in the publicly available MC3 data set. Due to the numerous modes of comparison, we built MAFit, an online tool to facilitate engagement with these data. Finally, we highlight the advantages of using whole genome technologies in regions of high and low GC content and perform significantly mutated gene analysis, increasing the targeted/captured exomic space by ~50% to discover additional genes that could only be found using a whole genome sequencing approach.
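The exome versus whole-genome comparison described above can be illustrated with a minimal Python sketch. It assumes each call is identified by sample, chromosome, position, reference allele, and alternate allele, and it measures agreement as the shared fraction of the union of both call sets. The `Mutation` type, the `concordance` function, and the toy calls are hypothetical, and the abstract does not state which denominator the authors used, so this is not the MC3/PCAWG procedure itself.

```python
from typing import Iterable, NamedTuple


class Mutation(NamedTuple):
    """Hypothetical key for one somatic call (not a real MAF schema)."""
    sample: str
    chrom: str
    pos: int
    ref: str
    alt: str


def concordance(exome_calls: Iterable[Mutation], genome_calls: Iterable[Mutation]):
    """Return (shared, exome_only, genome_only, matched_fraction).

    matched_fraction is shared / union, an assumed definition.
    """
    exome, genome = set(exome_calls), set(genome_calls)
    shared = exome & genome
    union = exome | genome
    fraction = len(shared) / len(union) if union else 0.0
    return shared, exome - genome, genome - exome, fraction


# Toy calls, invented purely for illustration.
exome_calls = [
    Mutation("S1", "chr1", 1000, "C", "T"),
    Mutation("S1", "chr2", 2000, "G", "A"),
    Mutation("S2", "chr3", 3000, "A", "T"),
]
genome_calls = [
    Mutation("S1", "chr1", 1000, "C", "T"),
    Mutation("S2", "chr3", 3000, "A", "T"),
    Mutation("S2", "chr4", 4000, "G", "C"),
]

shared, exome_only, genome_only, fraction = concordance(exome_calls, genome_calls)
print(len(shared), len(exome_only), len(genome_only), round(fraction, 2))
```

Restricting both inputs to covered exomic regions before calling `concordance` would mirror the "covered exomic regions" condition in the abstract; that filtering step is left out here.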
International cancer sequencing projects have generated comprehensive catalogs of alterations found in tumor genomes, as well as germline variant data for thousands of individuals.
Author: Hana Sušak
International cancer sequencing projects have generated comprehensive catalogs of alterations found in tumor genomes, as well as germline variant data for thousands of individuals. In this thesis, we describe two statistical methods exploiting these rich datasets in order to better understand tumor initiation, tumor progression and the contribution of genetic variants to the lifetime risk of developing cancer. The first method, a Bayesian inference model named cDriver, utilizes multiple signatures of positive selection acting on tumor genomes to predict cancer driver genes. Cancer cell fraction is introduced as a novel signature of positive selection on a cellular level, based on the hypothesis that cells obtaining additional advantageous driver mutations will undergo rapid proliferation and clonal expansion. We benchmarked cDriver against state-of-the-art driver prediction methods on three cancer datasets, demonstrating equal or better performance than the best competing tool. The second method, termed REWAS, is a comprehensive framework for rare-variant association studies (RVAS) aimed at improving identification of cancer predisposition genes. Nonetheless, REWAS is readily applicable to any case-control study of complex diseases. Besides integrating well-established RVAS methods, we developed a novel Bayesian inference RVAS method (BATI) based on Integrated Nested Laplace Approximation (INLA). We demonstrate that BATI outperforms other methods on realistic simulated datasets, especially when meaningful biological context (e.g. functional impact of variants) is available or when risk variants in sum explain low phenotypic variance. Both methods developed during my thesis have the potential to facilitate personalized medicine and oncology through identification of novel therapeutic targets and identification of genetic predisposition, facilitating prevention and early diagnosis of cancer.
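As a rough illustration of the cancer cell fraction signature mentioned above, the following Python sketch uses a widely used textbook approximation that rescales the variant allele fraction by tumor purity and local copy number. The function name, its parameters, and the default of one mutated copy per cell are assumptions made for illustration; the abstract does not say how cDriver itself estimates this quantity.

```python
def cancer_cell_fraction(vaf, purity, tumor_copy_number=2,
                         multiplicity=1, normal_copy_number=2):
    """Approximate the fraction of cancer cells carrying a mutation.

    Assumed formula (not taken from the abstract):
        CCF = VAF * (purity * CN_tumor + (1 - purity) * CN_normal)
              / (purity * multiplicity)
    The result is capped at 1.0 because sampling noise can push it higher.
    """
    if not 0 < purity <= 1:
        raise ValueError("purity must be in (0, 1]")
    if not 0 <= vaf <= 1:
        raise ValueError("vaf must be in [0, 1]")
    if multiplicity < 1:
        raise ValueError("multiplicity must be at least 1")
    # Average number of copies of the locus per cell in the mixed sample.
    average_copies = purity * tumor_copy_number + (1 - purity) * normal_copy_number
    ccf = vaf * average_copies / (purity * multiplicity)
    return min(ccf, 1.0)


# Toy values: a heterozygous mutation in a diploid region.
clonal = cancer_cell_fraction(vaf=0.25, purity=0.5)
subclonal = cancer_cell_fraction(vaf=0.2, purity=0.8)
print(f"clonal example: {clonal:.2f}, subclonal example: {subclonal:.2f}")
```

Under the hypothesis stated in the abstract, mutations with a high cancer cell fraction are the ones consistent with clonal expansion after a driver event.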
Background: Somatic mutations accumulate in human cells throughout life.
Author: Yahya Bokhari
Category: Cancer cells
Background: Somatic mutations accumulate in human cells throughout life. Some may have no adverse consequences, but some of them may lead to cancer. A cancer genome is typically unstable, and thus more mutations can accumulate in the DNA of cancer cells. An ongoing problem is to determine which mutations are drivers, which play a role in oncogenesis, and which are passengers, which do not. One way of addressing this question is through inspection of somatic mutations in DNA of cancer samples from a cohort of patients and detection of patterns that differentiate driver from passenger mutations. Results: We propose QuaDMutEx and QuaDMutNetEx. QuaDMutEx incorporates three novel elements: a new gene set penalty that includes non-linear penalization of multiple mutations in putative sets of driver genes, an ability to adjust the method to handle slow- and fast-evolving tumors, and a computationally efficient method for finding gene sets that minimize the penalty, through a combination of heuristic Monte Carlo optimization and exact binary quadratic programming. QuaDMutNetEx combines protein-protein interaction networks with the elements of QuaDMutEx: it adds a new quadratic reward term that prefers solution gene sets that are connected with respect to protein-protein interaction networks. Compared to existing methods, the proposed algorithms find sets of putative driver genes that show higher coverage and lower excess coverage in eight sets of cancer samples coming from brain, ovarian, lung, and breast tumors.
Conclusions: The superior ability to improve on both coverage and excess coverage across different types of cancer shows that QuaDMutEx and QuaDMutNetEx are tools that should be part of a state-of-the-art toolbox in the driver gene discovery pipeline. They can detect genes harboring rare driver mutations that may be missed by existing methods.
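The two evaluation measures named above, coverage and excess coverage, can be made concrete with a small Python sketch. It assumes the definitions commonly used in mutual-exclusivity methods: coverage counts patients with at least one mutated gene in the candidate set, and excess coverage counts every additional mutated gene beyond the first in a covered patient. The abstract does not define either term, so these definitions, the `coverage_stats` function, and the toy cohort are assumptions; the sketch also does not implement the QuaDMutEx penalty or its optimization.

```python
def coverage_stats(mutations, gene_set):
    """Return (coverage, excess_coverage) for a candidate driver gene set.

    mutations: dict mapping patient ID -> set of mutated genes.
    coverage: patients with at least one mutated gene in gene_set.
    excess_coverage: extra hits beyond the first, summed over covered patients.
    """
    gene_set = set(gene_set)
    coverage = 0
    excess_coverage = 0
    for mutated_genes in mutations.values():
        hits = len(set(mutated_genes) & gene_set)
        if hits > 0:
            coverage += 1
            excess_coverage += hits - 1
    return coverage, excess_coverage


# Toy cohort with placeholder gene names, invented for illustration.
cohort = {
    "P1": {"geneA"},
    "P2": {"geneB"},
    "P3": {"geneA", "geneB"},
    "P4": {"geneC"},
}

coverage, excess = coverage_stats(cohort, {"geneA", "geneB"})
print(f"coverage={coverage}, excess coverage={excess}")
```

A gene set with high coverage and low excess coverage mutates in most patients but rarely more than once per patient, which is the pattern the abstract reports as favorable.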
FMCR were shown to target cancer related genes and novel CRC driver genes were proposed. Finally, preliminary studies supported the tumour suppressor role of NFKBIA, one of the novel candidate driver genes affected by a deletion FMCR.
Author: George Burghel
Colorectal cancer (CRC) is the 3rd most common cancer and the 4th highest cause of cancer deaths in the world. Genetic factors play a major role in its predisposition, initiation and development. Inherited variants in the CASP8 gene, a key regulator of apoptosis, have a potential yet controversial association with CRC risk. Sporadic CRCs develop through different molecular pathways of genomic instability and mutations in key cancer driver genes. Classification of sporadic CRC into these molecular pathways has potential implications for diagnosis and treatment and is an integral part of CRC studies; however, current published research suffers from a lack of standardisation. Chromosomal instability (CIN) drives CRC by affecting cancer driver genes, many of which are still to be identified. This project aimed to: (a) further investigate the role of CASP8 inherited variants in CRC risk, (b) molecularly classify sporadic CRC tumour DNA samples using standard techniques and definitions, and (c) identify novel CRC driver genes affected by CIN. A CASP8 promoter in/del variant was genotyped in 1193 CRC cases and 1388 matching controls. The coding region of the CASP8 gene was sequenced in 94 CRC cases to identify potential novel variants, and a copy number variant was also investigated. A cohort of 53 paired CRC tumour and normal DNA samples was molecularly classified using standard techniques and definitions. Common aberration analysis was performed on high-resolution array comparative genomic hybridisation data from 45 chromosomally unstable CRC cases to identify focal minimal common regions (FMCR). CASP8 inherited variants did not significantly affect CRC risk in the investigated cohort. CRC molecular classification confirmed the heterogeneity of sporadic CRC, and a novel molecular subtype was proposed. FMCR were shown to target cancer-related genes, and novel CRC driver genes were proposed.
Finally, preliminary studies supported the tumour suppressor role of NFKBIA, one of the novel candidate driver genes affected by a deletion FMCR.