Mixture modelling to characterize diversity in DNA regions
Abstract
The expression levels of the thousands of genes in a eukaryotic cell influence its phenotype and functioning. The binding of proteins such as transcription factors (TFs) to particular regulatory regions on the DNA plays an important role in regulating gene expression. Mutations in these regulatory regions can alter gene expression and often lead to misregulation, resulting in disorders and diseases. To profile a wide range of regulation-related biochemical activities, a variety of high-throughput experimental assays have therefore been designed. Each assay gives a genome-wide map of the regions that share the characteristic being profiled. Examples include STARR-seq, which identifies active enhancers; ATAC-seq, which detects accessible chromatin; and ChIP-seq, which identifies TF binding sites. These assays report regions that are 200 to 1000 bases long, although the functional elements within these regions are only ≈15 bases long. Current computational algorithms look for a characteristic shared across these reported regions in order to identify these short sequence signatures. Evidence, however, suggests that the regions reported by these experiments are considerably heterogeneous. While such methods can pick up the stronger signals, they can easily miss the weaker or less frequent ones.

To characterize this heterogeneity explicitly, we framed the question as a mixture modelling problem. Our first method, DIVERSITY, was developed to cluster regions from ChIP-seq experiments into groups while simultaneously learning, de novo, the sequence signatures specific to each group. DIVERSITY provides novel insights into the different ways in which a protein can bind DNA, including cooperative binding with other proteins. We next looked at regions identified by exonuclease-based ChIP experiments.
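To make the idea of clustering regions while learning per-group sequence signatures more concrete, here is a minimal sketch of a generic expectation-maximization (EM) fit for a mixture of motif models. It is not DIVERSITY's actual model. It assumes a fixed motif width, a uniform background, uppercase A/C/G/T sequences at least `width` long, and exactly one motif occurrence per sequence, similar in spirit to the one-occurrence-per-sequence model used by motif finders such as MEME. The function names `one_hot` and `fit_motif_mixture` are hypothetical. Each of the `k` components is a position weight matrix (PWM), and the responsibilities give each region's soft assignment to a group.

```python
import numpy as np

ALPHABET = "ACGT"


def one_hot(seq):
    """Encode an uppercase A/C/G/T string as an (L, 4) one-hot array."""
    idx = np.array([ALPHABET.index(c) for c in seq])
    return np.eye(4)[idx]


def fit_motif_mixture(seqs, k, width, n_iter=50, seed=0, pseudocount=0.1):
    """Hypothetical EM sketch for a mixture of k motif models.

    Assumes each sequence contains exactly one motif occurrence of length
    `width`, drawn from one of k PWMs; every other base comes from a
    uniform background. Returns mixing weights (k,), PWMs (k, width, 4)
    and per-sequence group responsibilities (n, k).
    """
    rng = np.random.default_rng(seed)
    X = [one_hot(s) for s in seqs]
    assert all(x.shape[0] >= width for x in X), "sequences must be >= width"
    n = len(X)
    pi = np.full(k, 1.0 / k)                            # mixing weights
    pwm = rng.dirichlet(np.ones(4), size=(k, width))    # random initial PWMs
    bg = np.full(4, 0.25)                               # uniform background
    resp = np.zeros((n, k))

    for _ in range(n_iter):
        counts = np.full((k, width, 4), pseudocount)    # smoothed expected counts
        for i, x in enumerate(X):
            n_pos = x.shape[0] - width + 1
            # All candidate motif windows: (n_pos, width, 4).
            win = np.stack([x[p:p + width] for p in range(n_pos)])
            # Log-likelihood ratio of motif vs background per (component, position).
            # The background likelihood of the whole sequence is the same for
            # every (k, p), so it cancels in the posterior and is omitted.
            log_ratio = np.einsum("pwa,kwa->kp", win, np.log(pwm) - np.log(bg))
            log_joint = np.log(pi)[:, None] - np.log(n_pos) + log_ratio
            post = np.exp(log_joint - log_joint.max())
            post /= post.sum()                          # E-step: P(k, p | sequence)
            resp[i] = post.sum(axis=1)                  # marginal over positions
            counts += np.einsum("kp,pwa->kwa", post, win)
        # M-step: update mixing weights (floored to avoid log(0)) and PWMs.
        pi = np.clip(resp.mean(axis=0), 1e-12, None)
        pi /= pi.sum()
        pwm = counts / counts.sum(axis=2, keepdims=True)
    return pi, pwm, resp
```

For example, `fit_motif_mixture(["ACGTA"] * 5, k=1, width=5)` returns a single PWM whose most probable base in each column spells `ACGTA`. With `k > 1`, taking `resp.argmax(axis=1)` gives a hard clustering of the regions. Because EM only finds a local optimum, results depend on `seed`, and a real method would also need to handle motif width, reverse complements, and choosing `k`.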