The identification of cis-regulatory elements: a review from a machine learning perspective

From National Research Council Canada

DOI	https://doi.org/10.1016/j.biosystems.2015.10.002
Author	Li, Yifeng; Chen, Chih-Yu; Kaye, Alice M.; Wasserman, Wyeth W.
Affiliation	National Research Council of Canada. Information and Communication Technologies
Format	Text, Article
Subject	cis-regulatory elements; gene regulation; enhancers; promoters; machine learning; deep learning; ensemble learning; data integration
Abstract	The majority of the human genome consists of non-coding regions that have been called junk DNA. However, recent studies have revealed that these regions contain cis-regulatory elements, such as promoters, enhancers, silencers, and insulators. These regulatory elements can play crucial roles in controlling gene expression in specific cell types, conditions, and developmental stages. Disruption of these regions could contribute to phenotype changes. Precisely identifying regulatory elements is key to deciphering the mechanisms underlying transcriptional regulation. Cis-regulatory events are complex processes that involve chromatin accessibility, transcription factor binding, DNA methylation, histone modifications, and the interactions between them. The development of next-generation sequencing techniques has allowed us to capture these genomic features in depth. Applied analysis of genome sequences for clinical genetics has increased the urgency of detecting these regions. However, the complexity of cis-regulatory events and the deluge of sequencing data require accurate and efficient computational approaches, in particular machine learning techniques. In this review, we describe machine learning approaches for predicting transcription factor binding sites, enhancers, and promoters, primarily driven by next-generation sequencing data. Data sources are provided to facilitate the testing of novel methods. The purpose of this review is to attract computational experts and data scientists to advance this field.
Publication date	2015-10-21
Publisher	Elsevier
In	Biosystems 138: 6–17.
Language	English
Peer reviewed	Yes
NPARC number	23001686
Record identifier	ea1c02bf-72b9-41fe-9899-9bb12bc3d681
Record created	2017-03-17
Record modified	2020-04-22

Date modified: 2024-07-27