National Research Council of Canada. Information and Communication Technologies
DNA sequence; GC-content; hybridization; melting temperature; oligonucleotides; Pearson correlation
In many bioinformatics applications, DNA duplex hybridization is traditionally estimated using GC-content and melting temperature calculations based on the base composition of the sequence. Here we show that GC-content is far from a perfect predictor of DNA strand hybridization strength when compared with experimentally determined melting temperatures. We built a manually curated set of 373 experimental data points collected from 21 publications. Each point represents a DNA strand between 4 and 35 nucleotides long, together with its experimentally determined melting temperature measured under specific sequence and salt concentrations. For each data point we calculated the corresponding GC-content, and we separated the set into 12 subsets to minimize the variability of experimental conditions. Based on the calculated Pearson product-moment correlation coefficients, we conclude that GC-content only seldom correlates well with experimentally determined melting temperatures, and thus it is not a strictly necessary constraint for controlling the uniformity of DNA strands.
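The two quantities compared in this analysis, GC-content and the Pearson product-moment correlation coefficient, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used in the study: the function names `gc_content` and `pearson_r` are hypothetical, and the sequences and melting temperatures in the demo are placeholder values invented for the example, not data points from the curated set.

```python
from math import sqrt


def gc_content(seq):
    """Return the fraction of G and C bases in a DNA sequence (0.0 to 1.0)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return sum(base in "GC" for base in seq) / len(seq)


def pearson_r(xs, ys):
    """Return the Pearson product-moment correlation coefficient of xs and ys."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long samples with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of co-deviations and squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / sqrt(sxx * syy)


if __name__ == "__main__":
    # Placeholder subset: (sequence, melting temperature in degrees Celsius).
    # These values are invented for illustration only.
    subset = [
        ("ATGCATGC", 30.0),
        ("GGCCGGCC", 48.0),
        ("ATATATGC", 22.0),
        ("GCGCATAT", 33.0),
    ]
    gc_values = [gc_content(seq) for seq, _ in subset]
    tm_values = [tm for _, tm in subset]
    print("Pearson r between GC-content and Tm:", pearson_r(gc_values, tm_values))
```

Computing the coefficient separately for each subset, rather than over all points at once, mirrors the idea of grouping measurements taken under similar experimental conditions before comparing GC-content with melting temperature.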