Abstract | In RNA-seq data processing, short reads from a single species are usually aligned against that species' own genome sequence; in plant-pathogen interaction systems, however, reads from the host and the pathogen are mixed together in the same sample. Unlike single-genome analyses, the alignment process therefore involves both the host and the pathogen reference genomes. In such circumstances, the order in which alignment is carried out (host first, pathogen first, or both genomes simultaneously) influences the read counts of certain genes. This is particularly problematic at advanced infection stages. An appropriate strategy for assigning reads to their respective genomes is crucial, yet the existing sequential and parallel alignment strategies both run into difficulties when mapping mixed reads to their corresponding reference genomes. The challenge lies in determining which reads belong to which species, especially when homology exists between the host and pathogen genomes. This chapter proposes a combo-genome alignment strategy and compares it with the existing alignment scenarios. Simulation results showed that the degree of discrepancy in the results correlates with the phylogenetic distance between the two species in the mixture, which is attributable to the extent of homology between the two genomes involved. This correlation was also found in the analysis of two real RNA-seq datasets from Fusarium-challenged wheat plants. Comparisons of the three RNA-seq processing strategies on three simulated datasets and two real Fusarium-infected wheat datasets showed that alignment to a combo-genome, consisting of both the host and pathogen genomes, improves mapping quality compared with sequential alignment procedures. |
---|
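
The combo-genome idea described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the tooling or naming conventions used in the chapter, so everything here is an assumption for illustration: the function names `combine_fasta` and `count_reads_by_species`, the `species|name` prefix scheme, and the toy sequences are all hypothetical. The sketch only shows how tagging each reference sequence with its species of origin lets reads aligned to a single combined reference be attributed to host or pathogen afterwards.

```python
from collections import Counter


def combine_fasta(records_by_species):
    """Merge FASTA records from several species into one reference text.

    records_by_species maps a species label to a list of (name, sequence)
    tuples. Each sequence name is prefixed with its species label
    (hypothetical "species|name" scheme) so that the origin of an
    alignment can be recovered later from the reference name alone.
    """
    lines = []
    for species, records in records_by_species.items():
        for name, seq in records:
            lines.append(f">{species}|{name}")
            lines.append(seq)
    return "\n".join(lines) + "\n"


def count_reads_by_species(alignments):
    """Count aligned reads per species.

    alignments is an iterable of (read_id, reference_name) pairs, where
    reference_name carries the species prefix added by combine_fasta.
    """
    counts = Counter()
    for _read_id, ref_name in alignments:
        species = ref_name.split("|", 1)[0]
        counts[species] += 1
    return counts


# Toy example with made-up sequences and alignments.
combo = combine_fasta({
    "host": [("chr1", "ACGT")],
    "pathogen": [("contig1", "GGCC")],
})
counts = count_reads_by_species([
    ("r1", "host|chr1"),
    ("r2", "pathogen|contig1"),
    ("r3", "host|chr1"),
])
```

In a real pipeline the combined FASTA would be indexed by an aligner and the `(read_id, reference_name)` pairs would come from its output; those steps are deliberately left out because the abstract does not name the tools used.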