Data Analysis

General Workflow

Reads from a spike-in RNA-Seq experiment are processed together with all other reads of the same NGS library. Processing comprises steps such as quality control, de-multiplexing, and, depending on the library preparation protocol, trimming. The reads are then mapped to a combination of the genomic reference, if available, and the SIRVome (the artificial “genome” containing the spike-in sequences and annotations). Only a small percentage of the reads is expected to map to the SIRVome, and only this share needs to be analyzed to derive representative information about the majority of reads, which map to the endogenous sample RNA.
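The share of reads assigned to the SIRVome can be checked directly after mapping. The following is a minimal sketch, assuming that per-reference read counts are available in the four-column, tab-separated layout written by `samtools idxstats` (reference name, length, mapped, unmapped) and that all spike-in reference names begin with "SIRV". The function name, the prefix, and the example numbers are illustrative assumptions, not part of any SIRV software.

```python
def spike_in_share(idxstats_text, spike_prefix="SIRV"):
    """Return the fraction of mapped reads assigned to spike-in references.

    idxstats_text: tab-separated lines of name, length, mapped, unmapped.
    spike_prefix:  assumed naming convention for spike-in references.
    """
    spike_reads = 0
    total_reads = 0
    for line in idxstats_text.strip().splitlines():
        name, _length, mapped, _unmapped = line.split("\t")
        if name == "*":
            # The "*" row holds reads without a reference; skip it.
            continue
        mapped = int(mapped)
        total_reads += mapped
        if name.startswith(spike_prefix):
            spike_reads += mapped
    if total_reads == 0:
        raise ValueError("no mapped reads found")
    return spike_reads / total_reads


# Made-up counts for illustration only.
example = (
    "chr1\t248956422\t970000\t0\n"
    "SIRV1\t12643\t20000\t0\n"
    "SIRV2\t5911\t10000\t0\n"
    "*\t0\t0\t5000\n"
)
share = spike_in_share(example)
print(f"Spike-in share of mapped reads: {share:.1%}")
```

A share far above or below the amount expected from the spike-in ratio used during sample preparation is worth investigating before any downstream evaluation.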

SIRV reads can be analyzed with standard and custom bioinformatic tools that compare the measured results with the expected read distribution at different levels, from raw read mapping up to transcript identification and quantification.
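At the quantification level, one simple way to compare measured with expected values is a per-transcript log2 ratio of relative abundances. The sketch below assumes two dictionaries keyed by transcript ID, one with measured abundances (in any consistent unit) and one with the expected input amounts. The function name, transcript IDs, and values are hypothetical and serve only to illustrate the comparison.

```python
import math


def log2_deviation(measured, expected):
    """Return log2(measured share / expected share) per expected transcript.

    Both inputs are first normalized to relative abundances, so the result
    is independent of sequencing depth. Transcripts that were not detected
    (or have no positive expected amount) are reported as None.
    """
    expected_total = sum(expected.values())
    measured_total = sum(measured.get(tid, 0.0) for tid in expected)
    if expected_total <= 0 or measured_total <= 0:
        raise ValueError("totals must be positive")

    deviations = {}
    for tid, exp_amount in expected.items():
        meas_amount = measured.get(tid, 0.0)
        if meas_amount <= 0 or exp_amount <= 0:
            deviations[tid] = None
            continue
        measured_share = meas_amount / measured_total
        expected_share = exp_amount / expected_total
        deviations[tid] = math.log2(measured_share / expected_share)
    return deviations


# Hypothetical transcript IDs and values, for illustration only.
expected = {"T1": 1.0, "T2": 1.0, "T3": 2.0}
measured = {"T1": 100.0, "T2": 50.0, "T3": 250.0}
deviations = log2_deviation(measured, expected)
for tid, value in deviations.items():
    print(tid, value)
```

A value of 0 means the transcript was recovered at its expected share; positive and negative values indicate over- and under-representation, respectively.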

For the evaluation of ERCC reads in RNA-Seq experiments, NIST provides a software package called the “ERCC dashboard” (Munro et al., 2014); further evaluations are described in publications of the SEQC/MAQC-III Consortium (2014).

Data derived from the long SIRVs can be evaluated with standard bioinformatic tools to analyze, for example, input-output correlation, full-length coverage, and 5’ / 3’ mapping ratios.
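As an illustration of the input-output correlation mentioned above, the following sketch computes a Pearson correlation on log10-transformed values. It assumes two equally ordered lists, one with the known input amounts and one with the measured abundances; the function name and the example numbers are hypothetical, and other correlation measures may be equally appropriate.

```python
import math


def log_log_pearson(input_amounts, measured_values):
    """Pearson correlation of log10(input) versus log10(measured).

    Pairs with a non-positive value are dropped, because the logarithm
    is undefined for them.
    """
    pairs = [
        (math.log10(x), math.log10(y))
        for x, y in zip(input_amounts, measured_values)
        if x > 0 and y > 0
    ]
    if len(pairs) < 2:
        raise ValueError("need at least two positive pairs")

    xs, ys = zip(*pairs)
    n = len(pairs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant values")
    return cov / math.sqrt(var_x * var_y)


# Made-up values: measured abundance is exactly proportional to the input.
input_amounts = [1.0, 10.0, 100.0, 1000.0]
measured_values = [2.0, 20.0, 200.0, 2000.0]
r = log_log_pearson(input_amounts, measured_values)
print(f"log-log Pearson r = {r:.3f}")
```

Working on the log scale keeps the few most abundant spike-ins from dominating the result when the input amounts span several orders of magnitude.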