SIRV Sets

SIRV Set Selection Guide

The Spike-In RNA Variants of the isoform and ERCC modules (see Modular Design) are realized as defined mixes, and for some modules, different mixes are available.

Mixes, and combinations thereof, are available in the form of SIRV sets. Table 1 presents a SIRV set selection guide to help you find the right set of modules and mixes for your application.

Table 1 ǀ SIRV set selection guide. SIRV-Set 1 (Cat. No 025.03) contains the isoform mixes E0, E1 and E2 of the isoform module, SIRV-Set 2 (Cat. No 050.01 and 050.03) provides the isoform Mix E0 only, whereas SIRV-Set 3 (Cat. No 051.01 and 051.03) has the SIRV Isoform Mix E0 in a mixture with the ERCCs. yes: applicable; no: not applicable; partially: partly applicable (or parts of the sets applicable).

             |                                      | SIRV-Set 1               | SIRV-Set 2     | SIRV-Set 3
             | Cat. No                              | 025.03                   | 050.0N         | 051.0N
Module(s)    | Isoforms                             | Isoform Mixes E0, E1, E2 | Isoform Mix E0 | Isoform Mix E0
             | ERCC                                 | no                       | no             | ERCC Mix 1
Property     | Isoform detection and quantification | yes                      | yes            | yes
             | Dynamic range                        | partially                | no             | yes
Applications | Pipeline validation                  | yes                      | partially      | partially
             | Sample control                       | no                       | yes            | yes

For more information please consult the respective User Guides in the Downloads section.

SIRV-Set 1: Isoform Mixes E0, E1 and E2

The isoform module is available in SIRV-Set 1 (Cat. No. 025) in 3 SIRV mixes, termed E0, E1 and E2, with each mix containing all 69 SIRV isoform transcripts (from 7 SIRV genes) but in different concentration ratios. E0 is ideal for assessing the detection capabilities of a given RNA-Seq workflow: all 69 transcripts are present in equimolar concentrations, so their detection should be unbiased and not a function of read depth or similar factors. E1 contains a moderate concentration distribution of the transcript variants of a given gene, and E2 represents the natural situation, in which a dominant, abundant transcript variant is transcribed from a gene together with up to 17 other variants present at lower expression levels (down to <1%). The latter situation is already quite challenging for correct transcript determination based on short-read assembly, but it also efficiently tests the linearity and sensitivity of long-read sequencing platforms and protocols that cannot rely on millions of reads (Figure 1).


Figure 1 ǀ Distribution of the 4 SubMixes in the 3 isoform mixes and the resulting intra- and inter-mix ratios. Left, the intra-mix concentration ratios provide three different concentration settings to evaluate accuracy in relative concentration measurements. Right, the present fold-changes allow for 3 possible inter-mix comparisons to evaluate differential gene expression measurements. SubMixes 1-4 are indicated by their respective colors, and transcript isoforms of each of the 7 SIRV genes are distributed across all SubMixes.

Notably, comparing the SIRV transcript quantifications between different mixes (E0 vs E1, E0 vs E2, or E1 vs E2) allows differential expression to be evaluated at the level of transcript isoforms and variants. This shows whether biases in transcript detection are similar for both mixes tested. If they are, the biases significantly affect transcript quantification within a given mix but have no or only a limited effect on differential transcript/gene expression quantification. Combining the individual SIRV transcript expression quantifications yields a value for SIRV gene expression, and the accuracy of this value might differ significantly from the deviations seen on the transcript level.
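The cancellation of shared biases in an inter-mix comparison can be sketched in a few lines of Python. This is only an illustration: the transcript IDs, the expected concentrations and the observed abundances below are hypothetical placeholders, not the actual SIRV mix compositions, which are listed in the respective User Guides.

```python
import math

def log2_fold_change_errors(expected_a, expected_b, observed_a, observed_b):
    """Per transcript: observed minus expected log2 fold change (mix B vs mix A).

    All inputs are dicts mapping transcript ID -> abundance (> 0).
    Transcripts with zero abundance would need separate handling.
    """
    errors = {}
    for tid in expected_a:
        expected_lfc = math.log2(expected_b[tid] / expected_a[tid])
        observed_lfc = math.log2(observed_b[tid] / observed_a[tid])
        errors[tid] = observed_lfc - expected_lfc
    return errors

# Hypothetical example values (NOT the real SIRV concentrations).
expected_mix_a = {"tx_a": 1.0, "tx_b": 1.0}
expected_mix_b = {"tx_a": 2.0, "tx_b": 0.5}

# tx_a is under-detected by half and tx_b over-detected 1.5-fold in BOTH mixes.
observed_mix_a = {"tx_a": 0.5, "tx_b": 1.5}
observed_mix_b = {"tx_a": 1.0, "tx_b": 0.75}

errors = log2_fold_change_errors(
    expected_mix_a, expected_mix_b, observed_mix_a, observed_mix_b
)
for tid, err in errors.items():
    print(f"{tid}: log2 fold-change error = {err:+.3f}")
```

In this constructed case the within-mix quantification is clearly biased, yet the fold-change error is zero for both transcripts because the same bias applies to both mixes.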

SIRV-Set 2: Isoform Mix E0

The isoform Mix E0 is available on its own as SIRV-Set 2 (Cat. No 050) with all 69 isoforms being present at equimolar concentrations. Its applications include RNA-Seq experiments that need to be validated for the detection of a complex mixture of isoforms without applying a high read depth to cover transcripts at different concentrations. Among these are sequencing runs on long-read NGS platforms as provided by Oxford Nanopore Technologies and Pacific Biosciences. These can produce full-length reads to identify the isoforms faithfully. However, unlike short-read platforms, they do not provide the millions of reads necessary to detect and quantify isoforms across a larger dynamic range, in particular if the spike-ins constitute only a very small fraction of the total RNA.

Deviations from the expected equimolar outcome can be quantified, which allows for evaluation of the performance of isoform-centered workflows. On the gene level, quantifications of the individual SIRV isoforms can be summed up for each SIRV gene, which permits the validation of pipelines that work with data stemming from individual isoforms but focus on gene expression calculations only.
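A minimal sketch of both evaluations, assuming the quantification results are available as a mapping from transcript ID to abundance. The function names, the transcript and gene IDs, and the choice of a log2 ratio to the mean as the deviation measure are assumptions made for illustration, not a prescribed method.

```python
import math
from collections import defaultdict

def equimolar_deviation(abundances):
    """log2 ratio of each transcript's abundance to the mean over all transcripts.

    For an equimolar mix the ideal value is 0 for every transcript.
    Abundances must be > 0; undetected transcripts need separate handling.
    """
    mean = sum(abundances.values()) / len(abundances)
    return {tid: math.log2(value / mean) for tid, value in abundances.items()}

def gene_level_sums(abundances, transcript_to_gene):
    """Sum the transcript abundances belonging to each gene."""
    totals = defaultdict(float)
    for tid, value in abundances.items():
        totals[transcript_to_gene[tid]] += value
    return dict(totals)

# Hypothetical measured abundances (arbitrary units) and gene assignments.
measured = {"g1_t1": 2.0, "g1_t2": 1.0, "g2_t1": 0.5, "g2_t2": 0.5}
gene_of = {"g1_t1": "g1", "g1_t2": "g1", "g2_t1": "g2", "g2_t2": "g2"}

deviations = equimolar_deviation(measured)
gene_totals = gene_level_sums(measured, gene_of)
print(deviations)
print(gene_totals)
```

With equimolar input and the same number of isoforms per gene, the gene totals in this toy example should also be equal, so the difference between g1 and g2 reflects quantification bias carried over to the gene level.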

SIRV-Set 2 is very suitable for the calculation of concordance, since the experiments’ fingerprints depend solely on how they deal with the complexity of the SIRV isoforms and not on input concentration differences between these isoform transcripts.

Reads from SIRV-Set 2 can be downsampled to emulate data representing lower concentration ranges. Repeating the mapping and assignment on the downsampled reads then measures the ability of the RNA-Seq experiment to detect variants and quantify their concentrations in a different bandwidth. With such an iterative approach, SIRV-Set 2 can be used to map the entire abundance spectrum.
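The effect of downsampling can be approximated with a short Python sketch. Note the simplification: instead of subsampling the reads and repeating mapping and assignment, this example thins already assigned per-transcript read counts, keeping each read independently with a fixed probability. The function name, transcript IDs and counts are hypothetical.

```python
import random

def downsample_counts(counts, fraction, seed=0):
    """Keep each read with probability `fraction` (binomial thinning).

    counts:   dict mapping transcript ID -> integer read count
    fraction: retained share of reads, between 0 and 1
    seed:     fixed seed so that a run is reproducible
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    rng = random.Random(seed)
    thinned = {}
    for tid, n in counts.items():
        thinned[tid] = sum(1 for _ in range(n) if rng.random() < fraction)
    return thinned

# Hypothetical assigned read counts for three equimolar transcripts.
full_depth = {"tx_a": 1200, "tx_b": 950, "tx_c": 1100}

# Emulate successively lower concentration ranges.
for fraction in (1.0, 0.1, 0.01, 0.001):
    thinned = downsample_counts(full_depth, fraction, seed=42)
    detected = sum(1 for n in thinned.values() if n > 0)
    print(f"fraction {fraction}: counts {thinned}, detected {detected} of {len(thinned)}")
```

For a real evaluation, the reads themselves would be subsampled and passed through the complete pipeline again, because assignment ambiguities between isoforms can change at lower coverage.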

SIRV-Set 3: Isoform Mix E0 & ERCCs

SIRV-Set 3 (Cat. No. 051) contains the isoform Mix E0 and the ERCC Mix 1 in equal shares. Both contribute equally to the final mass.

The mixture of 69 SIRV isoform transcripts and 92 non-overlapping ERCC RNAs addresses the need for complex spike-in RNA controls that cover both a high level of isoform complexity and a large concentration range. Together, they enable an even more comprehensive quality assessment and monitoring across the whole RNA-Seq workflow, yielding technical details and telling fingerprints for comparing individual samples and experiments.

The single-isoform ERCC transcripts cover concentrations of 6 orders of magnitude and are complemented by the equimolar SIRV isoforms. Figure 2 illustrates this added dimension by showing the covered complexity plotted versus the input concentration.


Figure 2 ǀ Concentrations and complexity of SIRV isoforms and single-isoform ERCC transcripts in SIRV-Set 3. Top: the isoform module with 69 transcripts in 7 genes contains all species at the same molarity (green bar). It covers the medium to high range of naturally occurring isoform complexity. The single-isoform module with 92 ERCC transcripts covers concentrations of 6 orders of magnitude (grey bar), which is sufficient to represent the entire dynamic range of naturally occurring transcripts. (a) The amount in attomoles refers to the typical amount that is spiked into 100 ng total RNA with the aim of attracting approx. 1% of the mRNA-Seq reads, subject to mRNA content and pipeline parameters, for which, of course, the modules control.

The accuracy (systematic error) and precision (random error) in quantifying single-isoform transcripts in RNA-Seq experiments are predominantly concentration- and read-depth dependent. While reads usually map uniquely to ERCC transcripts, the precision remains coverage dependent, with read counts typically following a Poisson distribution.
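The coverage dependence of the precision follows directly from the Poisson assumption: for a count with mean n, the standard deviation is the square root of n, so the relative error (coefficient of variation) is 1/sqrt(n). A brief sketch, assuming an idealized Poisson model without additional technical or biological variance; the function name and the read numbers are illustrative only.

```python
import math

def poisson_cv(expected_reads):
    """Coefficient of variation of a Poisson-distributed read count.

    For a Poisson variable the variance equals the mean, so
    CV = sqrt(mean) / mean = 1 / sqrt(mean).
    """
    if expected_reads <= 0:
        raise ValueError("expected_reads must be positive")
    return 1.0 / math.sqrt(expected_reads)

# Illustrative expected read counts spanning several orders of magnitude.
for reads in (1, 10, 100, 10_000, 1_000_000):
    print(f"{reads:>9} expected reads -> relative error {poisson_cv(reads):.2%}")
```

Under this model, each 100-fold drop in input concentration (and hence in expected reads) increases the relative random error 10-fold, which is why the low-concentration ERCC transcripts are quantified less precisely.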

Isoform detection and quantification require sufficient coverage, and therefore the isoform spike-ins are all added at the same concentration, in the upper range of the ERCCs. In this way, the task of identifying a given isoform is not confounded by differing concentrations. The overall higher input concentration allows sufficient reads to be obtained in RNA-Seq experiments for isoform identification. Still, quantification of SIRV isoforms remains challenging on both short-read systems (due to assignment issues) and long-read platforms (e.g. because of per-base error, low read numbers, and amplification bias). This is indicated in Figure 3 by a larger error margin for the isoforms than, for example, for ERCCs at an even lower concentration.


Figure 3 ǀ Read coverages of SIRV isoforms and single-isoform ERCC transcripts depend on input concentration, library preparation efficiency, and read depth. The error in quantifying the 92 single-isoform transcripts is mostly coverage dependent and hence correlates inversely with the input concentrations. With 6-18 isoforms mapping to each of the 7 SIRV isoform genes, NGS read assignment and quantification are more challenging and therefore result in a larger error margin (exemplary green box plot) than for single-isoform transcripts alone at this concentration. The blue areas represent the expected, ideal read coverage in the sense direction, the red areas the ideal read coverage in the antisense direction. The grey areas represent the exemplary read coverages from NGS experiments, which depend on the transcript concentrations and the number of isoforms. (a) The number of zettamoles refers to the total amount per SIRV-Set 3 vial. (b) reflects the FPKM bandwidth of the controls when those occupy around 1% of the reads in an mRNA-Seq experiment.