FAQ

Frequently Asked Questions

Please find a list of the most frequently asked questions below. If you cannot find the answer to your question here or want to know more about our products, please contact support@lexogen.com.

The SIRV isoform design is based on 7 human model genes. The annotated transcripts of these genes were extended by additional isoforms and variants to comprehensively cover alternative splicing, start- and end-site variations, and antisense and overlapping transcripts. The exonic sequences of the resulting 69 transcript structures (6-18 per gene) were taken from database-derived genomes, then altered to completely lose alignment identity, and finally blasted against the NCBI database on the nucleotide and the protein level to ensure they are non-identical. Intronic sequences were generated randomly.

The SIRV sequences conform to the canonical exon-intron junction rule: 96.9 % of all SIRV junctions are GT-AG, with the less frequent variants being present at 1.7 % (GC-AG) and 0.6 % (AT-AC). Two non-canonical splice sites were included at 0.4 % each (CT-AG and CT-AC).

The SIRVs as well as the ERCCs are in vitro transcription products, and therefore all RNAs start with a 5’ ppp-G. Hence, cap-specific cDNA preparation methods are not feasible. The 3’ end of each SIRV RNA is polyadenylated (SIRV isoforms and long SIRVs with 30 adenosines, ERCCs with 24 ± 1.05 nt adenosines), enabling oligo(dT) based selection and priming. The SIRVs User Guide contains a graphic representation of the SIRV design.

You can download the sequences (PDF) and the SIRV sequence design overview (XLSX).

In short
The SIRV isoforms and long SIRVs are produced by T7 transcription from synthetic genes, and a series of tailored methods is applied to purify full-length SIRV RNAs with a minimal amount of side products.

In detail
Synthetic gene constructs were obtained that comprise, from 5’ to 3’, a T7 RNA polymerase promoter whose 3’ G is the first nucleotide of the actual SIRV sequence, which is seamlessly followed by an (A30) tail. These gene cassettes are cloned into a vector, colony-amplified and singularized. All SIRV sequences in the purified plasmids are verified by Sanger sequencing to identify the correct clones. Linearized, silica-purified plasmids then serve as templates in in vitro transcription reactions using T7 transcription kits. The DNase-treated, phenol-extracted and silica-purified in vitro transcription products are assessed for concentration and purity by spectrophotometry and for integrity by capillary electrophoresis.

In the context of variant verification, RNA integrity is a very important measure. Fragments arising from incomplete transcription can introduce errors into the determination of variants that share those sequences and thereby also affect the overall gene coverage. As expected, the integrity of the transcription products is very heterogeneous, given the broad sequence variation and the length of the SIRV isoform transcripts (average 1.1 kb, with 14 RNAs between 2.0 and 2.5 kb). Therefore, a set of tailored purification procedures is applied to obtain full-length RNAs with a minimal amount of side products despite the broad sequence and length variation of the SIRVs. After purification, the 69 SIRV RNAs are assessed for the ratios of pre-peak fraction, main-peak fraction (corresponding to RNAs of correct length), and post-peak fraction. Finally, the SIRVs are quantified by absorbance spectroscopy to adjust all stock solutions to a base concentration close to, but above, 50 ng/µl, and to monitor RNA purity by the absorbance ratios at 260/280 nm and 260/230 nm.

SIRV-Set 1 contains 3 isoform mixes. While the concentration of most individual SIRV isoforms differs between Mixes E0, E1, and E2, each mix has a total concentration of 25.3 ng/µl and a total molarity of about 70 fmol/µl (69.0, 68.5, and 70.8 fmol/µl, respectively). In Mix E0, all SIRVs are present at the same molarity.

SIRV-Set 2 contains the Iso Mix E0 with 69 SIRV isoforms at 1.0 fmol each. The total amount of RNA is 69.0 fmol corresponding to 25.2 ng.
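The mass and molar amounts quoted for SIRV-Set 2 can be cross-checked against the average isoform length of about 1.1 kb mentioned earlier. The following is a minimal sketch; the per-nucleotide molar mass (320.5 g/mol) and the end correction (159 g/mol) are commonly used approximations for single-stranded RNA and are assumptions here, not values from this FAQ. The function name is hypothetical.

```python
# Assumed approximations for single-stranded RNA (not taken from this FAQ).
AVG_NT_MOLAR_MASS = 320.5   # g/mol per nucleotide
END_CORRECTION = 159.0      # g/mol, 5' triphosphate correction


def mean_transcript_length(total_ng: float, total_fmol: float) -> float:
    """Estimate the mean transcript length (nt) from total mass and total molar amount."""
    grams = total_ng * 1e-9
    moles = total_fmol * 1e-15
    mean_molar_mass = grams / moles  # g/mol, averaged over all transcripts
    return (mean_molar_mass - END_CORRECTION) / AVG_NT_MOLAR_MASS


# SIRV-Set 2: 25.2 ng corresponds to 69.0 fmol (values from the text above)
set2_mean_length = mean_transcript_length(25.2, 69.0)
print(f"Estimated mean length: {set2_mean_length:.0f} nt")
```

Under these assumptions the estimate lands slightly above 1.1 kb, which is in line with the stated average once the 30-nt poly(A) tail is taken into account.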

SIRV-Set 3 combines the equimolar mix of the 69 SIRV isoforms (Iso Mix E0) at 0.6 fmol each and 92 single-isoform transcripts (ERCCs) ranging from 0.014 amol to 15.0 fmol. The total amount of RNA is 93.2 fmol or 30.3 ng per tube.

SIRV-Set 4 combines the equimolar mix of the 69 SIRV isoforms (Iso Mix E0) at 0.6 fmol each, the 92 single-isoform transcripts (ERCCs) ranging from 0.014 amol to 15.0 fmol, and the equimolar mix of 15 long SIRVs at 0.6 fmol each. The total amount of RNA is 102.2 fmol or 53.5 ng per tube.

Table 1 ǀ Compositions of SIRV-Set 2, SIRV-Set 3, and SIRV-Set 4

Set    Module                           # transcripts   ng              fmol            fmol / transcript
                                        count   %       amount  %       amount  %       amount              %
Set 2  Iso Mix E0                       69      100 %   25.2    100 %   69.0    100 %   1.0                 1.4 %
Set 3  Iso Mix E0                       69      43 %    15.1    50 %    41.4    44 %    0.6                 0.6 %
       ERCC                             92      57 %    15.2    50 %    51.8    56 %    7 x 10^-6 to 15     8 x 10^-6 to 16 %
       Iso Mix E0 / ERCC                161     100 %   30.3    100 %   93.2    100 %
Set 4  Iso Mix E0                       69      39 %    15.1    28 %    41.4    41 %    0.6                 0.6 %
       ERCC                             92      52 %    15.2    28 %    51.8    51 %    7 x 10^-6 to 15     7 x 10^-6 to 15 %
       long SIRVs                       15      9 %     23.2    43 %    9.0     9 %     0.6                 0.6 %
       Iso Mix E0 / ERCC / long SIRVs   176     100 %   53.5    100 %   102.2   100 %
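The molar totals and shares in Table 1 follow directly from the per-transcript amounts. The sketch below recomputes them for SIRV-Set 3 and SIRV-Set 4; the ERCC total of 51.8 fmol is taken from the table as given, and all variable and function names are hypothetical.

```python
# Per-module inputs taken from the text and Table 1
ISO_COUNT, ISO_FMOL_EACH = 69, 0.6     # SIRV isoforms (Iso Mix E0)
LONG_COUNT, LONG_FMOL_EACH = 15, 0.6   # long SIRVs
ERCC_TOTAL_FMOL = 51.8                 # ERCC module total, as listed in Table 1


def percent_shares(parts: dict) -> dict:
    """Return each module's share of the total, rounded to whole percent."""
    total = sum(parts.values())
    return {name: round(100 * value / total) for name, value in parts.items()}


set3 = {"Iso Mix E0": ISO_COUNT * ISO_FMOL_EACH, "ERCC": ERCC_TOTAL_FMOL}
set4 = {**set3, "long SIRVs": LONG_COUNT * LONG_FMOL_EACH}

set3_total = round(sum(set3.values()), 1)
set4_total = round(sum(set4.values()), 1)
set3_shares = percent_shares(set3)
set4_shares = percent_shares(set4)

print(set3_total, set3_shares)
print(set4_total, set4_shares)
```

The recomputed totals and rounded shares can be compared line by line with the fmol columns of Table 1.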

In SIRV-Set 1, each of the mixes E0, E1, and E2 contains all 69 SIRV isoforms in different concentration ratios. In E0 all concentrations are identical (1:1), E1 covers about one order of magnitude (up to 1:8), and E2 extends over more than two orders of magnitude (up to 1:128). This makes it possible either to assess all SIRV transcripts spiked in at the same level or to mimic the transcript variant concentration distribution encountered in real samples. Moreover, comparing 3 samples, each spiked with E0, E1, or E2, allows for a detailed assessment of differential gene expression on the transcript level. The inter-mix concentration ratios range from 1/64- to 16-fold. The final SIRV isoform mixes have nearly identical mass concentrations of 25.2, 25.2, and 25.4 ng/µl, and molar concentrations of 69.0, 68.5, and 70.8 fmol/µl for mixes E0, E1, and E2, respectively.
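Because differential expression is usually evaluated on a log2 scale, it can help to express the stated ratios as log2 fold changes. This is a minimal sketch using only the ratios quoted above; the function and variable names are hypothetical.

```python
import math


def log2_fold(ratio: float) -> float:
    """Convert a concentration ratio to a log2 fold change."""
    return math.log2(ratio)


# Maximum within-mix concentration ratios stated in the text
e1_span = log2_fold(8)      # Mix E1: up to 1:8
e2_span = log2_fold(128)    # Mix E2: up to 1:128

# Inter-mix ratio range stated in the text: 1/64-fold to 16-fold
inter_mix_min = log2_fold(1 / 64)
inter_mix_max = log2_fold(16)

print(e1_span, e2_span, inter_mix_min, inter_mix_max)
```

On this scale the within-mix spans are 3 and 7 log2 units, and the expected inter-mix fold changes run from -6 to +4.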

Each SIRV transcript enters the final mixes via one of eight PreMixes, which allows each SIRV to be identified unambiguously by capillary electrophoresis as it enters the mixing scheme (Paul et al., 2016). Then, four pairs of PreMixes are combined in equal ratios to SubMixes, before those SubMixes are combined in defined ratios. Combining accurate volumes of the stock solutions in sufficiently large batch sizes, together with transparent monitoring of the sequential processing steps, ensures that the mixes are prepared as accurately as the process-inherent error limits allow. Pipetting errors vary depending on transfer volumes and range from ±4 % for 2 µl to ±0.8 % for >100 µl transfers. The precision was experimentally determined with blank solutions. Starting with the stock concentration measurement (NanoDrop accuracy ±2 ng/µl for 50 ng/µl) and accounting for the entire mixing pathway, the accumulated concentration error is expected to range between ±4.7 % and ±8 %. Therefore, data evaluation has to allow for this experimental uncertainty by applying accuracy thresholds no tighter than ±8 % on the linear scale, or ±0.11 on the log2-fold scale. The SIRV concentration ratios between two mixes are more precise because only one final pipetting step defines the concentration differences and synchronizes all SIRVs that belong to the same SubMix. Here, a maximal error of ±4 % (or ±0.057 on the log2-fold scale) can be expected between SubMixes, while all SIRVs of the same SubMix must propagate coherently into the final mixes. Bioanalyzer traces are used to monitor the relative propagation of the SIRVs, PreMixes and SubMixes during the mixing. In addition, the accurate pipetting of the 8 PreMixes is controlled by checksums of NanoDrop concentration measurements and by weighing on an analytical balance.
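The log2-fold thresholds quoted above correspond to the linear error bounds via log2(1 + e), where e is the relative error. The sketch below shows that conversion; the function name is hypothetical, and using the upper bound (1 + e) rather than a symmetric interval is an assumption about how the quoted values were derived.

```python
import math


def relative_error_to_log2(relative_error: float) -> float:
    """Express a relative (linear) error bound as a log2-fold deviation."""
    return math.log2(1.0 + relative_error)


# Relative error of the stock measurement: ±2 ng/µl at 50 ng/µl
stock_relative_error = 2.0 / 50.0

absolute_threshold = relative_error_to_log2(0.08)   # accumulated error, ±8 %
submix_threshold = relative_error_to_log2(0.04)     # between SubMixes, ±4 %

print(f"stock error: {stock_relative_error:.0%}")
print(f"±8 % -> ±{absolute_threshold:.2f} log2, ±4 % -> ±{submix_threshold:.3f} log2")
```

In an evaluation script, measured log2 ratios that deviate from the expected value by less than these thresholds would fall within the preparation uncertainty of the mixes themselves.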

The spike-in ratios have to be chosen in accordance with the desired final SIRV content. For RNA to be poly(A) selected (starting from 100 ng of Human Brain Reference Total RNA, HBRR), we recommend using 2.4 µl of a 1:1000 dilution of SIRV-Set 1 or SIRV-Set 2, and for RNA to be rRNA-depleted we recommend using 3.6 µl of a 1:1000 dilution. In both cases this results in 2.83 % SIRVs in the final mixture. For the other SIRV-Sets, the volume has to be adjusted according to their concentrations, which also applies to samples with different input amounts, mRNA content, and depletion or enrichment methods. For samples with unknown mRNA content, we recommend using the volume given above and then deriving the mRNA content by comparing the share of reads aligning to the reference genome with the share aligning to the “SIRVome”. For further details, please consult the User Guide.
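The spike-in arithmetic can be sketched as follows. The stock concentration of 25.2 ng/µl is the SIRV-Set 1 and SIRV-Set 2 value given in this FAQ; the read counts are invented for illustration, and the estimate assumes that read share is proportional to mass share, which is a simplification. All function names are hypothetical; the User Guide remains the reference for the actual procedure.

```python
STOCK_NG_PER_UL = 25.2   # SIRV-Set 1 / SIRV-Set 2 mass concentration (from the text)
DILUTION_FACTOR = 1000   # 1:1000 working dilution


def sirv_mass_pg(volume_ul: float) -> float:
    """Mass of SIRVs (pg) contained in a given volume of the 1:1000 dilution."""
    return volume_ul * STOCK_NG_PER_UL / DILUTION_FACTOR * 1000.0  # ng -> pg


def estimate_target_rna_ng(spiked_pg: float, sample_reads: int, sirv_reads: int) -> float:
    """Estimate the mass (ng) of sequenced endogenous RNA from read shares.

    Assumes reads are proportional to RNA mass (a simplifying assumption).
    """
    return spiked_pg / 1000.0 * sample_reads / sirv_reads


polya_spike_pg = sirv_mass_pg(2.4)   # poly(A) selection workflow
ribo_spike_pg = sirv_mass_pg(3.6)    # rRNA depletion workflow

# Hypothetical read counts in which 2.83 % of the mapped reads are SIRV reads
example_estimate_ng = estimate_target_rna_ng(polya_spike_pg, 9_717_000, 283_000)

print(f"{polya_spike_pg:.2f} pg, {ribo_spike_pg:.2f} pg, ~{example_estimate_ng:.2f} ng mRNA")
```

With these illustrative numbers the estimate is roughly 2 ng of mRNA in the 100 ng total RNA input; that figure is back-calculated under the stated assumption and is not taken from this FAQ.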

The SIRV mixes can be used with crude cell extract, purified total RNA, rRNA-depleted RNA or poly(A) enriched RNA. The spiked-in RNA can be used for all common RNA-Seq library preparations to be analyzed on any platform (Illumina®, Ion Torrent®, Pacific Bioscience™, Oxford Nanopore™, and others).

The SIRV isoforms range from 191 to 2528 nt with the shortest mRNAs being antisense mono-exonic transcripts. The ERCCs are 281 to 2036 nt long, whereas the long SIRVs cover the length range of 4 kb to 12 kb.


Figure 1 | Length complexity of SIRV-Sets. Transcripts of the ERCC module range up to 2 kb in length, those of the SIRV isoform module up to 2.5 kb. The long SIRV module contains three transcripts in each of the length categories 4 kb, 6 kb, 8 kb, 10 kb and 12 kb.

The ERCC Spike-In Controls (ERCCs, Ambion, Thermo Fisher) make it possible to assess dynamic range, dose response, lower limit of detection, and fold-change response of RNA sequencing pipelines within the limitation of their mono-exonic, single-isoform RNA sequences. Because the ERCCs contain no transcript variants, one of the main challenges of sequencing complex transcriptomes, namely identifying and distinguishing splice variants, cannot be evaluated using the ERCCs.

In contrast, the SIRV (Spike-In Transcript Variants) isoforms can be used to validate isoform-specific RNA sequencing workflows and to compare experiments by extrapolating the results from the well-defined isoform ground truth of a small fraction of control reads to the sample reads. Within the context of variant detection, the assessment of dynamic range, dose response, lower limit of detection, and fold-change response is possible as well.

The long SIRVs address the need for spike-in transcripts exceeding 2.5 kb. Their length range (4 kb to 12 kb) is particularly suited to validating long-read platforms and RNA-Seq setups.

The number of reactions depends on the spike-in amount required.

From the SIRV-Set 1 mixes you can draw 4 times 1 µl. This 1 µl should then be diluted stepwise to 1:1000. For a typical experiment using 100 ng total RNA input (for example, spiking of Human Brain Reference RNA (HBRR)), 3.6 µl of this dilution are required for an rRNA depletion experiment and 2.4 µl for an mRNA-Seq experiment. Hence, from 1 µl of the original SIRV Mix, around 300 to 400 samples can be spiked, depending on the RNA input.
In another example: if 10 ng total RNA input are to be spiked, then 1 µl can be diluted 1:10000. See also Table 3, p. 10 of the User Guide.
In any case, we do not recommend keeping the dilution for very long, as diluted RNA solutions become increasingly unstable.
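The number of samples that can be spiked from one 1 µl aliquot follows from the dilution factor and the volume used per sample. This is a rough sketch that ignores dead volume and pipetting losses, so real numbers will be somewhat lower; the function name is hypothetical.

```python
def samples_per_aliquot(spike_volume_ul: float,
                        aliquot_ul: float = 1.0,
                        dilution_factor: int = 1000) -> int:
    """Number of whole samples that one diluted aliquot can spike (no losses assumed)."""
    diluted_volume_ul = aliquot_ul * dilution_factor
    return int(diluted_volume_ul // spike_volume_ul)


ribo_depletion_samples = samples_per_aliquot(3.6)   # 3.6 µl per sample
mrna_seq_samples = samples_per_aliquot(2.4)         # 2.4 µl per sample

print(ribo_depletion_samples, mrna_seq_samples)
```

Both results sit near the ends of the "around 300 to 400 samples" range quoted above.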

From SIRV-Set 2, 3, and 4 mixes you can draw 9 times 1 µl, and similar dilutions can be made. For details, please see the SIRV-Set 2, SIRV-Set 3, and SIRV-Set 4 User Guide, which is available in the download section.

The 15 long SIRV transcripts each have a unique sequence to clearly separate the length aspect from the isoform complexity aspect.