Description

Mix2 RNA-Seq Data Analysis Software

A software tool for the accurate estimation of RNA concentration from RNA-Seq data.

Model

Fragment bias in RNA-Seq poses a serious challenge to the accurate quantification of gene isoforms. Mix2 makes no assumptions about coverage bias; instead, it fits a mixture model to the data for each gene isoform (Fig. 1). Mix2 can therefore accurately represent, for instance, the 5’ bias shown in Fig. 1 (a and b), whereas Cufflinks is restricted to the uniform distribution (Fig. 1c).


Figure 1 | Exemplary representation of positional fragment bias over a 2000 bp transcript modeled with a mixture of 8 normal distributions. (a) The green curve shows the combined probability density function over the whole transcript, while the blue curves show the individual mixture distributions. Panels (b) and (c) display fragment distributions in a locus with two transcripts sharing one junction, as modeled by Mix2 and Cufflinks, respectively. The long and short transcripts start at 5000 and 5500 bp from the beginning of the locus, and are 2000 and 1000 bp long, respectively. The junction spans the 6000 – 6499 bp region.
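To make the idea of a positional mixture model concrete, the following is a minimal sketch of a density built from 8 normal distributions over a 2000 bp transcript, as in Fig. 1a. The component positions, widths and weights are illustrative assumptions chosen to produce a 5’ bias; they are not parameters fitted by Mix2, and the code does not reproduce the fitting procedure of the Mix2 software.

```python
import math

def normal_pdf(x, mean, sd):
    """Density of a normal distribution at position x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def mixture_pdf(x, weights, means, sds):
    """Combined density: weighted sum of the component densities."""
    return sum(w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds))

TRANSCRIPT_LENGTH = 2000  # bp, as in Fig. 1
N_COMPONENTS = 8          # as in Fig. 1

# Assumption: components evenly spaced along the transcript, common width.
spacing = TRANSCRIPT_LENGTH / N_COMPONENTS
means = [spacing * (k + 0.5) for k in range(N_COMPONENTS)]
sds = [spacing] * N_COMPONENTS

# Assumption: hypothetical weights that decrease along the transcript,
# which puts more probability mass near the 5' end.
raw = [N_COMPONENTS - k for k in range(N_COMPONENTS)]
weights = [r / sum(raw) for r in raw]

density_5p = mixture_pdf(250, weights, means, sds)
density_3p = mixture_pdf(1750, weights, means, sds)
print(f"density near the 5' end: {density_5p:.6f}")
print(f"density near the 3' end: {density_3p:.6f}")
```

With equal weights the same code gives a much flatter profile, so the shape of the bias in this sketch is carried almost entirely by the weights.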

The Mix2 software yields accurate isoform quantification from RNA-Seq data

Implementation and run-time performance

The Mix2 software runs as a 64-bit Linux command line tool. For an up-to-date list of supported distributions please refer to the User Guide of the Mix2 software.

Dataset   | Mix2        | Cufflinks w/o bias correction | Cufflinks with bias correction
          | Min   GB    | Min   xRT    GB     xMEM       | Min   xRT    GB     xMEM
Avg (UHR) | 7     1.26  | 34    4.9    0.99   0.79       | 542   77.4   1.32   1.05
Avg (HBR) | 5     1.02  | 32    6.4    0.90   0.88       | 536   107.2  1.22   1.20

 

Table 1 | Memory usage and average run-time statistics on the MAQC UHR and HBR datasets. Min stands for run-time in minutes, GB for memory usage in gigabytes. xRT and xMEM are the factors by which run-time and memory usage, respectively, increase in comparison to Mix2.
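The xRT and xMEM factors can be recomputed from the run-time and memory columns. The short sketch below does this with the values copied from Table 1; the dictionary layout and the function name `factors` are illustrative choices, not part of the Mix2 software.

```python
# Values copied from Table 1: (run-time in minutes, memory in GB).
TABLE_1 = {
    "UHR": {
        "Mix2": (7, 1.26),
        "Cufflinks w/o bias correction": (34, 0.99),
        "Cufflinks with bias correction": (542, 1.32),
    },
    "HBR": {
        "Mix2": (5, 1.02),
        "Cufflinks w/o bias correction": (32, 0.90),
        "Cufflinks with bias correction": (536, 1.22),
    },
}

def factors(dataset):
    """Return {tool: (xRT, xMEM)} relative to Mix2 for one dataset."""
    base_min, base_gb = TABLE_1[dataset]["Mix2"]
    return {
        tool: (round(minutes / base_min, 1), round(gb / base_gb, 2))
        for tool, (minutes, gb) in TABLE_1[dataset].items()
        if tool != "Mix2"
    }

for dataset in TABLE_1:
    for tool, (x_rt, x_mem) in factors(dataset).items():
        print(f"{dataset}: {tool}: xRT={x_rt}, xMEM={x_mem}")
```

An xMEM value below 1 means that the tool used less memory than Mix2 on that dataset.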

Mix2 Workflow

Featured Publications

List of the most recent Mix2 publications.

Examples

Mix2 was tested on the publicly available MicroArray Quality Control (MAQC) [1] and Association of Biomolecular Resource Facilities (ABRF) [2] datasets, which contain RNA-Seq data from multiple sequencing facilities and from library preparations that started with RNA at different levels of degradation.

The higher accuracy of the concentration estimates of Mix2 leads to better correlation between qPCR and FPKM fold-changes and consequently to higher accuracy in the detection of differential expression (Fig. 2).


Figure 2 | Correlation between qPCR and FPKM fold changes between UHR and HBR RNA for Mix2 vs Cufflinks, and the ROC curve for a classification experiment based on FPKM values of UHR and HBR RNA lanes. Since the FPKM and qPCR fold changes should be identical, the range of FPKM fold changes was restricted to the range of qPCR values, as shown in (a) and (b), and thus to a range between 10⁻⁴ and 10³. (b) Cufflinks produces a large number of transcripts whose FPKM fold change lies considerably above or below the majority, as can be seen by the long straight clusters at FPKM fold changes of 10⁻⁴ and 10³. The Mix2 model, on the other hand, greatly improves the correlation between qPCR and FPKM fold changes for the UHR and HBR RNA samples, and as shown in the classification experiment (c) leads to a substantially higher accuracy in the detection of differential expression. The dotted line in (c) indicates a false positive rate of 0.1.
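The restriction of fold changes to the qPCR range can be illustrated with a small sketch. It assumes that "restricted" means clamping values to the interval between 10⁻⁴ and 10³, which would explain the straight clusters at the boundaries, and that the correlation is computed on log10 fold changes. Both points are assumptions, and the fold-change values below are toy numbers, not data from the MAQC experiments.

```python
import math

LOW, HIGH = 1e-4, 1e3  # range stated in the Fig. 2 caption

def clamp(value, low=LOW, high=HIGH):
    """Restrict a fold change to the interval [low, high]."""
    return max(low, min(high, value))

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Toy fold changes (hypothetical values for illustration only).
qpcr_fold_changes = [0.01, 0.5, 1.0, 4.0, 100.0]
fpkm_fold_changes = [1e-6, 0.4, 1.2, 5.0, 1e5]

# Assumption: correlation is evaluated on log10 of the clamped values.
log_qpcr = [math.log10(clamp(v)) for v in qpcr_fold_changes]
log_fpkm = [math.log10(clamp(v)) for v in fpkm_fold_changes]

r = pearson(log_qpcr, log_fpkm)
print(f"Pearson r on clamped log10 fold changes: {r:.3f}")
```

In this toy input, the first and last FPKM fold changes fall outside the interval and are moved to its boundaries, which is how outliers would pile up at 10⁻⁴ and 10³ in a scatter plot.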

FAQ

Frequently Asked Questions

Please find a list of the most frequently asked questions below. If you cannot find the answer to your question here or want to know more about our products, please contact bioinfo@lexogen.com.


Unlike some other quantification methods, Mix² makes no assumptions about coverage bias; it fits a mixture model to the data for each gene isoform and therefore yields more accurate concentration estimates.
Yes, the Mix² software supports multiple cores.
Yes, the Mix² software can be run on a computer cluster. Please contact us at bioinfo@lexogen.com for more information.
The command line version is restricted to Linux distributions; however, please contact us at bioinfo@lexogen.com for other solutions compatible with Windows.
Please contact us at bioinfo@lexogen.com if you would like to run the Mix² software on macOS.
The Mix² software expects a BAM file which contains the mapped reads and a GTF file which contains the annotations.
The Mix² software produces the raw count per transcript / gene as well as the FPKM values.
No, the Mix² software doesn’t do any differential expression calls; however, the output of the Mix² software can easily be used for differential expression analysis using existing tools like DESeq and edgeR.
Yes, Mix² accepts both types of reads. The alignment file can include both single-end and paired-end reads.
Those are the genes for which no reads are compatible with at least one reference transcript.
There are no restrictions for the aligners that can be used.
The compatible fragments locus (fragment count) represents the number of fragments for a gene locus. To obtain the transcript fragment count, the specified gene count must be multiplied by the corresponding abundance.
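The multiplication described in the last answer can be sketched as follows. The transcript names, the gene count, the abundances and the library size are hypothetical example values. The `fpkm` helper uses the textbook definition (fragments per kilobase of transcript per million fragments) with the plain transcript length; this is an assumption, and the normalisation applied by the Mix² software may differ.

```python
def transcript_fragment_counts(gene_fragment_count, abundances):
    """Multiply the gene-level fragment count by each transcript's abundance."""
    return {name: gene_fragment_count * a for name, a in abundances.items()}

def fpkm(fragments, length_bp, total_fragments):
    """Textbook FPKM: fragments per kilobase of transcript per million fragments."""
    return fragments * 1e9 / (length_bp * total_fragments)

# Hypothetical example values.
gene_fragment_count = 1000
abundances = {"tx_long": 0.75, "tx_short": 0.25}   # relative abundances within the gene
lengths_bp = {"tx_long": 2000, "tx_short": 1000}
total_fragments = 10_000_000                        # assumed library size

counts = transcript_fragment_counts(gene_fragment_count, abundances)
for name, count in counts.items():
    value = fpkm(count, lengths_bp[name], total_fragments)
    print(f"{name}: {count:.1f} fragments, FPKM {value:.2f}")
```

Because the abundances of the transcripts of a gene sum to 1 in this example, the transcript counts add up to the gene count again.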

Downloads

pdf  User Guide – update 28.12.2016 (Added a section for the investigation of the positional bias.)
pdf  Application Note
