Dear Customers,
SENSE mRNA-Seq Library Prep Kits (Cat. No. 001.24 and 001.96) are being discontinued. We recommend using our CORALL mRNA-Seq Library Prep Kit instead.
Sequence of indices (barcodes) used for multiplexing
SENSE libraries can be easily multiplexed. 9,216 external barcode combinations can be introduced during the PCR amplification step of the library prep for Illumina.
i7 indices (7001-7096), which allow up to 96 samples to be sequenced per lane on an Illumina flow cell, are included in the kit. For multiplexing of up to 9,216 samples, 96 additional i5 indices can be introduced with the (Cat. No. 047). External indices are 6 nt long and require an additional index-specific sequencing reaction.
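The figure of 9,216 follows from pairing each of the 96 i7 indices with each of the 96 i5 indices. A minimal sketch of that arithmetic in bash (the variable names are illustrative only):

```shell
# Number of i7 and i5 indices, as stated in the text above
i7_indices=96
i5_indices=96

# Every i7 index can be combined with every i5 index
combinations=$(( i7_indices * i5_indices ))
echo "dual-index combinations: ${combinations}"
```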
SENSE mRNA Data Analysis
The following analysis pipeline demonstrates the basic workflow.
The tools used are freely available and are, to a great extent, part of the Galaxy platform [1]. Proprietary software, for instance CLC Workbench, should perform similarly.
The script examples mainly use the bash built-in for loop; in Galaxy, for instance, the individual files have to be processed manually one by one.
The following software packages (and their dependencies) should be installed to follow this protocol:
- Samtools
- FastQC
- FastX toolkit
- tophat2
Create a working directory with the subdirectories fastq and qualitycheck, and download the de-multiplexed fastq files into the fastq directory.
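The directory layout could be set up as sketched below. The name sense_analysis is a hypothetical placeholder for the working directory; any name works.

```shell
# "sense_analysis" is a hypothetical name for the working directory
workdir="sense_analysis"

# -p creates the parent directory as needed and does not fail if it already exists
mkdir -p "${workdir}/fastq" "${workdir}/qualitycheck"

# show the resulting layout
ls "${workdir}"
```

All later commands are meant to be run from inside this working directory (cd sense_analysis in this sketch).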
Let the fastq files be:
- runID_control1_S1_L001_R1_001.fastq
- runID_control1_S1_L001_R2_001.fastq
- runID_control2_S2_L001_R1_001.fastq
- runID_control2_S2_L001_R2_001.fastq
- runID_treatment1_S3_L001_R1_001.fastq
- runID_treatment1_S3_L001_R2_001.fastq
- runID_treatment2_S4_L001_R1_001.fastq
- runID_treatment2_S4_L001_R2_001.fastq
Let your bowtie2 index files be in /data/bowtie/human and your reference annotation in /data/reference_annotation/human.
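Before starting, it may be worth confirming that every R1 file has an R2 mate in the fastq directory. A minimal sketch, assuming the file naming shown above; check_mates is a hypothetical helper, not part of any of the listed tools:

```shell
# check_mates (hypothetical helper): report R1 files in a directory
# that have no matching R2 file; returns 1 if any mate is missing.
check_mates() {
    missing=0
    for r1 in "$1"/*_R1_001.fastq; do
        [ -e "${r1}" ] || continue            # pattern matched no file
        r2="${r1%_R1_001.fastq}_R2_001.fastq" # expected name of the mate file
        if [ ! -e "${r2}" ]; then
            echo "missing mate for ${r1}"
            missing=1
        fi
    done
    return "${missing}"
}

# run from the working directory
check_mates fastq && echo "no R1 file without an R2 mate"
```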
#### use fastqc to check your data; please adjust the number of threads to your machine
fastqc --outdir qualitycheck --format fastq --threads 8 fastq/runID*
#### check the result with a browser
######## preparation for mapping
### go to fastq directory
cd fastq
### remove the starter sequence from the R1 read
for sample in runID*R1_001.fastq; do fastx_trimmer -f 10 -Q33 -i ${sample} -o ${sample}_trimmed ; done
### remove the stopper sequence from the R2 read
for sample in runID*R2_001.fastq; do fastx_trimmer -f 7 -Q33 -i ${sample} -o ${sample}_trimmed ; done
### remove low quality tails
for sample in *trimmed ;do fastq_quality_trimmer -t 10 -l 20 -Q33 -v -i ${sample} -o ${sample}_clean; done
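fastq_quality_trimmer discards reads that fall below the -l length threshold, so the R1 and R2 files of a sample may end up with different numbers of reads. A quick check is sketched below, assuming standard 4-line FASTQ records; count_reads is a hypothetical helper. Equal counts do not prove that the mates are still in the same order, but unequal counts show that reads were dropped from one file only.

```shell
# count_reads (hypothetical helper): number of records in a FASTQ file,
# assuming 4 lines per record (no wrapped sequence lines)
count_reads() {
    echo $(( $(wc -l < "$1") / 4 ))
}

# run inside the fastq directory, after the quality trimming step
for r1 in runID*R1_001.fastq_trimmed_clean; do
    [ -e "${r1}" ] || continue                 # pattern matched no file
    r2="${r1%R1_001.fastq_trimmed_clean}R2_001.fastq_trimmed_clean"
    [ -e "${r2}" ] || { echo "missing ${r2}"; continue; }
    n1=$(count_reads "${r1}")
    n2=$(count_reads "${r2}")
    if [ "${n1}" -ne "${n2}" ]; then
        echo "read counts differ: ${r1} (${n1}) vs ${r2} (${n2})"
    fi
done
```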
### create symbolic links for better handling
### a for-loop can be used instead, depending on the name and structure of the files
ln -s runID_control1_S1_L001_R1_001.fastq_trimmed_clean control1_R1.fastq
ln -s runID_control1_S1_L001_R2_001.fastq_trimmed_clean control1_R2.fastq
...
ln -s runID_treatment2_S4_L001_R2_001.fastq_trimmed_clean treatment2_R2.fastq
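The links can also be created in a loop. The sketch below assumes the exact naming pattern used above (literal runID prefix, an _S<number> field after the sample name, lane L001); short_name is a hypothetical helper and would need adapting to other naming schemes.

```shell
# short_name (hypothetical helper) maps, for example,
#   runID_control1_S1_L001_R1_001.fastq_trimmed_clean -> control1_R1.fastq
short_name() {
    name="${1#runID_}"            # drop the run prefix
    sample="${name%%_S[0-9]*}"    # keep everything before the _S<number> field
    mate="${name#*_L001_}"        # drop everything up to and including the lane
    mate="${mate%%_*}"            # keep R1 or R2
    echo "${sample}_${mate}.fastq"
}

# run inside the fastq directory
for f in runID_*_001.fastq_trimmed_clean; do
    [ -e "${f}" ] || continue     # pattern matched no file
    ln -s "${f}" "$(short_name "${f}")"
done
```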
######### mapping
###############################
# create a folder for each sample in tophat_out/
cd ..
mkdir tophat_out
mkdir tophat_out/control1
mkdir tophat_out/control2
...
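The output folders can equally be created in one loop. A minimal sketch, assuming the four sample names used in this example:

```shell
# run from the working directory; -p also creates tophat_out itself
for sample in control1 control2 treatment1 treatment2; do
    mkdir -p "tophat_out/${sample}"
done
```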
### run tophat2
### note: sometimes it is useful to run all control and treatment samples pooled and separate the alignments afterwards.
### This will result in better junction detection.
tophat -o tophat_out/control1 -p 8 /data/bowtie/human/h_sapiens fastq/control1_R1.fastq fastq/control1_R2.fastq
tophat -o tophat_out/control2 -p 8 /data/bowtie/human/h_sapiens fastq/control2_R1.fastq fastq/control2_R2.fastq
tophat -o tophat_out/treatment1 -p 8 /data/bowtie/human/h_sapiens fastq/treatment1_R1.fastq fastq/treatment1_R2.fastq
tophat -o tophat_out/treatment2 -p 8 /data/bowtie/human/h_sapiens fastq/treatment2_R1.fastq fastq/treatment2_R2.fastq
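The four calls above differ only in the sample name, so they can be generated in a loop. The sketch below only prints the commands for review; tophat_cmd is a hypothetical helper, and the paths and options are the ones used above.

```shell
# tophat_cmd (hypothetical helper): print the tophat call for one sample
tophat_cmd() {
    echo "tophat -o tophat_out/${1} -p 8 /data/bowtie/human/h_sapiens fastq/${1}_R1.fastq fastq/${1}_R2.fastq"
}

# dry run: the commands are printed, not executed
for sample in control1 control2 treatment1 treatment2; do
    tophat_cmd "${sample}"
done
```

Once the printed commands look right, the output of the loop can be piped to bash to run them.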
# Indexed bam files are necessary for many visualization and downstream analysis tools
cd tophat_out
for bamfile in */accepted_hits.bam ; do samtools index ${bamfile}; done
# From this point on, any further analysis can be applied.
[1] Goecks J, Nekrutenko A, Taylor J and The Galaxy Team. Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences. Genome Biol. 2010 Aug 25;11(8):R86.