Lexogen’s Data Solutions on the BlueBee® Genomics Platform

At Lexogen, we are excited to offer user-friendly data analysis pipelines on the BlueBee Genomics Platform for cost-efficient and streamlined RNA sequencing. Lexogen and BlueBee have partnered to deploy Lexogen-validated analytical pipelines that deliver production-ready, robust infrastructure that is regulatory compliant and easy to use for any researcher. On the BlueBee Genomics Platform, Lexogen offers pipelines for analyzing data from QuantSeq, CORALL, and SLAMseq experiments.

Lexogen’s pipelines on BlueBee, especially with our newly implemented web solutions, are easy to use and no prior bioinformatic experience is required. With just a few clicks, you can go from uploading your fastq files to downloading your analyzed data. Additionally, you will save both time and money by performing your own analyses! Most importantly, BlueBee offers a secure and compliant environment to support both research and clinical applications at any scale. For more information on our pipelines and new web solutions, you can watch our webinar here.

QuantSeq 3′ mRNA-Seq

The QuantSeq Kit provides a library preparation protocol designed to generate Illumina® compatible libraries of sequences close to the 3’ end of polyadenylated RNA. QuantSeq enables cost-efficient gene expression profiling and alternative polyadenylation studies.

The QuantSeq data analysis pipeline hosted on the BlueBee Genomics Platform lets researchers analyze QuantSeq samples conveniently and quickly, even without bioinformatics experience. The workflow for our QuantSeq pipeline is shown on the right.

Lexogen’s pipeline on BlueBee is accessible through a complimentary code provided with each QuantSeq kit. To get started, register an account using your BlueBee activation code and begin analyzing your data!

PROMOTION: Until August 31st 2020, extra QuantSeq data analysis codes for 1.5 GB input are offered free of charge! Please contact sales@lexogen.com for more information.

Visit QuantSeq Webpage

pdf 015UG108V0300 – User Guide – QuantSeq Data Analysis Pipeline on BlueBee Platform – update 20.07.2020

Go to QuantSeq Platform
QuantSeq Data Analysis Pipeline

CORALL Total RNA-Seq

The CORALL Total RNA-Seq Library Prep Kit enables fast and cost-efficient generation of UMI labelled, stranded libraries for whole transcriptome analyses using Illumina® NGS platforms.

The CORALL data analysis pipeline on the BlueBee Genomics Platform allows researchers to analyze CORALL samples conveniently and quickly, even without bioinformatics experience.

To get started, register an account using your BlueBee activation code and begin analyzing your data!

PROMOTION: Until August 31st 2020, data analysis codes are offered free of charge for CORALL users! Please contact sales@lexogen.com for more information.

Visit CORALL Webpage

pdf 095UG279V0100 – User Guide – CORALL Data Analysis Pipeline on BlueBee Platform – release 20.07.2020

Go to CORALL Platform
009_CORALL_Workflow-Data-Analysis_V0200

SLAMseq Metabolic RNA Labeling

SLAMseq is a high-sensitivity method for time-resolved measurement of newly synthesized and existing RNA in cultured cells by metabolic labeling and RNA sequencing.

The SLAMdunk software on the BlueBee Genomics Platform allows easy analysis of sequencing data from a SLAMseq experiment. The workflow for our SLAMdunk pipeline is shown on the right.

To get started, register an account using your BlueBee activation code and begin analyzing your data!

pdf 063UG147V0200 – User Guide – SLAMdunk Data Analysis Pipeline on BlueBee Platform – update 11.08.2020

Visit SLAMseq Webpage

Go to SLAMdunk Platform
SLAMdunk_Figure_01

FAQ

Frequently Asked Questions

Please find a list of the most frequently asked questions below. If you cannot find the answer to your question here or want to know more about our products, please contact support@lexogen.com.

All QuantSeq FWD and REV kits ordered after November 2016 are supplied with a code to access the QuantSeq data analysis pipelines on the BlueBee Genomics Platform. The code supplied with the QuantSeq FWD Kits (Cat. No. 015) enables two different pipeline options:

  • FWD pipeline – For standard QuantSeq FWD library data (no UMIs).
  • FWD-UMI pipeline – only for libraries prepared with the UMI Second Strand Synthesis Module for QuantSeq FWD (Illumina, Read 1, Cat. No. 081).

Each code contains the same number of pipeline runs as there are reactions in the kit. You should choose either the FWD or FWD-UMI pipeline, depending on how your libraries were prepared. If you start the wrong pipeline for your data, you can abort or stop the run before it is completed without losing your allocated runs. If the run is completed, then additional runs will need to be purchased. If you wish to run both options, additional activation codes can be purchased at Lexogen’s online webstore.

To analyze data from QuantSeq FWD libraries that contain UMIs (prepared with the UMI Second Strand Synthesis Module (Cat. No. 081)), simply use the activation code included with your QuantSeq FWD kit and select the respective “FWD-UMI” pipeline when setting up your data analysis run on BlueBee (for further information see UMI FAQ 4.7 and 4.10).

QuantSeq FWD libraries prepared from blood using Globin Block (RS-GBHs or RS-GBSs (Cat. No. 070 and 071)) should be analyzed using the standard FWD pipeline. If UMIs are also included, select the FWD-UMI pipeline instead.

Do not run the “FWD-UMI” pipelines for QuantSeq FWD (standard) libraries that do not contain UMIs. This will incorrectly collapse reads and result in incorrect read count data. Use only the FWD pipelines for standard QuantSeq FWD libraries.

For UMI-containing libraries, be sure to select the “FWD-UMI” pipeline; otherwise, duplicate reads will not be collapsed.
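The effect of choosing the right pipeline can be illustrated with a toy model. The sketch below is not the pipeline's actual algorithm; the `(gene, position, umi)` tuple layout and the function name `count_reads` are hypothetical and chosen for illustration only. It shows how collapsing reads that share the same gene, position, and UMI changes the read count.

```python
from collections import Counter

def count_reads(reads, collapse_umis):
    """Count reads per gene.

    `reads` is a list of (gene, position, umi) tuples (hypothetical layout).
    With `collapse_umis` set, reads sharing gene, position, and UMI count once.
    """
    if collapse_umis:
        reads = set(reads)
    return Counter(gene for gene, _pos, _umi in reads)

reads = [
    ("geneA", 100, "ACGTAC"),
    ("geneA", 100, "ACGTAC"),  # same UMI at the same position: a duplicate
    ("geneA", 100, "TTGACA"),
    ("geneB", 250, "GGATCC"),
]

without_collapse = count_reads(reads, collapse_umis=False)
with_collapse = count_reads(reads, collapse_umis=True)
```

If a library has no UMIs, whatever bases sit where the UMI is expected would be treated as UMIs, so genuinely distinct reads could be merged and the counts would be too low.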

Data Analysis access is only free using the provided code, for fastq input files up to 1.5 GB in size. Larger fastq file sizes can only be processed using activation codes for large file sizes, which can be purchased additionally (Cat. No. 093 for FWD or Cat. No. 094 for REV library types).
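Before uploading, it can help to check each fastq file against the size limit. The following is a minimal sketch, assuming that 1.5 GB means 1.5 × 10^9 bytes (the platform's exact definition is not stated here). The helper names `files_over_limit` and `scan_directory` are hypothetical.

```python
from pathlib import Path

# Assumption: "1.5 GB" is read as 1.5 * 10**9 bytes; the platform may count differently.
LIMIT_BYTES = int(1.5 * 10**9)

def files_over_limit(sizes, limit=LIMIT_BYTES):
    """Return the sorted names of files whose size in bytes exceeds the limit.

    `sizes` maps a file name to its size in bytes.
    """
    return sorted(name for name, size in sizes.items() if size > limit)

def scan_directory(directory):
    """Collect the sizes of .fastq and .fastq.gz files in a directory."""
    return {
        p.name: p.stat().st_size
        for p in Path(directory).iterdir()
        if p.name.endswith((".fastq", ".fastq.gz"))
    }
```

For example, `files_over_limit(scan_directory("my_run"))` would list the files in a (hypothetical) `my_run` folder that need an activation code for large file sizes.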

Do not use activation codes provided with the QuantSeq REV Kit (Cat. No. 016) for analysis of FWD libraries.

Analysis pipelines for QuantSeq (FWD, FWD-UMI, and REV) are currently available for the following species:

Common name (if available) Species
African Oil Palm Elaeis guineensis
Arabidopsis Arabidopsis halleri
Arabidopsis Thale cress Arabidopsis thaliana
Baker’s Yeast Saccharomyces cerevisiae
Barley Hordeum vulgare
Brain-Eating Amoeba Naegleria fowleri
Chicken Gallus gallus
CHO-K1 Cell Line Cricetulus griseus
Common Yellow Monkeyflower Mimulus guttatus
COVID19 SARS-CoV-2
Cow Bos taurus
Cacao Tree Theobroma cacao criollo
Dog Canis lupus familiaris
Drummond’s Rockcress Boechera stricta
Ferret Mustela putorius furo
Fruit Fly Drosophila melanogaster
Fungus Fusarium oxysporum
Fungus Yarrowia lipolytica
Goat Capra hircus
Human Homo sapiens
Maize/Corn Zea mays
Melon Fly Bactrocera cucurbitae
Mouse Mus musculus
Nematode Roundworm Caenorhabditis elegans
Painted Turtle Chrysemys picta bellii
Pig Sus scrofa
Potato Solanum tuberosum
Purple False Brome Brachypodium distachyon
Rabbit Oryctolagus cuniculus
Rat Rattus norvegicus
Rice Oryza sativa
Salmon Salmo salar
Sorghum Sorghum bicolor
Sponge Amphimedon queenslandica
Starlet Sea Anemone Nematostella vectensis
Tomato Solanum lycopersicum
Water flea Daphnia pulex
Western Balsam Poplar Tree Populus trichocarpa
Yeast Candida albicans, Candida auris, Candida parapsilosis
Yeast Kluyveromyces lactis
Zebrafish Danio rerio
New species can be added upon request. Please contact support@lexogen.com if your species of interest is not listed and for pricing information.

All QuantSeq FWD and REV kits ordered after November 2016 are supplied with an activation code to access the QuantSeq Data Analysis pipelines on the BlueBee Genomics Platform.

The activation codes are printed on a sticker that is attached to the side of the small reagent box (stored at -20 °C), which comes inside the main QuantSeq Kit box (Fig. 1). Activation codes for additional runs can be purchased from Lexogen’s online webstore. Activation codes registered after September 20, 2018 are valid for two years. The input file size is limited to 1.5 GB per fastq(.gz) file. If you have larger input files or for further inquiries, please contact support@lexogen.com.

Bluebee_Sticker_Location

Figure 1 | Location of the BlueBee Data Analysis activation code on QuantSeq Kit reagent boxes.

Below is an overview of the standard workflow for the QuantSeq Forward (FWD) pipeline (Fig. 2). The steps include: Quality control; Trimming; Quality control; Alignment to reference genome; Mapping statistics and mapped read QC; Read Counting, and optional Differential Expression Analyses. Lexogen uses the same analysis pipeline internally for services and for support data analysis evaluations using our QSDAP internal automated workflows. The pipeline available on BlueBee does not include analysis of rRNA mapping reads. This can be performed separately after trimming the reads and before mapping to the reference genome.

QuantSeq-Data-Analysis-Pipeline

Figure 2 | Workflow for the QuantSeq Forward (FWD) pipeline.

There are several options:

  • Option 1: The facility runs the analysis on BlueBee (they set up and run the fastq files and any requested DE analysis). Then they share the project with the customer (customers can also create BlueBee accounts and use the system to view/download/upload data, but without activation codes they can’t execute the pipelines themselves). The facility customer with whom the project is shared will need to have a BlueBee account in order to do this. In order to register an account, the customer of the facility will need to have a dummy code issued. Please contact support@lexogen.com for this code.
  • Option 2: The facility creates a project and uploads the fastq files to the project, enters the activation code into the project, and shares the project with full access rights with the customer. The customer can then execute the pipeline runs themselves. This is preferred compared to Option 1 as the customer can then decide which DE analyses to run, and run multiple DE analyses (up to 500 are included with each code). This won’t work if facilities are splitting kits between different customers – only one kit per customer. The facility customer with whom the project is shared will need to have a BlueBee account in order to do this. In order to register an account, the customer of the facility will need to have a dummy code issued. Please contact support@lexogen.com for this code.
  • Option 3: The facility can split activation codes from single kits. To do this, they can create different projects and allocate runs from the code to the different projects. This way they could create projects with each customer’s fastq files and allocate the number of runs for the number of samples. For splitting activation codes, refer to FAQ #1.6. Following splitting, the rest of Option 2 applies. To allocate runs to different projects:
    • There is an option to assign the runs that come with an activation code to different projects.
    • To do this, the user must have the ‘Activation Code User’ security profile assigned.
    • Go to the ‘Activation Codes’ menu and add the new activation code.
    • Open the activation code view by double clicking it.
    • At the bottom of the window you can find a check box ‘Project enabled’.
    • After selecting it, you can see a table with the different pipeline bundles and projects.
    • In the table, you can assign the number of runs for a specific pipeline bundle to the individual projects.
    • Keep in mind that after activating this option, the default for all projects will be zero runs. This means that you have to assign runs to all projects.
  • Option 4: The facilities need to pass on the BlueBee activation code to the customer from the kit they use for the preps along with the fastq files. Here, the customer performs all the BlueBee analysis themselves (connection, upload, download, entering the code, running pipelines). The onus here is on facilities to keep track of the kits used for each customer. If they are splitting kits between customers, then a code exchange can be performed. For splitting activation codes, refer to FAQ #1.6.
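When planning how to distribute runs across projects as described in Option 3, a facility might tally the runs first. The following is a minimal planning sketch, assuming one run per sample. The function name `allocate_runs`, the project names, and the total of 24 runs are hypothetical and for illustration only; the sketch does not interact with the BlueBee platform.

```python
def allocate_runs(total_runs, samples_per_project):
    """Assign one run per sample to each project and report what is left.

    Raises ValueError if the projects need more runs than the code provides.
    """
    needed = sum(samples_per_project.values())
    if needed > total_runs:
        raise ValueError(f"{needed} runs needed, only {total_runs} available")
    allocation = dict(samples_per_project)
    return allocation, total_runs - needed

# Hypothetical example: a code with 24 runs split between two customers.
allocation, unassigned = allocate_runs(24, {"customer_A": 8, "customer_B": 12})
```

Because every project defaults to zero runs once ‘Project enabled’ is selected, each project that should run pipelines needs an explicit entry in such a plan.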

Yes. This is especially useful for core facilities so they can distribute separate codes to each user for BlueBee analysis. In order to split a code into several codes, please do the following:

  1. The code must not yet be activated in order to be exchanged.
  2. Contact support@lexogen.com to acquire the BlueBee Code Exchange file.
  3. Fill in your BlueBee code under the respective kit size – FWD or REV must be specified.
  4. Fill in the rows below for up to 4 new codes, including how many runs each code should contain.
  5. Send the completed form to support@lexogen.com and we will deactivate the original code, re-issue the new codes, paste these into the rows above the new code run numbers entered, and then send the form back to you. Below is an example of how this form looks:

Example of the BlueBee Code Exchange file.

Figure 3 | Example form.

Alex Dobin, author of STAR aligner, wrote: “Briefly, ‘too short’ relates to the best alignment mapped length, not the read length itself”. The actual output is:

UNMAPPED READS:
% of reads unmapped: too many mismatches | 0.00%
% of reads unmapped: too short | 3.36%

A read reported as ‘too short’ aligned to the reference over too short a length, so the match is more likely due to chance than to a true alignment. Therefore, check the following:

  • Per Sequence GC content: check to see if there are any prominent peaks. If so, this could be an indication of contamination in the library.
  • Overrepresented sequences: Check the origin of the top sequences by copying the sequence and pasting into BLASTn. If the sequences do not align to the species used for library preparation, there may be a contamination issue. A common cell culture contaminant is Mycoplasma.
  • RNA-seq library preparation: ensure appropriate QC is performed on final libraries. For additional support, contact support@lexogen.com.
  • Processing data errors: ensure the appropriate species was selected.
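To track these values across many samples, the unmapped-read lines can be pulled out of the log text programmatically. The following is a minimal sketch that parses lines in the format shown above. The function name `unmapped_percentages` and the threshold value are hypothetical; the threshold is for illustration only and is not a Lexogen recommendation.

```python
import re

def unmapped_percentages(log_text):
    """Extract '% of reads unmapped' entries from STAR-style summary text."""
    pattern = re.compile(r"% of reads unmapped:\s*(.+?)\s*\|\s*([\d.]+)%")
    return {reason: float(value) for reason, value in pattern.findall(log_text)}

example = """UNMAPPED READS:
% of reads unmapped: too many mismatches | 0.00%
% of reads unmapped: too short | 3.36%
"""

stats = unmapped_percentages(example)

# Hypothetical threshold, chosen only to demonstrate flagging.
THRESHOLD = 10.0
flagged = [reason for reason, pct in stats.items() if pct > THRESHOLD]
```

Samples that end up in `flagged` would be candidates for the checks listed above.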

Data storage depends on the file type:

  • Fastq files uploaded – if not used for a Merging/Data Analysis Pipeline run: 6 days after upload
  • Fastq files uploaded – used for running Merging pipeline: 6 days after run is completed.
  • Fastq files generated by Merging pipeline (output of merging pipeline): 6 days after run is completed
  • .bam files (output from Data Analysis Pipeline): 1 month (then deleted permanently)
  • Data Analysis pipeline output files (except .bam files): 1 year
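To plan downloads, the retention periods above can be turned into expiry dates. The following is a minimal sketch, assuming that "1 month" is approximated as 30 days and "1 year" as 365 days, and that the periods for output files start when the run completes (the starting point is not stated for all file types). The dictionary keys and the function name `expiry_date` are hypothetical.

```python
from datetime import date, timedelta

# Retention periods from the list above.
# Assumption: 1 month = 30 days, 1 year = 365 days.
RETENTION_DAYS = {
    "uploaded_fastq_unused": 6,   # counted from upload
    "uploaded_fastq_merged": 6,   # counted from run completion
    "merged_fastq_output": 6,     # counted from run completion
    "bam": 30,                    # assumed to count from run completion
    "other_output": 365,          # assumed to count from run completion
}

def expiry_date(file_type, start):
    """Return the date on which a file of the given type is due for deletion."""
    return start + timedelta(days=RETENTION_DAYS[file_type])

bam_expiry = expiry_date("bam", date(2020, 8, 1))
```

Since .bam files are kept for the shortest time among the analysis outputs, they are the first results to download.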

If Mycoplasma RNA is present in a sample used as input for QuantSeq library prep, the following can be observed in data analysis:

  • Reduced overall mapping rates and reduced unique mapping rates to the target genome. In the most severe of cases, only ~30% of the reads can be mapped uniquely.
  • When mapping the reads to Mycoplasma, uniquely mapped reads can be identified. Different Mycoplasma species may be involved, so mapping to several species is advisable. To check sequences, use BLASTn.

The count summary files from the results of the data analysis pipeline on BlueBee contain mapped read counts for Ensembl gene IDs. Ensembl gene IDs can be converted into other gene ID types, or additional annotations added using the BioDBnet tools. Simply input the list of Ensembl gene IDs, select the input format as “Ensembl Gene ID”, and then select the output format(s) desired.

QuantSeq generates inherently low-complexity libraries: reverse transcription is initiated by oligo(dT) priming, which anchors the first strand synthesis to the very 3’ end of the transcript. Furthermore, the protocol yields mean library sizes of about 335 – 456 bp, with mean insert sizes of 203 – 324 bp. Consequently, the great majority of library fragments corresponding to a particular gene come from a region <400 nt immediately upstream of the poly(A) tail. This reduces the likelihood that any given read has unique mapping coordinates, even though the second strand synthesis is initiated by random priming.

Schematic overview of QuantSeq library generation

Figure 4 

As a reference, see the distribution of % Duplicate Reads, as reported by FASTQC, from Lexogen’s internal runs:

Distribution of % Duplicate reads from Lexogen runs

Figure 5

During this promotional period until August 2020, CORALL kits are supplied with a code to access the CORALL data analysis pipelines on the BlueBee Genomics Platform. Each code enables 24 data analysis runs (i.e., 24 .fastq.gz files may be analyzed).

To use your activation code, register with BlueBee and upload your data (.fastq.gz files) to a new project. To execute runs, navigate to your project and go to Runs > Add Run, select the data you wish to analyze and the species from the dropdown list, then click Run.

Data Analysis access is only free using the provided code, for fastq input files up to 5 GB in size. Larger fastq file sizes can only be processed using activation codes for larger file sizes. Additional codes can be purchased from Lexogen’s online webstore.

Analysis pipelines for CORALL are currently available for the following species:

Common name (if available) Species
African Oil Palm Elaeis guineensis
Arabidopsis Arabidopsis halleri
Arabidopsis Thale cress Arabidopsis thaliana
Baker’s Yeast Saccharomyces cerevisiae
Barley Hordeum vulgare
Brain-Eating Amoeba Naegleria fowleri
Chicken Gallus gallus
CHO-K1 Cell Line Cricetulus griseus
Common Yellow Monkeyflower Mimulus guttatus
COVID19 SARS-CoV-2
Cow Bos taurus
Cacao Tree Theobroma cacao criollo
Dog Canis lupus familiaris
Drummond’s Rockcress Boechera stricta
Ferret Mustela putorius furo
Fruit Fly Drosophila melanogaster
Fungus Fusarium oxysporum
Fungus Yarrowia lipolytica
Goat Capra hircus
Human Homo sapiens
Maize/Corn Zea mays
Melon Fly Bactrocera cucurbitae
Mouse Mus musculus
Nematode Roundworm Caenorhabditis elegans
Painted Turtle Chrysemys picta bellii
Pig Sus scrofa
Potato Solanum tuberosum
Purple False Brome Brachypodium distachyon
Rabbit Oryctolagus cuniculus
Rat Rattus norvegicus
Rice Oryza sativa
Salmon Salmo salar
Sorghum Sorghum bicolor
Sponge Amphimedon queenslandica
Starlet Sea Anemone Nematostella vectensis
Tomato Solanum lycopersicum
Water flea Daphnia pulex
Western Balsam Poplar Tree Populus trichocarpa
Yeast Candida albicans, Candida auris, Candida parapsilosis
Yeast Kluyveromyces lactis
Zebrafish Danio rerio
New species can be added upon request. Please contact support@lexogen.com if your species of interest is not listed and for pricing information.

CORALL kits are supplied with an activation code to access the CORALL Data Analysis pipelines on the BlueBee Genomics Platform.

Activation codes for additional runs can be purchased from Lexogen’s online webstore. Activation codes registered after September 20, 2018 are valid for two years. The input file size is limited to 5 GB per fastq(.gz) file. If you have larger input files or for further inquiries, please contact support@lexogen.com.

Below is an overview of the main workflow of the pipeline. The provided pipeline enables kit users to perform read quality control, mapping, Unique Molecular Identifier (UMI) deduplication, and transcript quantification.

009_CORALL_Workflow-Data-Analysis_V0200

Figure 1 | Workflow for the CORALL pipeline.

Additional information can be found on Lexogen’s website under CORALL data analysis.

There are several options:

  • Option 1: The facility runs the analysis on BlueBee (they set up and run the fastq files and any requested DE analysis). Then they share the project with the customer (customers can also create BlueBee accounts and use the system to view/download/upload data, but without activation codes they can’t execute the pipelines themselves). The facility customer with whom the project is shared will need to have a BlueBee account in order to do this. In order to register an account, the customer of the facility will need to have a dummy code issued. Please contact support@lexogen.com for this code.
  • Option 2: The facility creates a project and uploads the fastq files to the project, enters the activation code into the project, and shares the project with full access rights with the customer. The customer can then execute the pipeline runs themselves. This is preferred compared to Option 1 as the customer can then decide which DE analyses to run, and run multiple DE analyses (up to 500 are included with each code). This won’t work if facilities are splitting kits between different customers – only one kit per customer. The facility customer with whom the project is shared will need to have a BlueBee account in order to do this. In order to register an account, the customer of the facility will need to have a dummy code issued. Please contact support@lexogen.com for this code.
  • Option 3: The facility can split activation codes from single kits. To do this, they can create different projects and allocate runs from the code to the different projects. This way they could create projects with each customer’s fastq files and allocate the number of runs for the number of samples. For splitting activation codes, refer to FAQ #2.6. Following splitting, the rest of Option 2 applies. To allocate runs to different projects:
    • There is an option to assign the runs that come with an activation code to different projects.
    • To do this, the user must have the ‘Activation Code User’ security profile assigned.
    • Go to the ‘Activation Codes’ menu and add the new activation code.
    • Open the activation code view by double clicking it.
    • At the bottom of the window you can find a check box ‘Project enabled’.
    • After selecting it, you can see a table with the different pipeline bundles and projects.
    • In the table, you can assign the number of runs for a specific pipeline bundle to the individual projects.
    • Keep in mind that after activating this option, the default for all projects will be zero runs. This means that you have to assign runs to all projects.
  • Option 4: The facilities need to pass on the BlueBee activation code to the customer from the kit they use for the preps along with the fastq files. Here, the customer performs all the BlueBee analysis themselves (connection, upload, download, entering the code, running pipelines). The onus here is on facilities to keep track of the kits used for each customer. If they are splitting kits between customers, then a code exchange can be performed. For splitting activation codes, refer to FAQ #2.6.

Yes. This is especially useful for core facilities so they can distribute separate codes to each user for BlueBee analysis. In order to split a code into several codes, please do the following:

  1. The code must not yet be activated in order to be exchanged.
  2. Contact support@lexogen.com to acquire the BlueBee Code Exchange file.
  3. Fill in your BlueBee code under the respective kit size – FWD or REV must be specified.
  4. Fill in the rows below for up to 4 new codes, including how many runs each code should contain.
  5. Send the completed form to support@lexogen.com and we will deactivate the original code, re-issue the new codes, paste these into the rows above the new code run numbers entered, and then send the form back to you. Below is an example of how this form looks:

Example of the BlueBee Code Exchange file.

Figure 2 | Example form.

Alex Dobin, author of STAR aligner, wrote: “Briefly, ‘too short’ relates to the best alignment mapped length, not the read length itself”. The actual output is:

UNMAPPED READS:
% of reads unmapped: too many mismatches | 0.00%
% of reads unmapped: too short | 3.36%

A read reported as ‘too short’ aligned to the reference over too short a length, so the match is more likely due to chance than to a true alignment. Therefore, check the following:

  • Per Sequence GC content: check to see if there are any prominent peaks. If so, this could be an indication of contamination in the library.
  • Overrepresented sequences: Check the origin of the top sequences by copying the sequence and pasting into BLASTn. If the sequences do not align to the species used for library preparation, there may be a contamination issue. A common cell culture contaminant is Mycoplasma.
  • RNA-seq library preparation: ensure appropriate QC is performed on final libraries. For additional support, contact support@lexogen.com.
  • Processing data errors: ensure the appropriate species was selected.

Data storage depends on the file type:

  • Fastq files uploaded – if not used for a Merging/Data Analysis Pipeline run: 6 days after upload
  • Fastq files uploaded – used for running Merging pipeline: 6 days after run is completed.
  • Fastq files generated by Merging pipeline (output of merging pipeline): 6 days after run is completed
  • .bam files (output from Data Analysis Pipeline): 1 month (then deleted permanently)
  • Data Analysis pipeline output files (except .bam files): 1 year

If Mycoplasma RNA is present in a sample used as input for QuantSeq library prep, the following can be observed in data analysis:

  • Reduced overall mapping rates and reduced unique mapping rates to the target genome. In the most severe of cases, only ~30% of the reads can be mapped uniquely.
  • When mapping the reads to Mycoplasma, uniquely mapped reads can be identified. It may be that different mycoplasma species could be involved so mapping to different species could be advised. To check sequences, use BLASTn.

The count summary files from the results of the data analysis pipeline on BlueBee contain mapped read counts for Ensembl gene IDs. Ensembl gene IDs can be converted into other gene ID types, or additional annotations added using the BioDBnet tools. Simply input the list of Ensembl gene IDs, select the input format as “Ensembl Gene ID”, and then select the output format(s) desired.

The SLAMdunk analysis pipeline is available via the BlueBee Genomics platform. The pipeline is specifically designed for analysis of sequencing data from SLAMseq libraries prepared with the QuantSeq 3’ mRNA-Seq Library Prep Kits. To access SLAMdunk, please purchase a Data Analysis Package from Lexogen’s online webstore to obtain an activation code. Data Analysis Packages are based on total data size (i.e. the size of all .fastq files to be analyzed in a single run). The available package sizes are: 0-2 GB, 0-6 GB, 0-12 GB, 0-25 GB. EXAMPLE: A set of 24 .fastq files has a total size of ~7.5 GB. To analyze this full dataset, you would need the 0-12 GB package.

Each data analysis package can be used for only a single run of the pipeline. Unused GB cannot be carried over to a subsequent run and are discarded. Therefore, all .fastq files for analysis should be uploaded and analyzed in the same run.
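Choosing a package comes down to finding the smallest size that covers the total data of one run. The following is a minimal sketch, assuming that the upper bound of each package is inclusive (this is not stated above). The function name `smallest_package` is hypothetical.

```python
# Package upper bounds in GB, from the sizes listed above.
PACKAGE_SIZES_GB = [2, 6, 12, 25]

def smallest_package(total_gb):
    """Return the smallest package size (GB) covering the total data size.

    Returns None if the data exceeds the largest package.
    Assumption: the upper bound of each package is inclusive.
    """
    for size in PACKAGE_SIZES_GB:
        if total_gb <= size:
            return size
    return None

# The example from the text: 24 .fastq files totaling ~7.5 GB.
package = smallest_package(7.5)
```

Because a package covers only a single run, the total should include every .fastq file intended for that run.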

Although the current SLAMdunk pipelines are specifically implemented for QuantSeq FWD sequencing data, it is also possible to analyze REV data. First, the REV reads need to be trimmed and reverse complemented. After this, the .fastq files can be uploaded to BlueBee and the pipeline is run without trimming. Please contact us at support@lexogen.com for details of the trimming and reverse complement steps.
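The reverse complement step can be sketched as follows. This is a generic illustration and not Lexogen's procedure: it does not cover the trimming step, whose details are available from support. The function names are hypothetical. Note that the quality string is reversed along with the sequence so that each quality value stays with its base.

```python
# Translation table for complementing DNA bases (N stays N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]

def reverse_complement_record(header, seq, plus, qual):
    """Reverse complement one four-line FASTQ record.

    The quality string is reversed (not complemented) to stay aligned
    with the reversed sequence.
    """
    return header, reverse_complement(seq), plus, qual[::-1]

record = reverse_complement_record("@read1", "AACGT", "+", "IIIH#")
```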

SLAMdunk uses .fastq files containing raw sequencing reads and performs the steps indicated below. The results files for all steps are provided including: FASTQC reports, aligned read .bam files, filtered .bam files with reads mapping only to 3’ UTRs, count files containing the T>C-containing and total read counts, plus statistical overview plots (e.g. PCA), and multiqc summaries of the read QC statistics.

SLAMdunk_Figure_01

Figure 1 | Workflow for the SLAMdunk pipeline.

Pipelines are available for human and mouse. Please contact support@lexogen.com if your species of interest is not listed.