Detecting genetic variation and base modifications together in the same single molecules of DNA and RNA at base pair resolution using a magnetic tweezer platform

Zhen Wang, Jérôme Maluenda, Laurène Giraut, Thibault Vieille, Andréas Lefevre, David Salthouse, Gaël Radou, Rémi Moulinas, Sandra Astete-Morales, Pol d’Avezac, Geoff Smith, Charles André, Jean-François Allemand, David Bensimon, Vincent Croquette, Jimmy Ouellet, Gordon Hamilton

bioRxiv, doi:10.1101/2020.04.03.002501

Accurate decoding of nucleic acid variation is important to understand the complexity and regulation of genome function. Here we introduce a single-molecule platform based on magnetic tweezer (MT) technology that can identify and map the positions of sequence variation and multiple base modifications together in the same single molecules of DNA or RNA at single base resolution. Using synthetic templates, we demonstrate that our method can distinguish the most common epigenetic marks on DNA and RNA with high sensitivity, specificity and precision. We also developed a highly specific CRISPR-Cas enrichment strategy to target genomic regions in native DNA without amplification. We then used this method to enrich native DNA from E. coli and characterized the differential levels of adenine and cytosine base modifications together in molecules of up to 5 kb in length. Finally, we enriched the 5′ UTR of FMR1 from cells derived from a Fragile X carrier and precisely measured the repeat expansion length and methylation status of each molecule. These results demonstrate that our platform can detect a variety of genetic, epigenetic and base modification changes concomitantly within the same single molecules.

Features TeloPrime Full-Length cDNA Amplification Kit

The giant sequoia genome and proliferation of disease resistance genes

Alison D. Scott, Aleksey V. Zimin, Daniela Puiu, Rachael Workman, Monica Britton, Sumaira Zaman, Madison Caballero, Andrew C. Read, Adam J. Bogdanove, Emily Burns, Jill Wegrzyn, Winston Timp, Steven L. Salzberg, David B. Neale

bioRxiv, doi:10.1101/2020.03.17.995944

The giant sequoia (Sequoiadendron giganteum) of California are massive, long-lived trees that grow along the U.S. Sierra Nevada mountains. As they grow primarily in isolated groves within a narrow range, conservation of existing trees has been a national goal for over 150 years. Genomic data are limited in giant sequoia, and the assembly and annotation of the first giant sequoia genome has been an important goal to allow marker development for restoration and management. Using Illumina and Oxford Nanopore sequencing combined with Dovetail chromosome conformation capture libraries, 8.125 Gbp of sequence was assembled into eleven chromosome-scale scaffolds. This giant sequoia assembly represents the first genome sequenced in the Cupressaceae family, and lays a foundation for using genomic tools to aid in giant sequoia conservation and management. Beyond conservation and management applications, the giant sequoia assembly is a resource for answering questions about the life history of this enigmatic and robust species. Here we provide an example by taking an inventory of the large and complex family of NLR type disease resistance genes.

Features TeloPrime Full-Length cDNA Amplification Kit

HIV-1 spliced RNAs display transcription start site bias

Jackie M. Esquiaqui, Siahrei Kharytonchyk, Darra Drucker and Alice Telesnitsky

RNA, doi:10.1261/rna.073650.119

HIV-1 transcripts have three fates: to serve as genomic RNAs, unspliced mRNAs, or spliced subgenomic mRNAs. Recent structural studies have shown that sequences near the 5′ end of HIV-1 RNA can adopt at least two alternate 3-dimensional conformations, and that these structures dictate genome vs. unspliced mRNA fates. HIV-1’s use of alternate transcription start sites can influence which RNA conformer is generated, and this choice in turn dictates the fate of the unspliced RNA. The structural context of HIV-1’s major 5′ splice site differs in these two RNA conformers, suggesting that the conformers may differ in their ability to support HIV-1 splicing events. Here we tested the hypothesis that transcription start sites that shift the RNA monomer/dimer structural equilibrium away from the splice site sequestering dimer-competent fold would favor splicing. Consistent with this hypothesis, the results showed that the 5′ ends of spliced HIV-1 RNAs were enriched in 3GCap structures and depleted of 1GCap RNAs relative to the total intracellular RNA population. These findings expand the functional significance of HIV-1 RNA structural dynamics by demonstrating roles for RNA structure in defining all three classes of HIV-1 RNAs, and suggest that HIV-1 transcription start site choice initiates a cascade of molecular events that dictate the fates of nascent HIV-1 RNAs.

Features TeloPrime Full-Length cDNA Amplification Kit

Cap homeostasis is the cyclical process of decapping and recapping that maintains the translation and stability of a subset of the transcriptome. Previous work showed levels of some recapping targets decline following transient expression of an inactive form of RNMT (ΔN-RNMT), likely due to degradation of mRNAs with improperly methylated caps. The current study examined transcriptome-wide changes following inhibition of cytoplasmic cap methylation. This identified mRNAs with 5′-terminal oligopyrimidine (TOP) sequences as the largest single class of recapping targets. Cap end mapping of several TOP mRNAs identified recapping events at native 5′ ends and downstream of the TOP sequence of EIF3K and EIF3D. This provides the first direct evidence for downstream recapping. Inhibition of cytoplasmic cap methylation was also associated with mRNA abundance increases for a number of transcription, splicing, and 3′ processing factors. Previous work suggested a role for alternative polyadenylation in target selection, but this proved not to be the case. However, inhibition of cytoplasmic cap methylation resulted in a shift of upstream polyadenylation sites to annotated 3′ ends. Together, these results solidify cap homeostasis as a fundamental process of gene expression control and show cytoplasmic recapping can impact regulatory elements present at the ends of mRNA molecules.

Features TeloPrime Full-Length cDNA Amplification Kit and QuantSeq 3’ mRNA-Seq Library Prep Kit REV for Illumina

Long-read Assays Shed New Light on the Transcriptome Complexity of a Viral Pathogen and on Virus-Host Interaction

Dóra Tombácz, István Prazsák, Zoltán Maróti, Norbert Moldován, Zsolt Csabai, Zsolt Balázs, Béla Dénes, Tibor Kalmár, Michael Snyder, Zsolt Boldogkői

bioRxiv, doi:10.1101/2020.01.27.921056

Characterization of global transcriptomes using conventional short-read sequencing is challenging because of the insensitivity of these platforms to transcript isoforms, multigenic RNA molecules, and transcriptional overlaps. Long-read sequencing (LRS) can overcome these limitations by reading full-length transcripts. Employment of these technologies has led to the redefinition of transcriptional complexities in reported organisms. In this study, we applied LRS platforms from Pacific Biosciences and Oxford Nanopore Technologies to profile the dynamic vaccinia virus (VACV) transcriptome and assess the effect of viral infection on host gene expression. We performed cDNA and direct RNA sequencing analyses and revealed an extremely complex transcriptional landscape of this virus. In particular, VACV genes produce large numbers of transcript isoforms that vary in their start and termination sites. A significant fraction of VACV transcripts start or end within coding regions of neighboring genes. We distinguished five classes of host genes according to their temporal responses to viral infection. This study provides novel insights into the transcriptomic profile of a viral pathogen and the effect of the virus on host gene expression.

Features TeloPrime Full-Length cDNA Amplification Kit

Dual-initiation promoters with intertwined canonical and TCT/TOP transcription start sites diversify transcript processing

Chirag Nepal, Yavor Hadzhiev, Piotr Balwierz, Estefanía Tarifeño-Saldivia, Ryan Cardenas, Joseph W. Wragg, Ana-Maria Suzuki, Piero Carninci, Bernard Peers, Boris Lenhard, Jesper B. Andersen & Ferenc Müller

Nature Communications, doi:10.1038/s41467-019-13687-0

Variations in transcription start site (TSS) selection reflect diversity of preinitiation complexes and can impact on post-transcriptional RNA fates. Most metazoan polymerase II-transcribed genes carry canonical initiation with pyrimidine/purine (YR) dinucleotide, while translation machinery-associated genes carry polypyrimidine initiator (5’-TOP or TCT). By addressing the developmental regulation of TSS selection in zebrafish we uncovered a class of dual-initiation promoters in thousands of genes, including snoRNA host genes. 5’-TOP/TCT initiation is intertwined with canonical initiation and used divergently in hundreds of dual-initiation promoters during maternal to zygotic transition. Dual-initiation in snoRNA host genes selectively generates host and snoRNA with often different spatio-temporal expression. Dual-initiation promoters are pervasive in human and fruit fly, reflecting evolutionary conservation. We propose that dual-initiation on shared promoters represents a composite promoter architecture, which can function both coordinately and divergently to diversify RNAs.

Features TeloPrime Full-Length cDNA Amplification Kit

Template-switching artifacts resemble alternative polyadenylation

Zsolt Balázs, Dóra Tombácz, Zsolt Csabai, Norbert Moldován, Michael Snyder & Zsolt Boldogkői

BMC Genomics, doi:10.1186/s12864-019-6199-7


Alternative polyadenylation is commonly examined using cDNA sequencing, which is known to be affected by template-switching artifacts. However, the effects of such template-switching artifacts on alternative polyadenylation are generally disregarded, while alternative polyadenylation artifacts are attributed to internal priming.


Here, we analyzed both long-read cDNA sequencing and direct RNA sequencing data of two organisms, generated by different sequencing platforms. We developed a filtering algorithm which takes into consideration that template-switching can be a source of artifactual polyadenylation when filtering out spurious polyadenylation sites. The algorithm outperformed the conventional internal priming filters based on comparison to direct RNA sequencing data. We also showed that the polyadenylation artifacts arise in cDNA sequencing at consecutive stretches of as few as three adenines. There was no substantial difference between the lengths of poly(A) tails at the artifactual and the true transcriptional end sites even though it is expected that internal priming artifacts have shorter poly(A) tails than genuine polyadenylated reads.


Our findings suggest that template switching plays an important role in the generation of spurious polyadenylation and support the need for more rigorous filtering of artifactual polyadenylation sites in cDNA data, or that alternative polyadenylation should be annotated using native RNA sequencing.

Features TeloPrime Full-Length cDNA Amplification Kit

High-quality chromosome-scale assembly of the walnut (Juglans regia L) reference genome

Annarita Marrano, Monica Britton, Paulo A. Zaini, Aleksey V. Zimin, Rachael E. Workman, Daniela Puiu, Luca Bianco, Erica Adele Di Pierro, Brian J. Allen, Sandeep Chakraborty, Michela Troggio, Charles A. Leslie, Winston Timp, Abhaya Dandekar, Steven L. Salzberg, David B. Neale

bioRxiv, doi:10.1101/809798

The release of the first reference genome of walnut (Juglans regia L.) enabled many achievements in the characterization of walnut genetic and functional variation. However, it is highly fragmented, preventing the integration of genetic, transcriptomic, and proteomic information to fully elucidate walnut biological processes. Here we report the new chromosome-scale assembly of the walnut reference genome (Chandler v2.0) obtained by combining Oxford Nanopore long-read sequencing with chromosome conformation capture (Hi-C) technology. Relative to the previous reference genome, the new assembly features an 84.4-fold increase in N50 size, and the full sequence of all 16 chromosomal pseudomolecules, nine of which present telomere sequences at both ends. Using full-length transcripts from single-molecule real-time sequencing, we predicted 40,491 gene models, with a mean gene length higher than the previous gene annotations. Most of the new protein-coding genes (90%) are full-length, which represents a significant improvement compared to Chandler v1.0 (only 48%). We then tested the potential impact of the new chromosome-level genome on different areas of walnut research. By studying the proteome changes occurring during catkin development, we observed that the virtual proteome obtained from Chandler v2.0 presents fewer artifacts than the previous reference genome, enabling the identification of a new potential pollen allergen in walnut. Also, the new chromosome-scale genome facilitates in-depth studies of intraspecies genetic diversity by revealing previously undetected autozygous regions in Chandler, likely resulting from inbreeding, and 195 genomic regions highly differentiated between Western and Eastern walnut cultivars. Overall, Chandler v2.0 is a valuable resource to understand and explore walnut biology better.

Features TeloPrime Full-Length cDNA Amplification Kit

Structural rearrangements drive extensive genome divergence between symbiotic and free-living Symbiodinium

Raúl A. González-Pech, Timothy G. Stephens, Yibi Chen, Amin R. Mohamed, Yuanyuan Cheng, David W. Burt, Debashish Bhattacharya, Mark A. Ragan, Cheong Xin Chan

bioRxiv, doi:10.1101/783902

Symbiodiniaceae are predominantly symbiotic dinoflagellates critical to corals and other reef organisms. Symbiodinium is a basal symbiodiniacean lineage and includes symbiotic and free-living taxa. However, the molecular mechanisms underpinning these distinct lifestyles remain little known. Here, we present high-quality de novo genome assemblies for the symbiotic Symbiodinium tridacnidorum CCMP2592 (genome size 1.3 Gbp) and the free-living Symbiodinium natans CCMP2548 (genome size 0.74 Gbp). These genomes display extensive sequence divergence, sharing only ~1.5% conserved regions (≥90% identity). We predicted 45,474 and 35,270 genes for S. tridacnidorum and S. natans, respectively; of the 58,541 homologous gene families, 28.5% are common to both genomes. We recovered a greater extent of gene duplication and higher abundance of repeats, transposable elements and pseudogenes in the genome of S. tridacnidorum than in that of S. natans. These findings demonstrate that genome structural rearrangements are pertinent to distinct lifestyles in Symbiodinium, and may contribute to the vast genetic diversity within the genus, and more broadly in Symbiodiniaceae. Moreover, the results from our whole-genome comparisons against a free-living outgroup support the notion that the symbiotic lifestyle is a derived trait in, and that the free-living lifestyle is ancestral to, Symbiodinium.

Features TeloPrime Full-Length cDNA Amplification Kit

Illuminating the dark side of the human transcriptome with TAMA Iso-Seq analysis

Richard I. Kuo, Yuanyuan Cheng, Jacqueline Smith, Alan L. Archibald, David W. Burt

bioRxiv, doi:10.1101/780015

The human transcriptome is one of the most well-annotated of the eukaryotic species. However, limitations in technology biased discovery toward protein coding spliced genes. Accurate high throughput long read RNA sequencing now has the potential to investigate genes that were previously undetectable. Using our Transcriptome Annotation by Modular Algorithms (TAMA) tool kit to analyze the Pacific Biosciences Universal Human Reference RNA Sequel II Iso-Seq dataset, we discovered thousands of potential novel genes and identified challenges in both RNA preparation and long read data processing that have major implications for transcriptome annotation.

Features TeloPrime Full-Length cDNA Amplification Kit

Polarella glacialis genomes encode tandem repeats of single-exon genes with functions critical to adaptation of dinoflagellates

Timothy G. Stephens, Raúl A. González-Pech, Yuanyuan Cheng, Amin R. Mohamed, David W. Burt, Debashish Bhattacharya, Mark A. Ragan, Cheong Xin Chan

bioRxiv, doi:10.1101/704437

Dinoflagellates are taxonomically diverse, ecologically important phytoplankton in marine and freshwater environments. Here, we present two draft diploid genome assemblies of the free-living dinoflagellate Polarella glacialis, isolated from the Arctic and Antarctica. For each genome, guided using full-length transcriptome data, we predicted >50,000 high-quality genes. About 68% of the genome is repetitive sequence; long terminal repeats likely contribute to intra-species structural divergence and distinct genome sizes (3.0 and 2.7 Gbp). Of all genes, ∼40% are encoded unidirectionally, ∼25% comprised of single exons. Multi-genome comparison unveiled genes specific to P. glacialis and a common, putatively bacterial, origin of ice-binding domains in cold-adapted dinoflagellates. Our results elucidate how selection acts within the context of a complex genome structure to facilitate local adaptation. Since most dinoflagellate genes are constitutively expressed, Polarella glacialis has enhanced transcriptional responses via unidirectional, tandem duplication of single-exon genes that encode functions critical to survival in cold, low-light environments.

Features TeloPrime Full-Length cDNA Amplification Kit


Novel antimicrobial treatments are urgently needed. Previous work has shown that the mucus of the brown garden snail (Cornu aspersum) has antimicrobial properties, in particular against type culture collection strains of Pseudomonas aeruginosa. We hypothesised that it would also be effective against clinical isolates of the bacterium and that investigation of fractions of the mucus would identify one or more proteins with anti-pseudomonal properties, which could be further characterised.

Materials and methods

Mucus was extracted from snails collected from the wild. Antimicrobial activity against laboratory and clinical isolates of Ps. aeruginosa was determined in disc diffusion assays. Mucus was purified using size exclusion chromatography and fractions containing anti-pseudomonal activity identified. Mass spectroscopy and high performance liquid chromatography analysis of these fractions yielded partial peptide sequences. These were used to interrogate an RNA transcriptome generated from whole snails.


Mucus from C. aspersum inhibited growth of type collection strains and clinical isolates of Ps. aeruginosa. Four novel C. aspersum proteins were identified; at least three are likely to have antimicrobial properties. The most interesting is a 37.4 kDa protein, whilst two smaller proteins, one of 17.5 kDa and one of 18.6 kDa, also appear to have activity against Ps. aeruginosa.


The study has identified novel proteins with antimicrobial properties which could be used to develop treatments for use in human medicine.

Features TeloPrime Full-Length cDNA Amplification Kit

Transcriptome profiling of mouse samples using nanopore sequencing of cDNA and RNA molecules

Camille Sessegolo, Corinne Cruaud, Corinne Da Silva, Marion Dubarry, Thomas Derrien, Vincent Lacroix, Jean-Marc Aury

bioRxiv, doi:10.1101/575142


Our vision of DNA transcription and splicing has changed dramatically with the introduction of short-read sequencing. These high-throughput sequencing technologies promised to unravel the complexity of any transcriptome. Generally, gene expression levels are well captured using these technologies, but caveats remain due to the limited read length and the fact that RNA molecules have to be reverse transcribed before sequencing. Oxford Nanopore Technologies has recently launched a portable sequencer which offers the possibility of sequencing long reads and, most importantly, RNA molecules.


Here we generated a full mouse transcriptome from brain and liver using the Oxford Nanopore device. As a comparison, we sequenced RNA (RNA-Seq) and cDNA (cDNA-Seq) molecules using both long- and short-read technologies. In addition, we tested the TeloPrime preparation kit, dedicated to the enrichment of full-length transcripts.


Using spike-in data, we confirmed that expression levels are efficiently captured by cDNA-Seq using short reads. More importantly, Oxford Nanopore RNA-Seq tends to be more efficient, while cDNA-Seq appears to be more biased. We further show that the cDNA library preparation of the Nanopore protocol induces read truncation for transcripts containing stretches of A’s. Furthermore, bioinformatics challenges remain ahead for quantifying at the transcript level, especially when reads are not full-length. Accurate quantification of processed pseudogenes also remains difficult, and we show that current mapping protocols which map reads to the genome largely over-estimate their expression, at the expense of their parent gene.

Features TeloPrime Full-Length cDNA Amplification Kit and SIRVs (Spike-in RNA Variant Control Mixes) – SIRV-Set 1

Long-read sequencing uncovers a complex transcriptome topology in varicella zoster virus

István Prazsák, Norbert Moldován, Zsolt Balázs, Dóra Tombácz, Klára Megyeri, Attila Szűcs, Zsolt Csabai and Zsolt Boldogkői

BMC Genomics, doi:10.1186/s12864-018-5267-8

Varicella zoster virus (VZV) is a human pathogenic alphaherpesvirus harboring a relatively large DNA molecule. The VZV transcriptome has already been analyzed by microarray and short-read sequencing analyses. However, both approaches have substantial limitations when used for structural characterization of transcript isoforms, even if supplemented with primer extension or other techniques. Among others, they are inefficient in distinguishing between embedded RNA molecules, transcript isoforms, including splice and length variants, as well as between alternative polycistronic transcripts. It has been demonstrated in several studies that long-read sequencing is able to circumvent these problems.

In this work, we report the analysis of the VZV lytic transcriptome using the Oxford Nanopore Technologies sequencing platform. These investigations have led to the identification of 114 novel transcripts, including mRNAs, non-coding RNAs, polycistronic RNAs and complex transcripts, as well as 10 novel spliced transcripts and 25 novel transcription start site isoforms and transcription end site isoforms. A novel class of transcripts, the nroRNAs, is described in this study. These transcripts are encoded by the genomic region located in close vicinity to the viral replication origin. We also show that ORF63 exhibits a complex structural variation encompassing the splice sites of VZV latency transcripts. Additionally, we have detected RNA editing in a novel non-coding RNA molecule.

Our investigations disclosed a composite transcriptomic architecture of VZV, including the discovery of novel RNA molecules and transcript isoforms, as well as a complex meshwork of transcriptional read-throughs and overlaps. The results represent a substantial advance in the annotation of the VZV transcriptome and in understanding the molecular biology of the herpesviruses in general.

Features TeloPrime Full-Length cDNA Amplification Kit

High-throughput short-read sequencing has revolutionized how transcriptomes are quantified and annotated. However, while Illumina short-read sequencers can be used to analyze entire transcriptomes down to the level of individual splicing events with great accuracy, they fall short of analyzing how these individual events are combined into complete RNA transcript isoforms. Because of this shortfall, long-distance information is required to complement short-read sequencing to analyze transcriptomes on the level of full-length RNA transcript isoforms. While long-read sequencing technology can provide this long-distance information, there are issues with both Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT) long-read sequencing technologies that prevent their widespread adoption. Briefly, PacBio sequencers produce low numbers of reads with high accuracy, while ONT sequencers produce higher numbers of reads with lower accuracy. Here, we introduce and validate a long-read ONT-based sequencing method. At the same cost, our Rolling Circle Amplification to Concatemeric Consensus (R2C2) method generates more accurate reads of full-length RNA transcript isoforms than any other available long-read sequencing method. These reads can then be used to generate isoform-level transcriptomes for both genome annotation and differential expression analysis in bulk or single-cell samples.

Features TeloPrime Full-Length cDNA Amplification Kit

RNA-sequencing has revolutionized transcriptomics and the way we measure gene expression (Wang et al., 2009). As of today, short-read RNA sequencing is more widely used, and due to its low price and high throughput, is the preferred tool for the quantitative analysis of gene expression. However, the annotation of transcript isoforms is rather difficult using only short-read sequencing data, because the reads are shorter than most transcripts (Steijger et al., 2013). Long-read sequencing, on the other hand, can provide full contig information about transcripts, including exon-connectivity, and its merits in transcriptome profiling are being increasingly acknowledged (Sharon et al., 2013; Abdel-Ghany et al., 2016; Wang et al., 2016; Kuo et al., 2017). Due to the relatively low throughput of current long-read sequencing technologies, they can only characterize smaller transcriptomes in high-depth (Weirather et al., 2017).

Human cytomegalovirus (HCMV) is a ubiquitous betaherpesvirus, which can cause mononucleosis-like symptoms in adults (Cohen and Corey, 1985), and severe life-threatening infections in newborns (Wen et al., 2002). Latent HCMV infection has recently been implicated in cancer formation (Dziurzynski et al., 2012; Jin et al., 2014). Examining the transcriptome of the virus can go a long way in helping understand its molecular biology. Short-read RNA sequencing studies have discovered splice junctions and non-coding transcripts (Gatherer et al., 2011) and have shown that the most abundant HCMV transcripts are similarly expressed in different cell types (Cheng et al., 2017). Our long-read RNA sequencing experiments using the Pacific Biosciences (PacBio) RSII platform revealed a great number of transcript isoforms, polycistronic RNAs and transcriptional overlaps (Balázs et al., 2017a).

Here, we present the dual-platform long-read RNA sequencing dataset of two HCMV-infected fibroblast samples. We have sequenced the same RNA population that we previously sequenced with the PacBio RSII platform (Balázs et al., 2017b), but now using the PacBio Sequel and Oxford Nanopore Technologies (ONT) MinION platforms. These data, apart from providing a more profound picture of the lytic HCMV transcriptome, can also be used to compare the current technologies. A further sample was prepared using lytic HCMV RNAs. This sample was subjected to ONT Cap-selected cDNA sequencing (Cap-Seq) in order to allow better characterization of the transcription start sites, and also to direct (d)RNA sequencing in order to avoid reverse-transcription (RT) and PCR artifacts. We report the sequencing of approximately 100 GB of raw data (Supplementary Table 1). Cap-Seq on the MinION platform yielded the highest read count, whereas the throughputs of the Sequel platform and of ONT dRNA sequencing both lagged behind (summarized in Figure 1A); both technologies nonetheless offer significant benefits. The Sequel platform is more accurate, and dRNA sequencing is free of RT and PCR artifacts. The read length distribution shows that the Sequel platform has a similar molecule-size preference to the RSII platform, while the MinION platform sequences more short reads (Figure 1B). The length distribution of the non-cap-selected cDNA sequencing reads differs from that of the other ONT reads because this library was size-selected (>500 nt).

Features TeloPrime Full-Length cDNA Amplification Kit

Comparative genome analysis of programmed DNA elimination in nematodes

Jianbin Wang, Shenghan Gao, Yulia Mostovoy, Yuanyuan Kang, Maxim Zagoskin, Yongqiao Sun, Bing Zhang, Laura K. White, Alice Easton, Thomas B. Nutman, Pui-Yan Kwok, Songnian Hu, Martin K. Nielsen and Richard E. Davis

Genome Research, doi:10.1101/gr.225730.117

Programmed DNA elimination is a developmentally regulated process leading to the reproducible loss of specific genomic sequences. DNA elimination occurs in unicellular ciliates and a variety of metazoans, including invertebrates and vertebrates. In metazoa, DNA elimination typically occurs in somatic cells during early development, leaving the germline genome intact. Reference genomes for metazoa that undergo DNA elimination are not available. Here, we generated germline and somatic reference genome sequences of the DNA eliminating pig parasitic nematode Ascaris suum and the horse parasite Parascaris univalens. In addition, we carried out in-depth analyses of DNA elimination in the parasitic nematode of humans, Ascaris lumbricoides, and the parasitic nematode of dogs, Toxocara canis. Our analysis of nematode DNA elimination reveals that in all species, repetitive sequences (that differ among the genera) and germline-expressed genes (approximately 1000–2000 or 5%–10% of the genes) are eliminated. Thirty-five percent of these eliminated genes are conserved among these nematodes, defining a core set of eliminated genes that are preferentially expressed during spermatogenesis. Our analysis supports the view that DNA elimination in nematodes silences germline-expressed genes. Over half of the chromosome break sites are conserved between Ascaris and Parascaris, whereas only 10% are conserved in the more divergent T. canis. Analysis of the chromosomal breakage regions suggests a sequence-independent mechanism for DNA breakage followed by telomere healing, with the formation of more accessible chromatin in the break regions prior to DNA elimination. Our genome assemblies and annotations also provide comprehensive resources for analysis of DNA elimination, parasitology research, and comparative nematode genome and epigenome studies.

Features TeloPrime Full-Length cDNA Amplification Kit

Transcriptomic study of Herpes simplex virus type-1 using full-length sequencing techniques

Zsolt Boldogkői, Attila Szűcs, Zsolt Balázs, Donald Sharon, Michael Snyder & Dóra Tombácz

Scientific Data, Article number: 180266 (2018)

Herpes simplex virus type-1 (HSV-1) is a human pathogenic member of the Alphaherpesvirinae subfamily of herpesviruses. The HSV-1 genome is a large double-stranded DNA specifying about 85 protein coding genes. The latest surveys have demonstrated that the HSV-1 transcriptome is much more complex than previously thought. Here, we provide a long-read sequencing dataset, which was generated by using the RSII and Sequel systems from Pacific Biosciences (PacBio), as well as the MinION sequencing system from Oxford Nanopore Technologies (ONT). This dataset contains 39,096 reads of inserts (ROIs) mapped to the HSV-1 genome (X14112) in RSII sequencing, while Sequel sequencing yielded 77,851 ROIs. The MinION cDNA sequencing altogether resulted in 158,653 reads, while the direct RNA-seq produced 16,516 reads. This dataset can be utilized for the identification of novel HSV RNAs and transcript isoforms, as well as for the comparison of the quality and length of the sequencing reads derived from the currently available long-read sequencing platforms. The various library preparation approaches can also be compared with each other.

Features TeloPrime Full-Length cDNA Amplification Kit

Lytic Transcriptome Dataset of Varicella Zoster Virus Generated by Long-read Sequencing

Dóra Tombácz, Donald Sharon, Attila Szűcs, Norbert Moldován, Michael Snyder, Zsolt Boldogkői

Frontiers in Genetics, doi: 10.3389/fgene.2018.00460


Varicella zoster virus (VZV) belongs to the Alphaherpesvirinae subfamily of the Herpesviridae family. It is the etiological agent of chickenpox (varicella) caused by primary infection and shingles (zoster), which is due to reactivation of the virus from latency (Kennedy, 2002). Many countries have adopted recommendations for routine immunization of children and susceptible adults against VZV. The VZV virion is composed of an icosahedral nucleocapsid surrounded by a tegument layer, which is covered by an envelope derived from the host cell membrane with incorporated viral glycoproteins (Maresova et al., 2005). The genome of VZV consists of a linear double-stranded DNA molecule and is approximately 125-kbp in size, which contains more than 70 annotated open reading frames (ORFs) (Tyler et al., 2007). The transcription of the virus is strictly regulated by cascade-like processes. First, the immediate-early (IE) transcripts are expressed, which is then followed by the expression of the early (E), and then the late (L) kinetic classes of transcripts (Reichelt et al., 2009). The IE ORF62 gene of VZV encodes the major transactivator, which controls the expression of other viral genes. The viral E genes encode proteins that are used in DNA replication, while L genes code for the structural elements of the virus.
High-throughput short-read sequencing (SRS) techniques have revolutionized transcriptome research (Delseny et al., 2010). These techniques have also been utilized in the investigation of herpesvirus gene expression (e.g. Chambers et al., 1999; Ebrahimi et al., 2003; Baird et al., 2014; Oláh et al., 2015). However, the SRS approach has severe limitations in comparison to long-read sequencing (LRS), including Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT) platforms. LRS techniques have been used before in transcriptome studies of the herpesviruses (Tombácz et al., 2016; O’Grady et al., 2016; Tombácz et al., 2017; Balázs et al., 2017a; 2017b; Moldován et al., 2018). These studies uncovered a very complex transcriptome, which included the identification of a large number of novel RNA molecules and transcript isoforms (Tombácz et al., 2015; Tombácz et al., 2017; Balázs et al., 2017a). Moreover, an extended meshwork of overlaps between the transcripts was also detected by these studies (Tombácz et al., 2016; Moldován et al., 2018).
The presented data report is aimed toward providing a new, comprehensive transcript catalog of VZV using an LRS approach for the first time. In this study, we applied the ONT MinION device and various full-length cDNA sequencing protocols that capture the entire poly(A)-transcriptome of VZV.

Features TeloPrime Full-Length cDNA Amplification Kit

Transcriptome-wide survey of pseudorabies virus using next- and third-generation sequencing platforms

Dóra Tombácz, Donald Sharon, Attila Szűcs, Norbert Moldován, Michael Snyder & Zsolt Boldogkői

Scientific Data, doi:10.1038/sdata.2018.119

Pseudorabies virus (PRV) is an alphaherpesvirus of swine. PRV has a large double-stranded DNA genome and, as the latest investigations have revealed, a very complex transcriptome. Here, we present a large RNA-Seq dataset, derived from both short- and long-read sequencing. The dataset contains 1.3 million 100 bp paired-end reads that were obtained from the Illumina random-primed libraries, as well as 10 million 50 bp single-end reads generated by the Illumina polyA-seq. The Pacific Biosciences RSII non-amplified method yielded 57,021 reads of inserts (ROIs) aligned to the viral genome, the amplified method resulted in 158,396 PRV-specific ROIs, while we obtained 12,555 ROIs using the Sequel platform. Oxford Nanopore’s MinION device generated 44,006 reads using the regular cDNA-sequencing method, whereas 29,832 and 120,394 reads were produced by using the direct RNA-sequencing and the Cap-selection protocols, respectively. The raw reads were aligned to the PRV reference genome (KJ717942.1). The dataset can be used to compare different sequencing approaches and library preparation methods, as well as to validate and test bioinformatic pipelines.

Features TeloPrime Full-Length cDNA Amplification Kit

Multi-Platform Sequencing Approach Reveals a Novel Transcriptome Profile in Pseudorabies Virus

Norbert Moldován, Dóra Tombácz, Attila Szűcs, Zsolt Csabai, Michael Snyder and Zsolt Boldogkői

Frontiers in Microbiology, doi:10.3389/fmicb.2017.02708

Third-generation sequencing is an emerging technology that is capable of solving several problems that earlier approaches were not able to, including the identification of transcript isoforms and overlapping transcripts. In this study, we used long-read sequencing for the analysis of the pseudorabies virus (PRV) transcriptome, including Oxford Nanopore Technologies MinION, PacBio RS-II, and Illumina HiScanSQ platforms. We also used data from our previous short-read and long-read sequencing studies for the comparison of the results and in order to confirm the obtained data. Our investigations identified 19 formerly unknown putative protein-coding genes, all of which are 5′ truncated forms of earlier annotated longer PRV genes. Additionally, we detected 19 non-coding RNAs, including 5′ and 3′ truncated transcripts without in-frame ORFs, antisense RNAs, as well as RNA molecules encoded by those parts of the viral genome where no transcription had been detected before. This study has also led to the identification of three complex transcripts and 50 distinct length isoforms, including transcription start and end variants. We also detected 121 novel transcript overlaps, and two transcripts that overlap the replication origins of PRV. Furthermore, in silico analysis revealed 145 upstream ORFs, many of which are located on the longer 5′ isoforms of the transcripts.

Features TeloPrime Full-Length cDNA Amplification Kit

This doctoral thesis consists of two parts: The first part describes a global survey of cis-regulatory divergence in mammalian translation, where I applied mRNA sequencing and deep sequencing-based polysome profiling to quantify translational efficiency in F1 hybrid mice. The F1 progeny between Mus musculus C57BL/6J and Mus spretus SPRET/EiJ was chosen as a model system because the two have the largest number of genetic variants among all mouse strains with high-quality genome assemblies available. This large genomic divergence 1) provides a large number of potential regulatory variants between the two strains and 2) enables a sequencing-based approach to distinguish allelic RNA transcripts. The high quality of the data was demonstrated by employing two independent validation approaches, PacBio full-length sequencing and ribosome profiling. In total, 1008 genes (14.1%) were identified as exhibiting a significant allelic difference in translational efficiency. Several sequence features were associated with the observed allelic divergence in translation, including local RNA secondary structure near the start codon and proximal out-of-frame upstream AUGs. Finally, cis-effects are quantitatively comparable between transcriptional and translational regulation and these effects are more frequently compensatory between the two processes, suggesting a role of the translational regulation in buffering transcriptional noise and thereby maintaining the robustness of protein expression.

In the second part, I developed a novel technology, CAPTRE, to measure the translational status of distinct mRNA TL isoforms. In mouse fibroblasts, a total of 22,357 TSSs derived from 10,875 protein-coding genes were identified. Among 4153 genes expressing multiple TSSs, 745 exhibited a significant TE difference between their alternative TL isoforms. Longer isoforms were more frequently associated with lower TE, and the global impact of several regulatory elements was also revisited, such as uORFs, cap-adjacent stable RNA secondary structures and the 5′-terminal oligopyrimidine tract. In addition, several novel sequence motifs that can affect translation activity were identified and their effect was validated using two reporter systems. Finally, quantitative models combining different features identified in this study explained approximately 60% of the variance of the TE difference observed between TL isoforms.
This study provides novel mechanistic insights into translational regulation and characterizes the potential coupling between translational and transcriptional regulation in mammalian cells.

Features TeloPrime Full-Length cDNA Amplification Kit

Thyroglobulin Represents a Novel Molecular Architecture of Vertebrates

Guillaume Holzer, Yoshiaki Morishita, Jean-Baptiste Fini, Thibault Lorin, Benjamin Gillet, Sandrine Hughes, Marie Tohmé, Gilbert Deléage, Barbara Demeneix, Peter Arvan and Vincent Laudet

JBC.M116.719047. doi: 10.1074/jbc.M116.719047

Thyroid hormones modulate multiple functions not only in vertebrates (energy metabolism, central nervous system function, seasonal changes in physiology and behavior), but also in some non-vertebrates, where they control critical post-embryonic developmental transitions such as metamorphosis. Despite the obvious biological importance of these hormones, the thyroid hormone precursor protein, thyroglobulin (Tg), has been experimentally investigated only in mammals. This may bias our view of how thyroid hormones are produced in other organisms. In this study, we searched genomic databases and found Tg orthologs in all vertebrates including the sea lamprey (Petromyzon marinus). We cloned a full-size Tg coding sequence from western clawed frog (Xenopus tropicalis) and zebrafish (Danio rerio). Comparisons between the representative mammal, amphibian, teleost fish, and basal vertebrate indicate that all of the different domains of Tg, as well as Tg regional structure, are conserved throughout the vertebrates. Indeed, in Xenopus, zebrafish and lamprey Tgs, key residues, including the hormonogenic tyrosines and the disulfide bond-forming cysteines critical for Tg function are well conserved, despite overall divergence of amino acid sequences. We uncovered upstream sequences that include start codons of zebrafish and Xenopus Tgs, and experimentally proved that these are full-length secreted proteins, which are specifically recognized by antibodies against rat Tg. By contrast, we have not been able to find any orthologs of Tg among non-vertebrate species. Thus, Tg appears to be a novel protein elaborated as a single event at the base of vertebrates and virtually unchanged thereafter.

Features TeloPrime Full-Length cDNA Amplification Kit

cDNA Library Enrichment of Full Length Transcripts for SMRT Long Read Sequencing

Maria Cartolano, Bruno Huettel, Benjamin Hartwig, Richard Reinhardt, Korbinian Schneeberger

PLoS ONE 11(6):e0157779. doi:10.1371/journal.pone.0157779

The utility of genome assemblies does not only rely on the quality of the assembled genome sequence, but also on the quality of the gene annotations. The Pacific Biosciences Iso-Seq technology is a powerful support for accurate eukaryotic gene model annotation as it allows for direct readout of full-length cDNA sequences without the need for noisy short read-based transcript assembly. We propose the implementation of the TeloPrime Full Length cDNA Amplification kit to the Pacific Biosciences Iso-Seq technology in order to enrich for genuine full-length transcripts in the cDNA libraries. We provide evidence that TeloPrime outperforms the commonly used SMARTer PCR cDNA Synthesis Kit in identifying transcription start and end sites in Arabidopsis thaliana. Furthermore, we show that TeloPrime-based Pacific Biosciences Iso-Seq can be successfully applied to the polyploid genome of bread wheat (Triticum aestivum) not only to efficiently annotate gene models, but also to identify novel transcription sites, gene homeologs, splicing isoforms and previously unidentified gene loci.

Features TeloPrime Full-Length cDNA Amplification Kit

Transcription initiated at alternative sites can produce mRNA isoforms with different 5ʹUTRs, which are potentially subjected to differential translational regulation. However, the prevalence of such isoform‐specific translational control across mammalian genomes is currently unknown. By combining polysome profiling with high‐throughput mRNA 5ʹ end sequencing, we directly measured the translational status of mRNA isoforms with distinct start sites. Among 9,951 genes expressed in mouse fibroblasts, we identified 4,153 that showed significant initiation at multiple sites, of which 745 genes exhibited significant isoform‐divergent translation. Systematic analyses of the isoform‐specific translation revealed that isoforms with longer 5ʹUTRs tended to translate less efficiently. Further investigation of cis‐elements within 5ʹUTRs not only provided novel insights into the regulation by known sequence features, but also led to the discovery of novel regulatory sequence motifs. Quantitative models integrating all these features explained over half of the variance in the observed isoform‐divergent translation. Overall, our study demonstrated the extensive translational regulation by usage of alternative transcription start sites and offered comprehensive understanding of translational regulation by diverse sequence features embedded in 5ʹUTRs.

Features TeloPrime Full-Length cDNA Amplification Kit