Differential gene expression tools


RNA-Seq has a wide variety of applications and is one of the most widely used techniques for profiling RNA. Major steps in RNA-Seq data analysis include experimental design, quality control, read alignment, quantification of gene and transcript levels, visualization, differential gene expression, alternative splicing, variant detection, pathway and functional analysis, and gene fusion detection.

Figure 1: RNA-Seq workflow [1]

Expression is quantified to study cellular changes in response to external stimuli, differences between healthy and diseased states, and other research questions. Gene expression is often used as a proxy for protein abundance, but the two are often not equivalent because of post-transcriptional events such as RNA interference and nonsense-mediated decay.

Expression is quantified by counting the number of reads that map to each locus identified in the transcriptome assembly step. Expression can be quantified for exons or genes using contigs or reference transcript annotations. These RNA-Seq read counts have been robustly validated against older technologies, including expression microarrays and qPCR.


Figure 2: Gene expression. Garber, M., et al. (2011), Nature Methods, 8(6), 469–477

The original goal of RNA sequencing was to identify which genomic loci are expressed in a cell (population) at a given time over the entire expression range, i.e., to offer a superior alternative to cDNA microarrays. Indeed, RNA-seq was shown to detect lowly expressed transcripts while showing strongly reduced false positive rates in comparison to microarray-based expression quantification (Illumina, 2011; Nookaew et al., 2012; Zhao et al., 2014). Since RNA-seq does not rely on a pre-specified selection of cDNA probes, there are numerous additional applications of RNA-seq that go beyond counting expressed transcripts of known genes, such as the detection and quantification of non-genic transcripts, splice isoforms, novel transcripts and protein-RNA interaction sites. However, the detection of gene expression changes (i.e., changes in mRNA levels) between different cell populations and/or experimental conditions remains the most common application of RNA-seq.

The general workflow of a differential gene expression analysis is as follows:

  1. RNA extraction
  2. Library preparation
  3. Sequencing
  4. Processing of sequencing reads
  5. Estimation of individual gene expression levels
  6. Normalization
  7. Identification of differentially expressed (DE) genes

The tools below cover the bioinformatics part of the analysis, i.e., everything from the processing of sequencing reads onward.

FastQC

FastQC is a Java-based quality control tool from the Babraham Institute. It can import data from FASTQ, BAM or SAM files. The tool provides an overview that flags problematic areas, together with summary graphs and tables for rapid assessment of the data. Results are presented as permanent HTML reports. FastQC can be run as a stand-alone application or integrated into a larger pipeline.

Homepage: https://www.bioinformatics.babraham.ac.uk/projects/fastqc/
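FastQC's per-base quality summary is built from the Phred scores encoded in the FASTQ quality strings. The Python sketch below is not FastQC's code. It assumes well-formed four-line FASTQ records with Phred+33 encoding, and the function names are hypothetical. It shows the core arithmetic: decode each quality character as its ASCII code minus 33, then average the scores at each read position.

```python
from io import StringIO
from typing import Iterator, List, TextIO, Tuple


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (name, sequence, quality) from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip()
        if not header:
            return
        seq = handle.readline().rstrip()
        handle.readline()  # '+' separator line
        qual = handle.readline().rstrip()
        yield header[1:], seq, qual


def mean_quality_per_position(handle: TextIO, offset: int = 33) -> List[float]:
    """Mean Phred score at each position, over all reads that reach that position."""
    totals: List[int] = []
    counts: List[int] = []
    for _, _, qual in read_fastq(handle):
        for i, ch in enumerate(qual):
            if i == len(totals):
                totals.append(0)
                counts.append(0)
            totals[i] += ord(ch) - offset  # Phred+33: 'I' -> 40, '#' -> 2
            counts[i] += 1
    return [t / c for t, c in zip(totals, counts)]


fastq = StringIO("@r1\nACGT\n+\nIIII\n@r2\nACG\n+\n##I\n")
print(mean_quality_per_position(fastq))  # [21.0, 21.0, 40.0, 40.0]
```

Reads of different lengths are handled by averaging each position only over the reads that cover it.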

BBDuk

BBDuk is an ultrafast, multithreaded tool that trims adapters and filters or masks contaminants based on k-mer matching, allowing Hamming or edit distance as well as degenerate bases. It is open source and written in Java, so it runs on all platforms with no recompilation and no other dependencies.

Tasks performed

  • Quality-trimming and filtering
  • Format conversion
  • Contaminant concentration reporting
  • GC-filtering
  • Length-filtering
  • Entropy-filtering
  • Chastity-filtering
  • Text histograms
  • Interconverts between fastq, fasta, sam, scarf, interleaved and 2-file paired, gzipped, bzipped, ASCII-33 and ASCII-64.
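The idea behind k-mer-based adapter trimming can be sketched in Python as follows. This is a simplified illustration, not BBDuk's implementation. The adapter string, k = 8 and the one-mismatch tolerance are example values, and the functions are hypothetical.

The sketch puts every adapter k-mer, plus all of its Hamming-distance-1 variants, into a set. It then cuts the read at the first k-mer found in that set and keeps only the 5' part. This is roughly what BBDuk's right-trimming mode (ktrim=r) with hdist=1 aims at.

```python
from typing import Set


def kmers_with_hamming1(seq: str, k: int) -> Set[str]:
    """All k-mers of seq, plus every variant differing from one at exactly one base."""
    index: Set[str] = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        index.add(kmer)
        for pos in range(k):
            for base in "ACGT":
                if base != kmer[pos]:
                    index.add(kmer[:pos] + base + kmer[pos + 1:])
    return index


def right_trim_at_kmer(read: str, index: Set[str], k: int) -> str:
    """Cut the read at the leftmost k-mer present in the index; keep the 5' part."""
    for i in range(len(read) - k + 1):
        if read[i:i + k] in index:
            return read[:i]
    return read


ADAPTER = "AGATCGGAAGAGC"  # assumed adapter prefix, for illustration only
K = 8
idx = kmers_with_hamming1(ADAPTER, K)
# The adapter copy in this read carries one mismatch (T instead of A).
print(right_trim_at_kmer("TTGCAACGTAGATCGGTAGAGCAA", idx, K))  # TTGCAACGT
```

Precomputing the mismatch variants turns each lookup into a single set membership test, trading memory for speed.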

Cutadapt

Cutadapt finds and removes adapter sequences, primers, poly-A tails and other types of unwanted sequence from high-throughput sequencing reads. It is especially useful when reads are longer than the sequenced molecule, for example with small-RNA and microRNA reads.

Homepage: http://cutadapt.readthedocs.io/en/stable/index.html
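When the sequenced molecule is shorter than the read, the read runs into the 3' adapter. For inserts only slightly shorter than the read, just a prefix of the adapter appears at the read end. The Python sketch below handles both cases, but with exact matching only. Cutadapt itself also tolerates mismatches and indels through an error rate, so treat this as an illustration. The adapter sequence and the minimum overlap of 3 bases are assumptions for the example.

```python
def trim_3prime_adapter(read: str, adapter: str, min_overlap: int = 3) -> str:
    """Remove a full 3' adapter occurrence, or an adapter prefix at the read end.

    Exact matching only; a simplification of error-tolerant adapter trimming.
    """
    # Case 1: full adapter inside the read -> drop it and everything after it.
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]
    # Case 2: the longest adapter prefix that is also a suffix of the read.
    for overlap in range(min(len(adapter), len(read)) - 1, min_overlap - 1, -1):
        if read.endswith(adapter[:overlap]):
            return read[:-overlap]
    return read


ADAPTER = "TGGAATTCTCGG"  # assumed small-RNA 3' adapter prefix, illustration only
print(trim_3prime_adapter("TAGCTTATCAGACTGATGTTGATGGAATTCTCGG", ADAPTER))  # TAGCTTATCAGACTGATGTTGA
print(trim_3prime_adapter("TAGCTTATCAGACTGATGTTGATGGAA", ADAPTER))  # TAGCTTATCAGACTGATGTTGA
```

The minimum overlap guards against trimming one or two bases that match the adapter by chance.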

FASTX Toolkit

A set of command-line tools for short-read preprocessing that accept FASTA or FASTQ input. The available tools allow:

  • Conversion from FASTQ to FASTA format
  • Quality statistics
  • Adapter removal or trimming
  • Base trimming based on quality scores

Homepage: http://hannonlab.cshl.edu/fastx_toolkit/
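The FASTQ to FASTA conversion listed above amounts to keeping the header and sequence of each record and dropping the quality line. Below is a minimal Python sketch. It assumes unwrapped four-line FASTQ records, and the function name is hypothetical, not part of the FASTX Toolkit.

```python
from io import StringIO
from typing import TextIO


def fastq_to_fasta(src: TextIO, dst: TextIO) -> int:
    """Write each 4-line FASTQ record as a 2-line FASTA record; return the record count."""
    n = 0
    while True:
        header = src.readline().rstrip("\n")
        if not header:
            return n
        seq = src.readline().rstrip("\n")
        src.readline()  # '+' separator line
        src.readline()  # quality line: FASTA has no place for it, so it is dropped
        dst.write(">" + header[1:] + "\n" + seq + "\n")
        n += 1


out = StringIO()
fastq_to_fasta(StringIO("@r1 sample=A\nACGT\n+\nIIII\n@r2\nGGC\n+\n##I\n"), out)
print(out.getvalue(), end="")
```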

PRINSEQ

PRINSEQ can be used to filter, reformat, or trim your genomic and metagenomic sequence data. It generates summary statistics of your sequences in graphical and tabular format.

Homepage: http://prinseq.sourceforge.net/

TagCleaner

This tool can be used to automatically detect and efficiently remove tag sequences (e.g. WTA tags) from genomic and metagenomic datasets. It is easily configurable and provides a user-friendly interface.

Homepage: http://tagcleaner.sourceforge.net/

Trimmomatic

Trimmomatic performs trimming of Illumina reads in FASTQ format (single-end or paired-end). It enables the following:

  • Trim adapters
  • Trim bases based on quality thresholds
  • Trim reads to a specific length
  • Convert quality scores to Phred-33/64.

Homepage: http://www.usadellab.org/cms/?page=trimmomatic
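Two of the operations above can be sketched in Python. The function names are hypothetical.

- **Quality trimming.** This is a simplified take on a sliding-window step such as Trimmomatic's SLIDINGWINDOW:4:20. It cuts the read at the start of the first 4-base window whose mean Phred score drops below 20. Trimmomatic's exact cut point within the failing window may differ.
- **Phred+64 to Phred+33 conversion.** This subtracts 31 from every quality character. Both encodings store the same score, with offsets of 64 and 33 respectively.

```python
from typing import List


def phred64_to_phred33(qual: str) -> str:
    """Re-encode a Phred+64 quality string as Phred+33 (same scores, offset 33)."""
    return "".join(chr(ord(ch) - 31) for ch in qual)


def sliding_window_trim(seq: str, qual: str, window: int = 4,
                        threshold: float = 20.0, offset: int = 33) -> str:
    """Cut the read at the start of the first window whose mean quality < threshold.

    Reads shorter than the window are returned unchanged.
    """
    scores: List[int] = [ord(ch) - offset for ch in qual]
    for start in range(len(scores) - window + 1):
        if sum(scores[start:start + window]) / window < threshold:
            return seq[:start]
    return seq


print(sliding_window_trim("ACGTACGTAC", "IIIIIII###"))  # ACGTAC
print(phred64_to_phred33("h@"))  # I!
```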

NxTrim

Adapter trimming and virtual library creation routine for Illumina Nextera Mate Pair libraries.

DeconSeq

Detects and removes contamination from sequence data.

Bowtie

Bowtie is an ultrafast, memory-efficient short-read aligner. It aligns short DNA sequences (reads) to the human genome at a rate of over 25 million 35-bp reads per hour. Bowtie indexes the genome with a Burrows-Wheeler index to keep its memory footprint small: typically about 2.2 GB for the human genome (2.9 GB for paired-end alignment).

Bowtie also forms the basis for other tools, including

  • TopHat: a fast splice junction mapper for RNA-seq reads
  • Cufflinks: a tool for transcriptome assembly and isoform quantitation
  • Crossbow: a cloud-computing software tool for large-scale resequencing data
  • Myrna: a cloud computing tool for calculating differential gene expression in large RNA-seq datasets. [10]

Homepage: http://bowtie-bio.sourceforge.net/index.shtml
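The Burrows-Wheeler index mentioned above can be illustrated with a toy Python sketch. It builds the BWT from sorted rotations, then counts exact occurrences of a pattern with FM-index backward search. Bowtie's real index is compressed, uses sampled suffix arrays and handles mismatches by backtracking; none of that is shown here. The naive rotation sort is only suitable for tiny strings.

```python
from typing import Dict, List


def bwt(text: str) -> str:
    """Burrows-Wheeler transform via sorted rotations (toy inputs only)."""
    s = text + "$"  # '$' sorts before the letters, marking the end of the text
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


def count_occurrences(bwt_str: str, pattern: str) -> int:
    """FM-index backward search: number of exact occurrences of pattern."""
    # C[c]: number of characters in the text that sort before c
    C: Dict[str, int] = {}
    for i, ch in enumerate(sorted(bwt_str)):
        C.setdefault(ch, i)
    # occ[c][i]: occurrences of c in bwt_str[:i]
    occ: Dict[str, List[int]] = {ch: [0] * (len(bwt_str) + 1) for ch in C}
    for i, ch in enumerate(bwt_str):
        for c in occ:
            occ[c][i + 1] = occ[c][i] + (1 if c == ch else 0)
    lo, hi = 0, len(bwt_str)  # current suffix-array interval [lo, hi)
    for ch in reversed(pattern):  # extend the match one character to the left
        if ch not in C:
            return 0
        lo = C[ch] + occ[ch][lo]
        hi = C[ch] + occ[ch][hi]
        if lo >= hi:
            return 0
    return hi - lo


genome_bwt = bwt("ACGTACGTAC")
print(count_occurrences(genome_bwt, "ACG"))  # 2
```

Backward search never touches the original text. It only needs the C table and the occurrence counts, which is what lets BWT-based aligners keep the index small.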

BWA

BWA is a software package for mapping low-divergent sequences against a large reference genome, such as the human genome. It supports three algorithms: BWA-backtrack, BWA-SW and BWA-MEM.

The first algorithm is designed for Illumina sequence reads up to 100 bp, while the other two are designed for longer sequences ranging from 70 bp to 1 Mbp. BWA-MEM and BWA-SW share features such as long-read support and split alignment, but BWA-MEM, the latest, is generally recommended for high-quality queries because it is faster and more accurate. BWA-MEM also performs better than BWA-backtrack for 70-100 bp Illumina reads [12].

Homepage: http://bio-bwa.sourceforge.net/

Maq

Maq builds assemblies by mapping short reads to a reference genome. Maq first aligns reads to reference sequences and then calls the consensus. At the mapping stage, maq performs ungapped alignment.

Maq homepage: http://maq.sourceforge.net/
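Ungapped alignment means the read is compared base by base against each reference window, counting only mismatches, with no insertions or deletions. Below is a brute-force Python sketch under simplifying assumptions: forward strand only, unweighted mismatches and a hypothetical mismatch limit. Maq itself uses seed indexing and quality-aware scoring rather than scanning every position.

```python
from typing import List, Tuple


def ungapped_hits(read: str, ref: str, max_mismatches: int = 2) -> List[Tuple[int, int]]:
    """Return (0-based position, mismatches) for each ungapped placement within the limit."""
    hits: List[Tuple[int, int]] = []
    for pos in range(len(ref) - len(read) + 1):
        mismatches = 0
        for read_base, ref_base in zip(read, ref[pos:pos + len(read)]):
            if read_base != ref_base:
                mismatches += 1
                if mismatches > max_mismatches:
                    break  # give up on this placement early
        if mismatches <= max_mismatches:
            hits.append((pos, mismatches))
    return hits


print(ungapped_hits("ACGTAG", "ACGTTGCAACGTAGCA", max_mismatches=1))  # [(0, 1), (8, 0)]
```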

Mapping stage

SHRiMP -> SHRiMP is a software package for aligning genomic reads against a target genome. It was primarily developed with the numerous short reads produced by next-generation sequencing machines in mind, as well as Applied Biosystems’ colour-space genomic representation.

SHRiMP was originally designed and written by Michael Brudno and Stephen M. Rumble, with considerable input and testing by the SidowLab. Since then, Adrian Dalca, Marc Fiume and Vladimir Yanovsky have made considerable contributions to probability calculations and 2-pass SMS mapping algorithms [13]. SHRiMP can be downloaded from:

http://compbio.cs.toronto.edu/shrimp/releases/SHRiMP_2_2_3.lx26.x86_64.tar.gz

Figure: Workflow of SHRiMP

Stampy -> Stampy is a package for mapping short reads from Illumina sequencing machines onto a reference genome. It is recommended for most workflows, including those for genomic resequencing, RNA-Seq and ChIP-seq. Stampy excels at mapping reads that contain sequence variation relative to the reference, in particular insertions or deletions. For instance, it can map reads from a highly divergent species to a reference genome. Stampy achieves high sensitivity and speed by using a fast hashing algorithm and a detailed statistical model [14]. Stampy can be downloaded (after registration) from:

http://www.well.ox.ac.uk/software-download-registration
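Hash-based mappers avoid scanning the whole reference by indexing short words. The Python sketch below uses hypothetical functions, exact 4-mer seeds and no statistical model. It hashes every reference k-mer to its positions and turns each k-mer hit of the read into a candidate read start. Those candidates would then be verified by a full alignment. The sketch illustrates the hashing idea only and is not Stampy's algorithm.

```python
from collections import defaultdict
from typing import Dict, List, Set


def build_hash_index(ref: str, k: int) -> Dict[str, List[int]]:
    """Map every k-mer of the reference to the 0-based positions where it occurs."""
    index: Dict[str, List[int]] = defaultdict(list)
    for i in range(len(ref) - k + 1):
        index[ref[i:i + k]].append(i)
    return dict(index)


def candidate_starts(read: str, index: Dict[str, List[int]], k: int) -> Set[int]:
    """Reference start positions implied by exact k-mer hits of the read."""
    starts: Set[int] = set()
    for offset in range(len(read) - k + 1):
        for ref_pos in index.get(read[offset:offset + k], []):
            starts.add(ref_pos - offset)  # shift the hit back to the read's first base
    return starts


idx = build_hash_index("ACGTTGCAACGTAGCA", k=4)
print(sorted(candidate_starts("ACGTAG", idx, k=4)))  # [0, 8]
```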

Many reads span exon-exon junctions and cannot be aligned directly by unspliced short-read aligners; these reads require spliced aligners.
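In SAM output, spliced aligners record an intron skipped by a read with the CIGAR operation N, so one read yields several aligned blocks on the genome. The Python sketch below converts a 1-based alignment start and a CIGAR string into those blocks. It follows the SAM convention that M, =, X, D and N consume reference bases while I, S, H and P do not. The function name is hypothetical.

```python
import re
from typing import List, Tuple

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def aligned_blocks(pos: int, cigar: str) -> List[Tuple[int, int]]:
    """Reference intervals (1-based, inclusive) covered by an alignment.

    Each 'N' (skipped region, e.g. an intron) closes the current block.
    """
    blocks: List[Tuple[int, int]] = []
    ref = pos
    block_start = pos
    for length_str, op in CIGAR_OP.findall(cigar):
        length = int(length_str)
        if op in "M=XD":
            ref += length  # consumes reference, stays in the current block
        elif op == "N":
            if ref > block_start:
                blocks.append((block_start, ref - 1))
            ref += length
            block_start = ref
        # I, S, H and P do not consume the reference
    if ref > block_start:
        blocks.append((block_start, ref - 1))
    return blocks


print(aligned_blocks(1000, "30M500N20M"))  # [(1000, 1029), (1530, 1549)]
```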

MapAl -> MapAl is a tool for RNA-Seq expression profiling that builds on the established programs Bowtie and Cufflinks. Incorporating ‘gene models’ already at the alignment stage almost doubles the number of transcripts that can be measured reliably [15]. It is distributed as a package and can be downloaded from:

http://www.bioinf.boku.ac.at/pub/MapAl/pkg/MapAl.tar.gz

RUM (RNA-Seq Unified Mapper) -> RUM is an alignment, junction calling and feature quantification pipeline specifically designed for Illumina RNA-Seq data. RUM can also be used effectively for DNA sequencing (e.g. ChIP-Seq) and microarray probe mapping, and it has a strand-specific mode. RUM is highly configurable, but it does not require fussing over options: the defaults generally give good results [16].

RUM maps reads in three phases: first against the genome using Bowtie, then against a transcriptome database using Bowtie, and finally against the genome using BLAT. The three mappings are then merged into one. This leverages the advantages of both genome and transcriptome mapping and combines the speed of Bowtie with the sensitivity and flexibility of BLAT. RUM generates coverage plots, normalized intensities for genes, introns and exons, and files describing the junctions. It also writes the alignment of each read, one per line, in RUM and SAM formats. Installation instructions for RUM are available at:

https://github.com/itmat/rum/wiki/Installing-RUM

Figure: Pipeline of RUM

SpliceSeq -> SpliceViewer is a Java application that allows researchers to investigate alternative mRNA splicing patterns in data from high-throughput mRNA sequencing studies. Sequence reads are mapped to splice graphs that unambiguously quantify the inclusion level of each exon and splice junction. The graphs are then traversed to predict the protein isoforms that are likely to result from the observed exon and splice junction reads. UniProt annotations are mapped to each protein isoform to identify potential functional impacts of alternative splicing. The tool can be used in three ways:

  • on a single RNA-Seq sample, to identify genes with multiple splice forms;
  • on a pair of samples, to identify differential splicing between the two;
  • on groups of samples, to identify statistically significant group-level differences in splicing patterns.

SpliceSeq can be run from the install page as a Java Web Start application to explore the sequencing data on the developers' server, or it can be installed locally to analyze your own mRNA-Seq data [17]. Installation details are given on the website:

http://bioinformatics.mdanderson.org/main/SpliceSeq:Overview

Figure: TopHat pipeline

RNA Express

The RNA Express BaseSpace® app combines the capabilities of the STAR aligner and the DESeq2 analysis tools in one simple workflow. The aim of this app is to provide the most commonly used set of RNA analysis features in a convenient and rapid analysis package [2].

The app supports up to 192 control and comparison samples in total.

Major outputs:

  • Aligned reads (BAM format)
  • Read counts: the number of reads mapped to each gene for each sample
  • DESeq2 results

Before running the RNA Express app, be aware of the following limitations [20][21]:

  • Reads must be at least 35 bp and no more than 500 bp in length.
  • Individual samples must contain between 100,000 and 400 million reads.
  • The total read count across all samples must be less than 2 billion reads.
  • Only UCSC hg19 (human), UCSC mm10 (mouse) and UCSC rn5 (rat) are currently supported.

The app can be accessed by following the steps in the user guide:

https://support.illumina.com/content/dam/illumina-support/documents/documentation/software_documentation/basespace/rna-express-user-guide-15052918a.pdf

Figure: RNA Express workflow (control vs comparison)

edgeR

edgeR performs differential expression analysis of RNA-seq expression profiles with biological replication. It implements a range of statistical methodology based on negative binomial distributions, including empirical Bayes estimation, exact tests, generalized linear models and quasi-likelihood tests. As well as RNA-seq, it can be applied to differential signal analysis of other types of genomic data that produce counts, including ChIP-seq, SAGE and CAGE [4][5][6][7][8]. edgeR is a Bioconductor package.

edgeR is available at:

https://bioconductor.org/packages/release/bioc/html/edgeR.html

The edgeR user's guide gives an overview of the Bioconductor package for differential expression analyses of read counts arising from RNA-Seq, SAGE or similar technologies [8]. The package can be applied to any technology that produces read counts for genomic features. Of particular interest are summaries of short reads from massively parallel sequencing technologies such as Illumina™, 454 or ABI SOLiD applied to RNA-Seq, SAGE-Seq or ChIP-Seq experiments, and pooled shRNA-seq or CRISPR-Cas9 genetic screens. edgeR provides statistical routines for assessing differential expression in RNA-Seq experiments or differential marking in ChIP-Seq experiments.

A particular feature of edgeR, in both its classic and GLM functionality, is its empirical Bayes methods, which permit the estimation of gene-specific biological variation even for experiments with minimal levels of biological replication. edgeR can be applied to differential expression at the gene, exon, transcript or tag level. In fact, read counts can be summarized by any genomic feature. edgeR analyses at the exon level are easily extended to detect differential splicing or isoform-specific differential expression.

Figure: example plots generated with the edgeR package (average log CPM, average log2 CPM, Treathrcc comparison, MDS plot)
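The average log CPM plots are based on counts per million (CPM): each count is divided by its sample's library size and multiplied by one million. Below is a minimal Python sketch, assuming a small genes x samples table and raw library sizes (column sums).

The log version adds a fixed prior count of 2 before taking log2. edgeR's cpm(..., log = TRUE) scales the prior count by library size and can use normalized library sizes, so its values will differ slightly.

```python
import math
from typing import Dict, List


def cpm(counts: Dict[str, List[int]], log: bool = False,
        prior: float = 2.0) -> Dict[str, List[float]]:
    """Counts per million for a genes x samples table stored as {gene: [count per sample]}."""
    n_samples = len(next(iter(counts.values())))
    # Library size = total count of each sample (column sums)
    lib_sizes = [sum(row[j] for row in counts.values()) for j in range(n_samples)]
    result: Dict[str, List[float]] = {}
    for gene, row in counts.items():
        if log:
            # Simplified log-CPM: a fixed prior count avoids log2(0)
            result[gene] = [math.log2((c + prior) * 1e6 / lib) for c, lib in zip(row, lib_sizes)]
        else:
            result[gene] = [c * 1e6 / lib for c, lib in zip(row, lib_sizes)]
    return result


table = {"geneA": [10, 20], "geneB": [990, 1980]}
print(cpm(table))  # {'geneA': [10000.0, 10000.0], 'geneB': [990000.0, 990000.0]}
```

The second sample was sequenced twice as deeply, yet both genes get identical CPM values in both samples. Removing that depth difference is the point of the scaling.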

Ballgown

Ballgown is a software package designed to facilitate flexible differential expression analysis of RNA-Seq data. It also provides functions to organize, visualize and analyze the expression measurements for your transcriptome assembly [19].

Before using the Ballgown R package, a few preprocessing steps are necessary:

  1. RNA-Seq reads should be aligned to a reference genome.
  2. A transcriptome should be assembled, or a reference transcriptome should be downloaded.
  3. Expression for the features (transcript, exon, and intron junctions) in the transcriptome should be estimated in a Ballgown readable format.

Ballgown can be installed from Bioconductor by typing the following lines in the R console:

install.packages("BiocManager")

BiocManager::install("ballgown")

Figure: transcript structures of a gene and transcript clustering, plotted with the Ballgown package [22]

baySeq

baySeq identifies differential expression in high-throughput ‘count’ data, such as that derived from next-generation sequencing machines. It calculates estimated posterior likelihoods of differential expression (or of more complex hypotheses) via empirical Bayesian methods.

To install the package, type the following in R:

install.packages("BiocManager")

BiocManager::install("baySeq")

Package source is also available at http://www.bioconductor.org/packages/release/bioc/html/baySeq.html

CEDER

CEDER detects differentially expressed genes (DEGs) from RNA-Seq data by combining the significance of the exons within a gene. It consists of R scripts and Perl scripts.

The Perl script summarizes the mapped reads within each exon of a transcript. The R script requires the DESeq package and is used to detect the differentially expressed genes [23].

It can be downloaded by clicking CEDER at http://www-rcf.usc.edu/~fsun/Programs/CEDER/CEDERmain.html

casper

casper infers alternative splicing from paired-end RNA-seq data. The model is based on counting paths across exons, rather than pairwise exon connections. It estimates the fragment size and start distributions non-parametrically, which improves estimation precision [24].

To install the package, type the following in R:

install.packages("BiocManager")

BiocManager::install("casper")

Its source can be downloaded from:

http://bioconductor.org/packages/release/bioc/html/casper.html

Cufflinks

The Cufflinks suite of tools can be used to perform a number of different types of analyses for RNA-Seq experiments. The suite includes several programs that work together to perform these analyses. The complete workflow, covering all the types of analyses Cufflinks can execute, is summarized in the figure below. The left side illustrates the “classic” RNA-Seq workflow, which includes read mapping with TopHat, assembly with Cufflinks, and visualization and exploration of results with CummeRbund [25].

Cufflinks does not work alone. It works together with Cuffmerge, Cuffdiff, Cuffquant and Cuffcompare, which are combined in different pipelines as shown in the following figure.

It has been one of the most widely used pipelines for finding differentially expressed genes [26].

Figure: Pipeline to find differentially expressed genes

The following figure shows how Cufflinks works together with TopHat [26]:

Figure: TopHat and Cufflinks workflow (assembly, abundance estimation, transcripts and abundances)
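Cufflinks reports abundances as FPKM (fragments per kilobase of transcript per million mapped fragments). The Python sketch below only shows the definition. Cufflinks itself uses its likelihood model to estimate the fragments attributable to each isoform and uses effective transcript lengths, so this is not how it computes the values it reports. The transcript names and counts are hypothetical.

```python
from typing import Dict, Tuple


def fpkm(fragments: int, length_bp: int, total_fragments: int) -> float:
    """Fragments Per Kilobase of transcript per Million mapped fragments."""
    # fragments / (length in kb) / (total mapped fragments in millions)
    return fragments * 1e9 / (length_bp * total_fragments)


# Hypothetical transcripts: name -> (assigned fragments, length in bp)
transcripts: Dict[str, Tuple[int, int]] = {"tx1": (500, 2000), "tx2": (500, 500)}
total = 10_000_000
for name, (frags, length) in transcripts.items():
    print(name, fpkm(frags, length, total))
# prints "tx1 25.0", then "tx2 100.0"
```

Both transcripts receive the same number of fragments, but the shorter one gets four times the FPKM. The value is a per-length density, not a raw count.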

DESeq

DESeq estimates the variance-mean dependence in count data from high-throughput sequencing assays and tests for differential expression based on a model using the negative binomial distribution.

A basic task in the analysis of count data from RNA-Seq is the detection of differentially expressed genes. The count data are presented as a table that reports, for each sample, the number of reads that have been assigned to a gene. Analogous analyses also arise for other assay types, such as comparative ChIP-Seq. The DESeq package provides methods to test for differential expression by use of the negative binomial distribution and a shrinkage estimator for the distribution’s variance [27].

It can be obtained through the following link:

http://bioconductor.org/packages/release/bioc/html/DESeq.html
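DESeq normalizes for sequencing depth with size factors estimated by the median-of-ratios method (Anders and Huber, 2010). Each count is divided by the gene's geometric mean across samples, and a sample's size factor is the median of these ratios. Below is a minimal Python sketch of that calculation, assuming a small genes x samples list of counts. Genes with a zero in any sample are skipped because their geometric mean is zero.

```python
import math
from statistics import median
from typing import List


def size_factors(counts: List[List[int]]) -> List[float]:
    """Median-of-ratios size factors for a genes x samples count matrix."""
    n_samples = len(counts[0])
    ratios: List[List[float]] = [[] for _ in range(n_samples)]
    for row in counts:
        if any(c == 0 for c in row):
            continue  # geometric mean would be 0
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            ratios[j].append(math.exp(math.log(c) - log_geo_mean))  # c / geometric mean
    return [median(r) for r in ratios]


counts = [[10, 20], [100, 200], [5, 10], [0, 3]]  # toy data: rows = genes
print(size_factors(counts))
```

For these proportional toy counts, the size factors are about 0.707 and 1.414. The gene with a zero count does not contribute.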

The following plots come from the analysis of example data in the DESeq reference manual [28]:

Figure: Empirical (black dots) and fitted (red lines) dispersion values plotted against the mean of normalised counts.

Figure: Plot of normalised mean versus log2 fold change for the contrast untreated versus treated.

Figure: Per-gene standard deviation against the rank of the mean, for the shifted logarithm.

DiffSplice

DiffSplice is a tool for discovering and quantitating alternative splicing variants present in an RNA-seq dataset, without relying on an annotated transcriptome or pre-determined splice patterns. For two groups of samples, DiffSplice further uses a non-parametric permutation test to identify significant differences in expression at both the gene level and the transcript level. DiffSplice takes as input SAM files containing the alignment of the RNA-seq reads to the reference genome, obtained from an RNA-seq aligner such as MapSplice. The results of DiffSplice are summarized as a decomposition of the genome and can be visualized using the UCSC genome browser [29].
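The principle of a non-parametric permutation test can be shown with a short Python sketch. It shuffles the group labels many times and counts how often the shuffled difference in means is at least as large as the observed one. DiffSplice's actual test statistic and its handling of splicing structure are not reproduced here. The function and its parameters are hypothetical.

```python
import random
from statistics import mean
from typing import List


def permutation_pvalue(group_a: List[float], group_b: List[float],
                       n_perm: int = 10000, seed: int = 0) -> float:
    """Two-sided permutation p-value for a difference in group means."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabelling of samples
        if abs(mean(pooled[:n_a]) - mean(pooled[n_a:])) >= observed:
            extreme += 1
    # Add-one correction keeps the estimated p-value away from exactly zero
    return (extreme + 1) / (n_perm + 1)


print(permutation_pvalue([10, 11, 12, 13], [20, 21, 22, 23]))
```

With four samples per group there are only 70 distinct labellings, and just 2 of them are as extreme as the observed split. The estimate therefore lands near 2/70, about 0.03. Small group sizes put a hard floor on the smallest p-value such a test can report.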

EQP-cluster (Exon quantification pipeline)

EQP-cluster is a Unix-based RNA-seq quantification pipeline which takes a set of sample Fastq files as input, aligns them against reference files, and generates files with the gene, exon, or junction counts for each sample.

It provides scripts to facilitate the distributed execution of alignment and quantification operations for each of the samples, and it is designed to support the Univa Grid Engine (UGE) batch-queuing/scheduling system (job submission via qsub).

If only the quantification step is of interest and your reads are already aligned, you can instead use EQP-QM (https://github.com/novartis/EQP-QM). EQP-QM is a Unix-based RNA-seq quantification module that takes SAM/BAM genome alignment files as input and creates gene, exon and junction counts [30].

EQP-cluster can be downloaded from the following link, which also gives installation instructions:

https://github.com/Novartis/EQP-cluster

Figure: compatibility of a read alignment (read R) with a gene (gene G), one of the figures EQP can generate

featureCounts

featureCounts is a highly efficient general-purpose read summarization program that counts mapped reads for genomic features such as genes, exons, promoters, gene bodies, genomic bins and chromosomal locations. It can be used to count both RNA-seq and genomic DNA-seq reads.

featureCounts takes as input SAM/BAM files and an annotation file that includes the chromosomal coordinates of the features. It outputs the number of reads assigned to each feature (or meta-feature). It also outputs summary statistics for the overall summarization, including the number of successfully assigned reads and the number of reads that could not be assigned for various reasons.

featureCounts is also available in the Bioconductor R package Rsubread. You need to have R installed on your computer to run featureCounts in Rsubread. Rsubread is part of the Bioconductor project[31].

Download and installation instructions for featureCounts are given at:

http://bioinf.wehi.edu.au/subread-package/
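The core of read summarization is an interval-overlap test between each aligned read and the features of each gene. The Python sketch below assigns a read to a gene only when the read overlaps exons of exactly one gene. By default, featureCounts also leaves reads that overlap several meta-features unassigned, reporting them as Unassigned_Ambiguity in its summary. Strandedness, multi-mapping reads and minimum-overlap settings are ignored. The function name and the toy coordinates are hypothetical.

```python
from typing import Dict, List, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start, end), 1-based, inclusive


def count_reads(reads: List[Interval],
                genes: Dict[str, List[Interval]]) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Assign each read to the single gene whose exons it overlaps.

    Reads overlapping no gene, or more than one gene, are left unassigned.
    """
    counts = {gene: 0 for gene in genes}
    stats = {"Assigned": 0, "Unassigned_NoFeatures": 0, "Unassigned_Ambiguity": 0}
    for chrom, start, end in reads:
        hits = {
            gene
            for gene, exons in genes.items()
            for (c, s, e) in exons
            if c == chrom and start <= e and s <= end  # closed-interval overlap test
        }
        if len(hits) == 1:
            counts[hits.pop()] += 1
            stats["Assigned"] += 1
        elif not hits:
            stats["Unassigned_NoFeatures"] += 1
        else:
            stats["Unassigned_Ambiguity"] += 1
    return counts, stats


genes = {
    "geneA": [("chr1", 100, 200), ("chr1", 300, 400)],  # two exons (meta-feature)
    "geneB": [("chr1", 380, 500)],
}
reads = [("chr1", 150, 199), ("chr1", 390, 420), ("chr1", 600, 650),
         ("chr1", 450, 499), ("chr2", 150, 160)]
print(count_reads(reads, genes))
```

The read at 390-420 touches both geneA's second exon and geneB, so it is counted as ambiguous rather than split between them.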

globalSeq

The method may be conceptualised as a test of overall significance in regression analysis, where the response variable is overdispersed and the number of explanatory variables exceeds the sample size.

To install the package, type the following in R:

install.packages("BiocManager")

BiocManager::install("globalSeq")

globalSeq can be downloaded from the following link:

http://bioconductor.org/packages/release/bioc/html/globalSeq.html

LIMMA tool

LIMMA is a library for the analysis of gene expression microarray data, especially the use of linear models for analysing designed experiments and the assessment of differential expression. LIMMA provides the ability to analyse comparisons between many RNA targets simultaneously in arbitrarily complicated designed experiments. Empirical Bayesian methods are used to provide stable results even when the number of arrays is small. The linear model and differential expression functions apply to all gene expression technologies, including microarrays, RNA-seq and quantitative PCR [32].

Limma can be downloaded from the following link:

http://bioconductor.org/packages/release/bioc/html/limma.html

MetaDiff

MetaDiff is a Java/R-based software package that performs differential expression analysis on RNA-Seq based data. By utilizing a meta-regression framework, it is able to take advantage of the information regarding the variance of the estimates to make the inference more accurate. Meta-regression also enables incorporation of covariates other than experimental group, which makes it extremely simple to adjust for confounding parameters in an experiment.

A compiled JAR package is ready for download at:

http://github.com/jiach/MetaDiff/blob/master/out/artifacts/MetaDiff_jar/MetaDiff.jar?raw=true

MMSEQ

MMSEQ is a pipeline for estimating isoform expression and allelic imbalance in diploid organisms based on RNA-Seq. The pipeline employs tools such as Bowtie, TopHat, ArrayExpressHTS and SAMtools, and it uses edgeR or DESeq to perform differential expression analysis [34].

Following is the link to download MMSEQ:

http://bgx.org.uk/software/mmseq_1.0.2.zip

Figure: Analogy between meta-regression and isoform differential expression analysis in RNA-Seq

RSEM

RSEM is a software package for estimating gene and isoform expression levels from RNA-Seq data. The RSEM package provides a user-friendly interface. It supports threads for parallel computation of the EM algorithm, single-end and paired-end read data, quality scores, variable-length reads and RSPD estimation. In addition, it provides posterior mean and 95% credibility interval estimates for expression levels.

For visualization, RSEM can generate BAM and Wiggle files in both transcript coordinates and genomic coordinates. Genomic-coordinate files can be visualized with both the UCSC Genome Browser and the Broad Institute’s Integrative Genomics Viewer (IGV); transcript-coordinate files can be visualized with IGV. RSEM also has its own scripts to generate transcript read depth plots in PDF format. A unique feature of RSEM is that these read depth plots can be stacked, with read depth contributed by unique reads shown in black and by multi-reads shown in red. Models learned from the data can also be visualized. Finally, RSEM contains a simulator [33].

RSEM can be downloaded from the following link:

http://deweylab.github.io/RSEM/
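RSEM resolves reads that align to several isoforms with an expectation-maximization (EM) algorithm. Below is a heavily simplified Python sketch of that idea:

- Each read is represented by the set of transcripts it is compatible with.
- The E-step splits each read among its compatible transcripts in proportion to the current abundance estimates.
- The M-step re-estimates the abundances from those shares.

Unlike RSEM, the sketch ignores transcript lengths and the fragment-length and quality models. The function name is hypothetical.

```python
from typing import Dict, FrozenSet, List


def em_abundance(read_sets: List[FrozenSet[str]], n_iter: int = 200) -> Dict[str, float]:
    """Estimate transcript fractions from multi-mapping reads with EM (at least one read).

    Simplified: no transcript length, fragment-length or quality model.
    """
    transcripts = sorted(set().union(*read_sets))
    theta = {t: 1.0 / len(transcripts) for t in transcripts}  # uniform start
    for _ in range(n_iter):
        expected = {t: 0.0 for t in transcripts}
        for compat in read_sets:
            # E-step: split each read among compatible transcripts in proportion to theta
            total = sum(theta[t] for t in compat)
            for t in compat:
                expected[t] += theta[t] / total
        # M-step: new fractions are the expected shares of all reads
        theta = {t: expected[t] / len(read_sets) for t in transcripts}
    return theta


reads = [frozenset({"T1"})] * 6 + [frozenset({"T2"})] * 2 + [frozenset({"T1", "T2"})] * 4
print(em_abundance(reads))
```

The toy data has six reads unique to T1, two unique to T2 and four shared. EM converges to T1 = 0.75 and T2 = 0.25, so the shared reads are split 3:1, in the same ratio as the unique evidence.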

Figure: plots generated with RSEM (alignment statistics; number of alignments per read)

TIGAR

TIGAR is a transcript isoform abundance estimation method with gapped alignment of RNA-Seq data by variational Bayesian inference.

The authors propose a statistical method to estimate transcript isoform abundances from RNA-Seq data. The method can handle gapped alignments of reads against reference sequences, so it allows insertion or deletion errors within reads. It optimizes the number of transcript isoforms by variational Bayesian inference through an iterative procedure, and its convergence is guaranteed under a stopping criterion.

It can be installed with the instructions given in the following link:

https://github.com/nariai/tigar


  1. Robinson, MD, and Smyth, GK (2008). Small sample estimation of negative binomial dispersion, with applications to SAGE data. Biostatistics 9, 321–332.
  2. Lun, ATL, Chen, Y, and Smyth, GK (2016). It’s DE-licious: a recipe for differential expression analyses of RNA-seq experiments using quasi-likelihood methods in edgeR. Methods in Molecular Biology 1418, 391–416.
  3. Chen, Y, Lun, ATL, and Smyth, GK (2014). Differential expression analysis of complex RNA-seq experiments using edgeR. In: Statistical Analysis of Next Generation Sequence Data, Somnath Datta and Daniel S Nettleton (eds), Springer, New York.
  4. Zhou X, Lindsay H, and Robinson MD (2014). Robustly detecting differential expression in RNA sequencing data using observation weights. Nucleic Acids Research, 42, e91.
  5. Dai, Z, Sheridan, JM, Gearing, LJ, Moore, DL, Su, S, Wormald, S, Wilcox, S, O’Connor, L, Dickins, RA, Blewitt, ME and Ritchie, ME (2014). edgeR: a versatile tool for the analysis of shRNA-seq and CRISPR-Cas9 genetic screens. F1000Research 3, 95.
  6. Robinson, M., McCarthy, D., and Smyth, G. (2010). edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26, 139–140.
  7. Langmead B, Trapnell C, Pop M, Salzberg SL (2009). Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biology 10:R25.
  8. Li H. and Durbin R. (2009) Fast and accurate short read alignment with Burrows-Wheeler Transform. Bioinformatics, 25:1754-60. [PMID: 19451168]
  9. https://sourceforge.net/projects/bio-bwa/files/latest/download?source=files
  10. SHRiMP: Accurate Mapping of Short Color-space Reads. Stephen M. Rumble, Phil Lacroute, Adrian V. Dalca, Marc Fiume, Arend Sidow, Michael Brudno. https://doi.org/10.1371/journal.pcbi.1000386
  11. Lunter and Goodson. Stampy: a statistical algorithm for sensitive and fast mapping of Illumina sequence reads. Genome Res. 2011. 21:936-939.
  12. http://bioinf.boku.ac.at/
  13. Comparative Analysis of RNA-Seq Alignment Algorithms and the RNA-Seq Unified Mapper (RUM). Gregory R. Grant, Michael H. Farkas, Angel Pizarro, Nicholas Lahens, Jonathan Schug, Brian Brunk, Christian J. Stoeckert Jr, John B. Hogenesch and Eric A. Pierce.
  14. Ryan MC, Cleland J, Kim R, Wong WC, Weinstein JN(2012). SpliceSeq: A Resource for Analysis and Visualization of RNA-Seq Data on Alternative Splicing and Its Functional Impacts. Bioinformatics, 10.1093.
  15. Dobin et al., Bioinformatics 2012; doi: 10.1093/bioinformatics/bts635. http://bioinformatics.oxfordjournals.org/content/29/1/15
  16. Flexible analysis of transcriptome assemblies with Ballgown. Alyssa C. Frazee, Geo Pertea, Andrew E. Jaffe, Ben Langmead, Steven L. Salzberg and Jeffrey T. Leek. doi: http://dx.doi.org/10.1101/003665
  17. Dobin, A., Davis, C., Schlesinger, F., Drenkow, J., Zaleski, C., Jha, S., Batut, P., et al. (2012). STAR: ultrafast universal RNA-seq aligner. Bioinformatics (Oxford, England), 15-21. DOI: 10.1093/bioinformatics/bts635. http://bioinformatics.oxfordjournals.org/content/early/2012/10/25/bioinformatics.bts635
  18. Anders, S., and Huber, W. (2010). Differential expression analysis for sequence count data. Genome Biology (London, England), R106. DOI: 10.1186/gb-2010-11-10-r106. http://genomebiology.biomedcentral.com/articles/10.1186/gb-2010-11-10-r106
  19. https://github.com/alyssafrazee/ballgown
  20. Wan L, Sun FZ (2011): CEDER: Accurate detection of differentially expressed genes by combining significance of exons using RNA-Seq. IEEE/ACM Transactions on Computational Biology and Bioinformatics (APBC2012) 9(5): 1281-1292
  21. Rossell D, Stephan-Otto Attolini C, Kroiss M and Stöcker A (2014). “Quantifying alternative splicing from paired-end RNA-seq data.” Annals of Applied Statistics 8(1), pp. 309-330.
  22. http://cole-trapnell-lab.github.io/cufflinks/manual/
  23. Trapnell C, Williams BA, Pertea G, Mortazavi A, Kwan G, van Baren MJ, Salzberg SL, Wold BJ, Pachter L (2010). Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nat Biotechnol 28(5):511-5. doi: 10.1038/nbt.1621.
  24. Differential expression of RNA-Seq data at the gene level – the DESeq package. Simon Anders, Wolfgang Huber.
  25. https://bioconductor.org/packages/devel/bioc/vignettes/DESeq/inst/doc/DESeq.pdf
  26. https://github.com/Novartis/EQP-cluster
  27. Liao Y, Smyth GK and Shi W (2013). The Subread aligner: fast, accurate and scalable read mapping by seed-and-vote. Nucleic Acids Research, 41(10):e108
  28. Ritchie ME, Phipson B, Wu D, Hu Y, Law CW, Shi W and Smyth GK (2015). “limma powers differential expression analyses for RNA-sequencing and microarray studies.” Nucleic Acids Research 43(7), pp. e47.
  29. RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome. Bo Li and Colin N Dewey
  30. Haplotype and isoform specific expression estimation using multi-mapping RNA-seq reads. Ernest Turro, Shu-Yi Su, Ângela Gonçalves, Lachlan JM Coin, Sylvia Richardson and Alex Lewin.