Affiliation:
1. Biology Department, Centre for Novel Agricultural Products (CNAP), University of York, Wentworth Way, York YO10 5DD, UK
2. Information and Computational Sciences, James Hutton Institute, Dundee DD2 5DA, UK
Abstract
Accurate quantification of gene and transcript‐specific expression, with the underlying knowledge of precise transcript isoforms, is crucial to understanding many biological processes. Analysis of RNA sequencing data has benefited from the development of alignment‐free algorithms which enhance the precision and speed of expression analysis. However, such algorithms require a reference transcriptome. Here we generate a reference transcript dataset (LsRTDv1) for lettuce (cv. Saladin), combining long‐ and short‐read sequencing with publicly available transcriptome annotations, and filtering to keep only transcripts with high‐confidence splice junctions and transcriptional start and end sites. LsRTDv1 identifies novel genes (mostly long non‐coding RNAs) and increases the number of transcript isoforms per gene in the lettuce genome from 1.4 to 2.7. We show that LsRTDv1 significantly increases the mapping rate of RNA‐seq data from a lettuce time‐series experiment (mock‐ and Botrytis cinerea‐inoculated) and enables detection of genes that are differentially alternatively spliced in response to infection as well as transcript‐specific expression changes. LsRTDv1 is a valuable resource for investigation of transcriptional and alternative splicing regulation in lettuce.
Funder
Rural and Environment Science and Analytical Services Division
Biotechnology and Biological Sciences Research Council