Aquaculture Europe 2023

September 18 - 21, 2023

Vienna, Austria

19/09/2023, 11:00 - 11:15 (Europe/Vienna)

THE LONGER ROAD TO FUNCTIONAL ANNOTATION: THE USE OF FULL-LENGTH NANOPORE RNA-SEQ FOR ALTERNATIVE ISOFORM DISCOVERY IN ATLANTIC SALMON Salmo salar EMBRYOGENESIS

Oliver Eve*1, Manu Kumar Gundappa1, Diego Perojil-Morata1, and Daniel Macqueen1

 

1Roslin Institute, College of Medicine & Veterinary Medicine, University of Edinburgh, EH25 9RG

Email: oliver.eve@ed.ac.uk

 



Introduction

Atlantic salmon is an important commercial aquaculture species worth billions to the global economy. The generation of a high-quality reference genome for Atlantic salmon (Lien et al., 2016) and the increased availability of ‘omic tools have led to a significant drive towards the annotation of functional genome elements and causative genetic variants underpinning phenotypic traits (Macqueen et al., 2017; Houston & Macqueen, 2019).

Transcriptomics via RNA-seq is a powerful tool for genome functional annotation. Whilst highly accurate, short-read RNA-seq methods such as Illumina struggle to properly capture exonic chaining and identify alternative splice sites, which can lead to inaccurate gene and transcript annotation. Long-read Nanopore RNA-seq captures the full length of each transcript in a single read. Compared with short-read RNA-seq, this aids identification of alternative splice sites, allowing better characterisation of transcript diversity and identification of novel isoforms (Kuo et al., 2017; Kuo et al., 2020). The resulting improvement in transcriptome annotation supports the discovery of causative gene and isoform variants.

Embryogenesis is a critical stage of ontogeny where many cell types arise and differentiate. As such, the embryonic transcriptome is a valuable resource for functional annotation due to the high diversity of cell types and patterns of gene expression present during early development.

This study falls within the framework of the European project AQUA-FAANG, which aims to develop functional annotation maps for 6 commercial aquaculture species, including Atlantic salmon. Long-read nanopore RNA-seq was performed on Atlantic salmon embryos, at 6 key stages of development (mid-blastula, mid-gastrula, early-, mid-, late-somitogenesis and eyed stage) as defined by the AQUA-FAANG consortium.

Methods

Total RNA was extracted using a phenol-chloroform method before mRNA isolation using a Dynabeads mRNA isolation kit. Sequencing libraries were prepared in accordance with the protocol for the ONT Direct cDNA Sequencing kit (SQK-DCS109). Samples were barcoded before sequencing for 72 h on a PromethION device using R9.4.1 chemistry.

Basecalling and demultiplexing were carried out with Guppy_v5.0.11. Reads with q-score <7 were filtered from the data using NanoFilt_v2.7.1. Full-length reads were identified using Pychopper_v2.5.0 and then mapped to the latest Ssal_v3.1 genome assembly using minimap2_v2.22 (Li, 2018). Reads were collapsed into consensus transcript models using TAMA (Kuo et al., 2020). Single-exon transcript models with read support <50, and all models with read support <3, were then discarded. The final long-read transcriptome was compared with the reference annotation using SQANTI3 (Tardaguila et al., 2018).
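The read-support thresholds described above can be illustrated with a minimal Python sketch. It is not the code used in the study: the `TranscriptModel` record, its field names, and the example models are hypothetical, and the real filtering was presumably applied to TAMA output files rather than to in-memory objects. Only the two cut-offs (fewer than 3 reads for any model, fewer than 50 reads for single-exon models) are taken from the text.

```python
from dataclasses import dataclass

# Thresholds taken from the Methods text.
MIN_READS_ANY_MODEL = 3
MIN_READS_SINGLE_EXON = 50


@dataclass(frozen=True)
class TranscriptModel:
    """Hypothetical record for one collapsed transcript model."""
    transcript_id: str
    exon_count: int
    read_support: int


def passes_read_support(model: TranscriptModel) -> bool:
    """Return True if the model survives both read-support cut-offs."""
    if model.read_support < MIN_READS_ANY_MODEL:
        return False
    if model.exon_count == 1 and model.read_support < MIN_READS_SINGLE_EXON:
        return False
    return True


def filter_models(models: list[TranscriptModel]) -> list[TranscriptModel]:
    """Keep only the models that pass the read-support filter."""
    return [m for m in models if passes_read_support(m)]


# Made-up example models, for illustration only.
example_models = [
    TranscriptModel("T1", exon_count=5, read_support=2),    # dropped: < 3 reads
    TranscriptModel("T2", exon_count=5, read_support=3),    # kept
    TranscriptModel("T3", exon_count=1, read_support=49),   # dropped: single exon, < 50 reads
    TranscriptModel("T4", exon_count=1, read_support=50),   # kept
]

kept_ids = [m.transcript_id for m in filter_models(example_models)]
print(kept_ids)
```

The stricter cut-off for single-exon models reflects the fact that such models have no splice junctions to corroborate them, so more reads are required before they are retained.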

Results

Approximately 50 million raw reads comprising 56 Gb were obtained in 72 hours from a single PromethION flowcell. Of those, approximately 10 million were high-quality, full-length reads (N50 = 1,366 bp). The long-read transcriptome characterised 31,230 genes and 243,991 unique isoforms. Of these isoforms, 78% (189,751) were deemed to be novel by SQANTI3, possessing either a new combination of known splice sites or at least one novel splice site. An example of the identification of novel transcript isoforms can be seen in Figure 1.

Discussion

Long-read RNA-seq is a powerful tool for isoform discovery. In this study, the average number of isoforms per gene doubled, from 3.9 in the reference Ensembl annotation, to 7.8 in the long-read annotation, thus describing a wealth of previously unannotated transcripts. Such improvements in genome annotation could aid in identification of causative genes and variants for traits of relevance to salmonid production and welfare, as well as traits of ecological importance.
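The summary figures quoted in the Results and Discussion can be reproduced from the reported counts. The short Python check below is an illustrative sketch: the variable names are arbitrary, and the only inputs are the gene, isoform and novel-isoform counts and the reference value of 3.9 isoforms per gene stated in the text.

```python
# Counts reported in the Results.
genes = 31_230
isoforms = 243_991
novel_isoforms = 189_751

# Reference (Ensembl) value quoted in the Discussion.
reference_isoforms_per_gene = 3.9

# Mean number of isoforms per gene in the long-read annotation.
isoforms_per_gene = isoforms / genes

# Share of isoforms classified as novel, as a percentage.
novel_pct = 100 * novel_isoforms / isoforms

# Fold change relative to the reference annotation.
fold_change = isoforms_per_gene / reference_isoforms_per_gene

print(f"Isoforms per gene: {isoforms_per_gene:.1f}")
print(f"Novel isoforms: {novel_pct:.0f}%")
print(f"Fold change vs reference: {fold_change:.1f}x")
```

Rounded to the precision used in the text, these values agree with the reported 7.8 isoforms per gene, 78% novel isoforms, and an approximate doubling relative to the reference annotation.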

Further investigation will examine differential transcript isoform usage across the 6 developmental stages. These data could contribute to future improvements in publicly available transcriptome annotations through the AQUA-FAANG project.

References

Houston, R.D. and Macqueen, D.J. (2019). Animal Genetics, 50(1), pp.3-14.

Kuo, R.I., Tseng, E., Eory, L., et al. (2017). BMC Genomics, 18(1), pp. 1-19.

Kuo, R.I., Cheng, Y., Zhang, R., et al. (2020). BMC Genomics, 21(1), pp. 1-22.

Li, H. (2018). Bioinformatics, 34(18), pp. 3094-3100.

Lien, S., Koop, B.F., Sandve, S.R., et al. (2016). Nature, 533(7602), pp. 200-205.

Macqueen, D.J., Primmer, C.R., Houston, R.D., et al. (2017). BMC Genomics, 18, p. 484.

Tardaguila, M., De La Fuente, L., Marti, C., et al. (2018). Genome Research, 28(3), pp. 396-411.