Introduction
Air-breathing catfishes (clariids) are a group of stenohaline freshwater fish that can withstand a wide range of environmental conditions and farming practices. Considerable efforts are therefore being made to increase and optimize industrial production of the African catfish (Clarias gariepinus).
However, there is a severe lack of the genomic resources needed to study genetic traits and variations that are highly relevant to the domestication and adaptation of clariids to aquaculture environments. In particular, a high-quality assembled and annotated draft genome is needed as a reference for catfish research in general and, specifically, for understanding and enhancing aquaculture production and performance traits of the catfish.
Here, we sequenced the genome of the African catfish C. gariepinus, one of the most commonly farmed clariids, and generated a gapless telomere-to-telomere (T2T) de novo chromosome-level assembly with high-resolution haplotypes by integrating long-range sequencing (Hi-C) with PacBio single-molecule (HiFi), Oxford Nanopore, and Illumina sequencing data.
Results
The diploid genome assembly yielded 58 contigs with a total length of 969.72 Mb and a contig N50 of 33.71 Mb. We report 25,655 predicted protein-coding genes and 49.94% repetitive elements in the C. gariepinus genome. Interspersed repeats are the most abundant class of repetitive elements (46%). Retroelements and DNA transposons accounted for only 12% and 6% of the repeatome, respectively. Approximately 99% of the assembled genome is spanned by the 28 chromosomes of the primary assembly, without gaps. The distribution of genes and repeats across the chromosomes followed the typical distribution in vertebrate genomes, with higher gene densities in GC-rich regions and lower gene densities in repeat-rich distal and pericentromeric regions.
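For readers unfamiliar with the contig N50 statistic quoted above, the following is a minimal sketch of how it is conventionally computed: contigs are sorted from longest to shortest, and N50 is the length of the contig at which the running total first reaches half of the assembly length. The function name `n50` and the toy lengths are illustrative assumptions; they are not the authors' pipeline or data.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (in bp)."""
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)  # longest contig first
    half_total = sum(ordered) / 2            # half of the assembly length
    running = 0
    for length in ordered:
        running += length
        # The first contig that pushes the running sum to half the
        # total defines the N50.
        if running >= half_total:
            return length


# Hypothetical toy lengths, NOT taken from the C. gariepinus assembly.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))
```

Applied to the real assembly, the input would be the 58 contig lengths, which are not listed in this abstract.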
We performed various assessments to support the high quality and completeness of our African catfish genome assembly. The BUSCO completeness was 97.5% with only 2% of the genes missing, showing that the gene space spanned by our genome assembly is nearly complete. Furthermore, approximately 92% of the C. gariepinus transcripts could be mapped to our assemblies (>90% coverage and >90% identity), indicating their high functional completeness. We also mapped genomic reads to our assemblies to assess structural accuracy and found that more than 96.69% of raw PE reads were concordantly aligned. The alignment rates of ONT, HiFi, and Hi-C reads to the primary assembly were 99.91%, 99.95%, and 100%, respectively.
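The transcript-mapping criterion above (>90% coverage and >90% identity) can be illustrated with a short sketch. It assumes each alignment record carries a coverage and an identity value expressed as fractions; the `Alignment` class, the `mapped_fraction` function, and the toy records are hypothetical, and the aligner and exact metric definitions used in the study are not stated here.

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    transcript_id: str
    coverage: float  # fraction of the transcript length aligned (0..1)
    identity: float  # fraction of aligned bases that match (0..1)


def mapped_fraction(alignments, n_transcripts, min_cov=0.90, min_ident=0.90):
    """Fraction of transcripts with at least one alignment above both thresholds."""
    if n_transcripts <= 0:
        raise ValueError("n_transcripts must be positive")
    # A set ensures a transcript with several passing alignments counts once.
    passing = {
        a.transcript_id
        for a in alignments
        if a.coverage > min_cov and a.identity > min_ident
    }
    return len(passing) / n_transcripts


# Hypothetical toy records, NOT data from the study.
toy = [
    Alignment("t1", 0.99, 0.98),  # passes both thresholds
    Alignment("t2", 0.95, 0.85),  # fails on identity
    Alignment("t3", 0.50, 0.99),  # fails on coverage
    Alignment("t3", 0.93, 0.97),  # second hit for t3 passes
]
print(mapped_fraction(toy, n_transcripts=4))  # t4 has no alignment at all
```

Counting transcripts rather than alignments keeps multi-mapping transcripts from inflating the reported fraction.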
Furthermore, we annotated 6,403 full-length ribosomal RNA, 154 microRNA, and 13,536 transfer RNA genes throughout the African catfish genome. Remarkably, 96% (6150/6406) of the predicted 5S rRNA genes were clustered within a single 2-Mbp region on each of chromosome 4 (n = 2455) and chromosome 13 (n = 3725). Similarly, 84% (21/25) of the predicted 18S rRNA genes were clustered within the first 500 kbp of the terminal telomeric region of chromosome 27 (Figure 1).
The comparative phylogenomic analyses performed with OrthoFinder assigned 336,681 (94%) of 390,198 genes to 27,587 orthogroups shared among catfishes and two outgroup species (common carp and goldfish). In C. gariepinus, 16,281 genes were found to be orthologous across the 14 catfish species, 378 of them being single-copy orthologs. According to our phylogenetic tree, estimated from protein sequences of all homologous single-copy genes, air-breathing catfishes (Clariidae clade) split as a monophyletic group around 98 Mya, which is roughly comparable to the divergence time between rodents and humans (96 Mya) (Figure 2).
Conclusion
Our genome assembly provides the first comprehensive gene annotation and haplotype information, such as the male-specific haplotype, enabling us to identify critical genes and molecular mechanisms underlying amphibious traits and terrestrial adaptation of air-breathing catfishes. We found that several gene families involved in ion transport, osmoregulation, oxidative stress response, and muscle metabolism were expanded or positively selected in clariids, suggesting a potential role in their transition to air breathing.
The reported findings expand our understanding of the genomic mechanisms underpinning the resilience and adaptation of C. gariepinus to adverse ecological conditions. They will serve as a valuable resource for future studies aimed at elucidating these unique biological traits in related teleosts and at leveraging these insights for aquaculture improvement.