Aquaculture Europe 2025

September 22 - 25, 2025

Valencia, Spain

Presentation: 24/09/2025, 16:15 - 16:30, Goleta, Hotel - Floor 14

AUTOMATIC SIZE ESTIMATION OF SEA BREAM IN THEIR NATURAL UNDERWATER ENVIRONMENT USING DEEP LEARNING (DL)

J. Martinez-Peiró1*; P. Muñoz-Benavent1; G. Andreu-García1; P. G. Holhorea2; A. Belenguer2; J. Pérez-Sánchez2

1Automatics and Industrial Informatics Institute (AI2), Universitat Politècnica de València, Spain

2 Nutrigenomics and Fish Growth Endocrinology, Institute of Aquaculture Torre de la Sal (IATS-CSIC), Spain.

* Email contact: joama14j@upv.es



1. Introduction

This study leverages Deep Learning (DL) models and stereoscopic vision techniques to develop tools that automatically estimate fish size in near real-time, using a non-intrusive approach. Accurate and efficient length measurement in intensive aquaculture systems requires advanced object detection algorithms capable of extracting highly discriminative features specific to fish. However, practical fish detection still presents challenges, such as complex underwater backgrounds, varied fish orientations, and a lack of large-scale annotated datasets.

Monitoring fish growth is essential in aquaculture for optimizing feed quantity, oxygen supply, and overall animal welfare. Typically, length estimation relies on manual sample collection, a process that can induce stress, negatively affect fish growth and, in some cases, result in mortality. This highlights the urgent need for automated, non-invasive alternatives to manual handling. Stereoscopic computer vision techniques have proven effective for estimating the size of freely swimming fish.

The primary objective of this work is to automatically process underwater stereo videos of freely swimming fish to obtain accurate measurements of many individuals in near real-time. The main contributions of the proposed approach are as follows:

  • An encapsulated stereoscopic camera system was designed and prototyped to support long-term monitoring in underwater environments.
  • A ground-truth dataset was developed, consisting of real images annotated with the keypoints required for accurate analysis.
  • A Convolutional Neural Network (CNN) was trained to automatically detect keypoints in the images, and this output was subsequently combined with stereoscopic vision to estimate fish length.
  • Automatic measurements were validated through comparison with manual measurements.

2. Stereoscopic vision system and video acquisition

A stereoscopic camera system was prototyped using a watertight enclosure with sealed wiring, ensuring durability in aquatic environments. Its design required careful consideration of parameters such as focal length, inter-camera distance, shutter speed, and other optical and mechanical factors. Video recordings of sea bream at different growth stages were conducted at the Institute of Aquaculture Torre de la Sal (IATS-CSIC) facilities, resulting in six hours of footage at varying resolutions and frame rates, with the configuration detailed in Figure 1.
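In a rectified stereo pair, the focal length and the inter-camera distance (baseline) directly determine how precisely depth, and hence fish length, can be recovered. The following minimal sketch illustrates this classic relation with purely hypothetical values; it is not the specification of the prototyped rig.

```python
# Minimal sketch of the rectified-stereo depth relation Z = f * B / d.
# All numbers are hypothetical and only illustrate the trade-off between
# focal length, baseline, and depth resolution.
def depth_from_disparity(focal_px: float, baseline_m: float, disparity_px: float) -> float:
    """Depth (m) of a point observed with a given pixel disparity in a rectified pair."""
    return focal_px * baseline_m / disparity_px

# e.g. 1400 px focal length, 0.12 m baseline, 70 px disparity -> ~2.4 m depth
print(depth_from_disparity(1400.0, 0.12, 70.0))
```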

3. Results: Applying DL models to automatic fish length estimation

A dataset was generated by manually annotating the snout and tail keypoints of individual fish, resulting in 250 images and 700 annotated individuals. Two DL-based object detection models were trained: Keypoint R-CNN and YOLO. Keypoint R-CNN extends the Mask R-CNN architecture for keypoint detection: it first uses a Region Proposal Network (RPN) to generate candidate regions and then refines these proposals with a CNN-based object detection head. YOLO (You Only Look Once) is a fast, single-stage object detection algorithm that treats detection as a regression problem. It divides an image into a grid and, in a single pass through the network, simultaneously predicts bounding boxes and keypoints per cell, making it well suited for real-time applications.
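For illustration, a minimal configuration of the keypoint-detection stage could look as follows. This is an assumed sketch using the torchvision implementation of Keypoint R-CNN with two keypoints (snout and tail); it is not the authors' training code, and the annotated dataset and fine-tuning loop are omitted.

```python
# Assumed sketch: Keypoint R-CNN configured for two fish keypoints (snout, tail).
# Not the authors' code; the model would still need fine-tuning on the annotated
# sea bream dataset before producing meaningful detections.
import torch
from torchvision.models.detection import keypointrcnn_resnet50_fpn

# num_classes = 2 (background + fish), num_keypoints = 2 (snout, tail)
model = keypointrcnn_resnet50_fpn(weights=None, weights_backbone=None,
                                  num_classes=2, num_keypoints=2)
model.eval()

# One frame from the left or right camera, shape [3, H, W], values in [0, 1]
frame = torch.rand(3, 1080, 1920)  # placeholder tensor standing in for a real frame
with torch.no_grad():
    out = model([frame])[0]

# out["keypoints"] has shape [N, 2, 3]: (x, y, visibility) of snout and tail
# for each detected fish; weak detections are dropped by score thresholding.
keep = out["scores"] > 0.8
snout_tail = out["keypoints"][keep][:, :, :2]
```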

Table 1 compares the average fish lengths obtained from manual measurements and the two DL models. Both models yielded accurate results, with YOLO performing slightly better, particularly in tank 25. The average processing times per frame were 0.347 ms for Keypoint R-CNN and 0.068 ms for YOLO. An example of automatic fish detection and length estimation is shown in Figure 2.
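To turn the detected keypoints into a metric length, a plausible final step, sketched below under the assumption of calibrated cameras and matched left/right detections, is to triangulate the snout and tail points and take the Euclidean distance between the resulting 3D coordinates. This is a generic OpenCV-based illustration, not the authors' implementation.

```python
# Assumed sketch: length of one fish from matched snout/tail keypoints in the
# left and right views of a calibrated stereo pair (not the authors' code).
import numpy as np
import cv2

def fish_length(P_left, P_right, kp_left, kp_right):
    """P_left/P_right: 3x4 projection matrices from stereo calibration.
    kp_left/kp_right: 2x2 arrays [[snout_x, snout_y], [tail_x, tail_y]] in pixels.
    Returns the snout-to-tail distance in the calibration units (e.g. mm)."""
    pts_l = np.asarray(kp_left, dtype=np.float64).T    # 2xN layout expected by OpenCV
    pts_r = np.asarray(kp_right, dtype=np.float64).T
    homog = cv2.triangulatePoints(P_left, P_right, pts_l, pts_r)  # 4xN homogeneous
    xyz = (homog[:3] / homog[3]).T                     # N x 3 Euclidean coordinates
    return float(np.linalg.norm(xyz[0] - xyz[1]))
```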

4. Conclusion

A dataset of sea bream was developed with manually annotated snout and tail keypoints to train Keypoint R-CNN and YOLO networks for automatic fish length estimation. Both models demonstrated high accuracy across different growth stages when compared to manual measurements. These results support the potential of Deep Learning and stereoscopic vision as a robust, non-invasive, and automated method for fish length estimation in aquaculture.

Funding

The work was funded by MCIN NextGenerationEU (PRTR-C17.I1) and by Generalitat Valenciana (THINKINAZUL/2021/007).