Aquaculture Europe 2025

September 22 - 25, 2025

Valencia, Spain

Presentation: 24/09/2025, 14:30 - 14:45, Goleta, Hotel - Floor 14

A STEREO VISION FRAMEWORK FOR FISH MONITORING IN AQUACULTURE

Qin Zhang1, Martin Føre1,*, Eleni Kelasidi2,3

1Department of Engineering Cybernetics, NTNU, Norway

2Department of Mechanical and Industrial Engineering, NTNU, Norway

3Department of Aquaculture Technology, SINTEF Ocean, Norway

*E-mail: martin.fore@ntnu.no



Introduction

The aquaculture industry is shifting from manual operations and experience-based reasoning towards a more objective, data-driven approach in order to meet growing global seafood demand and support the expansion of salmon farming into larger and more exposed offshore sites. This shift is driven by the integration of intelligent sensors, mathematical models, and decision support or autonomous systems across production stages, with the aim of increasing productivity, enhancing sustainability, and improving fish welfare. This study presents a stereo vision framework for automated fish monitoring in finfish aquaculture, enabling tasks such as fish detection, tracking, behaviour identification, and size measurement. The proposed framework reduces the need for intrusive or labour-intensive monitoring methods, providing a more precise and efficient foundation for fish stock management.

This work was financed by the Research Council of Norway through the project CHANGE - An underwater robotics concept for dynamically changing environments [1].

Materials and methods

Stereo videos of salmon were collected in two separate experiments at SINTEF’s industrial-scale research fish farm site, Korsneset [2], in September 2022 (P1) and March 2023 (P2). P1 aimed to investigate fish behavioural responses to intrusive objects of varying appearance. A central structure, equipped with a stereo camera (comprising two synchronised Lucid TRI032S-CC GigE cameras) on top and two Ping360 sonars (one on top and the other on the bottom), was decorated in six different configurations varying in shape, size, and colour. The average fish weight during P1 was 1 kg. P2 was designed to study the impact of Remotely Operated Vehicle (ROV) motion on fish behaviour. The ROV was equipped with the same stereo camera setup as in P1, and the average fish weight during P2 was 3.5 kg.

The proposed framework employs two independent YOLOv8 models: one dedicated to fish detection using Oriented Bounding Boxes (OBBs), and the other focused on anatomical landmark identification through integrated keypoint detection. The approach began with stereo camera calibration to correct lens distortions and establish the spatial relationship between the left and right views. Two custom training datasets were then created, one by annotating fish with OBBs and one by labelling specific body parts of the fish, such as the mouth and fins, to train the respective models. During deployment, each stereo video frame is first split into a calibrated and aligned left-right image pair. The OBB detection model is then applied to both images to detect fish, with detections in the left image tracked using the ByteTrack algorithm [3]. The keypoint detection model is subsequently used to localise the desired anatomical landmarks on each detected fish body. The Hungarian algorithm [4] then associates the tracked fish and their keypoints in the left image with the corresponding detections in the right image. Disparities are computed from these associations and used to estimate depth and 3D coordinates through the pre-determined calibration parameters and triangulation geometry. This spatial reconstruction enables estimation of 3D swimming trajectories and motion patterns by continuously tracking the positions of body parts across frames, as well as measurement of fish length as the distance between the mouth and central caudal fin keypoints.
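
As a rough illustration of the disparity-to-depth and length steps, the sketch below assumes a rectified pinhole stereo pair and keypoints that have already been matched between the left and right images. The focal length, baseline, principal point, and pixel coordinates are hypothetical values chosen for readability, not the calibration used in the experiments, and the function names are illustrative only.

```python
import math


def triangulate(u_left, v_left, u_right, focal_px, baseline_m, cx, cy):
    """Return (X, Y, Z) in metres, in the left-camera frame, for one matched keypoint.

    Assumes rectified images, so the match lies on the same image row and
    depth follows from the horizontal disparity alone: Z = f * B / d.
    """
    disparity = u_left - u_right
    if disparity <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    z = focal_px * baseline_m / disparity
    x = (u_left - cx) * z / focal_px
    y = (v_left - cy) * z / focal_px
    return (x, y, z)


def fish_length(mouth_xyz, caudal_xyz):
    """Straight-line distance (m) between the mouth and caudal fin keypoints."""
    return math.dist(mouth_xyz, caudal_xyz)


# Hypothetical calibration values (assumptions, not from the study).
FOCAL_PX = 1000.0        # focal length in pixels
BASELINE_M = 0.1         # distance between the two cameras in metres
CX, CY = 1024.0, 768.0   # principal point in pixels

# Hypothetical matched keypoints: (u_left, v_left, u_right).
mouth = triangulate(1124.0, 768.0, 1074.0, FOCAL_PX, BASELINE_M, CX, CY)
caudal = triangulate(904.0, 768.0, 864.0, FOCAL_PX, BASELINE_M, CX, CY)

length_m = fish_length(mouth, caudal)
print(f"mouth at {mouth}, caudal fin at {caudal}")
print(f"estimated length: {length_m:.2f} m")
```

A straight-line distance of this kind underestimates the length of a fish whose body is bent at the moment of measurement, which is one reason to select frames where the fish is close to straight.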

Results

Figure 1a illustrates keypoint tracking, where the centre of the OBB, mouth, gill, and caudal fin of an individual fish were identified and tracked. These keypoints were used to generate detailed 3D trajectories and motion patterns, as visualised in Figures 1b and 1c. The results revealed distinct motion patterns across different body parts: the caudal fin displayed the largest speed variations, followed by the mouth, while the gills and OBB centres showed comparatively lower variability across all metrics.
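
One simple way to compare speed variability between body parts is to take frame-to-frame speeds along each 3D track and compute their standard deviation. The sketch below does this for two made-up tracks. The frame rate, the coordinates, and the choice of standard deviation as the variability measure are all assumptions for illustration, not the metrics or data used in the study.

```python
import math
import statistics


def frame_speeds(track, fps):
    """Speed (m/s) between consecutive 3D positions sampled at a fixed frame rate."""
    return [math.dist(a, b) * fps for a, b in zip(track, track[1:])]


def speed_variability(track, fps):
    """Population standard deviation of the frame-to-frame speeds (m/s)."""
    return statistics.pstdev(frame_speeds(track, fps))


FPS = 10.0  # hypothetical frame rate

# Hypothetical 3D tracks in metres: the gill moves steadily forward, while
# the caudal fin also sweeps from side to side as the tail beats.
gill_track = [(0.00, 0.0, 2.0), (0.05, 0.0, 2.0), (0.10, 0.0, 2.0), (0.15, 0.0, 2.0)]
caudal_track = [(-0.30, 0.00, 2.0), (-0.25, 0.04, 2.0), (-0.20, -0.04, 2.0), (-0.15, 0.04, 2.0)]

for name, track in (("gill", gill_track), ("caudal fin", caudal_track)):
    print(f"{name}: speed std = {speed_variability(track, FPS):.3f} m/s")
```

With these illustrative tracks the caudal fin shows the larger spread in speed, because the lateral tail beat adds a varying component on top of the forward motion shared with the rest of the body.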

Fish body lengths estimated from the stereo videos were 39 cm for P1 and 56 cm for P2. To validate these measurements, a weight-length relationship based on commercial farm data [5] was used to calculate reference body lengths of 41 cm and 63 cm for P1 and P2, respectively. The stereo video estimates are 2 cm (P1) and 7 cm (P2) shorter than the reference lengths, a reasonably close match that supports the reliability of our length estimation methodology and highlights its potential for future applications.
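
The gap between the stereo estimates and the reference lengths can also be expressed as a signed relative difference. The short calculation below uses only the four lengths reported above; the function name is illustrative.

```python
def relative_difference(estimate_cm, reference_cm):
    """Signed difference of the estimate from the reference, in percent."""
    return (estimate_cm - reference_cm) / reference_cm * 100.0


# (stereo estimate, reference length) in cm, as reported for P1 and P2.
lengths_cm = {"P1": (39.0, 41.0), "P2": (56.0, 63.0)}

for period, (estimate, reference) in lengths_cm.items():
    print(f"{period}: {relative_difference(estimate, reference):+.1f}%")
# P1: -4.9%
# P2: -11.1%
```

Both stereo estimates fall below the reference values, with the larger relative gap for the larger fish in P2.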

Conclusion and future work

The proposed stereo vision framework, utilising OBB-based object detection and keypoint detection models, provides a robust, automated, and non-invasive solution for fish monitoring in aquaculture. Future developments will aim to integrate the two detection models into a unified end-to-end network to enhance processing speed. Additionally, extending the number of detectable keypoints will allow the contours and natural curvature of the fish body to be captured, providing a more accurate representation of its morphology for detailed sizing and behavioural analysis.

References

[1] CHANGE - An Underwater Robotics Concept for Dynamically Changing Environments. https://www.sintef.no/en/projects/2021/change-an-underwater-robotics-concept-for-dynamically-changing-environments/

[2] SINTEF ACE. https://www.sintef.no/en/all-laboratories/ace/

[3] Zhang et al., 2022, October. ByteTrack: Multi-object tracking by associating every detection box. In European Conference on Computer Vision (pp. 1-21). Cham: Springer Nature Switzerland.

[4] Kuhn, H.W., 1955. The Hungarian method for the assignment problem. Naval Research Logistics Quarterly, 2(1-2), pp.83-97.

[5] Zhang et al., 2024. Farmed Atlantic salmon (Salmo salar L.) avoid intrusive objects in cages: The influence of object shape, size and colour, and fish length. Aquaculture, 581, p.740429.