Introduction
The aquaculture industry is shifting from manual operations and experience-based reasoning towards a more objective, data-driven approach to meet growing global seafood demand and support the expansion of salmon farming into larger and more exposed offshore sites. This shift is driven by the integration of intelligent sensors, mathematical models, and decision support or autonomous systems across production stages, aiming to increase productivity, enhance sustainability, and improve fish welfare. This study presents a stereo vision framework for automated fish monitoring in finfish aquaculture, enabling tasks such as fish detection, tracking, behaviour identification, and size measurement. The proposed framework reduces the need for intrusive or labour-intensive monitoring methods, providing a more precise and efficient foundation for fish stock management.
This work was financed by the Research Council of Norway through the project CHANGE – An underwater robotics concept for dynamically changing environments [1].
Materials and methods
Stereo videos of salmon were collected from two separate experiments at SINTEF's industrial-scale research fish farm site, Korsneset [2], in September 2022 (P1) and March 2023 (P2). P1 aimed to investigate fish behavioural responses to intrusive objects with varying appearance. A central structure, equipped with a stereo camera (comprising two synchronised Lucid TRI032S-CC GigE cameras) on top and two Ping360 sonars (one on top and the other on the bottom), was decorated in six different configurations varying in shape, size, and colour. The average fish weight during P1 was 1 kg. P2 was designed to study the impact of Remotely Operated Vehicle (ROV) motion on fish behaviour. The ROV was equipped with the same stereo camera setup as in P1, and the average fish weight during P2 was 3.5 kg.
The proposed framework employs two independent YOLOv8 models: one dedicated to fish detection using Oriented Bounding Boxes (OBBs), and the other focused on anatomical landmark identification through integrated keypoint detection. The approach began with stereo camera calibration to correct lens distortions and establish the spatial relationship between the left and right views. Two custom training datasets were then created by annotating fish with OBBs and labelling specific body parts, such as the mouth and fins, to train the respective models. During deployment, each stereo video frame is first split into a calibrated and aligned left-right image pair. The OBB detection model is then applied to both images to detect fish, with detections in the left image tracked using the ByteTrack algorithm [3]. The keypoint detection model is subsequently used to localise the desired anatomical landmarks on each detected fish body. The Hungarian algorithm [4] thereafter associates the tracked fish and their keypoints in the left image with the corresponding detections in the right image. Disparities are computed from these associations and used to estimate depth and 3D coordinates through the pre-determined calibration parameters and triangulation geometry. This spatial reconstruction enables further estimation of 3D swimming trajectories and motion patterns by continuously tracking the positions of body parts across frames, as well as measurement of fish length as the distance between the mouth and central caudal fin keypoints. A minimal sketch of the association and triangulation steps is given below.
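The following Python sketch illustrates the left-right association and 3D reconstruction steps described above. It assumes a rectified image pair, so corresponding fish lie on similar image rows; the calibration constants (focal lengths, principal point, baseline) and the keypoint inputs are illustrative placeholders, not the values or exact processing pipeline used in this study.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment  # Hungarian algorithm [4]

# Placeholder calibration parameters (in practice obtained from stereo calibration)
FX, FY = 1400.0, 1400.0   # focal lengths in pixels
CX, CY = 1024.0, 768.0    # principal point in pixels
BASELINE = 0.12           # stereo baseline in metres


def associate_left_right(left_centres: np.ndarray, right_centres: np.ndarray):
    """Match left/right OBB detections by minimising the vertical offset of
    their centres; in a rectified pair corresponding fish share similar rows."""
    cost = np.abs(left_centres[:, 1][:, None] - right_centres[:, 1][None, :])
    left_idx, right_idx = linear_sum_assignment(cost)
    return list(zip(left_idx, right_idx))


def triangulate(pt_left: np.ndarray, pt_right: np.ndarray) -> np.ndarray:
    """Recover a 3D point (metres, left-camera frame) from a matched pixel
    pair using disparity-based triangulation: Z = f * B / d."""
    disparity = max(pt_left[0] - pt_right[0], 1e-6)
    z = FX * BASELINE / disparity
    x = (pt_left[0] - CX) * z / FX
    y = (pt_left[1] - CY) * z / FY
    return np.array([x, y, z])


def body_length(mouth_l, mouth_r, tail_l, tail_r) -> float:
    """Fish length as the 3D distance between the mouth and the central
    caudal fin keypoints."""
    mouth_3d = triangulate(np.asarray(mouth_l), np.asarray(mouth_r))
    tail_3d = triangulate(np.asarray(tail_l), np.asarray(tail_r))
    return float(np.linalg.norm(mouth_3d - tail_3d))
```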
Results
Figure 1a illustrates keypoint tracking, where the centre of the OBB, mouth, gill, and caudal fin of an individual fish were identified and tracked. These keypoints were used to generate detailed 3D trajectories and motion patterns, as visualised in Figures 1b and 1c. The results revealed distinct motion patterns across different body parts: the caudal fin displayed the most pronounced speed variations, followed by the mouth, while the gill and OBB centre showed comparatively lower variability across all metrics.
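As an illustration of how such motion metrics can be derived, the sketch below computes per-keypoint swimming speed from the reconstructed 3D trajectories; the frame rate is an assumed placeholder, not the acquisition rate used in the experiments.

```python
import numpy as np

def keypoint_speed(traj_xyz: np.ndarray, fps: float = 25.0) -> np.ndarray:
    """Instantaneous speed (m/s) of a tracked keypoint from its consecutive
    3D positions (N x 3 array in metres); fps is an assumed frame rate."""
    step = np.linalg.norm(np.diff(traj_xyz, axis=0), axis=1)  # metres per frame
    return step * fps
```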
Fish body lengths estimated from the stereo videos were 39 cm for P1 and 56 cm for P2. To validate these measurements, a weight-length relationship based on commercial farm data [5] was used to calculate reference body lengths of 41 cm and 63 cm for P1 and P2, respectively. The stereo video estimates closely matched the reference lengths, demonstrating the reliability of our length estimation methodology and highlighting its potential for future applications.
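The reference lengths follow from a weight-length relationship fitted to commercial farm data [5], whose coefficients are not reproduced here. The sketch below uses a generic Fulton-type condition-factor formulation instead, with the condition factor K chosen purely for illustration, to show how a reference length can be derived from average weight.

```python
def length_from_weight(weight_g: float, k: float = 1.4) -> float:
    """Generic Fulton-type weight-length relation K = 100 * W / L**3,
    with W in grams and L in centimetres, solved for L. The condition
    factor k is an illustrative placeholder, not the value from [5]."""
    return (100.0 * weight_g / k) ** (1.0 / 3.0)

print(round(length_from_weight(1000.0)))   # ~41 cm for P1 (1 kg average weight)
print(round(length_from_weight(3500.0)))   # ~63 cm for P2 (3.5 kg average weight)
```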
Conclusion and future work
The proposed stereo vision framework, utilising OBB-based object detection and keypoint detection models, provides a robust, automated, and non-invasive solution for fish monitoring in aquaculture. Future developments will aim to integrate the two detection models into a unified end-to-end network to enhance processing speed. Additionally, extending the number of detectable keypoints will allow the contours and natural curvature of the fish body to be captured, providing a more accurate representation of its morphology for detailed sizing and behavioural analysis.
References
[1] CHANGE – An Underwater Robotics Concept for Dynamically Changing Environments. https://www.sintef.no/en/projects/2021/change-an-underwater-robotics-concept-for-dynamically-changing-environments/
[2] SINTEF ACE. https://www.sintef.no/en/all-laboratories/ace/
[3] Zhang et al., 2022. ByteTrack: Multi-object tracking by associating every detection box. In: European Conference on Computer Vision (ECCV), pp. 1-21. Cham: Springer Nature Switzerland.
[4] Kuhn, H.W., 1955. The Hungarian method for the assignment problem. Naval Research Logistics Quarterly, 2(1-2), pp. 83-97.
[5] Zhang et al., 2024. Farmed Atlantic salmon (Salmo salar L.) avoid intrusive objects in cages: The influence of object shape, size and colour, and fish length. Aquaculture, 581, p.740429.