The goal of the project was to build a model that detects vehicles, people and animals in a video stream in real time.

The starting point was the YOLOv3 model with pre-trained weights, which was modified and further trained on the customer's dataset.

Object detection is a common computer vision problem that deals with identifying and locating objects of certain classes in an image. Because it localizes each object, object detection can be used to count the objects in a scene and to determine and track their locations, while also labeling them by class. It is an area of great interest, in part because it is a core component of autonomous driving.
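As a minimal sketch of the counting use case, the snippet below tallies detections per class. The detection tuples, the `count_objects` helper and the 0.5 score threshold are all hypothetical and chosen only for illustration; they are not the project's actual output format.

```python
from collections import Counter

# Hypothetical detector output, one tuple per detected object:
# (class label, confidence score, box as (x_min, y_min, x_max, y_max)).
detections = [
    ("vehicle", 0.92, (34, 50, 210, 180)),
    ("person", 0.81, (220, 40, 260, 170)),
    ("person", 0.35, (300, 60, 330, 150)),
    ("animal", 0.77, (400, 120, 470, 190)),
]


def count_objects(detections, min_score=0.5):
    """Count detections per class, ignoring those scored below min_score."""
    return Counter(
        label for label, score, _box in detections if score >= min_score
    )


counts = count_objects(detections)
for label, n in sorted(counts.items()):
    print(f"{label}: {n}")
```

The low-confidence second "person" detection is dropped by the threshold, so each class is counted once in this example.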

An object's location can be represented in various ways, including drawing a bounding box around the object or marking every pixel in the image that belongs to the object (called segmentation). State-of-the-art methods fall into two main types:

  • One-stage methods: prioritize inference speed; example models include YOLO, SSD and RetinaNet.
  • Two-stage methods: prioritize detection accuracy; example models include Faster R-CNN, Mask R-CNN and Cascade R-CNN.
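
To make the two localization representations mentioned earlier (bounding box and per-pixel segmentation) more concrete, here is a small sketch that derives a bounding box from a binary segmentation mask. The `mask_to_bounding_box` helper and the toy mask are hypothetical and only illustrate how the two representations relate; they are not part of the project's code.

```python
def mask_to_bounding_box(mask):
    """Return (x_min, y_min, x_max, y_max) enclosing all nonzero pixels.

    `mask` is a list of rows of 0/1 values. Returns None for an empty mask.
    """
    if not mask:
        return None
    # Rows and columns that contain at least one object pixel.
    rows = [y for y, row in enumerate(mask) if any(row)]
    cols = [x for x in range(len(mask[0])) if any(row[x] for row in mask)]
    if not rows or not cols:
        return None
    return (min(cols), min(rows), max(cols), max(rows))


# Toy 4x5 segmentation mask: 1 marks pixels that belong to the object.
mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]

box = mask_to_bounding_box(mask)
print(box)
```

The box is the coarser of the two representations: it also covers pixels that the mask marks as background, such as the top-right corner of the box in this toy example.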

Tools used: Python, TensorFlow