Simple Camera-based Object Distance Estimation

Abstract

Computer vision research has grown rapidly and effectively in recent years, and this trend is likely to continue. Part of this success can be attributed to the adoption and adaptation of machine learning methods, and part to the invention of novel representations and models for specific computer vision problems, as well as to the development of cost-effective solutions. Object detection is one area that has made significant strides. It has been used in a variety of applications, including robotics, consumer electronics (e.g., smartphones), security, and transportation (e.g., autonomous and assisted driving). In this thesis, detection is the first task performed, since it enables the acquisition of further information about the detected object as well as about the surrounding scene. Once an instance of an object has been detected, more information can be obtained, such as the object's identity and an estimate of its distance. The goal of this study is to give a detailed, in-depth explanation of how objects are detected and how their distance from the camera is estimated. The thesis is primarily concerned with the development of object distance measurement and feature extraction algorithms that combine the You Only Look Once (YOLO) detector with the triangle-similarity method and the Monodepth2 approach to calculate distance from a single fixed camera. In particular, it investigates the detection ability of YOLOv4-tiny, one of the most widely used detectors today, which in our experiments was both more accurate and faster than the other detection methods examined, outperforming them on all of the measures we considered while still delivering a frame rate high enough for real-time use. Instead of first selecting individual regions of interest in an image, YOLO predicts classes and bounding boxes for the entire image in a single run of the algorithm. We propose combining YOLOv4-tiny with the triangle-similarity method and the well-known Monodepth2 model to estimate the distance between a detected object and the camera, which allows for a more accurate distance measurement. Using YOLO, we detect an object in an image and extract the location and width of its bounding box, i.e., the object's apparent width in the image. The objects used in the experiments are photographs of everyday items such as bottles, people, bags, and cars. By comparing the real width of an object with its apparent width in the image, the triangle-similarity method determines the focal length of the camera and, from it, the distance between the camera and the object. Finally, a linear regression model is used to predict the error in the measured distance.
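
To make the triangle-similarity step concrete, the following is a minimal Python sketch of the calculation described above. The function names and calibration values are illustrative assumptions, not the thesis's actual implementation; the pixel width of the detected object is assumed to come from the YOLO bounding box.

    # Minimal sketch of triangle-similarity distance estimation (assumed names and values).
    def calibrate_focal_length(pixel_width_ref, known_distance, real_width):
        # Focal length F = (P_ref * D_ref) / W, from one reference photo in which
        # the object's real width and its distance to the camera are known.
        return (pixel_width_ref * known_distance) / real_width

    def estimate_distance(focal_length, real_width, pixel_width):
        # Distance D = (W * F) / P for a newly detected bounding box that is P pixels wide.
        return (real_width * focal_length) / pixel_width

    # Example with assumed values: a bottle 8 cm wide, photographed once at 50 cm,
    # appears 160 px wide in the reference image, giving F = 1000.
    F = calibrate_focal_length(pixel_width_ref=160, known_distance=50, real_width=8)

    # A later YOLO detection of the same bottle with an 80 px wide bounding box
    # is then estimated to be about 100 cm from the camera.
    print(estimate_distance(F, real_width=8, pixel_width=80))

In the thesis, a linear regression model is then used to predict the error in the distances measured this way.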

Description

Subject(s)

Neural network, Deep learning, Object detection, Distance estimation, YOLO, Triangle similarity, Monodepth2, Computer vision, Real-time

Citation