Introduction to YOLOv5

The higher a camera’s megapixel count, the more detailed a picture will be and the closer it gets to what we see with our naked eyes. Computer Vision works by a similar analogy: engineers try to give machines information about their surroundings. The aim of Computer Vision is to enable machines to analyze and detect surrounding objects as accurately as humans do.

One method to train a machine for Computer Vision is through YOLO. In this article, we will learn what YOLOv5 is. Let’s dive right in.

What is YOLO?

An acronym for You Only Look Once, YOLO is a real-time object detection system originally built on the Darknet framework. It is named so because the image passes through its Fully Convolutional Neural Network (FCNN) only once.

Since its introduction in the 2015 YOLO paper, its developers, Joseph Redmon and his team, have created a frenzy in the Computer Vision world. Its popularity grew largely because it is faster than the conventional Region-based CNN (R-CNN) model, which uses a multi-step detection process.

The system has now been upgraded to newer versions. We will look into YOLOv5 (version 5).


What is YOLOv5?

A project started by Glenn Jocher under GitHub’s Ultralytics organization, YOLOv5 is not a direct descendant of Darknet. It is written in Python on the PyTorch framework, as compared to the C & CUDA used in earlier YOLO versions.

Though not a successor of YOLOv4, YOLOv5’s structural architecture largely remains the same. It consists of the following:

  1. Input: This is an image, patch, etc., provided to the system.
  2. Backbone: This consists of the system’s neural network and does all the learning. Cross Stage Partial (CSP) Networks form the backbone of YOLOv5.
  3. Neck: The neck is used to create feature pyramids. It has a set of layers that mixes and combines image features before they are sent for prediction. YOLOv5 uses PANet as its neck.
  4. Head: The output of the neck is fed into the head, which makes box and class predictions. The head can be one-stage for dense prediction or two-stage for sparse prediction. YOLOv5 uses the same head as YOLOv3 and YOLOv4.
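To make the four stages above concrete, here is a minimal sketch of the data flow from input to head. All function names and the "feature map" dictionaries are hypothetical stand-ins for the roles each stage plays, not the real Ultralytics implementation.

```python
# Illustrative sketch of the input -> backbone -> neck -> head pipeline.
# Names and data shapes are hypothetical; they mirror the stages' roles only.

def backbone(image):
    """Extract feature maps at several scales (stands in for the CSP backbone)."""
    # Pretend each "feature map" is just the image tagged with a stride.
    return [{"stride": s, "data": image} for s in (8, 16, 32)]

def neck(features):
    """Mix and combine features across scales (stands in for PANet)."""
    # Real PANet fuses top-down and bottom-up paths; here we just label them.
    return [{**f, "fused": True} for f in features]

def head(features):
    """Emit box and class predictions from each fused feature map."""
    return [{"stride": f["stride"], "boxes": [], "classes": []} for f in features]

def detect(image):
    return head(neck(backbone(image)))

predictions = detect("input.jpg")
print(len(predictions))  # one prediction set per feature-map scale
```

The point of the sketch is the shape of the pipeline: the backbone produces multi-scale features, the neck fuses them, and the head turns each fused scale into predictions.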

YOLOv5 processes an image in a single pass through its neural network and divides it into a grid of cells. This is done to increase accuracy, with each cell predicting objects using its own set of anchor boxes, checked by an auto-anchor process.

The process is completely automated: if the default anchor boxes are a poor fit for the training data, the system recomputes them. With each cell having its own anchors, the system analyzes the image and predicts the result.
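The core of recomputing anchors is clustering the width/height pairs of the training labels. The following is a much-simplified, stdlib-only sketch of that idea using plain k-means; the real YOLOv5 auto-anchor step additionally applies a genetic refinement, so treat this as an illustration of the clustering idea, not the actual algorithm.

```python
import random

def kmeans_anchors(boxes, k=3, iters=50, seed=0):
    """Cluster (width, height) pairs into k anchor shapes.

    A simplified stand-in for YOLOv5's auto-anchor step, which runs
    k-means plus a genetic refinement over the training labels.
    """
    random.seed(seed)
    centers = random.sample(boxes, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for w, h in boxes:
            # assign each box to the nearest anchor centre (squared distance)
            i = min(range(k),
                    key=lambda j: (w - centers[j][0]) ** 2 + (h - centers[j][1]) ** 2)
            clusters[i].append((w, h))
        for i, c in enumerate(clusters):
            if c:  # keep the old centre if a cluster went empty
                centers[i] = (sum(w for w, _ in c) / len(c),
                              sum(h for _, h in c) / len(c))
    return sorted(centers)

# Tiny toy "dataset": small, medium, and wide boxes.
boxes = [(10, 12), (12, 10), (40, 42), (44, 40), (90, 30), (100, 34)]
anchors = kmeans_anchors(boxes)
print(anchors)
```

With well-separated box shapes like these, the resulting anchors land near the mean width/height of each group, which is exactly what a detector wants as starting box priors.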

Types of YOLOv5 Models

YOLOv5 comes in five model sizes, as listed below:

  1. YOLOv5n: This is the smallest and fastest model. It is also called the nano model and finds applications in mobile devices owing to its size.
  2. YOLOv5s: This is also a comparatively small model, small enough to run inference on a CPU.
  3. YOLOv5m: As the “m” indicates, this is a medium-sized model.
  4. YOLOv5l: This is a large model.
  5. YOLOv5x: This is the largest and most accurate of the variants. However, its size comes at the cost of speed.
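One way to think about the list above is as a mapping from deployment target to variant. The helper below is purely illustrative (the variant names are real; the targets and the choices are an assumed heuristic, not an official recommendation).

```python
# Hypothetical helper for picking a YOLOv5 variant by deployment target.
# Variant names are real; the target categories are illustrative assumptions.

def pick_variant(target):
    """Map a rough deployment target to a YOLOv5 model name."""
    table = {
        "mobile": "yolov5n",   # smallest and fastest, fits on-device
        "cpu": "yolov5s",      # small enough for CPU inference
        "gpu": "yolov5m",      # balanced speed/accuracy on a GPU
        "server": "yolov5x",   # best accuracy, slowest
    }
    return table.get(target, "yolov5s")  # default to the small model

name = pick_variant("cpu")
# The chosen variant can then be loaded via PyTorch Hub, e.g.:
#   model = torch.hub.load("ultralytics/yolov5", name)  # needs torch + internet
print(name)
```

The commented `torch.hub.load` call is the loading mechanism Ultralytics documents for YOLOv5; it is left commented out because it downloads weights at runtime.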

Refer to the table below to compare each model’s performance:

[Table comparing the performance of YOLOv5 models, with columns: Name, CPU Speed (ms), GPU Speed (ms), Accuracy (mAP 0.5), Params (in millions). Source]

Significance of YOLOv5

Though not a direct offering from Darknet, YOLOv5 has gained a massive fan base. Some of its key advantages: it is fast (programmed in Python on the PyTorch framework), it claims better accuracy (though no official YOLOv5 paper has been published yet), and it is comparatively easy to use (it’s like the Arduino of the Computer Vision world). It is also almost 90% smaller than YOLOv4.

It uses Mosaic Augmentation, in which four different images are combined into one. This is done so that the system learns to deal with difficult images. The system also supports YAML files for dataset and model configuration.

YOLOv5 can be used to detect almost anything, depending on how you train the system. The most common applications include vehicle detection, human detection, and obstruction detection.

Issues with YOLOv5

One of the biggest issues with YOLOv5 is that its developers have still not published a paper certifying its performance and capabilities. This may be because this YOLO version is still under active development, with users receiving frequent updates.

A more thoroughly researched alternative is PP-YOLO, which has an architecture similar to YOLOv4’s. There is also YOLOv7, which reports better speed and accuracy than YOLOv5. For a more advanced variant, check out YOLOX, which uses an anchor-free technique for object detection.

Final Thoughts

Computer Vision technology has only started to develop rapidly in the last five years. With YOLOv5, the technology has become accessible even to rookie programmers. This is good news for ML and AI, which benefit from as many contributors and as much experimentation as possible.

With new developments in the YOLO world, like YOLOX’s anchor-free technology and YOLOv7’s superior speeds, this is just the beginning for the Computer Vision space. This article has provided an overview of YOLOv5.

I hope you understood what YOLOv5 is and why it is a big deal. All the best for your future coding endeavors. Happy coding!

Ashwin Joy

I'm the face behind Pythonista Planet. I learned my first programming language back in 2015. Ever since then, I've been learning programming and immersing myself in technology. On this site, I share everything that I've learned about computer programming.
