Introduction to YOLOv4

YOLO (You Only Look Once) is a popular object detection algorithm. Object detection is a core task in computer vision.

Object detectors locate objects in a digital image or a sequence of images. Before YOLO, detectors such as sliding-window object detection, R-CNN, and Fast R-CNN were used. YOLO became popular because of its detection speed and accuracy compared to these earlier approaches.

Computer Vision

Computer vision is a field of artificial intelligence that enables a system to derive useful information from digital images or videos.

Computer vision is mainly concerned with analyzing and understanding digital images. This article covers the significance, history, and uses of YOLO, and explains how it works.

Introduction to YOLO

YOLO is an algorithm that uses a neural network to detect objects in digital images or videos. A neural network is a series of algorithms that tries to recognize relationships in a set of data in a way loosely modeled on the human brain.

YOLO is an object detector that locates and detects various objects, such as a person, a car, or a motorbike. YOLO was introduced by Joseph Redmon and his co-authors in 2015.

YOLO treats detection as a regression problem. It predicts the classes and bounding boxes for the entire image in a single pass, which is why it is widely used for real-time detection.

In YOLO, a single neural network predicts the bounding boxes and class probabilities directly from an image. YOLO is extremely fast: it can process images at 45 frames per second. Compared to other detectors, YOLO also makes fewer background errors.


How Does YOLO Work?

Before we get started, let us be clear about some of the terms such as classification, localization, and detection.

Classification means identifying what kind of object is present in the image. Localization means finding where a particular object is located. Detection means locating every object in the image and labeling each one.

Grid Cells

First, YOLO splits the image into a grid of S x S equally sized cells (for example, 19×19). Each grid cell is responsible for detecting and locating the objects whose center falls inside it.
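
As a rough illustration of the grid step, the sketch below maps the center point of an object to the grid cell that contains it. The function name `grid_cell_for_center` and the 608×608 image size are assumptions made for this example; they are not taken from any YOLO implementation.

```python
def grid_cell_for_center(cx, cy, img_w, img_h, s=19):
    """Return (row, col) of the S x S grid cell containing the point (cx, cy)."""
    # Scale the pixel coordinate to grid units, then clamp so that a point
    # on the right or bottom edge still lands in the last cell.
    col = min(int(cx / img_w * s), s - 1)
    row = min(int(cy / img_h * s), s - 1)
    return row, col


# Assumed example: a 608x608 image with a 19x19 grid has 32x32 pixel cells.
print(grid_cell_for_center(304, 100, 608, 608))  # (3, 9)
```

Only the cell returned here would be asked to predict the object centered at (304, 100).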

Bounding Box

A bounding box is an outline drawn around an object to highlight it. It marks where the object is so that the object can be detected and located accurately.


For each bounding box, a vector with the following components is formed:

  1. Probability that the box contains an object, Pc
  2. Center coordinates, (bx, by)
  3. Height, bh
  4. Width, bw
  5. Class of the object
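
The vector can be pictured as a plain list. The sketch below is only an illustration: the helper name `make_target_vector`, the one-hot encoding of the class, and the three example classes are assumptions, with the width written as bw.

```python
def make_target_vector(box, class_id, num_classes):
    """Build [Pc, bx, by, bh, bw, c1..cn] for one labeled object.

    box is (bx, by, bh, bw); class_id is a zero-based class index.
    """
    bx, by, bh, bw = box
    # One-hot class part: 1.0 at the object's class, 0.0 elsewhere (assumed encoding).
    one_hot = [1.0 if i == class_id else 0.0 for i in range(num_classes)]
    # Pc is 1.0 because this vector describes a cell that does contain an object.
    return [1.0, bx, by, bh, bw] + one_hot


classes = ["person", "car", "motorbike"]  # hypothetical class list
vec = make_target_vector((0.5, 0.4, 0.3, 0.2), classes.index("car"), len(classes))
print(vec)  # [1.0, 0.5, 0.4, 0.3, 0.2, 0.0, 1.0, 0.0]
```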

Intersection Over Union

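Intersection over Union (IoU) measures how much two bounding boxes overlap. It is calculated by dividing the area of overlap between the predicted box and the ground truth box by the area of their union: IoU = Area of Overlap / Area of Union. If the IoU is greater than or equal to 0.5, the predicted bounding box is usually treated as a match for the ground truth bounding box.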
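
The formula translates almost directly into code. This is a minimal sketch that assumes boxes are given as (x1, y1, x2, y2) corner coordinates; the function name `iou` is chosen for the example.

```python
def iou(box_a, box_b):
    """IoU of two boxes given as (x1, y1, x2, y2) corner coordinates."""
    # Corners of the overlapping rectangle (it may be empty).
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Two 2x2 boxes that share a 1x1 corner: overlap 1, union 7.
print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # 1/7, about 0.143
```

With the 0.5 threshold mentioned above, these two boxes would not count as a match.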

Anchor Box

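Anchor boxes are used in YOLO when the center points of two objects fall in the same grid cell. Each anchor box has a predefined shape, so a cell can predict one object per anchor box. Together, grid cells, bounding boxes, IoU, and anchor boxes allow YOLO to detect objects in an image quickly and accurately.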
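
One common way to decide which anchor box takes which object is to compare shapes only and pick the anchor with the highest IoU. The sketch below assumes that rule; the article does not say which rule YOLO uses, and the helper names and anchor sizes are hypothetical.

```python
def shape_iou(wh_a, wh_b):
    """IoU of two boxes compared by (width, height) only, as if they shared a center."""
    inter = min(wh_a[0], wh_b[0]) * min(wh_a[1], wh_b[1])
    union = wh_a[0] * wh_a[1] + wh_b[0] * wh_b[1] - inter
    return inter / union


def best_anchor(box_wh, anchors):
    """Index of the anchor whose shape best matches the object's shape."""
    return max(range(len(anchors)), key=lambda i: shape_iou(box_wh, anchors[i]))


# Hypothetical anchors in pixels: one tall (person-like), one wide (car-like).
anchors = [(30, 90), (90, 30)]
print(best_anchor((28, 80), anchors))   # 0, the tall anchor
print(best_anchor((100, 35), anchors))  # 1, the wide anchor
```

A tall person and a wide car centered in the same cell are assigned to different anchors, so both can be predicted.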

Object Detection

Object detection is a technique that uses neural networks to detect and classify objects in an image and to mark them with bounding boxes.

It involves classifying, recognizing, and localizing objects, often in real time. In short, it combines two tasks: image classification and object localization.

Two Stage Object Detection

This kind of object detection method breaks the process into two stages:

  1. Detecting the possible object region.
  2. Classifying the objects in those regions.

Some two-stage object detectors are R-CNN, FPN, and Mask R-CNN.

One Stage Object Detection

One-stage detection requires only a single pass through the neural network and predicts all the bounding boxes in that pass.

Some one-stage detectors are SSD and YOLO. YOLO is a popular one-stage detector for real-time object detection.


YOLOv4 runs twice as fast as EfficientDet at comparable accuracy, and it improves YOLOv3’s AP and FPS by 10% and 12%, respectively. The main aim of the YOLOv4 design is a fast operating speed and optimization for parallel computation.

It is an efficient and powerful object detection model. A single 1080 Ti or 2080 Ti GPU is enough to train a super fast and accurate object detector.

Existing methods were modified to make them more efficient and suitable for single-GPU training. YOLOv4 obtained state-of-the-art results on the COCO dataset, with 43.5% AP at 65 FPS.

The Bag of Freebies and the Bag of Specials both help YOLOv4 detect objects more accurately.

Structure of YOLOv4

  • Input: Image, Patches, Image pyramid
  • Backbone: CSPDarknet53
  • Neck: 
    • Additional blocks: Spatial Pyramid Pooling (SPP)
    • Path-aggregation blocks: PAN (PANet path aggregation)
  • Head: YOLOv3

For comparison, detector heads in general fall into two groups:

  • Dense prediction (one-stage):
    • Anchor-based: RPN, SSD, YOLO, RetinaNet
    • Anchor-free: CornerNet, CenterNet, MatrixNet, FCOS
  • Sparse prediction (two-stage): 
    • Anchor-based: Faster R-CNN, R-FCN, Mask R-CNN
    • Anchor-free: RepPoints
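
To give a feel for the SPP block in the neck, the sketch below applies stride-1 max pooling with several window sizes to one feature map and stacks the results as channels. It is a simplified, pure-Python illustration: the kernel sizes 5, 9, and 13 are the ones commonly reported for YOLOv4 and should be treated as an assumption here, and the function names are made up for the example.

```python
def max_pool_same(fmap, k):
    """Stride-1 max pooling with a k x k window that keeps the map size."""
    h, w = len(fmap), len(fmap[0])
    r = k // 2
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            # The window is clipped at the borders instead of padded.
            window = [fmap[j][i]
                      for j in range(max(0, y - r), min(h, y + r + 1))
                      for i in range(max(0, x - r), min(w, x + r + 1))]
            row.append(max(window))
        out.append(row)
    return out


def spp(fmap, kernels=(5, 9, 13)):
    """Return the input map plus one pooled map per kernel size, as channels."""
    return [fmap] + [max_pool_same(fmap, k) for k in kernels]


# A 5x5 feature map with a single activation in the top-left corner.
fmap = [[0] * 5 for _ in range(5)]
fmap[0][0] = 1
channels = spp(fmap)
print(len(channels))      # 4
print(channels[1][2][2])  # 1: the 5x5 window at the center reaches the corner
print(channels[1][3][3])  # 0: the 5x5 window here does not
```

Because the stride is 1, the spatial size does not change, while the larger windows let each position see a wider area of the image.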

Significance of YOLOv4

  • Compared with other designs, YOLOv4 is both faster and more accurate.
  • The architecture uses a higher input network resolution to detect multiple small-sized objects.
  • More layers give a larger receptive field to cover the increased size of the input network.
  • More parameters give the model a greater capacity to detect multiple objects of different sizes in a single image.
  • An SPP block was added over the backbone because it enlarges the receptive field while causing almost no reduction in network operation speed.
  • New methods of data augmentation were introduced: Mosaic, which mixes several training images, and Self-Adversarial Training, which improves robustness during training.
  • Optimal hyperparameters were selected by applying genetic algorithms.
  • Some existing methods were modified for more efficient training and detection, such as modified SAM, modified PAN, and Cross mini-Batch Normalization (CmBN).

Applications of YOLO

YOLO object detection is applied in various fields. In autonomous driving, object detection helps avoid accidents on the road. In wildlife monitoring, it detects different species of animals, which provides information about their migration.

Object detection is also becoming more useful in robotics. In security, YOLO can be used to monitor restricted areas where people are not allowed to pass through.

Bag of Freebies (BoF)

A set of methods that only changes the training strategy or only increases the training cost is called a bag of freebies. These methods improve accuracy without increasing the inference time. Most of the improvements come from data augmentation and data management.

Data augmentation increases the variability of the input images and helps the model get the most information out of the dataset. The Bag of Freebies used for the backbone of YOLOv4 includes CutMix and Mosaic data augmentation, and the Bag of Freebies for the detector includes Self-Adversarial Training.
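
The idea behind Mosaic can be sketched in a few lines. The example below is a deliberately simplified assumption: it tiles four equally sized images into a fixed 2x2 layout and shifts a bounding box to match, whereas real Mosaic implementations also pick a random split point and rescale or crop the images. All function names are hypothetical.

```python
def mosaic_2x2(top_left, top_right, bottom_left, bottom_right):
    """Tile four equally sized images (lists of pixel rows) into one 2x2 image."""
    top = [left + right for left, right in zip(top_left, top_right)]
    bottom = [left + right for left, right in zip(bottom_left, bottom_right)]
    return top + bottom


def shift_box(box, dx, dy):
    """Move an (x1, y1, x2, y2) box so it stays on its object inside the mosaic."""
    x1, y1, x2, y2 = box
    return (x1 + dx, y1 + dy, x2 + dx, y2 + dy)


# Four tiny 2x2 "images", each filled with a different value.
a, b, c, d = ([[v, v], [v, v]] for v in (1, 2, 3, 4))
for row in mosaic_2x2(a, b, c, d):
    print(row)
# [1, 1, 2, 2]
# [1, 1, 2, 2]
# [3, 3, 4, 4]
# [3, 3, 4, 4]

# A box from the bottom-right image moves by one image width and height.
print(shift_box((0, 0, 1, 1), dx=2, dy=2))  # (2, 2, 3, 3)
```

One training sample now contains objects from four different contexts, which is what lets the detector see more variety per batch.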

Bag of Specials (BoS) 

The “Bag of Specials” is a set of plugin modules and post-processing methods that increase the inference cost by a small amount but significantly improve object detection accuracy.

The Bag of Specials used for the backbone of YOLOv4 includes Cross-Stage Partial connections (CSP) and Multi-input Weighted Residual Connections (MiWRC). The BoS used for the detector are Mish activation, SPP-block, SAM-block, PAN path-aggregation block, and DIoU-NMS.
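
Two of these items are small enough to sketch directly. Mish is the activation x · tanh(softplus(x)), and DIoU-NMS is non-maximum suppression that scores overlap with Distance-IoU (IoU minus the squared center distance divided by the squared diagonal of the enclosing box). The code below is an illustrative pure-Python version, not YOLOv4's implementation; the 0.5 threshold and the function names are assumptions.

```python
import math


def mish(x):
    """Mish activation: x * tanh(softplus(x))."""
    # For large x, softplus(x) is effectively x; this avoids overflow in exp.
    softplus = x if x > 20 else math.log1p(math.exp(x))
    return x * math.tanh(softplus)


def diou(a, b):
    """Distance-IoU of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    iou = inter / union if union > 0 else 0.0
    # Squared distance between the two box centers.
    d2 = ((a[0] + a[2]) - (b[0] + b[2])) ** 2 / 4 + ((a[1] + a[3]) - (b[1] + b[3])) ** 2 / 4
    # Squared diagonal of the smallest box that encloses both boxes.
    c2 = (max(a[2], b[2]) - min(a[0], b[0])) ** 2 + (max(a[3], b[3]) - min(a[1], b[1])) ** 2
    return iou - d2 / c2 if c2 > 0 else iou


def diou_nms(boxes, scores, threshold=0.5):
    """Keep the highest-scoring boxes, dropping any whose DIoU with a kept box reaches the threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if diou(boxes[best], boxes[i]) < threshold]
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(diou_nms(boxes, scores))  # [0, 2]
```

The second box is removed because it nearly coincides with the first, while the third survives because it is far away. Compared with plain IoU, the center-distance term makes two overlapping boxes with clearly different centers less likely to suppress each other.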

Final Thoughts

YOLOv4 is a state-of-the-art detector that is both faster and more accurate than comparable detectors. A conventional GPU with 8 to 16 GB of VRAM can be used to train it.

Object detectors like YOLOv4, which combine many features and carefully selected techniques for accurate performance, will continue to be very useful.

YOLO was very useful from the start, and its shortcomings were overcome step by step with the introduction of YOLOv2, YOLO9000, YOLOv3, and YOLOv4.

YOLOv4 will be a good base and a backbone for upcoming versions too. I hope you found this article helpful. Happy coding!

Ashwin Joy

I'm the face behind Pythonista Planet. I learned my first programming language back in 2015. Ever since then, I've been learning programming and immersing myself in technology. On this site, I share everything that I've learned about computer programming.
