Region-based Convolutional Neural Networks

In this article, we will learn what a region-based convolutional neural network (RCNN) is, along with its advantages, disadvantages, and use cases. We will also look at the differences between CNN and RCNN in detail.

Let’s dive right in.

What is RCNN?

The Region-based Convolutional Neural Network (RCNN) was developed by Ross Girshick and
colleagues. It is a region-based object detection algorithm. This family of machine learning models
is mostly used in computer vision, especially for object detection in images that contain multiple objects.

Difference between CNN and RCNN

A Convolutional Neural Network (CNN) is primarily used for image classification, whereas RCNN
is used for object detection. A classification CNN only tells us the class of an object; it does not
identify where the object is located in the image. It also does not perform well when there are
multiple different objects in the visual field, because they interfere with one another.

RCNN, on the other hand, is built for object detection, where object localization is the primary problem to solve. Object detection identifies both the class of each object present in the image and its location. The brute-force approach, also called exhaustive search, slides windows of different sizes across the image to locate the objects.
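
The exhaustive search described above can be sketched in a few lines. This is only an illustration: the function name sliding_windows, the window sizes, the stride, and the (x1, y1, x2, y2) box convention are all assumptions made for the example, not part of any particular library.

```python
from typing import Iterator, Tuple

# Hypothetical box convention for this sketch: (x1, y1, x2, y2) in pixels.
Box = Tuple[int, int, int, int]


def sliding_windows(img_w: int, img_h: int,
                    sizes=((64, 64), (128, 128)),
                    stride: int = 32) -> Iterator[Box]:
    """Yield every window of each size that fits fully inside the image."""
    for win_w, win_h in sizes:
        for y in range(0, img_h - win_h + 1, stride):
            for x in range(0, img_w - win_w + 1, stride):
                yield (x, y, x + win_w, y + win_h)


# Even a small 256 x 256 image with two window sizes gives many candidates,
# and each one would have to be classified separately.
windows = list(sliding_windows(256, 256))
print(len(windows))
```

The number of windows grows quickly with more sizes, smaller strides, and larger images, which is why a cheaper way of proposing regions is attractive.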

RCNN uses selective search, which combines ideas from both segmentation and exhaustive search. The input image is first segmented into many small regions, with different shapes separated from one another based on their color.

A greedy algorithm then iteratively merges the most similar neighboring regions into larger ones. The regions produced along the way serve as region proposals, that is, candidate object locations. Features for these proposals are computed by feeding them into a CNN. A class-specific linear SVM is then used to classify each region.
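
The greedy merging idea can be illustrated with a toy sketch. It is a heavy simplification and not the actual selective search algorithm: here each region is a hypothetical (box, mean_intensity, pixel_count) tuple, similarity is just the difference in mean intensity, and two regions count as neighbors when their boxes touch or overlap.

```python
from itertools import combinations


def touches(a, b):
    """True if boxes (x1, y1, x2, y2) overlap or share an edge."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def merge(r, s):
    """Union box plus the pixel-weighted mean intensity of two regions."""
    (ba, ia, na), (bb, ib, nb) = r, s
    box = (min(ba[0], bb[0]), min(ba[1], bb[1]),
           max(ba[2], bb[2]), max(ba[3], bb[3]))
    return (box, (ia * na + ib * nb) / (na + nb), na + nb)


def greedy_proposals(regions):
    """Repeatedly merge the most similar neighboring pair.

    Every initial and every merged box is recorded as a proposal.
    """
    regions = list(regions)
    proposals = [r[0] for r in regions]
    while len(regions) > 1:
        pairs = [(abs(r[1] - s[1]), i, j)
                 for (i, r), (j, s) in combinations(enumerate(regions), 2)
                 if touches(r[0], s[0])]
        if not pairs:
            break  # no neighboring regions left to merge
        _, i, j = min(pairs)  # smallest intensity difference wins
        merged = merge(regions[i], regions[j])
        regions = [r for k, r in enumerate(regions) if k not in (i, j)]
        regions.append(merged)
        proposals.append(merged[0])
    return proposals


# Three toy regions: two similar ones on top, a brighter one below.
initial = [((0, 0, 10, 10), 0.20, 100),
           ((10, 0, 20, 10), 0.25, 100),
           ((0, 10, 20, 20), 0.90, 200)]
proposals = greedy_proposals(initial)
```

In this example the two similar top regions are merged first, and the result is then merged with the bottom region, so the proposals cover small, medium, and large scales.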

Architecture of RCNN

Each input image is run through the selective search algorithm to extract region proposals. The
proposed regions are warped to a fixed size and fed into the CNN, and finally the regions are classified
using support vector machines (SVMs).
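
The warping step is needed because region proposals have arbitrary shapes while the CNN expects a fixed input size. A minimal sketch, assuming the image is a plain 2D list of pixel values and using nearest-neighbor sampling (the helper name warp_region and the tiny 4 x 4 output size are illustrative choices only):

```python
def warp_region(image, box, out_size=4):
    """Crop box (x1, y1, x2, y2) from a 2D list and resize it to
    out_size x out_size using nearest-neighbor sampling."""
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    return [[image[y1 + (r * h) // out_size][x1 + (c * w) // out_size]
             for c in range(out_size)]
            for r in range(out_size)]


# Toy 6 x 8 "image" whose pixel value encodes its position (10 * row + col).
image = [[10 * row + col for col in range(8)] for row in range(6)]

# A wide 4 x 2 proposal is stretched vertically to fit the square output.
warped = warp_region(image, (2, 1, 6, 3))
for row in warped:
    print(row)
```

Note that warping ignores the aspect ratio of the proposal, so objects can appear stretched or squashed in the CNN input.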

A binary SVM is trained independently for each class. It takes the generated feature vector of a region and produces a confidence score indicating whether an object of that class is present in that region. As an extra step, a bounding box regressor is used to localize the objects in the image more precisely.
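
At test time, scoring a region with one linear SVM per class amounts to a dot product plus a bias for each class. The sketch below assumes the weights have already been trained; the class names, weights, biases, and the 3-dimensional feature vector are made-up values for illustration (real CNN features have thousands of dimensions).

```python
def svm_scores(feature, weights, biases):
    """Return {class: w . x + b} for one region's feature vector."""
    return {cls: sum(w * x for w, x in zip(weights[cls], feature)) + biases[cls]
            for cls in weights}


# Hypothetical, already-trained parameters: one weight vector per class.
weights = {"cat": [1.0, -1.0, 0.0], "car": [-1.0, 0.5, 2.0]}
biases = {"cat": -0.5, "car": 0.0}

feature = [2.0, 1.0, 0.0]  # stand-in for the CNN feature of one region
scores = svm_scores(feature, weights, biases)

# A positive score means that class's SVM votes "object present".
detected = [cls for cls, s in scores.items() if s > 0]
print(scores, detected)
```

Because each class has its own binary classifier, a region can be rejected by every class, in which case it is treated as background.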

The regressor is a scale-invariant linear regression model that refines the position and size of the bounding box. The detection step often produces several overlapping bounding boxes for the same object, which are handled by a non-maximum suppression algorithm.
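
One way to see what scale-invariant means here is to look at how predicted offsets are applied to a proposal. The sketch below assumes the commonly used parameterization in which the center shift is expressed as a fraction of the proposal's width and height, and the size change is applied in log space. The function name apply_box_deltas and the (center x, center y, width, height) convention are choices made for this example.

```python
import math


def apply_box_deltas(proposal, deltas):
    """Refine a proposal (cx, cy, w, h) with predicted offsets (dx, dy, dw, dh).

    The center moves by a fraction of the proposal size, and the width and
    height are scaled by exp(dw) and exp(dh), so the same offsets describe
    the same relative correction at any scale.
    """
    px, py, pw, ph = proposal
    dx, dy, dw, dh = deltas
    return (px + pw * dx,
            py + ph * dy,
            pw * math.exp(dw),
            ph * math.exp(dh))


deltas = (0.1, -0.2, 0.0, 0.0)  # made-up regressor output
small = apply_box_deltas((50.0, 40.0, 20.0, 10.0), deltas)
large = apply_box_deltas((100.0, 80.0, 40.0, 20.0), deltas)  # same box, 2x scale
print(small, large)
```

Doubling the size of the proposal doubles every coordinate of the refined box, which is the scale-invariance property in action.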

The algorithm discards regions whose confidence score falls below a set threshold value. Of the remaining regions, the one with the highest score is selected, and any region whose intersection over union (IoU) with an already selected region exceeds a threshold is rejected.
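
A minimal sketch of intersection over union (IoU) and greedy non-maximum suppression follows. It assumes the common formulation in which boxes are processed from highest to lowest score and a box is dropped when its IoU with an already kept box exceeds a threshold; the threshold values and example boxes are illustrative assumptions.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, score_thresh=0.5, iou_thresh=0.5):
    """Return indices of the boxes kept by greedy non-maximum suppression."""
    # Drop low-confidence boxes, then visit the rest from best to worst.
    order = sorted((i for i, s in enumerate(scores) if s >= score_thresh),
                   key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with a kept one.
        if all(iou(boxes[i], boxes[k]) <= iou_thresh for k in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30), (0, 0, 5, 5)]
scores = [0.9, 0.8, 0.7, 0.3]
kept = nms(boxes, scores)
print(kept)
```

In this example the second box overlaps heavily with the first and is suppressed, the third box is far away and is kept, and the fourth is removed by the score threshold before overlaps are even checked.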

Advantages and Applications of RCNN

Object localization and object detection are the two core operations in computer vision and therefore have a lot of real-world applications in a variety of fields.

RCNN is used in autonomous vehicles to perceive objects in their surroundings and ensure a safe driving experience.
In the field of construction, RCNN can be used for maintenance work such as detecting rust in high-resolution pictures.
In the manufacturing industry, RCNN can be used for defective product identification and automated inspections.

Since the base RCNN model is not fast enough for real-time applications, models built on it, such as Fast RCNN, Faster RCNN, or Mask RCNN, are generally used instead. All of these variations try to improve the testing and analysis of the generated region proposals.

Disadvantages of RCNN

The selective search algorithm is rigid. Since it is a fixed algorithm, no learning happens during the search, which can sometimes result in bad region proposals.
Training is very time-consuming, since there are approximately 2000 or more region proposals per image and the multiple stages of the RCNN architecture have to be trained separately.
Real-time application is not possible because of the time it takes to process each test image.
Feature maps of the region proposals need to be saved, which increases the amount of memory needed during training.

Conclusion

The Region-based Convolutional Neural Network is an object detection algorithm, so it is quite different from image classification models. It also led to more efficient models, built on top of the base RCNN model, that improve on its efficiency and speed.

It helped deal with many of the problems CNNs faced with multiple objects and with finding the location of objects in an image. The many variations built on top of it make it one of the core technologies in fields like computer vision.