Region Proposal Network (RPN) — Backbone of Faster R-CNN
In object detection with Faster R-CNN, the Region Proposal Network (RPN) is the backbone of the detector and has proven to be very efficient. Its purpose is to propose regions of an image that are likely to contain objects.
The method was proposed by Shaoqing Ren, Kaiming He, Ross Girshick and Jian Sun in the very popular paper "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks". The algorithm has attracted the attention of many data scientists and deep learning and AI engineers. It has a wide range of applications, such as detecting objects for a self-driving car or assisting people with disabilities and helping them become more independent.
What is CNN?
CNN stands for Convolutional Neural Network, a very popular class of models for image classification. A CNN typically comprises convolution layers, activation layers, and pooling (primarily max-pooling) layers that reduce dimensionality without losing too many features. For this article, you should know that the last convolutional layer produces a feature map.
For example, if you feed the network an image of a cat or a dog, it can tell you whether it shows a dog or a cat.
But it does not stop there: with great computational capabilities come great advancements.
Many pre-trained models have been released so that you can use them directly, without going through the pain of training a model under computational limitations. Several architectures trained on ImageNet have become popular, such as VGG-16, ResNet-50, DeepNet and AlexNet. Pre-trained research models are also available from TensorFlow by Google.
For this article, I specifically want to talk about one idea from the paper cited above that I thought was very clever. Many people implement Faster R-CNN to identify objects, but this article delves into the logic and math behind how the algorithm gets the box around each identified object.
The authors called it the Region Proposal Network, abbreviated as RPN.
To generate these so-called "proposals" for the regions where objects lie, a small network is slid over the convolutional feature map output by the last convolutional layer.
Credit: Original Research Paper
Above is the architecture of Faster R-CNN. The RPN generates the proposals for the objects. The RPN has a specialized architecture of its own, which I want to break down further.
Credits: Original Research Paper
The RPN has a classifier and a regressor. The authors introduced the concept of anchors. An anchor is a reference box centered on the sliding window at a given position. Each sliding window is mapped to a lower-dimensional feature: 256-d for the ZF model, which was an extension of AlexNet, and 512-d for VGG-16. The classifier determines the probability that a proposal contains a target object. The regressor regresses the coordinates of the proposals. For any anchor, scale and aspect ratio are two important parameters. For those who don't know, aspect ratio = width of the box / height of the box, and scale is the size of the box. The authors chose 3 scales and 3 aspect ratios, so a total of 9 proposals are possible at each sliding-window position. This is how the value of k, the number of anchors per position, is decided: k = 9 in this case. For a feature map of size W × H, the total number of anchors is W*H*k.
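The anchor arithmetic above can be sketched in a few lines of plain Python. This is only an illustration, not the authors' implementation: the function name `generate_anchors`, the stride of 16 pixels, and the choice of placing each anchor at the centre of its stride cell are assumptions made here. The scales (128, 256, 512) and ratios (0.5, 1, 2) are the defaults reported in the paper.

```python
import math


def generate_anchors(feat_h, feat_w, stride=16,
                     scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return anchors as (x1, y1, x2, y2) boxes in image coordinates.

    Each anchor has area scale**2 and aspect ratio width / height = ratio.
    `stride` (assumed 16 here) maps a feature-map cell back to image pixels.
    """
    anchors = []
    for row in range(feat_h):
        for col in range(feat_w):
            # Centre of this sliding-window position, mapped back to the image
            # (centring on the stride cell is an assumption of this sketch).
            cx = (col + 0.5) * stride
            cy = (row + 0.5) * stride
            for scale in scales:
                for ratio in ratios:
                    w = scale * math.sqrt(ratio)
                    h = scale / math.sqrt(ratio)
                    anchors.append((cx - w / 2, cy - h / 2,
                                    cx + w / 2, cy + h / 2))
    return anchors


# A toy 4 x 6 feature map: H * W * k = 4 * 6 * 9 anchors.
anchors = generate_anchors(feat_h=4, feat_w=6)
print(len(anchors))  # 216
```

Because the same 9 boxes are replicated at every position, shifting an object in the image simply shifts which position's anchors match it.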
The algorithm is robust to translations, so one of its key properties is that it is translation invariant.
The presence of multi-scale anchors results in a "pyramid of anchors" instead of a "pyramid of filters", which makes the algorithm less time-consuming and more cost-efficient than previously proposed algorithms such as MultiBox.
But How Does It Work?
These anchors are assigned a positive label if they satisfy either of two conditions:
- The anchor has the highest Intersection-over-Union (IoU) overlap with a ground truth box.
- The anchor has an IoU overlap higher than 0.7 with any ground truth box.
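The two labeling rules can be sketched as follows. The helper names `iou` and `label_anchors` are hypothetical. The background threshold of 0.3 and the "ignored" label for anchors that are neither positive nor negative come from the paper rather than from the text above, so treat them as additions made for completeness.

```python
def iou(box_a, box_b):
    """Intersection-over-Union of two (x1, y1, x2, y2) boxes."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def label_anchors(anchors, gt_boxes, pos_thresh=0.7, neg_thresh=0.3):
    """Return one label per anchor: 1 = object, 0 = background, -1 = ignored."""
    overlaps = [[iou(a, g) for g in gt_boxes] for a in anchors]
    labels = []
    for row in overlaps:
        best = max(row, default=0.0)
        if best > pos_thresh:        # IoU higher than 0.7 with some box
            labels.append(1)
        elif best < neg_thresh:      # background threshold (from the paper)
            labels.append(0)
        else:
            labels.append(-1)        # contributes nothing to training
    # The anchor(s) with the highest IoU for each ground truth box are
    # positive, even when that IoU is 0.7 or lower.
    for g in range(len(gt_boxes)):
        column = [row[g] for row in overlaps]
        best = max(column, default=0.0)
        if best > 0:
            for i, value in enumerate(column):
                if value == best:
                    labels[i] = 1
    return labels


gt_boxes = [(0, 0, 10, 10), (20, 20, 30, 40)]
anchors = [(0, 0, 10, 10), (0, 0, 10, 5), (20, 20, 30, 30),
           (0, 0, 10, 8), (50, 50, 60, 60)]
print(label_anchors(anchors, gt_boxes))  # [1, -1, 1, 1, 0]
```

The third anchor overlaps the second ground truth box with an IoU of only 0.5, but it is still labeled positive because it is the best match for that box. This is why the first rule exists: it guarantees every ground truth box gets at least one positive anchor.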
Ultimately, the RPN is a network that needs to be trained, so we need a loss function.
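For reference, the multi-task loss as given in the Faster R-CNN paper can be written in LaTeX as below. It is transcribed here from the paper, using the same symbols that the following paragraphs explain.

```latex
L(\{p_i\}, \{t_i\}) =
  \frac{1}{N_{cls}} \sum_i L_{cls}(p_i, p_i^*)
  + \lambda \, \frac{1}{N_{reg}} \sum_i p_i^* \, L_{reg}(t_i, t_i^*)
```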
i → index of an anchor, p → predicted probability that the anchor is an object, t → vector of 4 parameterized coordinates of the predicted bounding box, * denotes the ground truth. L for cls is the log loss over two classes (object vs. not object).
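The "4 parameterized coordinates" are expressed relative to an anchor. A minimal sketch of the parameterization described in the paper is shown below: centre offsets scaled by the anchor size, and log-ratios of width and height. The function names `to_center`, `encode_box` and `decode_box` are hypothetical.

```python
import math


def to_center(box):
    """Convert (x1, y1, x2, y2) to (centre_x, centre_y, width, height)."""
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    return x1 + w / 2, y1 + h / 2, w, h


def encode_box(box, anchor):
    """Parameterize a box relative to an anchor: returns (tx, ty, tw, th)."""
    x, y, w, h = to_center(box)
    xa, ya, wa, ha = to_center(anchor)
    return ((x - xa) / wa, (y - ya) / ha,
            math.log(w / wa), math.log(h / ha))


def decode_box(t, anchor):
    """Invert encode_box: recover (x1, y1, x2, y2) from (tx, ty, tw, th)."""
    tx, ty, tw, th = t
    xa, ya, wa, ha = to_center(anchor)
    x, y = xa + tx * wa, ya + ty * ha
    w, h = wa * math.exp(tw), ha * math.exp(th)
    return (x - w / 2, y - h / 2, x + w / 2, y + h / 2)


anchor = (0, 0, 10, 10)
print(encode_box(anchor, anchor))  # (0.0, 0.0, 0.0, 0.0)
```

A box that coincides with its anchor encodes to all zeros, so the regressor only has to learn small corrections around each anchor.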
The p* factor on the regression term ensures that regression counts only when the anchor is labeled as an object (p* = 1); otherwise p* is zero, so the regression term vanishes from the loss function.
Ncls and Nreg are normalization terms. λ is 10 by default and balances the classification and regression terms so that they are weighted at roughly the same level.
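To make the role of p*, Ncls, Nreg and λ concrete, here is a small plain-Python sketch of the loss. It is not the authors' code: the function names are hypothetical, and the use of smooth L1 for the regression loss follows the paper rather than the text above. Anchors that were ignored during labeling are assumed to have been filtered out before this function is called.

```python
import math


def smooth_l1(x):
    """Smooth L1 (robust) loss on a single coordinate difference."""
    ax = abs(x)
    return 0.5 * x * x if ax < 1.0 else ax - 0.5


def rpn_loss(p, p_star, t, t_star, n_cls, n_reg, lam=10.0):
    """Classification log loss plus lambda-weighted regression loss.

    p      : predicted object probabilities, one per anchor
    p_star : ground truth labels (1 = object, 0 = background)
    t      : predicted (tx, ty, tw, th) per anchor
    t_star : ground truth (tx, ty, tw, th) per anchor
    """
    eps = 1e-12  # guards against log(0)
    cls_sum = 0.0
    reg_sum = 0.0
    for p_i, ps_i, t_i, ts_i in zip(p, p_star, t, t_star):
        cls_sum += -(ps_i * math.log(p_i + eps)
                     + (1 - ps_i) * math.log(1 - p_i + eps))
        # p* gates the regression term: background anchors add nothing.
        reg_sum += ps_i * sum(smooth_l1(a - b) for a, b in zip(t_i, ts_i))
    return cls_sum / n_cls + lam * reg_sum / n_reg


bad_box = [(5.0, 5.0, 5.0, 5.0)]
target = [(0.0, 0.0, 0.0, 0.0)]
# Same poor box prediction, once for a background anchor, once for an object.
background = rpn_loss([0.5], [0], bad_box, target, n_cls=1, n_reg=1)
foreground = rpn_loss([0.5], [1], bad_box, target, n_cls=1, n_reg=1)
print(background < foreground)  # True
```

For the background anchor the loss reduces to the log loss alone, no matter how far off the predicted box is. For the object anchor the same box error is added to the loss, multiplied by λ.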
For this paper, results were obtained after training the algorithm on the well-known PASCAL VOC dataset.
Further advances are being made in the field of instance segmentation.
If you want to go more granular, here is the link to the paper: https://arxiv.org/pdf/1506.01497.pdf.