Region Proposal Network (RPN) — Backbone of Faster R-CNN
In object detection using R-CNN, RPN is the one true backbone and have proven to be very efficient till now. Let's explore it more.
In object detection using R-CNN, RPN is the one true backbone and have proven to be very efficient till now. It's purpose is to propose multiple objects that are identifiable within a particular image.
This method was proposed by Shaoqing Ren, Kaiming He, Ross Girshick and Jian Sun in a very popular paper on "Faster R-CNN : Towards Real Time Object Detection with Region Proposal Networks". This is a very popular algorithm which attracted attention of a lot of Data Scientists, Deep Learning and AI engineers. It has enormous application like detecting objects in a self-driving car, assisting differently abled person and helping them to get independent etc.
What is CNN?
CNN translates to Convolutional Neural Networks which is a very popular algorithm for image classification and typically comprises of convolution layers, activation function layers, pooling (primarily max_pooling) layers to reduce dimensionality without losing a lot of features. For this article, you should know that there is a feature map that is generated by the last layer of convolutional layer.
For example, If you feed a cat image or a dog image, the algorithm can tell you whether it is dog or cat.
But it does not stop here, with great computational capabilities comes great advancements.
Many pre-trained models are developed to directly use them without going through the pain of training models due to computational limitation. Many models got popular as well like VGG-16, ResNet 50, DeepNet, AlexNet by ImageNet. You can find pre-trained research models from Tensorflow by google here.
For this particular article, I specifically want to talk about an algorithm or an idea which I thought was very clever from the above stated paper. Many people implement Faster R-CNN to identify the objects but this algorithm specifically dwells into the logic and math behind how algorithm gets the box around the identified objects.
The developers of the algorithm called it Region Proposal Networks abbreviated as RPN.
To generate these so called "proposals" for the region where the object lies, a small network is slide over a convolutional feature map that is the output by the last convolutional layer.
Credit: Original Research Paper
Above is the architecture of Faster R-CNN. RPN generate the proposal for the objects. RPN has a specialized and unique architecture in itself. I want to further breakdown the RPN architecture.
Credits: Original Research Paper
RPN has a classifier and a regressor. The authors have introduced the concept of anchors. Anchor is the central point of the sliding window. For ZF Model which was an extension of AlexNet, the dimensions are 256-d and for VGG-16, it was 512-d. Classifier determines the probability of a proposal having the target object. Regression regresses the coordinates of the proposals. For any image, scale and aspect-ratio are two important parameters. For those who don't know, aspect ratio = width of image/height of image, scale is the size of the image. The developers chose 3 scale and 3 aspect-ratio. So, total of 9 proposals are possible for each pixel, this is how the value of k is decided, K=9 for this case, k being the number of anchors. For the whole image, number of anchors is W*H*K.
This algorithm is robust against translations, therefore one of the key property of this algorithm it is translational invariant.
Presence of multi-scale anchors in the algorithm results in "Pyramid of Anchors" instead of "Pyramid of Filters" which makes it less time consuming and more cost efficient than previously proposed algorithms like Multi-Box.
But How Does It Work?
These anchors are assigned label based on two factors:
- The anchors with highest Intersection-over-union overlap with a ground truth box.
- The anchors with Intersection-Over-Union Overlap higher than 0.7.
Ultimately, RPN is an algorithm that needs to be trained. So we definitely have our Loss Function.
i → Index of anchor, p → probability of being an object or not, t →vector of 4 parameterized coordinates of predicted bounding box, * represents ground truth box. L for cls represents Log Loss over two classes.
p* with regression term in the loss function ensures that if and only if object is identified as yes, then only regression will count, otherwise p* will be zero, so the regression term will become zero in the loss function.
Ncls and Nreg are the normalization. Default λ is 10 by default and is done to scale classifier and regressor on the same level.
For this paper, results were obtained after training this algorithm on famous PASCAL VOC dataset.
The more advancement is being carried on in the field of instance segmentation.
If you want to go more granular, here is the link to the paper: https://arxiv.org/pdf/1506.01497.pdf.
You might also like
How Outdated Retail Systems and Practices Cost You Money
Learn 4 key strategies for retailers to modernize their retail operations, thereby lowering costs, increasing revenue, and providing a better customer experience.Download white paper
How to build your own Clubhouse - Part 2
How to Build your own Clubhouse
How AI Can Enhance Your Product and Customer Experience
A deep dive into implementing AI-based analytics to help transform your product experience and build strong brand loyalty.Read blog
AWS re:Invent in Review - Part 3
Let's go over the all the major announcements from the Week-3 of the AWS re:Invent 2020.Read blog
Fashion E-Commerce: Using Computer Vision to Find Clothing that Fits Like a Glove
Never let online trends get in the way of creating a great outfit for yourself.Read blog
A Deep-Dive into Downtime. Why Does it Happen?
Successfully handling sales peaks while avoiding downtime should be the goal of any business. We’ll be covering every aspect of downtime in a series of posts, including details of how to build resilience into your cloud architecture – ensuring you minimize your business’ exposure to any outages.Read blog
How to Enable Public Health by Innovation in Predictive Analytics - Part 2
Is there a way to let people know of a potential infection risk before even coming into contact with each other?Read blog