# Implementing a dataclass to store our box coordinates
from dataclasses import dataclass, astuple

@dataclass
class Box:
    """Dataclass for holding a set of individual bounding box coordinates in absolute XYXY format."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    # Make a check after initializing an instance of Box
    def __post_init__(self):
        self._check_box_coordinate_ordering()

    # Implement a check to make sure box obeys x1 < x2 and y1 < y2 (otherwise the box would have zero or negative width/height)
    # Having a dataclass do this means we can ensure valid boxes, preventing downstream errors
    def _check_box_coordinate_ordering(self):
        """Checks to make sure x_max > x_min and y_max > y_min"""
        if self.x_min >= self.x_max:
            raise ValueError(f"x_min ({self.x_min}) must be less than x_max ({self.x_max})")
        if self.y_min >= self.y_max:
            raise ValueError(f"y_min ({self.y_min}) must be less than y_max ({self.y_max})")

    # Add properties to the dataclass for easy access
    @property
    def width(self):
        return self.x_max - self.x_min

    @property
    def height(self):
        return self.y_max - self.y_min

    @property
    def area(self):
        return self.width * self.height

    @property
    def max_coordinate_value(self):
        return max(self.x_min, self.x_max, self.y_min, self.y_max)

    @property
    def min_coordinate_value(self):
        return min(self.x_min, self.x_max, self.y_min, self.y_max)
Introduction
Welcome to a gentle introduction to IoU!
If you’ve ever worked on an object detection or localization task, you’ve likely run into IoU.
And if you haven’t, well, that’s okay!
The goal of this post is to be a one-stop shop for learning about IoU.
The code below is designed to be concise enough to get the idea across but also complete enough to perform your own IoU calculations if needed.
Anyway, enough intro, let’s get into it!
What is IoU?
IoU stands for Intersection over Union.
In full, it’s referred to as Area of Intersection over Area of Union.
In object detection tasks, IoU is a way to measure how much one bounding box overlaps another bounding box.
Or said another way, “How much does this prediction box overlap the ground truth box?”.
A perfect prediction box (100% overlap with the ground truth box) will have an IoU of 1.0.
A prediction box with no overlap of the ground truth box will have an IoU of 0.0.
In essence, a higher IoU score is better.
When does IoU get used?
When evaluating an object detection model, you might see metrics such as mAP@0.5 or mAP@0.75.
These measure the mean average precision (mAP) at IoU thresholds of 0.5 and 0.75 respectively.
An IoU threshold of 0.5 means a predicted box with an IoU score of less than 0.5 will not count as a correct detection during metric calculation.
The same goes for an IoU threshold of 0.75: prediction boxes with an IoU score of less than 0.75 will not count as correct during mAP calculation.
If you need an object detection model which is very performant (predicts boxes very close to the ground truth box), you might pay close attention to the mAP metric at a higher IoU threshold.
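To make the thresholding idea concrete, here is a minimal sketch. The IoU scores are made-up values for illustration and `count_passing_predictions` is a hypothetical helper, not part of any library. It only counts how many predictions meet each threshold, which is one ingredient of a full mAP calculation rather than the whole thing.

```python
# Hypothetical IoU scores for five predicted boxes (made-up values for illustration)
prediction_ious = [0.92, 0.78, 0.61, 0.48, 0.10]

def count_passing_predictions(ious: list[float], iou_threshold: float) -> int:
    """Counts predictions whose IoU meets or exceeds the threshold."""
    return sum(1 for iou in ious if iou >= iou_threshold)

for threshold in (0.5, 0.75):
    passing = count_passing_predictions(prediction_ious, threshold)
    print(f"[INFO] IoU threshold {threshold}: {passing} of {len(prediction_ious)} predictions pass")
```

Raising the threshold from 0.5 to 0.75 leaves fewer passing predictions, which is why mAP tends to be lower at stricter thresholds.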
Key terms
- Object detection - The practice of trying to localize/detect an object in an image. For example, where is the “dog” in the image?
- Bounding box - For the example where is the “dog” in the image, a bounding box would be a rectangle-like box drawn around the dog.
- Bounding box coordinates - A set of four numbers depicting the location of a bounding box in an image in relation to the image’s height and width. Can come in a number of different formats such as XYXY or [x_min, y_min, x_max, y_max]. The format you use will depend on the data you have/framework you choose.
- Ground truth box - A labelled example of a bounding box around a target item that is known to be correct. For example, it is often a bounding box drawn by a human around an item of interest in an image.
- Prediction box - A bounding box produced by a model. An ideal prediction box is equivalent to a ground truth box, though this is not always the case; poor prediction boxes can be far different from a ground truth box.
- IoU = Intersection over Union - A measurement of how well two boxes overlap. For example, comparing a prediction box to a ground truth box. A perfect prediction box (100% overlap with ground truth box) will have an IoU score of 1.0, a prediction box with no overlap with a ground truth box will have an IoU score of 0.0.
How does IoU get calculated?
Calculating IoU with pseudocode
Let’s start with pseudocode.
When comparing two boxes:
- box_1 - Coordinates in XYXY format, e.g. [x_min, y_min, x_max, y_max].
- box_2 - Coordinates in XYXY format, e.g. [x_min, y_min, x_max, y_max].
Note: This post assumes your bounding box coordinates are in XYXY format and have absolute coordinates. For a guide to different bounding box formats, see A Guide to Bounding Box Formats and How to Draw Them.
The intersection box is found by taking the maximum of the two boxes’ minimum x and y coordinates (e.g. max(x_min_box_1, x_min_box_2) and max(y_min_box_1, y_min_box_2)).
And the minimum of their maximum x and y coordinates (e.g. min(x_max_box_1, x_max_box_2) and min(y_max_box_1, y_max_box_2)).
# Find the intersection box coordinates in XYXY
intersection_box_x_min = max(x_min_box_1, x_min_box_2)
intersection_box_y_min = max(y_min_box_1, y_min_box_2)
intersection_box_x_max = min(x_max_box_1, x_max_box_2)
intersection_box_y_max = min(y_max_box_1, y_max_box_2)
intersection_box_xyxy = [intersection_box_x_min,
intersection_box_y_min,
intersection_box_x_max,
intersection_box_y_max]
# Find height and width for area (clamped at 0 so non-overlapping boxes give an area of 0)
intersection_height = max(0, intersection_box_y_max - intersection_box_y_min)
intersection_width = max(0, intersection_box_x_max - intersection_box_x_min)
# Find the area of intersection (note: this may be 0 if the boxes don't overlap)
intersection_area = intersection_height * intersection_width
The results of these max and min operations are the XYXY coordinates of the intersection box.
The intersection area can then be found by multiplying the width and height of the intersection box (note: for the intersection area to be non-zero, both the width and height must be positive, i.e. greater than 0).
The area of union is calculated by adding the areas of box_1 and box_2 and subtracting the area of intersection.
# Calculate the union area
union_area = box_1_area + box_2_area - intersection_area
Finally, we can calculate the IoU by dividing the area of intersection by the area of union.
# Find the IoU score
iou = intersection_area / union_area
Of course, the area of intersection may be zero, resulting in an IoU score of 0.0.
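The pseudocode steps can be collected into one small runnable function. This is a minimal sketch that works on plain lists in XYXY format with absolute coordinates; the name `iou_xyxy` and the example boxes are chosen here for illustration.

```python
def iou_xyxy(box_a: list[float], box_b: list[float]) -> float:
    """IoU of two boxes given as [x_min, y_min, x_max, y_max] in absolute coordinates."""
    # Intersection box: max of the mins, min of the maxes
    inter_x_min = max(box_a[0], box_b[0])
    inter_y_min = max(box_a[1], box_b[1])
    inter_x_max = min(box_a[2], box_b[2])
    inter_y_max = min(box_a[3], box_b[3])

    # Clamp width and height at 0 so non-overlapping boxes give zero area
    inter_width = max(0, inter_x_max - inter_x_min)
    inter_height = max(0, inter_y_max - inter_y_min)
    inter_area = inter_width * inter_height

    # Union = area of box A + area of box B - area of intersection
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union_area = area_a + area_b - inter_area

    # Guard against degenerate (zero-area) inputs
    return inter_area / union_area if union_area > 0 else 0.0

print(iou_xyxy([0, 0, 10, 10], [0, 0, 10, 10]))    # identical boxes
print(iou_xyxy([0, 0, 10, 10], [5, 5, 15, 15]))    # partial overlap
print(iou_xyxy([0, 0, 10, 10], [20, 20, 30, 30]))  # no overlap
```

The three calls cover the cases described above: a perfect overlap, a partial overlap and no overlap at all.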
Calculating IoU with math
\[IoU = \frac{\text{Area of Intersection}}{\text{Area of Union}} = \frac{|A \cap B|}{|A \cup B|} = \frac{|A \cap B|}{|A| + |B| - |A \cap B|}\]
Where:
- \(A\) is a bounding box in the same format as \(B\).
- \(∩\) means the overlapping space (intersection).
- \(∪\) means the combined space (union).
Part 1 - Exploring IoU with synthetic boxes
Before we get onto drawing bounding boxes on real images, let’s practice by creating synthetic boxes and plotting them with matplotlib.
Creating a dataclass to store box coordinates
One of the most important concepts when it comes to object detection is the format your boxes are in.
They could be XYXY, XYWH, CXCYWH, normalized or absolute (see A Guide to Bounding Box Formats and How to Draw Them for more on this).
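If your boxes arrive in a different format, they can be converted to XYXY first. The sketch below assumes the common conventions that XYWH means (x_min, y_min, width, height) and CXCYWH means (center_x, center_y, width, height), all in absolute values. The function names are hypothetical, so check the conventions of your own dataset or framework before relying on them.

```python
def xywh_to_xyxy(box: tuple[float, float, float, float]) -> tuple[float, float, float, float]:
    """Converts (x_min, y_min, width, height) to (x_min, y_min, x_max, y_max)."""
    x_min, y_min, width, height = box
    return (x_min, y_min, x_min + width, y_min + height)

def cxcywh_to_xyxy(box: tuple[float, float, float, float]) -> tuple[float, float, float, float]:
    """Converts (center_x, center_y, width, height) to (x_min, y_min, x_max, y_max)."""
    center_x, center_y, width, height = box
    half_width, half_height = width / 2, height / 2
    return (center_x - half_width, center_y - half_height,
            center_x + half_width, center_y + half_height)

# Both inputs describe the same 100x100 box with its top-left corner at (100, 100)
print(xywh_to_xyxy((100, 100, 100, 100)))
print(cxcywh_to_xyxy((150, 150, 100, 100)))
```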
To keep things consistent, let’s make a Python @dataclass to store our box coordinates.
We’ll use XYXY (x_min, y_min, x_max, y_max) format with absolute values.
Note: Python dataclasses are very helpful to ensure a particular structure of data. As you’ll see, we can also implement checks and helpful methods/attributes to go along with our data structures.
Beautiful!
Now let’s make an example box and inspect its properties.
# Make an example box
example_box = Box(x_min=100.0,
                  y_min=100.0,
                  x_max=200.0,
                  y_max=200.0)

# Get properties
# Using a dataclass helps make accessing these properties easy
example_box_height = example_box.height
example_box_width = example_box.width
example_box_area = example_box.area

# Inspect values
print(f"[INFO] Example box (XYXY format): {example_box}")
print(f"[INFO] Example box height: {example_box_height}")
print(f"[INFO] Example box width: {example_box_width}")
print(f"[INFO] Example box area: {example_box_area}")
[INFO] Example box (XYXY format): Box(x_min=100.0, y_min=100.0, x_max=200.0, y_max=200.0)
[INFO] Example box height: 100.0
[INFO] Example box width: 100.0
[INFO] Example box area: 10000.0
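It is also worth seeing the coordinate check in action. The sketch below uses `CheckedBox`, a hypothetical trimmed-down stand-in for the Box dataclass (same fields, same ordering check, no properties) so it can run on its own without overwriting the real class.

```python
from dataclasses import dataclass

@dataclass
class CheckedBox:
    """Trimmed stand-in for Box: XYXY coordinates plus the ordering check."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def __post_init__(self):
        # Same validation idea as Box: min values must be strictly less than max values
        if self.x_min >= self.x_max:
            raise ValueError(f"x_min ({self.x_min}) must be less than x_max ({self.x_max})")
        if self.y_min >= self.y_max:
            raise ValueError(f"y_min ({self.y_min}) must be less than y_max ({self.y_max})")

# x_min is larger than x_max here, so creating the box should fail
try:
    CheckedBox(x_min=200.0, y_min=100.0, x_max=100.0, y_max=200.0)
    error_message = None
except ValueError as error:
    error_message = str(error)
    print(f"[INFO] Invalid box rejected: {error_message}")
```

Failing at creation time means a flipped coordinate pair is caught before it reaches any IoU calculation.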
Calculate IoU between two boxes: step by step
Our IoU formula is:
\[IoU = \frac{\text{Area of Intersection}}{\text{Area of Union}} = \frac{|A \cap B|}{|A \cup B|} = \frac{|A \cap B|}{|A| + |B| - |A \cap B|}\]
Area of Intersection is where the two boxes overlap.
Area of Union is the combined box space minus the Area of Intersection (Area of Box 1 + Area of Box 2 - Area of Intersection).
To practice calculating IoU, we’ll calculate the IoU of two boxes:
- box_1 - We’ll consider this our ground truth box.
- box_2 - We’ll consider this a box predicted by our model.
Why a ground truth box and a prediction box?
Because IoU values are usually calculated between ground truth and prediction boxes to measure how well a prediction captures a ground truth label.
# Create box_1 - this will be our ground truth box
box_1 = Box(x_min=50,
            y_min=50,
            x_max=150,
            y_max=150) # Boxes are in XYXY format

# Create box_2 - this is an example prediction box that is very close to the ground truth
box_2 = Box(x_min=51,
            y_min=51,
            x_max=151,
            y_max=151) # Area of box_2 is the same as box_1 but it is off by 1 pixel on x & y

# Calculate IoU by hand
# Step 1: Area of Intersection
# Get max values of min coordinates
x_min_intersection = max(box_1.x_min, box_2.x_min)
y_min_intersection = max(box_1.y_min, box_2.y_min)

# Get min values of max coordinates
x_max_intersection = min(box_1.x_max, box_2.x_max)
y_max_intersection = min(box_1.y_max, box_2.y_max)

# Get width and height of intersection box (make sure they are 0 or positive values)
intersection_width = max(0, x_max_intersection - x_min_intersection)
intersection_height = max(0, y_max_intersection - y_min_intersection)

# Calculate the intersection area
intersection_area = intersection_width * intersection_height

# Step 2: Area of Union
# Note: This is where properties of our Box dataclass come in handy
union_area = box_1.area + box_2.area - intersection_area

# Step 3: Calculate IoU
iou = intersection_area / union_area
iou
0.9609765663300324
We get an IoU score of 0.961, which means the boxes overlap quite substantially (a perfect overlap would mean an IoU of 1.0).
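As a sanity check, the same number can be reached by brute force. The sketch below treats each box as a set of unit pixels and counts how many are shared. It assumes integer coordinates and that a box covers the half-open range from x_min up to (but not including) x_max; `pixel_set` is a hypothetical helper written for this check.

```python
def pixel_set(x_min: int, y_min: int, x_max: int, y_max: int) -> set[tuple[int, int]]:
    """Returns the set of unit pixels covered by an integer XYXY box."""
    return {(x, y) for x in range(x_min, x_max) for y in range(y_min, y_max)}

# Same coordinates as box_1 and box_2
pixels_box_1 = pixel_set(50, 50, 150, 150)
pixels_box_2 = pixel_set(51, 51, 151, 151)

# Set intersection and union mirror the IoU definition directly
intersection_pixels = pixels_box_1 & pixels_box_2
union_pixels = pixels_box_1 | pixels_box_2

pixel_iou = len(intersection_pixels) / len(union_pixels)
print(f"[INFO] Intersection pixels: {len(intersection_pixels)}")
print(f"[INFO] Union pixels: {len(union_pixels)}")
print(f"[INFO] Pixel-counting IoU: {pixel_iou:.3f}")
```

Counting pixels is far slower than the coordinate arithmetic, but it shows that the max/min trick is just a shortcut for measuring the shared area.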
Let’s functionize our IoU code so we can reuse it later.
Creating a function to calculate the IoU of two boxes
Rather than write out IoU calculations by hand each time, let’s make a function to do it for us.
We’ll also add the functionality to return the coordinates of the intersection box (if there is one) so we can plot it if we like.
def calculate_iou(box_1: Box,
                  box_2: Box,
                  return_intersection_box: bool = False) -> float:
    """Calculates the IoU (Intersection over Union) value for two bounding boxes.

    Boxes are expected to be in XYXY format with absolute pixel values.

    Args:
        box_1 (Box) - An instance of the dataclass Box which contains bounding box
            coordinates in XYXY format with absolute values.
        box_2 (Box) - An instance of the dataclass Box which contains bounding box
            coordinates in XYXY format with absolute values.
        return_intersection_box (bool, optional) - Whether to return the intersection
            bounding box, useful for visualization. Defaults to False.

    Returns:
        iou (float) - A floating point value of the IoU score between box_1 and box_2,
            if there is no overlap, will return 0.
        intersection_box (Box, optional) - An instance of the dataclass Box which contains bounding box
            coordinates in XYXY format with absolute values for the intersecting box between box_1 and box_2.
            May return None if there is no intersecting box.
    """
    # Step 1: Calculate intersection coordinates and intersection area
    x_min_intersection = max(box_1.x_min, box_2.x_min)
    y_min_intersection = max(box_1.y_min, box_2.y_min)

    x_max_intersection = min(box_1.x_max, box_2.x_max)
    y_max_intersection = min(box_1.y_max, box_2.y_max)

    # Get width and height of intersection box (make sure these are positive values or 0)
    intersection_width = max(0, x_max_intersection - x_min_intersection)
    intersection_height = max(0, y_max_intersection - y_min_intersection)

    # Find the Area of Intersection
    intersection_area = intersection_width * intersection_height

    # Step 2: Find the union area
    union_area = box_1.area + box_2.area - intersection_area

    # Step 3: Calculate the IoU score
    iou = round(intersection_area / union_area, 3)

    # Add in an option to return the intersection box
    if return_intersection_box:
        if intersection_area > 0:
            intersection_box = Box(x_min=x_min_intersection,
                                   y_min=y_min_intersection,
                                   x_max=x_max_intersection,
                                   y_max=y_max_intersection)
            return iou, intersection_box
        else:
            # If intersection area is not over 0, there is no intersection box
            return iou, None

    return iou

# Test out our function
iou, intersection_box = calculate_iou(box_1=box_1,
                                      box_2=box_2,
                                      return_intersection_box=True)

print(f"[INFO] IoU between box_1 and box_2: {iou:.3f}")
print(f"[INFO] Intersection box between box_1 and box_2: {intersection_box}")
[INFO] IoU between box_1 and box_2: 0.961
[INFO] Intersection box between box_1 and box_2: Box(x_min=51, y_min=51, x_max=150, y_max=150)
Getting visual: Plotting two bounding boxes on top of each other
Ok, we’ve found the IoU score for two boxes.
But since object detection is often a computer vision problem, let’s see how they look.
We’ll follow the data explorer’s motto of visualize, visualize, visualize!
Let’s start by creating a small helper function to find the centre of an XYXY box (this will help with visualization).
# Quick helper function to get the center of a box
def get_center_coordinates(box: Box) -> tuple[float, float]:
    """Gets center coordinates of a XYXY box and returns them as (center_x, center_y)"""
    center_x = box.x_min + (0.5 * box.width)
    center_y = box.y_min + (0.5 * box.height)

    return (center_x, center_y)
Nice!
Now we can use our calculate_iou function along with some matplotlib.patches.Rectangle instances to help us visualize our synthetic boxes.
# Write code to plot box_1 and box_2 and highlight the IoU value
import matplotlib.pyplot as plt
from matplotlib.patches import Rectangle

def plot_two_boxes_with_iou(box_1: Box, box_2: Box):
    # Calculate the IoU
    iou, intersection_box = calculate_iou(box_1=box_1,
                                          box_2=box_2,
                                          return_intersection_box=True)

    # Create figure and axes
    fig, ax = plt.subplots()

    # Create a Rectangle patch for box_1
    # See docs: https://matplotlib.org/stable/api/_as_gen/matplotlib.patches.Rectangle.html
    rectangle_1 = Rectangle(xy=(box_1.x_min, box_1.y_min),
                            width=box_1.width,
                            height=box_1.height,
                            linewidth=3,
                            edgecolor="r",
                            facecolor="none")
    # Add text for box label, see docs: https://matplotlib.org/stable/api/_as_gen/matplotlib.axes.Axes.text.html
    ax.text(x=box_1.x_min,
            y=box_1.y_min - 5,
            s="box 1",
            color="r")

    # Create a Rectangle patch for box_2
    rectangle_2 = Rectangle(xy=(box_2.x_min, box_2.y_min),
                            width=box_2.width,
                            height=box_2.height,
                            linewidth=3,
                            edgecolor="b",
                            facecolor="none")
    ax.text(x=box_2.x_max - 20,
            y=box_2.y_min - 5,
            s="box 2",
            color="b")

    # Create a Rectangle patch for the intersection box
    if intersection_box:
        rectangle_intersection = Rectangle(xy=(intersection_box.x_min, intersection_box.y_min),
                                           width=intersection_box.width,
                                           height=intersection_box.height,
                                           linewidth=1,
                                           edgecolor="purple",
                                           facecolor="purple",
                                           alpha=0.2)
        ax.add_patch(rectangle_intersection)

        # Get center coordinates of intersection box for visualizing text
        intersection_center_x, intersection_center_y = get_center_coordinates(box=intersection_box)

        # Plot IoU text in center of IoU box
        ax.text(x=intersection_center_x - 10, # adjust coordinates slightly for better centering
                y=intersection_center_y + 7.5,
                s=f"IoU:\n {round(iou, 3)}",
                color="purple")

    # Add the rectangles to the plot
    ax.add_patch(rectangle_1)
    ax.add_patch(rectangle_2)

    # Set the limits of the plot (adjust as needed to see your rectangle)
    max_dim = max(box_1.max_coordinate_value, box_2.max_coordinate_value) + 100
    ax.set_xlim(left=0, right=max_dim)
    ax.set_ylim(bottom=max_dim, top=0)

    # Add a title
    plt.title(f"Box 1 vs. Box 2 | IoU: {round(iou, 3)}")

    # Show the grid
    plt.grid(True)

    # Display the plot
    plt.show()

plot_two_boxes_with_iou(box_1=box_1, box_2=box_2)
Woah!
Now that’s one good looking plot.
How about we do the same but for a few more boxes?
Plotting multiple boxes at different IoU levels
So far box_1 and box_2 have overlapped quite a bit.
To expand our horizons, let’s create multiple boxes to visualize the IoU calculation with different levels of overlap.
We’ll create these in comparison to the ground truth box_1.
Feel free to change the values below as you please.
# Low IoU box (50 pixels across and down from box_1)
box_3 = Box(x_min=100,
            y_min=100,
            x_max=200,
            y_max=200)

# No IoU box (no overlap with box_1)
box_4 = Box(x_min=150,
            y_min=150,
            x_max=250,
            y_max=250)

box_3, box_4
(Box(x_min=100, y_min=100, x_max=200, y_max=200),
Box(x_min=150, y_min=150, x_max=250, y_max=250))
Two new boxes created!
Now how do box_3 and box_4 compare to box_1?
# Calculate IoU
box_1_vs_box_3_iou = calculate_iou(box_1=box_1, box_2=box_3)
box_1_vs_box_4_iou = calculate_iou(box_1=box_1, box_2=box_4)

print(f"[INFO] Box 1 and Box 3 IoU: {box_1_vs_box_3_iou}")
print(f"[INFO] Box 1 and Box 4 IoU: {box_1_vs_box_4_iou}")
[INFO] Box 1 and Box 3 IoU: 0.143
[INFO] Box 1 and Box 4 IoU: 0.0
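To see where the 0.143 comes from, the arithmetic for box_1 vs box_3 can be worked through by hand, using the coordinates defined above (box_1 is [50, 50, 150, 150] and box_3 is [100, 100, 200, 200]):

\[\text{Intersection box} = [\max(50, 100),\ \max(50, 100),\ \min(150, 200),\ \min(150, 200)] = [100, 100, 150, 150]\]

\[|A \cap B| = 50 \times 50 = 2500, \quad |A \cup B| = 10000 + 10000 - 2500 = 17500\]

\[IoU = \frac{2500}{17500} = \frac{1}{7} \approx 0.143\]

For box_4, the intersection box would run from x = 150 to x = 150, so its width is max(0, 150 - 150) = 0. The boxes only touch at a corner, the intersection area is 0 and the IoU is 0.0.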
Wonderful!
Let’s plot these in a similar way to before.
We’ll consider box_1 the ground truth and compare box_2, box_3 and box_4 to it as example predictions (we’ll also compare box_1 to itself as a baseline).
# Plot all boxes and their IoUs against box_1
# Want to compare all boxes against box_1 (including itself)
box_list = [box_1, box_2, box_3, box_4]

# Make colour list to differentiate boxes
box_colours = ["r", "y", "g", "b"]

# Create a series of subplots we can plot our comparisons on
fig, ax = plt.subplots(nrows=2, ncols=2, figsize=(10, 8))
axes = ax.flatten()

# Loop through boxes and plot them on the axes
for i, box_i in enumerate(box_list):
    ax_current = axes[i]

    # Calculate IoU and intersection box coordinates
    iou, intersection_box = calculate_iou(box_1=box_1,
                                          box_2=box_i,
                                          return_intersection_box=True)

    # Plot ground truth box (box_1)
    box_1_rect = Rectangle(xy=(box_1.x_min,
                               box_1.y_min),
                           width=box_1.width,
                           height=box_1.height,
                           linewidth=3,
                           edgecolor="r",
                           facecolor="none")
    ax_current.add_patch(box_1_rect)
    ax_current.text(x=box_1.x_min,
                    y=box_1.y_min - 5,
                    s="box 1",
                    color="red")

    # Plot target comparison box (box_i)
    box_i_rect = Rectangle(xy=(box_i.x_min,
                               box_i.y_min),
                           width=box_i.width,
                           height=box_i.height,
                           linewidth=2,
                           edgecolor=box_colours[i],
                           facecolor="none")
    ax_current.add_patch(box_i_rect)
    ax_current.text(x=box_i.x_max - 30,
                    y=box_i.y_min - 5,
                    s=f"box {i+1}",
                    color=box_colours[i])

    # Add text + IoU overlay (if the IoU value exists)
    if iou > 0:
        intersection_rect = Rectangle(xy=(intersection_box.x_min,
                                          intersection_box.y_min),
                                      width=intersection_box.width,
                                      height=intersection_box.height,
                                      linewidth=0,
                                      edgecolor="purple",
                                      facecolor="purple",
                                      alpha=0.2)
        ax_current.add_patch(intersection_rect)

        # Add IoU text
        intersection_center_x, intersection_center_y = get_center_coordinates(box=intersection_box)
        ax_current.text(x=intersection_center_x - 10,
                        y=intersection_center_y + 12.5, # adjust center slightly to get better alignment
                        s=f"IoU:\n{iou}",
                        color="purple")

    # Set axis limits (required, otherwise will default to 0-1)
    ax_current.set_xlim(left=0, right=275)
    ax_current.set_ylim(bottom=275, top=0)

    # Add title to each axis
    ax_current.set_title(f"Box 1 vs Box {i+1} | IoU: {iou}")

plt.tight_layout()

# Optional: save the figure for later use
!mkdir images
plt.savefig("images/multiple_boxes_with_iou_plot.png")

# Show the plot
plt.show()
mkdir: images: File exists
Ok ok, we’re starting to get places!
We’ve seen four different boxes and their various IoU scores.
But in reality you’ll often inspect/evaluate boxes at several levels of IoU threshold.
Let’s see what different IoU thresholds look like against our ground truth box in the next section.
Part 2 - Plotting multiple boxes at various IoU thresholds
IoU scores can be influenced by several factors including the prediction box being too small, too big or the right size but unaligned.
It’s because of this that you’ll often see object detection benchmarks such as COCO (Common Objects in Context) evaluated at several IoU thresholds.
For example, when you see mAP@0.5:0.95 or mAP@0.5:0.05:0.95 or often just plain mAP (confusing, yes, but all of these often refer to the same thing) it means a combined average of mAP at IoU thresholds from 0.5 to 0.95 stepping 0.05 each time.
In other words:
1. Calculate mAP with IoU threshold 0.5 (boxes with IoU under 0.5 do not count as correct).
2. Calculate mAP with IoU threshold 0.55 (boxes with IoU under 0.55 do not count as correct).
3. Repeat at steps of 0.05 until 0.95 is reached (0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95).
4. Average the mAP at all thresholds to get the overall mAP.
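The averaging step can be sketched in a few lines. The per-threshold mAP values below are made-up numbers for illustration only (they are not results from any real model), and the variable names are hypothetical.

```python
# COCO-style IoU thresholds: 0.5 to 0.95 in steps of 0.05
coco_iou_thresholds = [round(0.5 + 0.05 * i, 2) for i in range(10)]

# Hypothetical mAP value at each threshold (made-up numbers, for illustration only)
map_per_threshold = [0.70, 0.68, 0.65, 0.61, 0.56, 0.50, 0.42, 0.32, 0.20, 0.06]

# The overall score is the plain average across all thresholds
overall_map = sum(map_per_threshold) / len(map_per_threshold)

for threshold, map_value in zip(coco_iou_thresholds, map_per_threshold):
    print(f"[INFO] mAP@{threshold}: {map_value}")
print(f"[INFO] mAP@0.5:0.95: {overall_map:.3f}")
```

Because every threshold carries equal weight, a model that only does well at 0.5 will see its overall score pulled down by the stricter thresholds.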
To visualize IoU at various levels, let’s write a function to take in an existing bounding box to use as reference and generate a new bounding box which either grows larger, shrinks or shifts until it reaches a target IoU threshold.
For example, say we input box_1 and tell it to "grow" until it reaches an IoU of 0.5. Our function will keep expanding a copy of box_1 until its IoU with the original drops to about 0.5.
import copy

# Create a list of ways to alter a bounding box
box_alteration_modes = ["grow", "shift", "shrink"]

def generate_box_with_target_iou(reference_box: Box,
                                 target_iou: float,
                                 box_alteration_mode: str) -> tuple[Box, float, Box]:

    assert box_alteration_mode in box_alteration_modes, f"box_alteration_mode must be one of the following: {box_alteration_modes}"

    # Want a way to generate an IoU value that is similar to the target value
    # Create a new box to alter until target IoU is met
    new_box = copy.copy(reference_box)

    # Calculate current IoU
    current_iou, intersection_box = calculate_iou(box_1=reference_box,
                                                  box_2=new_box,
                                                  return_intersection_box=True)

    # Add a small buffer to the target_iou to ensure IoU scores get over it
    target_iou += 0.03

    # Add step amount for each box change
    step = 1.0 # adjust the new box 1 pixel at a time until it fits the current IoU

    # Adjust box with alteration mode "grow"
    if box_alteration_mode == "grow":
        while current_iou > target_iou and current_iou > 0: # prevent IoU hitting 0
            # Decrease the min values to grow the new box
            new_box.x_min -= step
            new_box.y_min -= step

            # Increase the max values to grow the new box
            new_box.x_max += step
            new_box.y_max += step

            # Recalculate IoU
            current_iou, intersection_box = calculate_iou(box_1=reference_box,
                                                          box_2=new_box,
                                                          return_intersection_box=True)

    # Adjust box with alteration mode "shift"
    if box_alteration_mode == "shift":
        while current_iou > target_iou and current_iou > 0:
            # Shift the box to the right (increase x) without changing the dimensions until IoU is met
            new_box.x_min += step
            new_box.x_max += step

            # Recalculate IoU
            current_iou, intersection_box = calculate_iou(box_1=reference_box,
                                                          box_2=new_box,
                                                          return_intersection_box=True)

    # Adjust box with alteration mode "shrink"
    if box_alteration_mode == "shrink":
        while current_iou > target_iou and current_iou > 0:
            # Increase min values to shrink the new box
            new_box.x_min += step
            new_box.y_min += step

            # Decrease max values to shrink the new box
            new_box.x_max -= step
            new_box.y_max -= step

            # Recalculate IoU
            current_iou, intersection_box = calculate_iou(box_1=reference_box,
                                                          box_2=new_box,
                                                          return_intersection_box=True)

    # Once target_iou is met, return the new box, current IoU score and intersection box
    return new_box, current_iou, intersection_box

# Try out our function
box_alteration_mode = "grow" # <- feel free to change this!
target_iou = 0.5 # <- feel free to change this too!
new_box, iou, intersection_box = generate_box_with_target_iou(reference_box=box_1,
                                                              target_iou=target_iou,
                                                              box_alteration_mode=box_alteration_mode)

# Print the results
print(f"[INFO] Box alteration mode: {box_alteration_mode}")
print(f"[INFO] Target IoU: {target_iou} (or higher)")
print(f"[INFO] Input box: {box_1}")
print()
print(f"[INFO] New box: {new_box}")
print(f"[INFO] New IoU: {iou}")
print(f"[INFO] Intersection box: {intersection_box}")
[INFO] Box alteration mode: grow
[INFO] Target IoU: 0.5 (or higher)
[INFO] Input box: Box(x_min=50, y_min=50, x_max=150, y_max=150)
[INFO] New box: Box(x_min=31.0, y_min=31.0, x_max=169.0, y_max=169.0)
[INFO] New IoU: 0.525
[INFO] Intersection box: Box(x_min=50, y_min=50, x_max=150, y_max=150)
Nice!
We’ve altered box_1 into new_box, now let’s see how they look using our plot_two_boxes_with_iou() function.
plot_two_boxes_with_iou(box_1=box_1,
                        box_2=new_box)
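The step-by-step search can be cross-checked against simple closed-form expressions. The sketch below assumes an axis-aligned reference box of a given width and height that is grown or shrunk by the same number of pixels on every side, or shifted horizontally. The function names are hypothetical and only meant to show where numbers like the 0.525 above come from.

```python
def iou_after_grow(width: float, height: float, pixels: float) -> float:
    """IoU between a box and a copy grown by `pixels` on every side."""
    # The original box sits fully inside the grown box, so intersection = original area
    return (width * height) / ((width + 2 * pixels) * (height + 2 * pixels))

def iou_after_shrink(width: float, height: float, pixels: float) -> float:
    """IoU between a box and a copy shrunk by `pixels` on every side."""
    # The shrunk box sits fully inside the original, so union = original area
    return ((width - 2 * pixels) * (height - 2 * pixels)) / (width * height)

def iou_after_shift(width: float, pixels: float) -> float:
    """IoU between a box and a same-size copy shifted horizontally by `pixels`."""
    # Height cancels out: intersection = (width - pixels) * h, union = (width + pixels) * h
    overlap_width = max(0.0, width - pixels)
    return overlap_width / (width + pixels)

# box_1 is 100x100; growing by 19 pixels per side gives the box (31, 31, 169, 169)
print(f"[INFO] Grow by 19px:   {iou_after_grow(100, 100, 19):.3f}")
print(f"[INFO] Shrink by 14px: {iou_after_shrink(100, 100, 14):.3f}")
print(f"[INFO] Shift by 31px:  {iou_after_shift(100, 31):.3f}")
```

The formulas also show why the same IoU needs different pixel offsets in each mode: the change in the denominator is different for growing, shrinking and shifting.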
Generating synthetic boxes for different IoU thresholds
Okay, we’ve got a way to generate a synthetic box given an IoU threshold.
How about we generate some boxes for all of the COCO mAP thresholds?
In other words, we’ll go from 0.5 to 0.95 stepping 0.05 each time.
# COCO mAP IoU thresholds
iou_thresholds = [0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95]
We’ll generate a box for each alteration mode too (grow, shift, shrink).
And we’ll save all of them to a dictionary so we can visualize them later.
Let’s create a function to do so.
import numpy as np
from typing import List, Dict

box_alteration_modes = ["grow", "shift", "shrink"]

def generate_boxes_at_different_iou_thresholds(reference_box: Box,
                                               box_alteration_modes: List[str] = box_alteration_modes) -> Dict:

    # Create series of IoU thresholds
    # This will go from 0.5 -> 0.95 stepping 0.05 at a time (same as COCO thresholds)
    # The round() is to remove artifacts such as 0.7500000002 -> 0.75
    iou_thresholds = [float(round(value, 2)) for value in np.arange(0.5, 1.0, 0.05)]

    # Loop through IoU thresholds and create a series of boxes for each alteration mode
    iou_thresholds_and_boxes = {}
    for iou_threshold in iou_thresholds:
        # Reference box is always IoU = 1.0 of itself
        iou_thresholds_and_boxes[iou_threshold] = {"reference": (reference_box, 1.0)}
        for box_alteration_mode in box_alteration_modes:
            # Generate a new box and get the IoU score for each IoU threshold and box_alteration_mode
            new_box, iou, intersection_box = generate_box_with_target_iou(reference_box=reference_box,
                                                                          target_iou=iou_threshold,
                                                                          box_alteration_mode=box_alteration_mode)
            iou_thresholds_and_boxes[iou_threshold][box_alteration_mode] = (new_box, iou)

    return iou_thresholds_and_boxes
Box generation function ready!
Let’s try it out.
# Try out our function
generated_boxes_at_different_iou_thresholds = generate_boxes_at_different_iou_thresholds(reference_box=box_1)

# Print out the generated boxes
for key, value in generated_boxes_at_different_iou_thresholds.items():
    print(f"IoU Threshold: {key}")
    for sub_key, sub_value in value.items():
        print(f"Box type: {sub_key} | Box: {sub_value[0]} | IoU score: {sub_value[1]}")
    print()
IoU Threshold: 0.5
Box type: reference | Box: Box(x_min=50, y_min=50, x_max=150, y_max=150) | IoU score: 1.0
Box type: grow | Box: Box(x_min=31.0, y_min=31.0, x_max=169.0, y_max=169.0) | IoU score: 0.525
Box type: shift | Box: Box(x_min=81.0, y_min=50, x_max=181.0, y_max=150) | IoU score: 0.527
Box type: shrink | Box: Box(x_min=64.0, y_min=64.0, x_max=136.0, y_max=136.0) | IoU score: 0.518
IoU Threshold: 0.55
Box type: reference | Box: Box(x_min=50, y_min=50, x_max=150, y_max=150) | IoU score: 1.0
Box type: grow | Box: Box(x_min=34.0, y_min=34.0, x_max=166.0, y_max=166.0) | IoU score: 0.574
Box type: shift | Box: Box(x_min=77.0, y_min=50, x_max=177.0, y_max=150) | IoU score: 0.575
Box type: shrink | Box: Box(x_min=62.0, y_min=62.0, x_max=138.0, y_max=138.0) | IoU score: 0.578
IoU Threshold: 0.6
Box type: reference | Box: Box(x_min=50, y_min=50, x_max=150, y_max=150) | IoU score: 1.0
Box type: grow | Box: Box(x_min=37.0, y_min=37.0, x_max=163.0, y_max=163.0) | IoU score: 0.63
Box type: shift | Box: Box(x_min=73.0, y_min=50, x_max=173.0, y_max=150) | IoU score: 0.626
Box type: shrink | Box: Box(x_min=61.0, y_min=61.0, x_max=139.0, y_max=139.0) | IoU score: 0.608
IoU Threshold: 0.65
Box type: reference | Box: Box(x_min=50, y_min=50, x_max=150, y_max=150) | IoU score: 1.0
Box type: grow | Box: Box(x_min=39.0, y_min=39.0, x_max=161.0, y_max=161.0) | IoU score: 0.672
Box type: shift | Box: Box(x_min=70.0, y_min=50, x_max=170.0, y_max=150) | IoU score: 0.667
Box type: shrink | Box: Box(x_min=59.0, y_min=59.0, x_max=141.0, y_max=141.0) | IoU score: 0.672
IoU Threshold: 0.7
Box type: reference | Box: Box(x_min=50, y_min=50, x_max=150, y_max=150) | IoU score: 1.0
Box type: grow | Box: Box(x_min=41.0, y_min=41.0, x_max=159.0, y_max=159.0) | IoU score: 0.718
Box type: shift | Box: Box(x_min=66.0, y_min=50, x_max=166.0, y_max=150) | IoU score: 0.724
Box type: shrink | Box: Box(x_min=58.0, y_min=58.0, x_max=142.0, y_max=142.0) | IoU score: 0.706
IoU Threshold: 0.75
Box type: reference | Box: Box(x_min=50, y_min=50, x_max=150, y_max=150) | IoU score: 1.0
Box type: grow | Box: Box(x_min=43.0, y_min=43.0, x_max=157.0, y_max=157.0) | IoU score: 0.769
Box type: shift | Box: Box(x_min=63.0, y_min=50, x_max=163.0, y_max=150) | IoU score: 0.77
Box type: shrink | Box: Box(x_min=56.0, y_min=56.0, x_max=144.0, y_max=144.0) | IoU score: 0.774
IoU Threshold: 0.8
Box type: reference | Box: Box(x_min=50, y_min=50, x_max=150, y_max=150) | IoU score: 1.0
Box type: grow | Box: Box(x_min=45.0, y_min=45.0, x_max=155.0, y_max=155.0) | IoU score: 0.826
Box type: shift | Box: Box(x_min=60.0, y_min=50, x_max=160.0, y_max=150) | IoU score: 0.818
Box type: shrink | Box: Box(x_min=55.0, y_min=55.0, x_max=145.0, y_max=145.0) | IoU score: 0.81
IoU Threshold: 0.85
Box type: reference | Box: Box(x_min=50, y_min=50, x_max=150, y_max=150) | IoU score: 1.0
Box type: grow | Box: Box(x_min=46.0, y_min=46.0, x_max=154.0, y_max=154.0) | IoU score: 0.857
Box type: shift | Box: Box(x_min=57.0, y_min=50, x_max=157.0, y_max=150) | IoU score: 0.869
Box type: shrink | Box: Box(x_min=54.0, y_min=54.0, x_max=146.0, y_max=146.0) | IoU score: 0.846
IoU Threshold: 0.9
Box type: reference | Box: Box(x_min=50, y_min=50, x_max=150, y_max=150) | IoU score: 1.0
Box type: grow | Box: Box(x_min=48.0, y_min=48.0, x_max=152.0, y_max=152.0) | IoU score: 0.925
Box type: shift | Box: Box(x_min=54.0, y_min=50, x_max=154.0, y_max=150) | IoU score: 0.923
Box type: shrink | Box: Box(x_min=52.0, y_min=52.0, x_max=148.0, y_max=148.0) | IoU score: 0.922
IoU Threshold: 0.95
Box type: reference | Box: Box(x_min=50, y_min=50, x_max=150, y_max=150) | IoU score: 1.0
Box type: grow | Box: Box(x_min=49.0, y_min=49.0, x_max=151.0, y_max=151.0) | IoU score: 0.961
Box type: shift | Box: Box(x_min=51.0, y_min=50, x_max=151.0, y_max=150) | IoU score: 0.98
Box type: shrink | Box: Box(x_min=51.0, y_min=51.0, x_max=149.0, y_max=149.0) | IoU score: 0.96
Perfect! Now let’s write some code to visualize these boxes.
Visualizing boxes at various IoU thresholds
We know higher IoU is better.
But how much better?
Let’s write some visualization code so we can visualize our synthetic boxes across various IoU thresholds.
# Plot all boxes with different IoU scores + different alteration modes
from matplotlib.patches import Patch

# Make a dict for box_type -> colour to plot
box_type_colour_dict = {"reference": "red",
                        "grow": "green",
                        "shrink": "blue",
                        "shift": "black"}

fig, ax = plt.subplots(nrows=2, ncols=5, figsize=(20, 8))
ax_flatten = ax.flatten()

for i, (iou_threshold, box_dict) in enumerate(generated_boxes_at_different_iou_thresholds.items()):
    ax_current = ax_flatten[i]

    # Plot reference, grow, shift, shrink boxes
    for box_type, box_coordinates_and_iou in box_dict.items():
        box_colour = box_type_colour_dict[box_type] # e.g. "reference" -> red

        box_coordinates = box_coordinates_and_iou[0]

        # Plot the box on the current axis
        box_rectangle = Rectangle(xy=(box_coordinates.x_min, box_coordinates.y_min),
                                  width=box_coordinates.width,
                                  height=box_coordinates.height,
                                  linewidth=5 if box_type == "reference" else 2,
                                  facecolor="none",
                                  edgecolor=box_colour)
        ax_current.add_patch(box_rectangle)

    # Set the target axis plot title
    ax_current.set_title(f"IoU Threshold: {iou_threshold}")

    # Set the size of the axis so the boxes are visible (they default to be 0 -> 1)
    ax_current.set_xlim(left=0, right=185)
    ax_current.set_ylim(bottom=185, top=0)

# Create a patch legend at the base of the plot
# See docs for Patch: https://matplotlib.org/stable/api/_as_gen/matplotlib.patches.Patch.html
legend_elements = [Patch(facecolor="none",
                         edgecolor=colour,
                         label=box_type)
                   for box_type, colour in box_type_colour_dict.items()]
fig.legend(handles=legend_elements,
           loc="lower center",
           ncols=len(box_type_colour_dict),
           bbox_to_anchor=(0.5, 0.91), # x, y in percentage of the plot
           frameon=False)

plt.suptitle("Various Boxes at Different IoU Thresholds")
plt.show()
That’s a lot of boxes!
Starting at an IoU threshold of 0.5, we can see the green, blue and black boxes don’t line up too well with the red (ground truth) box.
But as they progress to 0.8 and onwards, all of the boxes start to converge on the reference.
Depending on your object detection problem, you might be able to tolerate a model which scores a higher mAP at an IoU threshold of 0.5 (mAP@0.5) but a lower mAP at an IoU threshold of 0.75 (mAP@0.75).
If you need an accurate box detection model, you’ll want your mAP at higher IoU thresholds to be on par with or only slightly lower than your mAP at lower IoU thresholds.
Note: By default, an object detection model’s mAP@0.5 is generally the highest mAP value it will achieve. This is because far fewer boxes get cut off at the IoU threshold of 0.5 and in turn, there is more chance for a correct prediction. So in general, you should expect mAP to be lower at higher IoU thresholds. However, if you need really good boxes, you don’t want this value to be too much lower.
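To make the note above a little more concrete, here’s a minimal sketch that counts how many predictions would count as a match at a few IoU thresholds. The scores in `example_iou_scores` are made-up values for illustration and `count_matches_at_threshold` is a hypothetical helper, neither comes from the code earlier in this post. mAP itself involves more than counting matches (confidence ranking, precision and recall), so this only illustrates the thresholding step.

```python
import numpy as np

# Made-up IoU scores (assumption): best IoU of each prediction against a ground truth box
example_iou_scores = np.array([0.95, 0.88, 0.76, 0.71, 0.62, 0.55, 0.48, 0.30])

def count_matches_at_threshold(iou_scores, iou_threshold):
    """Count predictions whose IoU is at or above the threshold (hypothetical helper)."""
    return int(np.sum(iou_scores >= iou_threshold))

# The same predictions produce fewer matches as the threshold gets stricter
matches_per_threshold = {}
for iou_threshold in [0.5, 0.75, 0.9]:
    num_matches = count_matches_at_threshold(example_iou_scores, iou_threshold)
    matches_per_threshold[iou_threshold] = num_matches
    print(f"IoU Threshold: {iou_threshold} | Matches: {num_matches}/{len(example_iou_scores)}")
```

The predictions never change, only the bar they have to clear does, which is why a score at a threshold of 0.5 tends to be the most forgiving.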
Part 3 - Calculating IoU scores against a real bounding box
We’ve seen different IoU calculations for various synthetic boxes.
Time to do the same with a real image and a real bounding box!
Getting a real image and real bounding box annotation
We’ll use an image from an object detection project I’ve worked on called Trashify, where the goal is to detect ["bin", "trash", "hand"] in an image (if all 3 classes are present, you get +1 point) to encourage people to pick up trash in their local area.
The annotations are made using Prodigy, a labelling tool (there are many kinds of labelling tools out there, you could even get AI to generate a custom one for you).
import os

image_path = "trashify_demo_image_for_box_format.jpeg"
annotations_path = "trashify_demo_image_annotations.json"

if not os.path.exists(image_path):
    !wget https://raw.githubusercontent.com/mrdbourke/learn-ml/refs/heads/main/posts/a-guide-to-bounding-box-formats/data/trashify_demo_image_for_box_format.jpeg

if not os.path.exists(annotations_path):
    !wget https://raw.githubusercontent.com/mrdbourke/learn-ml/refs/heads/main/posts/a-guide-to-bounding-box-formats/data/trashify_demo_image_annotations.json
Real image and box annotation downloaded!
from PIL import Image

loaded_image = Image.open(image_path)
print(f"[INFO] Image size: {loaded_image.size} (width, height)")

# Show the image
plt.imshow(loaded_image);
[INFO] Image size: (960, 1280) (width, height)
Awesome, looks like we’ve got a 960x1280 (width x height) image taken on an iPhone (I know this because I took the image :D).
Now let’s inspect the associated label file. It’s common practice to have a label file in JSON format containing many different annotations.
In our case, our annotation file "trashify_demo_image_annotations.json" contains just one annotation.
import json

# Extract single box annotations from the annotations dict
with open(annotations_path) as f:
    annotations = json.load(f)

annotations
{'image_path': 'trashify_demo_image_for_box_format.jpeg',
'file_name': '7c9b2934-23bc-46c5-8e9f-c2a66948b653.jpeg',
'readme': 'Demo image for displaying box formats on. Box coordinates in annotations dict come in absolute XYWH format. Image size is in (height, width) format.',
'annotations': [{'id': '4226a4fb-12b2-4e16-b29d-b33d667048d1',
'label': 'bin',
'color': 'magenta',
'x': 8.9,
'y': 275.3,
'height': 688.7,
'width': 858.6,
'center': [438.2, 619.65],
'type': 'rect',
'points': [[8.9, 275.3], [8.9, 964], [867.5, 964], [867.5, 275.3]]}],
'image_size': [1280, 960]}
Wonderful, it looks like we get our box coordinates in XYWH format.
We can convert this to XYXY format and create a Box instance for our annotation.
# Get the bounding box parameters
box_label_params = annotations["annotations"][0]
box_label_text = box_label_params["label"]
box_colour = box_label_params["color"]

# Our annotations dict contains box coordinates in XYWH format, we can convert these to XYXY to fit our dataclass
real_box_coordinates = Box(x_min=box_label_params["x"],
                           y_min=box_label_params["y"],
                           x_max=box_label_params["x"] + box_label_params["width"], # calculate x_max by adding width to x_min
                           y_max=box_label_params["y"] + box_label_params["height"]) # calculate y_max by adding height to y_min

print(f"[INFO] Box label: {box_label_text}")
print(f"[INFO] Box coordinates: {real_box_coordinates}")
[INFO] Box label: bin
[INFO] Box coordinates: Box(x_min=8.9, y_min=275.3, x_max=867.5, y_max=964.0)
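If you’re converting many annotations, the XYWH to XYXY step can be wrapped in a small function. Here’s a minimal sketch, assuming absolute XYWH where x and y are the top-left corner. The name `xywh_to_xyxy` is a hypothetical helper (not part of the code above) and the numbers are copied from the annotation shown earlier. As a cross-check, the result is compared against the corner values in the annotation’s 'points' field.

```python
def xywh_to_xyxy(x, y, width, height):
    """Convert absolute XYWH (top-left x, top-left y, width, height) to absolute XYXY.

    Hypothetical helper for illustration.
    """
    if width <= 0 or height <= 0:
        raise ValueError(f"width ({width}) and height ({height}) must be positive")
    return (x, y, x + width, y + height)

# Values copied from the single annotation in the annotations dict
x_min, y_min, x_max, y_max = xywh_to_xyxy(x=8.9, y=275.3, width=858.6, height=688.7)
print(f"[INFO] XYXY: ({x_min}, {y_min}, {x_max}, {y_max})")

# Cross-check against the corner points stored in the same annotation
points = [[8.9, 275.3], [8.9, 964], [867.5, 964], [867.5, 275.3]]
points_x_max = max(point[0] for point in points)
points_y_max = max(point[1] for point in points)
corners_agree = abs(x_max - points_x_max) < 1e-6 and abs(y_max - points_y_max) < 1e-6
print(f"[INFO] Converted corners agree with 'points': {corners_agree}")
```

Comparing with a small tolerance rather than `==` avoids tripping over floating point rounding in the additions.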
Let’s visualize!
# Plot the box on the image
annotation_rect = Rectangle(xy=(real_box_coordinates.x_min, real_box_coordinates.y_min),
                            width=real_box_coordinates.width,
                            height=real_box_coordinates.height,
                            linewidth=5,
                            edgecolor=box_colour,
                            facecolor="none")

fig, ax = plt.subplots()
ax.add_patch(annotation_rect)
ax.text(x=real_box_coordinates.x_min,
        y=real_box_coordinates.y_min - 20,
        s=box_label_text,
        color=box_colour,
        fontsize=16)
ax.imshow(loaded_image);
Nice! That’s a good looking box around the bin.
The goal of an object detection model for the Trashify project would be to replicate this box given the input image.
Plotting a real image with boxes at different thresholds
Let’s take our real image and its real bounding box and generate different bounding boxes at different thresholds so we can see what different IoU scoring boxes might look like.
To do so, we can pass our real_box_coordinates to the generate_boxes_at_different_iou_thresholds() function we created earlier.
# Take the reference box, generate many different boxes
real_image_generated_boxes_at_different_iou_thresholds = generate_boxes_at_different_iou_thresholds(reference_box=real_box_coordinates)
Synthetic boxes created from our real box coordinates, tick!
Now let’s inspect them.
We’ll create x_max_plot_value and y_max_plot_value variables to keep track of the size of the plots we want.
# Get max values from the generated boxes to adjust the plotting sizes
x_max_plot_value = 0
y_max_plot_value = 0

# Print out the generated boxes
for key, value in real_image_generated_boxes_at_different_iou_thresholds.items():
    print(f"IoU Threshold: {key}")
    for sub_key, sub_value in value.items():
        print(f"Box type: {sub_key} | Box: {sub_value[0]} | IoU score: {sub_value[1]}")
        x_max_plot_value = max(x_max_plot_value, sub_value[0].x_max)
        y_max_plot_value = max(y_max_plot_value, sub_value[0].y_max)
    print()

print(f"[INFO] Max x_max value: {x_max_plot_value}")
print(f"[INFO] Max y_max value: {y_max_plot_value}")
IoU Threshold: 0.5
Box type: reference | Box: Box(x_min=8.9, y_min=275.3, x_max=867.5, y_max=964.0) | IoU score: 1.0
Box type: grow | Box: Box(x_min=-134.1, y_min=132.3, x_max=1010.5, y_max=1107.0) | IoU score: 0.53
Box type: shift | Box: Box(x_min=272.9, y_min=275.3, x_max=1131.5, y_max=964.0) | IoU score: 0.53
Box type: shrink | Box: Box(x_min=112.9, y_min=379.3, x_max=763.5, y_max=860.0) | IoU score: 0.529
IoU Threshold: 0.55
Box type: reference | Box: Box(x_min=8.9, y_min=275.3, x_max=867.5, y_max=964.0) | IoU score: 1.0
Box type: grow | Box: Box(x_min=-111.1, y_min=155.3, x_max=987.5, y_max=1084.0) | IoU score: 0.58
Box type: shift | Box: Box(x_min=236.9, y_min=275.3, x_max=1095.5, y_max=964.0) | IoU score: 0.58
Box type: shrink | Box: Box(x_min=99.9, y_min=366.3, x_max=776.5, y_max=873.0) | IoU score: 0.58
IoU Threshold: 0.6
Box type: reference | Box: Box(x_min=8.9, y_min=275.3, x_max=867.5, y_max=964.0) | IoU score: 1.0
Box type: grow | Box: Box(x_min=-91.1, y_min=175.3, x_max=967.5, y_max=1064.0) | IoU score: 0.629
Box type: shift | Box: Box(x_min=203.9, y_min=275.3, x_max=1062.5, y_max=964.0) | IoU score: 0.63
Box type: shrink | Box: Box(x_min=87.9, y_min=354.3, x_max=788.5, y_max=885.0) | IoU score: 0.629
IoU Threshold: 0.65
Box type: reference | Box: Box(x_min=8.9, y_min=275.3, x_max=867.5, y_max=964.0) | IoU score: 1.0
Box type: grow | Box: Box(x_min=-73.1, y_min=193.3, x_max=949.5, y_max=1046.0) | IoU score: 0.678
Box type: shift | Box: Box(x_min=172.9, y_min=275.3, x_max=1031.5, y_max=964.0) | IoU score: 0.679
Box type: shrink | Box: Box(x_min=75.9, y_min=342.3, x_max=800.5, y_max=897.0) | IoU score: 0.68
IoU Threshold: 0.7
Box type: reference | Box: Box(x_min=8.9, y_min=275.3, x_max=867.5, y_max=964.0) | IoU score: 1.0
Box type: grow | Box: Box(x_min=-57.1, y_min=209.3, x_max=933.5, y_max=1030.0) | IoU score: 0.727
Box type: shift | Box: Box(x_min=142.9, y_min=275.3, x_max=1001.5, y_max=964.0) | IoU score: 0.73
Box type: shrink | Box: Box(x_min=64.9, y_min=331.3, x_max=811.5, y_max=908.0) | IoU score: 0.728
IoU Threshold: 0.75
Box type: reference | Box: Box(x_min=8.9, y_min=275.3, x_max=867.5, y_max=964.0) | IoU score: 1.0
Box type: grow | Box: Box(x_min=-42.1, y_min=224.3, x_max=918.5, y_max=1015.0) | IoU score: 0.779
Box type: shift | Box: Box(x_min=114.9, y_min=275.3, x_max=973.5, y_max=964.0) | IoU score: 0.78
Box type: shrink | Box: Box(x_min=53.9, y_min=320.3, x_max=822.5, y_max=919.0) | IoU score: 0.778
IoU Threshold: 0.8
Box type: reference | Box: Box(x_min=8.9, y_min=275.3, x_max=867.5, y_max=964.0) | IoU score: 1.0
Box type: grow | Box: Box(x_min=-29.1, y_min=237.3, x_max=905.5, y_max=1002.0) | IoU score: 0.827
Box type: shift | Box: Box(x_min=88.9, y_min=275.3, x_max=947.5, y_max=964.0) | IoU score: 0.83
Box type: shrink | Box: Box(x_min=42.9, y_min=309.3, x_max=833.5, y_max=930.0) | IoU score: 0.83
IoU Threshold: 0.85
Box type: reference | Box: Box(x_min=8.9, y_min=275.3, x_max=867.5, y_max=964.0) | IoU score: 1.0
Box type: grow | Box: Box(x_min=-17.1, y_min=249.3, x_max=893.5, y_max=990.0) | IoU score: 0.877
Box type: shift | Box: Box(x_min=63.9, y_min=275.3, x_max=922.5, y_max=964.0) | IoU score: 0.88
Box type: shrink | Box: Box(x_min=32.9, y_min=299.3, x_max=843.5, y_max=940.0) | IoU score: 0.878
IoU Threshold: 0.9
Box type: reference | Box: Box(x_min=8.9, y_min=275.3, x_max=867.5, y_max=964.0) | IoU score: 1.0
Box type: grow | Box: Box(x_min=-6.1, y_min=260.3, x_max=882.5, y_max=979.0) | IoU score: 0.926
Box type: shift | Box: Box(x_min=39.9, y_min=275.3, x_max=898.5, y_max=964.0) | IoU score: 0.93
Box type: shrink | Box: Box(x_min=22.9, y_min=289.3, x_max=853.5, y_max=950.0) | IoU score: 0.928
IoU Threshold: 0.95
Box type: reference | Box: Box(x_min=8.9, y_min=275.3, x_max=867.5, y_max=964.0) | IoU score: 1.0
Box type: grow | Box: Box(x_min=4.9, y_min=271.3, x_max=871.5, y_max=968.0) | IoU score: 0.979
Box type: shift | Box: Box(x_min=17.9, y_min=275.3, x_max=876.5, y_max=964.0) | IoU score: 0.979
Box type: shrink | Box: Box(x_min=12.9, y_min=279.3, x_max=863.5, y_max=960.0) | IoU score: 0.979
[INFO] Max x_max value: 1131.5
[INFO] Max y_max value: 1107.0
Boxes created, now let’s see what they look like!
from matplotlib.patches import Patch

# Make a dict for box_type -> colour to plot
box_type_colour_dict = {"reference": box_colour,
                        "grow": "green",
                        "shrink": "blue",
                        "shift": "black"}

fig, ax = plt.subplots(nrows=2, ncols=5, figsize=(20, 8))
ax_flatten = ax.flatten()

for i, (iou_threshold, box_dict) in enumerate(real_image_generated_boxes_at_different_iou_thresholds.items()):
    ax_current = ax_flatten[i]

    # Plot the reference, grow, shift, shrink boxes
    for box_type, box_coordinates_and_iou in box_dict.items():
        box_colour = box_type_colour_dict[box_type] # e.g. "reference" -> "magenta"
        box_coordinates = box_coordinates_and_iou[0]

        # Plot the box on the current axis
        box_rectangle = Rectangle(xy=(box_coordinates.x_min, box_coordinates.y_min),
                                  width=box_coordinates.width,
                                  height=box_coordinates.height,
                                  linewidth=5 if box_type == "reference" else 2,
                                  facecolor="none",
                                  edgecolor=box_colour)
        ax_current.add_patch(box_rectangle)

    # Set the target axis plot title
    ax_current.set_title(f"IoU Threshold: {iou_threshold}")

    # Set the size of the axis so the boxes are visible (they default to be 0 -> 1)
    # Since the generated boxes may exceed our image coordinates, we'll make the max dimension larger than the image itself (if necessary)
    ax_current.set_xlim(left=0, right=max(loaded_image.size[0], x_max_plot_value) + 10)
    ax_current.set_ylim(top=0, bottom=max(loaded_image.size[1], y_max_plot_value) + 10)

    # Add the target image
    ax_current.imshow(loaded_image)

# Create a patch legend at the base of the plot
legend_elements = [Patch(facecolor="none",
                         edgecolor=colour,
                         label=box_type)
                   for box_type, colour in box_type_colour_dict.items()]

fig.legend(handles=legend_elements,
           loc="lower center",
           ncols=len(box_type_colour_dict),
           bbox_to_anchor=(0.5, 0.91), # x, y percentage of plot (e.g. 0.5 = 50% x = halfway, 0.91 = 91% from bottom y)
           frameon=False) # no frame

plt.suptitle("Various Boxes at Different IoU Thresholds on a Real Image and Real Box")

# Optional: Save the figure before it gets plotted
!mkdir images
plt.savefig("images/multiple_boxes_with_iou_on_real_image.png")
plt.show()
mkdir: images: File exists
It’s a similar story to before.
With a real image though, it’s easier to see how some IoU thresholds may be unacceptable from a performance standpoint.
The boxes at an IoU threshold of 0.5 are quite poor. I personally wouldn’t enjoy using a model with this level of box detection performance.
For the Trashify project, a model which creates boxes similar to the IoU threshold of 0.75 might be minimally viable.
However, for the best experience, we’d likely want a model that’s capable of generating boxes similar to the IoU score of 0.8 and above.
Of course, it’s easy to default to higher IoU being better but it’s also good to inspect just what different levels of IoU look like.
Perhaps in the case of detecting manufacturing defects, you might want a model capable of generating boxes with an IoU threshold of 0.9 and above.
Part 4 - Speeding up our IoU calculation 100x
Our current calculate_iou function is fast for comparing two boxes.
But what if you had 1000s of boxes?
Well…
Let’s just say… you might wanna go for a long walk or take a nap (though this isn’t necessarily a bad thing).
Oh no…
For example, the COCO dataset has 1.5 million boxes.
And it’s standard practice for an object detection model to detect 100 boxes per image.
So if you had a project with 1000 images and 100 detection boxes per image, that’s 100,000 total boxes.
We can speed up our calculate_iou function by using NumPy and performing a batched IoU calculation.
We’ll take advantage of NumPy’s vectorization/broadcasting to speed things up.
Broadcasting essentially enables a large number of calculations to be made in parallel on a NumPy array.
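As a small illustration of broadcasting on its own (separate from the IoU code), here’s a sketch that compares every value in one array against every value in another without a Python loop. The array names and values are made up for the example. Strictly speaking, the speedup comes from NumPy running the element-wise work in compiled code rather than in a Python for loop.

```python
import numpy as np

# Made-up x_min values (assumption): 3 prediction boxes and 2 ground truth boxes
pred_x_min = np.array([0.0, 1.0, 2.0]) # shape: [3]
gt_x_min = np.array([0.5, 1.5])        # shape: [2]

# Add an axis to each so the shapes become [3, 1] and [1, 2]
pred_column = pred_x_min[:, np.newaxis]
gt_row = gt_x_min[np.newaxis, :]

# NumPy stretches both to shape [3, 2] and takes the max of every (pred, gt) pair
pairwise_max = np.maximum(pred_column, gt_row)

print(f"[INFO] Shape: {pairwise_max.shape}")
print(pairwise_max)
```

Each row holds one prediction compared against all ground truths, which is the same layout as an IoU matrix of shape (n_preds, n_gts).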
In our case, we want to make IoU score calculations across many boxes at once.
To do so, let’s create a function batch_iou which takes in an array of prediction boxes and an array of ground truth boxes and performs the IoU calculation across all of them simultaneously.
It’ll return a matrix containing the IoU values of all prediction boxes versus the ground truth boxes.
This would allow us to work on a per sample basis.
For example, given an image with ground truth boxes and 100 prediction boxes, compare all predictions to the ground truths simultaneously.
We could then filter the matrix for various IoU threshold levels.
import numpy as np

def batch_iou(pred_boxes, gt_boxes):
    """Calculate IoU between all pairs of predictions and ground truths

    Args:
        pred_boxes: numpy array of shape [n_preds, 4] with format [x_min, y_min, x_max, y_max]
        gt_boxes: numpy array of shape [n_gts, 4] with format [x_min, y_min, x_max, y_max]

    Returns:
        iou_matrix: numpy array of shape [n_preds, n_gts] with IoU values
    """
    # Handle empty arrays
    if pred_boxes.shape[0] == 0 or gt_boxes.shape[0] == 0:
        return np.zeros((pred_boxes.shape[0], gt_boxes.shape[0]))

    # Expand dimensions to enable broadcasting
    pred_boxes = pred_boxes[:, np.newaxis, :] # shape: [n_preds, 1, 4]
    gt_boxes = gt_boxes[np.newaxis, :, :] # shape: [1, n_gts, 4]

    # Calculate intersection coordinates
    # Get the max of the mins
    x_min = np.maximum(pred_boxes[:, :, 0], gt_boxes[:, :, 0]) # <- this uses broadcasting to speed up calculations
    y_min = np.maximum(pred_boxes[:, :, 1], gt_boxes[:, :, 1])

    # Get the min of the maxs
    x_max = np.minimum(pred_boxes[:, :, 2], gt_boxes[:, :, 2])
    y_max = np.minimum(pred_boxes[:, :, 3], gt_boxes[:, :, 3])

    # Calculate intersection areas (clipped at 0 for boxes that don't overlap)
    width = np.maximum(0, x_max - x_min)
    height = np.maximum(0, y_max - y_min)
    intersection = width * height

    # Calculate areas
    # Area of boxes = (x_max - x_min) * (y_max - y_min)
    pred_areas = (pred_boxes[:, :, 2] - pred_boxes[:, :, 0]) * (pred_boxes[:, :, 3] - pred_boxes[:, :, 1])
    gt_areas = (gt_boxes[:, :, 2] - gt_boxes[:, :, 0]) * (gt_boxes[:, :, 3] - gt_boxes[:, :, 1])

    # Calculate union area
    union = pred_areas + gt_areas - intersection

    # Calculate IoU
    iou_matrix = intersection / union

    return iou_matrix
Wonderful!
Now our batch_iou function isn’t limited to only comparing one box to another, it can compare many boxes to many boxes.
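Before moving to random boxes, it can help to try pairwise IoU on boxes simple enough to check by hand. The sketch below uses a compact re-implementation called `pairwise_iou_sketch` (a hypothetical name, written here only so the example is self-contained) on made-up boxes. It also includes one addition that isn’t in the function above: a guard that returns 0 where the union area is 0, which is an assumption about how you’d want degenerate (zero-area) boxes handled.

```python
import numpy as np

def pairwise_iou_sketch(boxes_a, boxes_b):
    """Pairwise IoU for XYXY boxes: shapes [n, 4] and [m, 4] -> [n, m] (illustrative sketch)."""
    a = boxes_a[:, np.newaxis, :] # shape: [n, 1, 4]
    b = boxes_b[np.newaxis, :, :] # shape: [1, m, 4]

    # Intersection width/height, clipped at 0 when boxes don't overlap
    inter_width = np.maximum(0, np.minimum(a[..., 2], b[..., 2]) - np.maximum(a[..., 0], b[..., 0]))
    inter_height = np.maximum(0, np.minimum(a[..., 3], b[..., 3]) - np.maximum(a[..., 1], b[..., 1]))
    intersection = inter_width * inter_height

    area_a = (a[..., 2] - a[..., 0]) * (a[..., 3] - a[..., 1])
    area_b = (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])
    union = area_a + area_b - intersection

    # Assumption: treat a union of 0 (two zero-area boxes) as IoU 0 instead of dividing by zero
    return np.divide(intersection, union, out=np.zeros_like(union, dtype=float), where=union > 0)

# Made-up boxes that are easy to verify by hand
example_gt = np.array([[0.0, 0.0, 2.0, 2.0]])
example_preds = np.array([[0.0, 0.0, 2.0, 2.0],      # identical to the ground truth
                          [1.0, 1.0, 3.0, 3.0],      # overlaps a 1x1 corner
                          [10.0, 10.0, 12.0, 12.0]]) # no overlap

example_iou = pairwise_iou_sketch(example_preds, example_gt)
print(example_iou)
```

For the middle box, the intersection is 1 and the union is 4 + 4 - 1 = 7, so the IoU works out to 1/7 (roughly 0.143).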
Let’s create some ground truth and prediction boxes in the form of NumPy arrays.
# Use a random seed for reproducibility
np.random.seed(42)

# Create random ground truths with 5 boxes and 100 preds
gt_boxes = np.random.rand(5, 4) # generate 5 ground truth boxes with 4 coordinates
pred_boxes = np.random.rand(100, 4) # generate 100 prediction boxes with 4 coordinates

# Quick addition to ensure all boxes obey x_min < x_max and y_min < y_max
gt_boxes[:, 2] += 1 # increase x_max
gt_boxes[:, 3] += 1 # increase y_max

pred_boxes[:, 2] += 1 # increase x_max
pred_boxes[:, 3] += 1 # increase y_max

print(f"[INFO] Ground truth boxes:")
print(gt_boxes)
print()
print(f"[INFO] First 10 prediction boxes:")
print(pred_boxes[:10, :])
[INFO] Ground truth boxes:
[[0.37454012 0.95071431 1.73199394 1.59865848]
[0.15601864 0.15599452 1.05808361 1.86617615]
[0.60111501 0.70807258 1.02058449 1.96990985]
[0.83244264 0.21233911 1.18182497 1.18340451]
[0.30424224 0.52475643 1.43194502 1.29122914]]
[INFO] First 10 prediction boxes:
[[0.61185289 0.13949386 1.29214465 1.36636184]
[0.45606998 0.78517596 1.19967378 1.51423444]
[0.59241457 0.04645041 1.60754485 1.17052412]
[0.06505159 0.94888554 1.96563203 1.80839735]
[0.30461377 0.09767211 1.68423303 1.44015249]
[0.12203823 0.49517691 1.03438852 1.9093204 ]
[0.25877998 0.66252228 1.31171108 1.52006802]
[0.54671028 0.18485446 1.96958463 1.77513282]
[0.93949894 0.89482735 1.59789998 1.92187424]
[0.0884925 0.19598286 1.04522729 1.32533033]]
Now let’s use our batch_iou function to calculate an IoU matrix comparing the pred_boxes to the gt_boxes.
# Create IoU matrix
iou_matrix = batch_iou(pred_boxes=pred_boxes,
                       gt_boxes=gt_boxes)
iou_matrix.shape
(100, 5)
Woah! That seemed quick!
We’ll get to time measurements later on.
For now, let’s make sure the IoU calculations from our batch_iou function are the same as a single IoU calculation with our calculate_iou function.
Because the output of batch_iou is a matrix with dimensions (n_preds, n_gts), we can find the IoU for a target box by indexing on the matrix.
For example, if we want the IoU score of the prediction at index 0 and the ground truth at index 0 (the first item in each of the pred_boxes and gt_boxes arrays), we can use iou_matrix[0, 0].
Or if we wanted the prediction at index 31 and the ground truth at index 3, we could use iou_matrix[31, 3].
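Beyond picking out single entries, the same matrix layout makes it easy to filter by an IoU threshold or find the best prediction for each ground truth box. Here’s a minimal sketch on a small made-up matrix (`example_iou_matrix` and its values are assumptions for illustration, not outputs from the code above).

```python
import numpy as np

# Made-up IoU matrix with shape (n_preds=4, n_gts=2)
example_iou_matrix = np.array([[0.82, 0.10],
                               [0.40, 0.05],
                               [0.00, 0.91],
                               [0.55, 0.60]])

iou_threshold = 0.5

# Boolean mask of (prediction, ground truth) pairs at or above the threshold
pairs_above_threshold = example_iou_matrix >= iou_threshold
matching_pairs = np.argwhere(pairs_above_threshold) # rows of [pred_index, gt_index]

# For each ground truth (column), the prediction index with the highest IoU and its score
best_pred_per_gt = example_iou_matrix.argmax(axis=0)
best_iou_per_gt = example_iou_matrix.max(axis=0)

print(f"[INFO] Pairs at or above {iou_threshold}: {matching_pairs.tolist()}")
print(f"[INFO] Best prediction index per ground truth: {best_pred_per_gt.tolist()}")
print(f"[INFO] Best IoU per ground truth: {best_iou_per_gt.tolist()}")
```

Reducing over axis 0 works per ground truth (column), while reducing over axis 1 would work per prediction (row).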
Let’s get the first prediction and ground truth boxes, turn them into Box dataclass instances and then compare them with calculate_iou.
# Get first ground truth and pred box
gt_0 = gt_boxes[0]
pred_0 = pred_boxes[0]
print(f"[INFO] Ground truth index 0 array: {gt_0}")
print(f"[INFO] Prediction index 0 array: {pred_0}")
print()

# Turn the NumPy arrays into a Box dataclass (for use with our calculate_iou function)
gt_box_0 = Box(x_min=gt_0[0],
               y_min=gt_0[1],
               x_max=gt_0[2],
               y_max=gt_0[3])

pred_box_0 = Box(x_min=pred_0[0],
                 y_min=pred_0[1],
                 x_max=pred_0[2],
                 y_max=pred_0[3])
print(f"[INFO] Ground truth Box: {gt_box_0}")
print(f"[INFO] Prediction Box: {pred_box_0}")
print()

# Calculate the IoU
iou_0_0 = calculate_iou(box_1=pred_box_0, box_2=gt_box_0, return_intersection_box=False)
print(f"[INFO] IoU: {iou_0_0}")
[INFO] Ground truth index 0 array: [0.37454012 0.95071431 1.73199394 1.59865848]
[INFO] Prediction index 0 array: [0.61185289 0.13949386 1.29214465 1.36636184]
[INFO] Ground truth Box: Box(x_min=np.float64(0.3745401188473625), y_min=np.float64(0.9507143064099162), x_max=np.float64(1.731993941811405), y_max=np.float64(1.5986584841970366))
[INFO] Prediction Box: Box(x_min=np.float64(0.6118528947223795), y_min=np.float64(0.13949386065204183), x_max=np.float64(1.2921446485352182), y_max=np.float64(1.3663618432936917))
[INFO] IoU: 0.198
Now to make sure our numbers line up, let’s compare the IoU score from our calculate_iou function to the IoU score for the box at [0, 0] (prediction index 0, ground truth index 0) in our iou_matrix.
iou_0_0_from_matrix = float(round(iou_matrix[0, 0], 3))
print(f"[INFO] IoU from matrix: {iou_0_0_from_matrix}")
[INFO] IoU from matrix: 0.198
Are they the same?
print(f"IoU from calculate_iou and batch_iou are the same? {iou_0_0 == iou_0_0_from_matrix}")
IoU from calculate_iou and batch_iou are the same? True
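Rounding both scores to 3 decimal places before comparing with `==` works for this pair. A common alternative that doesn’t depend on rounding is a tolerance-based comparison. Here’s a minimal sketch using stand-in numbers (not real IoU scores) chosen because they differ only by floating point error.

```python
import math

# Stand-in values (assumption): two results that should be equal but differ by floating point error
value_from_loop = 0.1 + 0.2
value_from_matrix = 0.3

exactly_equal = value_from_loop == value_from_matrix
close_enough = math.isclose(value_from_loop, value_from_matrix)

print(f"[INFO] Exactly equal? {exactly_equal}")
print(f"[INFO] Close enough? {close_enough}")
```

For whole arrays, `np.allclose` plays the same role as `math.isclose` does for single numbers.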
Woohoo!
Looks like our calculate_iou and batch_iou functions are outputting the same IoU values for a given pair of boxes.
Except under the hood, the batch_iou function is much faster.
Don’t believe me?
Let’s run a simulation across different numbers of samples and measure the calculation time for each function.
Running a speed test across different numbers of samples
To run our speed test we’ll create a list containing different numbers of image samples (e.g. 1, 10, 100, 1000…).
Then we’ll test how long it takes for each IoU function, calculate_iou and batch_iou, to calculate the IoU scores of our pred_boxes (100 preds per sample) and gt_boxes (5 ground truth boxes per sample).
%%time
import time

NUM_SAMPLES_LIST = [1, 10, 100, 1000, 10000]
results_dict = {}

for NUM_SAMPLES in NUM_SAMPLES_LIST:
    # Perform IoU calculation in batch format
    start_time_batch_iou = time.time()
    for _ in range(NUM_SAMPLES):
        batch_iou(pred_boxes=pred_boxes, gt_boxes=gt_boxes)
    end_time_batch_iou = time.time()
    total_time_batch_iou = round(end_time_batch_iou - start_time_batch_iou, 3)

    print(f"[INFO] Time taken for {NUM_SAMPLES} samples using batched IoU calculations: {total_time_batch_iou}s")

    # Perform IoU calculation one by one
    start_time_one_by_one_iou = time.time()
    for _ in range(NUM_SAMPLES):
        for gt_box in gt_boxes:
            for pred_box in pred_boxes:
                # Turn boxes into Box instances
                box_1 = Box(x_min=gt_box[0],
                            y_min=gt_box[1],
                            x_max=gt_box[2],
                            y_max=gt_box[3])

                box_2 = Box(x_min=pred_box[0],
                            y_min=pred_box[1],
                            x_max=pred_box[2],
                            y_max=pred_box[3])

                calculate_iou(box_1=box_1,
                              box_2=box_2,
                              return_intersection_box=False)
    end_time_one_by_one_iou = time.time()
    total_time_one_by_one_iou = round(end_time_one_by_one_iou - start_time_one_by_one_iou, 5)

    print(f"[INFO] Time taken for {NUM_SAMPLES} samples using one by one IoU calculations: {total_time_one_by_one_iou}s")

    # Calculate the total number of boxes calculated on
    total_boxes = len(pred_boxes) * len(gt_boxes) * NUM_SAMPLES
    print(f"[INFO] Total boxes calculated on (per function): {total_boxes}")
    print()

    # Save the results to a dictionary
    times = {"batch_iou_time": total_time_batch_iou,
             "one_by_one_iou_time": total_time_one_by_one_iou}
    results_dict[NUM_SAMPLES] = times
[INFO] Time taken for 1 samples using batched IoU calculations: 0.0s
[INFO] Time taken for 1 samples using one by one IoU calculations: 0.00181s
[INFO] Total boxes calculated on (per function): 500
[INFO] Time taken for 10 samples using batched IoU calculations: 0.0s
[INFO] Time taken for 10 samples using one by one IoU calculations: 0.01475s
[INFO] Total boxes calculated on (per function): 5000
[INFO] Time taken for 100 samples using batched IoU calculations: 0.001s
[INFO] Time taken for 100 samples using one by one IoU calculations: 0.14879s
[INFO] Total boxes calculated on (per function): 50000
[INFO] Time taken for 1000 samples using batched IoU calculations: 0.012s
[INFO] Time taken for 1000 samples using one by one IoU calculations: 1.47221s
[INFO] Total boxes calculated on (per function): 500000
[INFO] Time taken for 10000 samples using batched IoU calculations: 0.118s
[INFO] Time taken for 10000 samples using one by one IoU calculations: 14.91115s
[INFO] Total boxes calculated on (per function): 5000000
CPU times: user 16.6 s, sys: 79.1 ms, total: 16.7 s
Wall time: 16.7 s
Woah! Looks like the batch IoU calculations perform far quicker than the one by one calculations.
Let’s write some code to confirm which is faster and by how much.
# Find the faster times for each number of samples
for num_samples, results in results_dict.items():
    min_time = min(results.values())
    for key, value in results.items():
        if value == min_time:
            faster_key = key
        else:
            slower_key = key

    # A time that was rounded down to 0.0s can't be used as a divisor
    if results[faster_key] == 0:
        print(f"[INFO] Using {faster_key} for {num_samples} samples rounds to 0.0s, too fast to calculate a speedup")
        continue

    # Use this sample size's own results (not the `times` dict left over from the timing loop)
    print(f"[INFO] Using {faster_key} for {num_samples} samples is {round(results[slower_key] / results[faster_key], 2)}x faster than using {slower_key}")
[INFO] Using batch_iou_time for 1 samples rounds to 0.0s, too fast to calculate a speedup
[INFO] Using batch_iou_time for 10 samples rounds to 0.0s, too fast to calculate a speedup
[INFO] Using batch_iou_time for 100 samples is 148.79x faster than using one_by_one_iou_time
[INFO] Using batch_iou_time for 1000 samples is 122.68x faster than using one_by_one_iou_time
[INFO] Using batch_iou_time for 10000 samples is 126.37x faster than using one_by_one_iou_time
Ok ok…
It’s clear batch_iou is the winner here.
On Google Colab, I noticed a 100x+ speed up using batch_iou versus calculate_iou (speedup factors will depend on the hardware you’re using).
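The batched timings for 1 and 10 samples show up as 0.0s because the run is shorter than the 3 decimal places they’re rounded to. If you want to measure short runs like these, one option is `time.perf_counter()`, which is designed for timing short durations, combined with keeping the best of several repeats. Here’s a minimal sketch using a stand-in workload (a sum of squares, not the IoU functions). The helper name `best_time_seconds` is hypothetical.

```python
import time
import numpy as np

def best_time_seconds(fn, repeats=5):
    """Call fn() `repeats` times and return the fastest wall-clock time in seconds (hypothetical helper)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best

# Stand-in workload (assumption): sum of squares of 100,000 random numbers
values = np.random.default_rng(42).random(100_000)

loop_time = best_time_seconds(lambda: sum(v * v for v in values))
vectorized_time = best_time_seconds(lambda: np.sum(values * values))

print(f"[INFO] Python loop: {loop_time:.6f}s")
print(f"[INFO] NumPy vectorized: {vectorized_time:.6f}s")
```

Taking the best of a few repeats reduces noise from whatever else the machine is doing. The exact numbers will vary from run to run and machine to machine.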
How about we visualize the results in a plot?
First, we’ll check out our results_dict.
results_dict
{1: {'batch_iou_time': 0.0, 'one_by_one_iou_time': 0.00181},
10: {'batch_iou_time': 0.0, 'one_by_one_iou_time': 0.01475},
100: {'batch_iou_time': 0.001, 'one_by_one_iou_time': 0.14879},
1000: {'batch_iou_time': 0.012, 'one_by_one_iou_time': 1.47221},
10000: {'batch_iou_time': 0.118, 'one_by_one_iou_time': 14.91115}}
Beautiful!
Now let’s write some code to visualize our results.
import matplotlib.pyplot as plt

# Get data for plot
x_values = list(results_dict.keys())
one_by_one_times = [value["one_by_one_iou_time"] for value in results_dict.values()]
batch_times = [value["batch_iou_time"] for value in results_dict.values()]

# Create x-axis positions
x_positions = np.arange(len(x_values))
width = 0.35 # width of the columns

# Create figure and bars
fig, ax = plt.subplots(figsize=(10, 7))
bars_one_by_one = ax.bar(x=x_positions - width/2,
                         height=one_by_one_times,
                         width=width,
                         label="One by One IoU",
                         color="blue")
bars_batch = ax.bar(x=x_positions + width/2,
                    height=batch_times,
                    width=width,
                    label="Batch IoU",
                    color="green")

# Add times to the top of each column
for bar in bars_one_by_one + bars_batch:
    height_of_bar = bar.get_height()
    ax.annotate(f"{height_of_bar:.4g}", # :.4g = use 4 significant figures total, :.4f = use 4 digits after the decimal point
                xy=(bar.get_x() + bar.get_width()/2, height_of_bar),
                xytext=(0, 3),
                textcoords="offset points",
                ha="center",
                va="bottom")

# Setup figure attributes
ax.set_xticks(x_positions)
ax.set_xticklabels([str(x) for x in x_values])
ax.set_xlabel("Number of samples")
ax.set_ylabel("Time (seconds)")
ax.set_title("Computation time for different functions and sample size")

ax.legend()
fig.tight_layout()
plt.show()
Summary
What an effort!
Consider IoU learned about!
Let’s summarize:
- IoU = a metric which compares how much two boxes (e.g. a prediction box and a ground truth box) overlap
- Higher IoU = better, a value of 1.0 is a perfect overlap whereas a value of 0.0 is no overlap at all
- It’s good to visualize different levels of IoU thresholds on your images so you know what different kinds of boxes look like (e.g. your use case may call for a higher IoU than standard metrics account for)
- For faster calculations, if you can, batch them with NumPy, you might see speedups of 100x or more
Extensions
- For more on the topic of bounding boxes and the different formats they come in, see A Guide to Bounding Box Formats and How to Draw Them.
- For an end-to-end example of building an object detection model and customizing it for your own dataset, see the Object Detection with Hugging Face Transformers Tutorial.