Object Detection - Kleber Hub

Task: Object detection using a pre-trained model.

Scenario: You need to recognize objects in image files and mark them graphically on the original image.

📥 Dependencies: pip install torch torchvision matplotlib pillow

Python

import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn
from torchvision.transforms import functional as F
# Note: _meta is a private torchvision module and may change between releases
from torchvision.models._meta import _COCO_CATEGORIES
from PIL import Image
import matplotlib.pyplot as plt
import matplotlib.patches as patches
from pathlib import Path

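# Load pre-trained Faster R-CNN model (COCO weights).
# The weights argument replaces the deprecated pretrained=True in torchvision 0.13+.
model = fasterrcnn_resnet50_fpn(weights="DEFAULT")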
model.eval()

# Load and preprocess the image
image_path = Path('dirt-bike.jpg')
image = Image.open(image_path).convert("RGB")
image_tensor = F.to_tensor(image)

# Perform object detection
with torch.no_grad():
    prediction = model([image_tensor])

# Visualize the result
def plot_with_boxes(image, prediction):
    fig, ax = plt.subplots(1, figsize=(12, 6))
    ax.imshow(image)

    # Get the predicted bounding boxes, labels, and scores as plain Python values
    boxes = prediction[0]['boxes'].tolist()
    labels = prediction[0]['labels'].tolist()
    scores = prediction[0]['scores'].tolist()

    # Only display boxes with score > 0.5
    threshold = 0.5
    for box, cat, score in zip(boxes, labels, scores):
        if score > threshold:
            x1, y1, x2, y2 = box  # Corner coordinates of the box
            color = f"C{cat}"  # Color from matplotlib's default cycle, chosen by class id
            rect = patches.Rectangle((x1, y1), x2 - x1, y2 - y1, linewidth=2, edgecolor=color, facecolor='none')
            ax.add_patch(rect)
            label = _COCO_CATEGORIES[cat]  # Get class label
            ax.text(x1, y1 - 10, f'{label}: {score:.2f}', color='black', fontsize=10, bbox=dict(facecolor=color, alpha=0.5))

    plt.show()

# Plot the image with bounding boxes
plot_with_boxes(image, prediction)
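
The score filtering and the conversion from corner coordinates to the (x, y, width, height) form that patches.Rectangle expects can also be pulled out of the plotting function, which makes them easy to check without loading the model. The following is a minimal sketch, assuming the prediction tensors have first been converted to plain lists with .tolist(). The helper name select_detections and the sample values are hypothetical and not part of the original script.

```python
def select_detections(boxes, labels, scores, threshold=0.5):
    """Return (x, y, width, height, label, score) for each detection above threshold.

    boxes are [x1, y1, x2, y2] corner coordinates, the format the
    torchvision detection models return.
    """
    selected = []
    for (x1, y1, x2, y2), label, score in zip(boxes, labels, scores):
        if score > threshold:
            # Rectangle wants the top-left corner plus width and height
            selected.append((x1, y1, x2 - x1, y2 - y1, label, score))
    return selected


# Made-up values standing in for prediction[0]['boxes'].tolist(), etc.
sample_boxes = [[10.0, 20.0, 110.0, 220.0], [5.0, 5.0, 50.0, 40.0]]
sample_labels = [4, 1]
sample_scores = [0.98, 0.30]

kept = select_detections(sample_boxes, sample_labels, sample_scores)
print(kept)
```

Only the first sample box passes the 0.5 threshold, so kept holds a single tuple. Raising the threshold trades missed objects for fewer false positives.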

PyTorch

PyTorch is an open-source deep learning library, originally developed by Facebook’s AI Research lab (FAIR, now part of Meta AI), that provides tools for building and training neural networks. Its flexibility, ease of use, and strong community support have made it one of the most popular libraries for both research and production AI applications.

Common Use Cases

Computer Vision: Image classification, object detection, and segmentation tasks, often leveraging pre-trained models from PyTorch’s torchvision library.
Natural Language Processing (NLP): Tasks such as sentiment analysis, text generation, and translation, facilitated by libraries like Hugging Face’s transformers, which work seamlessly with PyTorch.
Reinforcement Learning: PyTorch is commonly used for developing RL models due to its flexibility and ease in implementing custom environments and policies.
Generative Models: Models like Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs) are often built using PyTorch due to its dynamic graph nature, which is well-suited for these complex models.

Advantages of PyTorch

Pythonic and Intuitive: PyTorch integrates well with the Python ecosystem, making it easier for developers to write code that feels natural and is easy to debug.
Great for Research: Its dynamic computation graph makes it an excellent choice for research purposes, where models are often more experimental and require flexibility.
Large Ecosystem and Community: With a rich set of libraries (like torchvision for vision tasks, torchaudio for audio tasks, and torchtext for NLP tasks) and a growing community, PyTorch offers a robust ecosystem for developing AI solutions.
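
The dynamic computation graph mentioned above means the graph is rebuilt on every forward pass, so ordinary Python control flow can change which operations are recorded. The following is a small illustrative sketch, not taken from the original page. The function repeated_scale and the input values are hypothetical.

```python
import torch


def repeated_scale(x: torch.Tensor, w: torch.Tensor) -> torch.Tensor:
    """Multiply x by w a number of times that depends on x's runtime value."""
    steps = 3 if x.item() > 0 else 1  # ordinary Python branch, decided per call
    y = x
    for _ in range(steps):
        y = y * w
    return y


w = torch.tensor(2.0, requires_grad=True)

# Positive input: the graph records three multiplications, y = x * w**3
y_pos = repeated_scale(torch.tensor(1.5), w)
y_pos.backward()
grad_pos = w.grad.item()  # d/dw (x * w**3) = 3 * x * w**2

w.grad = None  # clear the accumulated gradient before the next pass

# Negative input: the graph records one multiplication, y = x * w
y_neg = repeated_scale(torch.tensor(-1.5), w)
y_neg.backward()
grad_neg = w.grad.item()  # d/dw (x * w) = x

print(grad_pos, grad_neg)
```

Autograd differentiates whichever path actually ran, so the two calls give different gradients for the same parameter w without any change to the function.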