Friday

Bird View Image from Images Stitching by ML

 

                                        Photo by Marcin Jozwiak

Creating a top-down bird's-eye view diagram of a place with object detection involves several steps. Here's a high-level overview:

1. Camera Calibration:

- Calibrate each camera to correct for distortion and obtain intrinsic and extrinsic parameters.

NVIDIA DeepStream SDK primarily focuses on building AI-powered video analytics applications,

including object detection and tracking, but it doesn't directly provide camera calibration functionalities

out of the box. Camera calibration is typically a separate process that involves capturing images of a

known calibration pattern (like a checkerboard) and using those images to determine the camera's

intrinsic and extrinsic parameters.

Here's a brief overview of how you might approach camera calibration using OpenCV along with some

general guidance:

1. Capture Calibration Images:

   - Capture a set of images of a known calibration pattern from different camera angles.

2. Install OpenCV:

   - Ensure that OpenCV is installed on your system. You can install it using:

     ```bash

     pip install opencv-python

     ```

3. Camera Calibration Script:

   - Write a Python script to perform camera calibration using OpenCV. This script will load the

calibration images, detect the calibration pattern, and compute the camera matrix and distortion

coefficients.

   ```python

   import cv2

   import numpy as np


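   # Termination criteria for sub-pixel corner refinement (used by cv2.cornerSubPix below)
   criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 0.001)

   # Prepare object points for a 9x6 grid of inner corners: (0,0,0), (1,0,0), (2,0,0), ..., (8,5,0)
   objp = np.zeros((6*9, 3), np.float32)
   objp[:, :2] = np.mgrid[0:9, 0:6].T.reshape(-1, 2)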


   # Arrays to store object points and image points from all images.

   objpoints = []  # 3D points in real world space

   imgpoints = []  # 2D points in image plane.


   # Load calibration images and find chessboard corners

   images = [...]  # List of calibration images


   for fname in images:

       img = cv2.imread(fname)

       gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)


       # Find the chess board corners

       ret, corners = cv2.findChessboardCorners(gray, (9, 6), None)


       # If found, add object points, image points (after refining them)

       if ret:

           objpoints.append(objp)

           corners2 = cv2.cornerSubPix(gray, corners, (11, 11), (-1, -1), criteria)

           imgpoints.append(corners2)


   # Calibrate the camera

   ret, mtx, dist, rvecs, tvecs = cv2.calibrateCamera(objpoints, imgpoints, gray.shape[::-1], None, None)

   ```

4. Save Calibration Parameters:

   - Save the obtained camera matrix (`mtx`) and distortion coefficients (`dist`) for later use.

   ```python

   np.savez('calibration_params.npz', mtx=mtx, dist=dist)

   ```

Now, you can use the obtained calibration parameters in your DeepStream application. When setting

up your camera pipeline, apply the distortion correction using the calibration parameters.

Remember to adapt this script to your specific use case and integrate it into your workflow as needed.

Additional help on camera calibration:

Camera calibration involves determining the intrinsic and extrinsic parameters of a camera to correct

distortions in the images it captures. Here's a step-by-step guide using OpenCV in Python. 


Step 1: Capture Calibration Images

Capture several images of a chessboard pattern from different camera angles. Ensure the chessboard

is visible in each image.

Step 2: Install OpenCV

Make sure you have OpenCV installed. You can install it using:

```bash

pip install opencv-python

```

Step 3: Write Camera Calibration Script

```python

import numpy as np

import cv2

import glob


# Chessboard dimensions (inner corners)

pattern_size = (9, 6)


# Prepare object points for the grid of inner corners: (0,0,0), (1,0,0), (2,0,0), ..., (8,5,0)

objp = np.zeros((np.prod(pattern_size), 3), dtype=np.float32)

objp[:, :2] = np.mgrid[0:pattern_size[0], 0:pattern_size[1]].T.reshape(-1, 2)


# Arrays to store object points and image points

objpoints = []  # 3D points in real world space

imgpoints = []  # 2D points in image plane


# Load calibration images

images = glob.glob('calibration_images/*.jpg')


for fname in images:

    img = cv2.imread(fname)

    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)


    # Find chessboard corners

    ret, corners = cv2.findChessboardCorners(gray, pattern_size, None)


    if ret:

        objpoints.append(objp)

        imgpoints.append(corners)


# Calibrate camera

ret, mtx, dist, rvecs, tvecs = cv2.calibrateCamera(objpoints, imgpoints, gray.shape[::-1], None, None)


# Save calibration parameters

np.savez('calibration_params.npz', mtx=mtx, dist=dist)

```

Step 4: Apply Calibration to Images

```python

# Load calibration parameters

calibration_data = np.load('calibration_params.npz')

mtx, dist = calibration_data['mtx'], calibration_data['dist']


# Undistort an example image

example_image = cv2.imread('calibration_images/example.jpg')

undistorted_image = cv2.undistort(example_image, mtx, dist, None, mtx)


# Display original and undistorted images

cv2.imshow('Original Image', example_image)

cv2.imshow('Undistorted Image', undistorted_image)

cv2.waitKey(0)

cv2.destroyAllWindows()

```

In this example:

- `objpoints` are the 3D points of the real-world chessboard corners.

- `imgpoints` are the 2D image points corresponding to the corners found in the images.

- `cv2.calibrateCamera` calculates the camera matrix (`mtx`) and distortion coefficients (`dist`).

- `cv2.undistort` corrects the distortion in an example image using the obtained calibration parameters.


Remember to replace 'calibration_images/' with the path to your calibration images. You can use the

undistorted images in your further computer vision applications.
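To make concrete what `mtx` and `dist` encode, here is a minimal NumPy sketch of the pinhole projection with the five-coefficient radial and tangential distortion model that OpenCV uses. The function name `project_point` and the numeric intrinsics are hypothetical and chosen only for illustration; real values come from `cv2.calibrateCamera`.

```python
import numpy as np


def project_point(point_cam, mtx, dist):
    """Project a 3D point (camera coordinates) to pixel coordinates."""
    X, Y, Z = point_cam
    x, y = X / Z, Y / Z  # normalized image coordinates
    # calibrateCamera returns dist with shape (1, 5); flatten it first
    k1, k2, p1, p2, k3 = np.ravel(dist)[:5]
    r2 = x * x + y * y
    radial = 1 + k1 * r2 + k2 * r2**2 + k3 * r2**3
    # Radial plus tangential distortion
    x_d = x * radial + 2 * p1 * x * y + p2 * (r2 + 2 * x * x)
    y_d = y * radial + p1 * (r2 + 2 * y * y) + 2 * p2 * x * y
    fx, fy = mtx[0, 0], mtx[1, 1]
    cx, cy = mtx[0, 2], mtx[1, 2]
    return np.array([fx * x_d + cx, fy * y_d + cy])


# Hypothetical intrinsics, for illustration only
mtx = np.array([[800.0, 0.0, 320.0],
                [0.0, 800.0, 240.0],
                [0.0, 0.0, 1.0]])
no_dist = np.zeros(5)
barrel = np.array([-0.2, 0.0, 0.0, 0.0, 0.0])  # negative k1 pulls points inward

point = np.array([0.5, 0.25, 2.0])
ideal = project_point(point, mtx, no_dist)
distorted = project_point(point, mtx, barrel)
```

Undistortion is the inverse of this mapping: it moves each pixel from its distorted position back to where the ideal pinhole model would have placed it.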

2. Image Stitching:

- Use OpenCV or other stitching libraries to combine images from multiple cameras into a panoramic

view.

While OpenCV's stitching module isn't designed specifically for 360-degree images, it can be used to

stitch together overlapping images with some considerations:

Key Challenges:

Distortion: 360-degree images often have significant distortion, especially near the poles, which can

make feature detection and alignment challenging for OpenCV's algorithms.

Field of View: Stitching images with a full 360-degree field of view requires careful handling of

wraparound areas where the edges of the panorama meet.
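One simple way to cope with the wraparound seam, sketched here under the assumption that the panorama is stored as an equirectangular NumPy array, is to pad both sides with columns copied from the opposite edge so that content crossing the seam becomes contiguous. The helper names `pad_wraparound` and `unwrap_x` are hypothetical.

```python
import numpy as np


def pad_wraparound(pano, pad):
    """Extend a panorama horizontally with columns copied from the opposite edge."""
    return np.pad(pano, ((0, 0), (pad, pad), (0, 0)), mode="wrap")


def unwrap_x(x_padded, pad, width):
    """Map an x coordinate in the padded image back to the original panorama."""
    return (x_padded - pad) % width


# Tiny synthetic "panorama": 2 rows, 6 columns, 3 channels.
# Each pixel stores its own column index so the wrap is easy to inspect.
width = 6
pano = np.tile(np.arange(width, dtype=np.uint8)[None, :, None], (2, 1, 3))
padded = pad_wraparound(pano, pad=2)
```

Anything detected or matched in the padded image can be mapped back with `unwrap_x`, so a feature found in the left padding ends up at the right edge of the original panorama.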

For a high-level stitching API code example, see https://docs.opencv.org/4.x/d8/d19/tutorial_stitcher.html
Another option, in Python: https://github.com/OpenStitching/stitching
And last but not least: https://pyimagesearch.com/2018/12/17/image-stitching-with-opencv-and-python/

3. Object Detection:

- Apply an object detection model (such as YOLO, SSD, or Faster R-CNN) on each stitched image to identify objects like humans, forklifts, etc.

To apply object detection using a pre-trained model (e.g., YOLO, SSD, Faster R-CNN) on stitched

images, you'll typically follow these steps:

1. Install Required Libraries:

   Make sure you have the necessary libraries installed. For example, you can use the `cv2` (OpenCV)

library for image processing and the `tensorflow` library for working with deep learning models.

   ```bash

   pip install opencv-python tensorflow

   ```

2. Load Pre-trained Model:

   Download a pre-trained object detection model. TensorFlow provides the TensorFlow Object Detection API that

supports various models. You can choose a model from the

[TensorFlow Model Zoo](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/tf2_detection_zoo.md).

   Here's an example using the EfficientDet model:

   ```python

   import cv2

   import tensorflow as tf


   # Load the pre-trained EfficientDet model

   model = tf.saved_model.load('path/to/efficientdet/saved/model')

   ```

3. Preprocess Image:

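   Preprocess the stitched image before feeding it into the model by converting it to the color order and tensor format the model expects. The SavedModels in the TensorFlow 2 Detection Model Zoo generally take a batched `uint8` RGB tensor and handle resizing and normalization internally, so check your model's input signature before scaling pixel values yourself.

   ```python

   def preprocess_image(image_path):
       image = cv2.imread(image_path)  # OpenCV loads images as BGR, uint8
       if image is None:
           raise FileNotFoundError(f'Could not read image: {image_path}')
       image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)  # Detection models are usually trained on RGB
       # Assumption: the model takes uint8 input of shape [1, H, W, 3].
       # If yours expects fixed-size or normalized float input, resize and scale here instead.
       image = tf.convert_to_tensor(image, dtype=tf.uint8)
       image = tf.expand_dims(image, 0)  # Add batch dimension
       return image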

   ```

4. Run Object Detection:

   Use the pre-trained model to detect objects in the image.

   ```python

   def run_object_detection(model, image):

       detections = model(image)

       return detections

   ```

5. Postprocess Results:

   Parse the model's output to obtain bounding boxes, confidence scores, and class labels.

   ```python

   def postprocess_results(detections):

       boxes = detections['detection_boxes'][0].numpy()

       scores = detections['detection_scores'][0].numpy()

       classes = detections['detection_classes'][0].numpy().astype(int)

       return boxes, scores, classes

   ```

6. Draw Bounding Boxes:

   Draw bounding boxes on the original image based on the detected objects.

   ```python

   def draw_boxes(image, boxes, scores, classes):
       # Boxes are normalized to [0, 1], so scale them by the image size
       height, width = image.shape[:2]

       for box, score, class_id in zip(boxes, scores, classes):
           # Draw bounding box if confidence is high enough
           if score > 0.5:
               ymin, xmin, ymax, xmax = box
               ymin, xmin = int(ymin * height), int(xmin * width)
               ymax, xmax = int(ymax * height), int(xmax * width)
               cv2.rectangle(image, (xmin, ymin), (xmax, ymax), (0, 255, 0), 2)
               cv2.putText(image, f'Class {class_id}, {score:.2f}', (xmin, ymin - 10),
                           cv2.FONT_HERSHEY_SIMPLEX, 0.5, (255, 255, 255), 2)

   ```

7. Display Results:

   Display the image with drawn bounding boxes.


   ```python

   def display_image(image):

       cv2.imshow('Object Detection Result', image)

       cv2.waitKey(0)

       cv2.destroyAllWindows()

   ```


8. Complete Example:

   Here's how you can put it all together:

   ```python

   image_path = 'path/to/your/stitched/image.jpg'


   image = preprocess_image(image_path)

   detections = run_object_detection(model, image)

   boxes, scores, classes = postprocess_results(detections)


   original_image = cv2.imread(image_path)

   draw_boxes(original_image, boxes, scores, classes)

   display_image(original_image)

   ```


Make sure to replace `'path/to/efficientdet/saved/model'` with the actual path to your pre-trained model.

Adjust parameters such as image size and confidence threshold based on your requirements.
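The later overlay steps work with pixel boxes in `(x1, y1, x2, y2)` order, while the detection output above is normalized `[ymin, xmin, ymax, xmax]`. A minimal sketch of that conversion follows; the helper name `to_pixel_boxes` and the sample values are hypothetical.

```python
import numpy as np


def to_pixel_boxes(boxes, scores, width, height, min_score=0.5):
    """Convert normalized [ymin, xmin, ymax, xmax] boxes to pixel (x1, y1, x2, y2) tuples."""
    boxes = np.asarray(boxes, dtype=float).reshape(-1, 4)
    scores = np.asarray(scores, dtype=float)
    kept = boxes[scores > min_score]  # drop low-confidence detections
    pixel_boxes = []
    for ymin, xmin, ymax, xmax in kept:
        pixel_boxes.append((int(round(xmin * width)), int(round(ymin * height)),
                            int(round(xmax * width)), int(round(ymax * height))))
    return pixel_boxes


# Illustrative values only: two detections on a 1000x500 image
boxes = [[0.10, 0.20, 0.50, 0.40], [0.00, 0.00, 1.00, 1.00]]
scores = [0.9, 0.3]
detected_objects = to_pixel_boxes(boxes, scores, width=1000, height=500)
```

Use the width and height of the image the boxes will be drawn on, which is the stitched image rather than any resized copy passed to the model.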

4. Perspective Transformation:

- Apply a perspective transformation to obtain the bird's-eye view. This involves mapping the stitched image, and the objects detected in it, onto a 2D ground plane.

Perspective transformation is crucial for obtaining a bird's-eye view. Here's how you can apply it using

OpenCV in Python:

```python

import cv2

import numpy as np


# Load the stitched image

stitched_image = cv2.imread('stitched_image.jpg')


# Define the source points (four ground-plane reference points in the stitched image, e.g. floor markings)

src_points = np.float32([[x1, y1], [x2, y2], [x3, y3], [x4, y4]])


# Define the destination points (coordinates on the 2D plane for the bird's-eye view)

dst_points = np.float32([[x1_dst, y1_dst], [x2_dst, y2_dst], [x3_dst, y3_dst], [x4_dst, y4_dst]])


# Compute the perspective transformation matrix

perspective_matrix = cv2.getPerspectiveTransform(src_points, dst_points)


# Apply the perspective transformation

birdseye_view = cv2.warpPerspective(stitched_image, perspective_matrix, (width, height)) 


# Display the original and bird's-eye view images

cv2.imshow('Original Image', stitched_image)

cv2.imshow('Bird\'s-Eye View', birdseye_view)

cv2.waitKey(0)

cv2.destroyAllWindows()

```

In this example:

- `src_points` are four reference points on the ground plane as they appear in the stitched image (for example, the corners of a marked rectangle on the floor).

- `dst_points` are the destination coordinates on the 2D plane for the bird's-eye view.

- `cv2.getPerspectiveTransform` calculates the perspective transformation matrix.

- `cv2.warpPerspective` applies the perspective transformation to obtain the bird's-eye view.

Make sure to replace `x1, y1, ...` and `x1_dst, y1_dst, ...` with the actual pixel coordinates of the four reference points and their corresponding coordinates in the bird's-eye view. `cv2.getPerspectiveTransform` needs exactly four point pairs, and no three of the points should lie on one line. The `width` and `height` are the dimensions of the output bird's-eye view image. Adjust these parameters based on your specific use case.
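For intuition about what the perspective matrix is, the following NumPy sketch solves the same kind of 8-unknown linear system from four point pairs. It is an illustration under the assumption that the last matrix entry is fixed to 1, not OpenCV's implementation; the function names and the sample coordinates are hypothetical.

```python
import numpy as np


def homography_from_4_points(src, dst):
    """Solve for the 3x3 matrix H (with H[2, 2] = 1) mapping each src point to its dst point."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h11*x + h12*y + h13) / (h31*x + h32*y + 1), rearranged to be linear in h
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        # v = (h21*x + h22*y + h23) / (h31*x + h32*y + 1)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = np.linalg.solve(np.array(A, dtype=float), np.array(b, dtype=float))
    return np.append(h, 1.0).reshape(3, 3)


def warp_point(H, point):
    """Apply H to a 2D point, including the homogeneous division."""
    x, y, w = H @ np.array([point[0], point[1], 1.0])
    return np.array([x / w, y / w])


# Illustrative values: a floor trapezoid in the camera image mapped to a rectangle
src = [(200, 300), (440, 300), (640, 480), (0, 480)]
dst = [(0, 0), (400, 0), (400, 600), (0, 600)]
H = homography_from_4_points(src, dst)
```

The division by `w` in `warp_point` is what makes the mapping projective rather than affine, and it is why parallel floor lines that converge in the camera image become parallel again in the top view.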

5. Object Tracking (Optional):

   - Implement object tracking algorithms if you need to track the detected objects across frames.


Example code: https://github.com/dwnsingh/Object-Detection-in-Floor-Plan-Images
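As a rough illustration of tracking across frames, here is a minimal greedy nearest-centroid tracker in plain Python. It is a deliberately simplified sketch, not a production tracker: the function name `match_tracks` and the `max_distance` value are hypothetical, tracks that find no match are simply dropped, and IDs may be reused after a track disappears.

```python
import math


def match_tracks(tracks, detections, max_distance=50.0):
    """Greedy nearest-centroid association between frames.

    tracks: dict mapping track id -> (x, y) centroid from the previous frame.
    detections: list of (x, y) centroids in the current frame.
    Returns the updated dict; unmatched detections start new tracks.
    """
    updated = {}
    unmatched = list(range(len(detections)))
    next_id = max(tracks, default=-1) + 1

    for track_id, (tx, ty) in tracks.items():
        if not unmatched:
            break
        # Closest still-unassigned detection to this track
        best = min(unmatched,
                   key=lambda i: math.hypot(detections[i][0] - tx, detections[i][1] - ty))
        if math.hypot(detections[best][0] - tx, detections[best][1] - ty) <= max_distance:
            updated[track_id] = detections[best]
            unmatched.remove(best)

    # Every detection left over becomes a new track
    for i in unmatched:
        updated[next_id] = detections[i]
        next_id += 1
    return updated


frame1 = match_tracks({}, [(100, 100), (300, 200)])
frame2 = match_tracks(frame1, [(305, 204), (108, 97)])
```

Running the association on bird's-eye-view coordinates rather than image coordinates keeps the distance threshold meaningful across the whole scene.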

Overlaying detected objects on the bird's-eye view involves mapping the bounding boxes or contours of the objects from the original stitched image to the corresponding positions on the bird's-eye view. Here's how you can achieve this using OpenCV in Python:

1. Detect Objects:

   First, you need to detect objects in the original stitched image. You can use an object detection model or any method suitable for your use case.

   ```python

   # Assume you have detected objects and obtained their bounding boxes

   detected_objects = [(x1, y1, x2, y2), ...]  # (x1, y1) and (x2, y2) are the top-left and bottom-right coordinates of the bounding box

   ```

2. Apply Perspective Transformation:

   Before overlaying objects, apply the perspective transformation to the original image to get the bird's-eye view.

   ```python

   # Assuming you have the perspective_matrix from the previous step

   birdseye_view = cv2.warpPerspective(stitched_image, perspective_matrix, (width, height))

   ```

3. Overlay Objects:

   Map the bounding box coordinates of the detected objects from the original image to the bird's-eye view.

   ```python

   # Overlay detected objects on the bird's-eye view

   for obj in detected_objects:

       x1, y1, x2, y2 = obj  # Bounding box coordinates in the original image

       

       # Map the coordinates to the bird's-eye view using the perspective matrix

       # Pass all four corners of the box so the warped quadrilateral is drawn correctly
       mapped_coords = cv2.perspectiveTransform(np.array([[[x1, y1], [x2, y1], [x2, y2], [x1, y2]]], dtype=np.float32), perspective_matrix)


       # Draw the mapped bounding box on the bird's-eye view

       cv2.polylines(birdseye_view, [np.int32(mapped_coords)], isClosed=True, color=(0, 255, 0), thickness=2)

   ```

   In this code:

   - `cv2.perspectiveTransform` is used to map the coordinates of the bounding box from the original image to the bird's-eye view.

   - `cv2.polylines` is used to draw the mapped bounding box on the bird's-eye view.

4. Display Result:

   Finally, display the bird's-eye view with overlaid objects.

   ```python

   cv2.imshow('Bird\'s-Eye View with Objects', birdseye_view)

   cv2.waitKey(0)

   cv2.destroyAllWindows()

   ```

Ensure that the bounding box coordinates are correctly mapped using the perspective transformation matrix, and adjust the color, thickness, or other parameters based on your visualization preferences.

When dealing with overlapping objects in images, putting bounding boxes around them can be challenging. One common approach is to use non-maximum suppression (NMS) to eliminate redundant bounding boxes and keep only the most confident ones. Here's a general outline of the steps:

1. Object Detection:

   Run your object detection model on the image to obtain bounding boxes and confidence scores for each detected object.

2. NMS (Non-Maximum Suppression):

   Apply non-maximum suppression to filter out redundant bounding boxes. This involves selecting the bounding box with the highest confidence score and removing any other bounding boxes that have significant overlap with it.

   ```python

   def non_max_suppression(boxes, scores, threshold):

       # Sort bounding boxes by confidence score

       indices = np.argsort(scores)[::-1]

       keep = []

   

       while len(indices) > 0:

           i = indices[0]

           keep.append(i)

   

           # Calculate overlap with other bounding boxes

           overlaps = calculate_overlap(boxes[i], boxes[indices[1:]])

   

           # Remove bounding boxes with high overlap

           indices = indices[1:][overlaps < threshold]

   

       return keep

   ```

3. Draw Bounding Boxes:

   Draw bounding boxes on the image for the selected indices.

   ```python

   keep_indices = non_max_suppression(detected_boxes, confidence_scores, threshold=0.5)


   for i in keep_indices:

       box = detected_boxes[i]

       cv2.rectangle(image, (box[0], box[1]), (box[2], box[3]), (0, 255, 0), 2)

   ```

   Adjust the threshold parameter based on your application's requirements. Lower values result in more aggressive suppression, because a box is removed at a smaller overlap with a higher-scoring box; higher values keep more overlapping bounding boxes.

4. Display Result:

   Display the image with drawn bounding boxes.

   ```python

   cv2.imshow('Objects with Bounding Boxes', image)

   cv2.waitKey(0)

   cv2.destroyAllWindows()

   ```

Keep in mind that non-maximum suppression is a critical step when dealing with overlapping objects. It helps to ensure that only the most relevant and confident bounding boxes are retained, reducing redundancy and improving the overall quality of object detection results.
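The `non_max_suppression` function above calls a `calculate_overlap` helper that is not defined in the snippet. One common choice for it is intersection over union (IoU); the sketch below assumes boxes in `(x1, y1, x2, y2)` pixel format and that `boxes` is passed to `non_max_suppression` as a NumPy array so that index arrays work.

```python
import numpy as np


def calculate_overlap(box, other_boxes):
    """Return the IoU between one (x1, y1, x2, y2) box and each row of other_boxes."""
    box = np.asarray(box, dtype=float)
    other_boxes = np.asarray(other_boxes, dtype=float).reshape(-1, 4)

    # Corners of the intersection rectangles
    ix1 = np.maximum(box[0], other_boxes[:, 0])
    iy1 = np.maximum(box[1], other_boxes[:, 1])
    ix2 = np.minimum(box[2], other_boxes[:, 2])
    iy2 = np.minimum(box[3], other_boxes[:, 3])

    # Width and height are clipped at zero for boxes that do not intersect
    inter = np.clip(ix2 - ix1, 0, None) * np.clip(iy2 - iy1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    other_areas = (other_boxes[:, 2] - other_boxes[:, 0]) * (other_boxes[:, 3] - other_boxes[:, 1])
    union = area + other_areas - inter
    return inter / np.maximum(union, 1e-9)  # guard against division by zero


# Illustrative values: identical box, half-overlapping box, disjoint box
ious = calculate_overlap((0, 0, 10, 10),
                         [(0, 0, 10, 10), (5, 0, 15, 10), (20, 20, 30, 30)])
```

With IoU as the overlap measure, a threshold of 0.5 removes a box only when it shares at least half of the combined area with a higher-scoring box.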

Using segmentation instead of bounding boxes can be a valuable approach, especially if you want to capture the precise shape and boundaries of detected objects. It may enhance the accuracy of object representation in the bird's-eye view. However, the choice between bounding boxes and segmentation depends on the nature of your application and the specific requirements.

Advantages of Segmentation:

1. Precise Object Boundaries: Segmentation provides a more accurate representation of object boundaries, capturing finer details.

2. Improved Object Understanding: If understanding the object's shape and structure is crucial, segmentation can provide more detailed information.

Potential Challenges:

1. Increased Complexity: Implementing segmentation can be more complex than using bounding boxes.

2. Computational Cost: Segmentation might require more computational resources than bounding boxes, potentially impacting inference time.

If you choose to use segmentation, here's a general outline of the steps:

1. Object Segmentation:

   Use a segmentation model (such as a semantic segmentation or instance segmentation model) to obtain masks for each detected object.

2. Perspective Transformation:

   Apply the perspective transformation to the original image, similar to the bounding box approach.

   ```python

   birdseye_view = cv2.warpPerspective(stitched_image, perspective_matrix, (width, height))

   ```

3. Overlay Segmentation Masks:

   Overlay the segmentation masks on the bird's-eye view.

   ```python

   # Assuming you have segmentation_masks obtained from the segmentation model

   for mask in segmentation_masks:

       # Map the mask to the bird's-eye view using the perspective matrix

       mapped_mask = cv2.warpPerspective(mask, perspective_matrix, (width, height))

       

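       # Overlay the mapped mask on the bird's-eye view.
       # Index with the warped mask, since only it matches the bird's-eye view's size.
       # Assumes a single-channel mask; masked pixels are painted green.

       birdseye_view[mapped_mask > 0] = (0, 255, 0)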

   ```

4. Display Result:

   Display the bird's-eye view with overlaid segmentation masks.

   ```python

   cv2.imshow('Bird\'s-Eye View with Segmentation', birdseye_view)

   cv2.waitKey(0)

   cv2.destroyAllWindows()

   ```

Keep in mind that the computational cost of segmentation may vary based on the complexity of your segmentation model and the size of the images. It's recommended to profile the inference time and resource usage to ensure it meets your application's requirements.
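If you would rather keep the floor visible underneath each mask than paint it a solid color, alpha blending is one option. This NumPy sketch assumes a single-channel mask with the same height and width as a 3-channel image; `blend_mask` and its defaults are hypothetical.

```python
import numpy as np


def blend_mask(image, mask, color=(0, 255, 0), alpha=0.5):
    """Return a copy of image with color blended into the pixels where mask is non-zero."""
    out = image.astype(np.float32)  # astype copies, so the input stays untouched
    selected = mask > 0
    out[selected] = (1 - alpha) * out[selected] + alpha * np.array(color, dtype=np.float32)
    return out.astype(np.uint8)


# Illustrative values: a uniform gray image with a 2x2 mask in the middle
image = np.full((4, 4, 3), 100, dtype=np.uint8)
mask = np.zeros((4, 4), dtype=np.uint8)
mask[1:3, 1:3] = 1
overlay = blend_mask(image, mask)
```

In the bird's-eye pipeline, the mask passed in would be the warped one, so that it lines up with the warped image.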

We can also convert the detected objects' positions from the individual camera coordinate systems to a global coordinate system. Here's a breakdown of the process:

1. Apply Person Detection (Bounding Box):

You've already covered this step with YOLO or another object detection model, obtaining bounding box coordinates for each detected person.

2. Read Center Point of Lower Bounding Box Edge (Standpoint):

Calculate the center point of the bounding box's bottom edge. This approximates where the person stands on the floor and serves as the reference point for the person's location within the image.

3. Extrinsic Calibration of the Cameras:

Perform extrinsic calibration for each camera. This involves determining the relationship between each camera's coordinate system and a common reference coordinate system. Calibration can be done using techniques like camera calibration boards, known object points, or specialized calibration software.

4. Introduction into One Global Coordinate System:

Map the center points obtained from the lower bounding boxes to the global coordinate system established during the calibration process. This involves applying a transformation to convert camera-specific coordinates to a common global reference.

5. Match Both Camera Coordinate Systems to be in Global Coordinate System:

Align the coordinate systems of all cameras to the global coordinate system. This may involve rotation, translation, and scaling transformations based on the extrinsic calibration parameters obtained earlier.
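Steps 2 and 4 can be sketched as follows, assuming each camera's mapping from image pixels to floor coordinates is available as a 3x3 homography. The names `standpoint`, `to_global`, and `H_cam1`, along with all numeric values, are hypothetical.

```python
import numpy as np


def standpoint(box):
    """Center of the bottom edge of an (x1, y1, x2, y2) bounding box."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, max(y1, y2))  # image y grows downward


def to_global(point, H):
    """Apply a 3x3 image-to-floor homography H to a pixel coordinate."""
    x, y, w = H @ np.array([point[0], point[1], 1.0])
    return (x / w, y / w)


# Hypothetical homography for one camera: 100 px per metre,
# with the floor origin located at pixel (50, 20)
H_cam1 = np.array([[0.01, 0.0, -0.5],
                   [0.0, 0.01, -0.2],
                   [0.0, 0.0, 1.0]])

foot = standpoint((100, 50, 200, 320))
position = to_global(foot, H_cam1)
```

With one such homography per camera, detections of the same person from different cameras should land near the same global position, which is what makes merging them possible.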

Code Example (using OpenCV for Extrinsic Calibration):

```python

import cv2

import numpy as np


# Example extrinsic calibration for two cameras

# Define known 3D points (e.g., corners of a calibration board)

obj_points = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [1, 1, 0]], dtype=np.float32)


# Corresponding 2D image points for each camera
# (placeholders: replace with measured pixel coordinates; the _c2 suffix marks camera 2)

img_points_cam1 = np.array([[x1, y1], [x2, y2], [x3, y3], [x4, y4]], dtype=np.float32)

img_points_cam2 = np.array([[x1_c2, y1_c2], [x2_c2, y2_c2], [x3_c2, y3_c2], [x4_c2, y4_c2]], dtype=np.float32)


# Camera matrices and distortion coefficients

camera_matrix_cam1 = np.array([[focal_length_cam1, 0, cx_cam1], [0, focal_length_cam1, cy_cam1], [0, 0, 1]])

camera_matrix_cam2 = np.array([[focal_length_cam2, 0, cx_cam2], [0, focal_length_cam2, cy_cam2], [0, 0, 1]])


dist_coeffs_cam1 = np.array([k1_cam1, k2_cam1, p1_cam1, p2_cam1, k3_cam1])

dist_coeffs_cam2 = np.array([k1_cam2, k2_cam2, p1_cam2, p2_cam2, k3_cam2])


# Calibrate cameras

retval_cam1, rvec_cam1, tvec_cam1 = cv2.solvePnP(obj_points, img_points_cam1, camera_matrix_cam1, dist_coeffs_cam1)

retval_cam2, rvec_cam2, tvec_cam2 = cv2.solvePnP(obj_points, img_points_cam2, camera_matrix_cam2, dist_coeffs_cam2)


# Now, use rvec and tvec for further transformations

```

This is a simplified example, and the actual implementation would depend on your camera setup, calibration procedure, and the libraries you are using. The goal is to obtain the rotation vector (`rvec`) and translation vector (`tvec`) for each camera, which together describe how world coordinates map into that camera's coordinate system.

Once you have these vectors, you can use them to transform the detected person's position from camera coordinates to a common global coordinate system (`cv2.Rodrigues` converts `rvec` into a 3x3 rotation matrix). The exact transformation will depend on your specific calibration and coordinate system conventions.

Remember to adjust the parameters and code according to your specific camera setup and calibration process.
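As one possible way to use the pose from `solvePnP`, the sketch below intersects a pixel's viewing ray with the floor plane Z = 0 of the world coordinate system. It assumes the pixel has already been undistorted and that `R` is the 3x3 rotation matrix obtained from `rvec` (for example via `cv2.Rodrigues(rvec)[0]`); the function name and the numeric setup are hypothetical.

```python
import numpy as np


def pixel_to_ground(pixel, K, R, t):
    """Intersect the viewing ray of an undistorted pixel with the world plane Z = 0.

    R (3x3) and t (3 values) follow the solvePnP convention: X_cam = R @ X_world + t.
    """
    t = np.asarray(t, dtype=float).reshape(3)  # solvePnP returns t with shape (3, 1)
    ray_cam = np.linalg.inv(K) @ np.array([pixel[0], pixel[1], 1.0])
    cam_center = -R.T @ t              # camera position in world coordinates
    ray_world = R.T @ ray_cam          # ray direction in world coordinates
    s = -cam_center[2] / ray_world[2]  # scale at which the ray reaches Z = 0
    return cam_center + s * ray_world


# Illustrative setup: a camera 5 units above the floor, looking straight down
K = np.array([[100.0, 0.0, 320.0],
              [0.0, 100.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.diag([1.0, -1.0, -1.0])  # 180 degree rotation about the x axis
t = np.array([0.0, 0.0, 5.0])

ground = pixel_to_ground((340.0, 200.0), K, R, t)
```

Feeding the standpoint pixel of each detection through this function, once per camera with that camera's `K`, `R`, and `t`, yields floor positions in the shared world frame.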

Bonus links:

A wonderful GitHub article I found on a related topic: surround-view-system-introduction/doc/en.md (hynpu/surround-view-system-introduction)

Another great article and tutorial from MathWorks: Create 360° Bird's-Eye-View Image Around a Vehicle (MATLAB & Simulink)

A related Stack Overflow question: Projection from 2D Camera view to 2D Bird Eye view

Another Stack Overflow question: Generating a bird's eye / top view with OpenCV

