The goal of this assignment is to take the digitized Prokudin-Gorskii glass plate images and, using image processing techniques, automatically produce a color image with as few visual artifacts as possible.
All input images are simply divided into slices of 1/3 of the original height, and the L2 norm is used to evaluate the alignment performance. The red and green channels are aligned to the blue channel as the reference. However, the results of this direct alignment are surprisingly poor, as the slices obtained by equally dividing the image have significant black borders around them, which severely affects the alignment accuracy.
The issues encountered include:
To avoid interference from the original black borders on the alignment algorithm, I applied the same proportional cropping to all channels, using only the central portion of each channel for alignment while ignoring the rest. There are various options for this cropping ratio, but to avoid the black borders while retaining enough image information, I chose a 50% cropping rate (in fact, changing this ratio to other similar values will not affect the final alignment result).
After implementing the improvements, my algorithm achieved precise and fast alignment for small images:
G offset: (5, 2), R offset: (12, 3)
G offset: (-3, 2), R offset: (3, 2)
G offset: (3, 3), R offset: (7, 3)
An image pyramid is a multi-scale representation of an image, often used in image processing tasks such as alignment, compression, and object detection. The idea is to create a series of progressively smaller images, each formed by downscaling the original image. These smaller images are stacked like a pyramid, with the original image at the base and the smallest image at the top.
There are two main types of image pyramids:
Gaussian Pyramid: Each layer is created by applying a Gaussian blur to the previous layer and then downsampling it. This type of pyramid is often used in coarse-to-fine image alignment or when reducing image detail at different scales.
Laplacian Pyramid: This pyramid is formed by subtracting each Gaussian layer from the layer above it, effectively capturing the difference in image detail between scales. Laplacian pyramids are useful for image compression and blending, as they represent image details in a more compact form.
Image pyramids allow for faster processing by enabling algorithms to operate on smaller, less detailed versions of the image before refining results on higher-resolution layers. This technique is particularly effective in tasks that require multi-scale analysis or iterative refinement, such as object tracking, image stitching, and alignment.
For small images, the current algorithm has achieved ideal results. However, for high-resolution images, the algorithm takes a considerable amount of time due to exhaustive search. Based on suggestions, I adopted the image pyramid technique to improve the alignment process for high-resolution images. The detailed steps are as follows:
Using these techniques, the algorithm now only takes 2 to 3 seconds to align a 3k-4k resolution image.
G offset: (42, 6), R offset: (86, 32)
G offset: (24, 4), R offset: (58, -4)
G offset: (58, 16), R offset: (124, 14)
G offset: (40, 18), R offset: (90, 22)
G offset: (82, 10), R offset: (178, 14)
G offset: (52, 26), R offset: (108, 36)
G offset: (34, -10), R offset: (140, -26)
G offset: (78, 28), R offset: (176, 36)
G offset: (50, 14), R offset: (110, 12)
G offset: (50, 8), R offset: (108, 12)
G offset: (48, 24), R offset: (0, -332)
Through the optimizations in Step 2, I have established a fast image alignment algorithm, but the alignment effect for emir.tif turned out to be surprisingly poor, with the offset of the red channel clearly miscalculated.
Upon reviewing the original channels, I found that the three original channels exhibited completely different brightness levels, and the existing SSD-based alignment method could not handle such images. Therefore, I applied histogram equalization to balance the brightness of the three channels, and then used the structural similarity (SSIM) method for alignment.
Brightness Equalization of Three Channels:
The Structural Similarity Index (SSIM) between two images \(x\) and \(y\) is defined as:
\[SSIM(x, y) = \frac{(2\mu_x \mu_y + c_1)(2\sigma_{xy} + c_2)}{(\mu_x^2 + \mu_y^2 + c_1)(\sigma_x^2 + \sigma_y^2 + c_2)}\]where:
The final result: g_offset: (48, 22), r_offset: (102, 40)
Here is a Markdown explanation of the white balance adjustment portion of your code, written with LaTeX formulas:
The apply_white_balance
function is designed to correct the color balance of an image so that the colors appear more natural and consistent. The process involves adjusting each color channel to match the average intensity of all channels. Here’s a step-by-step breakdown:
Convert Image to Float Precision:
The image is first converted to a float32
type to ensure precision during the adjustment process.
Calculate Average Intensity for Each Channel: For each color channel (Red, Green, Blue), compute the average pixel value:
\[\text{avg}_r = \frac{1}{N} \sum_{i=1}^{N} \text{image}[i, j, 0]\] \[\text{avg}_g = \frac{1}{N} \sum_{i=1}^{N} \text{image}[i, j, 1]\] \[\text{avg}_b = \frac{1}{N} \sum_{i=1}^{N} \text{image}[i, j, 2]\]where \(N\) is the total number of pixels in the image, and \(\text{image}[i, j, 0]\), \(\text{image}[i, j, 1]\), and \(\text{image}[i, j, 2]\) represent the pixel values for the Red, Green, and Blue channels, respectively.
Calculate the Overall Average Intensity:
\[\text{avg}_{\text{gray}} = \frac{\text{avg}_r + \text{avg}_g + \text{avg}_b}{3}\]Adjust Each Channel: Scale each channel so that its average matches the overall average intensity:
\[\text{image}[:, :, 0] \leftarrow \text{image}[:, :, 0] \times \frac{\text{avg}_{\text{gray}}}{\text{avg}_r}\] \[\text{image}[:, :, 1] \leftarrow \text{image}[:, :, 1] \times \frac{\text{avg}_{\text{gray}}}{\text{avg}_g}\] \[\text{image}[:, :, 2] \leftarrow \text{image}[:, :, 2] \times \frac{\text{avg}_{\text{gray}}}{\text{avg}_b}\]Clip Values and Convert Back to uint8
:
Finally, ensure that pixel values are within the valid range \([0, 255]\) and convert the image back to uint8
type:
This method effectively balances the colors by equalizing the average intensity across all color channels, leading to a more visually consistent image.
After center cropping to remove borders, performing white balance, and enhancing saturation, the optimized result was obtained.