Project 1: Images of the Russian Empire

Background

Sergei Mikhailovich Prokudin-Gorskii was a famous photographer who won the Tzar's special permission to travel across the vast Russian Empire and take color photographs of everything he saw. Although there was no way to print color photographs in his era, he came up with a simple yet ingenious idea: record three exposures of every scene onto a glass plate using a red, a green, and a blue filter. When the revolution began, he left Russia in 1918, never to return. Fortunately, his RGB glass plate negatives, capturing the last years of the Russian Empire, survived and were purchased in 1948 by the Library of Congress (LoC). The LoC has since digitized the negatives and made them available online.

The goal of this project is to automate the production of a color image given the three channels of the same image, in particular the digitized Prokudin-Gorskii glass plate images!

Methodology: Single Scale Alignment

In this project, only an x-y translation model is applied, in the interest of time. In accordance with the project guidelines, I used the blue channel as the reference channel and subsequently aligned the red and green channels to it. Below I explain my attempts at producing a color image with few visual artifacts.

  1. Baseline: simply stack the three channels together without any preprocessing or translation.
  2. Translation using NCC: a brute-force algorithm that searches over the displacement window [-15, 15] x [-15, 15]. Normalized Cross Correlation (NCC) is used as the metric to compute the similarity between the shifted channel and the reference channel (see the first sketch after this list).
  3. Translation with inner pixels and auto-cropping: With the same algorithm that searches over a displacement window, I implemented an auto-cropping function that detects color borders uniformly. It extracts pixel values from the edges (top, bottom, left, right) of all RGB channels and calculates the standard deviation (s.d.) of each edge for every channel. The idea is that if the s.d. is low across all channels, i.e. the color isn't changing much, the edge can be treated as a color border and cropped. Additionally, I used only the inner 80% of the pixels when computing the NCC metric, which makes the process faster and disregards the noisy borders that the auto-cropping function fails to recognize (both ideas are sketched after this list).
  4. Bells and Whistles are applied between the second and the third image. In particular, automatic border cropping makes the image sharper in the lower right corner, although it fails to detect the noisy borders in the top left corner. On the other hand, automatic white balancing and contrast adjustment didn't seem to affect the image much. I believe this is because of the naive methods being used; techniques such as non-linear functions or better ways to estimate the illuminant could definitely boost the performance.
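
The core of steps 2 and 3 is the brute-force NCC search over a displacement window. Below is a minimal sketch of that idea in numpy; the helper names (ncc, crop_inner) and the exact cropping fraction are illustrative and may differ from my actual code.

    import numpy as np

    def ncc(a, b):
        # Normalized cross-correlation between two equally sized channels.
        a = a - a.mean()
        b = b - b.mean()
        return np.sum(a * b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

    def crop_inner(channel, frac=0.8):
        # Keep only the inner `frac` of the pixels so noisy borders do not
        # dominate the similarity score.
        h, w = channel.shape
        dh, dw = int(h * (1 - frac) / 2), int(w * (1 - frac) / 2)
        return channel[dh:h - dh, dw:w - dw]

    def single_scale_align(channel, reference, window=15):
        # Exhaustive search over [-window, window]^2, scoring each candidate
        # shift with NCC computed on the inner pixels only.
        best_score, best_shift = -np.inf, (0, 0)
        for dy in range(-window, window + 1):
            for dx in range(-window, window + 1):
                shifted = np.roll(channel, (dy, dx), axis=(0, 1))
                score = ncc(crop_inner(shifted), crop_inner(reference))
                if score > best_score:
                    best_score, best_shift = score, (dy, dx)
        return best_shift

Usage is simply to find the best shift for each channel against blue and apply it, e.g. np.roll(red, single_scale_align(red, blue), axis=(0, 1)).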
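The auto-cropping and white-balancing bells and whistles can be sketched along the following lines, assuming float images in [0, 1]. The threshold, the maximum crop fraction, and the walk-inward stopping rule are illustrative choices here, not necessarily the exact ones I tuned.

    import numpy as np

    def auto_crop(rgb, threshold=0.05, max_crop=0.1):
        # rgb: H x W x 3 float image in [0, 1]. Walk inward from each side
        # and drop rows/columns whose standard deviation is below `threshold`
        # in all three channels, i.e. the color barely changes, which is what
        # the scan borders look like.
        h, w, _ = rgb.shape
        top, bottom, left, right = 0, h, 0, w

        def is_border(line):  # line has shape N x 3
            return bool(np.all(line.std(axis=0) < threshold))

        while top < h * max_crop and is_border(rgb[top]):
            top += 1
        while h - bottom < h * max_crop and is_border(rgb[bottom - 1]):
            bottom -= 1
        while left < w * max_crop and is_border(rgb[:, left]):
            left += 1
        while w - right < w * max_crop and is_border(rgb[:, right - 1]):
            right -= 1
        return rgb[top:bottom, left:right]

    def gray_world(rgb):
        # Naive gray-world white balance: rescale each channel so its mean
        # matches the overall mean intensity.
        means = rgb.reshape(-1, 3).mean(axis=0)
        return np.clip(rgb * (means.mean() / (means + 1e-12)), 0.0, 1.0)
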
Baseline w/ no alignment: Red [0, 0], Green: [0, 0]
NCC translation w/ no preprocessing: Red [-1, 1], Green: [-1, 7]
Translation with preprocessing: Red [2, 5], Green: [3, 12]

Methodology: Multi Scale Alignment (Image Pyramid)

For higher-resolution glass plate scans in .tif format, the exhaustive search becomes prohibitively expensive once the displacement window is enlarged in proportion to the image size. Yet if I keep the small [-15, 15] x [-15, 15] window from single scale alignment, the search is likely to miss the optimal displacement vector. As a result, a pyramid search algorithm was used, which downscales the image and lets me update the optimal displacement layer by layer. The process is fairly simple: keep downscaling the image by a factor of 2 until it is less than 300 pixels on a side (the same size as the previous .jpg images). Then, starting from the coarsest scale and going down the pyramid, I compute the optimal displacement at each layer by recursively calling the single_scale_align function and scaling the result up by 2 at the following layer. This process continues until the image is back at its original size.
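
A minimal sketch of the pyramid, building on the single_scale_align sketch above and using scikit-image's rescale for downscaling; the per-level refinement window of 2 is an illustrative choice, not necessarily what my code uses.

    import numpy as np
    from skimage.transform import rescale

    def pyramid_align(channel, reference, min_size=300, window=15):
        # Coarsest level: the image is small enough for a plain exhaustive
        # search, just like in the single scale case.
        if min(channel.shape) <= min_size:
            return single_scale_align(channel, reference, window)

        # Align a half-resolution version of both channels first...
        coarse_dy, coarse_dx = pyramid_align(
            rescale(channel, 0.5, anti_aliasing=True),
            rescale(reference, 0.5, anti_aliasing=True),
            min_size, window)

        # ...then scale that estimate up by 2, apply it, and refine it with a
        # small search window at the current resolution.
        dy, dx = 2 * coarse_dy, 2 * coarse_dx
        shifted = np.roll(channel, (dy, dx), axis=(0, 1))
        extra_dy, extra_dx = single_scale_align(shifted, reference, window=2)
        return dy + extra_dy, dx + extra_dx
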

Baseline: Red [0, 0], Green: [0, 0]
Translation w/ image pyramid: Red [13, 178], Green: [10, 82]


In the following section, I display the results of my final algorithm on 12 other example images. A few images still appear messy, which is expected given that I used a uniform and rather naive approach for all of them. In particular, I think aligning on a different feature, such as edges, or adding transformations such as rotation could make things more interesting. In the interest of time, this is all that's available for display. One last note for future work: each image takes around 10 minutes to process. Although this is still on the order of minutes, more vectorized calculation and parallelization could help reduce the time spent.

Results on Provided Examples

monastery: Red [2, -3], Green: [2, 3]
tobolsk: Red [2, 3], Green: [3, 6]
onion_church: Red [36, 108], Green: [26, 52]
lady: Red [34, 125], Green: [-12, 43]
church: Red [-4, 58], Green: [4, 25]
emir: Red [-286, 96], Green: [24, 49]
harvesters: Red [14, 124], Green: [17, 60]
icon: Red [23, 90], Green: [17, 41]
sculpture: Red [-27, 140], Green: [-11, 33]
self_portrait: Red [37, 176], Green: [29, 79]
three_generations: Red [11, 112], Green: [14, 53]
train: Red [32, 87], Green: [6, 42]