Sergei Mikhailovich Prokudin-Gorskii was a famous photographer who won the Tsar's special permission to travel across
the vast Russian Empire and take color photographs of everything he saw. Although there was no way to print color photographs in his era,
he came up with a simple yet ingenious idea: record three exposures of every scene onto a glass plate using a red, a green, and a blue filter.
When the revolution began, he left Russia in 1918, never to return. Fortunately, his RGB glass plate negatives, capturing
the last years of the Russian Empire, survived and were purchased in 1948 by the Library of Congress (LoC). The LoC has recently digitized the
negatives and made them available online.
The goal of this project is to automate the production of a color image given the three channels of the same image, in particular the digitized
Prokudin-Gorskii glass plate images!
In this project, only an x-y translation model is applied in the interest of time. Following the project guidance, I used the blue channel
as the reference channel and aligned the red and green channels to it. Below I explain my attempts at producing a color image
with few visual artifacts.
- Baseline: simply stack the channels together without any preprocessing or translation.
- Translation using NCC: a brute-force algorithm that searches over the displacement window [-15, 15] x [-15, 15]. Normalized Cross-Correlation
(NCC) is used as the metric to measure the similarity between the shifted channel and the reference channel.
- Translation with inner pixels and auto-cropping: With the same algorithm that searches over a displacement window, I implemented an auto-cropping function
that detects color borders uniformly. It extracts pixel values from the edges (top, bottom, left, right) of all RGB channels, then calculates the
standard deviation (s.d.) of each edge for all channels. The idea is that if the s.d. is low across all channels, i.e. the color isn't changing much,
I can treat those edges as color borders and crop them. Additionally, I used only the inner 80% of the pixels when computing the NCC metric, which makes the
process faster and disregards the noisy borders that the auto-cropping function fails to recognize.
- The Bells and Whistles ideas are applied between the second and the third image. In particular, automatic border cropping makes the image sharper in the
lower right corner, although it fails to detect the noisy borders in the top left corner. On the other hand, automatic white balancing and contrast adjustment didn't
seem to affect the image much. I believe this is because of the naive methods used; techniques such as non-linear functions or other ways
to estimate the illuminant could definitely boost the performance.
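The alignment metric, auto-cropping, and white-balancing steps above could be sketched roughly as follows. This is a minimal NumPy illustration, not the project's actual code: the function names, the s.d. threshold, and the [0, 1] float image convention are all my assumptions.

```python
import numpy as np

def ncc(a, b):
    # Normalized cross-correlation of two equally sized channels.
    a, b = a - a.mean(), b - b.mean()
    return np.sum(a * b) / (np.linalg.norm(a) * np.linalg.norm(b))

def single_scale_align(channel, ref, window=15):
    # Brute-force search over the displacement window [-window, window]^2,
    # keeping the shift with the highest NCC against the reference channel.
    best, shift = -np.inf, (0, 0)
    for dy in range(-window, window + 1):
        for dx in range(-window, window + 1):
            score = ncc(np.roll(channel, (dy, dx), axis=(0, 1)), ref)
            if score > best:
                best, shift = score, (dy, dx)
    return shift

def inner(channel, frac=0.8):
    # Keep only the inner `frac` of the pixels when scoring, so noisy
    # borders do not dominate the NCC metric.
    h, w = channel.shape
    my, mx = int(h * (1 - frac) / 2), int(w * (1 - frac) / 2)
    return channel[my:h - my, mx:w - mx]

def auto_crop(img, thresh=0.05, max_frac=0.1):
    # img: H x W x 3 float image in [0, 1]. Walk inward from each edge and
    # strip rows/columns whose s.d. is below `thresh` in all three channels.
    h, w, _ = img.shape
    flat = lambda line: all(line[:, c].std() < thresh for c in range(3))
    top, bottom, left, right = 0, h, 0, w
    while top < int(h * max_frac) and flat(img[top]):
        top += 1
    while bottom > h - int(h * max_frac) and flat(img[bottom - 1]):
        bottom -= 1
    while left < int(w * max_frac) and flat(img[:, left]):
        left += 1
    while right > w - int(w * max_frac) and flat(img[:, right - 1]):
        right -= 1
    return img[top:bottom, left:right]

def gray_world(img):
    # Naive gray-world white balance: scale each channel so its mean
    # matches the overall mean intensity.
    means = img.reshape(-1, 3).mean(axis=0)
    return np.clip(img * (means.mean() / means), 0.0, 1.0)
```

A typical call would then be something like `single_scale_align(inner(r), inner(b))` on the cropped channels, with the recovered shift applied to the full-resolution channel.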
For higher-resolution glass plate scans in .tif format, the exhaustive search becomes prohibitively expensive, since the displacement search window must grow in proportion to the image size.
Yet if I keep a small displacement window of [-15, 15] x [-15, 15], as in the single-scale alignment, the search is likely to miss the optimal displacement vector. As a result,
I used a pyramid search algorithm, which downscales the image and updates the optimal displacement layer by layer. The process is fairly simple: keep
downscaling the image by a factor of 2 until it is smaller than 300 pixels (the same size as the previous .jpg images). Then, starting from the coarsest scale and going down the pyramid, I compute the
optimal displacement at each layer by recursively calling the single_scale_align function and scale the estimate up by 2 for the following layer. This continues until the image is back at its original size.
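The coarse-to-fine recursion might look like the sketch below. It is a hedged illustration under my own assumptions: I substitute crude 2x subsampling for whatever downscaling the real code uses, repeat the single-scale search here so the snippet is self-contained, and refine with a small fixed window at each finer layer.

```python
import numpy as np

def single_scale_align(channel, ref, window=15):
    # Exhaustive NCC search over [-window, window]^2, as in the single-scale case.
    def ncc(a, b):
        a, b = a - a.mean(), b - b.mean()
        return np.sum(a * b) / (np.linalg.norm(a) * np.linalg.norm(b))
    best, shift = -np.inf, (0, 0)
    for dy in range(-window, window + 1):
        for dx in range(-window, window + 1):
            s = ncc(np.roll(channel, (dy, dx), axis=(0, 1)), ref)
            if s > best:
                best, shift = s, (dy, dx)
    return shift

def pyramid_align(channel, ref, min_size=300):
    # Base case: the image is small enough for the full [-15, 15] search.
    if max(channel.shape) <= min_size:
        return single_scale_align(channel, ref)
    # Recurse on a half-resolution copy (crude 2x subsampling here),
    # double the coarse estimate, then refine with a small local search.
    cy, cx = pyramid_align(channel[::2, ::2], ref[::2, ::2], min_size)
    dy, dx = 2 * cy, 2 * cx
    ry, rx = single_scale_align(np.roll(channel, (dy, dx), axis=(0, 1)), ref, window=2)
    return (dy + ry, dx + rx)
```

Each layer's search is over a constant-size window, so the overall cost stays close to that of aligning the coarsest image rather than scaling with the full-resolution displacement range.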
In the following section, I display the results of my final algorithm on 12 other example images. A few images still appear messy, which is expected given that I used a uniform
and rather naive approach for all images. In particular, I think using a different feature, such as edge alignment, or a richer transformation, such as rotation, could make things more interesting.
In the interest of time, this is all that's available for display. One last note for future work: each image takes around 10 minutes to process. Although that is still on the order of minutes,
more vectorized calculation and parallelization could help reduce the time spent.