Fast Registration of Tabular Document Images
Using the Fourier- Mellin Transform
Luke A. D. Hutchison
Graphics, Vision and Image Processing Lab
Brigham Young University
Provo, Utah, USA
lukeh@ email. byu. edu
William A. Barrett
Graphics, Vision and Image Processing Lab
Brigham Young University
Provo, Utah, USA
barrett@ cs. byu. edu
Abstract
A new technique is presented for quickly identifying
global affine transformations applied to tabular document
images, and to correct for those transformations. This tech-nique,
based on the Fourier- Mellin transform, is used to reg-ister
( align) a set of tabular documents to each other. Each
component of the affine transform is handled separately,
which dramatically reduces the total parameter space of the
problem. This method is robust, and deals with all compo-nents
of the affine transform in a uniform way. The Fourier-
Mellin transform is also extended to handle shear, which
can approximate a small amount of perspective distortion,
and to not need Blackman windowing for document images.
In order to limit registration to foreground pixels only, and
to eliminate Fourier “ edge effects”, a novel, locally adap-tive
foreground- background segmentation algorithm is in-troduced,
based on the median filter. An original method is
also presented for automatically obtaining blank document
templates from a set of registered document images. Finally,
image registration is demonstrated as a tool for compres-sion
of document images which share the same template.
1. Introduction
In many image processing and document archival fields,
the ability to register, or align, related images to one another
in the presence of various image transformations is impor-tant
[ 3, 21]. In image registration, one image is stretched
to align to another in its salient features, effectively undo-ing
the relative transformation between the two images [ 4].
The need for image registration arises when a set of im-ages
share features which need to be correlated to make
sense of the relationship between the images. Image reg-istration
is therefore important when dealing with a set of
images that shares a common template, e. g. in the case of
Figure 1. The pointwise average of 27 regis-tered
census document images
batches of document images written on tabular forms [ 24].
By first registering the images, more information is avail-able
for reliably determining which pixels in the image are
part of the original document form ( Figure 1), making table
structure recognition [ 9, 44, 38] and content extraction [ 8]
potentially more reliable.
2. Motivation
Many image collections in digital libraries con-tain
batches of documents sharing an identical tabu-lar
form, such as census records. The information in these
forms is often handwritten. Unfortunately, handwritten in-formation
is currently hard to automatically extract in a re-liable
way [ 39], so most extraction work for genealogy
is done manually by volunteers. For example, in October
2002, The Church of Jesus Christ of Latter- day Saints re-leased
for free access on their www. familysearch. org web
Proceedings of the First International Workshop on Document Image Analysis for Libraries ( DIAL’ 04)
0- 7695- 2088- X/ 04 $ 20.00 © 2004 IEEE
( a) Image well- aligned to extraction field ( b) Image not well aligned to extraction field
Figure 2. Example of manual extraction of date field from hospital death records, for genealogi-cal
indexing. Screenshots are from the UDE ( Universal Data Extractor) environment. The manually-designated
extraction field does not always align well with all images. ( Names have been changed.)
site [ 42] the complete US 1880 and Canadian 1881 cen-suses.
A total of 55 million names were extracted from
these two censuses, after a total contribution by volun-teers
of 15 million hours of time [ 43]. The resulting in-dexed
resource is invaluable for genealogists, because a list
of records matching a query may be returned almost in-stantaneously.
However, there is still much more that needs
to be done to make the huge collection of genealogi-cal
records in the world accessible online.
2.1. Motivation: Assisting Manual Extraction
The above extraction project was accomplished by dis-tributing
batches of images to volunteer extractors on CD,
paper, or sometimes microfilm, depending on the extrac-tors’
preference. A program called UDE ( Universal Data
Extractor) was built for those extracting from CD ( Fig-ure
2). UDE allows users to create a grid overlaying the
image that specifies where the table cells are. The program
then highlights each cell in turn during data entry. The user
also has a way to modify the grid bymoving it or skewing it,
because sometimes documents don’t line up with the tem-plate
( Figure 2( b)). All this has to be done by hand currently.
Automatic document image registration would streamline
and simplify this process, by aligning each document with
the template grid, allowing extractors to get on with the
work of extraction.
Even if the table cells in the document image are to be
read by computer rather than by a human ( in the case of
automatic indexing by OCR [ 12] or handwriting recogni-tion
[ 36, 28, 39]), it is useful to have the image straight-ened
out so that the table cells become axis- aligned rect-angles,
and so that the scale of all the images is exactly the
same. This means that the text recognition algorithms do not
need to be able to cope with potentially significant sources
of global image distortion between different images.
2.2. Motivation: Document Image Compression
The Church of Jesus Christ of Latter- day Saints cur-rently
has archived 2.3 million rolls of genealogical micro-film,
with each roll containing an average of 1300 images.
Around 60,000 new films are added each year. In order to
better preserve these films, and to make them readily acces-sible
over the Internet, one million of the rolls which are
likely to be most useful to genealogists are currently being
selected, and digitally scanned. The films are scanned at ap-proximately
200 dpi ( relative to the original page size) in 8-
bit grayscale, and with page sizes ranging from 8.5” ×11” to
17” ×20”. Images are typically 2000×3000 to 4000×5000
pixels in size. Uncompressed TIFF images therefore range
Proceedings of the First International Workshop on Document Image Analysis for Libraries ( DIAL’ 04)
0- 7695- 2088- X/ 04 $ 20.00 © 2004 IEEE
in size from 6- 20MB, averaging around 13MB. The mil-lion
selected rolls of microfilm alone would require ap-proximately
17 Petabytes to store in an uncompressed for-mat.
This would fill approximately 140,000 120GB hard
drives. Obviously, efficient image compression mechanisms
are important to a digital library project of this scale.
Between one third and one half of the images in the 2.3
million roll collection were written on some sort of tabular
form ( parish registers, census records, death records etc.).
Tabular records are often the most valuable to genealogists,
because of their information- rich nature, thus they are more
likely to be selected for the million- roll digitized collection.
We can exploit the repetitive nature of form templates in a
series of tabular documents, improving image compression
ratios by predictive encoding. This technique has been suc-cessfully
used before at a finer scale, for compression of
machine- printed text, by identifying glyphs on the page and
storing them symbolically [ 18]. Image registration would
allow us to find the form template that is shared between a
set of registered documents, and to just store it once for the
entire set of images. Each image would then be stored as a
residual difference from the template.
3. Related Work
The brute- force approach to image registration under
arbitrary affine transformation involves a search ( local or
global) through up to six dimensions of transform space
( scale: two dimensions; translation: two dimensions; ro-tation:
two dimensions; and shear: a possible further one
dimension). For images of any significant resolution, this
search space is unreasonably large. The brute- force ap-proach
is thus never used for image registration, and various
algorithms which exploit properties of the specific problem
domain of image registration have been developed to reduce
the total parameter space that needs to be searched.
The majority of previous work related to document im-age
registration has been in skew detection and correction,
e. g. correcting for rotational variation when a document is
not axis- aligned on a scanner. Most skew detection algo-rithms
do not easily extend to handle scale, translation or
other transformations. However, as most of the body of doc-ument
image registration literature deals with skew detec-tion,
it is described below. A description of full image reg-istration
techniques then follows in Section 3.5.
3.1. Skew Detection: Projection Histogram Meth-ods
A common strategy for finding the skew ( rotation) an-gle
of a document is to find the angle at which a projec-tion
histogram through the document has the strongest and
the best- separated peaks, as determined by some cost func-tion.
Many algorithms for skew angle detection by projection
histogram have been described in the literature. Postl [ 29,
30] describes a “ simulated skew scan” method, where a
cost function ( the premium) is maximized while scanning
through the document at a range of angles. Baird [ 2] reduces
each connected component to a single point and builds pro-jection
profiles from the set of these points – this algorithm
is designed to work well with machine- printed text where
each character is separate, as opposed to large tables filled
with handwriting, where almost the entire image may be a
single connected component.
3.2. Skew Detection: Hough Transform Methods
The Hough transform [ 13] is another method for deter-mining
document rotation [ 20]. The standard Hough trans-form
detects straight lines by mapping points in the ( x, y)
domain to sinusoidal curves in the ( ?, ?) domain. Local
maxima in the ( ?, ?) domain correspond to straight lines
in the original image.
Amin [ 1] describes a method where the image is thresh-olded,
and then groups of connected components are itera-tively
built, one scanline at a time. The Hough transform is
applied to one connected component in each group to de-termine
the skew angle. The algorithm is somewhat sim-ilar
to the algorithm of Hinds et al. [ 16], who use run-lengths
rather than connected components to choose points
for Hough analysis. Using larger connected features such as
this can reduce the computational complexity of the Hough
transform [ 27], but such block- based data reduction meth-ods
often don’t work well for tabular documents with hand-written
content, because so much of the document ends up
being connected.
In general, the Hough transform can be computationally
expensive and is sensitive to noise. There is always a trade-off
in accuracy of the recovered parameters and the size of
the accumulator array ( and thus the computational complex-ity).
Issues such as accumulator smoothing and accumula-tor
granularity have to be addressed during implementation.
3.3. Skew Detection: Vectorization Methods
Cao et al. [ 5] have developed a method for skew detec-tion
by fitting a straight line to the baseline of connected
components. Again, this is designed for machine print and
is not suited to tabular documents containing handwriting.
Nielsen [ 24] has developed a system for alignment of
tabular microfilm images, wherein a vector- based template
is extracted from the document, representing the lines of the
table upon which the document was written. A “ snapping
algorithm” is used to match one document’s vector tem-
Proceedings of the First International Workshop on Document Image Analysis for Libraries ( DIAL’ 04)
0- 7695- 2088- X/ 04 $ 20.00 © 2004 IEEE
plate to another. A consensus template is formed by com-bining
several snapped grids together, making the algorithm
more robust to noise. The production of the consensus tem-plate
also allows for the transformation mapping one image
to another to be found. This algorithm works well, but the
author assumes that there is no major rotation of the doc-ument
image away from axis alignment, because the algo-rithm
relies on axis- aligned projection histograms to find
table lines. The algorithm does allow for some variation in
scale, by a brute- force search through a small neighborhood
of the scale space.
3.4. Skew Detection: Other Methods
Some other various methods and features used for skew
angle detection include gradient direction [ 34], eigenvec-tors
of the covariance matrix [ 25, 35], texture directional
evidence analysis [ 33] and mathematical morphology [ 22].
A particularly interesting ( and generalized) approach makes
use of the Wigner- Ville distribution [ 17].
3.5. Full Image Registration Methods
It should be noted again that document skew detection
methods such as those described above are rarely designed
to detect scale, translation and shear components of the
transform that best maps one image to another. If such non-rotational
transformations are present in a set of images, a
separate algorithm is needed to correct for these compo-nents
of the transform once skew angle has been detected
and corrected for.
Garris and Grother [ 14] introduce a method for both rota-tional
and translational registration, based on detecting lin-ear
features in forms. They demonstrate the efficacy of their
method for a set of scanned forms. However, scale is not
detected because it is not a consideration for scanned docu-ments.
Digital microfilm images are usually produced with
a camera, and thus scale correction is a definite requirement.
Wolberg and Zokai [ 41] present a method for robustly
registering images under rotation and scale variations, by
warping an image in the image- space into log- polar form:
in this form, changes of rotation and scale become simple
shifts in x and y. The system does not handle translation
however, and the center for the log- polar warp has to be
chosen reliably for registration to work at all.
Algorithms also exist for registration of images in a hi-erarchical
“ coarse- to- fine” fashion [ 45, 40]. Wolberg and
Zokai’s method in [ 40] is particularly interesting, as it re-covers
the parameters of a perspective distortion by break-ing
down the images into tiles, and then finding the best
affine fit between pairs of tiles in the two images to be
registered. The perspective parameters are inferred by the
piecewise- affine match between the image.
Image registration for general ( non- affine) image warp
correction may be performed by identifying common fea-tures
between two images, and estimating the transform
model that maps one image to the other [ 46, 4]. These meth-ods
can be complex ( and invariant feature selection strate-gies
must be chosen), but highly- generalized techniques like
these should work well for specific problem subsets, such as
tabular document registration.
4. The Fourier- Mellin Transform
Kuglin and Hines [ 19] originally proposed that “ phase
correlation” be used for translational registration in the
frequency domain. Postl [ 29] described a Fourier- based
method for skew detection, using a radial line integral
through a 2D power spectrum to determine the rotation an-gle.
De Castro and Morandi [ 7] presented a two- step pro-cess,
first identifying the angle of rotation and then deter-mining
the translational shift. If the change of image scale
is also present, the images can be registered by combin-ing
the log- polar mapping of the power spectrum ( corre-sponding
to the Fourier- Mellin transform) with phase cor-relation
[ 6, 31, 10, 32].
The Fourier- Mellin transform is a method for determin-ing
the rotation, scale and translation that best maps one
image to another ( see [ 32, 37] for details). The algorithm
makes use of the Fourier shift theorem to eliminate spa-tial
translation from consideration while determining rota-tion
and scale in the frequency domain. The frequency do-main
is radially sampled at exponential intervals to create a
log- polar mapping of the frequency space, ( ln ?, ?). In this
mapping, rotation and scale are represented as simple lin-ear
shifts. One of the two remaining dimensions ( rotation
and scale) is then “ collapsed” by projection, leaving a sim-ple
1D correlation to determine the remaining transform.
The Fourier- Mellin transform is a powerful tool for reg-istering
images under simple combinations of scale, rotation
and translation. It also has several weaknesses, which are
analyzed in excellent detail by Stone and McGuire in [ 37].
One limitation for document image registration is that
the Fourier- Mellin transform, in its usual form, doesn’t han-dle
shear distortion ( meaning it is not a solution for reg-istration
under arbitrary affine transformations). ( See Fig-ure
3( a).) Shear correction is useful for approximating a
small amount of perspective distortion in photographed
documents.
A further limitation of the Fourier- Mellin transform is
that there can potentially be problems recovering rotation,
because the operation of “ wrapping” ( induced by the pe-riodicity
of the DFT) doesn’t commute with rotation ( Fig-ure
3( b)). Thus, high- magnitude “ wrapped” features in the
frequency space can interfere with the log- polar transform.
Proceedings of the First International Workshop on Document Image Analysis for Libraries ( DIAL’ 04)
0- 7695- 2088- X/ 04 $ 20.00 © 2004 IEEE
( a) Shear transform effect on frequency
domain— log- polar mapping is nonlinearly
distorted
( b) “ Wrapping” of features in frequency
space— note overlap of wrapped radial fea-tures
at top and bottom
( c) “ Edge effects” producing axis- aligned
features ( due to white- black transition
where edges wrap)
Figure 3. Standard problems with the Fourier- Mellin transform
The Fourier- Mellin Transform also requires the projec-tion
of the two dimensional ( ln ?, ?) space down to one di-mension
in the final step. As with any 2D correlation prob-lem,
reduction to two 1D correlations of projection his-tograms
can potentially create spurious matches.
Perhaps most severe, however, are “ edge effects” result-ing
from sharp, high- frequency transitions in image intensi-ties
where edges wrap. When this happens, axis- aligned ra-dial
features are created in the frequency domain, which can
throw off rotation angle detection ( Figure 3( c)). Usually, a
Blackman window [ 37] is applied to the Fourier transform
to try to curb some of these effects – a square subregion
of the image is chosen before applying the Fourier trans-form,
and frequencies that do not fall inside the incircle of
the region are set to zero. Everything that is not zeroed is
weighted as a Gaussian function of radius from the origin.
Unfortunately, such windowing requires that we do not use
the whole image for registration, making the algorithm po-tentially
sensitive to subregion selection.
5. New method for document image regis-tration
as a specialization of the Fourier-
Mellin Transform
In this paper we present a specialization of the Fourier-
Mellin transform which exploits the properties of tabular
document images to overcome most of the standard prob-lems
with the Fourier- Mellin transform. We demonstrate
this algorithm as a general- purpose registration tool for tab-ular
documents.
Many of the problems experienced with the Fourier-
Mellin transform, in the context of document image reg-istration,
occur because of the coloring of the image back-ground,
i. e. the parts of the image we are not interested in
registering ( the gray paper color, the potential black border
around a page, etc.). In order to accurately register docu-ment
images, we really need to completely remove this doc-ument
background. We introduce below a novel automatic
background removal algorithm for document images.
Proceedings of the First International Workshop on Document Image Analysis for Libraries ( DIAL’ 04)
0- 7695- 2088- X/ 04 $ 20.00 © 2004 IEEE
5.1. Preprocessing – automatic “ locally- adaptive”
background removal using the median filter
Typically, a document is written with dark ink on light
paper, so many foreground- background segmentation algo-rithms
attempt to find a segmentation threshold based on the
histogram of graylevel intensities [ 26]. Better algorithms
are locally- adaptive, choosing the segmentation threshold
according to statistical properties of the histogram of inten-sities
for a local neighborhood [ 23].
Our foreground- background segmentation algorithm is
not really a thresholding algorithm at all, it is more a “ back-ground
removal algorithm”. Simply described, we perform
median filtering of a document image to remove the smaller
foreground features ( writing, lines etc.). We then subtract
this “ background image” from the original image, leaving
just the foreground ( Figure 4).
Significantly, this background removal method does not
produce bitonal output, but leaves all of the subtle fore-ground
variations in intensity intact. If a bitonal image is
required, all that is required is a simple thresholding opera-tion
( Figure 4( d)), although the extra information available
in the grayscale foreground image ( Figure 4( c)) is valuable.
The method works because foreground pixels usually
form much less than half the pixels in any reasonably- sized
neighborhood in a document image. Of course the kernel
size has to be appropriately chosen for each image. In gen-eral
a kernel of radius 17 pixels was found to suffice for
full- page document images of up to 1600x1200. The ker-nel
size can scale linearly with the image size and achieve
the same effect. The median filter preserves sharp grayscale
transitions, but tends to round off corners of features in 2D
space, so it pays not to make the kernel much bigger than
needed; additionally, median filtering is time- consuming for
large kernels. If the kernel is too small, then parts of the doc-ument
that should be left as foreground will be absorbed as
background.
Fast algorithms for finding the median ( or the k- th largest
element) exist [ 11, 15]. Median filtering for background
removal can be sped up by working on a much lower-resolution
version of the image, as described in Section 7.4.
When the median image is subtracted from the original
image, if any pixel ends up negative, it is assumed to be
light- colored noise (“ salt noise”), and the difference is set
to be zero rather than negative, because it is assumed that
anything lighter than the background color has to be noise.
The amount of “ salt noise” in a neighborhood may be used
as an estimate of the amount of dark- colored “ pepper noise”
in a neighborhood, so that everything within a noise toler-ance
is set to pure white.
Most importantly for the problem of the Fourier- Mellin
transform, this form of background removal is able to distin-guish
between the black of the foreground ( the handwriting
or document form), and the black of the background ( the
border around a microfilm document image). Thus black
borders are stripped out automatically, and different back-ground
grayscale levels at opposite edges of an image are
normalized to white. Blackman windowing [ 37] is there-fore
not needed to eliminate the problem shown in Fig-ure
3( c), and the entire image ( not just an arbitrarily cho-sen
windowed portion of it) can be used in the registration
process. In this way, preprocessing with a median filter can
improve registration accuracy dramatically for the Fourier-
Mellin transform.
5.2. Recovery of rotation
Observe ( e. g. in Figure 3) that groups of parallel lines in
an image act as “ wavefronts”, and thus produce a pattern of
“ delta functions”, or peaks in magnitude in the frequency
domain, corresponding to the harmonics of the wavefronts.
This pattern of peaks radiates out in both directions in a
straight line through the origin of the frequency domain.
Rotation of the image produces a corresponding rotation
of these radial patterns. The most periodic linear compo-nents
of an image will therefore produce the strongest ra-dial
line integral over the power spectrum, at right angles
to the linear component. The most periodic components of
a tabular document are the lines forming the table itself.
This makes it easy to find the angle of rotation and shear
of the table – we just look for the two largest peaks in the
( smoothed) radial projection histogram of the power spec-trum.
This is more robust for this problem domain than the
general Fourier- Mellin case, where the 2D log- polar space
is collapsed down into a 1D correlation problem, because no
significant information that could prevent a spurious regis-tration
is lost in the projection. In addition, the intransitivity
of wrapping and rotation ( Figure 3( b)) does not adversely
affect the correct selection of rotation angle.
5.3. Recovery of shear
Interestingly, by looking for the two most significant
maxima in a radial projection histogram, we can now de-couple
the two image axes, since a rotation angle is returned
for each. Thus we can separately determine the rotation an-gle
( for the horizontal axis) and the shear angle from ver-tical
( for the vertical axis). Recovery of shear is something
that is not possible with the standard Fourier- Mellin trans-form,
because the log- polar space is nonlinearly distorted
in a shear operation ( Figure 3( a)). It is made possible here
by exploiting properties of this particular problem domain
( specifically, the two strong periodic components of tabu-lar
document templates). Some care does need to be taken
sheared images, however, because if there is any non- affine
perspective distortion, the radial pattern of maxima tends to
Proceedings of the First International Workshop on Document Image Analysis for Libraries ( DIAL’ 04)
0- 7695- 2088- X/ 04 $ 20.00 © 2004 IEEE
( a) Original document images; dimensions: 800×630; 1600×1100; 1200×980
( b) “ Background images” obtained from original images by application of the median filter, with a circular kernel of radius 17
( c) “ Foreground images” obtained by subtracting background images from originals ( both background shading and black border are gone)
( d) Foreground images, manually thresholded ( only provided for comparison)
Figure 4. New automatic background removal algorithm based on the median filter, used as a pre-processing
stage to the Fourier- Mellin transform. This algorithm can eliminate the problem shown in
Figure 3( c).
Proceedings of the First International Workshop on Document Image Analysis for Libraries ( DIAL’ 04)
0- 7695- 2088- X/ 04 $ 20.00 © 2004 IEEE
spread, so the recovery of scale ( as described below) may
not be able to be performed accurately by sampling in a sin-gle
straight line.
For rotation angles within ± 45 ? , the closest radial max-imum
to vertical is used as the rotation angle of the hor-izontal
lines in the table. If registration over a greater an-gle
range than that is required, then during the recovery of
scale ( below), the correct mapping of axes between the two
images must be determined, by finding the assignment that
produces the strongest total scale correlation.
The recovery of rotation and shear angles for a docu-ment
image allows the image to be axis- aligned, without
even considering other documents. The task of image reg-istration,
however, requires two or more images to be reg-istered
to one another. We arbitrarily choose the first im-age,
de- rotate it / de- shear it, and then use this as our “ refer-ence
image” to which the scale and translation of the other
images are matched. The first image may not be the opti-mal
choice for a scale and translation reference point, due
to image noise, low image quality, or non- ideal size. A bet-ter
system would use a consensus- based approach to ensure
the reference image was not badly chosen.
5.4. Recovery of scale
Scale recovery happens in log- polar coordinates, as in
the normal Fourier- Mellin transform. However, because we
are dealing with tabular document images, only the strongly
linear components of the image are important for registra-tion.
Thus determination of scale need only involve conver-sion
to log- polar form of the Fourier magnitudes along the
direction of the detected radial projection maxima. While a
radial projection is performed, we do not lose potential cor-relation
features by collapsing to one dimension in thisman-ner,
unlike the case with the projection of log- polar space in
the standard Fourier- Mellin transform.
We then end up with two log- polar projection his-tograms,
each one sampled at antilogarithmic intervals
along one of the radial axes recovered during rotation and
shear detection. The positive- and negative- frequencymag-nitudes
are summed, weighted by the size of the current in-terval,
and stored in a histogram. The logarithmic base can
be chosen so as to yield any desired accuracy in scale deter-mination.
A base of 1.001 gives ample accuracy for a range
of images.
The rest of scale recovery proceeds as normal for the
Fourier- Mellin transform. In log- r space, a translation by
log( S) corresponds to a change of scale by a factor of S:
r = r · S ( 1)
? log( r ) = log( r) + log( S). ( 2)
Thus determination of scale reduces to a 1D correlation
problem. It turns out that this works much better than try-ing
to dilate one histogram to match the other in non- log
space, because resampling a histogram in the frequency do-main
is problematic.
5.5. Recovery of translation
Now that rotation, shear and scale have been recovered,
the transformation matrix which inverts these transforms
can be easily determined, and the transformed image is
rendered into a temporary image buffer. The image in this
buffer now needs only to be aligned with the reference im-age
using 2- D correlation, which may be performed again
in the frequency domain, by pointwise multiplication with
the complex conjugate.
Although this final correlation step is the simplest oper-ation
of the whole process, an important observation must
be made: straight 2D correlation is not guaranteed to find
the right correlation offset without a good algorithm ( such
as that presented in Section 5.1) for removal of background,
and possible black borders around a page. Without this pre-processing
step, changes in intensity across the background
of the image can adversely affect the choice of a correla-tion
offset.
5.6. Quadratic parameter fitting
Quadratic parameter fitting was used in the neighbor-hood
of all recovered parameters to better achieve subpixel
accuracy in the results.
5.7. Summary
Based on the shift theorem of the Fourier transform, and
properties specific to the problem domain of tabular docu-ment
image registration, we can “ peel off” one translation
component at a time ( even more so than is possible with
the standard Fourier- Mellin transform), dramatically reduc-ing
the total parameter space of the problem.
6. Results
6.1. Pointwise mean
Figure 5 shows three datasets to which the registration
process was applied. The first and second image for each
dataset show the result of pointwise- averaging all the im-ages
in the dataset together before and after registration re-spectively.
The registered versions of the images are well-aligned
in the first and second instance; the third dataset
was well- aligned horizontally, but not vertically, because
the line spacing was irregular on some pages in the dataset
( probably due to the process used to print the census forms).
Figure 6( b) shows a closeup of the pointwise mean of a set
Proceedings of the First International Workshop on Document Image Analysis for Libraries ( DIAL’ 04)
0- 7695- 2088- X/ 04 $ 20.00 © 2004 IEEE
Figure 5. Results of image registration: [ Left] Pointwise mean of unregistered images; [ Middle] Point-wise
mean of registered images]; [ Right] Pointwise median of registered images, yielding a blank
form. Dataset sizes: [ Top] 445 images; [ Middle] 27 images; [ Bottom] 31 images.
of registered images. Note that the original document im-ages
were transformed with the final registration transfor-mation
to produce these images ( i. e. the versions with the
backgrounds removed were not used).
6.2. Pointwise median
The third image in each dataset shows the pointwise me-dian
of the set of registered images. By using the median
rather than the mean, we are able to recover a fairly clean
blank document template. Figure 6( c) shows a closeup of a
pointwise median image.
Some handwriting is still present in the pointwise me-dian
image. By taking a pointwise percentile value at a
higher percentile than the median ( i. e. the 50th percentile),
more of this handwriting can be excluded, at the expense of
template line strength ( Figure 6( d)).
Median filter background removal can of course also be
used to remove the background from the blank document
template obtained by taking the pointwise median.
6.3. Accuracy of algorithm
Qualitatively, the images registered very well in almost
all cases ( including test cases not presented here). In order
to quantitatively measure the accuracy of these methods, a
single image was taken and rotated through various random
( fractional) angles, and then separately scaled by various
random fractional amounts. Each transformed image was
then registered to the original, and then the detected trans-
Proceedings of the First International Workshop on Document Image Analysis for Libraries ( DIAL’ 04)
0- 7695- 2088- X/ 04 $ 20.00 © 2004 IEEE
( a) A single registered image ( b) Pointwise mean of 27 registered im-ages
( c) Pointwise median of 27 registered
images
( d) Pointwise percentile images for the 10th to the 90th percentile ( the 50th percentile image is the pointwise median image).
Figure 6. Pointwise mean and pointwise median images
formation parameter was compared to randomly chosen pa-rameter.
Over ten trials, the mean absolute error for the angle
of rotation and for the scale factor were measured. Results
were as follows.
Transformation Measured Accuracy
Rotation/ Shear 0.01 ?
Scale 0.035%
Translation Not determined
Recovery of rotation/ shear and scale parameters was
measured as being very accurate. The accuracy measured
was actually significantly greater than the amount of warp
that would be observed in a document image from almost
any source, due to lens distortion, printing distortion, or me-chanical
lag. The scale calculations accuracy, for example,
implies that a 3000- pixel wide image would have an aver-age
error in width of one pixel after scale correction.
For translation, the subpixel accuracy of correlation was
not measured, but it should be of a similar order of fineness
of fit to the other results ( parabolic fit is surprisingly good
at 2D subpixel correlation, for how simple it is). Subjective
analysis indicates a good level of accuracy in the final step
of recovering the translational offset.
6.4. Non- affine transforms
Often, document images have undergone non- affine
transformations, such as the perspective transform, or a
spherical transform due to lens warping. Some, but not
all, of this type of distortion may be corrected for us-ing
a shear, which is handled by the algorithm as de-scribed.
In order to test how well the algorithm performed un-der
non- affine conditions, digital photographs were taken of
a document from various angles and distances ( Figure 7).
( This document was particularly effective for frequency-domain
analysis because of its highly periodic nature.) No-tice
the ghosting at the edges of the registered image, par-ticularly
in the second dataset: significant perspective dis-tortion
was present in some of the images. However, the re-sulting
image is registered about as well as it could be with-out
handling perspective transformations too.
6.5. Run time
In one example run, ten images, 3045×2269 pixels in
size, were processed on a 2.2GHz P4 machine. Fourier anal-ysis
was performed using “ The Fastest Fourier Transform
in the West” ( www. fftw. org). Total time taken to register
Proceedings of the First International Workshop on Document Image Analysis for Libraries ( DIAL’ 04)
0- 7695- 2088- X/ 04 $ 20.00 © 2004 IEEE
( a) Dataset 1 ( Camera no more than 15 ? from the surface normal)
( b) Dataset 1, Pointwise mean before and after document image registration
( c) Dataset 2 ( Camera up to 30 ? from the surface normal)
( d) Dataset 2, Pointwise mean before and after document image registration
Figure 7. Results of image registration of two datasets under non- affine transform
Proceedings of the First International Workshop on Document Image Analysis for Libraries ( DIAL’ 04)
0- 7695- 2088- X/ 04 $ 20.00 © 2004 IEEE
the ten images was 8m54s ( approximately 53 seconds per
image). Around 90% of this time, however, was spent in
the final determination of translation, since a 2- D correla-tion
is required to find the translational offset. The runtime
may be reduced, with increased chance of erroneous regis-tration,
by using two 1- D correlations of the projection his-tograms
rather than one 2- D correlation. The measured time
also did not include time to median- filter the images – me-dian
filtering can increases the time by a factor of two to
three. However, as described in Section 7.4, the time taken
to perform background removal could be dramatically less-ened
by working on a low- res version of the image, and in-terpolating
when subtracting it from the image ( the median
filter can be smaller, too, for lower- resolution images). Run-time
could also be dramatically decreased by working with
images of approximately half these dimensions.
For comparison, a set of ten smaller images, 900×736
pixels in size, were registered together. Median filtering
with a kernel of size 17 took 8 seconds per image; rotation,
shear and scale together took about 1.5 seconds per image,
and translation took just under 1 second per image to deter-mine.
All in all, the entire process took around 12 seconds
per image.
Even in spite of these potential optimizations, it may
seem that the presented algorithm is extremely slow com-pared
to commonly- reported deskew algorithm statistics.
However, this algorithm is doing substantially more than a
plain deskew algorithm – it embodies a complete and accu-rate
registration solution for tabular document images.
7. Discussion
7.1. Robustness
Most steps in this algorithm are very robust. The least
robust part of the algorithm is in fact the simplest part: the
final step to recover the translational offset. As noted pre-viously,
it is very important that a good thresholding algo-rithm
be used to separate foreground from background, and
that large areas of black ( foreground color) surrounding the
page are removed prior to this correlation step. Failure to do
so can result in badly registered images.
Under purely- affine distortion, rotation and shear angles
are virtually always recovered correctly. If there is any per-spective
distortion in the image, then the radial ridge of
maxima in the frequency domain begins to spread, and al-though
the skew angles will be chosen somewhere in the
correct range, the linear log- r histogram used for scale cal-culation
may be determined wrong. Thus significant per-spective
transforms are currently not well handled as an
affine approximation. This could be improved by sampling
for scale over an angular window of the chosen rotation an-gle.
Also, most tabular documents have significant numbers
of strong horizontal lines ( for Roman alphabets), but a few
do not have strong vertical lines. Thus the vertical scale
can probably be trusted more than the horizontal scale. If
the value of the horizontal scale is unrealistic based on the
value of the vertical scale, the vertical scale could be used
for both.
7.2. Considerations in recovering rotation
It is assumed that images within a single dataset to be
registered are within ± 45 ? of one another in orientation,
thus in the determination of rotation the smallest rotation
needed to align each image to the reference image is per-formed.
If arbitrary rotations of documents are allowed, all
four possible axis- alignments ( i. e. 90 ? rotations) must be
performed to see which gives the best correlation magni-tude.
7.3. Image registration for predictive image com-pression
Image registration may yield advantages in document
image compression. We obtained a document template us-ing
the median filter, as in Figure 5, and subtracted this tem-plate
from each registered image ( Figure 8). The resulting
set of image, lacking the template, were approximately 14%
smaller when compressed using bzip2 than the originals. In
this way, image information which is shared by many pages
of a document may be stored once and removed from each
page which shares the information. This is an example of
“ predictive encoding” for image compression. Further gains
can be found by improving the subtraction of the template
from each image, by local snapping, or removal afterwards
of small connected components.
An interesting area that we have begun to explore is the
possibilities for image compression afforded by background
removal. The background of an image makes up the ma-jority
of its uncompressed size, and a significant part of its
compressed size, yet it contains little of informational value.
The median filter- based automatic background removal al-gorithm
presented in this paper allows the background to be
removed reliably without discreet foreground/ background
segmentation ( as is employed in DjVu/ JBIG2), and the
“ holes” in the resulting background image are filled in auto-matically
by the median filter. This could enabling a whole
new class of image compression algorithms to be devel-oped,
which store the background at a much lower reso-lution
than the foreground, if at all.
It should be noted that resampling a document image un-der
an affine transform ( as is done during registration) ac-tually
raises the entropy of the image, making it less com-
Proceedings of the First International Workshop on Document Image Analysis for Libraries ( DIAL’ 04)
0- 7695- 2088- X/ 04 $ 20.00 © 2004 IEEE
( a) Closeup of a single registered image
( b) Closeup of same image after subtraction of document template
Figure 8. Detail of document before and after
subtraction of document template
pressible. Adjusting contrast, thresholding, etc. can be used
to reduce the entropy again after resampling.
7.4. Optimization
All but the smallest correlation operations can be per-formed
faster by pointwise multiplication of the Fourier
transform of one image with the complex conjugate of the
Fourier transform of the other image, followed by finding
the largest maximum in the inverse Fourier transform of the
result. Fourier correlation was used in the implemenation of
this algorithm.
In addition, the final 2D correlation used to recover scale
may be performed as two separate 1- D correlations of the
projection histograms of the images ( which is much faster,
but has a greater chance of finding the wrong correlation
offset).
Both median filtering and the Fourier transform algo-rithm
are also highly parallelizable, allowing for significant
speedups in hardware or on multiprocessing architectures.
Also, many operations may be performed at a lower reso-lution
for speed. For example, the median filter background
removal process can be accomplished by working on a low-resolution
version of the background with a smaller median
kernel, and the result can then be scaled up using bilinear
or bicubic interpolation before it is subtracted from the full-resolution
image.
8. Future work
The current algorithm does a reasonable job of register-ing
documents under some degree of non- affine transfor-mation,
but since many microfilm documents contain slight
perspective or lens distortion, extending this technique to
correct for these distortionswould render it useful in a wider
variety of situations.
A consensus- based approach could be investigated to en-sure
the reference image was not badly chosen.
Methods for quickly and automatically determining op-timal
median filter parameters for a specific image or set
of images would be immensely useful, to ensure high im-age
qualities and low running times.
Further research should also be performed into the use
of image registration for document image compression.
9. Conclusions
We have presented an algorithm for fast registration
of tabular document images, based on the Fourier- Mellin
transform. This algorithm is robust and accurate in align-ing
tabular documents under affine transformations, and
does a reasonable job of aligning tabular documents un-der
non- affine transformations commonly found in docu-ment
photographs. A novel automatic background- removal
algorithm was introduced that enables many limitations of
the Fourier- Mellin transform to be overcome ( such as the
need for Blackman windowing) for document images. The
Fourier- Mellin transform was adapted to the specific prob-lem
domain of tabular image registration, and deals with all
components of the affine transform in a uniform way in the
frequency domain. The Fourier- Mellin transform is also ex-tended
to handle shear. Using a set of registered images, a
novel method for generation of blank forms was presented,
Proceedings of the First International Workshop on Document Image Analysis for Libraries ( DIAL’ 04)
0- 7695- 2088- X/ 04 $ 20.00 © 2004 IEEE
and application of this technique to document image com-pression
were discussed.
References
[ 1] A. Amin, S. Fischer, T. Parkinson, and R. Shiu. Fast algo-rithm
for skew detection. In IS& T/ SPIE Symposium on Elec-tronic
Imaging, 1996.
[ 2] H. Baird. The skew angle of printed documents. In Proceed-ings
of Society of Photographic Scientists and Engineers,
volume 40, pages 21– 24, 1987.
[ 3] W. Barrett, L. Hutchison, D. Quass, H. Nielson, and D. Ken-nard.
Digital mountain: From granite archive to global ac-cess.
In Proceedings, Document Image Analysis for Li-braries.
IEEE, 2004.
[ 4] L. G. Brown. A survey of image registration techniques.
ACM Computing Surveys, 24( 4): 325– 376, December 1992.
[ 5] Y. Cao, S. Wang, and H. Li. Skew detection and correc-tion
in document images based on straight- line fitting. PRL,
24( 12): 1871– 1879, August 2003.
[ 6] D. Casasent and D. Psaltis. Position, rotation, and scale-invariant
optical correlation. Applied Optics, 15: 1793– 1799,
1976.
[ 7] E. D. Castro and C. Morandi. Registration of translated
and rotated images using finite fourier transforms. IEEE
Transactions on Pattern Analysis and Machine Intelligence,
9( 5): 700– 703, September 1987.
[ 8] S. Chandran, S. Balasubramanian, T. Gandhi, A. Prasad, and
R. Kasturi. Structure recognition and information extraction
from tabular documents. International Journal of Imaging
Systems and Technology, 7: 289– 303, 1996.
[ 9] S. Chandran and R. Kasturi. Structural recognition of tabu-lated
data. In Proceedings, International Conference on Doc-ument
Analysis and Recognition, pages 516– 519, 1993.
[ 10] Q. Chen, M. Defrise, and F. Deconinck. Symmetric phase-only
matched filtering of Fourier- Mellin transforms for im-age
registration and recognition. IEEE Trans. Pattern Analy-sis
and Machine Intelligence, 16( 12): 1156– 1168, December
1994.
[ 11] N. Devillard. Fast median search: an ANSI C implementa-tion.
http:// ndevilla. free. fr/ median/ .
[ 12] D. Doermann. The indexing and retrieval of document im-ages:
A survey. Computer Vision and Image Understanding:
CVIU, 70( 3): 287– 298, 1998.
[ 13] R. Duda and P. Hart. Use of the Hough transformation to
detect lines and curves in pictures. Communications of the
ACM, 15( 1): 11– 15, 1972.
[ 14] M. Garris and P. Grother. Generalized form registration us-ing
structure- based techniques. In NIST Internal Report 5726
and in Proceedings of the Fifth Annual Symposium on Doc-ument
Analysis and Information Retrieval, pages 321– 334,
April 1996.
[ 15] J. Gil and M. Werman. Computing 2- d min, median, and
max filters. PAMI, 15( 5): 504– 507, May 1993.
[ 16] S. Hinds, J. Fisher, and D. D. Amato. A document skew
detection method using runlength encoding and the Hough
transform. In Proceedings, International Conference on Pat-tern
Recognition, pages 464– 468, 1990.
[ 17] E. Kavallieratou, N. Fakotakis, and G. Kokkinakis. Skew an-gle
estimation for printed and handwritten documents using
the Wigner- Ville distribution. Image and Vision Computing,
20: 813– 824, 2002.
[ 18] O. Kia and D. Doermann. Structural compression for docu-ment
analysis. In Proceedings, International Conference on
Pattern Recognition, 1996.
[ 19] C. Kuglin and D. Hines. The phase correlation image align-ment
method. IEEE Conference on Cybernetics and Society,
pages 163– 165, 1975.
[ 20] D. X. Lee, G. Thoma, and H. Weschler. Automated page
orientation and skew angle detection for binary document
images. Pattern Recognition, 27( 10): 1325– 1344, October
1994.
[ 21] G. Nagy. Twenty years of document image analysis in pami.
IEEE Transactions on Pattern Analysis and Machine Intelli-gence,
22( 1): 3862, January 2000.
[ 22] L. Najman. Using mathematical morphology for document
skew estimation. In In Document Recognition and Retrieval
XI, part of the IS& T/ SPIE Symposium on Electronic Imag-ing,
January 2004.
[ 23] W. Niblack. Englewood Cliffs, N. J.: Prentice Hall, 1986.
[ 24] H. E. Nielson andW. A. Barrett. Consensus- based table form
recognition. Proceedings, Seventh International Conference
on Document Analysis and Recognition, II: 906– 910, August
2003.
[ 25] O. Okun, M. Pietikainen, and J. J. Sauvola. Document skew
estimation without angle range restriction. International
Journal of Document Analysis and Recognition, 2( 2- 3): 132–
144, 1999.
[ 26] N. Otsu. A threshold selection method from grey- level his-tograms.
IEEE Transactions on Systems, Man, and Cyber-netics,
( 1): 62– 66, 1979.
[ 27] S. Perantonis, B. Gatos, and N. Papamarkos. Block decom-position
and segmentation for fast Hough transform evalua-tion.
Pattern Recognition, 32( 5): 811– 824, 1999.
[ 28] R. Plamondon and S. Srihari. On- line and off- line handwrit-ing
recognition: A comprehensive survey. IEEE Transac-tions
on Pattern Analysis and Machine Intelligence, 22( 1),
January 2000.
[ 29] W. Postl. Detection of linear oblique structures and skew
scan in digitized documents. In Proceedings, 8th Interna-tional
Conference on Pattern Recognition, pages 687– 689,
1986.
[ 30] W. Postl. Method for automatic correction of charater skew
in the acquisition of a text original in the form of digital scan
results. 1988.
[ 31] W. Pratt. Digital Image Processing. John Wiley & Sons,
New York, 1978.
[ 32] B. Reddy and B. Chatterji. An FFT- based technique for
translation, rotation, and scale- invariant image registration.
IEEE Transactions on Pattern Analysis and Machine Intelli-gence,
5( 8): 1266– 1270, August 1996.
Proceedings of the First International Workshop on Document Image Analysis for Libraries ( DIAL’ 04)
0- 7695- 2088- X/ 04 $ 20.00 © 2004 IEEE
[ 33] J. Sauvola and M. Pietik ¨ ainen. Skew angle detection using
texture direction analysis. In Proceedings, 9th Scandinavian
Conference on Image Analysis, Uppsala, Sweden, 1995.
[ 34] D. Si. Skew and slant correction for document images us-ing
gradient direction. In 4th International Conference on
Document Analysis and Recognition, pages 142– 146, Au-gust
1997.
[ 35] T. Steinherz, N. Intrator, and E. Rivlin. Skew detection via
principal components analysis. International Conference on
Document Analysis and Recognition, January 1999.
[ 36] T. Steinherz, E. Rivlin, and N. Intrator. Offline cursive script
word recognition - a survey. International Journal of Docu-ment
Analysis and Recognition, 2( 2- 3): 90– 110, 1999.
[ 37] H. Stone, B. Tao, and M. McGuire. Analysis of image regis-tration
noise due to rotationally dependent aliasing. JVCIR,
14( 2): 114– 135, June 2003.
[ 38] Y. Tang, J. Liu, B. F. Li, and D. Xi. Multiresolution analy-sis
in extraction of reference lines from documents with gray
level background. IEEE Transactions on Pattern Analysis
and Machine Intelligence, 19( 8), August 1997.
[ 39] A. Vinciarelli. A survey of off- line cursive script recogni-tion.
IDIAP Research report 00- 43, 2000.
[ 40] G. Wolberg and S. Zokai. Image registration for perspec-tive
deformation recovery. In SPIE Conf. on Automatic Tar-get
Recognition X, Orlando, Florida, April 2000.
[ 41] G. Wolberg and S. Zokai. Robust image registration using
log- polar transform. In Proc. IEEE Intl. Conf. on Image Pro-cessing,
Vancouver, Canada, September 2000.
[ 42] www. familysearch. org. Online access to fully- indexed cen-sus
records, currently including the 1880 United States, 1881
British Isles, and the 1881 Canadian Census.
http:// www. familysearch. org/ Eng/ Search/
frameset search. asp? PAGE= census/
search census. asp .
[ 43] www. lds. org. News release:
“ Facts About the 1880 U. S. Census”.
http:// www. lds. org/ newsroom/ showpackage/
0,15367,3881- 1--- 4- 645,00. html .
[ 44] D. Xi and S. Lee. Table structure extraction from form docu-ments
based on gradient- wavelet scheme. In Document Anal-ysis
Systems: Theory and Practice : Third IAPR Workshop,
1998.
[ 45] Z. Zhang and R. S. Blum. A hybrid image registration tech-nique
for a digital camera image fusion application. Infor-mation
Fusion, 2( 2): 135– 149, 2001.
[ 46] B. Zitova and J. Flusser. Image registration methods: a sur-vey.
IVC, 21( 11): 977– 1000, October 2003.
Proceedings of the First International Workshop on Document Image Analysis for Libraries ( DIAL’ 04)
0- 7695- 2088- X/ 04 $ 20.00 © 2004 IEEE