Pattern Classification Techniques Applied to Character Segmentation

Robert W. Smith, James F. McNamara and David S. Bradburn*

*email: dave@alumni.caltech.edu

Release Notes
This work originally appeared in the Proceedings of the 5th USPS Advanced Technology Conference, Washington, D.C. Copyright is retained by the authors. This work may be freely copied for non-commercial purposes provided this copyright notice is included.

Abstract

Character segmentation is a necessary preprocessing step for character recognition. Correct segmentation of handwritten addresses requires solutions to several problems, including the splitting of run-together or crossing characters and the merging of fragmented characters. Each case entails a decision based on information derivable from the mail piece image.

In existing systems, character segmentation decisions are usually made by a heuristically derived rule base. Here, we apply well-known pattern recognition techniques so that decision boundaries may be optimized by training over a large data set. Some typical problems are described: character splitting, character merging and word-level separation. Training and test data sets of handprint ZIP codes are compiled, and easily measurable properties of the data are used to train Bayesian and neural network classifiers. We compare the results of the classification methods with those of segmentation guided by conventional heuristically derived rules.
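
To illustrate what a trained decision boundary might look like, as opposed to a hand-tuned rule, the following minimal sketch fits a Gaussian naive Bayes classifier to a split/keep decision. It is not the classifier from the paper, whose form is not given in this abstract. The two features (a width-to-height ratio and a normalized valley depth), the class names, and all data values are hypothetical and chosen only for illustration.

```python
import math

def fit_gaussian_bayes(samples, labels):
    """Estimate a prior, per-feature means, and per-feature variances for each class."""
    model = {}
    for cls in set(labels):
        rows = [x for x, y in zip(samples, labels) if y == cls]
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # A small floor keeps a variance from collapsing to zero.
        variances = [
            max(sum((v - m) ** 2 for v in col) / n, 1e-6)
            for col, m in zip(columns, means)
        ]
        model[cls] = (n / len(samples), means, variances)
    return model

def classify(model, x):
    """Return the class with the highest log posterior, assuming independent Gaussian features."""
    def log_posterior(cls):
        prior, means, variances = model[cls]
        score = math.log(prior)
        for v, m, var in zip(x, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        return score
    return max(model, key=log_posterior)

# Hypothetical toy data, NOT from the paper:
# (width/height ratio of a connected blob, normalized depth of its deepest projection valley)
train_x = [(0.6, 0.10), (0.7, 0.20), (0.5, 0.15),
           (1.4, 0.80), (1.6, 0.70), (1.5, 0.90)]
train_y = ["keep", "keep", "keep", "split", "split", "split"]

model = fit_gaussian_bayes(train_x, train_y)
decision = classify(model, (1.5, 0.8))  # a wide blob with a deep valley
```

With a model of this kind, the boundary between "split" and "keep" follows from the training data rather than from manually chosen thresholds, which is the contrast with a heuristic rule base drawn above.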

