Character segmentation is a necessary preprocessing step for character recognition. Correct segmentation of handwritten addresses requires solutions to several problems, including the splitting of run-together or crossing characters and the merging of fragmented characters. Each case entails a decision based on information derivable from the mail piece image.
In existing systems, character segmentation decisions are usually made by a heuristically derived rule base. Here, we apply well-known pattern recognition techniques so that decision boundaries may be optimized by training over a large data set. Some typical problems are described: character splitting, character merging, and word-level separation. Training and test data sets of handprint ZIP codes are compiled, and easily measurable properties of the data are used to train Bayesian and neural network classifiers. We compare the results of the classification methods with those of segmentation guided by conventional, heuristically derived rules.
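To make the idea of a trained segmentation decision concrete, the following is a minimal sketch of a split/keep decision learned from labeled examples rather than hand-written rules. It assumes a Gaussian naive Bayes model, which is only one possible form of Bayesian classifier and is not claimed to be the one used in this work. The two features (width-to-height ratio of a connected component and a count of valleys in its vertical projection), the function names, and the toy training values are all hypothetical and chosen purely for illustration.

```python
import math


def fit_gaussian_bayes(samples, labels):
    """Estimate a prior, per-feature means, and per-feature variances for each class."""
    model = {}
    n_total = len(samples)
    for cls in set(labels):
        rows = [s for s, y in zip(samples, labels) if y == cls]
        n = len(rows)
        dims = len(rows[0])
        means = [sum(r[d] for r in rows) / n for d in range(dims)]
        # A small floor on the variance avoids division by zero.
        variances = [
            max(sum((r[d] - means[d]) ** 2 for r in rows) / n, 1e-6)
            for d in range(dims)
        ]
        model[cls] = (n / n_total, means, variances)
    return model


def log_posterior(model, x, cls):
    """Unnormalized log posterior: log prior plus summed Gaussian log likelihoods."""
    prior, means, variances = model[cls]
    score = math.log(prior)
    for xi, m, v in zip(x, means, variances):
        score += -0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
    return score


def classify(model, x):
    """Return the class with the highest log posterior for feature vector x."""
    return max(model, key=lambda cls: log_posterior(model, x, cls))


# Hypothetical toy data, NOT taken from the ZIP code data sets described above.
# Each sample is (width/height ratio, number of vertical-projection valleys).
TRAIN_X = [
    (0.50, 0), (0.60, 0), (0.70, 1), (0.55, 0),  # single characters
    (1.40, 1), (1.60, 2), (1.80, 2), (1.50, 1),  # run-together characters
]
TRAIN_Y = ["keep"] * 4 + ["split"] * 4

MODEL = fit_gaussian_bayes(TRAIN_X, TRAIN_Y)
```

Calling `classify(MODEL, (1.7, 2))` asks whether a wide component with two projection valleys should be split. The decision boundary here comes from the training samples, so it shifts as the labeled data changes, which is the property that distinguishes this approach from a fixed rule base.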