Streak2O: Data Augmentation for Handwritten Text Recognition in Neural Networks
Date
Authors
Advisor
Publisher
Polytechnic University of Puerto Rico
Item Type
Article
Abstract
Streak2O is a data augmentation algorithm for machine learning that combines two independent algorithms, Streak and Droplet. All three augmentations are implemented as non-trainable custom Keras layers in TensorFlow to optimize execution time in a GPU-based environment. They generate configurable random artifacts that imitate the water damage and mishandling found in real-life handwritten historical documents and manuscripts. Tests with small subsets of the NIST-SD19 dataset on a convolutional neural network architecture show that these augmentations can help reduce overfitting. The augmentations fall partially into the category of synthetic data generation.
Key Terms ⎯ Handwritten Text Recognition, Machine Learning, Synthetic Data Augmentation, TensorFlow.
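To illustrate what a non-trainable custom Keras layer for augmentation can look like, the following is a minimal sketch and not the Streak2O implementation. The class name RandomStreak, its parameters max_width and intensity, and the artifact itself (one random vertical band per image, blended toward white) are assumptions made for illustration only; the abstract does not describe how Streak or Droplet actually draw their artifacts. The sketch assumes float images shaped (batch, height, width, channels) with values in [0, 1], and it is meant to be called eagerly.

```python
import tensorflow as tf


class RandomStreak(tf.keras.layers.Layer):
    """Hypothetical streak-like augmentation layer (illustration only)."""

    def __init__(self, max_width=4, intensity=0.5, **kwargs):
        # The layer holds no weights, so it is marked non-trainable.
        super().__init__(trainable=False, **kwargs)
        self.max_width = max_width   # assumed: widest band, in pixels
        self.intensity = intensity   # assumed: blend strength in [0, 1]

    def call(self, images, training=None):
        # Augment only during training; pass images through otherwise.
        if not training:
            return images
        shape = tf.shape(images)
        batch, width = shape[0], shape[2]
        # Draw one random band per image: start column and band width.
        start = tf.random.uniform([batch, 1], 0, width, dtype=tf.int32)
        band = tf.random.uniform([batch, 1], 1, self.max_width + 1, dtype=tf.int32)
        cols = tf.range(width)[tf.newaxis, :]              # shape (1, W)
        mask = (cols >= start) & (cols < start + band)     # shape (B, W)
        mask = tf.cast(mask, images.dtype)[:, tf.newaxis, :, tf.newaxis]
        # Inside the band, blend pixel values toward 1.0 (white).
        return images * (1.0 - mask * self.intensity) + mask * self.intensity


# Usage: apply the layer to a dummy batch of two 8x8 grayscale images.
images = tf.zeros([2, 8, 8, 1])
augmented = RandomStreak(max_width=3)(images, training=True)
```

Because every step is a tensor operation, a layer written this way runs on the same device as the rest of the model, which is the motivation the abstract gives for using Keras layers.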
Description
Design Project Article for the Graduate Programs at Polytechnic University of Puerto Rico
Keywords
Citation
Beltran Feliciano, E. J. (2021). Streak2O: Data Augmentation for Handwritten Text Recognition in Neural Networks [Unpublished manuscript]. Graduate School, Polytechnic University of Puerto Rico.