Streak2O: Data Augmentation for Handwritten Text Recognition in Neural Networks

Date

Publisher

Polytechnic University of Puerto Rico

Item Type

Article

Abstract

Streak2O is a machine learning data augmentation algorithm that combines two independent algorithms, Streak and Droplet. All three augmentations are implemented as non-trainable custom Keras layers in TensorFlow to optimize execution time in a GPU-based environment. They generate configurable random artifacts that imitate the water damage and mishandling seen in real-life handwritten historical documents and manuscripts. Tests with small subsets of the NIST-SD19 dataset on a convolutional neural network architecture show that these augmentations can help reduce neural network overfitting. The approach also falls partially into the category of synthetic data generation.

Key Terms: Handwritten Text Recognition, Machine Learning, Synthetic Data Augmentation, TensorFlow.
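The abstract does not spell out how Streak or Droplet draw their artifacts, so what follows is only a minimal sketch of the general pattern it describes: non-trainable custom Keras layers that add random, configurable damage to images during training, chained together. The names `RandomStreak`, `RandomDroplet`, and `streak2o_sketch`, the fading vertical band and circular blob shapes, and the parameters `max_width`, `max_radius`, and `intensity` are all hypothetical choices for illustration, not the author's implementation. Images are assumed to be float32 tensors of shape (batch, height, width, channels) with values in [0, 1].

```python
import tensorflow as tf


class RandomStreak(tf.keras.layers.Layer):
    """Hypothetical streak: a random vertical band that fades from top to
    bottom, loosely imitating a water run. Not the author's Streak algorithm."""

    def __init__(self, max_width=3, intensity=0.5, **kwargs):
        super().__init__(trainable=False, **kwargs)  # no weights to learn
        self.max_width = max_width
        self.intensity = intensity  # value the damaged pixels are blended toward

    def call(self, images, training=None):
        if not training:  # augment only during training
            return images
        shape = tf.shape(images)
        batch, h, w = shape[0], shape[1], shape[2]
        # One random band start column and width per image.
        x0 = tf.random.uniform(tf.stack([batch, 1, 1]), 0, w, dtype=tf.int32)
        width = tf.random.uniform(
            tf.stack([batch, 1, 1]), 1, self.max_width + 1, dtype=tf.int32)
        xs = tf.range(w)[None, None, :]
        in_band = tf.cast((xs >= x0) & (xs < x0 + width), images.dtype)
        # Linear fade: full strength on the top row, weaker toward the bottom.
        rows = tf.cast(tf.range(h), images.dtype)[None, :, None]
        fade = 1.0 - rows / tf.cast(h, images.dtype)
        mask = (in_band * fade)[..., None]  # broadcast over channels
        return images * (1.0 - mask) + self.intensity * mask


class RandomDroplet(tf.keras.layers.Layer):
    """Hypothetical droplet: one random circular blob per image, loosely
    imitating a water stain. Not the author's Droplet algorithm."""

    def __init__(self, max_radius=6.0, intensity=0.5, **kwargs):
        super().__init__(trainable=False, **kwargs)
        self.max_radius = max_radius
        self.intensity = intensity

    def call(self, images, training=None):
        if not training:
            return images
        shape = tf.shape(images)
        batch, h, w = shape[0], shape[1], shape[2]
        # Random blob centre and radius per image.
        cy = tf.random.uniform(tf.stack([batch, 1, 1]), 0.0, tf.cast(h, tf.float32))
        cx = tf.random.uniform(tf.stack([batch, 1, 1]), 0.0, tf.cast(w, tf.float32))
        r = tf.random.uniform(tf.stack([batch, 1, 1]), 1.0, self.max_radius)
        ys = tf.cast(tf.range(h), tf.float32)[None, :, None]
        xs = tf.cast(tf.range(w), tf.float32)[None, None, :]
        inside = (ys - cy) ** 2 + (xs - cx) ** 2 <= r ** 2
        mask = tf.cast(inside, images.dtype)[..., None]
        return images * (1.0 - mask) + self.intensity * mask


def streak2o_sketch():
    """Chains the two hypothetical layers, mirroring the idea of combining
    Streak and Droplet into a single augmentation."""
    return tf.keras.Sequential([RandomStreak(), RandomDroplet()],
                               name="streak2o_sketch")


if __name__ == "__main__":
    blank_pages = tf.ones([4, 28, 28, 1])  # white 28x28 grayscale images
    augment = streak2o_sketch()
    damaged = augment(blank_pages, training=True)
    print(damaged.shape)
```

Because the layers return their input unchanged when `training` is false, they can sit at the front of a model and are skipped at inference time. Keeping the work in TensorFlow ops also lets the augmentation run on the GPU next to the rest of the model, which is the motivation the abstract gives for packaging Streak and Droplet as Keras layers.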

Description

Design Project Article for the Graduate Programs at Polytechnic University of Puerto Rico

Keywords

Citation

Beltran Feliciano, E. J. (2021). Streak2O: Data Augmentation for Handwritten Text Recognition in Neural Networks [Unpublished manuscript]. Graduate School, Polytechnic University of Puerto Rico.