OCR and ABC’s

By pre-K, Leila had mastered the ABC’s and recognized that an open ladder looks like an A and that a snake is curved like an S. She knew the letters. But she could not yet read.  She could not yet interpret what those letters strung together represent on a page.

Optical Character Recognition (OCR) is the ability of a machine to recognize that a set of black dots is a character, which the machine can then convert into machine-encoded text.  OCR takes documents and turns them into the bits and bytes of machine-readable text.  But OCR cannot interpret that text, just as Leila could not read although she knew all the letters.

Itemize Receipt OCR and Data Extraction adds interpretation to OCR.

By first grade, Leila was reading Bob Books and was excited when she could interpret phrases like “Sam ran to Mat.” Rhymes and sight words came quickly after.  Certain patterns and groups of words repeat; kids memorize strings and sight words to get to meaning faster.  They also use the pictures on the page to attribute meaning to some words and rule out others.  The cute animal with big ears eating cheese is a mouse, not a moose.

OCR vendors use libraries of fonts and ‘pattern matching,’ which, like sight words, speed up recognition.  OCR post-processing uses a lexicon to constrain the output to the words expected in a specific kind of document.  For example, in legal documents, certain words are more expected than others.  The result is a complete document available in machine-encoded text.
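
As a rough illustration of lexicon-based post-processing, here is a minimal Python sketch. It is not any vendor's actual method: the word list, the function names, and the 0.8 similarity cutoff are all assumptions made for this example, and it relies only on the standard-library difflib module.

```python
import difflib

# Hypothetical domain lexicon for legal documents (an assumption for this sketch;
# a real lexicon would be far larger).
LEGAL_LEXICON = ["plaintiff", "defendant", "court", "motion", "statute"]

def correct_token(token, lexicon, cutoff=0.8):
    """Return the closest lexicon word, or the token unchanged if none is close enough."""
    matches = difflib.get_close_matches(token.lower(), lexicon, n=1, cutoff=cutoff)
    return matches[0] if matches else token

def correct_text(text, lexicon):
    """Apply the lexicon correction to each whitespace-separated token."""
    return " ".join(correct_token(t, lexicon) for t in text.split())

# A raw OCR pass might confuse "l" with "1" or "i" with "l".
print(correct_text("the p1aintiff filed a motlon", LEGAL_LEXICON))
```

Words that are not close to anything in the lexicon pass through untouched, which is why the lexicon has to match the kind of document being read.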

Itemize Receipt OCR and Data Extraction adds context and understanding specifically to payment documents, invoices, receipts, and folios.  This context includes, for example, understanding taxes vs. VAT and grand total vs. subtotals.

Now Leila is in high school reading The Curious Incident of the Dog in the Night-Time and Hamlet, understanding the nuances of character, perspective, and voice.  She is using real intelligence to process language.  She inherently knows to apply a different context to Shakespeare than to Haddon.

Itemize Receipt OCR and Data Extraction is trained to understand the context of receipts, invoices, folios, and payment documents.  Using artificial intelligence, Itemize’s engine understands the difference between a 9 next to the word total, a 9 in 9/15/2017, and a 9 next to the word West. 
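
To make the "9" example concrete, here is a toy Python sketch that labels a line by the context around its digits. It is a hand-written, rule-based illustration only, not how Itemize's engine works; the patterns, labels, and function name are assumptions made for this example.

```python
import re

# Hypothetical patterns for this sketch; a trained engine would not rely on fixed rules.
DATE_RE = re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b")
TOTAL_RE = re.compile(r"\btotal\b\D{0,5}\d", re.IGNORECASE)
ADDRESS_RE = re.compile(r"\b\d+\s+(?:North|South|East|West)\b", re.IGNORECASE)

def classify_line(line):
    """Label a line as 'date', 'amount', 'address', or 'unknown' from its context."""
    if DATE_RE.search(line):
        return "date"
    if TOTAL_RE.search(line):
        return "amount"
    if ADDRESS_RE.search(line):
        return "address"
    return "unknown"

for sample in ["Total 9", "9/15/2017", "9 West Main St"]:
    print(sample, "->", classify_line(sample))
```

The same digit gets three different labels depending on its neighbors. Rules like these break down quickly on real receipts, which is where a trained model earns its keep.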

If you are looking for a solution to identify the ABC’s, many standard OCR vendors will suffice.  If, however, you are looking for a partner to read and interpret the financial values, vendors, and categories in receipts and payment documents, then leave kindergarten behind and step up to Itemize.