|
OCR (Optical Character
Recognition) is a technique for recognizing characters from
the arrangement of pixels in an image. Every image is a set
of ordered pixels (picture elements), and each character in
an image (letter, digit, etc.) is likewise a set of ordered
pixels. Each pixel stores a number that determines the color
it displays.
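To make the idea of "ordered pixels" concrete, here is a minimal sketch in Python. The 5x5 grid, the threshold value, and the function names are all illustrative assumptions, not part of any GIRDAC product. It shows a character stored as rows of pixel numbers and reduced to ink and background.

```python
# A hypothetical 5x5 grayscale image of the letter "T".
# Each number is one pixel value: 0 is black, 255 is white.
GRAY_T = [
    [10, 20, 15, 12, 18],
    [240, 250, 30, 245, 251],
    [255, 248, 25, 250, 247],
    [250, 252, 20, 249, 255],
    [247, 255, 35, 251, 250],
]


def binarize(image, threshold=128):
    """Turn a grayscale grid into a grid of 1 (ink) and 0 (background)."""
    return [[1 if pixel < threshold else 0 for pixel in row] for row in image]


def render(bitmap):
    """Draw a binary grid as text so the character shape is visible."""
    return "\n".join(
        "".join("#" if bit else "." for bit in row) for row in bitmap
    )


if __name__ == "__main__":
    print(render(binarize(GRAY_T)))
```

Printing the rendered grid shows the bar and stem of the "T". The order of the pixels, not any single value, carries the shape.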
|
|
There are two types of PDF Converters:
|
|
OCR-Not-enabled
|
|
OCR-enabled
|
|
All GIRDAC PDF Converters except PDF to Word Converter
belong to the second category.
Some PDF documents contain text inside images (scanned PDF files).
GIRDAC PDF Converter Ultimate extracts such text as
formatted text through its OCR (Optical Character Recognition)
Layout option. The text in the image may be black and white,
grayscale, or color; the extracted text is always black.
Accuracy depends on image quality, font, font size, and the
presence of special characters and symbols. The OCR Layout
option does not extract images or shapes from the PDF file,
and it currently works with English-language text.
|
|
|
|
OCR software converts handwritten or typewritten text documents
into machine-editable text formats. Earlier OCR systems were
trained to recognize specific fonts. Current OCR systems can
recognize most fonts with high accuracy.
Some can also reproduce the formatting of the original image
in the converted output. OCR uses algorithms to recognize characters
and neural networks to increase accuracy.
|
|
There are two methods employed in OCR software.
|
|
Matrix matching
|
|
Feature extraction
|
|
Matrix matching is the simpler of the two methods.
It compares each character image with a library
of character matrices. When the image matches one of
the pixel matrices, it is labeled as
the corresponding character.
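The comparison described above can be sketched as follows. This is a minimal illustration under stated assumptions: the 3x3 templates, the agreement-count score, and the names `TEMPLATES`, `score`, and `match_character` are hypothetical, and a real OCR library would use larger matrices and many more characters.

```python
# Hypothetical library of character matrices ("1" is ink, "0" is background).
TEMPLATES = {
    "I": ["010", "010", "010"],
    "L": ["100", "100", "111"],
    "T": ["111", "010", "010"],
    "O": ["111", "101", "111"],
}


def score(glyph, template):
    """Count the pixel positions where the glyph and the template agree."""
    return sum(
        g == t
        for glyph_row, template_row in zip(glyph, template)
        for g, t in zip(glyph_row, template_row)
    )


def match_character(glyph):
    """Return the label of the template that agrees with the glyph most."""
    return max(TEMPLATES, key=lambda label: score(glyph, TEMPLATES[label]))


if __name__ == "__main__":
    noisy_t = ["111", "010", "011"]  # a "T" with one stray ink pixel
    print(match_character(noisy_t))
```

Picking the best-scoring template rather than requiring an exact match lets the sketch tolerate a small amount of noise, although it still assumes the glyph has the same size and alignment as the templates.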
|
|
Feature extraction uses artificial intelligence to analyze
features such as closed shapes, diagonal lines, and line
intersections. This method is flexible and is used
for both typewritten and handwritten documents.
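As an illustration of one such feature, the sketch below counts closed shapes in a character bitmap by flood-filling the background. It is a simplified, hand-written example under stated assumptions, not the method of any particular OCR product. The function name `count_closed_shapes` and the sample bitmaps are hypothetical.

```python
def count_closed_shapes(bitmap):
    """Count background regions that are completely enclosed by ink.

    `bitmap` is a list of equal-length strings: "1" is ink, "0" is background.
    """
    rows, cols = len(bitmap), len(bitmap[0])
    # Pad with a background border so all outside pixels form one region.
    padded = [[0] * (cols + 2)]
    padded += [[0] + [int(ch) for ch in row] + [0] for row in bitmap]
    padded += [[0] * (cols + 2)]
    seen = [[False] * (cols + 2) for _ in range(rows + 2)]

    def flood(start_r, start_c):
        """Mark every background pixel connected to the start pixel."""
        stack = [(start_r, start_c)]
        seen[start_r][start_c] = True
        while stack:
            r, c = stack.pop()
            for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                inside = 0 <= nr < rows + 2 and 0 <= nc < cols + 2
                if inside and not seen[nr][nc] and padded[nr][nc] == 0:
                    seen[nr][nc] = True
                    stack.append((nr, nc))

    flood(0, 0)  # mark the outside background first
    closed = 0
    for r in range(rows + 2):
        for c in range(cols + 2):
            if padded[r][c] == 0 and not seen[r][c]:
                closed += 1  # unvisited background must be enclosed
                flood(r, c)
    return closed


LETTER_C = ["111", "100", "111"]
LETTER_O = ["111", "101", "111"]
DIGIT_8 = ["111", "101", "111", "101", "111"]
```

With these sample bitmaps, "C" has no closed shape, "O" has one, and "8" has two. Unlike a pixel-by-pixel comparison, this count stays the same when the character is drawn in a different size or font, which is why feature-based methods cope better with varied input.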
|
|
|
|