A PDF converter usually converts a PDF file into another file format,
such as Word, Excel, PowerPoint, plain text, HTML, or an image format.
To do this, it must clearly understand both the structure of a PDF
document and the structure of the target file format. For instance, a
PDF to Word converter must know PDF objects as well as the Word file
structure. There is no one-to-one mapping between PDF objects (text
streams, images, shapes, etc.) and Word document elements, so the
converter has to create a compatible Word document element for each
PDF object. The process is further complicated because PDF object
attributes differ between PDF versions.
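The mapping step can be pictured with a small sketch. The object kinds (`TextRun`, `ImageObject`, `ShapeObject`) and the Word-side element dictionaries below are hypothetical simplifications, not GIRDAC's internal model and not the real Word (OOXML) schema. The point is only that each PDF object is translated into whichever Word element best approximates it, rather than copied one-to-one.

```python
from dataclasses import dataclass
from typing import Union


# Hypothetical, simplified PDF object kinds (not a real PDF library's types).
@dataclass
class TextRun:
    text: str
    font: str
    size: float


@dataclass
class ImageObject:
    width: int
    height: int


@dataclass
class ShapeObject:
    kind: str  # e.g. "line" or "rect"


PdfObject = Union[TextRun, ImageObject, ShapeObject]


def to_word_element(obj: PdfObject) -> dict:
    """Create a compatible (hypothetical) Word element for one PDF object."""
    if isinstance(obj, TextRun):
        # Positioned text becomes a run inside a paragraph, keeping font info.
        return {"type": "paragraph",
                "runs": [{"text": obj.text, "font": obj.font, "size": obj.size}]}
    if isinstance(obj, ImageObject):
        # Images become pictures sized to the original dimensions.
        return {"type": "picture", "width": obj.width, "height": obj.height}
    if isinstance(obj, ShapeObject):
        # Vector shapes have no exact equivalent; approximate with a drawing.
        return {"type": "drawing", "shape": obj.kind}
    raise TypeError(f"unsupported PDF object: {obj!r}")


def convert_page(objects: list[PdfObject]) -> list[dict]:
    """Map every object on a page to its approximate Word element."""
    return [to_word_element(o) for o in objects]


page = [TextRun("Hello", "Helvetica", 12.0), ImageObject(640, 480), ShapeObject("rect")]
for element in convert_page(page):
    print(element["type"])
```

A real converter would also have to reconcile attributes that vary between PDF versions before choosing the target element; the sketch skips that entirely.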

PDF can be converted to various document formats: doc, docx, xml,
rtf, xls, xlsx, htm, etc. It can also be converted to many image
formats:
| AVS | JBIG | PGM | SUN |
| BMP Mono | JNG | PGM RAW | SVG |
| BMP Gray | JP2 | PGNM | TGA |
| BMP Sep1 | JPC | PGNM RAW | TIF Gray |
| BMP Sep8 | JPG | PNG Mono | TIF 12 bit RGB |
| BMP 4 bit | JPG Gray | PNG Gray | TIF 24 bit RGB |
| BMP 8 bit | MNG | PNG 4 bit | TIF 48 bit RGB |
| BMP 24 bit | MPEG | PNG 8 bit | TIF 32 bit CMYK |
| BMP 32 bit | M2V | PNG 24 bit | TIF 64 bit CMYK |
| CIN | MTV | PKSM | TIF G3Fax no RLE |
| CMYK | OTB | PKSM RAW | TIF G3Fax RLE |
| CMYKA | P7 | PKM | TIF 2DG3Fax |
| DCX | PALM | PKM RAW | TIF G4Fax |
| DIB | PAM | PNM | TIF LZW |
| DPX | PBM | PNM RAW | TIF PackBits |
| EMF | PBM RAW | PPM | TIF Sep |
| EPS 1 | PCD | PPM RAW | TIF Sep1 |
| EPS 1 Color | PCDS | PS 1 | UIL |
| EPS 2 | PCL | PS 1 Color | UYVY |
| FAX G3 | PCX Mono | PS2 | VICAR |
| FAX 2DG3 | PCX Gray | PSD CMYK | VIFF |
| FAX G4 | PCX 4 bit | PSD RGB | WBMP |
| FITS | PCX 8 bit | PTIF | XBM |
| GIF | PCX 24 bit | PXL Mono | XPM |
| GPLT | PCX CMYK | PXL Color | XWD |
| INFO | PDB | SGI | YCbCr |

Various layout options are available for PDF conversion. The
most commonly used option converts the PDF while preserving its
layout, including text, images, shapes, etc. Other options produce
formatted text or plain text, or simply extract the images from
the PDF.
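For the plain-text option, a converter has to turn positioned text fragments back into lines in reading order. The sketch below shows one common heuristic; the `Fragment` type and the 2-point line tolerance are assumptions for illustration, and real converters also handle columns, rotated text, and spacing, which this ignores.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    x: float  # left edge, in PDF user-space units (points)
    y: float  # baseline; in PDF, y grows upward from the page bottom
    text: str


def to_plain_text(fragments: list[Fragment], line_tolerance: float = 2.0) -> str:
    """Rebuild plain-text lines from positioned fragments (simple heuristic)."""
    lines: list[list[Fragment]] = []
    # Visit fragments top to bottom (largest y first), then left to right.
    for frag in sorted(fragments, key=lambda f: (-f.y, f.x)):
        if lines and abs(lines[-1][0].y - frag.y) <= line_tolerance:
            lines[-1].append(frag)  # same baseline within tolerance: same line
        else:
            lines.append([frag])    # otherwise start a new line
    # Within each line, order fragments left to right and join with spaces.
    return "\n".join(" ".join(f.text for f in sorted(line, key=lambda f: f.x))
                     for line in lines)


frags = [Fragment(300, 700, "world"), Fragment(72, 700, "Hello"),
         Fragment(72, 686, "Second line")]
print(to_plain_text(frags))
# Hello world
# Second line
```

The tolerance absorbs small baseline differences within one visual line; too large a value would merge neighboring lines, so it would normally be derived from the font size.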

Some PDF documents contain text that is part of an image. In scanned
PDFs, the text is usually part of the page image rather than stored as
text objects. Such text can be extracted through OCR (optical character
recognition). Almost all GIRDAC PDF Converters use OCR technology to
extract text and formatting from images.
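Before running OCR, a converter needs to decide which pages carry their text only as an image. The sketch below shows one simple heuristic under stated assumptions: the `PageInfo` summary and both thresholds are hypothetical, this is not how GIRDAC's converters make the decision, and the OCR step itself is not shown.

```python
from dataclasses import dataclass


@dataclass
class PageInfo:
    # Hypothetical summary of a page's content, e.g. gathered while parsing it.
    text_chars: int     # characters found in real text objects
    image_area: float   # area covered by images, in square points
    page_area: float    # total page area, in square points


def needs_ocr(page: PageInfo, min_chars: int = 10, min_coverage: float = 0.8) -> bool:
    """Heuristic: almost no real text, and images covering most of the page."""
    coverage = page.image_area / page.page_area if page.page_area else 0.0
    return page.text_chars < min_chars and coverage >= min_coverage


letter = 612 * 792  # US Letter page area in points
print(needs_ocr(PageInfo(text_chars=0, image_area=letter, page_area=letter)))   # True
print(needs_ocr(PageInfo(text_chars=2500, image_area=0.0, page_area=letter)))  # False
```

A scanned page typically matches the first case: one full-page image and no text objects.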

PDF Converters do not convert documents that have either of
these security settings:
- Content Copying: Not Allowed
- Page Extraction: Not Allowed

You can check these settings in Adobe Reader by choosing
File -> Document Properties from the top-level menu and
clicking the Security tab.
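Behind the Security tab, such restrictions are stored as bit flags in the `/P` entry of an encrypted PDF's encryption dictionary; an unencrypted PDF carries no such restrictions. The sketch below assumes the `/P` integer has already been read from the file. The bit positions come from the PDF specification (ISO 32000-1), while treating Adobe Reader's "Page Extraction" label as the "assemble document" bit is an assumption made for illustration.

```python
# 1-based bit positions from the PDF specification's user access permissions.
COPY_EXTRACT_BIT = 5   # copy or otherwise extract text and graphics
ASSEMBLE_BIT = 11      # assemble the document (insert, rotate, delete pages)


def flag_set(p_value: int, bit: int) -> bool:
    """True if the given 1-based bit is set in the /P permissions integer.

    /P is stored as a signed 32-bit integer (often negative); Python's
    bitwise AND on negative ints follows two's complement, so this works.
    """
    return bool(p_value & (1 << (bit - 1)))


def conversion_allowed(p_value: int) -> bool:
    # Assumption: Reader's "Page Extraction" is treated here as the assemble
    # bit; the exact mapping of Reader's labels to bits is not confirmed.
    return flag_set(p_value, COPY_EXTRACT_BIT) and flag_set(p_value, ASSEMBLE_BIT)


print(conversion_allowed(-4))     # True: every permission bit is set
print(conversion_allowed(-3904))  # False: copying and assembly are denied
```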