Adgent

What is it?

Adgent does layout-following OCR: it produces plain-text (ASCII) files that mimic the original layout of the input document (columns, tables, etc.).

These elements are not explicitly marked in the output; the text is simply arranged with spaces and line breaks to resemble the original as closely as possible.
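Adgent does not describe its internal method, so the following is only a generic illustration of the idea of "organizing text with spaces and line breaks". It assumes word positions are already known (for example, from an OCR engine), and the `Word` type, `render_layout` function, and `char_width`/`line_height` parameters are all hypothetical. Each word is snapped to a character column and a text row, then padded with spaces so it lands roughly where it appeared on the page.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """Hypothetical input: a recognized word and its top-left position on the page."""
    text: str
    x: float  # left edge, in page units
    y: float  # top edge, in page units


def render_layout(words: list[Word], char_width: float, line_height: float) -> str:
    """Illustrative sketch: place words on a character grid so spacing mimics the page."""
    # Group words by text row, remembering the character column each one starts at.
    rows: dict[int, list[tuple[int, str]]] = {}
    for w in words:
        row = round(w.y / line_height)
        col = round(w.x / char_width)
        rows.setdefault(row, []).append((col, w.text))
    if not rows:
        return ""

    lines = []
    for r in range(min(rows), max(rows) + 1):
        line = ""
        # Rows with no words become blank lines, preserving vertical gaps.
        for col, text in sorted(rows.get(r, [])):
            if line:
                # Pad up to the word's column, but always keep at least one space.
                line += " " * max(col - len(line), 1)
            else:
                line += " " * col
            line += text
        lines.append(line)
    return "\n".join(lines)


# A two-column layout: the second column starts at x=200 on both rows.
words = [
    Word("Left", 0, 0), Word("Right", 200, 0),
    Word("column", 0, 20), Word("column", 200, 20),
]
print(render_layout(words, char_width=10, line_height=20))
```

With these sample values, both second-column words start at character 20 of their line, so the two columns stay visually aligned in the plain-text output. A real system would also have to cope with varying font sizes, skewed scans, and overlapping zones, which this sketch ignores.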


How to use

Create an account, then upload files in either the 'OCR' or the 'Text' section.

Why

Most AI engines handle plain text better than any other format, but when a document's layout goes beyond a single, centered column, a normal text export is often unreadable, for a human as well as for an LLM.

Properly parsing a document to identify its different zones and reconstruct its structure is extremely difficult, and devising an approach that works in general, for any document, is even harder.

Adgent addresses this by outputting text that simply follows the visible aspect of the source document, without trying to understand any of its semantics or underlying structure.

It also makes proofreading much easier, because words appear in roughly the same positions in the output as in the source.

What is the 'text' menu entry?

The 'text' menu entry is for PDF files (including PDF forms) that contain embedded text rather than scanned images. For those, OCR is unnecessary, but a layout-following text export can still be useful.

Misc.

This is an early preview with the following limits:

- Each new user account is automatically credited with 10,000 tokens.
- It is currently not possible to buy more tokens, as this is only a preview (email us for more info).
- Current 'pricing' is:
  - 8 tokens per page for OCR
  - 2 tokens per page for text jobs
- Each document is limited to 10 pages for OCR and 100 pages for text. Documents with more pages can be uploaded, but pages beyond those limits will not be processed.
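To estimate how far the starting credit goes, the per-page rates and page limits above can be turned into a small calculator. This is a sketch under two assumptions not stated on this page: the cost is exactly rate × pages with no minimum charge or rounding, and pages beyond the per-document limit (which are not processed) are also not charged. The names `job_cost` and `pages_processed` are hypothetical.

```python
# Preview figures listed above.
TOKENS_PER_PAGE = {"ocr": 8, "text": 2}
MAX_PAGES = {"ocr": 10, "text": 100}
FREE_CREDIT = 10_000


def pages_processed(job_type: str, page_count: int) -> int:
    """Pages actually processed: anything past the per-document limit is skipped."""
    return min(page_count, MAX_PAGES[job_type])


def job_cost(job_type: str, page_count: int) -> int:
    """Estimated token cost, ASSUMING skipped pages are not billed."""
    return TOKENS_PER_PAGE[job_type] * pages_processed(job_type, page_count)


print(job_cost("ocr", 25))    # only the first 10 pages are processed
print(job_cost("text", 40))   # all 40 pages fit under the 100-page limit
print(FREE_CREDIT // TOKENS_PER_PAGE["ocr"], FREE_CREDIT // TOKENS_PER_PAGE["text"])
```

At these rates, the 10,000-token starting credit covers 1,250 OCR pages or 5,000 pages of text jobs, i.e. 125 OCR documents at the 10-page maximum.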