
OCR

Optical Character Recognition

Technology for recognising text from images or scanned documents — converts pixel data into text that can be further processed.

What is OCR?

OCR (Optical Character Recognition) is a technology for automatically recognising text from images — scanned documents, photographs, PDF files (without a machine-readable text layer). OCR analyses pixels, identifies individual characters, and assembles them into text that can be further processed, searched, copied, or sent to other systems.
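The "identifies individual characters, and assembles them into text" step can be illustrated with a simplified sketch. It assumes a recognition stage has already produced characters with pixel bounding boxes (the hypothetical `Glyph` type). It groups them into lines by vertical position and reads each line left to right, inserting a space when the horizontal gap is wide. Real engines such as Tesseract use far more elaborate layout analysis; the half-height line tolerance and the `gap_ratio` value here are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Glyph:
    """One recognised character with its bounding box (pixels, origin top-left)."""
    char: str
    x: int
    y: int
    w: int
    h: int


def assemble_text(glyphs: list[Glyph], gap_ratio: float = 0.5) -> str:
    """Group glyphs into lines by vertical centre, then read each line left to right."""
    lines: list[list[Glyph]] = []
    for g in sorted(glyphs, key=lambda g: g.y + g.h / 2):
        centre = g.y + g.h / 2
        if lines:
            current = lines[-1]
            current_centre = sum(q.y + q.h / 2 for q in current) / len(current)
            # Same line if the vertical centres are within half a glyph height.
            if abs(centre - current_centre) <= g.h / 2:
                current.append(g)
                continue
        lines.append([g])

    out = []
    for line in lines:
        line.sort(key=lambda q: q.x)
        text = line[0].char
        for prev, cur in zip(line, line[1:]):
            # A horizontal gap wider than gap_ratio * glyph height is treated as a space.
            if cur.x - (prev.x + prev.w) > gap_ratio * cur.h:
                text += " "
            text += cur.char
        out.append(text)
    return "\n".join(out)


# Glyphs arrive in arbitrary order, as a recogniser might emit them.
glyphs = [
    Glyph("o", 40, 0, 10, 20), Glyph("V", 0, 30, 10, 20), Glyph("H", 0, 0, 10, 20),
    Glyph("T", 22, 30, 10, 20), Glyph("i", 12, 2, 10, 20), Glyph("A", 11, 30, 10, 20),
    Glyph("k", 52, 0, 10, 20),
]
print(assemble_text(glyphs))
```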

Distinctions:

Classic OCR — character recognition (Tesseract, ABBYY)
IDP (Intelligent Document Processing) — OCR combined with AI that also understands document layout and structure (tables, headers, footers)
ICR (Intelligent Character Recognition) — recognition of handwritten text

Modern deep-learning-based OCR typically achieves 98–99% accuracy on good-quality printed text. Accuracy drops on skewed scans, stains, illegible stamps and tables without clear ruling lines; this is where combining OCR with AI (including RAG) helps.
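The glossary does not state which metric the 98–99% figure refers to. One common way to measure OCR accuracy is character accuracy, defined as 1 minus the character error rate (CER), where CER is the edit distance between the OCR output and a hand-checked reference divided by the reference length. The sketch below assumes that definition and computes it with a plain Levenshtein distance.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when characters match)
            ))
        prev = cur
    return prev[-1]


def character_accuracy(reference: str, ocr_output: str) -> float:
    """1 - CER, clamped at 0 because many insertions can push CER above 1."""
    if not reference:
        return 1.0 if not ocr_output else 0.0
    cer = levenshtein(reference, ocr_output) / len(reference)
    return max(0.0, 1.0 - cer)


# One misread character ("l" read as "1") in a 26-character line gives about 0.96.
print(character_accuracy("Invoice total: 1250.00 EUR", "Invoice tota1: 1250.00 EUR"))
```

At this scale a single misread character per line already costs several percentage points, which is why per-field validation matters for invoices.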

In B2B companies, OCR is most useful for:

Incoming PDF invoices — extracting company registration number, amount, VAT, due date
Contracts — full-text search across archives
Expense receipts — reading paper receipts
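For the invoice use case, the step after OCR is turning plain text into named fields. The sketch below is a deliberately simple rule-based baseline, not the AI extraction Modulario describes: it assumes hypothetical labels ("Reg. no.:", "VAT:", "Total:", "Due date:"), an 8-digit registration number, amounts written with a comma decimal separator and space as the thousands separator, and dates in `dd.mm.yyyy` form. Real invoice layouts vary far more than these patterns cover.

```python
import re
from datetime import date
from decimal import Decimal


def parse_amount(raw: str) -> Decimal:
    # "1 250,00" -> Decimal("1250.00"): drop thousands spaces, comma is the decimal separator.
    return Decimal(raw.replace(" ", "").replace(",", "."))


def extract_invoice_fields(text: str) -> dict:
    """Pull a few fields out of OCR text using hypothetical label patterns."""
    fields = {}
    if m := re.search(r"Reg\. no\.:\s*(\d{8})", text):
        fields["company_reg_no"] = m.group(1)
    if m := re.search(r"VAT:\s*(\d[\d ]*,\d{2})", text):
        fields["vat"] = parse_amount(m.group(1))
    if m := re.search(r"Total:\s*(\d[\d ]*,\d{2})", text):
        fields["total"] = parse_amount(m.group(1))
    if m := re.search(r"Due date:\s*(\d{1,2})\.(\d{1,2})\.(\d{4})", text):
        day, month, year = map(int, m.groups())
        fields["due_date"] = date(year, month, day)
    return fields


SAMPLE = """Supplier: ACME s.r.o.
Reg. no.: 12345678
VAT: 208,33
Total: 1 250,00 EUR
Due date: 15.03.2025"""

print(extract_invoice_fields(SAMPLE))
```

Missing labels simply leave the field out of the result, so a caller can detect gaps and route the document to manual review.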

When it is used

OCR is the entry point to virtually every AI document automation flow. Without OCR, an AI model could not process a scanned contract or a PDF invoice that has no machine-readable text layer.

See the Document Extraction module and the Files module.

Document Extraction — the end-to-end process of OCR + AI extraction. See /en/glossary/vytazovanie-dokladov-pojem.
e-Invoice — the alternative where OCR is not needed. See /en/glossary/e-invoice.
RAG — after OCR, documents are typically indexed into a RAG system. See /en/glossary/rag.
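To illustrate why OCR is not needed for an e-Invoice: in UBL XML the fields are already labelled, so they can be read directly from the XML tree. The sketch below uses Python's standard `xml.etree.ElementTree` and the UBL 2.x namespace URIs. The sample document is heavily trimmed (a valid UBL invoice requires many more elements, such as issue date, parties and invoice lines), and the chosen fields are only examples.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# UBL 2.x namespace URIs for aggregate (cac) and basic (cbc) components.
NS = {
    "cac": "urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2",
    "cbc": "urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2",
}


def read_ubl_invoice(xml_text: str) -> dict:
    """Read a few fields straight from the XML tree; no OCR or layout analysis involved."""
    root = ET.fromstring(xml_text)
    payable = root.find("cac:LegalMonetaryTotal/cbc:PayableAmount", NS)
    return {
        "invoice_id": root.findtext("cbc:ID", namespaces=NS),
        "due_date": root.findtext("cbc:DueDate", namespaces=NS),
        "vat": Decimal(root.findtext("cac:TaxTotal/cbc:TaxAmount", namespaces=NS)),
        "total": Decimal(payable.text),
        "currency": payable.get("currencyID"),
    }


# Trimmed illustrative sample, not a schema-valid invoice.
SAMPLE = """<Invoice xmlns="urn:oasis:names:specification:ubl:schema:xsd:Invoice-2"
  xmlns:cac="urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2"
  xmlns:cbc="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2">
  <cbc:ID>2025-0042</cbc:ID>
  <cbc:DueDate>2025-03-15</cbc:DueDate>
  <cac:TaxTotal><cbc:TaxAmount currencyID="EUR">208.33</cbc:TaxAmount></cac:TaxTotal>
  <cac:LegalMonetaryTotal>
    <cbc:PayableAmount currencyID="EUR">1250.00</cbc:PayableAmount>
  </cac:LegalMonetaryTotal>
</Invoice>"""

print(read_ubl_invoice(SAMPLE))
```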

In Modulario

The Document Extraction module in Modulario combines OCR with AI extraction — a PDF invoice arrives by email, the system OCRs it, an AI model extracts all relevant fields, and automatically creates a received invoice record in Accounting.
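The flow described above can be sketched as three pluggable stages. This is not Modulario's code or API: `process_attachment`, `ReceivedInvoice` and the field names are hypothetical, and the OCR engine, AI extractor and Accounting write are passed in as plain functions so that the sketch runs with stand-ins.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ReceivedInvoice:
    supplier_reg_no: str
    total: str
    source_file: str


def process_attachment(
    filename: str,
    data: bytes,
    ocr: Callable[[bytes], str],
    extract_fields: Callable[[str], dict],
    save_record: Callable[[ReceivedInvoice], None],
) -> ReceivedInvoice:
    """Hypothetical email-to-Accounting flow: OCR -> field extraction -> record creation."""
    text = ocr(data)                 # step 1: pixels -> plain text
    fields = extract_fields(text)    # step 2: plain text -> named fields (an AI model in the real flow)
    invoice = ReceivedInvoice(
        supplier_reg_no=fields["reg_no"],
        total=fields["total"],
        source_file=filename,
    )
    save_record(invoice)             # step 3: create the received-invoice record
    return invoice


if __name__ == "__main__":
    # Stand-in stages so the flow runs without an OCR engine or an AI model.
    saved: list[ReceivedInvoice] = []
    result = process_attachment(
        "invoice-0042.pdf",
        b"%PDF-placeholder",
        ocr=lambda data: "Reg. no.: 12345678\nTotal: 1250.00 EUR",
        extract_fields=lambda text: {"reg_no": "12345678", "total": "1250.00"},
        save_record=saved.append,
    )
    print(result)
```

Keeping the stages as injected functions makes it easy to swap the OCR engine or extractor without touching the flow itself.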

Modulario uses a layered architecture: for machine-readable PDFs, text is extracted directly; for scanned documents and image-only PDFs, an OCR engine trained on European character sets with diacritics is used. Extraction accuracy for common invoice fields (company registration number, amount, date) in Modulario is around 98%, with ambiguous cases flagged for manual review.
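The two decisions in this layered approach (text layer or OCR, then automatic or manual review) can be sketched as small functions. The heuristics are assumptions, not Modulario's actual rules: a PDF counts as machine-readable if its embedded text (obtained, for example, with a PDF library such as pypdf) has at least `min_chars` non-whitespace-trimmed characters, and a field is flagged when its extractor-reported confidence falls below a hypothetical `threshold` of 0.9.

```python
from dataclasses import dataclass


@dataclass
class FieldResult:
    name: str
    value: str
    confidence: float  # 0.0 to 1.0, as reported by the (hypothetical) extractor


def choose_text_source(embedded_text: str, min_chars: int = 20) -> str:
    """Use the PDF's own text layer when it has enough content; otherwise fall back to OCR."""
    return "text_layer" if len(embedded_text.strip()) >= min_chars else "ocr"


def fields_for_review(fields: list[FieldResult], threshold: float = 0.9) -> list[str]:
    """Names of fields whose confidence is below the threshold and need a human check."""
    return [f.name for f in fields if f.confidence < threshold]


print(choose_text_source(""))  # an image-only scan has no embedded text
print(fields_for_review([
    FieldResult("total", "1250.00", 0.97),
    FieldResult("due_date", "15.03.2025", 0.62),
]))
```

Flagging individual fields rather than whole documents lets a reviewer confirm only the uncertain values.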

Related terms

Document Extraction (Vyťažovanie dokladov)

Automated reading of invoices, orders, delivery notes and other documents using OCR and AI — extracting data without manual re-keying.

RAG

A technique that extends an LLM with dynamic search across company documents — the answer is generated by combining retrieved context with a generative model.

AI Agent

A software system built on an LLM that autonomously resolves tasks — planning steps, using tools and calling APIs to achieve a given goal.

e-Invoice

A structured electronic invoice in XML/UBL format that can be processed automatically without manual re-keying.

P2P

End-to-end process from raising a purchase requisition, through the purchase order, delivery and invoice receipt, to payment to the supplier.

Related Modulario modules

Document Extraction (vytazovanie-dokladov), Files (subory), Invoicing (fakturacia)

Implementing OCR in your company?

Modulario covers most B2B processes modularly — deploy only what you need now and grow gradually. Book a free consultation.

Dávid Bělousov

Sales Director

+421 902 826 802 sales@amcef.com
