OCR
Optical Character Recognition
Technology for recognising text from images or scanned documents — converts pixel data into text that can be further processed.
What is OCR?
OCR (Optical Character Recognition) is a technology for automatically recognising text from images — scanned documents, photographs, PDF files (without a machine-readable text layer). OCR analyses pixels, identifies individual characters, and assembles them into text that can be further processed, searched, copied, or sent to other systems.
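The pipeline described above (analyse pixels, identify individual characters, assemble them into text) can be illustrated with a deliberately toy recogniser in Python. It is only a sketch of the principle, under stated assumptions. It assumes fixed-width 3×5 glyphs separated by one blank column. It uses a hypothetical three-letter template set, `TEMPLATES`. Each glyph is classified by its nearest template, meaning the one with the fewest differing pixels. Real engines such as Tesseract use far more robust segmentation and neural recognition models.

```python
# Toy illustration only: real OCR engines use far more sophisticated
# segmentation and (typically neural) recognition models.

# Hypothetical 3x5 bitmap templates for a tiny alphabet ('#' = ink, '.' = paper).
TEMPLATES = {
    "I": ["###", ".#.", ".#.", ".#.", "###"],
    "L": ["#..", "#..", "#..", "#..", "###"],
    "T": ["###", ".#.", ".#.", ".#.", ".#."],
}

def split_glyphs(rows: list[str], width: int = 3, gap: int = 1) -> list[list[str]]:
    """Segment one text line into fixed-width glyphs separated by blank columns."""
    step = width + gap
    return [[row[start:start + width] for row in rows]
            for start in range(0, len(rows[0]), step)]

def pixel_distance(a: list[str], b: list[str]) -> int:
    """Count the pixels that differ between two equally sized bitmaps."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))

def recognise(rows: list[str]) -> str:
    """Classify each glyph by its nearest template and join the characters."""
    return "".join(
        min(TEMPLATES, key=lambda ch: pixel_distance(glyph, TEMPLATES[ch]))
        for glyph in split_glyphs(rows)
    )

# A scanned line reading "LIT", with one noisy pixel in the "L" (row 2).
line = [
    "#...###.###",
    "##...#...#.",
    "#....#...#.",
    "#....#...#.",
    "###.###..#.",
]
print(recognise(line))  # → LIT
```

The noisy pixel still leaves the first glyph closest to "L", which is the basic idea behind tolerance to imperfect scans.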
Variants and related approaches:
- Classic OCR: character recognition (Tesseract, ABBYY)
- IDP (Intelligent Document Processing): OCR plus AI that also understands document layout and structure (tables, headers, footers)
- ICR (Intelligent Character Recognition): recognition of handwritten text
Modern deep-learning-based OCR typically achieves 98–99% accuracy on good-quality printed text. Accuracy drops with skewed scans, stains, illegible stamps, and tables without clear ruling lines; this is where combining OCR with AI models and RAG helps.
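Accuracy figures like these are often quoted at character level. A common definition is character accuracy = 1 − CER. The character error rate (CER) is the edit (Levenshtein) distance between the OCR output and a ground-truth transcription, divided by the length of the ground truth. The sketch below computes it in Python. The function names and the sample strings are our own, for illustration only.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn a into b (dynamic programming, two rows)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (free if characters match)
            ))
        prev = curr
    return prev[-1]

def character_accuracy(reference: str, ocr_output: str) -> float:
    """Return 1 - CER, where CER = edit distance / number of reference characters."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return 1.0 - levenshtein(reference, ocr_output) / len(reference)

reference = "Invoice total: 1,250.00 EUR"
ocr_output = "Invo1ce total: 1,250.00 EUR"  # the OCR misread 'i' as '1'
print(f"{character_accuracy(reference, ocr_output):.1%}")  # → 96.3%
```

One wrong character in a 27-character line already costs almost four percentage points. This shows why 98–99% still leaves errors in critical fields such as amounts. Because insertions also count as errors, the value can fall below zero for very noisy output.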
In B2B companies, OCR is most useful for:
- Incoming PDF invoices — extracting company registration number, amount, VAT, due date
- Contracts — full-text search across archives
- Expense receipts — reading paper receipts
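For the invoice case, a minimal rule-based sketch in Python shows how fields can be pulled out of OCR text with regular expressions. The labels (`Company ID:`, `Total:`, `VAT:`, `Due date:`), the number formats and the `PATTERNS` table are hypothetical. Real invoices vary so much in wording and layout that hand-written rules break quickly. That is why production systems pair OCR with AI extraction.

```python
import re
from typing import Optional

# Hypothetical label patterns; each captures the field value in group 1.
PATTERNS = {
    "registration_number": r"Company ID:\s*(\d{6,10})",
    "amount": r"Total:\s*([\d ]+[.,]\d{2})",
    "vat": r"VAT:\s*([\d ]+[.,]\d{2})",
    "due_date": r"Due date:\s*(\d{2}\.\d{2}\.\d{4})",
}

def extract_fields(ocr_text: str) -> dict[str, Optional[str]]:
    """Return each field's value, or None when its label was not found."""
    fields: dict[str, Optional[str]] = {}
    for name, pattern in PATTERNS.items():
        match = re.search(pattern, ocr_text, flags=re.IGNORECASE)
        fields[name] = match.group(1).strip() if match else None
    return fields

sample = """ACME s.r.o.
Company ID: 12345678
Invoice no. 2025-017
Total: 1 250,00 EUR
VAT: 208,33 EUR
Due date: 14.03.2025"""

print(extract_fields(sample))
# → {'registration_number': '12345678', 'amount': '1 250,00', 'vat': '208,33', 'due_date': '14.03.2025'}
```

Fields that come back as `None` are natural candidates for manual review.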
When it is used
OCR is the entry point to virtually every AI document automation flow. Without OCR, an AI model could not process a scanned PDF invoice or a scanned contract.
See the Document Extraction module and the Files module.
Related terms
- Document Extraction — the end-to-end process of OCR + AI extraction. See /en/glossary/vytazovanie-dokladov-pojem.
- e-Invoice — the alternative where OCR is not needed. See /en/glossary/e-invoice.
- RAG — after OCR, documents are typically indexed into a RAG system. See /en/glossary/rag.
In Modulario
The Document Extraction module in Modulario combines OCR with AI extraction — a PDF invoice arrives by email, the system OCRs it, an AI model extracts all relevant fields, and automatically creates a received invoice record in Accounting.
Modulario uses a layered architecture: for machine-readable PDFs, text is extracted directly; for scanned documents or PDFs without a usable text layer, an OCR engine trained on various European character sets with diacritics is used. Extraction accuracy for common invoice fields (company registration number, amount, date) in Modulario is around 98%, and ambiguous cases are flagged for manual review.
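The layered routing and the review flag described above can be sketched as follows. This is not Modulario's code. Several pieces are hypothetical stand-ins chosen for illustration: the `MIN_TEXT_CHARS` threshold, which decides whether a PDF has a usable text layer; the `REVIEW_THRESHOLD` confidence cut-off; the `run_ocr` callable; and the `FieldResult` record.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical: fewer characters than this means "no usable text layer".
MIN_TEXT_CHARS = 50
# Hypothetical: fields below this confidence go to manual review.
REVIEW_THRESHOLD = 0.9

@dataclass
class FieldResult:
    name: str
    value: str
    confidence: float  # 0.0 to 1.0, as reported by a (hypothetical) extractor

def get_text(pdf_text_layer: str, run_ocr: Callable[[], str]) -> tuple[str, str]:
    """Use the embedded text layer when it looks usable; otherwise fall back to OCR.
    Returns the text and which path produced it."""
    if len(pdf_text_layer.strip()) >= MIN_TEXT_CHARS:
        return pdf_text_layer, "text-layer"
    return run_ocr(), "ocr"

def needs_review(fields: list[FieldResult]) -> list[str]:
    """Names of fields whose confidence is too low for automatic posting."""
    return [f.name for f in fields if f.confidence < REVIEW_THRESHOLD]

# A scanned PDF has an empty text layer, so the OCR path is taken.
text, source = get_text("", run_ocr=lambda: "Company ID: 12345678 ...")
print(source)  # → ocr

fields = [
    FieldResult("registration_number", "12345678", 0.99),
    FieldResult("amount", "1 250,00", 0.72),
]
print(needs_review(fields))  # → ['amount']
```

Passing OCR in as a callable keeps the expensive step lazy. It only runs when the text layer is missing.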
Related terms
Document Extraction
Automated reading of invoices, orders, delivery notes and other documents using OCR and AI — extracting data without manual re-keying.
RAG
A technique that extends an LLM with dynamic search across company documents — the answer is generated by combining retrieved context with a generative model.
AI Agent
A software system built on an LLM that autonomously resolves tasks — planning steps, using tools and calling APIs to achieve a given goal.
e-Invoice
A structured electronic invoice in XML/UBL format that can be processed automatically without manual re-keying.
P2P
End-to-end process from raising a purchase requisition, through the purchase order, delivery and invoice receipt, to payment to the supplier.
Implementing OCR in your company?
Modulario covers most B2B processes modularly — deploy only what you need now and grow gradually. Book a free consultation.