Doclo

Our Technology

Open source. Any model. Fully yours to own.

An MIT-licensed extraction engine that works with any AI provider through OpenRouter and is built to run on your infrastructure. Three products that cover the full document intelligence lifecycle. Public repository coming soon.

Works with any AI model. No vendor lock-in.

OpenAI, Anthropic, Google, Mistral, xAI, Reducto, Datalab

Open Source Engine

Unified document extraction, any provider, any model.

Self-hosted, MIT-licensed. Normalize multiple extraction providers behind one API. Vision-model first, deploy anywhere.

One consistent interface for any document extraction provider. Define your extraction schema once (JSON-based field definitions with types, validation rules, and output formats), and run it against any provider. Swap models without changing application code.
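A minimal sketch of the "define once, run against any provider" idea. The repository is not public yet, so every name here (`Field`, `Provider`, `StubProvider`, `run_extraction`, `INVOICE_SCHEMA`) is hypothetical and does not describe Doclo's actual API. The point is only that application code depends on one interface, so swapping the provider object changes nothing else.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Field:
    """One field in a hypothetical extraction schema."""
    name: str
    type: str              # e.g. "string", "number", "date"
    required: bool = True  # a simple validation rule


# Hypothetical schema, defined once and reused for every provider.
INVOICE_SCHEMA = [
    Field("invoice_no", "string"),
    Field("vendor", "string"),
    Field("amount", "number"),
    Field("due_date", "date", required=False),
]


class Provider(Protocol):
    """Anything that can turn a document into raw field values."""
    def extract(self, document: bytes, schema: list[Field]) -> dict: ...


class StubProvider:
    """Stand-in for a real model client; returns canned values."""
    def __init__(self, name: str, canned: dict):
        self.name = name
        self._canned = canned

    def extract(self, document: bytes, schema: list[Field]) -> dict:
        return {f.name: self._canned.get(f.name) for f in schema}


def run_extraction(provider: Provider, document: bytes, schema: list[Field]) -> dict:
    """Application code: depends only on the Provider interface."""
    result = provider.extract(document, schema)
    missing = [f.name for f in schema if f.required and result.get(f.name) is None]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    return result


# Swapping providers is a one-argument change.
provider_a = StubProvider("a", {"invoice_no": "INV-0847", "vendor": "Acme Corp", "amount": 12450})
provider_b = StubProvider("b", {"invoice_no": "INV-0847", "vendor": "Acme Corp", "amount": 12450,
                                "due_date": "2025-06-15"})
result_a = run_extraction(provider_a, b"%PDF-", INVOICE_SCHEMA)
result_b = run_extraction(provider_b, b"%PDF-", INVOICE_SCHEMA)
```

In a real deployment the stub would be replaced by a client that sends page images to a model; the schema and `run_extraction` would stay the same.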

Not an OCR wrapper. The engine sends full page images to vision models for layout-aware understanding: multi-page documents, merged table cells, handwritten fields, and mixed-format pages. Plug in OpenAI, Anthropic, Google, open-source models, or your own fine-tuned models. We can help you host and fine-tune models tailored to your documents.

Fully open source. Audit, extend, contribute. Run on AWS, GCP, Azure, bare metal, or air-gapped environments. Structured output includes field-level confidence scores, bounding box coordinates, and extraction metadata for downstream validation.
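To illustrate how field-level confidence scores and bounding boxes might feed downstream validation, here is a small sketch. The record shape (`ExtractedField`), the coordinate convention, the 0.9 threshold, and the sample values are all assumptions made for illustration, not the engine's documented output format.

```python
from dataclasses import dataclass


@dataclass
class ExtractedField:
    """Hypothetical shape of one extracted value."""
    name: str
    value: str
    confidence: float                        # assumed range 0.0 to 1.0
    bbox: tuple[float, float, float, float]  # assumed (x0, y0, x1, y1) on the page
    page: int


def needs_review(fields: list[ExtractedField], threshold: float = 0.9) -> list[str]:
    """Return the names of fields whose confidence falls below the threshold."""
    return [f.name for f in fields if f.confidence < threshold]


# Illustrative values only.
fields = [
    ExtractedField("invoice_no", "INV-0847", 0.99, (72.0, 90.0, 180.0, 104.0), 1),
    ExtractedField("amount", "$12,450", 0.72, (400.0, 610.0, 470.0, 624.0), 2),
    ExtractedField("vendor", "Acme Corp", 0.95, (72.0, 120.0, 200.0, 134.0), 1),
]
flagged = needs_review(fields)
```

A reviewer tool could use the `bbox` and `page` of each flagged field to highlight the region a human should check.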

Inputs: PDF, IMG, XLSX, DOCX, Email
Providers (2 active): OpenAI, Anthropic, Google, Mistral
Pipeline (MIT licensed): Parse (Reducto, 98.2%), Extract (GPT-5.4, 99.1%), Classify (Opus-4.6, 97.8%), Validate (Rules, 100%)
Outputs: JSON, CSV, Webhook, S3

Synthetic Documents

Source-truth documents for evals and training.

Generate realistic document variants to benchmark accuracy, train extraction models, and stress-test pipelines before they hit production.

Generate thousands of document variations with controlled degradation: variable fonts, scan noise, skew, resolution changes, and layout shifts. Output as PDF, TIFF, or image with pixel-perfect ground-truth labels for every field.
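One way to think about "controlled degradation" is as a seeded sampler over degradation parameters, so the same seed always reproduces the same set of variants. The sketch below only samples parameters; it does not render documents. The parameter names, ranges, and DPI choices are assumptions for illustration.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Degradation:
    """Hypothetical parameters for one synthetic variant."""
    skew_degrees: float   # page rotation
    noise_level: float    # 0.0 (clean) up to 1.0 (heavy scan noise)
    dpi: int              # output resolution


def sample_degradations(seed: int, count: int, max_skew: float = 3.0) -> list[Degradation]:
    """Draw `count` degradation settings from a private, seeded generator."""
    rng = random.Random(seed)  # local RNG keeps runs reproducible
    return [
        Degradation(
            skew_degrees=round(rng.uniform(-max_skew, max_skew), 2),
            noise_level=round(rng.random(), 2),
            dpi=rng.choice([72, 150, 300]),
        )
        for _ in range(count)
    ]


variants = sample_degradations(seed=1, count=5)
```

Because the sampler is deterministic for a given seed, a failing evaluation run can be regenerated exactly for debugging.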

Compare provider performance on your specific document types with ground-truth labels you control.
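A comparison against ground truth can be as simple as exact-match accuracy per field. The sketch below assumes predictions and labels are plain dictionaries keyed by document ID; the function name, the data layout, and the sample values are illustrative, and real evaluations may want fuzzy matching or per-field breakdowns.

```python
def field_accuracy(predictions: dict, ground_truth: dict) -> float:
    """Fraction of ground-truth fields that a provider reproduced exactly."""
    total = 0
    correct = 0
    for doc_id, truth in ground_truth.items():
        predicted = predictions.get(doc_id, {})
        for field, expected in truth.items():
            total += 1
            if predicted.get(field) == expected:
                correct += 1
    return correct / total if total else 0.0


# Illustrative labels and two hypothetical providers' outputs.
ground_truth = {
    "doc1": {"invoice_no": "INV-0847", "amount": "12450"},
    "doc2": {"invoice_no": "INV-0848", "amount": "300"},
}
provider_a = {
    "doc1": {"invoice_no": "INV-0847", "amount": "12450"},
    "doc2": {"invoice_no": "INV-0848", "amount": "800"},  # one wrong field
}
provider_b = {
    "doc1": {"invoice_no": "INV-0847", "amount": "12450"},
    "doc2": {"invoice_no": "INV-0848", "amount": "300"},
}
scores = {
    "provider_a": field_accuracy(provider_a, ground_truth),
    "provider_b": field_accuracy(provider_b, ground_truth),
}
```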

Use synthetic data to augment training sets, reduce annotation costs, and validate pipeline reliability before production.

Template: seed_001 (generate)
Variants: Invoice (low noise), Invoice (high noise), Receipt (medium noise), Contract (low noise)
Eval results (pass): 3,847 documents, 142 fields, 97.8% coverage, 4 providers
GPT-5.4: 92%, Opus-4.6: 88%, Gemini: 85%

Custom Platform

Not one-size-fits-all. Built around your business.

We deliver a customized solution, managed or self-hosted, designed around your processes, documents, and goals. Our platform and infrastructure give you a head start. Then we tailor every detail to how your team actually works.

We take our platform and shape it around your document types, approval chains, compliance rules, and team structure. No generic tooling. A solution designed for your business.

Intelligent extraction, classification, and validation powered by the best AI models for your documents. Accuracy improves over time as the system learns your data.

Run on your cloud, on-premise, or let us manage everything. Full custom branding with your own domains and styling. Your data stays under your control.

Your Platform: 24.8k processed (+12%), 847 today (+8%), 99.2% accuracy (+0.3%)

Our guiding principles

Every product decision is guided by principles designed to ship reliable document intelligence.

doclo-sdk (MIT): Audit it. Extend it. Own it.

Open by default

Our core engine is MIT-licensed. Audit, extend, and own it. No black boxes, no lock-in.

Deploy: AWS (us-east-1), GCP (eu-west-1), On-prem (air-gapped)
SOC 2, HIPAA

Infrastructure-first

Deploy on your cloud, your hardware, or air-gapped environments. Documents never leave your control.

Provider: active: OpenAI, model: gpt-5.4
Available: OpenAI, Anthropic, Google, Mistral

Provider-agnostic

Use the best model for each task. Swap providers without changing code. No single point of failure.
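"No single point of failure" can be read as a fallback chain: try the preferred provider and move to the next one if it errors. This is a sketch under that assumption, with hypothetical provider callables; it is not a description of how the engine actually handles failures.

```python
from typing import Callable


def extract_with_fallback(
    providers: list[tuple[str, Callable[[bytes], dict]]],
    document: bytes,
) -> tuple[str, dict]:
    """Try each (name, extract) pair in order; return the first success."""
    errors = []
    for name, extract in providers:
        try:
            return name, extract(document)
        except Exception as exc:  # a real system would catch narrower errors
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


# Hypothetical provider callables for illustration.
def primary(document: bytes) -> dict:
    raise TimeoutError("upstream timed out")


def secondary(document: bytes) -> dict:
    return {"invoice_no": "INV-0847"}


used, result = extract_with_fallback(
    [("primary", primary), ("secondary", secondary)], b"%PDF-"
)
```

The returned provider name lets callers log which model produced each result, which is useful when comparing accuracy later.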

Capabilities

Everything you need to process documents at scale

Four core capabilities that work together across all three products to power your workflows.

Extraction

Pull structured data from any visual source. Handwritten forms, PDFs with merged table cells, scanned blueprints, multi-page documents, and photos. Field-level confidence scores with bounding box coordinates for every extracted value.

Source: PDF
Fields: 5/5
invoice_no: INV-0847
vendor: Acme Corp
amount: $12,450
due_date: 2025-06-15
status: Verified

Classification

Detect document type, assess quality, and determine next steps. Route to the right team or workflow automatically.
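The routing step could be sketched as a lookup from document type to destination, with a quality gate in front. The route names, the quality score, and the 0.6 cutoff are assumptions for illustration only.

```python
# Hypothetical mapping from detected document type to a team or workflow.
ROUTES = {
    "invoice": "accounts-payable",
    "receipt": "expenses",
    "contract": "legal-review",
}


def route(doc_type: str, quality: float, min_quality: float = 0.6) -> str:
    """Pick the next step for a classified document."""
    if quality < min_quality:
        return "rescan"                          # too degraded to trust extraction
    return ROUTES.get(doc_type, "manual-triage")  # unknown types go to a human


decisions = [
    route("invoice", 0.92),
    route("contract", 0.41),
    route("blueprint", 0.88),
]
```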

Multi-provider AI

Use OpenAI, Anthropic, Google, Mistral, or your own models. Swap at any time without code changes.

REST API

Connect via REST endpoints configured through Doclo's visual schema editor. Define extraction fields, set validation rules, and get full programmatic control over your document workflows.

Deployment

From repo to production

Our team handles deployment, integration with your existing systems, and ongoing optimization. You get a working solution, not a repository to figure out alone.

01 Assess

We audit your document workflows and identify the highest-impact automation opportunities.

02 Implement

We configure and deploy the stack for your specific documents and systems.

03 Optimize

We monitor, tune, and expand as your needs evolve. Ongoing partnership.

See how we work

See it extract your documents.

Bring a sample document to a 30-minute call. We run it through our engine live and show you the structured output.

MIT-licensed engine
Any AI model
4–6 weeks to production
Full data ownership