Endpoint
Authentication
Your unique API key, used to authenticate each request.
Path Parameters
UUID of the uploaded document. Obtained from the /upload endpoint.
Request Body
JSON Schema defining the structure of data to extract. Uses named_entities format. Optional, but strongly recommended for structured output.
Additional instructions for extraction. Optional, but recommended for complex documents. Default: "No additional instructions provided."
For Basic Extraction only. Specify the AI model to use:
- google-gemini-3-flash (6 credits/page) - Most accurate
- google-gemini-2.5-flash (4 credits/page) - Balanced
- qwen-3-vl (2 credits/page) - Fastest
For VLM Extraction only. Set to:
- vlm (10 credits/page) - Vision-based extraction for scanned docs
For Basic extraction, set the model parameter instead.
Confidence threshold (0-100) for automatic review flagging. Only applies to Advanced/VLM modes. Fields with confidence below this threshold are flagged for review if they’re marked as required in the schema.
Enable citation/source matching for extracted fields. Only available in Advanced mode; don’t set model or extraction_mode when this is true.
Use the higher OCR tier for Advanced mode parsing. This is ignored by Basic and VLM extraction.
Additional instructions for confidence scoring in Advanced mode.
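The mode-selection rules above can be sketched as a small helper. This is a minimal illustration, not the service's own client: `mode_fields` and its `mode` argument are hypothetical names introduced here, and only the `model` and `extraction_mode` keys and their values come from this page.

```python
# Models listed on this page for Basic extraction.
BASIC_MODELS = {"google-gemini-3-flash", "google-gemini-2.5-flash", "qwen-3-vl"}


def mode_fields(mode, model=None):
    """Return the request-body fields that select an extraction mode.

    `mode` is a hypothetical helper argument ("basic", "vlm" or "advanced"),
    not an API parameter.
    """
    if mode == "basic":
        # Basic extraction is selected by setting `model`.
        if model not in BASIC_MODELS:
            raise ValueError(f"Basic extraction needs one of {sorted(BASIC_MODELS)}")
        return {"model": model}
    if model is not None:
        # VLM and Advanced extraction do not take a `model` value.
        raise ValueError(f"{mode} extraction does not take a model")
    if mode == "vlm":
        # VLM extraction is selected by setting `extraction_mode` to "vlm".
        return {"extraction_mode": "vlm"}
    if mode == "advanced":
        # Advanced extraction is selected by setting neither field.
        return {}
    raise ValueError(f"unknown mode: {mode!r}")
```

The returned dict would be merged into the rest of the request body, for example `{**mode_fields("vlm"), ...}`.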
Response
UUID of the processed document.
Extracted data matching your schema structure. Fields are ordered according to schema definition.
Whether this extraction requires human review. true if any required fields have confidence below the review threshold.
Metadata about fields needing review. Only present in Advanced/VLM modes.
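The flagging rule described above (a required field whose confidence falls below the threshold triggers review) could be mirrored client-side roughly as follows. The shape of the confidence data is an assumption made for illustration: a plain mapping from field name to a 0-100 score. The function name is hypothetical, and the actual response metadata may be structured differently.

```python
def fields_needing_review(confidences, required, threshold):
    """List required fields whose confidence is below the threshold.

    `confidences` is an assumed {field_name: score} mapping, scores 0-100.
    Required fields with no score at all are left out of this sketch.
    """
    if not 0 <= threshold <= 100:
        raise ValueError("threshold must be between 0 and 100")
    return [
        name
        for name in required
        if name in confidences and confidences[name] < threshold
    ]


scores = {"invoice_date": 92, "total_amount": 61, "notes": 30}
flagged = fields_needing_review(scores, ["invoice_date", "total_amount"], 80)
needs_review = bool(flagged)  # True: total_amount is required and below 80
```

Note that `notes` has the lowest score but is not flagged, because it is not in the required list.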
Examples
Basic Extraction
Fast, single-model extraction for simple documents.
Advanced Extraction
Multi-model validation with confidence scores.
Extraction Mode Comparison
| Feature | Basic | VLM | Advanced |
|---|---|---|---|
| Credits/Page | 2-6 | 10 | 15 |
| Speed | Fastest | Fast | Moderate |
| Accuracy | Good | Very Good | Highest |
| Confidence Scores | No | Yes | Yes |
| Review Flagging | No | Yes | Yes |
| Citation Matching | No | No | Optional |
| Agentic OCR | No | No | Optional |
| Best For | Simple docs | Scanned images | Critical data |
| How to use | Set model param | Set extraction_mode="vlm" | Don’t set model or extraction_mode |
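The per-page prices in the table and the model list can be turned into a rough cost estimate. The sketch below assumes billing is exactly pages times credits per page, with no minimums or rounding, which this page does not confirm; `estimate_credits` and the `"advanced"` key are hypothetical names used only for illustration.

```python
# Credits per page as listed on this page.
CREDITS_PER_PAGE = {
    "google-gemini-3-flash": 6,    # Basic, most accurate
    "google-gemini-2.5-flash": 4,  # Basic, balanced
    "qwen-3-vl": 2,                # Basic, fastest
    "vlm": 10,                     # VLM extraction
    "advanced": 15,                # Advanced extraction (hypothetical key)
}


def estimate_credits(pages, option):
    """Estimate credits for a document, assuming simple per-page billing."""
    if pages < 0:
        raise ValueError("pages must be non-negative")
    return pages * CREDITS_PER_PAGE[option]
```

For a 12-page document this gives 24 credits with qwen-3-vl, 120 with VLM, and 180 with Advanced extraction.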
Schema Guidelines
Field Types
String Fields
Number Fields
Array Fields
Nested Objects
Best Practices
- Descriptive Field Names: Use clear, meaningful names (invoice_date not date1)
- Detailed Descriptions: Help the AI understand context and format
- Mark Critical Fields: Add to required array for automatic review
- Consistent Naming: Use snake_case throughout your schema
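A schema that follows the field types and practices above might look like the sketch below. It is written as plain JSON Schema in a Python dict; the exact named_entities wrapper the API expects is not shown in this section, so treat the outer structure and all field names as assumptions.

```python
import json
import re

# Illustrative schema: one string, number, array, and nested-object field.
invoice_schema = {
    "type": "object",
    "properties": {
        "invoice_date": {
            "type": "string",
            "description": "Date the invoice was issued, formatted YYYY-MM-DD",
        },
        "total_amount": {
            "type": "number",
            "description": "Grand total including tax, without currency symbol",
        },
        "line_items": {
            "type": "array",
            "description": "One entry per billed item",
            "items": {
                "type": "object",
                "properties": {
                    "item_name": {"type": "string", "description": "Item label"},
                    "unit_price": {"type": "number", "description": "Price per unit"},
                },
            },
        },
        "billing_address": {
            "type": "object",
            "description": "Address the invoice is billed to",
            "properties": {
                "street": {"type": "string", "description": "Street and number"},
                "city": {"type": "string", "description": "City name"},
            },
        },
    },
    # Critical fields go in `required` so low confidence triggers review.
    "required": ["invoice_date", "total_amount"],
}

# Check the naming practice: top-level names are snake_case.
SNAKE_CASE = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$")
all_snake_case = all(SNAKE_CASE.match(name) for name in invoice_schema["properties"])

schema_json = json.dumps(invoice_schema, indent=2)
```

The `schema_json` string is what would be placed in the request body; a name such as `date1` would still pass the snake_case check, so descriptive naming remains a manual review step.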
Error Responses
402 Payment Required
403 Forbidden
500 Internal Server Error
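A client might branch on the three documented status codes along these lines. The retry policy here (retry only on 500) is a common convention assumed for illustration, not something this page specifies, and `describe_error` is a hypothetical helper name.

```python
from http import HTTPStatus

# Status codes listed in this section.
DOCUMENTED_ERRORS = (402, 403, 500)


def describe_error(status_code):
    """Summarize a documented error status, or return None for other codes."""
    if status_code not in DOCUMENTED_ERRORS:
        return None
    return {
        "status": status_code,
        # Standard HTTP reason phrase from the Python standard library.
        "reason": HTTPStatus(status_code).phrase,
        # Assumption: only server-side failures are worth retrying.
        "retryable": status_code >= 500,
    }
```

With this policy a 402 or 403 is surfaced to the caller immediately, while a 500 can be retried with a delay.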
Next Steps
Review Workflow
Handle documents that need review
Polling Pattern
Implement review polling for automation
List Extractions
Query extraction results