Endpoint
POST /extract/{document_id}
Extract structured information from an uploaded document using a defined schema. Choose between Basic, VLM, or Advanced extraction modes based on your accuracy and speed requirements.
Authentication
X-API-Key (header): Your unique API key, used to authenticate requests.
Path Parameters
document_id: UUID of the uploaded document, obtained from the /upload endpoint.
Request Body
schema: JSON Schema defining the structure of the data to extract. Uses the named_entities format:
{
"named_entities" : {
"field_name" : {
"type" : "string|number|boolean|array|object" ,
"description" : "Field description for AI context"
}
},
"required" : [ "field1" , "field2" ]
}
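A schema in this format can be checked locally before it is sent. The sketch below is only an illustration and not part of the API: `validate_schema` is a hypothetical helper that confirms each field declares one of the types listed above and that every name in `required` is defined under `named_entities`.

```python
ALLOWED_TYPES = {"string", "number", "boolean", "array", "object"}


def validate_schema(schema: dict) -> list[str]:
    """Return a list of problems found in a named_entities schema (empty if none)."""
    problems = []
    entities = schema.get("named_entities", {})
    for name, spec in entities.items():
        # Each field must declare one of the documented types.
        if spec.get("type") not in ALLOWED_TYPES:
            problems.append(f"{name}: unsupported type {spec.get('type')!r}")
    for name in schema.get("required", []):
        # A required field that is never defined cannot be extracted.
        if name not in entities:
            problems.append(f"{name}: listed in required but not defined")
    return problems


invoice_schema = {
    "named_entities": {
        "invoice_number": {"type": "string", "description": "The invoice number"},
        "total_amount": {"type": "number", "description": "Total invoice amount"},
    },
    "required": ["invoice_number"],
}
print(validate_schema(invoice_schema))
```

Running a check like this before calling the endpoint avoids spending credits on a request whose schema is malformed.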
prompt: Additional instructions for extraction. Optional, but recommended for complex documents. Default: "No additional instructions provided."
model: For Basic extraction only. Specifies the AI model to use:
openai-gpt-4o (6 credits/page) - Most accurate
openai-gpt-4.1 (4 credits/page) - Balanced
google-gemini-2.0-flash (2 credits/page) - Fastest
If provided, uses Basic extraction mode (single model, no confidence scores).
extraction_mode: For VLM extraction only. Set to:
vlm (10 credits/page) - Vision-based extraction for scanned docs
For Advanced extraction (15 credits/page): Leave this parameter unset AND leave model unset.
For Basic extraction: Set the model parameter instead.
review_threshold: Confidence threshold (0-100) for automatic review flagging. Applies only to Advanced and VLM modes. A field is flagged for review when its confidence is below this threshold and it is marked as required in the schema.
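The mode selection rules and per-page prices above can be summarised in a few lines of code. This is a client-side sketch, not the service's implementation: `resolve_mode` and `estimate_credits` are hypothetical helpers, and giving `model` precedence when both `model` and `extraction_mode` are set is an assumption, since that combination is not documented here.

```python
# Credits per page, taken from the parameter descriptions above.
BASIC_MODEL_CREDITS = {
    "openai-gpt-4o": 6,
    "openai-gpt-4.1": 4,
    "google-gemini-2.0-flash": 2,
}
VLM_CREDITS = 10
ADVANCED_CREDITS = 15


def resolve_mode(body: dict) -> tuple[str, int]:
    """Return (mode, credits_per_page) implied by a request body."""
    model = body.get("model")
    if model is not None:
        # Assumption: 'model' wins if 'extraction_mode' is also present.
        # An unknown model name raises KeyError here.
        return "basic", BASIC_MODEL_CREDITS[model]
    if body.get("extraction_mode") == "vlm":
        return "vlm", VLM_CREDITS
    # Neither parameter set: Advanced mode.
    return "advanced", ADVANCED_CREDITS


def estimate_credits(body: dict, pages: int) -> int:
    """Estimate the credits a request would consume for a document of `pages` pages."""
    _, per_page = resolve_mode(body)
    return per_page * pages


print(resolve_mode({"model": "openai-gpt-4.1"}))
print(estimate_credits({"extraction_mode": "vlm"}, pages=12))
```

An estimate like this is useful before large batches, where a 402 response partway through would otherwise interrupt processing.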
Response
document_id: UUID of the processed document.
results: Extracted data matching your schema structure. Fields are ordered according to the schema definition.
needs_review: Whether this extraction requires human review. true if any required field has a confidence below the review threshold.
needs_review_metadata: Metadata about fields needing review. Populated only in Advanced and VLM modes.
confidence_scores: Confidence scores (0-100) for each extracted field, calculated as:
0.4 × lexical_similarity + 0.6 × semantic_similarity
review_flags: Boolean flags indicating which fields need review.
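As a worked example of the formula and the flagging rule, the sketch below recomputes a confidence score and decides whether a field would be flagged. How the service derives the two similarity values is not described here, so they are treated as given inputs on an assumed 0-100 scale; both function names are hypothetical.

```python
def confidence_score(lexical_similarity: float, semantic_similarity: float) -> float:
    """Weighted confidence: 0.4 x lexical + 0.6 x semantic (inputs assumed 0-100)."""
    return 0.4 * lexical_similarity + 0.6 * semantic_similarity


def is_flagged(confidence: float, review_threshold: float, required: bool) -> bool:
    """A field is flagged only if it is required and falls below the threshold."""
    return required and confidence < review_threshold


score = confidence_score(lexical_similarity=80, semantic_similarity=90)
print(round(score, 1))
print(is_flagged(score, review_threshold=85, required=True))
print(is_flagged(70.0, review_threshold=85, required=False))
```

With these inputs the score is 0.4 × 80 + 0.6 × 90 = 86, which clears a threshold of 85. The last call shows that an optional field is not flagged even when its confidence is low.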
Examples
Fast, single-model extraction for simple documents:
curl -X POST https://api.documind.com/api/v1/extract/550e8400-e29b-41d4-a716-446655440000 \
-H 'X-API-Key: YOUR_API_KEY' \
-H 'Content-Type: application/json' \
-d '{
"schema": {
"named_entities": {
"invoice_number": {
"type": "string",
"description": "The invoice number"
},
"total_amount": {
"type": "number",
"description": "Total invoice amount"
}
},
"required": ["invoice_number"]
},
"prompt": "Extract invoice details",
"model": "openai-gpt-4.1"
}'
Basic Extraction Response
{
"document_id" : "550e8400-e29b-41d4-a716-446655440000" ,
"results" : {
"invoice_number" : "INV-2024-001" ,
"total_amount" : 1250.00
},
"needs_review" : false ,
"needs_review_metadata" : {}
}
Multi-model validation with confidence scores:
import requests

API_KEY = "YOUR_API_KEY"
document_id = "550e8400-e29b-41d4-a716-446655440000"  # returned by /upload

advanced_extract = {
    "schema": {
        "named_entities": {
            "invoice_number": {
                "type": "string",
                "description": "The invoice number",
            },
            "line_items": {
                "type": "array",
                "description": "Invoice line items",
                "items": {
                    "type": "object",
                    "named_entities": {
                        "description": {
                            "type": "string",
                            "description": "Item description",
                        },
                        "amount": {
                            "type": "number",
                            "description": "Line total",
                        },
                    },
                },
            },
        },
        "required": ["invoice_number", "line_items"],
    },
    "prompt": "Extract all invoice details with high accuracy",
    # Advanced mode: don't set 'model' or 'extraction_mode' (15 credits per page)
    "review_threshold": 85,
}

response = requests.post(
    f"https://api.documind.com/api/v1/extract/{document_id}",
    headers={
        "X-API-Key": API_KEY,
        "Content-Type": "application/json",
    },
    json=advanced_extract,
)
result = response.json()


def flagged_fields(flags, scores, prefix=""):
    """Yield (path, confidence) for each flagged field, descending into nested items."""
    for key, flag in flags.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(flag, dict):
            # Arrays and objects report flags per nested field.
            yield from flagged_fields(flag, scores[key], path)
        elif flag:
            yield path, scores[key]


# Check if review is needed
if result["needs_review"]:
    print("⚠️ Some fields need review:")
    metadata = result["needs_review_metadata"]
    for path, confidence in flagged_fields(
        metadata["review_flags"], metadata["confidence_scores"]
    ):
        print(f"  - {path}: {confidence:.1f}% confidence")
Advanced Extraction Response
{
"document_id" : "550e8400-e29b-41d4-a716-446655440000" ,
"results" : {
"invoice_number" : "INV-2024-001" ,
"line_items" : [
{
"description" : "Professional Services" ,
"amount" : 1000.00
},
{
"description" : "Software License" ,
"amount" : 250.00
}
]
},
"needs_review" : true ,
"needs_review_metadata" : {
"confidence_scores" : {
"invoice_number" : 95.2 ,
"line_items" : {
"0" : {
"description" : 88.5 ,
"amount" : 92.1
},
"1" : {
"description" : 72.3 ,
"amount" : 95.8
}
}
},
"review_flags" : {
"invoice_number" : false ,
"line_items" : {
"0" : {
"description" : false ,
"amount" : false
},
"1" : {
"description" : true ,
"amount" : false
}
}
}
}
}
| Feature | Basic | VLM | Advanced |
| --- | --- | --- | --- |
| Credits/Page | 2-6 | 10 | 15 |
| Speed | Fastest | Fast | Moderate |
| Accuracy | Good | Very Good | Highest |
| Confidence Scores | No | Yes | Yes |
| Review Flagging | No | Yes | Yes |
| Best For | Simple docs | Scanned images | Critical data |
| How to use | Set model param | Set extraction_mode="vlm" | Don’t set model or extraction_mode |
Schema Guidelines
Field Types
"customer_name" : {
"type" : "string" ,
"description" : "Full name of the customer"
}
Use for text data: names, addresses, identifiers.
"total_amount" : {
"type" : "number" ,
"description" : "Total invoice amount in USD"
}
For numeric values: amounts, quantities, percentages.
"line_items" : {
"type" : "array" ,
"description" : "List of invoice line items" ,
"items" : {
"type" : "object" ,
"named_entities" : {
"description" : { "type" : "string" },
"quantity" : { "type" : "number" }
}
}
}
For repeating data: tables, lists, multiple entries.
"billing_address" : {
"type" : "object" ,
"description" : "Customer billing address" ,
"named_entities" : {
"street" : { "type" : "string" },
"city" : { "type" : "string" },
"zip" : { "type" : "string" }
}
}
For structured data groups.
Best Practices
Descriptive Field Names: Use clear, meaningful names (invoice_date, not date1)
Detailed Descriptions: Help the AI understand context and format
Mark Critical Fields: Add them to the required array for automatic review
Consistent Naming: Use snake_case throughout your schema
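Some of these practices can be checked mechanically. The following sketch is a hypothetical lint helper, not something the API provides: it walks a `named_entities` mapping, including nested objects and array items, and reports names that are not snake_case and fields without a description. Whether a name is meaningful (invoice_date versus date1) still needs human judgment.

```python
import re

SNAKE_CASE = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$")


def lint_named_entities(named_entities: dict, path: str = "") -> list[str]:
    """Return warnings for fields that break the naming or description guidelines."""
    warnings = []
    for name, spec in named_entities.items():
        field = f"{path}.{name}" if path else name
        if not SNAKE_CASE.match(name):
            warnings.append(f"{field}: name is not snake_case")
        if not spec.get("description"):
            warnings.append(f"{field}: missing description")
        # Objects nest fields directly; arrays nest them under "items".
        nested = spec.get("named_entities") or spec.get("items", {}).get("named_entities")
        if nested:
            warnings.extend(lint_named_entities(nested, field))
    return warnings


entities = {
    "invoiceDate": {"type": "string"},
    "billing_address": {
        "type": "object",
        "description": "Customer billing address",
        "named_entities": {"zip": {"type": "string", "description": "Postal code"}},
    },
}
for warning in lint_named_entities(entities):
    print(warning)
```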
Error Responses
402 Payment Required
{
"detail" : "Insufficient credits. Please upgrade your plan or wait for your daily credits to refresh."
}
Check your credit balance before processing large batches.
403 Forbidden
{
"detail" : "You don't have access to this document"
}
Document belongs to another user or organization.
500 Internal Server Error
{
"detail" : "Failed to extract information. Please contact support."
}
Extraction processing failed. Retry or contact support if it persists.
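A client can map these responses to actions. The sketch below is one possible approach under stated assumptions: `ExtractionError` and `classify_error` are hypothetical names, only the 500 case is treated as retryable (following the guidance above), and any other status of 400 or higher is handled generically because it is not documented here.

```python
from typing import Optional


class ExtractionError(Exception):
    """Error raised for a failed extraction request."""

    def __init__(self, status_code: int, detail: str, retryable: bool):
        super().__init__(f"{status_code}: {detail}")
        self.status_code = status_code
        self.detail = detail
        self.retryable = retryable


def classify_error(status_code: int, body: dict) -> Optional[ExtractionError]:
    """Return an ExtractionError for a failed response, or None on success."""
    if status_code < 400:
        return None
    detail = body.get("detail", "Unknown error")
    # 402 (insufficient credits) and 403 (no access) will not succeed on retry;
    # 500 may be transient, so the caller can retry before contacting support.
    retryable = status_code == 500
    return ExtractionError(status_code, detail, retryable)


error = classify_error(500, {"detail": "Failed to extract information. Please contact support."})
if error is not None:
    print(error, "| retryable:", error.retryable)
```

With the `requests` library, this would be called as `classify_error(response.status_code, response.json())` after each request.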