Understanding the full extraction lifecycle helps you build robust automation pipelines. Here’s how documents flow through the system:
Phase 1: Document Upload
Upload Files
Submit documents via POST /upload. A single request can batch up to 100 files.

import requests

# Open the files in a with-block so the handles are closed after the upload
with open("invoice1.pdf", "rb") as f1, open("invoice2.pdf", "rb") as f2:
    response = requests.post(
        "https://api.documind.com/api/v1/upload",
        headers={"X-API-Key": "your_api_key"},
        files=[
            ("files", f1),
            ("files", f2),
        ],
    )
response.raise_for_status()
document_ids = response.json()
# Returns: ["uuid1", "uuid2"]
Credit Impact: No credits are charged for upload
Storage: Documents are stored for 30 days
Receive Document IDs
Store the returned UUIDs for extraction requests.

# Map filenames to IDs for tracking
doc_mapping = {
    "invoice1.pdf": document_ids[0],
    "invoice2.pdf": document_ids[1],
}
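For larger jobs, the 100-file limit means splitting the file list into batches and pairing each returned ID with its file. The sketch below shows one way to do that. The helper names `chunked` and `map_ids` are hypothetical, and the pairing assumes that IDs come back in the same order as the uploaded files, which the response format above suggests but does not guarantee.

```python
from typing import Dict, Iterator, List, Sequence

MAX_FILES_PER_UPLOAD = 100  # batch limit stated for POST /upload


def chunked(paths: Sequence[str], size: int = MAX_FILES_PER_UPLOAD) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` paths."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(paths), size):
        yield list(paths[start:start + size])


def map_ids(paths: Sequence[str], document_ids: Sequence[str]) -> Dict[str, str]:
    """Pair each path with its ID (assumes IDs are returned in upload order)."""
    if len(paths) != len(document_ids):
        raise ValueError("expected one document ID per uploaded file")
    return dict(zip(paths, document_ids))
```

Each batch from `chunked` can be sent as one upload request, and `map_ids` applied to that batch's response.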
Phase 2: Schema Definition
Choose one of three approaches:
Use the Documind UI to access built-in templates for common document types such as invoices, receipts, and forms.
Navigate to: Dashboard → Schemas → Templates
✓ Fastest setup
✓ Proven accuracy
✗ Not available via API (use the UI or define a custom schema)

Define your own schema for unique documents:

schema = {
    "named_entities": {
        "policy_number": {
            "type": "string",
            "description": "Insurance policy number",
        },
        "coverage_amount": {
            "type": "number",
            "description": "Total coverage in USD",
        },
    },
    "required": ["policy_number"],
}
✓ Fully customizable
✓ Matches your exact needs
✗ Requires domain knowledge

Auto-generate a schema from a sample document or a description:

Option 1: Generate from Sample Document (Recommended)

import requests

# Upload sample document
with open("sample_policy.pdf", "rb") as f:
    response = requests.post(
        "https://api.documind.com/api/v1/upload",
        headers={"X-API-Key": "your_api_key"},
        files={"files": f},
    )
sample_id = response.json()[0]

# Generate schema from sample
response = requests.post(
    f"https://api.documind.com/api/v1/schema/{sample_id}",
    headers={"X-API-Key": "your_api_key"},
)
schema = response.json()["schema"]
Option 2: Generate from Description

response = requests.post(
    "https://api.documind.com/api/v1/schema/generate-dynamic-schema/",
    headers={"X-API-Key": "your_api_key", "Content-Type": "application/json"},
    json={
        "schema_name": "insurance_policy",
        "schema_description": "Extract policy number, coverage amount, policyholder name, and effective dates",
    },
)
schema = response.json()

Both methods are available in the UI (Dashboard → Schemas → Generate) and via the API.
✓ Quick start
✓ AI-generated from your documents
✗ May need refinement
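Whichever approach produces the schema, a quick structural check before extraction can catch simple mistakes, such as a required field that was never defined. The following is a minimal sketch based only on the schema shape shown in the custom schema example (`named_entities` plus `required`). The function name `check_schema` is hypothetical, and the rules the API itself enforces may differ.

```python
from typing import Any, Dict, List


def check_schema(schema: Dict[str, Any]) -> List[str]:
    """Return a list of structural problems; an empty list means none were found."""
    entities = schema.get("named_entities")
    if not isinstance(entities, dict) or not entities:
        return ["'named_entities' must be a non-empty object"]

    problems: List[str] = []
    # Every field definition should at least declare a type
    for name, spec in entities.items():
        if not isinstance(spec, dict) or "type" not in spec:
            problems.append(f"field '{name}' has no 'type'")
    # Every required field must be defined under named_entities
    for name in schema.get("required", []):
        if name not in entities:
            problems.append(f"required field '{name}' is not defined in 'named_entities'")
    return problems
```

This is especially useful for auto-generated schemas, which may need refinement before use.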
Phase 3: Extraction
Configure the extraction mode based on your requirements:
Mode Selection Decision Tree
Start
├─ Need confidence scores? ──No──> Basic Extraction
│                                  (2-6 credits/page)
│
└─ Yes
   ├─ Scanned/Image documents? ──Yes──> VLM Extraction
   │                                    (10 credits/page)
   │
   └─ No (Native PDF/Text)
      └─ Critical accuracy needed? ──Yes──> Advanced Extraction
                                            (15 credits/page)
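The decision tree can also be expressed as a small function, which is handy when mode selection happens inside a pipeline. This is a sketch only: `choose_mode` and `estimate_credits` are hypothetical helpers, and the per-page rates are copied from the tree. The tree shows no outcome for native documents that need confidence scores but not critical accuracy, so the sketch returns `None` for that branch rather than guessing.

```python
from typing import Optional, Tuple

# Credits per page as (min, max), copied from the decision tree
CREDITS_PER_PAGE = {
    "basic": (2, 6),
    "vlm": (10, 10),
    "advanced": (15, 15),
}


def choose_mode(need_confidence: bool, scanned: bool, critical_accuracy: bool) -> Optional[str]:
    """Walk the decision tree; None means the tree does not cover this case."""
    if not need_confidence:
        return "basic"
    if scanned:
        return "vlm"
    if critical_accuracy:
        return "advanced"
    return None  # branch not shown in the tree


def estimate_credits(mode: str, pages: int) -> Tuple[int, int]:
    """Return the (lowest, highest) credit cost for a document of `pages` pages."""
    low, high = CREDITS_PER_PAGE[mode]
    return pages * low, pages * high
```

For example, `estimate_credits("basic", 10)` gives a range of 20 to 60 credits, because Basic pricing varies per page.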
import requests

# Basic mode: specify a model
response = requests.post(
    f"https://api.documind.com/api/v1/extract/{document_id}",
    headers={"X-API-Key": "your_api_key", "Content-Type": "application/json"},
    json={
        "schema": schema,
        "model": "openai-gpt-4.1",  # or "google-gemini-2.0-flash"
        "prompt": "Extract all invoice fields accurately",
    },
)
result = response.json()

# Basic mode returns no confidence scores and no review flagging,
# so needs_review is always False here
if not result["needs_review"]:
    process_data(result["results"])
import requests

response = requests.post(
    f"https://api.documind.com/api/v1/extract/{document_id}",
    headers={"X-API-Key": "your_api_key", "Content-Type": "application/json"},
    json={
        "schema": schema,
        # Advanced mode - no model or extraction_mode specified
        "review_threshold": 85,
        "prompt": "Extract invoice with high accuracy",
    },
)
result = response.json()

# Includes confidence scores
if result["needs_review"]:
    # Some required fields below threshold
    print("⚠️ Needs human review")
    # Proceed to Phase 4
else:
    # All required fields above threshold
    process_data(result["results"])
Credit Usage: Credits are deducted as number of pages × model cost
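The relationship between `review_threshold` and `needs_review` can be illustrated locally. The sketch below mirrors the rule described in the comments of the Advanced example: a document needs review when any required field falls below the threshold. It is not the service's actual implementation. The function name is hypothetical, and it assumes scores on a 0-100 scale, that a score equal to the threshold passes, and that a missing score counts as below the threshold.

```python
from typing import Dict, Iterable, List


def fields_below_threshold(
    confidence_scores: Dict[str, float],
    required: Iterable[str],
    threshold: float,
) -> List[str]:
    """Return the required fields whose confidence is below `threshold`."""
    # A field with no score is treated as 0 (assumption)
    return [name for name in required if confidence_scores.get(name, 0) < threshold]
```

Under these assumptions, a non-empty result corresponds to `needs_review` being true.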
Phase 4: Review Workflow
Only triggered when needs_review = true:
Identify Flagged Fields
Parse the metadata to find low-confidence fields:

metadata = result["needs_review_metadata"]
for field, flag in metadata["review_flags"].items():
    if flag:
        confidence = metadata["confidence_scores"][field]
        print(f"⚠️ {field}: {confidence}% confidence")
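When many fields are flagged, it can help to hand reviewers a list ordered from least to most confident. A minimal sketch, assuming the `needs_review_metadata` structure used in this step (`review_flags` and `confidence_scores` keyed by field name). The function name `flagged_fields` is hypothetical.

```python
from typing import Any, Dict, List, Tuple


def flagged_fields(result: Dict[str, Any]) -> List[Tuple[str, float]]:
    """Return (field, confidence) pairs for flagged fields, lowest confidence first."""
    metadata = result.get("needs_review_metadata") or {}
    flags = metadata.get("review_flags", {})
    scores = metadata.get("confidence_scores", {})
    # A flagged field without a score is listed with confidence 0.0
    flagged = [(field, scores.get(field, 0.0)) for field, flag in flags.items() if flag]
    return sorted(flagged, key=lambda item: item[1])
```

The function returns an empty list when the response carries no review metadata, as is the case for Basic extractions.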
Notify Review Team
Human review happens in the Documind UI. Direct your review team to:
Dashboard → Review Queue
They can see all pending reviews, view extraction confidence scores, and correct or approve results.

Optionally, send notifications via your own system:

# Your custom notification logic
send_email(
    to="reviewers@company.com",
    subject=f"Review Needed: {filename}",
    body=f"Document {document_id} needs review at https://app.documind.com/review",
)
Poll for Completion
Implement polling to detect when is_reviewed = true:

import requests
import time

def poll_for_review(document_id, poll_interval=10, timeout=600):
    start_time = time.time()
    while (time.time() - start_time) < timeout:
        response = requests.get(
            "https://api.documind.com/api/v1/data/extractions",
            headers={"X-API-Key": "your_api_key"},
            params={"document_id": document_id, "limit": 1},
        )
        data = response.json()
        if data["items"] and data["items"][0]["is_reviewed"]:
            return data["items"][0]["reviewed_results"]
        time.sleep(poll_interval)
    return None

reviewed_data = poll_for_review(document_id)
if reviewed_data:
    process_data(reviewed_data)
else:
    handle_timeout(document_id)
See Polling Pattern for details.
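Human reviews can take much longer than machine extraction, so a fixed 10-second interval may generate many unnecessary requests. One possible variation is to grow the interval between polls. The sketch below is an assumption-laden illustration, not the approach from the Polling Pattern guide: `poll_with_backoff` is a hypothetical helper, and `fetch` stands in for any function that returns the reviewed results or `None` when the review is still pending.

```python
import time
from typing import Any, Callable, Optional


def poll_with_backoff(
    fetch: Callable[[], Optional[Any]],
    timeout: float = 600.0,
    initial_interval: float = 5.0,
    max_interval: float = 60.0,
    factor: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Optional[Any]:
    """Call `fetch` until it returns a non-None value or `timeout` seconds pass."""
    deadline = clock() + timeout
    interval = initial_interval
    while True:
        value = fetch()
        if value is not None:
            return value
        remaining = deadline - clock()
        if remaining <= 0:
            return None  # timed out
        # Never sleep past the deadline
        sleep(min(interval, remaining))
        # Grow the wait, capped at max_interval
        interval = min(interval * factor, max_interval)
```

In practice, `fetch` would wrap the GET /data/extractions call shown above and return `reviewed_results` once `is_reviewed` is true. The `sleep` and `clock` parameters exist only so the loop can be tested without real waiting.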
Use Reviewed Results
Once is_reviewed = true, use reviewed_results instead of results:

import requests

response = requests.get(
    "https://api.documind.com/api/v1/data/extractions",
    headers={"X-API-Key": "your_api_key"},
    params={"document_id": document_id, "limit": 1},
)
extraction = response.json()["items"][0]

if extraction["is_reviewed"]:
    # Use human-corrected data
    data = extraction["reviewed_results"]
else:
    # Use original AI extraction
    data = extraction["results"]
Phase 5: Data Processing
Process the final data in your automation:
def process_invoice_data(data):
    """Process extracted/reviewed invoice data."""
    # Validate required fields (explicit check; assert statements are skipped under python -O)
    missing = [key for key in ("invoice_number", "total_amount") if key not in data]
    if missing:
        raise ValueError(f"Missing required fields: {missing}")

    # Update your system
    create_accounting_record(
        invoice_number=data["invoice_number"],
        amount=data["total_amount"],
        vendor=data.get("vendor_name"),
        line_items=data.get("line_items", []),
    )

    # Archive original document
    archive_document(data["document_id"])
    return True
Complete Example
Here’s a full workflow implementation with real API calls:
import requests
import time
from typing import Dict


class DocumindWorkflow:
    """Complete extraction workflow manager using Documind API."""

    def __init__(self, api_key: str):
        self.api_key = api_key
        self.base_url = "https://api.documind.com/api/v1"
        self.headers = {"X-API-Key": api_key}

    def process_document(
        self,
        file_path: str,
        schema: Dict,
        mode: str = "advanced",
    ) -> Dict:
        """
        Complete workflow: upload → extract → review → process

        Args:
            file_path: Path to document file
            schema: Extraction schema
            mode: 'basic', 'vlm', or 'advanced'

        Returns:
            Final extracted data (original or reviewed)
        """
        # Phase 1: Upload via API
        print(f"📤 Uploading {file_path}...")
        with open(file_path, "rb") as f:
            response = requests.post(
                f"{self.base_url}/upload",
                headers=self.headers,
                files={"files": f},
            )
        response.raise_for_status()
        doc_id = response.json()[0]
        print(f"✓ Uploaded: {doc_id}")

        # Phase 2 & 3: Extract via API
        print(f"🔍 Extracting data ({mode} mode)...")
        config = {"schema": schema, "prompt": "Extract all data accurately"}
        if mode == "basic":
            config["model"] = "openai-gpt-4.1"
        elif mode == "vlm":
            config["extraction_mode"] = "vlm"
            config["review_threshold"] = 80
        else:  # advanced
            config["review_threshold"] = 85

        response = requests.post(
            f"{self.base_url}/extract/{doc_id}",
            headers={**self.headers, "Content-Type": "application/json"},
            json=config,
        )
        response.raise_for_status()
        result = response.json()
        print("✓ Extraction complete")

        # Phase 4: Handle review if needed
        if result["needs_review"]:
            print("⚠️ Document needs review - direct team to UI: https://app.documind.com/review")

            # Poll for review completion
            print("⏳ Waiting for human review...")
            start = time.time()
            while (time.time() - start) < 600:  # 10 min timeout
                response = requests.get(
                    f"{self.base_url}/data/extractions",
                    headers=self.headers,
                    params={"document_id": doc_id, "limit": 1},
                )
                data = response.json()
                if data["items"] and data["items"][0]["is_reviewed"]:
                    print("✓ Review completed")
                    return data["items"][0]["reviewed_results"]
                time.sleep(10)
            raise TimeoutError("Review not completed in time")
        else:
            print("✓ No review needed")
            return result["results"]
# Usage
workflow = DocumindWorkflow(api_key="your_api_key")

schema = {
    "named_entities": {
        "invoice_number": {"type": "string"},
        "total_amount": {"type": "number"},
    },
    "required": ["invoice_number", "total_amount"],
}

# Process document through complete workflow
try:
    data = workflow.process_document(
        file_path="invoice.pdf",
        schema=schema,
        mode="advanced",
    )
    print(f"✓ Final data: {data}")
    # Process your data here
except TimeoutError:
    print("Review took too long, escalating...")
except Exception as e:
    print(f"Error: {e}")
Troubleshooting
Problem: 500 Internal Server Error on upload
Solutions:
Verify file is not corrupted
Check file size < 50MB
Ensure file format is supported
Retry with exponential backoff
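The last suggestion, retrying with exponential backoff, could look like the sketch below. `retry_with_backoff` is a hypothetical helper, not part of any Documind SDK. It retries on any exception for brevity. In a real pipeline you would likely narrow this to server-side (5xx) failures, since a corrupted or oversized file will fail again no matter how often it is resent.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    operation: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `operation`, retrying failures with exponentially growing delays."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Delays of 1x, 2x, 4x ... the base, plus up to 10% random jitter
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay * 0.1))
    raise ValueError("max_attempts must be at least 1")
```

Here `operation` would be a function that performs the upload and calls `response.raise_for_status()`, so that an HTTP 500 surfaces as an exception and triggers a retry.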
All Extractions Need Review
Problem: Polling times out waiting for review
Solutions:
Increase timeout to match your review SLA
Implement email notifications to reviewers
Check review queue isn’t backlogged
Consider async processing instead of blocking
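One way to avoid blocking on a single document is to record which documents are awaiting review and check all of them periodically, for example from a scheduled job. The sketch below illustrates the idea with an in-memory set. `PendingReviews` and the `fetch_reviewed` callback are hypothetical names, and a production system would persist the pending IDs rather than hold them in memory.

```python
from typing import Any, Callable, Dict, Optional, Set


class PendingReviews:
    """Track documents awaiting review and collect results without blocking."""

    def __init__(self) -> None:
        self._pending: Set[str] = set()

    def add(self, document_id: str) -> None:
        """Register a document whose extraction was flagged for review."""
        self._pending.add(document_id)

    @property
    def pending(self) -> Set[str]:
        """Return a copy of the IDs still awaiting review."""
        return set(self._pending)

    def check_once(self, fetch_reviewed: Callable[[str], Optional[Any]]) -> Dict[str, Any]:
        """Query every pending document once and return those that are finished."""
        finished: Dict[str, Any] = {}
        for document_id in sorted(self._pending):
            reviewed = fetch_reviewed(document_id)  # None means still pending
            if reviewed is not None:
                finished[document_id] = reviewed
        # Stop tracking documents whose review has completed
        self._pending.difference_update(finished)
        return finished
```

Each scheduled run calls `check_once` with a function that queries the extractions endpoint for one document. Finished documents go on to data processing while the rest stay queued for the next run.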
Next Steps
Upload Documents: Detailed upload endpoint documentation
Extract Data: Complete extraction API reference
Polling Pattern: Robust polling implementation guide
Automation Patterns: Production-ready automation examples