Complete Extraction Workflow
Understanding the full extraction lifecycle helps you build robust automation pipelines. Here’s how documents flow through the system:
Phase 1: Document Upload
1. Upload Files
Submit documents via POST /upload. Batches of up to 100 files are supported.
Credit Impact: No credits charged for upload.
Storage: Documents stored for 30 days.
2. Receive Document IDs
Store the returned UUIDs for extraction requests.
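Batching and ID collection can be sketched as follows. This assumes a hypothetical upload_batch callable that wraps one POST /upload request and returns the document UUIDs for that batch in order; the HTTP details are left out because this guide does not document the request or response format.

```python
from typing import Callable, Iterator, List, Sequence

MAX_BATCH = 100  # POST /upload accepts up to 100 files per request


def chunked(items: Sequence[str], size: int = MAX_BATCH) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def upload_all(paths: Sequence[str],
               upload_batch: Callable[[List[str]], List[str]]) -> List[str]:
    """Upload files in batches and collect the returned document IDs in order.

    `upload_batch` is hypothetical: it should send one POST /upload request
    for the given paths and return the document UUIDs from the response.
    """
    document_ids: List[str] = []
    for batch in chunked(paths):
        document_ids.extend(upload_batch(batch))
    return document_ids
```

Keeping the transport behind a callable lets you test the batching logic without network access; 250 files, for example, become three requests of 100, 100, and 50.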
Phase 2: Schema Definition
Choose one of three approaches:
- Predefined Schema (UI only)
- Custom Schema
- Generated Schema
Predefined Schema (UI only)
Use the Documind UI to access built-in templates for common document types such as invoices, receipts, and forms. Navigate to: Dashboard → Schemas → Templates.
✓ Proven accuracy
✗ Not available via API (use the UI or define a custom schema)
Phase 3: Data Extraction
Configure the extraction mode based on your requirements.
Mode Selection Decision Tree
Basic Extraction Example
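A minimal sketch of assembling a Basic-mode request body, assuming the extraction request takes a JSON payload. The keys document_ids, schema, and mode are hypothetical placeholders rather than confirmed Documind API fields, and the extraction endpoint is not shown. Because Phase 1 returns UUIDs, the sketch rejects any ID that is not a well-formed UUID before sending anything.

```python
import json
import uuid
from typing import Any, Dict, List


def build_basic_request(document_ids: List[str], schema: Dict[str, Any]) -> str:
    """Return a JSON body for a Basic-mode extraction request.

    Hypothetical payload: the keys "document_ids", "schema", and "mode" are
    placeholders, not confirmed Documind API field names.
    """
    if not document_ids:
        raise ValueError("at least one document ID is required")
    for doc_id in document_ids:
        uuid.UUID(doc_id)  # raises ValueError if the ID is not a well-formed UUID
    return json.dumps({"document_ids": document_ids, "schema": schema, "mode": "basic"})
```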
Advanced Extraction Example
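A similar sketch for an Advanced-mode request adds review_threshold, the parameter discussed under Troubleshooting. Tying review_threshold to Advanced mode is an assumption (Troubleshooting suggests Basic mode when reviews aren’t needed), the 0 to 100 range check is inferred from the example values 85 and 75, and the default of 85 mirrors that example rather than a confirmed API default. The other keys remain hypothetical.

```python
import json
from typing import Any, Dict, List


def build_advanced_request(document_ids: List[str], schema: Dict[str, Any],
                           review_threshold: float = 85) -> str:
    """Return a JSON body for an Advanced-mode extraction request.

    review_threshold is named in this guide; the 0-100 scale is assumed from
    the example values 85 and 75. The other keys are hypothetical placeholders.
    """
    if not document_ids:
        raise ValueError("at least one document ID is required")
    if not 0 <= review_threshold <= 100:
        raise ValueError("review_threshold must be between 0 and 100")
    payload = {
        "document_ids": document_ids,
        "schema": schema,
        "mode": "advanced",
        "review_threshold": review_threshold,  # fields scoring below this are flagged (assumed)
    }
    return json.dumps(payload)
```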
Phase 4: Review Workflow
Only triggered when needs_review = true:
1. Identify Flagged Fields
Parse the metadata to find low-confidence fields:
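A sketch of this step, assuming a hypothetical metadata layout in which per-field confidence scores live under a field_confidence key and are compared against the same review_threshold used at extraction time. Adjust the key names to match the metadata your extraction actually returns.

```python
from typing import Any, Dict, List, Tuple


def flagged_fields(metadata: Dict[str, Any],
                   review_threshold: float) -> List[Tuple[str, float]]:
    """Return (field, confidence) pairs below review_threshold, lowest first.

    Assumes a hypothetical layout: {"field_confidence": {field_name: score}}.
    """
    scores = metadata.get("field_confidence", {})
    low = [(name, float(score)) for name, score in scores.items()
           if score < review_threshold]
    return sorted(low, key=lambda item: item[1])  # worst fields first for reviewers
```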
2. Notify Review Team
Human review happens in the Documind UI. Direct your review team to: Dashboard → Review Queue.
There they can see all pending reviews, view extraction confidence scores, and correct or approve results.
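If reviewers should not have to watch the queue, one option (suggested under Troubleshooting) is an email notification. The sketch below only builds the message with Python’s standard email.message.EmailMessage; the sender and recipient addresses are placeholders, and the flagged argument is assumed to be a list of (field, confidence) pairs.

```python
from email.message import EmailMessage
from typing import List, Tuple


def build_review_notice(document_id: str, flagged: List[Tuple[str, float]],
                        sender: str, recipients: List[str]) -> EmailMessage:
    """Build (but do not send) a plain-text notice pointing reviewers to the queue."""
    msg = EmailMessage()
    msg["Subject"] = f"Documind review needed: {document_id}"
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    lines = [
        f"Document {document_id} needs review (needs_review = true).",
        "Open Dashboard → Review Queue in the Documind UI.",
        "",
        "Low-confidence fields:",
    ]
    lines += [f"  - {name}: {score:.1f}" for name, score in flagged]
    msg.set_content("\n".join(lines))
    return msg
```

To send it, pass the message to smtplib.SMTP(host).send_message(msg) using your own mail host.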
3. Poll for Completion
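A blocking polling loop can be sketched as below. The fetch callable is hypothetical: it stands in for whatever request retrieves the current extraction record, which this guide does not document. The clock and sleep parameters default to the real time functions and exist only so the loop can be tested without waiting.

```python
import time
from typing import Any, Callable, Dict


def wait_for_review(fetch: Callable[[], Dict[str, Any]], timeout_s: float,
                    interval_s: float = 30.0,
                    clock: Callable[[], float] = time.monotonic,
                    sleep: Callable[[float], None] = time.sleep) -> Dict[str, Any]:
    """Poll until the extraction reports is_reviewed = true, or raise TimeoutError."""
    deadline = clock() + timeout_s
    while True:
        record = fetch()
        if record.get("is_reviewed"):
            return record
        remaining = deadline - clock()
        if remaining <= 0:
            raise TimeoutError("review not completed before timeout")
        sleep(min(interval_s, remaining))  # never sleep past the deadline
```

Set timeout_s to match your review SLA (see Reviews Never Complete under Troubleshooting).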
4. Use Reviewed Results
Once is_reviewed = true, use reviewed_results instead of results.
Phase 5: Data Processing
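Phase 5 should read from a single authoritative payload. The helper below is a sketch that assumes the extraction record is a dictionary carrying the needs_review, is_reviewed, results, and reviewed_results keys named in this guide. It returns reviewed_results for reviewed documents and results otherwise; raising on documents still awaiting review is a design choice of the sketch, so unreviewed low-confidence data never reaches your automation.

```python
from typing import Any, Dict


def final_results(extraction: Dict[str, Any]) -> Dict[str, Any]:
    """Return the payload downstream automation should consume."""
    if extraction.get("is_reviewed"):
        return extraction["reviewed_results"]  # human-corrected values win
    if extraction.get("needs_review"):
        raise ValueError("extraction is still awaiting human review")
    return extraction["results"]  # no review was triggered
```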
Process the final data in your automation.
Complete Example
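The sketch below strings the five phases together for one set of files. It is written against a hypothetical DocumindClient interface (upload, extract, get_extraction) because only POST /upload is documented in this guide; the extraction record’s id key and the default threshold of 85 are also assumptions. Polling is sequential and blocking, which keeps the example short.

```python
import time
from typing import Any, Callable, Dict, List, Protocol


class DocumindClient(Protocol):
    """Hypothetical client interface; only POST /upload is documented in this guide."""

    def upload(self, paths: List[str]) -> List[str]: ...

    def extract(self, document_id: str, schema: Dict[str, Any],
                review_threshold: float) -> Dict[str, Any]: ...

    def get_extraction(self, extraction_id: str) -> Dict[str, Any]: ...


def run_pipeline(client: DocumindClient, paths: List[str], schema: Dict[str, Any],
                 handle: Callable[[str, Dict[str, Any]], None],
                 review_threshold: float = 85,
                 poll_interval_s: float = 30.0,
                 review_timeout_s: float = 3600.0,
                 sleep: Callable[[float], None] = time.sleep) -> None:
    # Phase 1: upload in batches of at most 100 files and keep the document IDs.
    document_ids: List[str] = []
    for start in range(0, len(paths), 100):
        document_ids.extend(client.upload(paths[start:start + 100]))

    for doc_id in document_ids:
        # Phase 3: extract with the schema chosen in Phase 2.
        extraction = client.extract(doc_id, schema, review_threshold)

        # Phase 4: if flagged, poll until a reviewer has finished.
        waited = 0.0
        while extraction.get("needs_review") and not extraction.get("is_reviewed"):
            if waited >= review_timeout_s:
                raise TimeoutError(f"review of {doc_id} did not complete in time")
            sleep(poll_interval_s)
            waited += poll_interval_s
            extraction = client.get_extraction(extraction["id"])  # "id" key is assumed

        # Phase 5: hand the authoritative payload to your automation.
        if extraction.get("is_reviewed"):
            data = extraction["reviewed_results"]
        else:
            data = extraction["results"]
        handle(doc_id, data)
```

For large batches, the troubleshooting advice to prefer async processing over blocking applies.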
Here’s a full workflow implementation with real API calls:
Troubleshooting
Upload Fails
Problem: 500 Internal Server Error on upload
Solutions:
- Verify the file is not corrupted
- Check file size < 50MB
- Ensure file format is supported
- Retry with exponential backoff
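The size check and retry advice can be sketched as follows. ServerError is a hypothetical exception your HTTP layer would raise on a 500 response, and reading the 50MB limit as 50 * 1024 * 1024 bytes is an assumption. The retry uses capped exponential backoff with full jitter (a random delay between zero and the exponential bound). Retries only help with transient failures; a corrupted or unsupported file will fail on every attempt.

```python
import os
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")
MAX_FILE_BYTES = 50 * 1024 * 1024  # "< 50MB"; binary megabytes assumed


class ServerError(Exception):
    """Hypothetical error your HTTP layer raises for a 500 response."""


def check_size(path: str) -> None:
    """Reject files at or above the size limit before uploading."""
    size = os.path.getsize(path)
    if size >= MAX_FILE_BYTES:
        raise ValueError(f"{path} is {size} bytes; uploads must be under 50MB")


def with_backoff(call: Callable[[], T], attempts: int = 5, base_s: float = 1.0,
                 cap_s: float = 30.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Retry `call` on ServerError using capped exponential backoff with full jitter."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return call()
        except ServerError:
            if attempt == attempts - 1:
                raise  # out of retries: surface the last error
            # Wait a random time in [0, min(cap, base * 2**attempt)] seconds.
            sleep(random.uniform(0, min(cap_s, base_s * 2 ** attempt)))
    raise RuntimeError("unreachable")
```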
Extraction Timeout
Problem: Extraction takes too long or times out
Solutions:
- Switch to Basic mode for faster processing
- Reduce document page count
- Simplify schema (fewer fields)
- Contact support if issue persists
All Extractions Need Review
Problem: Review threshold too strict
Solutions:
- Lower review_threshold from 85 to 75
- Mark fewer fields as required
- Improve schema descriptions
- Use Basic mode if reviews aren’t needed
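A quick check of a schema against these suggestions might look like the sketch below. The schema layout (a fields list whose entries have name, description, and required keys) is hypothetical, and the limits max_fields and min_description_words are illustrative defaults rather than Documind recommendations. That required fields influence needs_review is likewise an assumption drawn from the advice above.

```python
from typing import Any, Dict, List


def lint_schema(schema: Dict[str, Any], max_fields: int = 25,
                min_description_words: int = 4) -> List[str]:
    """Return warnings that map to the review-rate advice in this guide.

    Hypothetical layout: {"fields": [{"name", "description", "required"}, ...]}.
    """
    fields = schema.get("fields", [])
    warnings: List[str] = []
    if len(fields) > max_fields:
        warnings.append(f"{len(fields)} fields; consider simplifying the schema")
    required = [f for f in fields if f.get("required")]
    if fields and len(required) == len(fields):
        warnings.append("every field is required; mark only essential fields as required")
    for f in fields:
        words = len(str(f.get("description", "")).split())
        if words < min_description_words:
            warnings.append(f"field '{f.get('name', '?')}' has a thin description ({words} words)")
    return warnings
```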
Reviews Never Complete
Problem: Polling times out waiting for review
Solutions:
- Increase timeout to match your review SLA
- Implement email notifications to reviewers
- Check that the review queue isn’t backlogged
- Consider async processing instead of blocking
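Instead of blocking on one document at a time, polling can run concurrently with asyncio. Here fetch is a hypothetical async callable that returns the current extraction record. asyncio.wait_for enforces a per-document timeout (set it to your review SLA; the 24-hour default is illustrative), and asyncio.gather with return_exceptions=True returns a TimeoutError instance for any review that does not finish in time rather than aborting the whole batch.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List

Fetch = Callable[[str], Awaitable[Dict[str, Any]]]


async def wait_reviewed(fetch: Fetch, extraction_id: str,
                        interval_s: float, timeout_s: float) -> Dict[str, Any]:
    """Poll one extraction until is_reviewed is true, yielding to other tasks while waiting."""

    async def poll() -> Dict[str, Any]:
        while True:
            record = await fetch(extraction_id)
            if record.get("is_reviewed"):
                return record
            await asyncio.sleep(interval_s)

    # Raises asyncio.TimeoutError if the review outlasts timeout_s.
    return await asyncio.wait_for(poll(), timeout=timeout_s)


async def wait_all(fetch: Fetch, extraction_ids: List[str],
                   interval_s: float = 30.0, timeout_s: float = 86400.0) -> List[Any]:
    """Wait for many reviews concurrently; each entry is a record or the exception raised."""
    return await asyncio.gather(
        *(wait_reviewed(fetch, eid, interval_s, timeout_s) for eid in extraction_ids),
        return_exceptions=True,
    )
```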