Endpoint
GET https://api.documind.com/api/v1/data/extractions
Retrieve a list of extractions with flexible filtering, sorting, and pagination. Essential for polling review status and managing extraction history.
Authentication
X-API-Key
Your unique API key, passed in the X-API-Key request header.
Query Parameters
Filters
document_id
Filter by a specific document UUID. The most efficient option for single-document queries.
?document_id=550e8400-e29b-41d4-a716-446655440000
status
Filter by extraction status. Options: completed, processing, failed, pending
needs_review
Filter by review requirement.
?needs_review=true # Only extractions needing review
?needs_review=false # Only extractions not needing review
is_reviewed
Filter by review completion status.
?is_reviewed=true # Only reviewed extractions
?is_reviewed=false # Not yet reviewed
original_filename
Filter by exact filename match.
?original_filename=invoice-2024-001.pdf
created_after
Return only extractions created after this timestamp (ISO 8601 format).
?created_after=2024-01-15T00:00:00Z
created_before
Return only extractions created before this timestamp (ISO 8601 format).
?created_before=2024-01-31T23:59:59Z
organization_id
Filter by organization UUID. Admin-only parameter.
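When these filters are built in Python, note that the `requests` library serializes booleans as `True`/`False`, while the query strings above use lowercase `true`/`false`. The sketch below normalizes values before sending. `build_filter_params` is a hypothetical helper, not part of any official SDK, and whether the API also accepts capitalized booleans is not stated in this reference.

```python
from datetime import datetime, timezone


def build_filter_params(**filters):
    """Turn Python values into query-string-ready filter parameters.

    Hypothetical helper: drops unset (None) filters, lowercases booleans,
    and formats datetimes as ISO 8601 UTC timestamps.
    """
    params = {}
    for name, value in filters.items():
        if value is None:
            continue  # Omit filters the caller did not set
        if isinstance(value, bool):
            params[name] = "true" if value else "false"
        elif isinstance(value, datetime):
            # Assumption: naive datetimes are already expressed in UTC
            if value.tzinfo is None:
                value = value.replace(tzinfo=timezone.utc)
            utc_value = value.astimezone(timezone.utc)
            params[name] = utc_value.strftime("%Y-%m-%dT%H:%M:%SZ")
        else:
            params[name] = str(value)
    return params


params = build_filter_params(
    needs_review=True,
    is_reviewed=False,
    status=None,
    created_after=datetime(2024, 1, 15, tzinfo=timezone.utc),
)
print(params)
```

The resulting dictionary can be passed directly as the `params` argument of `requests.get`.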
Sorting
sort_by
string
default: "created_at"
Field to sort by. Options: created_at, updated_at, status, original_filename
sort_order
Sort direction. Options: asc (ascending), desc (descending)
Pagination
skip
Number of results to skip. Use for pagination.
?skip=20 # Skip the first 20 results
limit
Maximum number of results to return. Range: 1-100.
?limit=50 # Return at most 50 results
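Because the endpoint paginates by offset, a page-based UI has to translate page numbers into `skip` and `limit`. The following is a minimal sketch of that arithmetic; `page_params` and its default page size of 20 are illustrative assumptions, and only the 1-100 range for `limit` comes from this reference.

```python
def page_params(page, page_size=20):
    """Convert a 1-based page number into skip/limit query parameters."""
    if page < 1:
        raise ValueError("page must be 1 or greater")
    # The documented range for limit is 1-100, so clamp the page size to it
    limit = max(1, min(page_size, 100))
    return {"skip": (page - 1) * limit, "limit": limit}


print(page_params(1))
print(page_params(3, page_size=50))
```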
Response
items
Array of extraction objects matching the query.
total
Total number of extractions matching the filters (before pagination).
skip
Number of results skipped.
limit
Maximum number of results returned.
Each object in items contains the following fields:
id
Unique extraction ID (UUID).
document_id
UUID of the source document.
original_filename
Name of the uploaded file.
status
Processing status: completed, processing, failed, pending.
created_at
ISO 8601 timestamp of extraction creation.
updated_at
ISO 8601 timestamp of last update.
needs_review
Whether the extraction requires human review.
is_reviewed
Whether the extraction has been reviewed by a human.
reviewed_at
ISO 8601 timestamp of review completion. null if not reviewed.
reviewed_by
UUID of the user who performed the review. null if not reviewed.
results
Extracted data matching the schema.
reviewed_results
Corrected data after human review. null if not reviewed. Use this for automation when is_reviewed is true.
needs_review_metadata
Confidence scores and review flags. Only present in Advanced/VLM extractions.
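The guidance to prefer the corrected data once a review is complete can be captured in a small helper. This is a sketch under the field names described above; `effective_results` and `flagged_fields` are hypothetical helpers, and the `review_flags` layout is assumed to match the example response on this page.

```python
def effective_results(extraction):
    """Return reviewed_results once a review is complete, else the raw results."""
    if extraction.get("is_reviewed") and extraction.get("reviewed_results") is not None:
        return extraction["reviewed_results"]
    return extraction.get("results")


def flagged_fields(extraction):
    """List field names whose review flag is set; empty when metadata is absent."""
    metadata = extraction.get("needs_review_metadata") or {}
    flags = metadata.get("review_flags") or {}
    return sorted(name for name, flagged in flags.items() if flagged)


extraction = {
    "is_reviewed": True,
    "results": {"invoice_number": "INV-2024-001", "total_amount": 1250.00},
    "reviewed_results": {"invoice_number": "INV-2024-001", "total_amount": 1275.00},
    "needs_review_metadata": {
        "review_flags": {"invoice_number": False, "total_amount": True},
    },
}
print(effective_results(extraction))
print(flagged_fields(extraction))
```

Reading `needs_review_metadata` defensively matters because the field is only present in Advanced/VLM extractions.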
Examples
Poll for Review Completion
Check if a specific document has been reviewed:
curl "https://api.documind.com/api/v1/data/extractions?document_id=550e8400-e29b-41d4-a716-446655440000&limit=1" \
-H 'X-API-Key: YOUR_API_KEY'
{
  "items": [
    {
      "id": "extr_abc123",
      "document_id": "550e8400-e29b-41d4-a716-446655440000",
      "original_filename": "invoice-2024-001.pdf",
      "status": "completed",
      "created_at": "2024-01-15T10:30:00Z",
      "updated_at": "2024-01-15T10:35:00Z",
      "needs_review": true,
      "is_reviewed": true,
      "reviewed_at": "2024-01-15T10:35:00Z",
      "reviewed_by": "user_xyz789",
      "results": {
        "invoice_number": "INV-2024-001",
        "total_amount": 1250.00
      },
      "reviewed_results": {
        "invoice_number": "INV-2024-001",
        "total_amount": 1275.00
      },
      "needs_review_metadata": {
        "confidence_scores": {
          "invoice_number": 95.2,
          "total_amount": 78.5
        },
        "review_flags": {
          "invoice_number": false,
          "total_amount": true
        }
      }
    }
  ],
  "total": 1,
  "skip": 0,
  "limit": 1
}
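A single request only reports the current state, so polling usually means repeating it until `is_reviewed` becomes true. The sketch below shows one possible loop. `wait_for_review` and the `fetch_page` callable are hypothetical: `fetch_page` stands in for whatever function sends the query parameters to this endpoint and returns the parsed JSON body, and the interval and timeout values are arbitrary.

```python
import time


def wait_for_review(fetch_page, document_id, interval=5.0, timeout=300.0, sleep=time.sleep):
    """Poll until the extraction for document_id is reviewed or the timeout elapses.

    fetch_page is any callable that takes a params dict and returns the parsed
    JSON body of the list endpoint (hypothetical; wire it to your HTTP client).
    Returns the reviewed extraction, or None if the timeout is reached first.
    """
    deadline = time.monotonic() + timeout
    while True:
        page = fetch_page({"document_id": document_id, "limit": 1})
        items = page.get("items", [])
        if items and items[0].get("is_reviewed"):
            return items[0]
        if time.monotonic() + interval > deadline:
            return None  # Another wait would overshoot the deadline
        sleep(interval)


# Demonstration with a stub that reports "reviewed" on the third poll
calls = {"count": 0}


def fake_fetch(params):
    calls["count"] += 1
    reviewed = calls["count"] >= 3
    return {"items": [{"document_id": params["document_id"], "is_reviewed": reviewed}]}


result = wait_for_review(fake_fetch, "550e8400-e29b-41d4-a716-446655440000",
                         interval=0.01, timeout=1.0)
print(result)
```

Passing the fetch function in as an argument keeps the loop independent of any particular HTTP client and makes it easy to test with a stub.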
List Pending Reviews
Get all extractions waiting for review:
curl "https://api.documind.com/api/v1/data/extractions?needs_review=true&is_reviewed=false&sort_by=created_at&sort_order=desc&limit=50" \
-H 'X-API-Key: YOUR_API_KEY'
import requests

API_KEY = "YOUR_API_KEY"

response = requests.get(
    "https://api.documind.com/api/v1/data/extractions",
    headers={"X-API-Key": API_KEY},
    params={
        "needs_review": True,
        "is_reviewed": False,
        "sort_by": "created_at",
        "sort_order": "desc",
        "limit": 50,
    },
)
response.raise_for_status()  # Surface 4xx/5xx errors instead of parsing an error body
pending = response.json()
print(f"Pending reviews: {pending['total']}")
for extraction in pending["items"]:
    print(f"- {extraction['original_filename']} ({extraction['created_at']})")
Filter by Date Range
Get completed extractions from the last 24 hours:
from datetime import datetime, timedelta, timezone

import requests

# Timezone-aware UTC timestamp, formatted as ISO 8601 with a trailing "Z"
yesterday = (datetime.now(timezone.utc) - timedelta(days=1)).strftime("%Y-%m-%dT%H:%M:%SZ")
response = requests.get(
    "https://api.documind.com/api/v1/data/extractions",
    headers={"X-API-Key": API_KEY},
    params={
        "created_after": yesterday,
        "status": "completed",
        "limit": 100,
    },
)
response.raise_for_status()
recent = response.json()
print(f"Extractions in last 24h: {recent['total']}")
Paginate Through All Results
Iterate through all extractions:
import requests


def get_all_extractions(api_key, filters=None):
    """
    Fetch all extractions matching filters, handling pagination.
    """
    all_extractions = []
    skip = 0
    limit = 100
    while True:
        params = {
            "skip": skip,
            "limit": limit,
            **(filters or {}),
        }
        response = requests.get(
            "https://api.documind.com/api/v1/data/extractions",
            headers={"X-API-Key": api_key},
            params=params,
        )
        response.raise_for_status()  # Stop on errors rather than reading a missing "items" key
        data = response.json()
        all_extractions.extend(data["items"])
        # A short page means we've fetched everything
        if len(data["items"]) < limit:
            break
        skip += limit
    return all_extractions


# Usage
filters = {
    "status": "completed",
    "created_after": "2024-01-01T00:00:00Z",
}
all_completed = get_all_extractions(API_KEY, filters)
print(f"Total completed extractions: {len(all_completed)}")
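For large result sets it can be preferable to process extractions as they arrive instead of collecting them all in memory. The variant below is a sketch of that idea as a generator. It uses the `total` field from the response to stop without an extra empty request. `iter_extractions` and `fetch_page` are hypothetical names, and `fetch_page` is assumed to send the parameters to this endpoint and return the parsed JSON body.

```python
def iter_extractions(fetch_page, filters=None, limit=100):
    """Yield extractions one at a time, requesting pages only as needed."""
    skip = 0
    while True:
        page = fetch_page({**(filters or {}), "skip": skip, "limit": limit})
        items = page.get("items", [])
        yield from items
        skip += len(items)
        total = page.get("total")
        # Stop on an empty page, or once the reported total has been reached
        if not items or (total is not None and skip >= total):
            break


# Demonstration against an in-memory stub holding 250 fake extractions
DATA = [{"id": f"extr_{i}"} for i in range(250)]
requested_offsets = []


def fake_fetch(params):
    requested_offsets.append(params["skip"])
    start = params["skip"]
    return {"items": DATA[start:start + params["limit"]], "total": len(DATA)}


ids = [extraction["id"] for extraction in iter_extractions(fake_fetch)]
print(len(ids), requested_offsets)
```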
Common Query Patterns
Pattern 1: Polling for Review
# Query by document_id to check specific extraction
params = {
    "document_id": document_id,
    "limit": 1
}
Pattern 2: List All Pending Reviews
# Get extractions waiting for human review
params = {
    "needs_review": True,
    "is_reviewed": False,
    "sort_by": "created_at",
    "sort_order": "asc"  # Oldest first
}
Pattern 3: Get Completed Reviews
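# Get reviewed extractions created since midnight UTC today
# (created_after filters on creation time, not on review time)
from datetime import datetime, timezone
midnight = datetime.now(timezone.utc).replace(hour=0, minute=0, second=0, microsecond=0)
params = {
    "is_reviewed": True,
    "created_after": midnight.strftime("%Y-%m-%dT%H:%M:%SZ")
}
Pattern 4: Find Failed Extractions
# Find failed extractions for retry
# (yesterday is the ISO 8601 timestamp computed in the date range example)
params = {
    "status": "failed",
    "created_after": yesterday,
    "sort_by": "created_at",
    "sort_order": "desc"
}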
Pattern 5: Organization-Wide Query (Admin)
# Get all extractions for organization
params = {
    "organization_id": "org_uuid",
    "created_after": "2024-01-01T00:00:00Z",
    "limit": 100
}
Response Codes
200 OK
Successful query, returns paginated results.
400 Bad Request
Invalid query parameters:
{
  "detail": "Invalid sort field: invalid_field"
}
403 Forbidden
Insufficient permissions:
{
  "detail": "You don't have permission to access these extractions"
}
500 Internal Server Error
Server-side error:
{
  "detail": "Failed to retrieve extractions. Please try again later."
}
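The status codes above suggest different reactions: 400 and 403 point to a problem with the request or the API key and will not succeed on retry, while a 500 may be transient. The sketch below illustrates one way to encode that. `ExtractionAPIError`, `list_with_retry`, and the `send` callable (assumed to return a `(status_code, body)` tuple) are all hypothetical, and the retry count and backoff values are arbitrary choices.

```python
import time


class ExtractionAPIError(Exception):
    """Raised when the list endpoint returns an error that should not be retried."""

    def __init__(self, status_code, detail):
        super().__init__(f"{status_code}: {detail}")
        self.status_code = status_code
        self.detail = detail


def list_with_retry(send, params, max_attempts=3, backoff=0.5, sleep=time.sleep):
    """Call send(params) and retry only on 500, with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        status_code, body = send(params)
        if status_code == 200:
            return body
        detail = body.get("detail", "") if isinstance(body, dict) else ""
        # Retry only the server-side error, and only while attempts remain
        if status_code == 500 and attempt < max_attempts:
            sleep(backoff * 2 ** (attempt - 1))
            continue
        raise ExtractionAPIError(status_code, detail)


# Demonstration: one transient 500 followed by a successful response
responses = iter([
    (500, {"detail": "Failed to retrieve extractions. Please try again later."}),
    (200, {"items": [], "total": 0, "skip": 0, "limit": 100}),
])
body = list_with_retry(lambda params: next(responses), {"limit": 100}, sleep=lambda s: None)
print(body["total"])
```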
Best Practices
Filter by document_id when possible for fastest queries:
# ✓ Fast: Direct document lookup
params = {"document_id": doc_id, "limit": 1}
# ✗ Slower: Scan all extractions
params = {"limit": 100}  # Then filter in code
Implement Pagination Properly
Cache Results When Appropriate
For dashboard views, cache results briefly:
import time

cache = {}
CACHE_TTL = 30  # seconds


def get_pending_reviews_cached(api_key):
    now = time.time()
    if "pending" in cache:
        cached_data, timestamp = cache["pending"]
        if (now - timestamp) < CACHE_TTL:
            return cached_data
    # Fetch fresh data. fetch_pending_reviews is not defined here; it stands for
    # your own function that queries this endpoint for pending reviews.
    data = fetch_pending_reviews(api_key)
    cache["pending"] = (data, now)
    return data
Choose limits based on use case:
# Polling: Just need one result
params = {"document_id": doc_id, "limit": 1}
# Dashboard: Show recent items
params = {"sort_by": "created_at", "limit": 20}
# Batch export: Process all
params = {"limit": 100}  # Max per page
Next Steps