Prerequisites

Before you begin, ensure you have:
  • A Documind account with available credits
  • An API key (see Authentication)
  • A document to process (PDF, DOCX, JPG, or PNG)
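A quick client-side check can catch unsupported files before they use up an upload request. The sketch below assumes the four formats listed above are the complete set and that the file extension reflects the actual file type; `is_supported_document` is a hypothetical helper name, not part of the Documind API.

```python
from pathlib import Path

# Extensions taken from the prerequisites list; assumed (not confirmed) to be exhaustive.
SUPPORTED_EXTENSIONS = {".pdf", ".docx", ".jpg", ".png"}

def is_supported_document(file_path: str) -> bool:
    """Return True if the file extension matches one of the listed document types."""
    return Path(file_path).suffix.lower() in SUPPORTED_EXTENSIONS
```

The comparison is case-insensitive, so `SCAN.PNG` passes while a file without an extension does not.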

Complete Example

This guide walks through a complete extraction workflow: upload → extract → handle results.
1. Upload Document

Upload your document. The response is a JSON array of document IDs; use the returned ID in the extraction request.
curl -X POST https://api.documind.com/api/v1/upload \
  -H 'X-API-Key: YOUR_API_KEY' \
  -F 'files=@invoice.pdf'
Response:
[
  "550e8400-e29b-41d4-a716-446655440000"
]
2. Define Extraction Schema

Create or generate a JSON schema defining what data to extract.
Invoice Schema
{
  "named_entities": {
    "invoice_number": {
      "type": "string",
      "description": "The invoice number"
    },
    "invoice_date": {
      "type": "string",
      "description": "Date of invoice"
    },
    "vendor_name": {
      "type": "string",
      "description": "Name of the vendor"
    },
    "total_amount": {
      "type": "number",
      "description": "Total invoice amount"
    },
    "line_items": {
      "type": "array",
      "description": "Individual line items",
      "items": {
        "type": "object",
        "named_entities": {
          "description": {
            "type": "string",
            "description": "Item description"
          },
          "quantity": {
            "type": "number",
            "description": "Quantity ordered"
          },
          "unit_price": {
            "type": "number",
            "description": "Price per unit"
          },
          "amount": {
            "type": "number",
            "description": "Line total"
          }
        }
      }
    }
  },
  "required": ["invoice_number", "total_amount"]
}
Mark critical fields as required so that they are automatically flagged for review when extraction confidence is low.
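Because `required` refers to field names defined under `named_entities`, a typo in either place is easy to miss. The following is a minimal sketch of a local sanity check; `undefined_required_fields` is a hypothetical helper, and whether the API itself rejects such a schema is not stated in this guide.

```python
def undefined_required_fields(schema: dict) -> list:
    """Return names listed in "required" that are not defined in "named_entities"."""
    defined = schema.get("named_entities", {})
    return [name for name in schema.get("required", []) if name not in defined]

# Abbreviated version of the invoice schema above
invoice_schema = {
    "named_entities": {
        "invoice_number": {"type": "string", "description": "The invoice number"},
        "total_amount": {"type": "number", "description": "Total invoice amount"},
    },
    "required": ["invoice_number", "total_amount"],
}

print(undefined_required_fields(invoice_schema))  # prints []
```

An empty list means every required name has a matching field definition.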
3. Extract Data

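Process the document with your schema. The request below abbreviates the schema to two fields for brevity; the response that follows shows the output for the full invoice schema from step 2.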
curl -X POST https://api.documind.com/api/v1/extract/550e8400-e29b-41d4-a716-446655440000 \
  -H 'X-API-Key: YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "schema": {
      "named_entities": {
        "invoice_number": {"type": "string"},
        "total_amount": {"type": "number"}
      },
      "required": ["invoice_number", "total_amount"]
    },
    "prompt": "Extract invoice details",
    "model": "openai-gpt-4.1",
    "review_threshold": 80
  }'
Response:
{
  "document_id": "550e8400-e29b-41d4-a716-446655440000",
  "results": {
    "invoice_number": "INV-2024-001",
    "invoice_date": "2024-01-15",
    "vendor_name": "Acme Corp",
    "total_amount": 1250.00,
    "line_items": [
      {
        "description": "Widget A",
        "quantity": 10,
        "unit_price": 50.00,
        "amount": 500.00
      },
      {
        "description": "Widget B",
        "quantity": 15,
        "unit_price": 50.00,
        "amount": 750.00
      }
    ]
  },
  "needs_review": false,
  "needs_review_metadata": {}
}
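Even when `needs_review` is false, a cheap arithmetic check on the extracted values can catch misread numbers. The sketch below sums the line item amounts and compares them with `total_amount`. It assumes the invoice has no tax, shipping, or discount outside the line items, which will not hold for every invoice; `line_items_match_total` and its `tolerance` parameter are hypothetical, not part of the API.

```python
import math

def line_items_match_total(results: dict, tolerance: float = 0.01) -> bool:
    """Check that line item amounts add up to total_amount within a tolerance."""
    items = results.get("line_items", [])
    line_sum = sum(item["amount"] for item in items)
    return math.isclose(line_sum, results["total_amount"], abs_tol=tolerance)

# Values copied from the sample response above
results = {
    "total_amount": 1250.00,
    "line_items": [
        {"description": "Widget A", "quantity": 10, "unit_price": 50.00, "amount": 500.00},
        {"description": "Widget B", "quantity": 15, "unit_price": 50.00, "amount": 750.00},
    ],
}

print(line_items_match_total(results))  # prints True (500.00 + 750.00 == 1250.00)
```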
4. Handle Review Workflow

If needs_review is true, implement polling to wait for human review.
import time

import requests

API_KEY = "YOUR_API_KEY"

def wait_for_review(document_id, timeout=300, poll_interval=10):
    """
    Poll extraction status until reviewed or timeout.
    Returns the reviewed results.
    """
    start_time = time.time()

    while (time.time() - start_time) < timeout:
        # Get extraction by document_id
        response = requests.get(
            "https://api.documind.com/api/v1/data/extractions",
            headers={"X-API-Key": API_KEY},
            params={"document_id": document_id, "limit": 1},
            timeout=30,  # Do not let a single hung request stall the polling loop
        )
        response.raise_for_status()  # Surface HTTP errors instead of parsing an error body

        data = response.json()
        if data["items"]:
            extraction = data["items"][0]

            if extraction["is_reviewed"]:
                print("✓ Review completed!")
                return extraction["reviewed_results"]

            print(f"⏳ Waiting for review... ({poll_interval}s)")

        time.sleep(poll_interval)

    raise TimeoutError("Review timeout exceeded")

# Usage: `result` is the parsed JSON response from the extract call in step 3,
# and process_invoice is your own handler for the extracted data.
document_id = result["document_id"]
if result["needs_review"]:
    print("⚠️  Document needs review")
    reviewed_data = wait_for_review(document_id)
    process_invoice(reviewed_data)
else:
    process_invoice(result["results"])
Your automation now handles both immediate results and reviewed data seamlessly!

Extraction Mode Comparison

Choose the right mode for your use case:
Basic
Best for: Simple documents, high-volume processing
Request
{
  "schema": {...},
  "model": "google-gemini-2.0-flash",  // 2 credits/page
  "prompt": "Extract invoice data"
}
  • Fastest processing
  • Single model
  • No confidence scores
  • No automatic review flagging

Common Patterns

Batch Processing

Process multiple documents in parallel:
Python
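import concurrent.futures
import time

import requests

API_KEY = "your_api_key_here"
BASE_URL = "https://api.documind.com/api/v1"
HEADERS = {"X-API-Key": API_KEY}

# Placeholders: substitute your own schema (see step 2) and file list
schema = {
    "named_entities": {
        "invoice_number": {"type": "string"},
        "total_amount": {"type": "number"},
    },
    "required": ["invoice_number", "total_amount"],
}
document_files = ["invoice1.pdf", "invoice2.pdf"]

def process_document(file_path, review_timeout=300, poll_interval=10):
    # Upload
    with open(file_path, "rb") as f:
        response = requests.post(f"{BASE_URL}/upload", headers=HEADERS, files={"files": f})
    response.raise_for_status()
    document_id = response.json()[0]

    # Extract (requests sets the JSON Content-Type header when json= is used)
    response = requests.post(
        f"{BASE_URL}/extract/{document_id}",
        headers=HEADERS,
        json={"schema": schema, "prompt": "Extract data"},
    )
    response.raise_for_status()
    result = response.json()

    if not result["needs_review"]:
        return result["results"]

    # Poll for review completion, giving up after review_timeout seconds
    deadline = time.time() + review_timeout
    while time.time() < deadline:
        response = requests.get(
            f"{BASE_URL}/data/extractions",
            headers=HEADERS,
            params={"document_id": document_id, "limit": 1},
        )
        response.raise_for_status()
        items = response.json()["items"]
        if items and items[0]["is_reviewed"]:
            return items[0]["reviewed_results"]
        time.sleep(poll_interval)
    raise TimeoutError(f"Review timeout exceeded for {file_path}")

# Process documents concurrently, at most 5 at a time
with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    results = list(executor.map(process_document, document_files))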

Error Handling

Handle common error scenarios:
Python
try:
    response = requests.post(
        f"https://api.documind.com/api/v1/extract/{document_id}",
        headers={"X-API-Key": API_KEY, "Content-Type": "application/json"},
        json={"schema": schema, "prompt": "Extract data"}
    )
    response.raise_for_status()
    result = response.json()
except requests.exceptions.HTTPError as e:
    if e.response.status_code == 402:
        print("Insufficient credits - please upgrade")
    elif e.response.status_code == 403:
        print("Document access denied")
    elif e.response.status_code == 500:
        print("Extraction failed - retry or contact support")
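    else:
        print(f"Error: {e}")
except requests.exceptions.RequestException as e:
    # Connection errors and timeouts never produce an HTTP status code
    print(f"Request failed: {e}")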
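For failures worth retrying, such as the 500 case above, a bounded retry with exponential backoff avoids hammering the API. The sketch below is generic: `retry_with_backoff` is a hypothetical helper that retries on any exception, so in practice you would likely narrow it to the errors you consider transient. Whether a failed extraction is safe to retry, or whether it consumes credits, is not stated in this guide and is an assumption here.

```python
import time

def retry_with_backoff(operation, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call operation(); after a failure, wait base_delay * 2**attempt and try again."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: surface the last error to the caller
            sleep(base_delay * (2 ** attempt))
```

With the defaults, the waits between attempts are 1 and 2 seconds. The `sleep` parameter exists only so the delay can be replaced in tests.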

Check Credits Before Processing

Avoid failures by checking credits first:
Python
response = requests.get(
    "https://api.documind.com/usage/credits",
    headers={"X-API-Key": API_KEY}
)

credits = response.json()
if credits["available_credits"] < 100:
    print("⚠️  Low credits - consider waiting for daily refresh")
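Instead of a fixed threshold, the check can be tied to the size of the job. The sketch below estimates the cost of a batch from its page counts. The default of 2 credits per page comes from the `google-gemini-2.0-flash` example in this guide; other models presumably cost a different amount, and the assumption that billing is strictly per page is not confirmed here. Both function names are hypothetical.

```python
def credits_needed(page_counts, credits_per_page=2):
    """Estimate total credits for a batch, assuming a flat per-page price."""
    return sum(page_counts) * credits_per_page

def can_afford(available_credits, page_counts, credits_per_page=2):
    """Return True if the available credits cover the estimated batch cost."""
    return available_credits >= credits_needed(page_counts, credits_per_page)

# Three documents with 3, 1, and 6 pages: 10 pages * 2 credits = 20 credits
print(credits_needed([3, 1, 6]))   # prints 20
print(can_afford(100, [3, 1, 6]))  # prints True
```

Pass `credits["available_credits"]` from the response above as `available_credits` to gate a batch before submitting it.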

Testing Your Integration

Use these test scenarios:
  1. Simple Document: Single-page invoice with clear text
  2. Complex Layout: Multi-column form or table
  3. Poor Quality: Scanned or low-resolution image
  4. Edge Cases: Missing fields, unusual formats
Start with Basic extraction for testing, then upgrade to Advanced for production.

Next Steps

  • Extraction Flow: Deep dive into the complete extraction workflow
  • Review Polling: Advanced patterns for handling reviews in automation
  • Data Endpoints: Query and filter extraction results
  • Error Handling: Robust error handling strategies