Prerequisites

Before you begin, ensure you have:
  • A Documind account with available credits
  • An API key (see Authentication)
  • A document to process (PDF, DOCX, JPG, or PNG)

Complete Example

This guide walks through a complete extraction workflow: upload → extract → handle results.

1. Upload Document

Upload your document and receive a document ID.
curl -X POST https://api.documind.com/api/v1/upload \
  -H 'X-API-Key: YOUR_API_KEY' \
  -F 'files=@YOUR_DOCUMENT.pdf'
Response:
[
  "550e8400-e29b-41d4-a716-446655440000"
]
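The upload response shown above is a JSON array of document IDs. A minimal sketch of reading the first ID defensively follows; `first_document_id` is a hypothetical helper (not part of the Documind API), and it assumes IDs are UUID strings as in the sample response.

```python
import json
import uuid


def first_document_id(payload: str) -> str:
    """Return the first document ID from an upload response body."""
    ids = json.loads(payload)
    if not isinstance(ids, list) or not ids:
        raise ValueError("expected a non-empty JSON array of document IDs")
    # Assumption: IDs are UUID strings, as in the sample response.
    return str(uuid.UUID(ids[0]))


sample = '["550e8400-e29b-41d4-a716-446655440000"]'
print(first_document_id(sample))
```

Failing early here avoids passing a malformed ID into the extract URL in the next step.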

2. Define Extraction Schema

Create or generate a JSON schema defining what data to extract.
Invoice Schema
{
  "named_entities": {
    "invoice_number": {
      "type": "string",
      "description": "The invoice number"
    },
    "invoice_date": {
      "type": "string",
      "description": "Date of invoice"
    },
    "vendor_name": {
      "type": "string",
      "description": "Name of the vendor"
    },
    "total_amount": {
      "type": "number",
      "description": "Total invoice amount"
    },
    "line_items": {
      "type": "array",
      "description": "Individual line items",
      "items": {
        "type": "object",
        "named_entities": {
          "description": {
            "type": "string",
            "description": "Item description"
          },
          "quantity": {
            "type": "number",
            "description": "Quantity ordered"
          },
          "unit_price": {
            "type": "number",
            "description": "Price per unit"
          },
          "amount": {
            "type": "number",
            "description": "Line total"
          }
        }
      }
    }
  },
  "required": ["invoice_number", "total_amount"]
}
Mark critical fields as required to enable automatic review flagging if confidence is low.
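Because `required` refers to field names by string, a typo there is easy to miss. The sketch below checks that every required name is defined under `named_entities` before the schema is sent; `undefined_required_fields` is a hypothetical local helper, and whether the API performs the same check is not something this guide states.

```python
def undefined_required_fields(schema: dict) -> list[str]:
    """Return names listed in "required" that are not defined in "named_entities"."""
    defined = schema.get("named_entities", {})
    return [name for name in schema.get("required", []) if name not in defined]


schema = {
    "named_entities": {
        "invoice_number": {"type": "string", "description": "The invoice number"},
        "total_amount": {"type": "number", "description": "Total invoice amount"},
    },
    "required": ["invoice_number", "total_amount"],
}

problems = undefined_required_fields(schema)
if problems:
    raise ValueError(f"required fields missing from schema: {problems}")
```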

3. Extract Data

Process the document with your schema.
curl -X POST https://api.documind.com/api/v1/extract/550e8400-e29b-41d4-a716-446655440000 \
  -H 'X-API-Key: YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "schema": {
      "named_entities": {
        "invoice_number": {"type": "string"},
        "total_amount": {"type": "number"}
      },
      "required": ["invoice_number", "total_amount"]
    },
    "prompt": "Extract invoice details",
    "model": "openai-gpt-4.1",
    "review_threshold": 80
  }'
Response:
{
  "document_id": "550e8400-e29b-41d4-a716-446655440000",
  "results": {
    "invoice_number": "INV-2024-001",
    "invoice_date": "2024-01-15",
    "vendor_name": "Acme Corp",
    "total_amount": 1250.00,
    "line_items": [
      {
        "description": "Widget A",
        "quantity": 10,
        "unit_price": 50.00,
        "amount": 500.00
      },
      {
        "description": "Widget B",
        "quantity": 15,
        "unit_price": 50.00,
        "amount": 750.00
      }
    ]
  },
  "needs_review": false,
  "needs_review_metadata": {}
}
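Even when `needs_review` is false, a cheap arithmetic check on the extracted numbers can catch misread digits. The sketch below uses the field names from the sample response; `line_items_consistent` and its tolerance are illustrative assumptions, and real invoices with tax, shipping, or discounts will legitimately fail the total check.

```python
import math


def line_items_consistent(results: dict, tolerance: float = 0.01) -> bool:
    """Check quantity * unit_price == amount per line, and that lines sum to the total."""
    items = results.get("line_items", [])
    for item in items:
        expected = item["quantity"] * item["unit_price"]
        if not math.isclose(expected, item["amount"], abs_tol=tolerance):
            return False
    line_total = sum(item["amount"] for item in items)
    return math.isclose(line_total, results["total_amount"], abs_tol=tolerance)


results = {
    "total_amount": 1250.00,
    "line_items": [
        {"description": "Widget A", "quantity": 10, "unit_price": 50.00, "amount": 500.00},
        {"description": "Widget B", "quantity": 15, "unit_price": 50.00, "amount": 750.00},
    ],
}
print(line_items_consistent(results))  # True: 500.00 + 750.00 == 1250.00
```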

4. Handle Review Workflow

If needs_review is true, implement polling to wait for human review.
import time

import requests

API_KEY = "your_api_key_here"

def wait_for_review(document_id, timeout=300, poll_interval=10):
    """
    Poll extraction status until reviewed or timeout.
    Returns the reviewed results.
    """
    start_time = time.time()

    while (time.time() - start_time) < timeout:
        # Get extraction by document_id
        response = requests.get(
            "https://api.documind.com/api/v1/data/extractions",
            headers={"X-API-Key": API_KEY},
            params={"document_id": document_id, "limit": 1}
        )
        # Surface HTTP errors instead of failing later on a missing key
        response.raise_for_status()

        data = response.json()
        if data["items"]:
            extraction = data["items"][0]

            if extraction["is_reviewed"]:
                print("✓ Review completed!")
                return extraction["reviewed_results"]

            print(f"⏳ Waiting for review... (checking again in {poll_interval}s)")

        time.sleep(poll_interval)

    raise TimeoutError("Review timeout exceeded")

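# Usage: result and document_id come from the extract call in step 3;
# process_invoice is your own handler for the extracted data.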
if result["needs_review"]:
    print("⚠️  Document needs review")
    reviewed_data = wait_for_review(document_id)
    process_invoice(reviewed_data)
else:
    process_invoice(result["results"])
Your automation now handles both immediate results and reviewed data.
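To exercise the polling logic without calling the API or waiting on real timers, the fetch, sleep, and clock steps can be passed in as functions. This is a sketch of that design, not Documind's client: `poll_until_reviewed` and its parameters are hypothetical, and only the `is_reviewed` and `reviewed_results` keys come from the example above.

```python
import time
from typing import Callable, Optional


def poll_until_reviewed(
    fetch: Callable[[], Optional[dict]],
    timeout: float = 300,
    poll_interval: float = 10,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Call fetch() until it returns a reviewed extraction or the timeout passes."""
    deadline = clock() + timeout
    while clock() < deadline:
        extraction = fetch()
        if extraction and extraction["is_reviewed"]:
            return extraction["reviewed_results"]
        sleep(poll_interval)
    raise TimeoutError("Review timeout exceeded")


# Simulated API: the third poll reports a completed review.
responses = iter([
    None,
    {"is_reviewed": False},
    {"is_reviewed": True, "reviewed_results": {"invoice_number": "INV-2024-001"}},
])
reviewed = poll_until_reviewed(lambda: next(responses), sleep=lambda _: None)
print(reviewed)
```

In production, `fetch` would wrap the `requests.get` call to `/data/extractions` and return the first item, or `None` when the list is empty.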

Extraction Mode Comparison

Choose the right mode for your use case:
Best for: Simple documents, high-volume processing
Request
{
  "schema": {...},
  "model": "google-gemini-2.0-flash",  // 2 credits/page
  "prompt": "Extract invoice data"
}
  • Fastest processing
  • Single model
  • No confidence scores
  • No automatic review flagging
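The per-page rate in the request above makes it possible to estimate cost before submitting a job. A minimal sketch, assuming cost scales linearly with page count; only the 2 credits/page rate for `google-gemini-2.0-flash` comes from this guide, and `estimate_credits` is a hypothetical helper.

```python
# Only this rate appears in the guide; add other models from your pricing page.
CREDITS_PER_PAGE = {"google-gemini-2.0-flash": 2}


def estimate_credits(model: str, page_count: int) -> int:
    """Estimate credits for one document, assuming a flat per-page rate."""
    if page_count < 0:
        raise ValueError("page_count must be non-negative")
    return CREDITS_PER_PAGE[model] * page_count


print(estimate_credits("google-gemini-2.0-flash", 40))  # 80
```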

Common Patterns

Batch Processing

Process multiple documents in parallel:
Python
import concurrent.futures
import requests
import time

API_KEY = "your_api_key_here"
BASE_URL = "https://api.documind.com/api/v1"
REVIEW_TIMEOUT = 300  # seconds to wait for human review before giving up

# `schema` is the extraction schema from step 2;
# `document_files` is your list of file paths.

def process_document(file_path):
    # Upload
    with open(file_path, "rb") as f:
        response = requests.post(
            f"{BASE_URL}/upload",
            headers={"X-API-Key": API_KEY},
            files={"files": f}
        )
    response.raise_for_status()
    document_id = response.json()[0]

    # Extract
    response = requests.post(
        f"{BASE_URL}/extract/{document_id}",
        headers={"X-API-Key": API_KEY, "Content-Type": "application/json"},
        json={"schema": schema, "prompt": "Extract data"}
    )
    response.raise_for_status()
    result = response.json()

    # Handle review if needed
    if result["needs_review"]:
        # Poll for review completion, but never block a worker forever
        deadline = time.time() + REVIEW_TIMEOUT
        while time.time() < deadline:
            extractions = requests.get(
                f"{BASE_URL}/data/extractions",
                headers={"X-API-Key": API_KEY},
                params={"document_id": document_id, "limit": 1}
            ).json()
            items = extractions["items"]
            if items and items[0]["is_reviewed"]:
                return items[0]["reviewed_results"]
            time.sleep(10)
        raise TimeoutError(f"Review timed out for document {document_id}")
    return result["results"]

# Process the documents with up to 5 concurrent workers
with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    results = list(executor.map(process_document, document_files))

Error Handling

Handle common error scenarios:
Python
try:
    response = requests.post(
        f"https://api.documind.com/api/v1/extract/{document_id}",
        headers={"X-API-Key": API_KEY, "Content-Type": "application/json"},
        json={"schema": schema, "prompt": "Extract data"}
    )
    response.raise_for_status()
    result = response.json()
except requests.exceptions.HTTPError as e:
    if e.response.status_code == 402:
        print("Insufficient credits - please upgrade")
    elif e.response.status_code == 403:
        print("Document access denied")
    elif e.response.status_code == 500:
        print("Extraction failed - retry or contact support")
    else:
        print(f"Error: {e}")
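The status codes handled above differ in whether a retry can help. One way to make that explicit is a lookup that returns both a message and a retry flag. This is a sketch: `describe_error` is hypothetical, the messages are taken from the example above, and treating only 500 as retryable is an assumption based on its "retry or contact support" wording.

```python
def describe_error(status_code: int) -> tuple[str, bool]:
    """Return (message, retryable) for an HTTP status from the extract endpoint."""
    known = {
        402: ("Insufficient credits - please upgrade", False),
        403: ("Document access denied", False),
        500: ("Extraction failed - retry or contact support", True),
    }
    return known.get(status_code, (f"Unexpected HTTP status {status_code}", False))


message, retryable = describe_error(500)
print(message, "| retry:", retryable)
```

Inside the `except` block, `describe_error(e.response.status_code)` would replace the `if`/`elif` chain.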

Check Credits Before Processing

Avoid failures by checking credits first:
Python
response = requests.get(
    "https://api.documind.com/usage/credits",
    headers={"X-API-Key": API_KEY}
)

credits = response.json()
if credits["available_credits"] < 100:
    print("⚠️  Low credits - consider waiting for daily refresh")
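Beyond a fixed threshold, the available balance can be compared against the expected cost of a batch. The sketch below counts how many documents, taken in order, fit within the balance; `affordable_documents` is a hypothetical helper, and the default of 2 credits per page is the rate quoted for `google-gemini-2.0-flash` earlier in this guide, assumed here to apply uniformly.

```python
def affordable_documents(available_credits: int, page_counts: list[int],
                         credits_per_page: int = 2) -> int:
    """Return how many documents, in order, fit within the available credits."""
    remaining = available_credits
    count = 0
    for pages in page_counts:
        cost = pages * credits_per_page
        if cost > remaining:
            break
        remaining -= cost
        count += 1
    return count


# Costs are 20, 40, and 60 credits; the third no longer fits in 100.
print(affordable_documents(100, [10, 20, 30]))  # 2
```

In practice, `available_credits` would come from `credits["available_credits"]` in the response above.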

Testing Your Integration

Use these test scenarios:
  1. Simple Document: Single-page invoice with clear text
  2. Complex Layout: Multi-column form or table
  3. Poor Quality: Scanned or low-resolution image
  4. Edge Cases: Missing fields, unusual formats
Start with Basic extraction for testing, then upgrade to Advanced for production.
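For the edge-case scenario with missing fields, a small check against the schema's `required` list makes the test result explicit. This is a sketch: `missing_required` is a hypothetical helper, and it assumes a field the extractor could not find comes back absent, null, or as an empty string, which this guide does not specify.

```python
def missing_required(results: dict, required: list[str]) -> list[str]:
    """Return required field names that are absent, None, or empty in the results."""
    return [name for name in required if results.get(name) in (None, "")]


results = {"invoice_number": "INV-2024-001", "total_amount": None}
print(missing_required(results, ["invoice_number", "total_amount"]))  # ['total_amount']
```

A legitimate value of `0` is not reported as missing, since only `None` and the empty string are treated as gaps.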

Next Steps