Quick Start

Prerequisites

Before you begin, ensure you have:

A Documind account with available credits
An API key (see Authentication)
A document to process (PDF, DOCX, JPG, or PNG)

Complete Example

This guide walks through a complete extraction workflow: upload → define a schema → extract → handle results.

Upload Document

Upload your document and receive a document ID.

curl -X POST https://api.documind.com/api/v1/upload \
  -H 'X-API-Key: YOUR_API_KEY' \
  -F 'files=@invoice.pdf'

Response:

[
  "550e8400-e29b-41d4-a716-446655440000"
]

Define Extraction Schema

Create or generate a JSON schema defining what data to extract.

Two approaches are covered here: a manually written schema, and a schema generated from a sample document.

Manual Schema

Invoice Schema

{
  "named_entities": {
    "invoice_number": {
      "type": "string",
      "description": "The invoice number"
    },
    "invoice_date": {
      "type": "string",
      "description": "Date of invoice"
    },
    "vendor_name": {
      "type": "string",
      "description": "Name of the vendor"
    },
    "total_amount": {
      "type": "number",
      "description": "Total invoice amount"
    },
    "line_items": {
      "type": "array",
      "description": "Individual line items",
      "items": {
        "type": "object",
        "named_entities": {
          "description": {
            "type": "string",
            "description": "Item description"
          },
          "quantity": {
            "type": "number",
            "description": "Quantity ordered"
          },
          "unit_price": {
            "type": "number",
            "description": "Price per unit"
          },
          "amount": {
            "type": "number",
            "description": "Line total"
          }
        }
      }
    }
  },
  "required": ["invoice_number", "total_amount"]
}
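
A hand-written schema like the one above can be sanity-checked locally before it is sent. The sketch below is not part of the Documind API: `validate_schema` is a hypothetical helper that assumes only the layout shown above (`named_entities`, `required`, and nested `items`). It reports required names that are not defined and fields that lack a `type`.

```python
def validate_schema(schema, path="schema"):
    """Return a list of human-readable problems found in a schema dict."""
    entities = schema.get("named_entities")
    if not isinstance(entities, dict) or not entities:
        return [f"{path}: 'named_entities' must be a non-empty object"]

    problems = []
    # Every name listed in 'required' must be defined under 'named_entities'.
    for name in schema.get("required", []):
        if name not in entities:
            problems.append(f"{path}: required field '{name}' is not defined")

    for name, spec in entities.items():
        if not isinstance(spec, dict) or "type" not in spec:
            problems.append(f"{path}.{name}: missing 'type'")
            continue
        # Recurse into array items that carry their own 'named_entities'.
        items = spec.get("items")
        if spec["type"] == "array" and isinstance(items, dict) and "named_entities" in items:
            problems.extend(validate_schema(items, f"{path}.{name}.items"))
    return problems
```

An empty list means no problems were found by these checks; the API may still apply rules of its own.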

Generate from Sample

import requests

API_KEY = "your_api_key"  # replace with your own key

# Upload a sample invoice
with open("sample_invoice.pdf", "rb") as f:
    response = requests.post(
        "https://api.documind.com/api/v1/upload",
        headers={"X-API-Key": API_KEY},
        files={"files": f}
    )
response.raise_for_status()
sample_id = response.json()[0]  # the upload response is a list of document IDs

# Generate schema from the sample
response = requests.post(
    f"https://api.documind.com/api/v1/schema/{sample_id}",
    headers={"X-API-Key": API_KEY}
)
response.raise_for_status()
schema = response.json()["schema"]

Also available in UI: Dashboard → Schemas → Generate from Sample

Mark critical fields as required to enable automatic review flagging if confidence is low.

Extract Data

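Process the document with your schema. The request below abbreviates the schema to two fields for brevity; the sample response reflects the full invoice schema from the previous step.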

curl -X POST https://api.documind.com/api/v1/extract/550e8400-e29b-41d4-a716-446655440000 \
  -H 'X-API-Key: YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "schema": {
      "named_entities": {
        "invoice_number": {"type": "string"},
        "total_amount": {"type": "number"}
      },
      "required": ["invoice_number", "total_amount"]
    },
    "prompt": "Extract invoice details",
    "model": "openai-gpt-4.1",
    "review_threshold": 80
  }'

Response:

{
  "document_id": "550e8400-e29b-41d4-a716-446655440000",
  "results": {
    "invoice_number": "INV-2024-001",
    "invoice_date": "2024-01-15",
    "vendor_name": "Acme Corp",
    "total_amount": 1250.00,
    "line_items": [
      {
        "description": "Widget A",
        "quantity": 10,
        "unit_price": 50.00,
        "amount": 500.00
      },
      {
        "description": "Widget B",
        "quantity": 15,
        "unit_price": 50.00,
        "amount": 750.00
      }
    ]
  },
  "needs_review": false,
  "needs_review_metadata": {}
}
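
Extracted numbers can be cross-checked arithmetically before they are trusted. The sketch below assumes the `results` layout from the sample response above; `line_items_match_total` is a hypothetical helper, and the rule that line amounts sum to the total is an assumption that will not hold for invoices with tax, shipping, or discounts.

```python
import math


def line_items_match_total(results, tolerance=0.01):
    """Check quantity * unit_price per line, then the sum of lines against the total."""
    items = results.get("line_items", [])
    for item in items:
        expected = item["quantity"] * item["unit_price"]
        if not math.isclose(expected, item["amount"], abs_tol=tolerance):
            return False
    line_sum = sum(item["amount"] for item in items)
    return math.isclose(line_sum, results["total_amount"], abs_tol=tolerance)


# Values copied from the sample response: 10 * 50.00 + 15 * 50.00 = 1250.00
sample = {
    "total_amount": 1250.00,
    "line_items": [
        {"description": "Widget A", "quantity": 10, "unit_price": 50.00, "amount": 500.00},
        {"description": "Widget B", "quantity": 15, "unit_price": 50.00, "amount": 750.00},
    ],
}
consistent = line_items_match_total(sample)
```

A failed check is a reasonable signal to route the document to manual review even when `needs_review` is false.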

Handle Review Workflow

If needs_review is true, implement polling to wait for human review.

import time
import requests

API_KEY = "your_api_key"  # replace with your own key

def wait_for_review(document_id, timeout=300, poll_interval=10):
    """
    Poll extraction status until reviewed or timeout.
    Returns the reviewed results; raises TimeoutError if the review
    is not completed within `timeout` seconds.
    """
    start_time = time.time()

    while (time.time() - start_time) < timeout:
        # Get extraction by document_id
        response = requests.get(
            "https://api.documind.com/api/v1/data/extractions",
            headers={"X-API-Key": API_KEY},
            params={"document_id": document_id, "limit": 1}
        )
        response.raise_for_status()

        data = response.json()
        if data["items"]:
            extraction = data["items"][0]
            
            if extraction["is_reviewed"]:
                print("✓ Review completed!")
                return extraction["reviewed_results"]
            
            print(f"⏳ Waiting for review... ({poll_interval}s)")
        
        time.sleep(poll_interval)
    
    raise TimeoutError("Review timeout exceeded")

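# Usage: result is the parsed JSON from the extract call, document_id is the
# uploaded document's ID, and process_invoice is your own handler.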
if result["needs_review"]:
    print("⚠️  Document needs review")
    reviewed_data = wait_for_review(document_id)
    process_invoice(reviewed_data)
else:
    process_invoice(result["results"])

Your automation now handles both immediate results and reviewed data seamlessly!
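
Human review can take a while, so a fixed 10-second interval may generate many requests. One common alternative, not taken from the Documind documentation, is capped exponential backoff. The sketch below only computes the sleep durations; `backoff_intervals` and its parameters are hypothetical names, with defaults chosen to match the 10-second interval and 300-second timeout used above.

```python
def backoff_intervals(initial=10.0, factor=2.0, cap=60.0, timeout=300.0):
    """Yield sleep durations that grow geometrically up to `cap`.

    The durations never add up to more than `timeout`.
    """
    elapsed = 0.0
    delay = initial
    while elapsed < timeout:
        step = min(delay, cap, timeout - elapsed)
        yield step
        elapsed += step
        delay *= factor


schedule = list(backoff_intervals())
# schedule == [10.0, 20.0, 40.0, 60.0, 60.0, 60.0, 50.0]
```

In a polling loop, each yielded value would replace the fixed `poll_interval` passed to `time.sleep`.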

Extraction Mode Comparison

Choose the right mode for your use case:

Three modes are available: Basic (Fastest), VLM (Balanced), and Advanced (Most Accurate).

Basic (Fastest)

Best for: Simple documents, high-volume processing

Request

{
  "schema": {...},
  "model": "google-gemini-2.0-flash",  // 2 credits/page
  "prompt": "Extract invoice data"
}

Fastest processing
Single model
No confidence scores
No automatic review flagging

VLM (Balanced)

Best for: Scanned documents, forms with complex layouts

Request

{
  "schema": {...},
  "extraction_mode": "vlm",  // 10 credits/page
  "review_threshold": 80,
  "prompt": "Extract form fields"
}

Visual document processing
Multiple VLM models
Includes confidence scores
Automatic review flagging

Advanced (Most Accurate)

Best for: Critical documents, invoices, structured forms

Request

{
  "schema": {...},
  // Advanced mode: don't set 'model' or 'extraction_mode' - 15 credits/page
  "review_threshold": 85,
  "prompt": "Extract all fields with high accuracy"
}

Highest accuracy
Multi-model ensemble extraction
Detailed confidence scores
Automatic review flagging
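
The per-page prices quoted in the request comments make it easy to estimate cost before choosing a mode. The sketch below uses those figures (2, 10, and 15 credits per page); the mode keys and the `estimate_credits` helper are hypothetical, and the Basic figure is the one quoted for `google-gemini-2.0-flash`, so other models may be priced differently.

```python
# Credits per page as quoted in the request examples above.
CREDITS_PER_PAGE = {"basic": 2, "vlm": 10, "advanced": 15}


def estimate_credits(mode, pages):
    """Return the estimated credit cost of extracting `pages` pages in `mode`."""
    if mode not in CREDITS_PER_PAGE:
        raise ValueError(f"unknown mode: {mode!r}")
    if pages < 0:
        raise ValueError("pages must be non-negative")
    return CREDITS_PER_PAGE[mode] * pages


# A 10-page document in each mode:
costs = {mode: estimate_credits(mode, 10) for mode in CREDITS_PER_PAGE}
# costs == {"basic": 20, "vlm": 100, "advanced": 150}
```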

Common Patterns

Batch Processing

Process multiple documents in parallel:

Python

import concurrent.futures
import requests
import time

API_KEY = "your_api_key_here"
BASE_URL = "https://api.documind.com/api/v1"

def process_document(file_path):
    # Upload
    with open(file_path, "rb") as f:
        response = requests.post(
            f"{BASE_URL}/upload",
            headers={"X-API-Key": API_KEY},
            files={"files": f}
        )
    document_id = response.json()[0]
    
    # Extract
    result = requests.post(
        f"{BASE_URL}/extract/{document_id}",
        headers={"X-API-Key": API_KEY, "Content-Type": "application/json"},
        json={"schema": schema, "prompt": "Extract data"}
    ).json()
    
    # Handle review if needed
    if result["needs_review"]:
        # Poll for review completion, giving up after 5 minutes
        deadline = time.time() + 300
        while time.time() < deadline:
            extractions = requests.get(
                f"{BASE_URL}/data/extractions",
                headers={"X-API-Key": API_KEY},
                params={"document_id": document_id, "limit": 1}
            ).json()
            items = extractions["items"]
            if items and items[0]["is_reviewed"]:
                return items[0]["reviewed_results"]
            time.sleep(10)
        raise TimeoutError(f"Review timeout exceeded for {file_path}")
    return result["results"]

# Process documents concurrently, at most 5 at a time.
# schema is your extraction schema dict; document_files is a list of file paths.
with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    results = list(executor.map(process_document, document_files))
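
With `executor.map`, the first exception raised by any document surfaces when the results are consumed, so one bad file can hide the results of the others. A minimal sketch of an alternative, using `submit` and `as_completed` from the standard library to record failures per file, is shown below. `run_batch` is a hypothetical helper; `worker` stands in for a function such as `process_document`.

```python
import concurrent.futures


def run_batch(worker, file_paths, max_workers=5):
    """Run worker(path) for each path and return (results, errors), both keyed by path."""
    results, errors = {}, {}
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Map each future back to the file it belongs to.
        futures = {executor.submit(worker, path): path for path in file_paths}
        for future in concurrent.futures.as_completed(futures):
            path = futures[future]
            try:
                results[path] = future.result()
            except Exception as exc:  # keep going; record the failure for this file
                errors[path] = exc
    return results, errors
```

Failed paths can then be retried or reported without discarding the documents that succeeded.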

Error Handling

Handle common error scenarios:

Python

try:
    response = requests.post(
        f"https://api.documind.com/api/v1/extract/{document_id}",
        headers={"X-API-Key": API_KEY, "Content-Type": "application/json"},
        json={"schema": schema, "prompt": "Extract data"}
    )
    response.raise_for_status()
    result = response.json()
except requests.exceptions.HTTPError as e:
    if e.response.status_code == 402:
        print("Insufficient credits - please upgrade")
    elif e.response.status_code == 403:
        print("Document access denied")
    elif e.response.status_code == 500:
        print("Extraction failed - retry or contact support")
    else:
        print(f"Error: {e}")
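
The status codes above differ in whether a retry is worthwhile. The sketch below encodes that decision, using only the three codes and meanings from the example above. `classify_error` and `call_with_retry` are hypothetical helpers, and treating 500 as safe to retry is an assumption based on the "retry or contact support" message.

```python
import time

# Status codes and meanings taken from the error-handling example above.
KNOWN_ERRORS = {
    402: ("Insufficient credits", False),
    403: ("Document access denied", False),
    500: ("Extraction failed", True),
}


def classify_error(status_code):
    """Return (message, should_retry) for an HTTP error status code."""
    return KNOWN_ERRORS.get(status_code, (f"Unexpected HTTP {status_code}", False))


def call_with_retry(send, attempts=3, delay=1.0):
    """Call send() until it succeeds, fails permanently, or attempts run out.

    `send` is any zero-argument callable that returns an HTTP status code.
    """
    status = None
    for attempt in range(attempts):
        status = send()
        if status < 400:
            return status
        _, should_retry = classify_error(status)
        if not should_retry:
            break
        if attempt < attempts - 1:
            time.sleep(delay)
    return status
```

In practice `send` would wrap the `requests.post` call and return `response.status_code`.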

Check Credits Before Processing

Avoid failures by checking credits first:

Python

response = requests.get(
    "https://api.documind.com/usage/credits",
    headers={"X-API-Key": API_KEY}
)

credits = response.json()
if credits["available_credits"] < 100:
    print("⚠️  Low credits - consider waiting for daily refresh")
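
The same balance can drive a pre-flight plan for a batch. The sketch below assumes the caller already knows each document's page count and the per-page price for the chosen mode; `split_by_budget` is a hypothetical helper, and only `available_credits` comes from the response shown above.

```python
def split_by_budget(available_credits, page_counts, credits_per_page):
    """Greedily pick documents, in order, whose cost fits the remaining credits.

    `page_counts` maps a file path to its number of pages.
    Returns (affordable, deferred) lists of paths.
    """
    remaining = available_credits
    affordable, deferred = [], []
    for path, pages in page_counts.items():
        cost = pages * credits_per_page
        if cost <= remaining:
            affordable.append(path)
            remaining -= cost
        else:
            deferred.append(path)
    return affordable, deferred


# 100 credits at 10 credits/page: a.pdf (30) and b.pdf (50) fit, c.pdf (40) does not.
now, later = split_by_budget(100, {"a.pdf": 3, "b.pdf": 5, "c.pdf": 4}, 10)
```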

Testing Your Integration

Use these test scenarios:

Simple Document: Single-page invoice with clear text
Complex Layout: Multi-column form or table
Poor Quality: Scanned or low-resolution image
Edge Cases: Missing fields, unusual formats

Start with Basic extraction for testing, then upgrade to Advanced for production.
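
For the missing-fields edge case, it helps to compare each result against the schema's `required` list rather than assume every field is present. The sketch below is a hypothetical check, not an API feature; it assumes the schema and `results` layouts shown earlier and treats `None`, empty strings, and empty lists as missing.

```python
def missing_required(schema, results):
    """Return the required field names that are absent or empty in `results`."""
    missing = []
    for name in schema.get("required", []):
        value = results.get(name)
        if value is None or value == "" or value == []:
            missing.append(name)
    return missing


schema = {
    "named_entities": {
        "invoice_number": {"type": "string"},
        "total_amount": {"type": "number"},
    },
    "required": ["invoice_number", "total_amount"],
}
gaps = missing_required(schema, {"invoice_number": "INV-2024-001"})
# gaps == ["total_amount"]
```

A legitimate value of 0 is not treated as missing, which matters for numeric fields such as `total_amount`.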

Next Steps

Extraction Flow: Deep dive into the complete extraction workflow
Review Polling: Advanced patterns for handling reviews in automation
Data Endpoints: Query and filter extraction results
Error Handling: Robust error handling strategies
