Overview

This guide will walk you through extracting data from a document using Documind in just a few minutes.

Prerequisites

  • A Documind account (sign up at app.documind.cloud)
  • An API key (create one in the dashboard)
  • A document to process (PDF, DOCX, or image)

Step 1: Get Your API Key

  1. Log in to the Documind Dashboard
  2. Navigate to the API Keys section
  3. Click Create New API Key
  4. Give it a name (e.g., “Development Key”)
  5. Copy and save the API key securely
The API key is shown only once. Store it securely and never commit it to version control.
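
One way to keep the key out of source code is to read it from an environment variable. The sketch below assumes a variable named DOCUMIND_API_KEY; that name and both helper functions are chosen for illustration and are not defined by Documind.

```python
import os


def load_api_key(env_var: str = "DOCUMIND_API_KEY") -> str:
    """Return the API key stored in the given environment variable.

    The default variable name is an assumption made for this example.
    """
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        raise RuntimeError(f"Set the {env_var} environment variable to your API key")
    return api_key


def build_headers(api_key: str) -> dict:
    """Build the X-API-Key header used by the requests in this guide."""
    return {"X-API-Key": api_key}
```

With this in place, headers = build_headers(load_api_key()) can replace the hard-coded key in the steps that follow.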

Step 2: Upload a Document

import requests

API_KEY = "your_api_key_here"
headers = {"X-API-Key": API_KEY}

# Upload a document
with open("invoice.pdf", "rb") as f:
    files = {"files": f}
    response = requests.post(
        "https://api.documind.cloud/api/v1/upload",
        headers=headers,
        files=files
    )

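response.raise_for_status()  # Fail fast on HTTP errors (e.g., an invalid API key)
document_ids = response.json()  # The response is a list of document IDs
document_id = document_ids[0]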
print(f"Document uploaded: {document_id}")
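
Before uploading, it can help to reject files the service is unlikely to accept. The following is a minimal sketch: PDF and DOCX come from the prerequisites, while the specific image extensions and the helper name are assumptions, so adjust the set to match what your account supports.

```python
from pathlib import Path

# PDF and DOCX come from the prerequisites; the image extensions are assumptions.
SUPPORTED_EXTENSIONS = {".pdf", ".docx", ".png", ".jpg", ".jpeg"}


def is_supported_document(path: str) -> bool:
    """Return True if the file extension looks like a supported document type."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS
```

The check looks only at the file name, not the file contents, so it is a convenience guard rather than real validation.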

Step 3: Define Your Schema

Create a simple schema to specify what data to extract:
{
  "type": "object",
  "named_entities": {
    "invoice_number": {
      "type": "string",
      "description": "The invoice number"
    },
    "invoice_date": {
      "type": "string",
      "description": "The invoice date"
    },
    "total_amount": {
      "type": "number",
      "description": "The total amount"
    },
    "vendor_name": {
      "type": "string",
      "description": "The vendor or company name"
    }
  },
  "required": ["invoice_number", "total_amount"]
}
You can also auto-generate schemas using the /schema/{document_id} endpoint or use predefined schemas for common document types.
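
If you build schemas in code, a small helper can keep them in the shape shown above and catch required fields that were never defined. This is a sketch only; build_schema is a hypothetical helper, not part of any Documind SDK.

```python
def build_schema(fields: dict, required: list) -> dict:
    """Build an extraction schema with a named_entities block.

    fields maps each field name to a (type, description) tuple.
    Raises ValueError if a required field is not defined in fields.
    """
    missing = [name for name in required if name not in fields]
    if missing:
        raise ValueError(f"Required fields not defined: {missing}")
    return {
        "type": "object",
        "named_entities": {
            name: {"type": field_type, "description": description}
            for name, (field_type, description) in fields.items()
        },
        "required": list(required),
    }


schema = build_schema(
    {
        "invoice_number": ("string", "The invoice number"),
        "total_amount": ("number", "The total amount"),
    },
    required=["invoice_number", "total_amount"],
)
```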

Step 4: Extract Data

Now extract data from the uploaded document:
# Extract data using Basic mode
schema = {
    "type": "object",
    "named_entities": {
        "invoice_number": {"type": "string", "description": "Invoice number"},
        "invoice_date": {"type": "string", "description": "Invoice date"},
        "total_amount": {"type": "number", "description": "Total amount"},
        "vendor_name": {"type": "string", "description": "Vendor name"}
    },
    "required": ["invoice_number", "total_amount"]
}

response = requests.post(
    f"https://api.documind.cloud/api/v1/extract/{document_id}",
    headers=headers,
    json={
        "schema": schema,
        "model": "google-gemini-2.0-flash",  # Basic mode: 2 credits/page
        "prompt": "Extract invoice information accurately"
    }
)

response.raise_for_status()  # Surface HTTP errors before parsing the body
result = response.json()
print("Extraction Results:")
print(result["results"])
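
The comment in the request above puts Basic mode at 2 credits per page. As a rough worked example, the sketch below estimates the cost of a document, assuming billing scales linearly with page count; that assumption and the function name are illustrative, so check your plan for the actual pricing rules.

```python
BASIC_MODE_CREDITS_PER_PAGE = 2  # Taken from the comment in the extraction example


def estimate_credits(page_count: int, credits_per_page: int = BASIC_MODE_CREDITS_PER_PAGE) -> int:
    """Estimate credits for one document, assuming a flat per-page rate."""
    if page_count < 0:
        raise ValueError("page_count must not be negative")
    return page_count * credits_per_page


print(estimate_credits(5))  # A 5-page invoice at 2 credits/page → 10
```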

Step 5: Handle the Response

The response contains the extracted data:
{
  "document_id": "123e4567-e89b-12d3-a456-426614174000",
  "results": {
    "invoice_number": "INV-2024-001",
    "invoice_date": "2024-01-15",
    "total_amount": 1250.00,
    "vendor_name": "Acme Corporation"
  },
  "needs_review": false,
  "needs_review_metadata": {}
}
When needs_review is false: Use the results immediately in your workflow.
When needs_review is true: Wait for human review before processing. See the Review Workflow Guide.
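
The branching on needs_review can be sketched as a small routing function. The field names come from the response above; route_result and its "ready" and "review" labels are hypothetical names used only for this example.

```python
def route_result(result: dict) -> tuple:
    """Return ("ready", extracted_data) or ("review", review_metadata)."""
    if result["needs_review"]:
        # Hold the document until a human has reviewed it.
        return "review", result.get("needs_review_metadata", {})
    return "ready", result["results"]


sample = {
    "document_id": "123e4567-e89b-12d3-a456-426614174000",
    "results": {"invoice_number": "INV-2024-001", "total_amount": 1250.00},
    "needs_review": False,
    "needs_review_metadata": {},
}
status, payload = route_result(sample)
```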

Complete Example

Here’s a complete Python script that puts it all together:
import requests

API_KEY = "your_api_key_here"
BASE_URL = "https://api.documind.cloud/api/v1"
headers = {"X-API-Key": API_KEY}

# 1. Upload document
with open("invoice.pdf", "rb") as f:
    files = {"files": f}
    upload_response = requests.post(
        f"{BASE_URL}/upload",
        headers=headers,
        files=files
    )
upload_response.raise_for_status()  # Stop here if the upload failed
document_id = upload_response.json()[0]

# 2. Define schema
schema = {
    "type": "object",
    "named_entities": {
        "invoice_number": {"type": "string"},
        "total_amount": {"type": "number"},
        "vendor_name": {"type": "string"}
    },
    "required": ["invoice_number", "total_amount"]
}

# 3. Extract data
extract_response = requests.post(
    f"{BASE_URL}/extract/{document_id}",
    headers=headers,
    json={"schema": schema, "model": "google-gemini-2.0-flash"}
)

extract_response.raise_for_status()  # Stop here if extraction failed
result = extract_response.json()

# 4. Process results
if not result["needs_review"]:
    print("Extracted Data:")
    print(result["results"])
else:
    print("Document flagged for review. Waiting for human verification...")
    # Poll for reviewed results
    # See review workflow guide for details
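
The polling step left as a comment above could look roughly like the sketch below. This guide does not say which endpoint returns reviewed results, so the loop takes a caller-supplied fetch_result function (hypothetical) and assumes that needs_review becomes false once review is complete. Consult the review workflow guide for the actual mechanism.

```python
import time
from typing import Callable, Optional


def wait_for_review(
    fetch_result: Callable[[], dict],
    interval_seconds: float = 5.0,
    max_attempts: int = 60,
) -> Optional[dict]:
    """Call fetch_result until needs_review is false or attempts run out.

    fetch_result is a hypothetical function that returns the latest result
    for a document. Returns the final result, or None on timeout.
    """
    for attempt in range(max_attempts):
        result = fetch_result()
        # A missing flag is treated as "still needs review" to stay cautious.
        if not result.get("needs_review", True):
            return result
        if attempt < max_attempts - 1:
            time.sleep(interval_seconds)
    return None
```

Bounding the loop with max_attempts keeps a script from waiting forever on a document that is never reviewed.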

Troubleshooting

Authentication errors: Check that your API key is correct and included in the X-API-Key header.
Insufficient credits: You’ve run out of credits. Check your balance in the dashboard or upgrade your plan.
Schema errors: Your schema might be invalid. Ensure it follows JSON Schema format, with named_entities containing the fields you want to extract.
Inaccurate extraction results:
  • Try using Advanced mode for better accuracy
  • Add more descriptive field descriptions in your schema
  • Include example values or constraints
  • See the Schema Design Guide for best practices
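
As an illustration of more descriptive fields with example values, the sketch below rewrites the invoice schema from this guide. The describe_field helper and the description wording are illustrative; the example values are taken from the sample response in Step 5.

```python
def describe_field(field_type: str, description: str, example: str = "") -> dict:
    """Build one field definition, appending an example value to the description."""
    if example:
        description = f"{description} (e.g., {example})"
    return {"type": field_type, "description": description}


schema = {
    "type": "object",
    "named_entities": {
        "invoice_number": describe_field(
            "string", "Unique identifier printed on the invoice", "INV-2024-001"
        ),
        "invoice_date": describe_field(
            "string", "Date the invoice was issued, in YYYY-MM-DD format", "2024-01-15"
        ),
        "total_amount": describe_field(
            "number", "Final amount due, as a plain number without currency symbols", "1250.00"
        ),
    },
    "required": ["invoice_number", "total_amount"],
}
```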