Overview

This guide walks you through uploading a document to Documind and extracting structured data from it. The whole process takes a few minutes.

Prerequisites

  • A Documind account (sign up at app.documind.cloud)
  • An API key (create one in the dashboard)
  • A document to process (PDF, DOCX, or image)

Step 1: Get Your API Key

  1. Log in to the Documind Dashboard
  2. Navigate to the API Keys section
  3. Click Create New API Key
  4. Give it a name (e.g., “Development Key”)
  5. Copy and save the API key securely

The API key is only shown once. Store it securely and never commit it to version control.
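
One common way to keep the key out of source code is to read it from an environment variable. The sketch below assumes a variable named DOCUMIND_API_KEY (a hypothetical name chosen for this example, not one defined by Documind) and builds the X-API-Key header used in the rest of this guide.

```python
import os


def build_headers(env_var: str = "DOCUMIND_API_KEY") -> dict:
    """Build request headers from an API key stored in the environment.

    The default variable name is an assumption made for this sketch.
    """
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return {"X-API-Key": api_key}
```

With this in place, `headers = build_headers()` can stand in for the hard-coded API_KEY assignment in the examples that follow.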

Step 2: Upload a Document

import requests

API_KEY = "your_api_key_here"
headers = {"X-API-Key": API_KEY}

# Upload a document
with open("invoice.pdf", "rb") as f:
    files = {"files": f}
    response = requests.post(
        "https://api.documind.cloud/api/v1/upload",
        headers=headers,
        files=files
    )

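response.raise_for_status()  # Stop here on an HTTP error (e.g., an invalid API key)

# The response body is a list of document IDs; take the first one
document_ids = response.json()
document_id = document_ids[0]
print(f"Document uploaded: {document_id}")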
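
The upload example indexes the response body directly, which produces an unhelpful error if the body is not the expected list. A minimal sketch of a defensive helper follows, assuming the body is a JSON list of document IDs as in the example above. The helper name `first_document_id` is hypothetical.

```python
def first_document_id(payload) -> str:
    """Return the first document ID from a parsed upload response.

    Assumes the upload endpoint returns a non-empty JSON list of IDs,
    as the quickstart example suggests.
    """
    if not isinstance(payload, list) or not payload:
        raise ValueError(f"Unexpected upload response: {payload!r}")
    return str(payload[0])
```

Usage would look like `document_id = first_document_id(response.json())`.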

Step 3: Define Your Schema

Create a simple schema to specify what data to extract:
{
  "type": "object",
  "named_entities": {
    "invoice_number": {
      "type": "string",
      "description": "The invoice number"
    },
    "invoice_date": {
      "type": "string",
      "description": "The invoice date"
    },
    "total_amount": {
      "type": "number",
      "description": "The total amount"
    },
    "vendor_name": {
      "type": "string",
      "description": "The vendor or company name"
    }
  },
  "required": ["invoice_number", "total_amount"]
}
You can also auto-generate schemas using the /schema/{document_id} endpoint or use predefined schemas for common document types.
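
Before sending a schema, it can help to catch simple mistakes locally, such as a required field that is never defined. The sketch below is a local sanity check based only on the schema shape shown above (named_entities plus required). It is not Documind's own validation logic, and the function name `check_schema` is hypothetical.

```python
def check_schema(schema: dict) -> list:
    """Return a list of problems found in a schema; an empty list means none."""
    problems = []
    fields = schema.get("named_entities")
    if not isinstance(fields, dict) or not fields:
        problems.append("named_entities must be a non-empty object")
        fields = {}
    # Every field definition should at least declare a type.
    for name, spec in fields.items():
        if not isinstance(spec, dict) or "type" not in spec:
            problems.append(f"field '{name}' is missing a type")
    # Every required name should refer to a defined field.
    for name in schema.get("required", []):
        if name not in fields:
            problems.append(f"required field '{name}' is not defined in named_entities")
    return problems
```

An empty return value only means these basic checks passed; the API may still reject a schema for other reasons.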

Step 4: Extract Data

Now extract data from the uploaded document:
# Extract data using Basic mode
schema = {
    "type": "object",
    "named_entities": {
        "invoice_number": {"type": "string", "description": "Invoice number"},
        "invoice_date": {"type": "string", "description": "Invoice date"},
        "total_amount": {"type": "number", "description": "Total amount"},
        "vendor_name": {"type": "string", "description": "Vendor name"}
    },
    "required": ["invoice_number", "total_amount"]
}

response = requests.post(
    f"https://api.documind.cloud/api/v1/extract/{document_id}",
    headers=headers,
    json={
        "schema": schema,
        "model": "google-gemini-2.0-flash",  # Basic mode: 2 credits/page
        "prompt": "Extract invoice information accurately"
    }
)

response.raise_for_status()  # Stop here on an HTTP error (e.g., an invalid schema)

result = response.json()
print("Extraction Results:")
print(result["results"])
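
The comment in the request above notes that Basic mode costs 2 credits per page. A rough way to estimate the cost of a job before running it is sketched below, assuming cost scales linearly with page count. Rates for other modes are not stated in this guide, so pass them in explicitly if you know them.

```python
def estimate_credits(page_count: int, credits_per_page: int = 2) -> int:
    """Estimate credits for a document, assuming a flat per-page rate.

    The default of 2 credits/page is the Basic mode rate quoted above.
    """
    if page_count < 0:
        raise ValueError("page_count must not be negative")
    return page_count * credits_per_page


print(estimate_credits(12))  # 12 pages in Basic mode -> 24
```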

Step 5: Handle the Response

The response contains the extracted data:
{
  "document_id": "123e4567-e89b-12d3-a456-426614174000",
  "results": {
    "invoice_number": "INV-2024-001",
    "invoice_date": "2024-01-15",
    "total_amount": 1250.00,
    "vendor_name": "Acme Corporation"
  },
  "needs_review": false,
  "needs_review_metadata": {}
}

When needs_review is false: Use the results immediately in your workflow.
When needs_review is true: Wait for human review before processing. See Review Workflow Guide.
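
The branching on needs_review can be sketched as a small routing function. It relies only on the response fields shown above. The function name `route_result` and the "ready"/"review" labels are hypothetical choices for this example.

```python
def route_result(result: dict) -> tuple:
    """Return ("ready", data) or ("review", metadata) for an extraction response."""
    # Direct key access raises KeyError if the response has an unexpected shape.
    if result["needs_review"]:
        return ("review", result.get("needs_review_metadata", {}))
    return ("ready", result["results"])
```

A caller can then pass "ready" data straight into downstream processing and hold "review" documents until verification is complete.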

What’s Next?

Schema Design Guide

Learn how to design schemas for better extraction accuracy

Invoice Processing Tutorial

Complete tutorial for processing invoices at scale

Extraction Modes

Understand Basic, VLM, and Advanced extraction modes

Review Workflow

Handle documents that need human review

Complete Example

Here’s a complete Python script that puts it all together:
import requests

API_KEY = "your_api_key_here"
BASE_URL = "https://api.documind.cloud/api/v1"
headers = {"X-API-Key": API_KEY}

# 1. Upload document
with open("invoice.pdf", "rb") as f:
    files = {"files": f}
    upload_response = requests.post(
        f"{BASE_URL}/upload",
        headers=headers,
        files=files
    )
upload_response.raise_for_status()  # Fail fast on an HTTP error
document_id = upload_response.json()[0]

# 2. Define schema
schema = {
    "type": "object",
    "named_entities": {
        "invoice_number": {"type": "string"},
        "total_amount": {"type": "number"},
        "vendor_name": {"type": "string"}
    },
    "required": ["invoice_number", "total_amount"]
}

# 3. Extract data
extract_response = requests.post(
    f"{BASE_URL}/extract/{document_id}",
    headers=headers,
    json={"schema": schema, "model": "google-gemini-2.0-flash"}
)

extract_response.raise_for_status()  # Fail fast on an HTTP error
result = extract_response.json()

# 4. Process results
if not result["needs_review"]:
    print("Extracted Data:")
    print(result["results"])
else:
    print("Document flagged for review. Waiting for human verification...")
    # Poll for reviewed results
    # See review workflow guide for details

Troubleshooting

Authentication fails: Check that your API key is correct and included in the X-API-Key header.

Out of credits: Check your balance in the dashboard or upgrade your plan.

Schema is rejected: Your schema might be invalid. Ensure it follows JSON Schema format, with named_entities holding the fields you want to extract.

Extraction results are inaccurate:
  • Try using Advanced mode for better accuracy
  • Add more descriptive field descriptions in your schema
  • Include example values or constraints
  • See Schema Design Guide for best practices
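
To illustrate the advice about descriptive field descriptions and example values, the sketch below contrasts vague descriptions with more specific ones. The wording is illustrative only. Example values are placed inside the description text, since this guide does not say whether other JSON Schema keywords receive special handling.

```python
# Before: descriptions that only restate the field name.
vague_fields = {
    "invoice_date": {"type": "string", "description": "The invoice date"},
    "total_amount": {"type": "number", "description": "The total amount"},
}

# After: descriptions that state the expected format and give an example value.
descriptive_fields = {
    "invoice_date": {
        "type": "string",
        "description": "Date the invoice was issued, in YYYY-MM-DD format (e.g., 2024-01-15)",
    },
    "total_amount": {
        "type": "number",
        "description": "Final amount due, as a plain number without currency symbols (e.g., 1250.00)",
    },
}

schema = {
    "type": "object",
    "named_entities": descriptive_fields,
    "required": ["total_amount"],
}
```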