Quick Start

Overview

This guide walks you through extracting data from a document with Documind in a few minutes.

Prerequisites

A Documind account (sign up at app.documind.cloud)
An API key (create one in the dashboard)
A document to process (PDF, DOCX, or image)

Step 1: Get Your API Key

Log in to the Documind Dashboard
Navigate to the API Keys section
Click Create New API Key
Give it a name (e.g., “Development Key”)
Copy and save the API key securely

The API key is shown only once. Store it securely and never commit it to version control.
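One way to keep the key out of source files is to read it from an environment variable at startup. The sketch below assumes a variable named DOCUMIND_API_KEY; that name and both helper functions are illustrative choices for this example, not something Documind defines.

```python
import os


def load_api_key(env_var: str = "DOCUMIND_API_KEY") -> str:
    """Return the API key stored in the given environment variable.

    The default variable name is a hypothetical convention for this example.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return key


def build_headers(api_key: str) -> dict:
    """Build the X-API-Key header used by the requests in this guide."""
    return {"X-API-Key": api_key}
```

With this in place, the later snippets could start with headers = build_headers(load_api_key()) instead of a hard-coded string.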

Step 2: Upload a Document

import requests

API_KEY = "your_api_key_here"
headers = {"X-API-Key": API_KEY}

# Upload a document
with open("invoice.pdf", "rb") as f:
    files = {"files": f}
    response = requests.post(
        "https://api.documind.cloud/api/v1/upload",
        headers=headers,
        files=files
    )

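response.raise_for_status()  # Raise on 4xx/5xx instead of parsing an error body

# The response body is a list of document IDs, one per uploaded file
document_ids = response.json()
document_id = document_ids[0]
print(f"Document uploaded: {document_id}")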
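Indexing the response body directly produces a confusing error if the body is not the expected list. A small defensive helper might look like the sketch below. It assumes only what the snippet above shows, namely that a successful upload returns a JSON list of document IDs; the function name is hypothetical, and the shape of an error body is not documented here.

```python
def first_document_id(payload) -> str:
    """Return the first document ID from an upload response body.

    Assumes a successful body is a non-empty JSON list of IDs,
    as in the upload snippet in this guide.
    """
    if not isinstance(payload, list) or not payload:
        raise ValueError(f"Unexpected upload response: {payload!r}")
    return str(payload[0])
```

It would be called as document_id = first_document_id(response.json()).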

Step 3: Define Your Schema

Create a simple schema to specify what data to extract:

{
  "type": "object",
  "named_entities": {
    "invoice_number": {
      "type": "string",
      "description": "The invoice number"
    },
    "invoice_date": {
      "type": "string",
      "description": "The invoice date"
    },
    "total_amount": {
      "type": "number",
      "description": "The total amount"
    },
    "vendor_name": {
      "type": "string",
      "description": "The vendor or company name"
    }
  },
  "required": ["invoice_number", "total_amount"]
}

You can also auto-generate schemas using the /schema/{document_id} endpoint or use predefined schemas for common document types.
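Before sending a hand-written schema, it can help to run a quick local sanity check. The sketch below is based only on the shape of the example schema above (a named_entities object plus a required list); it is not the API's actual validation logic, and the function name is hypothetical.

```python
def check_schema(schema: dict) -> list:
    """Return a list of problems found in a schema dict (empty if none)."""
    problems = []
    fields = schema.get("named_entities")
    if not isinstance(fields, dict) or not fields:
        problems.append("named_entities must be a non-empty object")
        fields = {}
    # Every required name should correspond to a defined field
    for name in schema.get("required", []):
        if name not in fields:
            problems.append(f"required field '{name}' is not defined")
    # Every field in the example schema declares a type
    for name, spec in fields.items():
        if not isinstance(spec, dict) or "type" not in spec:
            problems.append(f"field '{name}' has no type")
    return problems
```

An empty list means the schema passed these local checks; the server may still reject it for reasons this sketch does not cover.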

Step 4: Extract Data

Now extract data from the uploaded document:

# Extract data using Basic mode
schema = {
    "type": "object",
    "named_entities": {
        "invoice_number": {"type": "string", "description": "Invoice number"},
        "invoice_date": {"type": "string", "description": "Invoice date"},
        "total_amount": {"type": "number", "description": "Total amount"},
        "vendor_name": {"type": "string", "description": "Vendor name"}
    },
    "required": ["invoice_number", "total_amount"]
}

response = requests.post(
    f"https://api.documind.cloud/api/v1/extract/{document_id}",
    headers=headers,
    json={
        "schema": schema,
        "model": "google-gemini-2.0-flash",  # Basic mode: 2 credits/page
        "prompt": "Extract invoice information accurately"
    }
)

response.raise_for_status()  # Raise on 4xx/5xx instead of parsing an error body
result = response.json()
print("Extraction Results:")
print(result["results"])

Step 5: Handle the Response

The response contains the extracted data:

{
  "document_id": "123e4567-e89b-12d3-a456-426614174000",
  "results": {
    "invoice_number": "INV-2024-001",
    "invoice_date": "2024-01-15",
    "total_amount": 1250.00,
    "vendor_name": "Acme Corporation"
  },
  "needs_review": false,
  "needs_review_metadata": {}
}

When needs_review is false: Use the results immediately in your workflow.
When needs_review is true: Wait for human review before processing. See the Review Workflow Guide.
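The branching on needs_review can be wrapped in a small helper. This is a minimal sketch that assumes the response shape shown above; the function name is hypothetical, and treating a missing needs_review flag as "needs review" is a conservative choice made for this example rather than documented API behavior.

```python
def ready_results(result: dict):
    """Return the extracted fields, or None if the document awaits review."""
    # Conservative assumption: if the flag is absent, do not use the results
    if result.get("needs_review", True):
        return None
    return result["results"]
```

A caller would check for None and defer processing until the review is complete.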

What’s Next?

Schema Design Guide

Learn how to design schemas for better extraction accuracy

Invoice Processing Tutorial

Complete tutorial for processing invoices at scale

Extraction Modes

Understand Basic, VLM, and Advanced extraction modes

Review Workflow

Handle documents that need human review

Complete Example

Here’s a complete Python script that puts it all together:

import requests

API_KEY = "your_api_key_here"
BASE_URL = "https://api.documind.cloud/api/v1"
headers = {"X-API-Key": API_KEY}

# 1. Upload document
with open("invoice.pdf", "rb") as f:
    files = {"files": f}
    upload_response = requests.post(
        f"{BASE_URL}/upload",
        headers=headers,
        files=files
    )
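upload_response.raise_for_status()  # Stop here if the upload failed
document_id = upload_response.json()[0]  # First ID in the returned list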

# 2. Define schema
schema = {
    "type": "object",
    "named_entities": {
        "invoice_number": {"type": "string"},
        "total_amount": {"type": "number"},
        "vendor_name": {"type": "string"}
    },
    "required": ["invoice_number", "total_amount"]
}

# 3. Extract data
extract_response = requests.post(
    f"{BASE_URL}/extract/{document_id}",
    headers=headers,
    json={"schema": schema, "model": "google-gemini-2.0-flash"}
)

extract_response.raise_for_status()  # Stop here if the extraction failed
result = extract_response.json()

# 4. Process results
if not result["needs_review"]:
    print("Extracted Data:")
    print(result["results"])
else:
    print("Document flagged for review. Waiting for human verification...")
    # Poll for reviewed results
    # See review workflow guide for details

Troubleshooting

401 Unauthorized Error

Check that your API key is correct and included in the X-API-Key header.

402 Payment Required

You’ve run out of credits. Check your balance in the dashboard or upgrade your plan.

400 Bad Request

Your schema might be invalid. Ensure it follows JSON Schema format with named_entities for the fields you want to extract.
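The three HTTP errors above can be turned into readable messages in client code. The sketch below maps only the status codes listed in this section to the advice given for each; the dictionary and function names are hypothetical, and other status codes fall through to a generic message.

```python
TROUBLESHOOTING_HINTS = {
    400: "Bad request: check that the schema is valid and uses named_entities.",
    401: "Unauthorized: check the API key and the X-API-Key header.",
    402: "Payment required: out of credits; check your balance or upgrade your plan.",
}


def explain_status(status_code: int) -> str:
    """Return a troubleshooting hint for a known status code."""
    return TROUBLESHOOTING_HINTS.get(
        status_code, f"Unexpected HTTP status {status_code}"
    )
```

For example, a script could print explain_status(response.status_code) whenever response.ok is false.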

Poor Extraction Quality

Try Advanced mode for better accuracy
Write more detailed field descriptions in your schema
Include example values or constraints
See the Schema Design Guide for best practices
