Extract Data

Endpoint

POST /extract/{document_id}

Extract structured information from an uploaded document using a defined schema. Choose between Basic, VLM, or Advanced extraction modes based on your accuracy and speed requirements.

Authentication

X-API-Key

string

required

Your unique API key, sent in the X-API-Key request header.

Path Parameters

document_id

string

required

UUID of the uploaded document. Obtained from the /upload endpoint.

Request Body

schema

object

required

JSON Schema defining the structure of data to extract. Uses named_entities format.

{
  "named_entities": {
    "field_name": {
      "type": "string|number|boolean|array|object",
      "description": "Field description for AI context"
    }
  },
  "required": ["field1", "field2"]
}
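As a rough illustration of the format above, the sketch below checks a schema dictionary before it is sent. The helper name `check_schema` and the particular checks are assumptions made for illustration and are not part of the API; the allowed type names are taken from the template above.

```python
ALLOWED_TYPES = {"string", "number", "boolean", "array", "object"}


def check_schema(schema: dict) -> list[str]:
    """Return a list of problems found in a named_entities schema (hypothetical helper)."""
    problems = []
    entities = schema.get("named_entities", {})
    if not entities:
        problems.append("named_entities is missing or empty")
    # Every field should declare one of the documented types.
    for name, spec in entities.items():
        if spec.get("type") not in ALLOWED_TYPES:
            problems.append(f"{name}: unknown type {spec.get('type')!r}")
    # Every required name should refer to a defined field.
    for name in schema.get("required", []):
        if name not in entities:
            problems.append(f"required field {name!r} is not defined")
    return problems


schema = {
    "named_entities": {
        "invoice_number": {"type": "string", "description": "The invoice number"},
        "total_amount": {"type": "number", "description": "Total invoice amount"},
    },
    "required": ["invoice_number", "due_date"],
}
print(check_schema(schema))  # ["required field 'due_date' is not defined"]
```

Whether the server itself rejects such a schema is not stated here; the check only catches obvious mistakes on the client side.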

prompt

string

Additional instructions for extraction. Optional but recommended for complex documents. Default: "No additional instructions provided."

model

string

For Basic Extraction only. Specify the AI model to use:

openai-gpt-4o (6 credits/page) - Most accurate
openai-gpt-4.1 (4 credits/page) - Balanced
google-gemini-2.0-flash (2 credits/page) - Fastest

If provided, uses Basic extraction mode (single model, no confidence scores).

extraction_mode

string

For VLM Extraction only. Set to:

vlm (10 credits/page) - Vision-based extraction for scanned docs

For Advanced extraction (15 credits/page): omit both this parameter and model.
For Basic extraction: set the model parameter instead.
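The selection rules for model and extraction_mode can be summarized in a few lines. This is a sketch of how a client might predict which mode a request body selects; `resolve_mode` is a hypothetical helper. How the server treats a body that sets both parameters, or an unrecognized value, is not documented here, so the sketch assumes model takes precedence and rejects unknown values.

```python
BASIC_MODELS = {"openai-gpt-4o", "openai-gpt-4.1", "google-gemini-2.0-flash"}


def resolve_mode(body: dict) -> str:
    """Return 'basic', 'vlm', or 'advanced' for a request body (hypothetical helper)."""
    model = body.get("model")
    if model is not None:
        # Documented: providing `model` selects Basic extraction.
        if model not in BASIC_MODELS:
            raise ValueError(f"unknown model: {model}")
        return "basic"
    mode = body.get("extraction_mode")
    if mode == "vlm":
        return "vlm"
    if mode is not None:
        # Assumption: values other than "vlm" are treated as a client error.
        raise ValueError(f"unknown extraction_mode: {mode}")
    # Neither parameter set: Advanced extraction.
    return "advanced"


print(resolve_mode({"model": "openai-gpt-4.1"}))   # basic
print(resolve_mode({"extraction_mode": "vlm"}))    # vlm
print(resolve_mode({"prompt": "Extract totals"}))  # advanced
```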

review_threshold

number

default: 80

Confidence threshold (0-100) for automatic review flagging. Applies only to Advanced and VLM modes. Required fields whose confidence falls below this threshold are flagged for review.
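The flagging rule can be pictured as follows. This is a client-side sketch for flat (non-nested) fields only, not the service's implementation; `fields_to_review` is a hypothetical helper and the confidence values are made up for illustration.

```python
def fields_to_review(confidences: dict, required: list, threshold: float = 80) -> list:
    """Return required fields whose confidence is strictly below the threshold."""
    return [
        name
        for name, score in confidences.items()
        if name in required and score < threshold
    ]


confidences = {"invoice_number": 95.2, "total_amount": 61.0, "vendor_name": 40.5}
required = ["invoice_number", "total_amount"]

# vendor_name has the lowest score but is not required, so it is never flagged.
print(fields_to_review(confidences, required))      # ['total_amount']
print(fields_to_review(confidences, required, 96))  # ['invoice_number', 'total_amount']
```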

Response

document_id

string

UUID of the processed document.

results

object

Extracted data matching your schema structure. Fields are ordered according to schema definition.

needs_review

boolean

Whether this extraction requires human review. true if any required fields have confidence below the review threshold.

needs_review_metadata

object

Metadata about fields needing review. Only present in Advanced/VLM modes.

Metadata structure:

confidence_scores

object

Confidence scores (0-100) for each extracted field. Calculated as:
0.4 × lexical_similarity + 0.6 × semantic_similarity

review_flags

object

Boolean flags indicating which fields need review.
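A worked instance of the confidence formula above may help. The sketch assumes both similarity inputs are already on a 0-100 scale, which this page does not state explicitly; how the service computes the two similarities is also not described here.

```python
def confidence_score(lexical_similarity: float, semantic_similarity: float) -> float:
    """Weighted confidence: 0.4 x lexical + 0.6 x semantic (inputs assumed 0-100)."""
    return 0.4 * lexical_similarity + 0.6 * semantic_similarity


# 0.4 * 90 + 0.6 * 80 = 36 + 48 = 84
score = confidence_score(90.0, 80.0)
print(round(score, 1))  # 84.0
```

With these illustrative inputs, a required field would pass the default threshold of 80 but would be flagged at a review_threshold of 85.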

Examples

Basic Extraction

Fast, single-model extraction for simple documents:

curl -X POST https://api.documind.com/api/v1/extract/550e8400-e29b-41d4-a716-446655440000 \
  -H 'X-API-Key: YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "schema": {
      "named_entities": {
        "invoice_number": {
          "type": "string",
          "description": "The invoice number"
        },
        "total_amount": {
          "type": "number",
          "description": "Total invoice amount"
        }
      },
      "required": ["invoice_number"]
    },
    "prompt": "Extract invoice details",
    "model": "openai-gpt-4.1"
  }'

{
  "document_id": "550e8400-e29b-41d4-a716-446655440000",
  "results": {
    "invoice_number": "INV-2024-001",
    "total_amount": 1250.00
  },
  "needs_review": false,
  "needs_review_metadata": {}
}
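The same Basic request can be assembled in Python. The sketch below uses only the standard library (`urllib.request`) to build the request object without sending it; the API key is a placeholder, and the send step is left commented out because it needs a valid key and an uploaded document.

```python
import json
import urllib.request

API_KEY = "YOUR_API_KEY"  # placeholder, replace with a real key
DOCUMENT_ID = "550e8400-e29b-41d4-a716-446655440000"

body = {
    "schema": {
        "named_entities": {
            "invoice_number": {"type": "string", "description": "The invoice number"},
            "total_amount": {"type": "number", "description": "Total invoice amount"},
        },
        "required": ["invoice_number"],
    },
    "prompt": "Extract invoice details",
    "model": "openai-gpt-4.1",  # setting `model` selects Basic extraction
}

request = urllib.request.Request(
    f"https://api.documind.com/api/v1/extract/{DOCUMENT_ID}",
    data=json.dumps(body).encode("utf-8"),
    headers={"X-API-Key": API_KEY, "Content-Type": "application/json"},
    method="POST",
)

# Sending is left to the caller:
# with urllib.request.urlopen(request) as response:
#     result = json.load(response)
```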

Advanced Extraction

Multi-model validation with confidence scores:

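import requests

API_KEY = "YOUR_API_KEY"
document_id = "550e8400-e29b-41d4-a716-446655440000"  # returned by /upload

advanced_extract = {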
    "schema": {
        "named_entities": {
            "invoice_number": {
                "type": "string",
                "description": "The invoice number"
            },
            "line_items": {
                "type": "array",
                "description": "Invoice line items",
                "items": {
                    "type": "object",
                    "named_entities": {
                        "description": {
                            "type": "string",
                            "description": "Item description"
                        },
                        "amount": {
                            "type": "number",
                            "description": "Line total"
                        }
                    }
                }
            }
        },
        "required": ["invoice_number", "line_items"]
    },
    "prompt": "Extract all invoice details with high accuracy",
    # Advanced mode: don't set 'model' or 'extraction_mode' - 15 credits per page
    "review_threshold": 85
}

response = requests.post(
    f"https://api.documind.com/api/v1/extract/{document_id}",
    headers={
        "X-API-Key": API_KEY,
        "Content-Type": "application/json"
    },
    json=advanced_extract
)

result = response.json()

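# Check if review is needed
if result["needs_review"]:
    print("⚠️  Some fields need review:")
    metadata = result["needs_review_metadata"]

    def report(flags, scores, path=""):
        # Flags and scores share one shape; array fields nest by item index
        for field, flag in flags.items():
            name = f"{path}.{field}" if path else field
            if isinstance(flag, dict):
                report(flag, scores[field], name)
            elif flag:
                print(f"  - {name}: {scores[field]:.1f}% confidence")

    report(metadata["review_flags"], metadata["confidence_scores"])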

{
  "document_id": "550e8400-e29b-41d4-a716-446655440000",
  "results": {
    "invoice_number": "INV-2024-001",
    "line_items": [
      {
        "description": "Professional Services",
        "amount": 1000.00
      },
      {
        "description": "Software License",
        "amount": 250.00
      }
    ]
  },
  "needs_review": true,
  "needs_review_metadata": {
    "confidence_scores": {
      "invoice_number": 95.2,
      "line_items": {
        "0": {
          "description": 88.5,
          "amount": 92.1
        },
        "1": {
          "description": 72.3,
          "amount": 95.8
        }
      }
    },
    "review_flags": {
      "invoice_number": false,
      "line_items": {
        "0": {
          "description": false,
          "amount": false
        },
        "1": {
          "description": true,
          "amount": false
        }
      }
    }
  }
}

Extraction Mode Comparison

Feature	Basic	VLM	Advanced
Credits/Page	2-6	10	15
Speed	Fastest	Fast	Moderate
Accuracy	Good	Very Good	Highest
Confidence Scores	No	Yes	Yes
Review Flagging	No	Yes	Yes
Best For	Simple docs	Scanned images	Critical data
How to use	Set `model` param	Set `extraction_mode="vlm"`	Don’t set model or extraction_mode
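To compare the cost of the modes for a given batch, the per-page rates in the table can be multiplied out. The sketch assumes credits scale linearly with page count, with no minimum charge or rounding; this page gives only per-page rates, so treat the totals as estimates. `estimate_credits` is a hypothetical helper.

```python
# Per-page rates as listed on this page.
CREDITS_PER_PAGE = {
    "basic (google-gemini-2.0-flash)": 2,
    "basic (openai-gpt-4.1)": 4,
    "basic (openai-gpt-4o)": 6,
    "vlm": 10,
    "advanced": 15,
}


def estimate_credits(pages: int) -> dict:
    """Estimate total credits per mode for a document batch of `pages` pages."""
    return {mode: rate * pages for mode, rate in CREDITS_PER_PAGE.items()}


for mode, credits in estimate_credits(40).items():
    print(f"{mode:32} {credits:>5} credits")
```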

Schema Guidelines

Field Types

String Fields

"customer_name": {
  "type": "string",
  "description": "Full name of the customer"
}

Use for text data: names, addresses, identifiers.

Number Fields

"total_amount": {
  "type": "number",
  "description": "Total invoice amount in USD"
}

For numeric values: amounts, quantities, percentages.

Array Fields

"line_items": {
  "type": "array",
  "description": "List of invoice line items",
  "items": {
    "type": "object",
    "named_entities": {
      "description": {"type": "string"},
      "quantity": {"type": "number"}
    }
  }
}

For repeating data: tables, lists, multiple entries.

Nested Objects

"billing_address": {
  "type": "object",
  "description": "Customer billing address",
  "named_entities": {
    "street": {"type": "string"},
    "city": {"type": "string"},
    "zip": {"type": "string"}
  }
}

For structured data groups.

Best Practices

Descriptive Field Names: Use clear, meaningful names (invoice_date not date1)
Detailed Descriptions: Help the AI understand context and format
Mark Critical Fields: Add to required array for automatic review
Consistent Naming: Use snake_case throughout your schema
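These practices can be checked mechanically before a schema is submitted. The sketch below is one possible lint pass over top-level fields only; `lint_schema` and its warning messages are assumptions made for illustration and are not part of the API.

```python
import re

SNAKE_CASE = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$")


def lint_schema(schema: dict) -> list[str]:
    """Return best-practice warnings for top-level fields (hypothetical helper)."""
    warnings = []
    for name, spec in schema.get("named_entities", {}).items():
        if not SNAKE_CASE.match(name):
            warnings.append(f"{name}: not snake_case")
        if not spec.get("description", "").strip():
            warnings.append(f"{name}: missing description")
    # Only required fields are flagged for review, so an empty list disables flagging.
    if not schema.get("required"):
        warnings.append("no required fields, so nothing will be flagged for review")
    return warnings


schema = {
    "named_entities": {
        "invoiceDate": {"type": "string", "description": "Invoice issue date"},
        "total_amount": {"type": "number"},
    },
    "required": ["total_amount"],
}
print(lint_schema(schema))
# ['invoiceDate: not snake_case', 'total_amount: missing description']
```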

Error Responses

402 Payment Required

{
  "detail": "Insufficient credits. Please upgrade your plan or wait for your daily credits to refresh."
}

Check your credit balance before processing large batches.

403 Forbidden

{
  "detail": "You don't have access to this document"
}

Document belongs to another user or organization.

500 Internal Server Error

{
  "detail": "Failed to extract information. Please contact support."
}

Extraction processing failed. Retry or contact support if it persists.
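One way a client might act on these responses is sketched below. The policy is an assumption drawn from the notes above: retry only on 500, stop on 402 and 403. Treating any 2xx status as success is also an assumption, since the success status code is not stated on this page. `next_action` and `ERROR_HINTS` are hypothetical names.

```python
ERROR_HINTS = {
    402: "Insufficient credits: check the balance or wait for the daily refresh.",
    403: "No access: the document belongs to another user or organization.",
    500: "Extraction failed: retry, then contact support if it persists.",
}


def next_action(status_code: int, attempt: int, max_attempts: int = 3) -> str:
    """Map a response status to 'ok', 'retry', or 'stop' (hypothetical policy)."""
    if 200 <= status_code < 300:
        return "ok"
    # Only server-side failures are worth retrying, and only a limited number of times.
    if status_code == 500 and attempt < max_attempts:
        return "retry"
    return "stop"


print(next_action(500, attempt=1))  # retry
print(next_action(500, attempt=3))  # stop
print(next_action(402, attempt=1))  # stop
print(ERROR_HINTS[403])
```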

Next Steps

Review Workflow

Handle documents that need review

Polling Pattern

Implement review polling for automation

List Extractions

Query extraction results
