Overview
This guide will walk you through extracting data from a document using Documind in just a few minutes.
Prerequisites
A Documind account (sign up at app.documind.cloud)
An API key (create one in the dashboard)
A document to process (PDF, DOCX, or image)
Step 1: Get Your API Key
Log in to the Documind Dashboard
Navigate to the API Keys section
Click Create New API Key
Give it a name (e.g., “Development Key”)
Copy the API key and save it securely
The API key is shown only once. Store it securely and never commit it to version control.
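One way to keep the key out of version control is to read it from an environment variable at runtime. The sketch below assumes a variable named DOCUMIND_API_KEY, which is a hypothetical name chosen for illustration and not something Documind defines; only the X-API-Key header name comes from this guide.

```python
import os


def load_api_key(env=os.environ, var_name="DOCUMIND_API_KEY"):
    """Return the API key stored in an environment variable.

    DOCUMIND_API_KEY is a hypothetical name used for this sketch.
    """
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable first")
    return key


def build_headers(api_key):
    """Build the X-API-Key header used by the requests in this guide."""
    return {"X-API-Key": api_key}
```

With this in place, `headers = build_headers(load_api_key())` can stand in for a hard-coded key in the examples.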
Step 2: Upload a Document
import requests

API_KEY = "your_api_key_here"
headers = {"X-API-Key": API_KEY}

# Upload a document; the file must stay open while the request is sent
with open("invoice.pdf", "rb") as f:
    files = {"files": f}
    response = requests.post(
        "https://api.documind.cloud/api/v1/upload",
        headers=headers,
        files=files,
    )
response.raise_for_status()  # Stop early on an HTTP error

# The response body is a list of document IDs
document_ids = response.json()
document_id = document_ids[0]
print(f"Document uploaded: {document_id}")
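The upload example indexes the response body with `[0]`, which fails with an unhelpful error if the body is not the expected list. A minimal sketch of a defensive check follows. It assumes that any body other than a non-empty list signals a problem; this guide does not document what an error body looks like, so treat that as a guess.

```python
def first_document_id(payload):
    """Return the first document ID from a parsed upload response body.

    Assumes a successful upload returns a non-empty list of IDs, as in
    the example above; anything else is reported as unexpected.
    """
    if not isinstance(payload, list) or not payload:
        raise ValueError(f"Unexpected upload response: {payload!r}")
    return payload[0]
```

Usage would be `document_id = first_document_id(response.json())`.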
Step 3: Define Your Schema
Create a simple schema to specify what data to extract:
{
  "type": "object",
  "named_entities": {
    "invoice_number": {
      "type": "string",
      "description": "The invoice number"
    },
    "invoice_date": {
      "type": "string",
      "description": "The invoice date"
    },
    "total_amount": {
      "type": "number",
      "description": "The total amount"
    },
    "vendor_name": {
      "type": "string",
      "description": "The vendor or company name"
    }
  },
  "required": ["invoice_number", "total_amount"]
}
You can also auto-generate schemas using the /schema/{document_id} endpoint or use predefined schemas for common document types.
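If you write schemas by hand for several document types, a small builder can keep them consistent. The sketch below produces the same shape as the schema above. The function name and its argument format are illustrative choices, not part of the Documind API.

```python
def build_schema(fields, required=()):
    """Build an extraction schema in the shape shown above.

    `fields` maps each field name to a (type, description) pair.
    This helper is hypothetical and only assembles a dict.
    """
    missing = [name for name in required if name not in fields]
    if missing:
        raise ValueError(f"Required fields not defined: {missing}")
    return {
        "type": "object",
        "named_entities": {
            name: {"type": type_, "description": description}
            for name, (type_, description) in fields.items()
        },
        "required": list(required),
    }
```

For example, `build_schema({"invoice_number": ("string", "The invoice number")}, required=["invoice_number"])` returns a one-field schema that marks `invoice_number` as required.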
Step 4: Extract Data
Now extract data from the uploaded document:
# Extract data using Basic mode
schema = {
    "type": "object",
    "named_entities": {
        "invoice_number": {"type": "string", "description": "Invoice number"},
        "invoice_date": {"type": "string", "description": "Invoice date"},
        "total_amount": {"type": "number", "description": "Total amount"},
        "vendor_name": {"type": "string", "description": "Vendor name"},
    },
    "required": ["invoice_number", "total_amount"],
}

response = requests.post(
    f"https://api.documind.cloud/api/v1/extract/{document_id}",
    headers=headers,
    json={
        "schema": schema,
        "model": "google-gemini-2.0-flash",  # Basic mode: 2 credits/page
        "prompt": "Extract invoice information accurately",
    },
)
response.raise_for_status()  # Stop early on an HTTP error

result = response.json()
print("Extraction Results:")
print(result["results"])
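The example above notes that Basic mode costs 2 credits per page. A rough budget check before a batch run might look like the sketch below. It assumes billing is a simple per-page multiple with no minimum charge or rounding, which this guide does not confirm, and it covers only the Basic rate because the rates for other modes are not stated here.

```python
# Rate quoted in this guide for Basic mode.
BASIC_CREDITS_PER_PAGE = 2


def estimate_credits(page_count, credits_per_page=BASIC_CREDITS_PER_PAGE):
    """Estimate credits for a document, assuming flat per-page billing."""
    if page_count < 0:
        raise ValueError("page_count must be non-negative")
    return page_count * credits_per_page


def fits_budget(page_counts, available_credits):
    """Return True if all documents fit within the available credits."""
    total = sum(estimate_credits(pages) for pages in page_counts)
    return total <= available_credits
```

Under these assumptions, a 12-page invoice in Basic mode would use 24 credits.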
Step 5: Handle the Response
The response contains the extracted data:
{
  "document_id": "123e4567-e89b-12d3-a456-426614174000",
  "results": {
    "invoice_number": "INV-2024-001",
    "invoice_date": "2024-01-15",
    "total_amount": 1250.00,
    "vendor_name": "Acme Corporation"
  },
  "needs_review": false,
  "needs_review_metadata": {}
}
When needs_review is false: Use the results immediately in your workflow.
When needs_review is true: Wait for human review before processing. See the Review Workflow Guide.
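When processing several documents, the two cases above can be separated in one pass. The following is a minimal sketch that works on parsed response bodies shaped like the example response. Treating a missing needs_review flag as "needs review" is a cautious choice made for this sketch, not documented API behavior.

```python
def partition_results(responses):
    """Split extraction responses into ready results and pending IDs.

    Returns (ready, pending): `ready` maps document_id to its extracted
    results; `pending` lists the IDs still awaiting human review.
    """
    ready = {}
    pending = []
    for response in responses:
        # A missing flag is treated as needing review, to be safe.
        if response.get("needs_review", True):
            pending.append(response["document_id"])
        else:
            ready[response["document_id"]] = response["results"]
    return ready, pending
```

The `ready` results can go straight into your workflow, while the `pending` IDs are the ones to follow up on through the review workflow.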
What’s Next?
Schema Design Guide: Learn how to design schemas for better extraction accuracy
Invoice Processing Tutorial: Complete tutorial for processing invoices at scale
Extraction Modes: Understand Basic, VLM, and Advanced extraction modes
Review Workflow: Handle documents that need human review
Complete Example
Here’s a complete Python script that puts it all together:
import requests

API_KEY = "your_api_key_here"
BASE_URL = "https://api.documind.cloud/api/v1"
headers = {"X-API-Key": API_KEY}

# 1. Upload document (the file must stay open while the request is sent)
with open("invoice.pdf", "rb") as f:
    files = {"files": f}
    upload_response = requests.post(
        f"{BASE_URL}/upload",
        headers=headers,
        files=files,
    )
upload_response.raise_for_status()
document_id = upload_response.json()[0]

# 2. Define schema
schema = {
    "type": "object",
    "named_entities": {
        "invoice_number": {"type": "string"},
        "total_amount": {"type": "number"},
        "vendor_name": {"type": "string"},
    },
    "required": ["invoice_number", "total_amount"],
}

# 3. Extract data
extract_response = requests.post(
    f"{BASE_URL}/extract/{document_id}",
    headers=headers,
    json={"schema": schema, "model": "google-gemini-2.0-flash"},
)
extract_response.raise_for_status()
result = extract_response.json()

# 4. Process results
if not result["needs_review"]:
    print("Extracted Data:")
    print(result["results"])
else:
    print("Document flagged for review. Waiting for human verification...")
    # Poll for reviewed results
    # See review workflow guide for details
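The script leaves polling for reviewed results as a comment. A generic polling loop could be sketched as below. How to fetch a reviewed result is covered by the review workflow guide and is not shown here, so `fetch` is a caller-supplied, hypothetical callable, and the interval and timeout defaults are arbitrary.

```python
import time


def poll_until(fetch, is_done, interval=5.0, timeout=300.0,
               sleep=time.sleep, clock=time.monotonic):
    """Call fetch() repeatedly until is_done(value) is true.

    `fetch` is supplied by the caller (hypothetical here); `sleep` and
    `clock` are injectable so the loop can be tested without waiting.
    Raises TimeoutError once `timeout` seconds have elapsed.
    """
    deadline = clock() + timeout
    while True:
        value = fetch()
        if is_done(value):
            return value
        if clock() >= deadline:
            raise TimeoutError("Review did not complete before the timeout")
        sleep(interval)
```

Assuming the reviewed result eventually reports needs_review as false, the completion check could be `lambda r: not r["needs_review"]`.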
Troubleshooting
If a request is rejected as unauthorized, check that your API key is correct and included in the X-API-Key header.
If a request fails because you have run out of credits, check your balance in the dashboard or upgrade your plan.
If extraction fails with a schema error, your schema might be invalid. Ensure it follows JSON Schema format, with named_entities holding the fields you want to extract.
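A quick local check can catch obvious schema mistakes before a request is sent. The sketch below only mirrors the schema shape used in this guide's examples. The server's actual validation rules are not documented here, so passing these checks does not guarantee that the API accepts the schema.

```python
def find_schema_problems(schema):
    """Return a list of problems found in an extraction schema.

    The checks follow the examples in this guide: a top-level "object"
    type, a non-empty "named_entities" mapping whose fields each have a
    "type", and "required" names that are all defined.
    """
    problems = []
    if schema.get("type") != "object":
        problems.append('top-level "type" should be "object"')
    entities = schema.get("named_entities")
    if not isinstance(entities, dict) or not entities:
        problems.append('"named_entities" should be a non-empty object')
        entities = {}
    for name, spec in entities.items():
        if not isinstance(spec, dict) or "type" not in spec:
            problems.append(f'field "{name}" is missing a "type"')
    for name in schema.get("required", []):
        if name not in entities:
            problems.append(f'required field "{name}" is not defined')
    return problems
```

An empty list means none of these particular problems were found.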