Overview
This guide will walk you through extracting data from a document using Documind in just a few minutes.
Prerequisites
A Documind account (sign up at app.documind.cloud)
An API key (create one in the dashboard)
A document to process (PDF, DOCX, or image)
Step 1: Get Your API Key
Log in to the Documind Dashboard
Navigate to the API Keys section
Click Create New API Key
Give it a name (e.g., “Development Key”)
Copy and save the API key securely
The API key is only shown once. Store it securely - never commit it to version control.
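One common way to keep the key out of source code is to read it from an environment variable. The sketch below assumes a variable named DOCUMIND_API_KEY; that name and the build_headers helper are illustrative choices, not something Documind defines.

```python
import os


def build_headers(env_var: str = "DOCUMIND_API_KEY") -> dict:
    """Build request headers from an API key stored in the environment.

    The variable name is an illustrative assumption, not a Documind convention.
    """
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        # Fail early instead of sending a request with an empty key
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return {"X-API-Key": api_key}
```

With this in place, headers = build_headers() can stand in for the hard-coded API_KEY in the examples that follow.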
Step 2: Upload a Document
import requests

API_KEY = "your_api_key_here"
headers = {"X-API-Key": API_KEY}

# Upload a document
with open("invoice.pdf", "rb") as f:
    files = {"files": f}
    response = requests.post(
        "https://api.documind.cloud/api/v1/upload",
        headers=headers,
        files=files,
    )

document_ids = response.json()
document_id = document_ids[0]
print(f"Document uploaded: {document_id}")
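The upload example indexes the first element of the response directly, which raises an unhelpful error if the upload failed. A minimal sketch of a defensive alternative follows. It assumes, as the example implies, that a successful upload returns a JSON list of document IDs; first_document_id is a hypothetical helper name.

```python
def first_document_id(payload) -> str:
    """Return the first document ID from a parsed upload response.

    Assumes a successful upload yields a non-empty JSON list of IDs.
    """
    if not isinstance(payload, list) or not payload:
        # An error body (often a JSON object) or an empty list ends up here
        raise ValueError(f"Unexpected upload response: {payload!r}")
    return str(payload[0])
```

A typical call would be document_id = first_document_id(response.json()), ideally after response.raise_for_status() so that HTTP errors surface first.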
Step 3: Define Your Schema
Create a simple schema to specify what data to extract:
{
  "type": "object",
  "named_entities": {
    "invoice_number": {
      "type": "string",
      "description": "The invoice number"
    },
    "invoice_date": {
      "type": "string",
      "description": "The invoice date"
    },
    "total_amount": {
      "type": "number",
      "description": "The total amount"
    },
    "vendor_name": {
      "type": "string",
      "description": "The vendor or company name"
    }
  },
  "required": ["invoice_number", "total_amount"]
}
You can also auto-generate schemas using the /schema/{document_id} endpoint or use predefined schemas for common document types.
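Before sending a hand-written schema, a quick local sanity check can catch simple mistakes such as a required field that was never defined. The sketch below only checks the structure shown in this guide; it is not Documind's own validation, and schema_problems is a hypothetical helper name.

```python
def schema_problems(schema: dict) -> list:
    """Return a list of human-readable problems found in a schema dict.

    Checks only the shape used in this guide: a top-level "object" type,
    a "named_entities" mapping, and "required" names that are defined.
    """
    problems = []
    if schema.get("type") != "object":
        problems.append('top-level "type" should be "object"')

    fields = schema.get("named_entities")
    if not isinstance(fields, dict) or not fields:
        problems.append('"named_entities" should be a non-empty object')
        fields = {}

    for name, spec in fields.items():
        if not isinstance(spec, dict) or "type" not in spec:
            problems.append(f'field "{name}" is missing a "type"')

    for name in schema.get("required", []):
        if name not in fields:
            problems.append(f'required field "{name}" is not defined')
    return problems
```

An empty list means the schema passed these basic checks; the API may still reject it for reasons this sketch does not cover.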
Step 4: Extract Data
Now extract data from the uploaded document:
# Extract data using Basic mode
schema = {
    "type": "object",
    "named_entities": {
        "invoice_number": {"type": "string", "description": "Invoice number"},
        "invoice_date": {"type": "string", "description": "Invoice date"},
        "total_amount": {"type": "number", "description": "Total amount"},
        "vendor_name": {"type": "string", "description": "Vendor name"},
    },
    "required": ["invoice_number", "total_amount"],
}

response = requests.post(
    f"https://api.documind.cloud/api/v1/extract/{document_id}",
    headers=headers,
    json={
        "schema": schema,
        "model": "google-gemini-2.0-flash",  # Basic mode: 2 credits/page
        "prompt": "Extract invoice information accurately",
    },
)

result = response.json()
print("Extraction Results:")
print(result["results"])
Step 5: Handle the Response
The response contains the extracted data:
{
  "document_id": "123e4567-e89b-12d3-a456-426614174000",
  "results": {
    "invoice_number": "INV-2024-001",
    "invoice_date": "2024-01-15",
    "total_amount": 1250.00,
    "vendor_name": "Acme Corporation"
  },
  "needs_review": false,
  "needs_review_metadata": {}
}
When needs_review is false: use the results immediately in your workflow.
When needs_review is true: wait for human review before processing. See the Review Workflow Guide.
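The branching on needs_review can be sketched as a small function that takes the parsed response and returns which path to follow. It relies only on the response fields shown above; route_result and the "ready" and "review" labels are hypothetical names chosen for illustration.

```python
def route_result(result: dict):
    """Decide what to do with a parsed extraction response.

    Returns ("ready", results) when the data can be used right away,
    or ("review", metadata) when the document was flagged for review.
    """
    if result["needs_review"]:
        return "review", result.get("needs_review_metadata", {})
    return "ready", result["results"]
```

Calling status, payload = route_result(response.json()) keeps the decision in one place, so downstream code only handles the two labelled cases.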
Complete Example
Here’s a complete Python script that puts it all together:
import requests
import time

API_KEY = "your_api_key_here"
BASE_URL = "https://api.documind.cloud/api/v1"
headers = {"X-API-Key": API_KEY}

# 1. Upload document
with open("invoice.pdf", "rb") as f:
    files = {"files": f}
    upload_response = requests.post(
        f"{BASE_URL}/upload",
        headers=headers,
        files=files,
    )
document_id = upload_response.json()[0]

# 2. Define schema
schema = {
    "type": "object",
    "named_entities": {
        "invoice_number": {"type": "string"},
        "total_amount": {"type": "number"},
        "vendor_name": {"type": "string"},
    },
    "required": ["invoice_number", "total_amount"],
}

# 3. Extract data
extract_response = requests.post(
    f"{BASE_URL}/extract/{document_id}",
    headers=headers,
    json={"schema": schema, "model": "google-gemini-2.0-flash"},
)
result = extract_response.json()

# 4. Process results
if not result["needs_review"]:
    print("Extracted Data:")
    print(result["results"])
else:
    print("Document flagged for review. Waiting for human verification...")
    # Poll for reviewed results
    # See review workflow guide for details
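The polling step left as a comment in the script could look roughly like the sketch below. This guide does not document the review endpoint, so the helper takes a caller-supplied fetch function rather than guessing a URL; poll_until, fetch, and is_done are all hypothetical names, and the default interval and timeout are arbitrary.

```python
import time


def poll_until(fetch, is_done, interval=5.0, timeout=300.0):
    """Call fetch() repeatedly until is_done(result) is true.

    fetch:   zero-argument callable returning the latest parsed response
             (supplied by the caller; the endpoint is not assumed here).
    is_done: callable that returns True once the result is final.
    Raises TimeoutError if the next wait would pass the deadline.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = fetch()
        if is_done(result):
            return result
        if time.monotonic() + interval > deadline:
            raise TimeoutError("Gave up waiting for review")
        time.sleep(interval)
```

For example, is_done could be lambda r: not r["needs_review"], with fetch wrapping whichever request the Review Workflow Guide describes.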
Troubleshooting
Authentication fails: check that your API key is correct and included in the X-API-Key header.
Out of credits: check your balance in the dashboard or upgrade your plan.
Schema rejected: your schema might be invalid. Ensure it follows JSON Schema format, with named_entities holding the fields you want to extract.