Uploaded documents are stored for 30 days, after which they are automatically deleted. Extraction results remain accessible indefinitely.
Download original documents via /documents/{document_id}/download if you need to archive them beyond 30 days.
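For example, an archiving helper might look like the sketch below. The endpoint path comes from the documentation above; BASE_URL, the auth header, and the assumption that the endpoint responds to GET are placeholders standing in for your actual client setup:

```python
import requests

# Placeholder base URL and auth header; substitute your real values
BASE_URL = "https://api.example.com/v1"
headers = {"Authorization": "Bearer YOUR_API_KEY"}

def archive_document(document_id, dest_path):
    # Stream the original file to disk before the 30-day retention window ends
    resp = requests.get(f"{BASE_URL}/documents/{document_id}/download",
                        headers=headers, stream=True)
    resp.raise_for_status()
    with open(dest_path, "wb") as out:
        for chunk in resp.iter_content(chunk_size=8192):
            out.write(chunk)
```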
Use Batch Uploads
Upload multiple documents in a single request to reduce API calls:
```python
# ✓ Good: Batch upload
files = [("files", open(f, "rb")) for f in file_paths]
response = requests.post(url, headers=headers, files=files)

# ✗ Avoid: Multiple single uploads
for file_path in file_paths:
    response = requests.post(url, headers=headers, files={"files": open(file_path, "rb")})
```
Handle File Handles Properly
Always close file handles after upload:
```python
# Using context manager (recommended)
with open("document.pdf", "rb") as f:
    response = requests.post(url, files={"files": f})

# Or explicitly close
files = [("files", open(f, "rb")) for f in file_paths]
try:
    response = requests.post(url, files=files)
finally:
    for _, fh in files:
        fh.close()
```
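To combine the context-manager approach with batch uploads, contextlib.ExitStack from the standard library keeps every handle open for the duration of the request and closes them all afterward (a sketch, reusing the url and file_paths names from above):

```python
import contextlib

# Open all files inside one ExitStack; every handle is closed when
# the block exits, even if the request raises
with contextlib.ExitStack() as stack:
    files = [("files", stack.enter_context(open(path, "rb"))) for path in file_paths]
    response = requests.post(url, files=files)
```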
Validate Files Before Upload
Check file format and size before uploading:
```python
import os

MAX_SIZE = 50 * 1024 * 1024  # 50 MB
SUPPORTED = {'.pdf', '.docx', '.doc', '.jpg', '.jpeg', '.png'}

def validate_file(file_path):
    ext = os.path.splitext(file_path)[1].lower()
    size = os.path.getsize(file_path)
    if ext not in SUPPORTED:
        raise ValueError(f"Unsupported format: {ext}")
    if size > MAX_SIZE:
        raise ValueError(f"File too large: {size / 1024 / 1024:.1f} MB")

# Validate before upload; collect files that pass, skip those that fail
valid_files = []
for f in file_paths:
    try:
        validate_file(f)
        valid_files.append(f)
    except ValueError as e:
        print(f"Skipping {f}: {e}")
```
Store Document IDs
Map document IDs to original filenames for tracking:
```python
import json
import os

# Create mapping
filename_to_id = {}
for file_path in file_paths:
    with open(file_path, "rb") as f:
        response = requests.post(url, files={"files": f})
    doc_id = response.json()[0]
    filename_to_id[os.path.basename(file_path)] = doc_id

# Save mapping for later reference
with open("document_mapping.json", "w") as f:
    json.dump(filename_to_id, f, indent=2)
```
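Later, reload the mapping to look up a document's ID by its original filename (a small usage sketch; "invoice.pdf" is a placeholder name):

```python
import json

with open("document_mapping.json") as f:
    filename_to_id = json.load(f)

# Look up the document ID for a file uploaded earlier
doc_id = filename_to_id["invoice.pdf"]
```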