How to Parse Purchase Orders to JSON with an API
Arthur Sterling
Lead Developer Advocate, Parse
Every procurement workflow eventually hits the same wall: a PDF purchase order arrives and someone has to key the data into an ERP, an AP system, or a spreadsheet. At low volume it's annoying. At scale it becomes a real operational cost and a source of transcription errors.
The Parse Purchase Order Parser API removes that step entirely. POST a PO document, receive a clean JSON object with every field extracted and the arithmetic already validated. No configuration, no training data, no templates required.
What Gets Extracted
A single API call returns:
- PO meta: PO number, issue date, expected delivery date, currency, payment terms, incoterms, and notes
- Vendor: Name, address, email, phone, and tax ID
- Buyer: Name, address, email, and phone
- Ship-to: Separate delivery address when it differs from buyer
- Line items: Line number, SKU, description, quantity, unit, unit price, line total, and per-line tax rate
- Financials: Subtotal, discount amount, tax amount, shipping cost, and grand total
- Validation: total_mismatch flag and a line_arithmetic_errors array listing every line where qty × unit_price ≠ line_total
The validation object is computed server-side, so no extra API call is needed. If line 3 has a pricing discrepancy, it shows up in line_arithmetic_errors with the expected total, the stated total, and the delta.
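To make the line check concrete, the sketch below recomputes qty × unit_price locally and reports mismatches in the same shape as line_arithmetic_errors. It is an illustration, not the API's implementation: the helper name find_line_errors, the key names line_number and unit_price on line items, and the rounding to two decimal places are all assumptions, so check the field reference for the exact names.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def find_line_errors(line_items):
    """Return one entry per line where quantity * unit_price != line_total."""
    errors = []
    for item in line_items:
        # Convert via str() so float inputs do not carry binary rounding noise.
        qty = Decimal(str(item["quantity"]))
        price = Decimal(str(item["unit_price"]))  # assumed key name
        stated = Decimal(str(item["line_total"]))
        expected = (qty * price).quantize(CENT, rounding=ROUND_HALF_UP)
        if expected != stated:
            errors.append({
                "line_number": item["line_number"],  # assumed key name
                "expected_total": expected,
                "stated_total": stated,
                "delta": stated - expected,
            })
    return errors


# Made-up sample data; line 2 understates its total.
items = [
    {"line_number": 1, "quantity": 4, "unit_price": 12.50, "line_total": 50.00},
    {"line_number": 2, "quantity": 10, "unit_price": 49.99, "line_total": 450.00},
]
line_errors = find_line_errors(items)
```

Decimal is used here rather than float so that cent-level comparisons stay exact.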
Getting Started
Sign up at cparse.com/dashboard and grab your API key. New accounts include free credit with no credit card required.
The base URL is https://api.cparse.com/po/v1.
Extract a PO: Python
import requests

url = "https://api.cparse.com/po/v1/parse"
headers = {"X-API-Key": "YOUR_API_KEY"}

with open("purchase_order.pdf", "rb") as f:
    response = requests.post(url, files={"file": f}, headers=headers)

result = response.json()[0]
data = result["data"]

print(data["po_number"])       # "PO-2026-00187"
print(data["vendor"]["name"])  # "Global Parts Supply Co."
print(data["grand_total"])     # 1832.40

# Check validation
if data["validation"]["total_mismatch"]:
    print("Warning: grand total does not reconcile with line items")

for err in data["validation"]["line_arithmetic_errors"]:
    print(f"Line {err['line_number']}: expected {err['expected_total']}, got {err['stated_total']} (delta {err['delta']})")

# Print line items
for item in data.get("line_items", []):
    print(f"{item['sku']} {item['description']} qty={item['quantity']} total={item['line_total']}")
Extract a PO: JavaScript
import FormData from 'form-data';
import fs from 'fs';
import axios from 'axios';

const form = new FormData();
form.append('file', fs.createReadStream('purchase_order.pdf'));

const response = await axios.post('https://api.cparse.com/po/v1/parse', form, {
  headers: {
    ...form.getHeaders(),
    'X-API-Key': 'YOUR_API_KEY',
  },
});

const [result] = response.data;
const { data } = result;

console.log(data.po_number);   // "PO-2026-00187"
console.log(data.vendor.name); // "Global Parts Supply Co."
console.log(data.grand_total); // 1832.40

if (data.validation.total_mismatch) {
  console.warn('Grand total does not reconcile');
}

data.validation.line_arithmetic_errors.forEach(err => {
  console.warn(`Line ${err.line_number}: expected ${err.expected_total}, stated ${err.stated_total}`);
});
The Validation Object
The validation object is what makes this API useful for automated workflows, not just data display. Human-keyed POs regularly contain arithmetic errors. A line with qty=10, unit_price=$49.99 and line_total=$450.00 looks fine at a glance, but the correct total is $499.90. The API would report that line like this:
{
  "validation": {
    "total_mismatch": false,
    "line_arithmetic_errors": [
      {
        "line_number": 2,
        "expected_total": 499.90,
        "stated_total": 450.00,
        "delta": -49.90
      }
    ]
  }
}
- total_mismatch is true when grand_total ≠ subtotal + tax + shipping − discount.
- line_arithmetic_errors lists every line where qty × unit_price ≠ line_total, with the delta so you can decide whether to flag it, raise a query with the supplier, or treat it as a rounding difference.
Use the delta to set your own tolerance. A $0.01 rounding difference is probably fine to accept automatically. A $49.90 gap warrants a supplier query before payment.
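One way to apply such a tolerance is sketched below: it splits line_arithmetic_errors into entries that can be accepted automatically and entries that need a supplier query. The function name triage_line_errors and the default tolerance of 0.01 are illustrative choices, and the second sample entry is made up for the example.

```python
from decimal import Decimal


def triage_line_errors(errors, tolerance="0.01"):
    """Split errors into (accepted, flagged) by the absolute size of delta."""
    limit = Decimal(tolerance)
    accepted, flagged = [], []
    for err in errors:
        # str() first so float deltas compare cleanly at cent precision.
        delta = abs(Decimal(str(err["delta"])))
        if delta <= limit:
            accepted.append(err)
        else:
            flagged.append(err)
    return accepted, flagged


sample_errors = [
    {"line_number": 2, "expected_total": 499.90, "stated_total": 450.00, "delta": -49.90},
    # Hypothetical rounding difference, not from a real response.
    {"line_number": 5, "expected_total": 33.33, "stated_total": 33.34, "delta": 0.01},
]
accepted, flagged = triage_line_errors(sample_errors)

for err in flagged:
    print(f"Query supplier about line {err['line_number']} (delta {err['delta']})")
```

In practice the tolerance might vary by currency or by supplier; passing it as a parameter keeps that decision out of the parsing code.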
Supported Formats
The API accepts PDF, DOCX, JPEG, PNG, and ZIP or 7z archives containing a single document. For scanned or image-based POs, OCR is handled automatically on the MEGA plan.
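A client can reject unsupported files before spending an API call. The check below is a minimal sketch based only on the formats listed above; the helper name is_supported is hypothetical, the extension set is inferred from the format names (with .jpg assumed as an alias for JPEG), and the API itself may validate by content rather than by extension.

```python
from pathlib import Path

# Inferred from the format list above; ".jpg" is an assumed alias for JPEG.
SUPPORTED_EXTENSIONS = {".pdf", ".docx", ".jpeg", ".jpg", ".png", ".zip", ".7z"}


def is_supported(filename):
    """Return True if the file extension is one of the listed formats."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS


for name in ["purchase_order.pdf", "SCAN_0042.JPG", "orders.7z", "notes.txt"]:
    print(name, is_supported(name))
```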
Integration Patterns
ERP ingestion: Extract line items and push them directly to your purchase order matching workflow in SAP, Oracle, or any ERP that exposes a REST API.
AP automation: On receipt of a supplier invoice, cross-reference po_number and vendor.name from the parsed PO to confirm the invoice matches an approved order before payment.
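The cross-reference step might look like the sketch below. The po_data argument is the data object from the parse response; the invoice dictionary, its keys po_number and vendor_name, and the helper names are hypothetical stand-ins for whatever your AP system provides. Vendor names are compared after lowercasing and collapsing whitespace, which is an assumption about how strict the match should be.

```python
def normalise(text):
    """Lowercase and collapse runs of whitespace for a forgiving comparison."""
    return " ".join(text.lower().split())


def invoice_matches_po(invoice, po_data):
    """Return True when the invoice cites the same PO number and vendor."""
    same_po = invoice["po_number"].strip() == po_data["po_number"].strip()
    same_vendor = normalise(invoice["vendor_name"]) == normalise(po_data["vendor"]["name"])
    return same_po and same_vendor


# Sample values reuse the PO shown earlier; the invoice shape is hypothetical.
po_data = {"po_number": "PO-2026-00187", "vendor": {"name": "Global Parts Supply Co."}}
invoice = {"po_number": "PO-2026-00187", "vendor_name": "global parts  supply co."}

if invoice_matches_po(invoice, po_data):
    print("Invoice matches an approved PO")
else:
    print("Hold payment: no matching PO")
```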
Supplier portal: Accept PO uploads from buyers, parse them automatically, and create orders in your fulfilment system without manual data entry.
Audit trail: Store the structured JSON alongside the original PDF. The validation object gives auditors a machine-readable record of any discrepancies found at intake.
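A simple way to keep the two together is a JSON sidecar file that records a hash of the original document next to the parsed output. The sketch below assumes local file storage; the function name write_audit_record and the record keys (source_file, source_sha256, stored_at, parsed) are illustrative, not part of the API.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def write_audit_record(pdf_path, parsed, out_dir):
    """Write <stem>.json containing the parsed data and a SHA-256 of the source file."""
    pdf_path = Path(pdf_path)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    record = {
        "source_file": pdf_path.name,
        # The hash lets an auditor confirm the stored PDF is the one that was parsed.
        "source_sha256": hashlib.sha256(pdf_path.read_bytes()).hexdigest(),
        "stored_at": datetime.now(timezone.utc).isoformat(),
        "parsed": parsed,  # the data object from the parse response, including validation
    }
    out_path = out_dir / (pdf_path.stem + ".json")
    out_path.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return out_path
```

Calling write_audit_record("purchase_order.pdf", data, "audit") after a successful parse would produce audit/purchase_order.json.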
Next Steps
See the complete field reference and plan details in the Purchase Order Parser documentation. Get your API key from cparse.com/dashboard.
Arthur Sterling is the Lead Developer Advocate at Parse.