How to Parse Purchase Orders to JSON with an API

Arthur Sterling

Lead Developer Advocate, Parse

Every procurement workflow eventually hits the same wall: a PDF purchase order arrives and someone has to key the data into an ERP, an AP system, or a spreadsheet. At low volume it's annoying. At scale it becomes a real operational cost and a source of transcription errors.

The Parse Purchase Order Parser API removes that step entirely. POST a PO document, receive a clean JSON object with every field extracted and the arithmetic already validated. No configuration, no training data, no templates required.

What Gets Extracted

A single API call returns:

  • PO meta: PO number, issue date, expected delivery date, currency, payment terms, incoterms, and notes
  • Vendor: Name, address, email, phone, and tax ID
  • Buyer: Name, address, email, and phone
  • Ship-to: Separate delivery address when it differs from buyer
  • Line items: Line number, SKU, description, quantity, unit, unit price, line total, and per-line tax rate
  • Financials: Subtotal, discount amount, tax amount, shipping cost, and grand total
  • Validation: total_mismatch flag and a line_arithmetic_errors array listing every line where qty × unit_price ≠ line_total

The validation object is computed server-side and returned in the same response, so no extra API call is needed. If line 3 has a pricing discrepancy, it shows up in line_arithmetic_errors with the expected total, the stated total, and the delta.
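To make the line check concrete, here is a minimal local sketch of the same idea: multiply quantity by unit price and compare the result with the stated line total. It is not the service's implementation. The item keys unit_price and line_number, and the half-up rounding to two decimals, are assumptions made for illustration.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def line_errors(line_items: list[dict]) -> list[dict]:
    """Return one entry per line where quantity x unit_price != line_total."""
    errors = []
    for item in line_items:
        # Go through str() so 49.99 becomes Decimal("49.99"), not a binary float.
        qty = Decimal(str(item["quantity"]))
        price = Decimal(str(item["unit_price"]))  # assumed key name
        expected = (qty * price).quantize(CENT, rounding=ROUND_HALF_UP)
        stated = Decimal(str(item["line_total"])).quantize(CENT)
        if expected != stated:
            errors.append({
                "line_number": item["line_number"],  # assumed key on items
                "expected_total": expected,
                "stated_total": stated,
                "delta": stated - expected,
            })
    return errors


# Made-up line items; line 2 carries a pricing discrepancy.
items = [
    {"line_number": 1, "quantity": 4, "unit_price": 12.50, "line_total": 50.00},
    {"line_number": 2, "quantity": 10, "unit_price": 49.99, "line_total": 450.00},
]
found = line_errors(items)
print(found)
```

Decimal is used here because binary floats can turn an exact match into a spurious fraction-of-a-cent difference.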

Getting Started

Sign up at cparse.com/dashboard and grab your API key. New accounts include free credit with no credit card required.

The base URL is https://api.cparse.com/po/v1.

Extract a PO: Python

import requests

url = "https://api.cparse.com/po/v1/parse"
headers = {"X-API-Key": "YOUR_API_KEY"}

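with open("purchase_order.pdf", "rb") as f:
    response = requests.post(url, files={"file": f}, headers=headers, timeout=60)

response.raise_for_status()  # fail fast on 4xx/5xx instead of parsing an error body

result = response.json()[0]  # the response body is a JSON array; take the first result
data = result["data"]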

print(data["po_number"])           # "PO-2026-00187"
print(data["vendor"]["name"])      # "Global Parts Supply Co."
print(data["grand_total"])         # 1832.4

# Check validation
if data["validation"]["total_mismatch"]:
    print("Warning: grand total does not reconcile with line items")

for err in data["validation"]["line_arithmetic_errors"]:
    print(f"Line {err['line_number']}: expected {err['expected_total']}, got {err['stated_total']} (delta {err['delta']})")

# Print line items
for item in data.get("line_items", []):
    print(f"{item['sku']}  {item['description']}  qty={item['quantity']}  total={item['line_total']}")

Extract a PO: JavaScript

import FormData from 'form-data';
import fs from 'fs';
import axios from 'axios';

const form = new FormData();
form.append('file', fs.createReadStream('purchase_order.pdf'));

const response = await axios.post('https://api.cparse.com/po/v1/parse', form, {
  headers: {
    ...form.getHeaders(),
    'X-API-Key': 'YOUR_API_KEY',
  },
});

const [result] = response.data;
const { data } = result;

console.log(data.po_number);           // "PO-2026-00187"
console.log(data.vendor.name);         // "Global Parts Supply Co."
console.log(data.grand_total);         // 1832.4

if (data.validation.total_mismatch) {
  console.warn('Grand total does not reconcile');
}

data.validation.line_arithmetic_errors.forEach(err => {
  console.warn(`Line ${err.line_number}: expected ${err.expected_total}, stated ${err.stated_total}`);
});

The Validation Object

The validation object is what makes this API useful for automated workflows, not just data display. Human-keyed POs regularly contain arithmetic errors. A line with qty=10, unit_price=$49.99, and line_total=$450.00 looks fine at a glance, but the correct value is $499.90.

{
  "validation": {
    "total_mismatch": false,
    "line_arithmetic_errors": [
      {
        "line_number": 2,
        "expected_total": 499.90,
        "stated_total": 450.00,
        "delta": -49.90
      }
    ]
  }
}
  • total_mismatch is true when grand_total ≠ subtotal + tax + shipping − discount.
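  • line_arithmetic_errors lists every line where qty × unit_price ≠ line_total. Each entry includes the delta (stated total minus expected total in the example above, so a negative value means the line is understated), which lets you decide whether to flag it, raise a query with the supplier, or treat it as a rounding difference.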
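The total_mismatch rule can also be re-checked client-side, for example in a unit test or after a corrected PO has been patched by hand. The sketch below is a rough illustration under assumptions: only grand_total appears in the examples above, and the other key names (subtotal, tax_amount, shipping_cost, discount_amount) are guesses to verify against the field reference.

```python
from decimal import Decimal


def to_decimal(value) -> Decimal:
    """Convert a JSON number (or a missing value) to Decimal; missing counts as 0."""
    return Decimal(str(value)) if value is not None else Decimal("0")


def totals_reconcile(data: dict, tolerance: str = "0.00") -> bool:
    """True when grand_total equals subtotal + tax + shipping - discount.

    Every key name except grand_total is an assumption for illustration.
    """
    expected = (
        to_decimal(data.get("subtotal"))
        + to_decimal(data.get("tax_amount"))
        + to_decimal(data.get("shipping_cost"))
        - to_decimal(data.get("discount_amount"))
    )
    stated = to_decimal(data.get("grand_total"))
    return abs(stated - expected) <= Decimal(tolerance)


# Hypothetical financial block, not real API output.
sample = {
    "subtotal": 1700.00,
    "tax_amount": 136.00,
    "shipping_cost": 25.00,
    "discount_amount": 28.60,
    "grand_total": 1832.40,
}
print(totals_reconcile(sample))
```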

Use the delta to set your own tolerance. A $0.01 rounding difference is probably fine to accept automatically. A $49.90 gap warrants a supplier query before payment.
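One way to apply a tolerance is to split the reported errors into an auto-accept bucket and a review bucket. This is a sketch only: the function name, the bucket names, and the one-cent default are illustrative choices, not part of the API.

```python
from decimal import Decimal


def triage_line_errors(errors: list[dict], tolerance: str = "0.01") -> dict:
    """Split line_arithmetic_errors into auto-accepted and needs-review line numbers."""
    limit = Decimal(tolerance)
    buckets = {"accept": [], "review": []}
    for err in errors:
        # Use the absolute delta so over- and under-statements are treated alike.
        delta = abs(Decimal(str(err["delta"])))
        bucket = "accept" if delta <= limit else "review"
        buckets[bucket].append(err["line_number"])
    return buckets


# Line 2 mirrors the validation example above; line 5 is made up.
errors = [
    {"line_number": 2, "expected_total": 499.90, "stated_total": 450.00, "delta": -49.90},
    {"line_number": 5, "expected_total": 33.33, "stated_total": 33.34, "delta": 0.01},
]
print(triage_line_errors(errors))  # {'accept': [5], 'review': [2]}
```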

Supported Formats

The API accepts PDF, DOCX, JPEG, PNG, and ZIP or 7z archives containing a single document. For scanned or image-based POs, OCR is handled automatically on the MEGA plan.
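A small pre-flight check can reject unsupported files before they are uploaded. The extension set below is inferred from the formats listed above, and the exact spellings the API accepts (.jpg versus .jpeg, for instance) are an assumption. A scanned PDF also needs OCR, but that cannot be detected from the file name, so only image files are flagged.

```python
from pathlib import Path

# Inferred from the supported-formats list; treat the exact set as an assumption.
SUPPORTED_EXTENSIONS = {".pdf", ".docx", ".jpg", ".jpeg", ".png", ".zip", ".7z"}
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def preflight(path: str) -> str:
    """Classify a file as 'unsupported', 'needs_ocr' (image input), or 'ok'."""
    suffix = Path(path).suffix.lower()
    if suffix not in SUPPORTED_EXTENSIONS:
        return "unsupported"
    if suffix in IMAGE_EXTENSIONS:
        return "needs_ocr"
    return "ok"


for name in ("purchase_order.PDF", "scan_0042.jpeg", "orders.xlsx"):
    print(name, "->", preflight(name))
```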

Integration Patterns

ERP ingestion: Extract line items and push them directly to your purchase order matching workflow in SAP, Oracle, or any ERP that exposes a REST API.
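As a rough sketch of the mapping step, the function below reshapes parsed line items into a generic payload and puts any line listed in line_arithmetic_errors on hold. The target field names (item_code, text, qty, amount, hold) are hypothetical, since every ERP defines its own schema, and the presence of line_number on each line item is assumed.

```python
def to_erp_lines(data: dict) -> list[dict]:
    """Map parsed line items to a generic ERP payload, holding flagged lines."""
    validation = data.get("validation", {})
    flagged = {err["line_number"] for err in validation.get("line_arithmetic_errors", [])}
    lines = []
    for item in data.get("line_items", []):
        lines.append({
            "po_number": data.get("po_number"),
            "item_code": item.get("sku"),
            "text": item.get("description"),
            "qty": item.get("quantity"),
            "amount": item.get("line_total"),
            # Hold lines whose arithmetic did not reconcile at intake.
            "hold": item.get("line_number") in flagged,
        })
    return lines


# Made-up parsed result shaped like the examples in this article.
parsed = {
    "po_number": "PO-2026-00187",
    "line_items": [
        {"line_number": 1, "sku": "GP-1001", "description": "Hex bolt M8",
         "quantity": 4, "line_total": 50.00},
        {"line_number": 2, "sku": "GP-2040", "description": "Bearing 6204",
         "quantity": 10, "line_total": 450.00},
    ],
    "validation": {
        "total_mismatch": False,
        "line_arithmetic_errors": [
            {"line_number": 2, "expected_total": 499.90,
             "stated_total": 450.00, "delta": -49.90},
        ],
    },
}
erp_lines = to_erp_lines(parsed)
for line in erp_lines:
    print(line)
```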

AP automation: On receipt of a supplier invoice, cross-reference po_number and vendor.name from the parsed PO to confirm the invoice matches an approved order before payment.
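The cross-reference could look something like the sketch below. The invoice dict and its vendor_name key are hypothetical stand-ins for whatever your AP system provides, and the name normalization is deliberately crude. A production match would more likely key on a vendor ID or tax ID.

```python
def normalize(name: str) -> str:
    """Lowercase and drop punctuation and spaces so 'ACME, Inc.' equals 'Acme Inc'."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def invoice_matches_po(invoice: dict, po: dict) -> bool:
    """True when the invoice cites the parsed PO's number and the same vendor."""
    inv_po = (invoice.get("po_number") or "").strip().upper()
    po_number = (po.get("po_number") or "").strip().upper()
    inv_vendor = normalize(invoice.get("vendor_name") or "")  # hypothetical invoice key
    po_vendor = normalize((po.get("vendor") or {}).get("name") or "")
    # Require non-empty values so two blank fields never count as a match.
    return bool(inv_po) and inv_po == po_number and bool(inv_vendor) and inv_vendor == po_vendor


parsed_po = {"po_number": "PO-2026-00187", "vendor": {"name": "Global Parts Supply Co."}}
invoice = {"po_number": "po-2026-00187 ", "vendor_name": "GLOBAL PARTS SUPPLY CO"}
print(invoice_matches_po(invoice, parsed_po))
```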

Supplier portal: Accept PO uploads from buyers, parse them automatically, and create orders in your fulfilment system without manual data entry.

Audit trail: Store the structured JSON alongside the original PDF. The validation object gives auditors a machine-readable record of any discrepancies found at intake.
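A minimal sketch of the storage step, assuming a plain filesystem layout: the parsed JSON is written next to a SHA-256 digest of the source file, so an auditor can confirm which PDF produced which record. The record layout and the .audit.json suffix are illustrative choices, not a format defined by the API.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def store_audit_record(pdf_path: str, parsed: dict, out_dir: str) -> Path:
    """Write the parsed JSON together with a SHA-256 digest of the source PDF."""
    source = Path(pdf_path)
    record = {
        "source_file": source.name,
        "source_sha256": hashlib.sha256(source.read_bytes()).hexdigest(),
        "parsed": parsed,
    }
    out = Path(out_dir) / f"{source.stem}.audit.json"
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(json.dumps(record, indent=2, sort_keys=True), encoding="utf-8")
    return out


# Demo with a placeholder file in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    pdf = Path(tmp) / "purchase_order.pdf"
    pdf.write_bytes(b"%PDF-1.4 placeholder")
    saved = store_audit_record(str(pdf), {"po_number": "PO-2026-00187"}, tmp)
    record = json.loads(saved.read_text(encoding="utf-8"))
    print(saved.name, record["source_sha256"][:12])
```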

Next Steps

See the complete field reference and plan details in the Purchase Order Parser documentation. Get your API key from cparse.com/dashboard.


Arthur Sterling is the Lead Developer Advocate at Parse.