Statement Parser API
v1.0
Turn any bank or credit card statement into structured, categorized transaction data.
Overview
The Statement Parser API extracts every transaction from a bank statement PDF and returns clean, categorized JSON. Each transaction is classified using the Plaid taxonomy, normalized merchant names are identified, and the statement math is automatically reconciled.
🏦 Full Extraction
Every transaction, balance, and metadata from any statement format
🏷️ Auto-Categorized
Plaid-standard categories with deterministic matching for known brands
✅ Reconciled
Automatic math verification: opening + credits - debits = closing
💡 Perfect For
- Accounting firms and bookkeeping automation
- Personal finance and budgeting apps
- Real estate transaction tracking
- Expense management and categorization
- Financial reconciliation workflows
Quick Start
Base URL
https://api.cparse.com/statement/v1
Authentication
X-API-Key: YOUR_API_KEY
Get your API key from the dashboard. New accounts include free credit, with no credit card required.
Quick Example
cURL
curl --request POST \
  --url https://api.cparse.com/statement/v1/parse \
  --header 'X-API-Key: YOUR_API_KEY' \
  --form 'file=@statement.pdf'
Endpoints
/health
Health check endpoint. Returns service status.
Response
{ "status": "ok", "service": "statement-parser", "version": "0.1.0" }
/parse
Extract structured transaction data from a bank statement.
Request
Content-Type: multipart/form-data
file (required): Bank statement file (PDF, DOCX, JPEG, PNG, ZIP, 7z)
Response
{
  "success": true,
  "file_name": "statement.pdf",
  "pages_processed": 1,
  "ocr_used": false,
  "data": {
    "document_metadata": {
      "document_id": "doc_a1b2c3d4",
      "institution": "ACME BANK",
      "account_number_last4": "4829",
      "account_currency": "GBP",
      "statement_period_start": "2026-03-01",
      "statement_period_end": "2026-03-31"
    },
    "summary": {
      "opening_balance": 1250.50,
      "closing_balance": 1685.53,
      "total_debits": 64.97,
      "total_credits": 500.00,
      "transaction_count": 4,
      "math_reconciliation_status": "verified",
      "reconciliation_difference": 0.0
    },
    "transactions": [
      {
        "transaction_id": "tx_001",
        "date": "2026-03-04",
        "amount": -42.99,
        "currency": "GBP",
        "description_raw": "AMZN MKTPLACE AMZN.CO.UK GBR",
        "description_normalised": "Amazon Marketplace",
        "merchant_name": "Amazon",
        "classification": {
          "category": "general_merchandise",
          "category_detailed": "general_merchandise_online_marketplaces",
          "method": "deterministic",
          "confidence_score": 1.0
        },
        "flags": {
          "is_subscription": false,
          "is_recurring": false,
          "is_duplicate_candidate": false
        }
      },
      {
        "transaction_id": "tx_002",
        "date": "2026-03-07",
        "amount": -10.99,
        "currency": "GBP",
        "description_raw": "NETFLIX.COM BEVERLY HILLS CA",
        "description_normalised": "Netflix",
        "merchant_name": "Netflix",
        "classification": {
          "category": "entertainment",
          "category_detailed": "entertainment_tv_and_movies",
          "method": "deterministic",
          "confidence_score": 1.0
        },
        "flags": {
          "is_subscription": true,
          "is_recurring": true,
          "is_duplicate_candidate": false
        }
      }
    ]
  }
}
Response Fields
Document Metadata
| Field | Description |
|---|---|
| document_id | Unique identifier for this parsed document |
| institution | Bank or financial institution name |
| account_number_last4 | Last 4 digits of the account number |
| account_currency | 3-letter ISO currency code (e.g. GBP, USD) |
| statement_period_start | Statement start date (YYYY-MM-DD) |
| statement_period_end | Statement end date (YYYY-MM-DD) |
Summary
| Field | Description |
|---|---|
| opening_balance | Starting balance for the statement period |
| closing_balance | Ending balance for the statement period |
| total_debits | Sum of all debit transactions (positive number) |
| total_credits | Sum of all credit transactions (positive number) |
| transaction_count | Total number of extracted transactions |
| math_reconciliation_status | "verified" if balances add up, "discrepancy" if not, "not_checked" if data missing |
| reconciliation_difference | Difference between expected and actual closing balance |
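The reconciliation rule (opening + credits - debits = closing) can be re-checked on the client side from the summary fields alone. The following is a minimal sketch, not the service's own implementation: the function name `reconcile`, the half-penny tolerance, and the sign of the returned difference (expected minus actual) are assumptions made for illustration.

```python
from decimal import Decimal


def reconcile(summary, tolerance=Decimal("0.005")):
    """Recompute opening + credits - debits and compare it with closing.

    Returns (balances_match, difference). The tolerance and the sign of the
    difference (expected minus actual) are assumptions, not API guarantees.
    """
    # Going through str() keeps 1250.5 as Decimal("1250.5") and avoids
    # binary floating-point noise in the subtraction below.
    opening = Decimal(str(summary["opening_balance"]))
    closing = Decimal(str(summary["closing_balance"]))
    credits = Decimal(str(summary["total_credits"]))
    debits = Decimal(str(summary["total_debits"]))

    expected_closing = opening + credits - debits
    difference = expected_closing - closing
    return abs(difference) <= tolerance, difference


# Summary values taken from the sample response in this document.
sample_summary = {
    "opening_balance": 1250.50,
    "closing_balance": 1685.53,
    "total_debits": 64.97,
    "total_credits": 500.00,
}
balances_match, difference = reconcile(sample_summary)
print(balances_match, difference)
```

Using `Decimal` rather than floats is a design choice for this sketch: summing currency values as binary floats can produce tiny residues that would look like a discrepancy.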
Transaction Fields
| Field | Description |
|---|---|
| transaction_id | Sequential ID (e.g. tx_001) |
| date | Transaction date (YYYY-MM-DD) |
| amount | Signed amount: negative for debits, positive for credits |
| currency | 3-letter ISO currency code |
| description_raw | Original text as it appears on the statement |
| description_normalised | Cleaned, human-readable version |
| merchant_name | Identified merchant or entity |
| classification.category | Plaid primary category (e.g. food_and_drink) |
| classification.category_detailed | Plaid detailed category (e.g. food_and_drink_groceries) |
| classification.method | "deterministic" for known brands, "ai_inference" for AI-classified |
| classification.confidence_score | 1.0 for deterministic, 0.0-0.99 for AI inference |
| flags.is_subscription | Whether this is a subscription payment |
| flags.is_recurring | Whether this charge recurs on a schedule |
| flags.is_duplicate_candidate | True if same date + amount + description appears more than once |
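Because amount is signed while total_debits and total_credits are reported as positive numbers, the sign has to be flipped when totals are rebuilt from the transaction list. The sketch below shows one way to do that. The helper name `totals_from_transactions` is hypothetical, and the input holds only the two transactions shown in the sample response, so the result is not expected to equal that sample's summary, which counts four transactions.

```python
from decimal import Decimal


def totals_from_transactions(transactions):
    """Rebuild positive debit/credit totals from signed transaction amounts."""
    total_debits = Decimal("0")
    total_credits = Decimal("0")
    for tx in transactions:
        amount = Decimal(str(tx["amount"]))
        if amount < 0:
            # Debits arrive negative; the summary reports them as positive.
            total_debits += -amount
        else:
            total_credits += amount
    return {
        "total_debits": total_debits,
        "total_credits": total_credits,
        "transaction_count": len(transactions),
    }


# Only the fields this sketch needs, copied from the sample response.
sample_transactions = [
    {"transaction_id": "tx_001", "amount": -42.99},
    {"transaction_id": "tx_002", "amount": -10.99},
]
totals = totals_from_transactions(sample_transactions)
print(totals)
```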
Error Handling
| Status | Error | Solution |
|---|---|---|
| 400 | No file provided | Include a file in the multipart request |
| 401 | Authentication required | Pass X-API-Key header with a valid key |
| 402 | Insufficient balance | Top up your account at cparse.com/dashboard |
| 413 | File too large / too many pages | Reduce file size or split the document |
| 415 | Unsupported file type | Use PDF, DOCX, JPEG, PNG, ZIP, or 7z |
| 500 | Internal error | Retry or contact support |
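One way to apply the table above in client code is to map each status code to its suggested remedy and retry only where the table says a retry can help. This is a sketch under stated assumptions: `ERROR_ACTIONS`, `RETRYABLE_STATUSES`, and `explain_status` are hypothetical names, the fallback message for unlisted codes is not from the API, and the shape of the error response body is not documented here, so the sketch relies on the HTTP status code only.

```python
# Remedies copied from the error table; all names here are hypothetical.
ERROR_ACTIONS = {
    400: "Include a file in the multipart request",
    401: "Pass X-API-Key header with a valid key",
    402: "Top up your account at cparse.com/dashboard",
    413: "Reduce file size or split the document",
    415: "Use PDF, DOCX, JPEG, PNG, ZIP, or 7z",
    500: "Retry or contact support",
}

# Only the 500 row mentions retrying; the other errors need a changed request.
RETRYABLE_STATUSES = {500}


def explain_status(status_code):
    """Translate an HTTP status code into a small decision record."""
    if 200 <= status_code < 300:
        return {"ok": True, "retry": False, "action": None}
    return {
        "ok": False,
        "retry": status_code in RETRYABLE_STATUSES,
        # Fallback text for codes outside the table is an assumption.
        "action": ERROR_ACTIONS.get(status_code, "Unexpected status; contact support"),
    }


print(explain_status(415))
```

With the requests library this could be called as `explain_status(response.status_code)` before attempting to read `response.json()`.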
Tips
📌 Best Practices
- Text-based PDFs work best. For scanned statements, ensure the PDF has selectable text, or pre-process with an OCR tool before uploading.
- Check the reconciliation status. A "verified" status means the statement math adds up. A "discrepancy" may indicate missing transactions or fees not shown as line items.
- Use deterministic classifications with confidence. Transactions classified with method: "deterministic" and confidence_score: 1.0 are matched via regex against known global brands.
- Review duplicate candidates. The is_duplicate_candidate flag highlights transactions with identical date, amount, and description within the same document.
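The classification and duplicate practices above can be combined into a simple review queue: deterministic matches pass through, while low-confidence AI classifications and duplicate candidates are set aside for a human to check. This is a sketch only. The 0.8 threshold, the function names, and the second sample transaction are invented for illustration and do not come from the API.

```python
REVIEW_THRESHOLD = 0.8  # hypothetical cut-off; choose one that suits your data


def review_reasons(tx, threshold=REVIEW_THRESHOLD):
    """Return the reasons, if any, why a transaction deserves a manual look."""
    reasons = []
    classification = tx.get("classification", {})
    if (
        classification.get("method") == "ai_inference"
        and classification.get("confidence_score", 0.0) < threshold
    ):
        reasons.append("low_confidence")
    if tx.get("flags", {}).get("is_duplicate_candidate"):
        reasons.append("duplicate_candidate")
    return reasons


def build_review_queue(transactions, threshold=REVIEW_THRESHOLD):
    """Collect (transaction_id, reasons) pairs for transactions needing review."""
    queue = []
    for tx in transactions:
        reasons = review_reasons(tx, threshold)
        if reasons:
            queue.append((tx["transaction_id"], reasons))
    return queue


# tx_001 mirrors the sample response; tx_900 is made-up illustrative data.
example_transactions = [
    {
        "transaction_id": "tx_001",
        "classification": {"method": "deterministic", "confidence_score": 1.0},
        "flags": {"is_duplicate_candidate": False},
    },
    {
        "transaction_id": "tx_900",
        "classification": {"method": "ai_inference", "confidence_score": 0.55},
        "flags": {"is_duplicate_candidate": True},
    },
]
review_queue = build_review_queue(example_transactions)
print(review_queue)
```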
🔒 Security & Privacy
- ✅ Statements are processed in memory and never stored
- ✅ All requests over HTTPS
- ✅ API key authentication required on every request
- ✅ No transaction data is logged or retained
Code Examples
Python
import requests

url = "https://api.cparse.com/statement/v1/parse"
headers = {"X-API-Key": "YOUR_API_KEY"}

# Upload the statement as multipart/form-data under the "file" field
with open("statement.pdf", "rb") as f:
    response = requests.post(url, files={"file": f}, headers=headers)
response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error body
data = response.json()

print(f"Institution: {data['data']['document_metadata']['institution']}")
print(f"Reconciliation: {data['data']['summary']['math_reconciliation_status']}")
# One line per transaction: date, right-aligned amount, cleaned description
for tx in data["data"]["transactions"]:
    print(f"{tx['date']} {tx['amount']:>10.2f} {tx['description_normalised']}")
JavaScript (Node.js)
// Run as an ES module (.mjs or "type": "module") so top-level await works
import FormData from 'form-data';
import fs from 'fs';
import axios from 'axios';

// Attach the statement under the "file" form field
const form = new FormData();
form.append('file', fs.createReadStream('statement.pdf'));

const response = await axios.post(
  'https://api.cparse.com/statement/v1/parse',
  form,
  {
    headers: {
      ...form.getHeaders(), // multipart Content-Type with boundary
      'X-API-Key': 'YOUR_API_KEY',
    },
  }
);

const { document_metadata, summary, transactions } = response.data.data;
console.log(`${transactions.length} transactions extracted`);
console.log(`Reconciliation: ${summary.math_reconciliation_status}`);