How to Parse Bank Statements to JSON with an API
Arthur Sterling
Lead Developer Advocate, Parse
Bank statement PDFs are one of the most common documents in fintech, accounting, and personal finance, yet they come in hundreds of formats with no consistent structure. Parsing them manually or building format-specific scrapers is expensive, fragile, and hard to maintain.
The Parse Statement Parser API handles any bank or credit card statement and returns every transaction as clean, categorized JSON. It also reconciles the statement math automatically, so you know immediately whether your extraction is complete.
What You Get Back
A single API call returns:
Document metadata: Institution name, account last four digits, currency, statement period start and end dates.
Financial summary: Opening balance, closing balance, total debits, total credits, transaction count, and a reconciliation status that verifies whether opening + credits - debits = closing.
Transactions: Every line item with:
- Date and signed amount (negative for debits, positive for credits)
- Raw description and a cleaned, human-readable version
- Identified merchant name
- Plaid-standard category and detailed category (e.g. food_and_drink_groceries)
- Classification method: deterministic for known global brands, ai_inference for everything else
- Flags: is_subscription, is_recurring, is_duplicate_candidate
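Because amounts are signed, debit and credit totals can be recomputed from the transaction list alone. The following is a minimal sketch, assuming each transaction is a dict with the amount key used in the examples in this article. The helper name split_totals is hypothetical and not part of the API, and the sample values are made up for illustration.

```python
from decimal import Decimal


def split_totals(transactions):
    """Return (total_debits, total_credits) as positive Decimal values."""
    debits = Decimal("0")
    credits = Decimal("0")
    for tx in transactions:
        # Convert via str so float inputs do not carry binary rounding noise.
        amount = Decimal(str(tx["amount"]))
        if amount < 0:
            debits += -amount  # debits arrive as negative amounts
        else:
            credits += amount
    return debits, credits


# Illustrative data only, not real API output.
sample = [{"amount": -42.50}, {"amount": 1200.00}, {"amount": -7.25}]
debits, credits = split_totals(sample)
```

Using Decimal rather than float keeps the totals exact to the cent, which matters if you later compare them against the statement summary.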
Getting Started
Sign up at cparse.com/dashboard and copy your API key. New accounts include free credit with no credit card required.
The base URL is https://api.cparse.com/statement/v1.
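One way to keep the key out of source control is to read it from the environment. This is a sketch under assumptions: the variable name CPARSE_API_KEY and the helper build_headers are hypothetical conventions, not something the API requires. Only the X-API-Key header name comes from the examples in this article.

```python
import os


def build_headers(env_var="CPARSE_API_KEY"):
    """Build request headers from an environment variable.

    The default variable name is a hypothetical convention.
    """
    api_key = os.environ.get(env_var)
    if not api_key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return {"X-API-Key": api_key}
```

The returned dict can be passed as the headers argument of requests.post in place of a hard-coded key.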
Python Example
import requests

url = "https://api.cparse.com/statement/v1/parse"
headers = {"X-API-Key": "YOUR_API_KEY"}

# Upload the statement as multipart form data.
with open("statement.pdf", "rb") as f:
    response = requests.post(url, files={"file": f}, headers=headers)
response.raise_for_status()  # stop here on an HTTP error status

data = response.json()["data"]
meta = data["document_metadata"]
summary = data["summary"]

print(f"Institution: {meta['institution']}")
print(f"Period: {meta['statement_period_start']} to {meta['statement_period_end']}")
print(f"Reconciliation: {summary['math_reconciliation_status']}")

# Print one line per transaction, marking subscriptions.
for tx in data["transactions"]:
    flag = " [SUBSCRIPTION]" if tx["flags"]["is_subscription"] else ""
    print(f"{tx['date']} {tx['amount']:>10.2f} {tx['description_normalised']}{flag}")
JavaScript Example
import FormData from 'form-data';
import fs from 'fs';
import axios from 'axios';
const form = new FormData();
form.append('file', fs.createReadStream('statement.pdf'));
const response = await axios.post('https://api.cparse.com/statement/v1/parse', form, {
  headers: {
    ...form.getHeaders(),
    'X-API-Key': 'YOUR_API_KEY',
  },
});
const { document_metadata, summary, transactions } = response.data.data;
console.log(`${transactions.length} transactions extracted`);
console.log(`Reconciliation: ${summary.math_reconciliation_status}`);
const subscriptions = transactions.filter((tx) => tx.flags.is_subscription);
console.log(`Subscriptions found: ${subscriptions.length}`);
Understanding the Reconciliation Check
The math_reconciliation_status field is one of:
- "verified": opening balance + credits - debits equals the closing balance exactly
- "discrepancy": the math does not add up, which may indicate missing transactions or rounding differences
- "not_checked": opening or closing balance was not found in the document
A discrepancy does not always mean an error, but it is a useful signal to trigger a manual review.
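When the status is "discrepancy", it can help to redo the arithmetic locally and see how large the gap is. The sketch below takes the opening and closing balances as plain arguments, because the summary field names are not shown in this article. The function name check_balance, the one-cent tolerance, and the three return labels are all assumptions for illustration, not values the API returns.

```python
from decimal import Decimal


def check_balance(opening, closing, amounts, tolerance=Decimal("0.01")):
    """Recompute the closing balance from signed transaction amounts.

    Returns "balanced", "rounding", or "mismatch" (hypothetical labels).
    """
    # Signed amounts already encode credits (+) and debits (-), so
    # opening + sum(amounts) equals opening + credits - debits.
    expected = Decimal(str(opening)) + sum(
        (Decimal(str(a)) for a in amounts), Decimal("0")
    )
    diff = abs(expected - Decimal(str(closing)))
    if diff == 0:
        return "balanced"
    if diff <= tolerance:
        return "rounding"
    return "mismatch"
```

A "rounding" result suggests the gap is within a cent and probably harmless, while "mismatch" is the case worth sending to manual review.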
Category Classification
Transactions are classified using the Plaid taxonomy. For well-known global brands (Amazon, Netflix, Uber, Spotify, and thousands more), the classification is deterministic and always consistent. For everything else, the AI assigns a category and returns a confidence_score between 0 and 1.
You can filter for high-confidence categories in your application logic:
reliable = [
    tx for tx in transactions
    if tx["classification"]["confidence_score"] >= 0.85
]
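If you also want the low-confidence remainder for manual review, the filter can be written as a partition. This is a sketch only: split_by_confidence is a hypothetical helper, and it assumes the score might be absent on some transactions (this article does not say whether it is always present), in which case the transaction is conservatively sent to review.

```python
def split_by_confidence(transactions, threshold=0.85):
    """Partition transactions into (reliable, needs_review)."""
    reliable, needs_review = [], []
    for tx in transactions:
        # Assumption: the score may be missing; treat that as "needs review".
        score = tx.get("classification", {}).get("confidence_score")
        if score is not None and score >= threshold:
            reliable.append(tx)
        else:
            needs_review.append(tx)
    return reliable, needs_review
```

The 0.85 default mirrors the threshold above; the right cut-off depends on how costly a wrong category is in your application.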
Subscription and Recurring Detection
The is_subscription and is_recurring flags are set based on known subscription brands and pattern matching within the statement period. This lets you build subscription trackers, budget alerts, and cashflow forecasting features without writing any classification logic yourself.
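As an illustration of building on these flags, a subscription tracker could total flagged debits per merchant description. The sketch below uses only the amount, description_normalised, and flags.is_subscription fields that appear in the examples in this article. The helper name subscription_totals is hypothetical, and grouping by description rather than by a merchant field is a choice made here because the merchant field name is not shown.

```python
from collections import defaultdict
from decimal import Decimal


def subscription_totals(transactions):
    """Sum subscription spend per normalised description."""
    totals = defaultdict(Decimal)  # Decimal() starts each total at 0
    for tx in transactions:
        if not tx["flags"]["is_subscription"]:
            continue
        amount = Decimal(str(tx["amount"]))
        if amount < 0:  # only count money going out
            totals[tx["description_normalised"]] += -amount
    return dict(totals)
```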
Supported Formats
The API accepts PDF, DOCX, JPEG, PNG, ZIP, and 7z files. This covers the full range of statement exports from online banking portals, scanned paper statements, and email attachments.
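A client-side check can reject unsupported files before spending an upload on them. This is a rough sketch: is_supported is a hypothetical helper, the check looks only at the file extension rather than the file contents, and treating .jpg as equivalent to .jpeg is an assumption.

```python
from pathlib import Path

# Extensions derived from the formats listed above; ".jpg" is assumed
# to be accepted as a JPEG alias.
SUPPORTED_EXTENSIONS = {".pdf", ".docx", ".jpeg", ".jpg", ".png", ".zip", ".7z"}


def is_supported(path):
    """Return True if the file extension matches a listed format."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS
```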
Use Cases
- Accounting automation: Extract and categorize transactions for bookkeeping without manual entry
- Personal finance apps: Show users a spending breakdown the moment they upload a statement
- Mortgage and credit applications: Pull income, recurring payments, and balances for underwriting
- Expense categorization: Classify business expenses directly from bank data
- Real estate management: Reconcile rental income and expenses from monthly statements
Next Steps
See the full field reference and response schema in the Statement Parser documentation. Get your API key from cparse.com/dashboard.
Arthur Sterling is the Lead Developer Advocate at Parse.