How to Parse Bank Statements to JSON with an API

Arthur Sterling

Lead Developer Advocate, Parse

Bank statement PDFs are among the most common documents in fintech, accounting, and personal finance, yet they come in hundreds of formats with no consistent structure. Parsing them manually or building format-specific scrapers is expensive, fragile, and hard to maintain.

The Parse Statement Parser API handles any bank or credit card statement and returns every transaction as clean, categorized JSON. It also reconciles the statement math automatically, so you know immediately whether your extraction is complete.

What You Get Back

A single API call returns:

Document metadata: Institution name, account last four digits, currency, statement period start and end dates.

Financial summary: Opening balance, closing balance, total debits, total credits, transaction count, and a reconciliation status that verifies whether opening + credits - debits = closing.

Transactions: Every line item with:

  • Date and signed amount (negative for debits, positive for credits)
  • Raw description and a cleaned, human-readable version
  • Identified merchant name
  • Plaid-standard category and detailed category (e.g. food_and_drink_groceries)
  • Classification method: deterministic for known global brands, ai_inference for everything else
  • Flags: is_subscription, is_recurring, is_duplicate_candidate

Getting Started

Sign up at cparse.com/dashboard and copy your API key. New accounts include free credit with no credit card required.

The base URL is https://api.cparse.com/statement/v1.

Python Example

import requests

url = "https://api.cparse.com/statement/v1/parse"
headers = {"X-API-Key": "YOUR_API_KEY"}

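with open("statement.pdf", "rb") as f:
    # A timeout keeps the script from hanging on a stalled upload
    response = requests.post(url, files={"file": f}, headers=headers, timeout=60)

# Raise on 4xx/5xx instead of indexing into an error body
response.raise_for_status()
data = response.json()["data"]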
meta = data["document_metadata"]
summary = data["summary"]

print(f"Institution: {meta['institution']}")
print(f"Period: {meta['statement_period_start']} to {meta['statement_period_end']}")
print(f"Reconciliation: {summary['math_reconciliation_status']}")

for tx in data["transactions"]:
    flag = " [SUBSCRIPTION]" if tx["flags"]["is_subscription"] else ""
    print(f"{tx['date']}  {tx['amount']:>10.2f}  {tx['description_normalised']}{flag}")

JavaScript Example

import FormData from 'form-data';
import fs from 'fs';
import axios from 'axios';

const form = new FormData();
form.append('file', fs.createReadStream('statement.pdf'));

const response = await axios.post('https://api.cparse.com/statement/v1/parse', form, {
  headers: {
    ...form.getHeaders(),
    'X-API-Key': 'YOUR_API_KEY',
  },
});

const { document_metadata, summary, transactions } = response.data.data;

console.log(`${transactions.length} transactions extracted`);
console.log(`Reconciliation: ${summary.math_reconciliation_status}`);

const subscriptions = transactions.filter((tx) => tx.flags.is_subscription);
console.log(`Subscriptions found: ${subscriptions.length}`);

Understanding the Reconciliation Check

The math_reconciliation_status field is one of:

  • "verified" — opening balance + credits - debits equals the closing balance exactly
  • "discrepancy" — the math does not add up, which may indicate missing transactions or rounding differences
  • "not_checked" — opening or closing balance was not found in the document

A discrepancy does not always mean an error, but it is a useful signal to trigger a manual review.

Category Classification

Transactions are classified using the Plaid taxonomy. For well-known global brands (Amazon, Netflix, Uber, Spotify, and thousands more), the classification is deterministic and always consistent. For everything else, the AI assigns a category and returns a confidence_score between 0 and 1.

You can filter for high-confidence categories in your application logic:

# Keep only transactions whose category confidence meets the threshold
reliable = [
    tx for tx in data["transactions"]
    if tx["classification"]["confidence_score"] >= 0.85
]
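A slightly more defensive variant is sketched below. It splits transactions into a reliable list and a review queue, and it sends any transaction with no `confidence_score` to review instead of raising a `KeyError`. Whether the score can actually be absent is an assumption here, and `split_by_confidence` is a hypothetical helper name.

```python
def split_by_confidence(transactions, threshold=0.85):
    """Return (reliable, needs_review) lists.

    Assumption: a transaction may lack "classification" or "confidence_score";
    such transactions are placed in needs_review.
    """
    reliable, needs_review = [], []
    for tx in transactions:
        score = tx.get("classification", {}).get("confidence_score")
        if score is not None and score >= threshold:
            reliable.append(tx)
        else:
            needs_review.append(tx)
    return reliable, needs_review

# Hand-written sample data, not real API output
sample = [
    {"description_normalised": "Netflix", "classification": {"confidence_score": 0.99}},
    {"description_normalised": "Corner Shop", "classification": {"confidence_score": 0.42}},
    {"description_normalised": "Unknown", "classification": {}},
]
reliable, needs_review = split_by_confidence(sample)
print(len(reliable), len(needs_review))
```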

Subscription and Recurring Detection

The is_subscription and is_recurring flags are set based on known subscription brands and pattern matching within the statement period. This lets you build subscription trackers, budget alerts, and cashflow forecasting features without writing any classification logic yourself.
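As an illustration of a simple subscription tracker, the sketch below totals spend per merchant for transactions flagged `is_subscription`. The `merchant` key is an assumed field name (the code falls back to `description_normalised` when it is missing), and `subscription_totals` is a hypothetical helper.

```python
from collections import defaultdict
from decimal import Decimal

def subscription_totals(transactions):
    """Sum absolute spend per merchant for subscription-flagged transactions.

    "merchant" is an assumed key; falls back to "description_normalised".
    """
    totals = defaultdict(Decimal)  # Decimal() starts each total at 0
    for tx in transactions:
        if not tx.get("flags", {}).get("is_subscription"):
            continue
        name = tx.get("merchant") or tx.get("description_normalised", "unknown")
        totals[name] += abs(Decimal(str(tx["amount"])))
    return dict(totals)

# Hand-written sample data, not real API output
sample = [
    {"merchant": "Netflix", "amount": -15.99, "flags": {"is_subscription": True}},
    {"merchant": "Netflix", "amount": -15.99, "flags": {"is_subscription": True}},
    {"merchant": "Spotify", "amount": -9.99, "flags": {"is_subscription": True}},
    {"merchant": "Grocer", "amount": -42.10, "flags": {"is_subscription": False}},
]
totals = subscription_totals(sample)
for name, amount in sorted(totals.items()):
    print(f"{name}: {amount}")
```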

Supported Formats

The API accepts PDF, DOCX, JPEG, PNG, ZIP, and 7z files. This covers the full range of statement exports from online banking portals, scanned paper statements, and email attachments.
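A client-side check before uploading can save a round trip. The sketch below filters by file extension only; the extension set is derived from the formats listed above, and treating `.jpg` as equivalent to `.jpeg` is an assumption. `is_supported` is a hypothetical helper, and the API itself remains the authority on what it accepts.

```python
from pathlib import Path

# Derived from the listed formats; ".jpg" as a JPEG alias is an assumption
SUPPORTED_EXTENSIONS = {".pdf", ".docx", ".jpeg", ".jpg", ".png", ".zip", ".7z"}

def is_supported(path):
    """Return True if the file's final extension is in the supported set."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS

candidates = ["statement.PDF", "scan.png", "export.7z", "notes.txt"]
uploadable = [name for name in candidates if is_supported(name)]
print(uploadable)
```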

Use Cases

  • Accounting automation: Extract and categorize transactions for bookkeeping without manual entry
  • Personal finance apps: Show users a spending breakdown the moment they upload a statement
  • Mortgage and credit applications: Pull income, recurring payments, and balances for underwriting
  • Expense categorization: Classify business expenses directly from bank data
  • Real estate management: Reconcile rental income and expenses from monthly statements
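For the spending-breakdown use case, a rough sketch might group debits by category. It assumes the category lives at `tx["classification"]["category"]`, which is a guess at the schema, and `spending_by_category` is a hypothetical helper.

```python
from collections import defaultdict
from decimal import Decimal

def spending_by_category(transactions):
    """Total debits per category, largest first.

    Assumes a hypothetical "category" key inside "classification".
    """
    totals = defaultdict(Decimal)
    for tx in transactions:
        amount = Decimal(str(tx["amount"]))
        if amount >= 0:
            continue  # credits are not spending
        category = tx.get("classification", {}).get("category", "uncategorized")
        totals[category] += -amount
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)

# Hand-written sample data, not real API output
sample = [
    {"amount": -30.00, "classification": {"category": "food_and_drink"}},
    {"amount": -12.50, "classification": {"category": "food_and_drink"}},
    {"amount": -8.00, "classification": {"category": "transportation"}},
    {"amount": 2000.00, "classification": {"category": "income"}},
]
breakdown = spending_by_category(sample)
for category, total in breakdown:
    print(f"{category}: {total}")
```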

Next Steps

See the full field reference and response schema in the Statement Parser documentation. Get your API key from cparse.com/dashboard.


Arthur Sterling is the Lead Developer Advocate at Parse.