Semantic Search

Powerful natural language search for financial data.

Open Ledger’s semantic search functionality allows you to search your financial data using natural language queries. This makes it easier to find specific transactions, identify patterns, and gain insights without needing to know exact dates, amounts, or categories.

How It Works

Our semantic search uses OpenAI’s embedding models to convert your natural language query into a vector (an embedding) that is compared with embeddings of your financial data. Because texts with similar meanings produce nearby vectors, results are matched on meaning rather than on exact keywords.
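The comparison step can be illustrated with plain cosine similarity, the measure named under Vector Search Capabilities. The sketch below is a standalone illustration rather than Open Ledger's code: the helper names are hypothetical and the toy three-dimensional vectors stand in for real embeddings. In the actual service the comparison runs inside PostgreSQL.

```javascript
// Illustrative sketch (hypothetical helpers): cosine similarity between two
// embedding vectors. Returns a value in [-1, 1]; higher means closer in meaning.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same number of dimensions");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) {
    throw new Error("Cannot compare a zero vector");
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// pgvector's <=> operator returns cosine *distance*, i.e. 1 - similarity
function cosineDistance(a, b) {
  return 1 - cosineSimilarity(a, b);
}

// Toy vectors: the query is "closer" to the AWS transaction than to rent
const queryVec = [0.9, 0.1, 0.0];
const awsTx = [0.8, 0.2, 0.1];
const rentTx = [0.0, 0.1, 0.9];
console.log(cosineSimilarity(queryVec, awsTx) > cosineSimilarity(queryVec, rentTx)); // true
```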

API Endpoint

The semantic search endpoint is available at:

POST /v1/ai/semantic-search
Content-Type: application/json
Authorization: Bearer your_token_here

Request Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| entityId | string | Yes | The ID of the entity to search within |
| query | string | Yes | The natural language query to search for |
| limit | number | No | Maximum number of results to return (default: 10) |
| sourceTypes | string[] | No | Types of sources to search (e.g., “transaction”, “ledger_account”) |
| documentTypes | string[] | No | Types of documents to search (e.g., “financial_report”, “general_ledger”) |
| timeStart | ISO date string | No | Start date for filtering results |
| timeEnd | ISO date string | No | End date for filtering results |
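Before sending a request, a client may want to check the constraints in the table above. The following sketch uses a hypothetical helper, buildSearchRequest, that enforces the two required fields, applies the documented default limit of 10, rejects unparseable or reversed date ranges, and omits optional filters that were not supplied. The positive-integer check on limit is an assumption; any server-side upper bound is not documented here.

```javascript
// Sketch: validate and assemble a semantic search request body.
// buildSearchRequest is a hypothetical client-side helper, not part of the API.
function buildSearchRequest({
  entityId,
  query,
  limit = 10, // documented default
  sourceTypes,
  documentTypes,
  timeStart,
  timeEnd,
}) {
  if (typeof entityId !== "string" || entityId === "") {
    throw new Error("entityId is required");
  }
  if (typeof query !== "string" || query.trim() === "") {
    throw new Error("query is required");
  }
  // Assumption: limit must be a positive integer
  if (!Number.isInteger(limit) || limit < 1) {
    throw new Error("limit must be a positive integer");
  }
  for (const [name, value] of [["timeStart", timeStart], ["timeEnd", timeEnd]]) {
    if (value !== undefined && Number.isNaN(Date.parse(value))) {
      throw new Error(`${name} must be an ISO date string`);
    }
  }
  if (timeStart && timeEnd && Date.parse(timeStart) > Date.parse(timeEnd)) {
    throw new Error("timeStart must not be after timeEnd");
  }

  // Required fields first, then only the optional filters that were supplied
  const body = { entityId, query, limit };
  if (sourceTypes?.length) body.sourceTypes = sourceTypes;
  if (documentTypes?.length) body.documentTypes = documentTypes;
  if (timeStart) body.timeStart = timeStart;
  if (timeEnd) body.timeEnd = timeEnd;
  return body;
}

const aprilRequest = buildSearchRequest({
  entityId: "entity_123456",
  query: "software subscription expenses in April",
  timeStart: "2023-04-01T00:00:00Z",
  timeEnd: "2023-04-30T23:59:59Z",
});
console.log(JSON.stringify(aprilRequest));
```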

Example Request

{
  "entityId": "entity_123456",
  "query": "software subscription expenses in April",
  "limit": 10,
  "sourceTypes": ["transaction", "ledger_account"],
  "documentTypes": ["financial_report", "general_ledger"],
  "timeStart": "2023-04-01T00:00:00Z",
  "timeEnd": "2023-04-30T23:59:59Z"
}

Response

The response contains two result sets: "results", enriched matches from the vector index, and "legacyResults", matches from direct transaction and account searches that are kept for backward compatibility:

{
  "success": true,
  "query": "software subscription expenses in April",
  "entityId": "entity_123456",
  "results": [
    {
      "id": "vi_12345",
      "source_id": "tx_abcdef123456",
      "source_type": "transaction",
      "document_type": "general_ledger",
      "period_start": "2023-04-01T00:00:00Z",
      "period_end": "2023-04-30T23:59:59Z",
      "metadata": {
        "entityId": "entity_123456"
      },
      "similarity": 0.92,
      "details": {
        "description": "AWS Monthly Subscription",
        "amount": 150.0,
        "currency": "USD",
        "timestamp": "2023-04-15T10:30:00Z",
        "debitAccount": { "name": "Software Expenses" },
        "creditAccount": { "name": "Checking Account" }
      }
    },
    {
      "id": "vi_67890",
      "source_id": "la_12345",
      "source_type": "ledger_account",
      "document_type": null,
      "metadata": {
        "entityId": "entity_123456"
      },
      "similarity": 0.85,
      "details": {
        "name": "Software Subscriptions",
        "type": "EXPENSE",
        "code": "6010",
        "financialType": "UTILITIES_EXPENSES"
      }
    }
  ],
  "legacyResults": [
    {
      "id": "tx_abcdef123456",
      "description": "AWS Monthly Subscription",
      "amount": 150.0,
      "currency": "USD",
      "timestamp": "2023-04-15T10:30:00Z",
      "debit_account_name": "Software Expenses",
      "credit_account_name": "Checking Account",
      "similarity": 0.92,
      "type": "transaction"
    }
  ]
}
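A client typically branches on source_type to decide which fields in details to display. The sketch below formats each enriched result as one line; describeResult and summarize are hypothetical helpers, the output format is illustrative, and it assumes results already arrive sorted by similarity, as described under Vector Search Capabilities.

```javascript
// Sketch: format enriched search results as one-line summaries.
// Field names (source_type, similarity, details, ...) follow the example
// response above; describeResult and summarize are hypothetical helpers.
function describeResult(result) {
  const score = result.similarity.toFixed(2);
  const d = result.details;

  if (d && result.source_type === "transaction") {
    return (
      `[${score}] ${d.description}: ${d.amount.toFixed(2)} ${d.currency} ` +
      `(debit: ${d.debitAccount?.name}, credit: ${d.creditAccount?.name})`
    );
  }
  if (d && result.source_type === "ledger_account") {
    return `[${score}] Account ${d.code} ${d.name} (${d.type})`;
  }
  // Unknown source type or no enrichment: fall back to the raw identifiers
  return `[${score}] ${result.source_type} ${result.source_id}`;
}

function summarize(response) {
  if (!response.success) {
    throw new Error("Semantic search request was not successful");
  }
  return response.results.map(describeResult);
}

// Trimmed copy of the example response
const sample = {
  success: true,
  results: [
    {
      source_id: "tx_abcdef123456",
      source_type: "transaction",
      similarity: 0.92,
      details: {
        description: "AWS Monthly Subscription",
        amount: 150.0,
        currency: "USD",
        debitAccount: { name: "Software Expenses" },
        creditAccount: { name: "Checking Account" },
      },
    },
    {
      source_id: "la_12345",
      source_type: "ledger_account",
      similarity: 0.85,
      details: { name: "Software Subscriptions", type: "EXPENSE", code: "6010" },
    },
  ],
};

summarize(sample).forEach((line) => console.log(line));
// [0.92] AWS Monthly Subscription: 150.00 USD (debit: Software Expenses, credit: Checking Account)
// [0.85] Account 6010 Software Subscriptions (EXPENSE)
```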

Example Queries

Here are some examples of effective semantic search queries:

Transaction Queries

  • “AWS subscription payments over $100”
  • “All office supply purchases from Staples”
  • “Marketing expenses in Q2”
  • “Rent payments for the past 3 months”
  • “Transactions with missing receipts”

Account Queries

  • “Advertising expense accounts”
  • “Assets with decreasing balances”
  • “Travel-related expense categories”
  • “Software subscription expenses”

Time-Based Queries

  • “Transactions from last week”
  • “Q1 capital expenditures”
  • “Monthly recurring payments”
  • “Year-end closing entries”

Implementation Details

The semantic search functionality:

  1. Transforms your query into a vector embedding using OpenAI’s text-embedding-3-small model
  2. Searches your vectorized financial data (transactions, accounts, reports) using PostgreSQL’s vector similarity operators
  3. Enriches results with additional data based on the source type
  4. Returns both newer vector-based results and legacy results for backward compatibility
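The four steps can be sketched as a single async function whose stages are passed in as dependencies. The stage names (embed, vectorSearch, enrich, legacySearch) are hypothetical stand-ins for the OpenAI embedding call, the PostgreSQL query, the per-source enrichment, and the legacy search; this shows the shape of the flow, not the service's actual code. Injecting the stages also lets the flow run against stubs, as in the usage at the end.

```javascript
// Sketch of the four documented steps, with each stage injected as a function.
// embed, vectorSearch, enrich and legacySearch are hypothetical stand-ins for
// the OpenAI embedding call, the PostgreSQL vector query, per-source
// enrichment, and the legacy transaction/account search.
async function semanticSearch(request, { embed, vectorSearch, enrich, legacySearch }) {
  // 1. Turn the natural language query into an embedding vector
  const embedding = await embed(request.query);

  // 2. Find the closest vectorized items, applying the request's filters
  const matches = await vectorSearch(embedding, request);

  // 3. Attach source-specific details (e.g. transaction or account data)
  const results = await Promise.all(
    matches.map(async (match) => ({ ...match, details: await enrich(match) }))
  );

  // 4. Return vector results alongside legacy results for backward compatibility
  const legacyResults = await legacySearch(request);
  return { success: true, query: request.query, entityId: request.entityId, results, legacyResults };
}

// Usage with stubbed stages (no network or database access)
const stubs = {
  embed: async (text) => [text.length, 1, 0], // toy "embedding"
  vectorSearch: async (_embedding, req) =>
    [{ id: "vi_1", source_id: "tx_1", source_type: "transaction", similarity: 0.9 }].slice(0, req.limit ?? 10),
  enrich: async (match) => ({ description: `details for ${match.source_id}` }),
  legacySearch: async () => [],
};

semanticSearch({ entityId: "entity_123456", query: "software subscriptions", limit: 10 }, stubs)
  .then((response) => console.log(JSON.stringify(response.results)));
```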

Vector Search Capabilities

Our implementation uses PostgreSQL’s vector similarity search with the following features:

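  • 384-dimensional text embeddings (text-embedding-3-small returns 1,536 dimensions by default, so 384-dimensional vectors require requesting shortened embeddings through the API’s dimensions parameter)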
  • Fast vector comparisons using the cosine distance operator (<=>), where a smaller distance means a closer match and a similarity score can be derived as 1 - distance
  • Combined filtering by source type, document type, and time ranges
  • Result sorting by similarity score
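As an illustration of how these features can combine in one query, the sketch below builds parameterized SQL for the pgvector extension, which supplies the <=> operator. The table and column names (vector_index, embedding, metadata, period_start, period_end) are assumptions modeled on the response fields, not a documented schema, and treating the time filter as an overlap between the requested range and each item's period is a design choice made for the example.

```javascript
// Sketch: parameterized pgvector query combining the filters listed above.
// ASSUMPTION: table/column names (vector_index, embedding, metadata,
// period_start, period_end) are hypothetical, not a documented schema.
function buildVectorSearchSql(embedding, { entityId, sourceTypes, documentTypes, timeStart, timeEnd, limit = 10 }) {
  // pgvector accepts a vector literal such as '[0.1,0.2,0.3]'
  const params = [`[${embedding.join(",")}]`, entityId];
  const where = ["metadata->>'entityId' = $2"];

  if (sourceTypes?.length) {
    params.push(sourceTypes);
    where.push(`source_type = ANY($${params.length})`);
  }
  if (documentTypes?.length) {
    params.push(documentTypes);
    where.push(`document_type = ANY($${params.length})`);
  }
  // Keep items whose period overlaps the requested time range
  if (timeStart) {
    params.push(timeStart);
    where.push(`period_end >= $${params.length}`);
  }
  if (timeEnd) {
    params.push(timeEnd);
    where.push(`period_start <= $${params.length}`);
  }
  params.push(limit);

  // <=> is cosine distance: 1 - distance gives similarity, and ordering by
  // distance ascending puts the closest matches first.
  const sql = `SELECT id, source_id, source_type, document_type, period_start, period_end, metadata,
       1 - (embedding <=> $1::vector) AS similarity
FROM vector_index
WHERE ${where.join(" AND ")}
ORDER BY embedding <=> $1::vector
LIMIT $${params.length}`;
  return { sql, params };
}

const example = buildVectorSearchSql([0.12, -0.03, 0.4], {
  entityId: "entity_123456",
  sourceTypes: ["transaction", "ledger_account"],
  timeStart: "2023-04-01T00:00:00Z",
  timeEnd: "2023-04-30T23:59:59Z",
});
console.log(example.sql);
console.log(example.params);
```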

Best Practices

  1. Be specific: Include relevant details in your query
  2. Use natural language: Write queries as complete thoughts rather than keywords
  3. Limit results: Use the limit parameter to control the number of results
  4. Filter appropriately: Use source types and document types to narrow your search
  5. Set date ranges: Use timeStart and timeEnd for time-bounded searches
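For quarter-style queries such as “Marketing expenses in Q2”, computing the range explicitly keeps the date filter precise instead of relying on the query text alone. quarterRange is a hypothetical helper that returns UTC bounds in the same format as the example request.

```javascript
// Sketch: UTC timeStart/timeEnd bounds for a calendar quarter.
// quarterRange is a hypothetical helper; its output mirrors the
// "YYYY-MM-DDTHH:mm:ssZ" format used in the example request.
function quarterRange(year, quarter) {
  if (![1, 2, 3, 4].includes(quarter)) {
    throw new Error("quarter must be 1, 2, 3 or 4");
  }
  const start = new Date(Date.UTC(year, (quarter - 1) * 3, 1));
  // One second before the next quarter starts (Date.UTC rolls month 12 into January)
  const end = new Date(Date.UTC(year, quarter * 3, 1) - 1000);
  // toISOString() includes milliseconds; drop them to match the documented format
  const iso = (date) => date.toISOString().replace(".000Z", "Z");
  return { timeStart: iso(start), timeEnd: iso(end) };
}

// "Marketing expenses in Q2" with an explicit date filter
const q2Request = {
  entityId: "entity_123456",
  query: "marketing expenses",
  ...quarterRange(2023, 2),
};
console.log(JSON.stringify(q2Request));
// {"entityId":"entity_123456","query":"marketing expenses","timeStart":"2023-04-01T00:00:00Z","timeEnd":"2023-06-30T23:59:59Z"}
```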

Integration Examples

const searchFinancialData = async (query, token) => {
  const response = await fetch(
    "https://api.openledger.com/v1/ai/semantic-search",
    {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${token}`,
      },
      body: JSON.stringify({
        entityId: "entity_123456",
        query,
        limit: 10,
      }),
    }
  );

  // Fail loudly on HTTP errors (e.g. an invalid token) instead of returning an error body as results
  if (!response.ok) {
    throw new Error(`Semantic search failed with HTTP ${response.status}`);
  }

  return response.json();
};

// Example usage (top-level await requires an ES module or an async function)
const token = "your_token_here";
const results = await searchFinancialData(
  "software subscription expenses in April",
  token
);
console.log(results);

Limitations

  • The quality of results depends on the specificity of your query
  • Results are ranked by semantic similarity, not exact matches
  • Performance may be affected by the size of your financial dataset
  • The API has rate limits to ensure fair usage
  • Embedding generation may add slight latency to the search process
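Clients that issue many searches can smooth over rate limiting with retries. This sketch assumes, without confirmation from this page, that the API responds with HTTP 429 when a limit is hit and may include a Retry-After header in seconds; withRetry is a hypothetical helper that retries with exponential backoff and otherwise passes the response through unchanged.

```javascript
// Sketch: retry rate-limited requests with exponential backoff.
// ASSUMPTION: rate limiting is signalled with HTTP 429 and an optional
// Retry-After header in seconds; neither is documented on this page.
async function withRetry(sendRequest, { maxRetries = 3, baseDelayMs = 500 } = {}) {
  for (let attempt = 0; ; attempt++) {
    const response = await sendRequest();
    // Return anything that is not a 429, or the last 429 once retries run out
    if (response.status !== 429 || attempt >= maxRetries) {
      return response;
    }
    // Prefer the server's Retry-After hint; otherwise back off 1x, 2x, 4x, ...
    const retryAfterSeconds = Number(response.headers?.get?.("Retry-After"));
    const delayMs =
      Number.isFinite(retryAfterSeconds) && retryAfterSeconds > 0
        ? retryAfterSeconds * 1000
        : baseDelayMs * 2 ** attempt;
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
}

// With fetch: withRetry(() => fetch(url, options)) resolves to the final Response.
// Stubbed demo: two 429 responses, then success.
let demoCalls = 0;
const flakySend = async () => {
  demoCalls++;
  return { status: demoCalls < 3 ? 429 : 200, headers: new Map() };
};
withRetry(flakySend, { baseDelayMs: 1 }).then((res) => console.log(res.status, demoCalls)); // 200 3
```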

Future Enhancements

We’re continually improving our semantic search capabilities:

  • Expanding search to more financial data types
  • Adding conversational context for follow-up queries
  • Implementing more advanced filtering options
  • Providing insights and analytics based on search results
  • Integrating with our reporting and visualization tools