Semantic Search
Powerful natural language search for financial data.
Open Ledger’s semantic search functionality allows you to search your financial data using natural language queries. This makes it easier to find specific transactions, identify patterns, and gain insights without needing to know exact dates, amounts, or categories.
How It Works
Semantic search uses OpenAI’s embedding models to convert your natural language query into a vector, which is then compared against the vectors stored for your financial data. Results are matched on meaning rather than on exact keywords.
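To make the idea of comparing vectors concrete, here is a minimal sketch of ranking by cosine similarity. The three-dimensional vectors and document labels are invented for illustration; real embeddings come from the embedding model and have far more dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for stored embeddings (made-up values).
documents = {
    "AWS invoice, March": [0.9, 0.1, 0.0],
    "Staples office supplies": [0.1, 0.9, 0.1],
    "Office rent, March": [0.0, 0.2, 0.9],
}

# Pretend embedding of a query such as "cloud hosting charges".
query_vector = [0.8, 0.2, 0.1]

# Rank documents from most to least similar to the query.
ranked = sorted(
    documents,
    key=lambda name: cosine_similarity(query_vector, documents[name]),
    reverse=True,
)
print(ranked[0])  # the AWS invoice is closest to the query vector
```

Because the comparison looks only at vector direction, a query and a document can match without sharing any words.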
API Endpoint
The semantic search endpoint is available at:
Request Parameters
Example Request
Response
The response includes both enriched results from the vector index and legacy results from direct transaction and account searches:
Example Queries
Here are some examples of effective semantic search queries:
Transaction Queries
- “AWS subscription payments over $100”
- “All office supply purchases from Staples”
- “Marketing expenses in Q2”
- “Rent payments for the past 3 months”
- “Transactions with missing receipts”
Account Queries
- “Advertising expense accounts”
- “Assets with decreasing balances”
- “Travel-related expense categories”
- “Software subscription expenses”
Time-Based Queries
- “Transactions from last week”
- “Q1 capital expenditures”
- “Monthly recurring payments”
- “Year-end closing entries”
Implementation Details
The semantic search functionality:
- Transforms your query into a vector embedding using OpenAI’s text-embedding-3-small model
- Searches your vectorized financial data (transactions, accounts, reports) using PostgreSQL’s vector similarity operators
- Enriches results with additional data based on the source type
- Returns both newer vector-based results and legacy results for backward compatibility
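The enrichment step can be pictured as a lookup keyed on source type. The sketch below is an assumption about the general shape, not Open Ledger's actual code: the field names (`source_type`, `source_id`, `details`), the record contents, and the in-memory dictionaries standing in for database tables are all hypothetical.

```python
# In-memory stand-ins for the underlying tables (hypothetical data).
TRANSACTIONS = {"txn_1": {"description": "AWS subscription", "amount": 129.0}}
ACCOUNTS = {"acct_7": {"name": "Advertising Expense"}}
RECORDS_BY_SOURCE_TYPE = {"transaction": TRANSACTIONS, "account": ACCOUNTS}

def enrich(hit):
    """Attach the full source record to a vector hit, or None if it is not found."""
    records = RECORDS_BY_SOURCE_TYPE.get(hit["source_type"], {})
    return {**hit, "details": records.get(hit["source_id"])}

# Hits as they might come back from the vector index (hypothetical shape).
vector_hits = [
    {"source_type": "transaction", "source_id": "txn_1", "similarity": 0.91},
    {"source_type": "account", "source_id": "acct_7", "similarity": 0.84},
    {"source_type": "report", "source_id": "rpt_3", "similarity": 0.52},
]

enriched = [enrich(hit) for hit in vector_hits]
```

A hit whose source type has no registered lookup (the `report` hit here) keeps its similarity score and gets `details` set to `None`, so one unknown type does not fail the whole search.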
Vector Search Capabilities
Our implementation uses PostgreSQL’s vector similarity search with the following features:
- 384-dimensional text embeddings for precise semantic matching
- Fast vector comparisons using the cosine distance operator (<=>)
- Combined filtering by source type, document type, and time ranges
- Result sorting by similarity score
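The features above can be combined in a single query. The following sketch builds a parameterized SQL statement, assuming the `<=>` operator is provided by the pgvector extension (where it returns cosine distance, so similarity is `1 - distance`). The table name `search_embeddings` and the columns `source_type`, `document_type`, `occurred_at`, and `embedding` are hypothetical, and placeholders use the `%(name)s` style accepted by psycopg.

```python
def to_vector_literal(embedding):
    """Format a list of floats as a pgvector text literal, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(str(float(x)) for x in embedding) + "]"

def build_search_query(embedding, source_types=None, document_types=None,
                       time_start=None, time_end=None, limit=10):
    """Return (sql, params) for a filtered search ordered by cosine distance."""
    params = {"q": to_vector_literal(embedding), "limit": limit}
    conditions = []
    if source_types:
        conditions.append("source_type = ANY(%(source_types)s)")
        params["source_types"] = list(source_types)
    if document_types:
        conditions.append("document_type = ANY(%(document_types)s)")
        params["document_types"] = list(document_types)
    if time_start:
        conditions.append("occurred_at >= %(time_start)s")
        params["time_start"] = time_start
    if time_end:
        conditions.append("occurred_at <= %(time_end)s")
        params["time_end"] = time_end
    where = ("WHERE " + " AND ".join(conditions)) if conditions else ""
    sql = (
        "SELECT id, source_type, document_type, "
        "1 - (embedding <=> %(q)s::vector) AS similarity "
        "FROM search_embeddings "
        f"{where} "
        "ORDER BY embedding <=> %(q)s::vector "  # smallest distance first
        "LIMIT %(limit)s"
    )
    return sql, params

sql, params = build_search_query(
    [0.1, 0.2], source_types=["transaction"], time_start="2024-01-01"
)
```

Ordering by ascending distance is equivalent to sorting by descending similarity score, and passing values as parameters keeps user input out of the SQL text.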
Best Practices
- Be specific: Include relevant details in your query
- Use natural language: Write queries as complete thoughts rather than keywords
- Limit results: Use the limit parameter to control the number of results
- Filter appropriately: Use source types and document types to narrow your search
- Set date ranges: Use timeStart and timeEnd for time-bounded searches
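As an illustration of these practices, the helper below assembles and validates search parameters before a request is sent. The names `limit`, `timeStart`, and `timeEnd` come from this page; `query`, `sourceTypes`, and `documentTypes` are assumed names, and the ISO date format is also an assumption, so check them against the request parameter reference.

```python
from datetime import date

def build_search_params(query, limit=10, source_types=None,
                        document_types=None, time_start=None, time_end=None):
    """Validate inputs and return a dict of search parameters."""
    if not query.strip():
        raise ValueError("query must not be empty")
    if limit < 1:
        raise ValueError("limit must be at least 1")
    if time_start and time_end and time_start > time_end:
        raise ValueError("timeStart must not be after timeEnd")

    params = {"query": query.strip(), "limit": limit}
    # Only include optional filters that were actually provided.
    if source_types:
        params["sourceTypes"] = list(source_types)      # assumed name
    if document_types:
        params["documentTypes"] = list(document_types)  # assumed name
    if time_start:
        params["timeStart"] = time_start.isoformat()
    if time_end:
        params["timeEnd"] = time_end.isoformat()
    return params

params = build_search_params(
    "AWS subscription payments over $100",
    limit=5,
    source_types=["transaction"],
    time_start=date(2024, 4, 1),
    time_end=date(2024, 6, 30),
)
```

Validating the date range and limit on the client side catches obvious mistakes before they count against the API's rate limits.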
Integration Examples
Limitations
- The quality of results depends on the specificity of your query
- Results are ranked by semantic similarity, not exact matches
- Performance may be affected by the size of your financial dataset
- The API has rate limits to ensure fair usage
- Embedding generation may add slight latency to the search process
Future Enhancements
We’re continually improving our semantic search capabilities:
- Expanding search to more financial data types
- Adding conversational context for follow-up queries
- Implementing more advanced filtering options
- Providing insights and analytics based on search results
- Integrating with our reporting and visualization tools