Overview
AI token usage & cost intelligence
Total Cost: — (last 30 days)
Events: —
Asks: —
Input Tokens: —
Output Tokens: —
Error Rate: — (avg 0ms)

Charts: Daily Cost (USD), Top Models by Cost
Pipeline Optimization Recommendations
Ranked by estimated cost savings across your Pulse AI pipeline
interpret-narrative (claude-sonnet-4-6), est. savings 92%
Accounts for 68% of total pipeline spend. Sonnet 4.6 at $0.003/$0.015 per 1K tokens is your dominant cost driver, at roughly 3.2K tokens per invocation.
Recommended Action: Fine-tune a 7B local model (Mistral or Qwen) on gold narrative examples and deploy it via llama.cpp on local hardware. Target cost: $0/call.
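A minimal sketch of what the local replacement call could look like, assuming the fine-tuned checkpoint is exported to GGUF and served through llama-cpp-python. The model path, system prompt, and interpret_narrative wrapper are illustrative assumptions, not existing pipeline code.

```python
# Sketch only: assumes llama-cpp-python and a fine-tuned 7B GGUF checkpoint.
from llama_cpp import Llama

MODEL_PATH = "models/narrative-7b-q4_k_m.gguf"  # hypothetical path

# Load once at startup; n_ctx sized for the ~3.2K-token invocations noted above.
llm = Llama(model_path=MODEL_PATH, n_ctx=4096, n_gpu_layers=-1)

def interpret_narrative(context: str) -> str:
    """Hypothetical drop-in for the interpret-narrative step at $0/call."""
    result = llm.create_chat_completion(
        messages=[
            {"role": "system", "content": "Interpret the narrative for the Pulse pipeline."},
            {"role": "user", "content": context},
        ],
        max_tokens=1024,
        temperature=0.2,
    )
    return result["choices"][0]["message"]["content"]
```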
vanna-sql (gpt-4o), est. savings 95%
gpt-4o at $0.005/$0.015 per 1K tokens, with 4K+ tokens of context per call, makes this the highest per-event cost in the heavy pipeline variant.
Recommended Action: Replace with SQLCoder-7B fine-tuned on your schema and query history. Local inference eliminates this cost entirely. Phase 1 priority.
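A similar sketch for the text-to-SQL step, assuming a GGUF build of SQLCoder-7B served the same way; the prompt layout, placeholder schema DDL, and generate_sql helper are assumptions for illustration, not the pipeline's actual interface.

```python
# Sketch only: local text-to-SQL with a SQLCoder-style GGUF model via llama-cpp-python.
from llama_cpp import Llama

sql_llm = Llama(model_path="models/sqlcoder-7b-q4_k_m.gguf", n_ctx=8192)  # hypothetical path

# Placeholder schema; in practice this would be generated from the live database.
SCHEMA_DDL = "CREATE TABLE events (id bigint, brand text, cost numeric, ts timestamptz);"

def generate_sql(question: str) -> str:
    """Hypothetical replacement for the vanna-sql / gpt-4o step."""
    prompt = (
        "### Task\nGenerate a SQL query that answers the question below.\n"
        f"### Schema\n{SCHEMA_DDL}\n"
        f"### Question\n{question}\n"
        "### SQL\n"
    )
    out = sql_llm(prompt, max_tokens=512, temperature=0.0, stop=["###"])
    return out["choices"][0]["text"].strip()
```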
methodology (claude-sonnet-4-6), est. savings 82%
Second-largest cost step, at 2K+ input tokens per call. Strong candidate for an immediate model downgrade before full local replacement in Phase 2.
Recommended Action: Migrate to Haiku 4.5 or gpt-4o-mini now for an 83% cost reduction with no fine-tuning. Queue the local replacement after the narrative baseline is proven.
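If the methodology step already goes through the Anthropic Messages API, the interim downgrade is essentially a one-parameter change. A sketch, with the model id string and prompt handling assumed rather than taken from the pipeline:

```python
# Sketch only: same Messages API call, cheaper model. The model id and prompt
# handling are assumptions; only the model parameter needs to change.
import anthropic

client = anthropic.Anthropic()

def run_methodology_step(prompt: str) -> str:
    response = client.messages.create(
        model="claude-haiku-4-5",  # assumed id; previously the Sonnet model
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.content[0].text
```

Keeping the call signature identical makes it easy to compare the downgraded output against the Sonnet baseline before committing.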
matcher-dispatch (gpt-4o-mini), est. savings 40%
Many queries are semantically similar. A pgvector similarity cache at a 0.92 cosine threshold could intercept 40–60% of calls before they hit the API.
Recommended Action: Add a semantic response caching layer using pgvector. This eliminates redundant API calls on repeated brand-methodology dispatches.
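A sketch of the cache lookup, assuming a dispatch_cache table with a pgvector embedding column and psycopg for database access; the table layout, the embed() and call_api() hooks, and all names are hypothetical. pgvector's <=> operator returns cosine distance, so the 0.92 similarity threshold translates to a maximum distance of 0.08.

```python
# Sketch only. Assumed table:
#   CREATE TABLE dispatch_cache (query text, response text, embedding vector(1536));
#   CREATE INDEX ON dispatch_cache USING hnsw (embedding vector_cosine_ops);
import psycopg

SIMILARITY_THRESHOLD = 0.92                 # cosine similarity
MAX_DISTANCE = 1.0 - SIMILARITY_THRESHOLD   # <=> yields cosine distance

def cached_dispatch(conn: psycopg.Connection, query: str, embed, call_api) -> str:
    # embed() -> list[float], call_api() -> str; both are placeholders for
    # the pipeline's embedding model and the existing gpt-4o-mini call.
    vec = "[" + ",".join(str(x) for x in embed(query)) + "]"  # pgvector text format
    row = conn.execute(
        """
        SELECT response, embedding <=> %(v)s::vector AS dist
        FROM dispatch_cache
        ORDER BY embedding <=> %(v)s::vector
        LIMIT 1
        """,
        {"v": vec},
    ).fetchone()
    if row is not None and row[1] <= MAX_DISTANCE:
        return row[0]                       # cache hit: skip the API call
    response = call_api(query)              # cache miss: call gpt-4o-mini
    conn.execute(
        "INSERT INTO dispatch_cache (query, response, embedding) VALUES (%s, %s, %s::vector)",
        (query, response, vec),
    )
    conn.commit()
    return response
```

Writing misses back into the table is what lets the hit rate climb toward the 40–60% estimate over time.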