Confidence Scores Too Low
Files match a route, but the confidence score is below the threshold (default 0.75).
What This Means
Files are being classified, but with a confidence below 75%.
Example:
Category: 4_Archives/factures/utilities
Confidence: 65% (Semantic Router)
Since 65% < 75%, the match is rejected and the file goes to the Inbox.
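The accept/reject rule comes down to a single comparison between the score and the threshold. Below is a minimal sketch of that rule in shell. `route_decision` is a hypothetical helper written for illustration, not a para-files command, and it assumes that a score exactly equal to the threshold is accepted (this page does not say how ties are handled).

```shell
# Hypothetical helper, not part of para-files: illustrates the threshold rule.
# Usage: route_decision CONFIDENCE [THRESHOLD]
route_decision() {
    # Positional parameters are used directly to avoid leaking variables.
    # awk does the floating-point comparison; `[ ]` only compares integers.
    # Assumption: a score equal to the threshold is accepted.
    # The threshold falls back to 0.75, the documented default.
    if awk -v c="$1" -v t="${2:-0.75}" 'BEGIN { exit !(c + 0 >= t + 0) }'; then
        echo "accept"
    else
        echo "inbox"
    fi
}

route_decision 0.65        # prints "inbox": 0.65 is below the default 0.75
route_decision 0.65 0.60   # prints "accept": same score, lower threshold
```

The second call shows why lowering the threshold changes the outcome without changing the score itself: the classification is no more certain than before.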
Solutions
1. Add Better Utterances
# For the correct route, add more specific phrases
uv run para-files add-utterance factures-utilities "monthly power bill"
uv run para-files add-utterance factures-utilities "electricity consumption statement"
# Re-test
uv run para-files classify problem-file.pdf
2. Register the Issuer
# If the file comes from a known company
uv run para-files add-issuer "Energy Company" -c utilities
# Now matches with 90% confidence automatically
uv run para-files classify invoice.pdf
3. Lower Threshold Slightly
# If utterances/issuers don't help
export PARA_FILES_MLX_SCORE_THRESHOLD=0.70
# Test
uv run para-files classify problem-file.pdf
Note: Improving utterances is preferable to lowering the threshold, because a lower threshold increases the risk of misclassified files.
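An exported variable stays in effect for the rest of the shell session, so a threshold lowered for one experiment would also apply to every later run. One way to limit that is to set the variable for a single command only. The sketch below uses a hypothetical wrapper, `with_threshold`, which is not part of para-files. It relies only on the standard `env` utility and on the `PARA_FILES_MLX_SCORE_THRESHOLD` variable named above.

```shell
# Hypothetical wrapper, not part of para-files.
# Usage: with_threshold VALUE COMMAND [ARGS...]
# The ( ... ) body runs in a subshell, so nothing leaks into the caller.
with_threshold() (
    value="$1"
    shift
    # `env VAR=value cmd` sets the variable for that one process only.
    env PARA_FILES_MLX_SCORE_THRESHOLD="$value" "$@"
)

# Example, commented out so the snippet is safe to paste:
# with_threshold 0.70 uv run para-files classify problem-file.pdf
```

After the wrapped command finishes, the session's threshold is unchanged, and the next plain `classify` run uses the default again.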
Debugging
See why confidence is low:
# Test against the route
uv run para-files test-route expected-route --file problem-file.pdf -v
# Shows similarity calculation details
# Check utterances
uv run para-files routes --utterances
Best Approach
In order of effectiveness, most effective first:
- Register issuer (if known company) → 90% confidence
- Add utterances (specific phrases) → Usually 80%+ confidence
- Lower threshold (only if needed) → Risk of errors
- Use learning (interactive) → Build training data
Prevention
- Add 5-10 utterances per route
- Register all known senders
- Test before moving many files
Related
- Adjust Confidence - Threshold details
- Improve Matching - Add utterances
- Manage Issuers - Register companies