LLM Fallback Configuration
Optionally enable AI-powered classification for ambiguous files.
What Is LLM Fallback?
LLM (Large Language Model) Fallback is the last classification signal (Signal 5).
If no other signals match confidently, para-files can ask an AI model to classify the file.
Signal 1: Validated DB → No match
Signal 2: Rules → No match
Signal 3: Domain KB → No match
Signal 4: Semantic Router → No match
Signal 5: LLM Fallback ← Asks AI model
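The cascade above can be sketched as a simple fallback chain. This is an illustrative sketch only: the function and signal names below are stand-ins, not the actual para-files API.

```python
# Hypothetical sketch of the five-signal cascade; the callables are
# placeholders for para-files' real signals, which are not shown here.
def classify(path, signals, llm=None):
    for name, signal in signals:
        label = signal(path)
        if label is not None:      # a confident match short-circuits the chain
            return name, label
    if llm is not None:            # Signal 5 runs only when LLM fallback is enabled
        return "llm_fallback", llm(path)
    return None, "unclassified"

signals = [
    ("validated_db",    lambda p: None),  # Signal 1: no match
    ("rules",           lambda p: None),  # Signal 2: no match
    ("domain_kb",       lambda p: None),  # Signal 3: no match
    ("semantic_router", lambda p: None),  # Signal 4: no match
]
print(classify("ambiguous_file.pdf", signals, llm=lambda p: "projects"))
# → ('llm_fallback', 'projects')
```

Because the LLM sits at the end of the chain, it never runs for files the earlier signals already classify.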
Disabled by Default
```yaml
config:
  llm:
    enabled: false  # Default
```
LLM fallback is optional and disabled by default because it requires:
- A running LLM server (e.g., Ollama)
- Extra processing time
- Internet connectivity (for online models)
Enable LLM Fallback
Step 1: Start LLM Server
Using Ollama (recommended):
```bash
# Install Ollama from ollama.ai
# Then start the server:
ollama run qwen2.5:1.5b

# Leaves Ollama running at http://localhost:11434
```
Step 2: Configure para-files
```bash
# Via environment variables:
export PARA_FILES_LLM_ENABLED=true
export PARA_FILES_LLM_API_BASE=http://localhost:11434

# Or in .env file:
# PARA_FILES_LLM_ENABLED=true
# PARA_FILES_LLM_API_BASE=http://localhost:11434
```
Or in YAML config:

```yaml
config:
  llm:
    enabled: true
    api_base: "http://localhost:11434"
```
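When a setting appears in both places, a typical pattern is for environment variables to override YAML. The sketch below is hypothetical: para-files' actual resolution order is not specified in these docs.

```python
import os

# Hypothetical precedence sketch: an environment variable, when set,
# overrides the YAML value. Not the real para-files resolution logic.
def resolve_llm_setting(key, yaml_value):
    raw = os.environ.get(f"PARA_FILES_LLM_{key.upper()}")
    if raw is None:
        return yaml_value
    if raw.lower() in ("true", "false"):  # coerce boolean-looking strings
        return raw.lower() == "true"
    return raw

os.environ["PARA_FILES_LLM_ENABLED"] = "true"
print(resolve_llm_setting("enabled", False))  # → True
```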
Step 3: Verify
```bash
uv run para-files classify ambiguous_file.pdf
# Should use LLM if other signals don't match
```
LLM Settings
LLM_ENABLED
Enable/disable LLM fallback:
```bash
export PARA_FILES_LLM_ENABLED=true
```
Default: false
LLM_MODEL
Which model to use:
```bash
export PARA_FILES_LLM_MODEL=ollama/qwen2.5:1.5b
```
Default: ollama/qwen2.5:1.5b
Other options:
- `ollama/mistral` - Faster, less accurate
- `ollama/neural-chat` - Good balance
- `openai/gpt-4` - Online (requires API key)
- `ollama/llama2` - Larger, slower
LLM_API_BASE
Where the LLM server is running:
```bash
export PARA_FILES_LLM_API_BASE=http://localhost:11434
```
Default: null (disabled)
Examples:
- `http://localhost:11434` - Local Ollama
- `http://192.168.1.100:11434` - Remote machine
- `https://api.openai.com` - OpenAI (requires API key)
LLM_CONFIDENCE_THRESHOLD
Minimum confidence for LLM classifications:
```bash
export PARA_FILES_LLM_CONFIDENCE_THRESHOLD=0.6
```
Default: 0.6 (60%)
Lower values = more matches, higher false positive rate.
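The threshold acts as a gate on the LLM's answer. The helper below is illustrative only, not para-files code: labels below the threshold are discarded, so the file stays unclassified rather than being mis-filed.

```python
# Illustrative only: discard LLM labels below the configured confidence,
# mirroring PARA_FILES_LLM_CONFIDENCE_THRESHOLD's role.
LLM_CONFIDENCE_THRESHOLD = 0.6

def accept_llm_label(label, confidence, threshold=LLM_CONFIDENCE_THRESHOLD):
    return label if confidence >= threshold else None

print(accept_llm_label("invoices", 0.82))  # → invoices (kept)
print(accept_llm_label("invoices", 0.41))  # → None (rejected, below 0.6)
```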
Full YAML Example
```yaml
config:
  llm:
    enabled: true
    model: "ollama/qwen2.5:1.5b"
    api_base: "http://localhost:11434"
    confidence_threshold: 0.6
```
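To sanity-check that the configured `api_base` is reachable, you can query Ollama's model-listing endpoint (`GET /api/tags` is part of Ollama's public HTTP API). This is a standalone check, not part of para-files.

```python
import json
import os
import urllib.request

def tags_url(api_base):
    """Build the URL of Ollama's model-listing endpoint."""
    return api_base.rstrip("/") + "/api/tags"

api_base = os.environ.get("PARA_FILES_LLM_API_BASE", "http://localhost:11434")
try:
    with urllib.request.urlopen(tags_url(api_base), timeout=2) as resp:
        names = [m["name"] for m in json.load(resp).get("models", [])]
        print("Ollama reachable; models:", names)
except OSError as err:
    print("Ollama not reachable:", err)
```

If the model you configured (e.g. `qwen2.5:1.5b`) is missing from the list, pull or run it first.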
Performance Impact
Without LLM: Fast (uses embeddings)
- Embedding matching: 10-15ms
With LLM: Slower (only for unmatched files)
- Local model (Ollama): +500ms-2s per file
- Online model (OpenAI): +1-3s per file
LLM only runs when other signals don’t match, so impact varies.
When to Use LLM Fallback
Good for:
- Ambiguous documents that don’t fit patterns
- Learning what documents are about
- Complex classification rules
Not needed when:
- You have good utterances and issuers
- You’re willing to manually fix misclassifications
- Speed is critical
Troubleshooting
“Connection refused” error?
```bash
# Make sure Ollama is running:
ollama run qwen2.5:1.5b

# Check API base URL:
export PARA_FILES_LLM_API_BASE=http://localhost:11434
```
LLM not being used?
```bash
# Verify it's enabled:
uv run para-files config --show
# Check PARA_FILES_LLM_ENABLED=true
```
LLM too slow?
```bash
# Use smaller model:
export PARA_FILES_LLM_MODEL=ollama/mistral

# Or disable and improve utterances instead:
export PARA_FILES_LLM_ENABLED=false
```
Related
- Configuration Overview - All settings
- Architecture: Signal 5 - LLM details
- Task: Enable LLM - Setup guide