LLM Fallback Configuration

Optionally enable AI-powered classification for ambiguous files.

What Is LLM Fallback?

LLM (Large Language Model) Fallback is the last classification signal (Signal 5).

If no other signals match confidently, para-files can ask an AI model to classify the file.

Signal 1: Validated DB → No match
Signal 2: Rules → No match
Signal 3: Domain KB → No match
Signal 4: Semantic Router → No match
Signal 5: LLM Fallback ← Asks AI model
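The cascade above can be sketched in a few lines of Python. This is an illustrative sketch only; the function and signal names are hypothetical and para-files' internal API may differ. Each signal returns a category or None, and the LLM is consulted only when everything before it falls through.

```python
def classify(file_path, signals, llm_fallback):
    """Try each signal in order; ask the LLM only as a last resort."""
    for signal in signals:
        result = signal(file_path)
        if result is not None:
            return result
    return llm_fallback(file_path)  # Signal 5: ask the AI model

# Stub signals that never match, so the call falls through to the LLM:
no_match = lambda path: None
signals = [no_match] * 4  # Validated DB, Rules, Domain KB, Semantic Router
print(classify("ambiguous_file.pdf", signals, lambda path: "finance"))
# → finance
```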

Disabled by Default

config:
  llm:
    enabled: false  # Default

LLM fallback is optional and disabled by default because it requires:

  • A running LLM server (e.g., Ollama)
  • Extra processing time
  • Internet connectivity (for online models)

Enable LLM Fallback

Step 1: Start LLM Server

Using Ollama (recommended):

# Install Ollama from ollama.ai, then pull the model:
ollama pull qwen2.5:1.5b

# Start the server if it isn't already running as a background service:
ollama serve

# Ollama now serves its API at http://localhost:11434
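To confirm the server is actually reachable before enabling the fallback, you can probe Ollama's model-listing endpoint (`/api/tags`). The snippet below assumes the default port; adjust `api_base` if you changed it.

```python
import urllib.request
import urllib.error

def ollama_reachable(api_base="http://localhost:11434"):
    """Return True if the Ollama API answers at api_base."""
    try:
        with urllib.request.urlopen(f"{api_base}/api/tags", timeout=2) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False

if __name__ == "__main__":
    print("Ollama reachable:", ollama_reachable())
```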

Step 2: Configure para-files

# Via environment variables:
export PARA_FILES_LLM_ENABLED=true
export PARA_FILES_LLM_API_BASE=http://localhost:11434

# Or in .env file:
# PARA_FILES_LLM_ENABLED=true
# PARA_FILES_LLM_API_BASE=http://localhost:11434

# Or in YAML config:
config:
  llm:
    enabled: true
    api_base: "http://localhost:11434"

Step 3: Verify

uv run para-files classify ambiguous_file.pdf

# Should use LLM if other signals don't match

LLM Settings

LLM_ENABLED

Enable/disable LLM fallback:

export PARA_FILES_LLM_ENABLED=true

Default: false

LLM_MODEL

Which model to use:

export PARA_FILES_LLM_MODEL=ollama/qwen2.5:1.5b

Default: ollama/qwen2.5:1.5b

Other options:

  • ollama/mistral - Faster, less accurate
  • ollama/neural-chat - Good balance
  • openai/gpt-4 - Online (requires API key)
  • ollama/llama2 - Larger, slower

LLM_API_BASE

Where the LLM server is running:

export PARA_FILES_LLM_API_BASE=http://localhost:11434

Default: null (disabled)

Examples:

  • http://localhost:11434 - Local Ollama
  • http://192.168.1.100:11434 - Remote machine
  • https://api.openai.com - OpenAI (requires API key)

LLM_CONFIDENCE_THRESHOLD

Minimum confidence for LLM classifications:

export PARA_FILES_LLM_CONFIDENCE_THRESHOLD=0.6

Default: 0.6 (60%)

Lower values accept more LLM classifications but raise the false-positive rate.
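The gating logic behind this setting is simple. A minimal sketch (hypothetical structure, not para-files' actual code): the LLM's answer is kept only if its reported confidence meets the threshold, otherwise it counts as "no match".

```python
def accept_llm_result(label, confidence, threshold=0.6):
    """Accept the LLM's label only if confidence meets the threshold."""
    if confidence >= threshold:
        return label
    return None  # below threshold → treated as no match

print(accept_llm_result("finance", 0.72))  # → finance
print(accept_llm_result("finance", 0.45))  # → None
```

Raising the threshold trades coverage for precision: fewer files get an LLM label, but the labels you do get are more trustworthy.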

Full YAML Example

config:
  llm:
    enabled: true
    model: "ollama/qwen2.5:1.5b"
    api_base: "http://localhost:11434"
    confidence_threshold: 0.6

Performance Impact

Without LLM: Fast (uses embeddings)

  • Embedding matching: 10-15ms

With LLM: Slower (only for unmatched files)

  • Local model (Ollama): +500ms-2s per file
  • Online model (OpenAI): +1-3s per file

LLM only runs when other signals don’t match, so impact varies.
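If you want concrete numbers for your own files rather than the rough figures above, a minimal timing harness looks like this (the `classify_stub` is illustrative, standing in for a real classification call):

```python
import time

def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed milliseconds)."""
    start = time.perf_counter()
    result = fn(*args)
    elapsed_ms = (time.perf_counter() - start) * 1000
    return result, elapsed_ms

def classify_stub(path):
    time.sleep(0.01)  # pretend embedding matching takes ~10 ms
    return "statement"

result, ms = timed(classify_stub, "ambiguous_file.pdf")
print(f"{result} in {ms:.1f} ms")
```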

When to Use LLM Fallback

Good for:

  • Ambiguous documents that don’t fit patterns
  • Learning what documents are about
  • Complex classification rules

Not needed when:

  • You have good utterances and issuers
  • You’re willing to manually fix misclassifications
  • Speed is critical

Troubleshooting

“Connection refused” error?

# Make sure the Ollama server is running:
ollama serve

# Check API base URL:
export PARA_FILES_LLM_API_BASE=http://localhost:11434

LLM not being used?

# Verify it's enabled:
uv run para-files config --show

# Check PARA_FILES_LLM_ENABLED=true

LLM too slow?

# Use smaller model:
export PARA_FILES_LLM_MODEL=ollama/mistral

# Or disable and improve utterances instead
export PARA_FILES_LLM_ENABLED=false