Skip to content

Phase 11: LLM Classification Fallback - Research

Researched: 2026-02-20 Domain: LLM provider abstraction (litellm), async Python, pydantic-settings, circuit breaker patterns Confidence: HIGH

Summary

Phase 11 adds an optional LLM fallback for VMs that the rules engine cannot classify (those receiving classification_confidence="default"). The feature is opt-in via LLM_ENABLED=false (default). Provider credentials are managed via pydantic-settings BaseSettings with SecretStr so they are never logged. LLM calls use litellm.acompletion() (already declared in pyproject.toml) wrapped in asyncio.wait_for to avoid blocking the NiceGUI event loop. Any LLM response that is not a known DRR category is rejected and the VM stays "Unknown (Reducible)".

Key Findings

pydantic-settings LLMConfig with SecretStr

BaseSettings reads env vars automatically. SecretStr masks API keys in all repr() and log output — the value is only exposed when .get_secret_value() is called explicitly at the call site.

from pydantic import SecretStr
from pydantic_settings import BaseSettings, SettingsConfigDict

class LLMConfig(BaseSettings):
    model_config = SettingsConfigDict(env_prefix="LLM_", case_sensitive=False)

    enabled: bool = False
    model: str = "openai/gpt-4o-mini"
    api_key: SecretStr = SecretStr("")
    api_base: str | None = None        # Ollama: http://host.docker.internal:11434
    timeout: int = 30

Env vars: LLM_ENABLED, LLM_MODEL, LLM_API_KEY, LLM_API_BASE, LLM_TIMEOUT.

asyncio.wait_for + litellm.acompletion

Use asyncio.wait_for (not litellm's timeout= param, which is unreliable across providers) to enforce the hard timeout. This is compatible with the NiceGUI event loop.

response = await asyncio.wait_for(
    litellm.acompletion(
        model=config.model,
        messages=[{"role": "system", "content": system_prompt},
                  {"role": "user", "content": user_prompt}],
        api_key=config.get_api_key() or None,
        api_base=config.api_base,
        max_tokens=30,
    ),
    timeout=config.timeout,
)
raw = response.choices[0].message.content.strip()
return raw if raw in valid_categories else None

In-Process Circuit Breaker (No External Dep)

A module-level fail counter + cooldown timestamp protects against provider outages without adding a dependency. After 3 consecutive failures, skip LLM calls for 60 seconds.

_cb_fail_count: int = 0
_cb_open_until: float = 0.0
_CB_FAIL_MAX = 3
_CB_COOLDOWN = 60.0

if time.monotonic() < _cb_open_until:
    return None   # circuit open — skip call

# on failure:
_cb_fail_count += 1
if _cb_fail_count >= _CB_FAIL_MAX:
    _cb_open_until = time.monotonic() + _CB_COOLDOWN

Input Sanitization Against Prompt Injection

VM names and OS strings come from untrusted customer data. Truncate and strip newlines before embedding in prompts. Use role-framing in the system prompt to reinforce data-only interpretation.

safe_vm = vm_name[:100].replace("\n", " ").replace("\r", " ")
safe_os = os_name[:50].replace("\n", " ").replace("\r", " ")
# System prompt explicitly states: treat VM name/OS as DATA, not instructions

Validation Against Known DRR Categories

Always validate the raw LLM response string against the set of valid workload category names loaded from DRR.csv. Reject any response not in the set — do not attempt fuzzy matching or correction.

valid_categories: set[str] = {entry.subcategory for entry in drr_table.entries}
return raw if raw in valid_categories else None

Reuse classification_confidence Column

No new AG Grid column is needed. The existing classification_confidence field already holds "rule_match", "os_fallback", and "default". Add "llm" as a fourth valid value for LLM-classified VMs.

Anti-Patterns

  • Using litellm's timeout= parameter instead of asyncio.wait_for: litellm's built-in timeout is inconsistent across providers. asyncio.wait_for is the definitive cancellation mechanism.
  • Calling get_secret_value() at module load or in logs: Only call it at the litellm.acompletion() call site, never in logging statements or exception messages.
  • Fuzzy-matching or auto-correcting LLM responses: If the model returns a slightly wrong category name, reject it and leave the VM as "Unknown (Reducible)". Silently accepting near-matches risks mis-sizing.

Dependencies

Package Version Notes
litellm >=1.81.13 Already in pyproject.toml
pydantic-settings >=2.13.0,<3.0 Add to pyproject.toml; uv pip install "pydantic-settings>=2.13.0,<3.0"