
Architecture

Overview

StorePredict is a full-Python web application for Dell pre-sales engineers. It implements a 5-stage pipeline that ingests VMware workload exports, lets engineers filter by datacenter/cluster scope, classifies virtual machines into workload categories, predicts Data Reduction Ratios (DRR) for Dell PowerStore arrays, and produces datastore layout recommendations using three placement strategies.

The result is a one-page PDF sizing report that pre-sales engineers can present to customers with defensible capacity numbers and optimal datastore layout proposals.

Pipeline Architecture

```mermaid
flowchart TD
    Upload["User uploads .xlsx / .csv"] --> Detect["Format Detection"]

    Detect --> RV["parse_rvtools"]
    Detect --> LOX["parse_liveoptics_xlsx"]
    Detect --> LOC["parse_liveoptics_csv"]

    RV --> Norm["Normalized DataFrame<br/>(9 canonical columns)"]
    LOX --> Norm
    LOC --> Norm

    Norm --> Scope["/scope page<br/>DC / Cluster filter"]
    Scope --> Class["Classification Engine<br/>RuleRegistry — 50 rules"]
    Class --> Categories["Workload Categories"]

    Categories --> DRR["DRR Lookup<br/>(DRR.csv — 42 entries, incl. encrypted variants)"]
    DRR --> Review["Review Page<br/>AG Grid + WorkloadDialog"]
    Review --> Model["Storage Model Selector<br/>PowerStore / PowerFlex / PowerVault"]
    Model --> Overrides["User Overrides"]

    Overrides --> Calc["Calculation Engine"]
    Calc --> Summary["SizingSummary"]

    Summary --> Layout["Layout Engine<br/>3 strategies (BFD / tier BFD / LPT)"]
    Layout --> Proposals["LayoutProposal[]<br/>(consolidation, performance, uniform)"]
    Proposals --> PDF["PDF Report<br/>(ReportLab)"]
```

Data Flow

```mermaid
flowchart LR
    A[".xlsx / .csv<br/>Upload"] --> B["DataFrame<br/>9 canonical columns"]
    B --> SC["Scope filter<br/>(DC / Cluster)"]
    SC --> C["Classified<br/>DataFrame"]
    C --> D["SizingSummary<br/>per-VM + grouped"]
    D --> E1["LayoutProposal[]<br/>3 strategies"]
    E1 --> E["PDF Report"]
```

The canonical columns after ingestion are:

| Column | Description |
|---|---|
| vm_name | Virtual machine name |
| os | Guest OS as reported by VMware Tools |
| provisioned_mib | Total provisioned disk (MiB) |
| in_use_mib | Actual disk usage (MiB) |
| cpu_count | Number of vCPUs |
| memory_mib | Allocated RAM (MiB) |
| power_state | Power state (on/off) |
| is_template | Whether the VM is a template |
| source_format | Origin format (rvtools / liveoptics) |
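A minimal sketch of what "normalizing to the canonical schema" can look like; `CANONICAL_COLUMNS` and `normalize_row` are illustrative names, not the actual parser API:

```python
# The nine canonical columns every parser must emit (names from the table above).
CANONICAL_COLUMNS = [
    "vm_name", "os", "provisioned_mib", "in_use_mib", "cpu_count",
    "memory_mib", "power_state", "is_template", "source_format",
]

def normalize_row(raw: dict, source_format: str) -> dict:
    """Coerce one parsed row into the canonical schema, with safe defaults."""
    row = {col: raw.get(col) for col in CANONICAL_COLUMNS}
    row["source_format"] = source_format
    row["provisioned_mib"] = int(raw.get("provisioned_mib") or 0)
    row["in_use_mib"] = int(raw.get("in_use_mib") or 0)
    row["is_template"] = bool(raw.get("is_template", False))
    return row

row = normalize_row({"vm_name": "db01", "provisioned_mib": "2048"}, "rvtools")
```

Because every parser emits the same nine columns, everything downstream of ingestion (scope filtering, classification, sizing) is format-agnostic.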

Key Components

Parsers

  • pipeline/parsers/rvtools.py -- Parses RVTools .xlsx exports (vInfo tab).
  • pipeline/parsers/liveoptics.py -- Parses LiveOptics .xlsx and .csv exports (VMs tab).
  • pipeline/parsers/columns.py -- Column alias resolution via dict lookup for format normalization.
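The alias-resolution idea can be sketched as a plain dict lookup; the alias entries below are hypothetical examples, not the real mapping in pipeline/parsers/columns.py:

```python
# Hypothetical alias table: vendor-specific headers -> canonical column names.
COLUMN_ALIASES = {
    "VM": "vm_name",
    "VM Name": "vm_name",
    "OS according to the VMware Tools": "os",
    "Provisioned MiB": "provisioned_mib",
    "In Use MiB": "in_use_mib",
    "CPUs": "cpu_count",
}

def resolve_columns(headers: list[str]) -> dict[str, str]:
    """Map source headers to canonical names; unknown headers pass through."""
    return {h: COLUMN_ALIASES.get(h, h) for h in headers}

mapping = resolve_columns(["VM", "CPUs", "Cluster"])
# mapping == {"VM": "vm_name", "CPUs": "cpu_count", "Cluster": "Cluster"}
```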

Classification

  • pipeline/classification.py -- Rule-based classification engine with 50 priority-ordered rules. Each rule matches patterns in VM name and OS fields to assign workload categories (e.g., SQL, Oracle, VDI, SAP). Windows Desktop OS VMs (Win 10/11/7) fall back to VDI Linked Clone rather than the generic Virtual Machines bucket (ADR-065).
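A minimal sketch of priority-ordered, first-match classification; `Rule`, `classify`, and the three patterns below are illustrative stand-ins, not the 50 real rules:

```python
import re
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    priority: int
    pattern: str   # regex tried against "vm_name os", lower-cased
    category: str

RULES = sorted(
    [
        Rule(10, r"sql|mssql", "SQL Server"),
        Rule(20, r"oracle", "Oracle"),
        Rule(90, r"windows (7|10|11)", "VDI Linked Clone"),  # ADR-065 desktop-OS fallback
    ],
    key=lambda r: r.priority,
)

def classify(vm_name: str, os_name: str) -> str:
    text = f"{vm_name} {os_name}".lower()
    for rule in RULES:                 # lowest priority number is tried first
        if re.search(rule.pattern, text):
            return rule.category
    return "Virtual Machines"          # generic fallback bucket

category = classify("desktop-042", "Microsoft Windows 10 (64-bit)")
```

First match wins, so the priority ordering decides how ambiguous names (e.g. a VM matching both SQL and Oracle patterns) are resolved.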

DRR Table

  • services/drr_table.py -- Loads the reference DRR table from src/store_predict/data/DRR.csv (semicolon-delimited, 42 entries). Maps workload categories and subcategories to reduction ratios, including application-level encryption/compression variants (Oracle TDE, SQL Server Page Compression, DDVE, etc.).
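A sketch of loading a semicolon-delimited DRR table into a lookup dict; the sample rows and ratio values below are invented for illustration (the real file is src/store_predict/data/DRR.csv with 42 entries):

```python
import csv
import io

SAMPLE = """category;subcategory;drr
SQL;default;3.5
SQL;Page Compression;1.2
Oracle;TDE;1.0
"""

def load_drr_table(text: str) -> dict[tuple[str, str], float]:
    """Parse a semicolon-delimited DRR table into (category, subcategory) -> ratio."""
    reader = csv.DictReader(io.StringIO(text), delimiter=";")
    return {(r["category"], r["subcategory"]): float(r["drr"]) for r in reader}

drr = load_drr_table(SAMPLE)
# Encrypted/compressed variants (TDE, Page Compression, ...) carry ratios near
# 1.0 because the array cannot further reduce already-encrypted or
# pre-compressed data.
```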

Storage Model

  • config.py — StorageModel enum: three target platforms with different data-reduction capabilities:

    | Platform | Dedup | Compression | DRR source |
    |---|:---:|:---:|---|
    | PowerStore | ✓ | ✓ | Per workload from DRR.csv |
    | PowerFlex | — | ✓ | Flat 2.0 |
    | PowerVault | — | — | Flat 1.0 |

  • services/drr_table.py — apply_storage_model(): overwrites per-VM DRR values in session based on the selected platform. Called on every review page load and on toggle change.
  • ui/state.py — get_storage_model() / set_storage_model(): persists the selection in app.storage.tab["storage_model"].
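The platform-dependent override described above can be sketched as follows, assuming a `FLAT_DRR` mapping (illustrative names, not the real apply_storage_model() signature):

```python
# Flat ratios for platforms without per-workload reduction data.
FLAT_DRR = {"PowerFlex": 2.0, "PowerVault": 1.0}

def apply_storage_model(per_vm_drr: dict[str, float], model: str) -> dict[str, float]:
    """PowerStore keeps per-workload DRRs; other platforms get a flat ratio."""
    if model == "PowerStore":
        return dict(per_vm_drr)
    flat = FLAT_DRR[model]
    return {vm: flat for vm in per_vm_drr}

drrs = {"db01": 3.5, "vdi01": 5.0}
flattened = apply_storage_model(drrs, "PowerFlex")
```

Re-applying this on every review page load keeps the session state consistent after the user toggles the platform.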

Calculation

  • services/calculation.py -- Computes per-VM required capacity as Provisioned / DRR. For multi-workload VMs, uses the lowest (most conservative) DRR. Weighted average DRR = total_provisioned / total_required.
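The capacity math above, as a short worked sketch (illustrative names and figures):

```python
def required_mib(provisioned_mib: float, drrs: list[float]) -> float:
    """Per-VM requirement; multi-workload VMs use the lowest (most conservative) DRR."""
    return provisioned_mib / min(drrs)

vms = [
    (10_240, [3.5]),        # single-workload VM
    (20_480, [4.0, 2.0]),   # multi-workload VM -> the DRR of 2.0 applies
]
total_provisioned = sum(p for p, _ in vms)
total_required = sum(required_mib(p, d) for p, d in vms)
weighted_drr = total_provisioned / total_required
```

Note that the weighted average is computed from the totals, not by averaging the per-VM ratios, so large VMs weigh more.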

Layout Engine

  • pipeline/layout_models.py — Frozen dataclasses for layout domain: PlacementConstraints (4 TiB DS, 25 VMs/DS, 100K IOPS/DS defaults), DatastoreRecommendation (immutable DS snapshot with assigned VMs), LayoutMetrics (15-field aggregate metrics), LayoutProposal (strategy name + datastores + metrics). Also provides DEFAULT_IOPS_BY_WORKLOAD loaded from src/store_predict/data/IOPS.csv.

  • pipeline/layout_engine.py — Three layout strategies producing datastore placement recommendations:

  • Consolidation: Multi-dimensional BFD bin-packing minimizing datastore count
  • Performance: Phase 0 mission-critical isolation (SAP HANA, Exchange, >2 TiB, >5000 IOPS) + three-tier (HOT/WARM/COLD) independent BFD
  • Uniform: LPT (Longest Processing Time) across pre-computed equal-sized bins
  • generate_all_proposals() — Public entry point returning all 3 strategies
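To make the consolidation idea concrete, here is a one-dimensional best-fit-decreasing (BFD) sketch packing only on capacity; the real strategy packs on capacity, VM count, and IOPS simultaneously:

```python
def bfd_pack(sizes_gib: list[float], ds_capacity_gib: float) -> list[list[float]]:
    """Place sizes into the fewest datastores via best-fit decreasing."""
    datastores: list[list[float]] = []
    for size in sorted(sizes_gib, reverse=True):   # decreasing order
        # Best fit: the datastore that would be left with the least free space.
        best, best_free = None, None
        for ds in datastores:
            free = ds_capacity_gib - sum(ds) - size
            if free >= 0 and (best_free is None or free < best_free):
                best, best_free = ds, free
        if best is None:
            datastores.append([size])              # open a new datastore
        else:
            best.append(size)
    return datastores

layout = bfd_pack([1200, 800, 700, 500, 300], ds_capacity_gib=2048)
# layout == [[1200, 800], [700, 500, 300]]
```

Sorting descending first is what makes BFD effective: large VMs claim datastores early and the small ones fill the remaining gaps.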

PDF Report

  • services/pdf_report.py -- Generates a branded one-page PDF using ReportLab with Vera/VeraBd fonts for French character support.

Session Persistence

  • pipeline/session_archive.py — Self-contained session archive module. save_session_zip() serialises the full app.storage.tab state plus the original uploaded file into a .zip archive. restore_session_zip() reads the archive and returns a flat dict ready to write back to app.storage.tab. is_session_zip() detects StorePredict archives via the session.json sentinel without parsing JSON (see ADR-066, ADR-067).
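A minimal sketch of the archive shape described above (a zip containing a session.json sentinel plus the original upload); helper signatures are illustrative, not the real module API:

```python
import io
import json
import zipfile

def save_session_zip(state: dict, upload_name: str, upload_bytes: bytes) -> bytes:
    """Serialise session state and the original upload into one zip blob."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("session.json", json.dumps(state))
        zf.writestr(f"upload/{upload_name}", upload_bytes)
    return buf.getvalue()

def is_session_zip(data: bytes) -> bool:
    """Detect an archive via the session.json sentinel, without parsing JSON."""
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            return "session.json" in zf.namelist()
    except zipfile.BadZipFile:
        return False

blob = save_session_zip({"storage_model": "PowerStore"}, "vms.xlsx", b"...")
```

Checking only the member name keeps detection cheap and robust against partially corrupt state.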

Concerns Export

  • services/concerns_export.py — Pure-service module (zero UI imports) for standalone concerns exports. generate_concerns_pdf() produces an A4 ReportLab PDF with severity-coloured tables and remediation hints. generate_concerns_csv() produces a UTF-8-BOM CSV with one row per finding. Both functions accept a HealthCheckResult and return raw bytes (see ADR-069).
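A sketch of the UTF-8-BOM CSV export, one row per finding; the field names and the dict-based input are hypothetical (the real function takes a HealthCheckResult):

```python
import csv
import io

def generate_concerns_csv(findings: list[dict]) -> bytes:
    """Render findings as CSV bytes with a UTF-8 BOM."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["severity", "vm_name", "message"])
    writer.writeheader()
    writer.writerows(findings)
    # encoding with utf-8-sig prepends the BOM, which makes Excel open the
    # file with the correct encoding (accents, etc.) by default.
    return out.getvalue().encode("utf-8-sig")

data = generate_concerns_csv([
    {"severity": "warning", "vm_name": "db01",
     "message": "thin-provisioned datastore over 90% full"},
])
```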

Scope Filtering

  • ui/pages/scope_page.py -- /scope page rendered between upload and review. Reads datacenter and cluster columns from the canonical DataFrame and presents multi-select pickers. Persists selection via save_scope_selection().

Session State

  • ui/state.py -- Tab-scoped session storage via app.storage.tab. Each browser tab maintains independent pipeline state. Key scope helpers:
  • save_scope_selection(datacenters, clusters) — persist selected sets
  • get_scope_selection() — retrieve (set[str], set[str])
  • load_filtered_session_data() — return DataFrame filtered to selected scope
  • save_filtered_rows(row_data) — merge AG Grid edits back into the full dataset
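The scope-filtering behaviour can be sketched over plain row dicts (the real helper filters the canonical pandas DataFrame); assuming, as the pickers suggest, that an empty selection means "no filter":

```python
def filter_scope(rows: list[dict], datacenters: set[str], clusters: set[str]) -> list[dict]:
    """Keep rows matching the selected scope; empty sets select everything."""
    return [
        r for r in rows
        if (not datacenters or r["datacenter"] in datacenters)
        and (not clusters or r["cluster"] in clusters)
    ]

rows = [
    {"vm_name": "a", "datacenter": "DC1", "cluster": "C1"},
    {"vm_name": "b", "datacenter": "DC2", "cluster": "C1"},
]
selected = filter_scope(rows, {"DC1"}, set())
```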

Session Model

```mermaid
flowchart TD
    subgraph Browser
        Tab1["Tab 1"]
        Tab2["Tab 2"]
    end

    subgraph Server["NiceGUI Server"]
        S1["app.storage.tab<br/>(Tab 1 state)"]
        S2["app.storage.tab<br/>(Tab 2 state)"]
        U["app.storage.user<br/>(dark mode pref)"]
    end

    Tab1 --> S1
    Tab2 --> S2
    Tab1 --> U
    Tab2 --> U
```

  • Tab-scoped (app.storage.tab): uploaded file, DataFrame, classification results, SizingSummary, selected storage model, AI toggle state, scope selection (selected datacenters and clusters), layout constraints. The full tab state can be serialised to a portable .zip archive via pipeline/session_archive.py and restored on a subsequent upload.
  • User-scoped (app.storage.user): dark mode preference (persists across pages and tabs).

Technology Stack

| Layer | Technology |
|---|---|
| Web framework | NiceGUI |
| Styling | Tailwind CSS |
| Data grid | AG Grid (Community) |
| Data processing | pandas, openpyxl |
| PDF generation | ReportLab |
| Testing | pytest |
| Linting | ruff, mypy |
| Documentation | MkDocs Material |
| Deployment | Docker Compose |