Skip to content

ADR-001: Lightroom Classic Catalog Automation Approach

Status: Accepted Date: 2026-02-17 Decision Makers: fjacquet

Context

Photos in a Lightroom Classic catalog are organized in YYYY/MM folders on disk. Two problems exist:

  1. Misplaced photos: Some photos were filed by file modification date instead of EXIF DateTimeOriginal, placing them in the wrong YYYY/MM folder.
  2. Duplicate date prefixes in filenames: Files like 29122012-29122012-IMG_20121229_131334 have the date prefix duplicated and need cleaning to 29122012-IMG_131334.

All operations must maintain Lightroom catalog link integrity -- files cannot be moved/renamed outside of Lightroom's awareness.

Decision

Build a Python CLI tool that directly manipulates the .lrcat SQLite catalog file and moves/renames files on disk, with Lightroom closed.

Why Not Lightroom Classic Lua SDK Plugin?

The Lua SDK was evaluated and rejected because:

  • LrPhoto has no moveToFolder() method -- the SDK cannot move photos between folders programmatically
  • LrFileUtils can rename/move files on disk, but Lightroom will then think the photo is missing, with no way to reconnect
  • The SDK can only read metadata and create collections, not perform the file operations we need

Why Not Lightroom CC (Cloud) MCP/API?

Existing MCP servers (Zapier, Pipedream) target Lightroom CC (cloud version), not Lightroom Classic (desktop). They operate through Adobe's cloud API and cannot interact with local .lrcat catalogs.

Why Direct SQLite Manipulation?

The .lrcat catalog file is a SQLite database with well-documented tables:

  • AgLibraryRootFolder - absolute root paths
  • AgLibraryFolder - folder paths relative to root
  • AgLibraryFile - filenames with folder FK
  • Adobe_images - core image data including captureTime

Moving a photo requires only: UPDATE AgLibraryFile SET folder = :new_folder_id WHERE id_local = :file_id

This is the approach used by community tools like LightroomClassicCatalogReader and documented by camerahacks/lightroom-database.

Architecture

Project Structure

lrc-automation/
├── pyproject.toml
├── .env                 # LRC_CATALOG_PATH, LRC_BACKUP_DIR
├── docs/adr/
├── src/lrc_automation/
│   ├── __init__.py
│   ├── cli.py           # Click CLI: scan, plan, apply, validate, restore
│   ├── catalog.py       # CatalogConnection: open, backup, lock check
│   ├── models.py        # PhotoRecord, Folder, FileChange, ChangePlan
│   ├── scanner.py       # Scan for misplaced photos + duplicate filenames
│   ├── planner.py       # Build ChangePlan (folder creation, collision handling)
│   ├── executor.py      # Apply: move files on disk + update SQLite
│   ├── reporter.py      # Rich terminal output, CSV/JSON export
│   ├── validators.py    # Pre-flight and post-flight integrity checks
│   ├── utils.py         # Date parsing, path helpers
│   └── constants.py     # Regex patterns, table names
└── tests/

Key SQLite Operations

Move photo to different folder:

UPDATE AgLibraryFile SET folder = :new_folder_id WHERE id_local = :file_id;

Rename file:

UPDATE AgLibraryFile SET baseName = :new_name, idx_filename = :new_idx WHERE id_local = :file_id;

Create folder:

INSERT INTO AgLibraryFolder (id_local, id_global, pathFromRoot, rootFolder) VALUES (...);

Path reconstruction:

full_path = AgLibraryRootFolder.absolutePath + AgLibraryFolder.pathFromRoot + AgLibraryFile.baseName + "." + extension

Rename Logic

Duplicate date prefix removal:

29122012-29122012-IMG_20121229_131334 -> 29122012-IMG_131334

  1. Detect duplicated date prefix: (\d{8})-\1-(.+) -> keep one prefix
  2. Strip redundant date from IMG_YYYYMMDD_NNNNNN -> IMG_NNNNNN
  3. Validate the embedded DDMMYYYY is a real date (rejects bogus 01011904- prefixes from 1904-epoch videos)

Prefix format normalisation (DDMMYYYY → YYMMDD):

Camera utilities sometimes prepend dates in European day-first order (DDMMYYYY), while the rest of the archive uses YYMMDD. The scanner detects this pattern and proposes a normalised name:

02052002-volcan -> 020502-volcan

  • Pattern: ^(\d{2})(\d{2})(\d{4})-(.+)$ — day, month, 4-digit year, hyphen, rest
  • Conversion: YYMMDD-rest where YY = last two digits of year
  • Files with GPS coordinates also get a Country/City/ target-folder proposal (location is in the folder path, not the filename)
  • GPS-tagged proposals are sorted first in the scan report for immediate actionability

Pre-import Filename Visibility

AgLibraryFile.originalFilename stores the filename as it existed before the Lightroom import. This column is now included in all photo queries and exposed in the PhotoRecord model, allowing the duplicate-prefix report to show the original camera filename alongside the current catalog name.

Safety Measures

  1. Lightroom must be closed - enforced by checking .lrcat-lock file + process check
  2. Mandatory backup of .lrcat before any write
  3. Single SQLite transaction (BEGIN IMMEDIATE ... COMMIT) for all SQL changes
  4. Disk rollback stack - every file move records an undo operation, reversed on error
  5. SQLite integrity_check before and after modifications
  6. Post-flight validation - verify all files exist at new paths, no broken FKs

CLI Commands

lrc-auto scan -c ~/Pictures/MyCatalog.lrcat        # Read-only scan
lrc-auto plan -c ~/Pictures/MyCatalog.lrcat -o plan.json  # Generate plan
lrc-auto apply -c ~/Pictures/MyCatalog.lrcat        # Apply changes
lrc-auto validate -c ~/Pictures/MyCatalog.lrcat     # Integrity check
lrc-auto restore -c ~/Pictures/MyCatalog.lrcat --backup-path ...  # Restore

Dependencies

  • click - CLI framework
  • rich - Terminal tables and progress bars
  • python-dotenv - .env configuration
  • sqlite3 (stdlib) - Catalog database access
  • shutil (stdlib) - File move operations
  • pytest - Testing

Optional [geo] extra:

  • reverse-geocoder - Offline K-D tree over GeoNames (~50 MB); converts GPS coordinates to country code + city name without any network call
  • pycountry - Maps ISO 3166-1 alpha-2 codes (e.g. FR) to full country names (e.g. France) for human-readable folder paths

Why pycountry over embedding a lookup dict?

reverse-geocoder returns only the two-letter ISO code. Rolling a hand-maintained mapping would drift over time (country name changes, new territories). pycountry bundles the authoritative ISO 3166 data and is updated with each release. It adds ~4 MB and zero runtime network calls.

Edge Cases

Case Handling
No captureTime (NULL) Skip with warning
Target folder not in catalog Create AgLibraryFolder row + mkdir
Filename collision in target Append _1, _2, etc.
RAW+JPG sidecar pairs Move together (parse sidecarExtensions)
XMP sidecars Move .xmp alongside main file
Virtual copies Skip (share master's physical file)
Cross-filesystem moves Copy+delete instead of rename

Implementation Phases

Phase Deliverable
1 Foundation: models.py, constants.py, utils.py, catalog.py
2 Scanner: scanner.py + reporter.py + cli.py scan
3 Planner: planner.py + cli.py plan with export
4 Executor: validators.py + executor.py + cli.py apply/validate/restore
5 Tests: unit + integration tests
6 Optional: MCP server wrapper

Real Catalog Analysis (2026-02-17, updated 2026-02-28)

Scanned against a production catalog (schema v1400000, LR Classic v14):

  • 92,717 photos across 4,180 folders and 42 root folders
  • 1,175 misplaced photos detected (capture date != folder date)
  • 20,178 photos (22%) carry GPS coordinates (AgHarvestedExifMetadata.gpsLatitude IS NOT NULL)
  • 21,531 files have DDMMYYYY-format name prefixes requiring normalisation to YYMMDD

GPS Detection Fix

The original implementation gated GPS data on AgHarvestedExifMetadata.hasGPS = 1. Field inspection of a production catalog revealed that hasGPS is NULL for the vast majority of photos, even when gpsLatitude/gpsLongitude are populated. This appears to be a Lightroom version-specific quirk (LR v14).

Decision: Remove hasGPS from all queries. Gate GPS processing exclusively on gpsLatitude IS NOT NULL. This correctly detects 20,178 GPS-tagged photos versus 0 with the old approach.

Folder Structure (Not What We Expected)

The initial design assumed YYYY/MM/ subfolders. The real catalog uses:

Pattern Count Example
YYYY-MM-DD in pathFromRoot 1,663 2023-12-24/
French dates in pathFromRoot 978 1 avril 2016/
Year only 54 2017/
Other (topical) 1,443 Italy/Corentin/
Empty pathFromRoot 42 (files directly in root)

The date is often encoded in the root folder path itself (e.g. /Volumes/photo/2016/) and the subfolder carries the day (e.g. 2016-10-30/). This means the scanner must parse dates from the full path (root + pathFromRoot), not just pathFromRoot.

Misplacement Patterns

  • Many photos show captureTime = 2023-10-29T18:07:xx while sitting in folders from 2016-2018. This indicates the file modification date was imported as capture time.
  • 4 videos have bogus 1904-01-01 capture times (should be skipped).
  • Duplicate filenames like IMG_0005.JPG appear up to 65 times across different folders (normal camera numbering resets, not a problem).

Required Scanner Updates

  1. Extract dates from the full path (root + pathFromRoot), not just pathFromRoot
  2. Support YYYY-MM-DD date folders (the dominant pattern)
  3. Support French date folders (D mois YYYY)
  4. Filter out bogus dates (1904, far-future)
  5. Consider that topical folders (no date) should be skipped, not flagged

Consequences

Positive:

  • Full automation of photo organization fixes
  • Safe with mandatory backup and rollback
  • Read-only scan mode for review before changes
  • No dependency on Lightroom's limited SDK

Negative:

  • Requires Lightroom to be closed during apply
  • Risk of catalog corruption if SQLite schema changes between LR versions (mitigated by schema version check)
  • Not officially supported by Adobe

Risks:

  • Adobe could change the SQLite schema in future LR updates -> check Adobe_variablesTable for version
  • Partial disk failure during move -> rollback stack + backup restoration

References