ADR-001: Lightroom Classic Catalog Automation Approach
Status: Accepted Date: 2026-02-17 Decision Makers: fjacquet
Context
Photos in a Lightroom Classic catalog are organized in YYYY/MM folders on disk. Two problems exist:
- Misplaced photos: Some photos were filed by file modification date instead of EXIF
DateTimeOriginal, placing them in the wrongYYYY/MMfolder. - Duplicate date prefixes in filenames: Files like
29122012-29122012-IMG_20121229_131334have the date prefix duplicated and need cleaning to29122012-IMG_131334.
All operations must maintain Lightroom catalog link integrity -- files cannot be moved/renamed outside of Lightroom's awareness.
Decision
Build a Python CLI tool that directly manipulates the .lrcat SQLite catalog file and moves/renames files on disk, with Lightroom closed.
Why Not Lightroom Classic Lua SDK Plugin?
The Lua SDK was evaluated and rejected because:
LrPhotohas nomoveToFolder()method -- the SDK cannot move photos between folders programmaticallyLrFileUtilscan rename/move files on disk, but Lightroom will then think the photo is missing, with no way to reconnect- The SDK can only read metadata and create collections, not perform the file operations we need
Why Not Lightroom CC (Cloud) MCP/API?
Existing MCP servers (Zapier, Pipedream) target Lightroom CC (cloud version), not Lightroom Classic (desktop). They operate through Adobe's cloud API and cannot interact with local .lrcat catalogs.
Why Direct SQLite Manipulation?
The .lrcat catalog file is a SQLite database with well-documented tables:
AgLibraryRootFolder- absolute root pathsAgLibraryFolder- folder paths relative to rootAgLibraryFile- filenames with folder FKAdobe_images- core image data includingcaptureTime
Moving a photo requires only: UPDATE AgLibraryFile SET folder = :new_folder_id WHERE id_local = :file_id
This is the approach used by community tools like LightroomClassicCatalogReader and documented by camerahacks/lightroom-database.
Architecture
Project Structure
lrc-automation/
├── pyproject.toml
├── .env # LRC_CATALOG_PATH, LRC_BACKUP_DIR
├── docs/adr/
├── src/lrc_automation/
│ ├── __init__.py
│ ├── cli.py # Click CLI: scan, plan, apply, validate, restore
│ ├── catalog.py # CatalogConnection: open, backup, lock check
│ ├── models.py # PhotoRecord, Folder, FileChange, ChangePlan
│ ├── scanner.py # Scan for misplaced photos + duplicate filenames
│ ├── planner.py # Build ChangePlan (folder creation, collision handling)
│ ├── executor.py # Apply: move files on disk + update SQLite
│ ├── reporter.py # Rich terminal output, CSV/JSON export
│ ├── validators.py # Pre-flight and post-flight integrity checks
│ ├── utils.py # Date parsing, path helpers
│ └── constants.py # Regex patterns, table names
└── tests/
Key SQLite Operations
Move photo to different folder:
Rename file:
Create folder:
Path reconstruction:
full_path = AgLibraryRootFolder.absolutePath + AgLibraryFolder.pathFromRoot + AgLibraryFile.baseName + "." + extension
Rename Logic
Duplicate date prefix removal:
29122012-29122012-IMG_20121229_131334 -> 29122012-IMG_131334
- Detect duplicated date prefix:
(\d{8})-\1-(.+)-> keep one prefix - Strip redundant date from
IMG_YYYYMMDD_NNNNNN->IMG_NNNNNN - Validate the embedded DDMMYYYY is a real date (rejects bogus
01011904-prefixes from 1904-epoch videos)
Prefix format normalisation (DDMMYYYY → YYMMDD):
Camera utilities sometimes prepend dates in European day-first order (DDMMYYYY), while the rest of the archive uses YYMMDD. The scanner detects this pattern and proposes a normalised name:
02052002-volcan -> 020502-volcan
- Pattern:
^(\d{2})(\d{2})(\d{4})-(.+)$— day, month, 4-digit year, hyphen, rest - Conversion:
YYMMDD-restwhereYY= last two digits of year - Files with GPS coordinates also get a
Country/City/target-folder proposal (location is in the folder path, not the filename) - GPS-tagged proposals are sorted first in the scan report for immediate actionability
Pre-import Filename Visibility
AgLibraryFile.originalFilename stores the filename as it existed before the Lightroom import. This column is now included in all photo queries and exposed in the PhotoRecord model, allowing the duplicate-prefix report to show the original camera filename alongside the current catalog name.
Safety Measures
- Lightroom must be closed - enforced by checking
.lrcat-lockfile + process check - Mandatory backup of
.lrcatbefore any write - Single SQLite transaction (
BEGIN IMMEDIATE ... COMMIT) for all SQL changes - Disk rollback stack - every file move records an undo operation, reversed on error
- SQLite
integrity_checkbefore and after modifications - Post-flight validation - verify all files exist at new paths, no broken FKs
CLI Commands
lrc-auto scan -c ~/Pictures/MyCatalog.lrcat # Read-only scan
lrc-auto plan -c ~/Pictures/MyCatalog.lrcat -o plan.json # Generate plan
lrc-auto apply -c ~/Pictures/MyCatalog.lrcat # Apply changes
lrc-auto validate -c ~/Pictures/MyCatalog.lrcat # Integrity check
lrc-auto restore -c ~/Pictures/MyCatalog.lrcat --backup-path ... # Restore
Dependencies
click- CLI frameworkrich- Terminal tables and progress barspython-dotenv-.envconfigurationsqlite3(stdlib) - Catalog database accessshutil(stdlib) - File move operationspytest- Testing
Optional [geo] extra:
reverse-geocoder- Offline K-D tree over GeoNames (~50 MB); converts GPS coordinates to country code + city name without any network callpycountry- Maps ISO 3166-1 alpha-2 codes (e.g.FR) to full country names (e.g.France) for human-readable folder paths
Why pycountry over embedding a lookup dict?
reverse-geocoder returns only the two-letter ISO code. Rolling a hand-maintained mapping would drift over time (country name changes, new territories). pycountry bundles the authoritative ISO 3166 data and is updated with each release. It adds ~4 MB and zero runtime network calls.
Edge Cases
| Case | Handling |
|---|---|
No captureTime (NULL) |
Skip with warning |
| Target folder not in catalog | Create AgLibraryFolder row + mkdir |
| Filename collision in target | Append _1, _2, etc. |
| RAW+JPG sidecar pairs | Move together (parse sidecarExtensions) |
| XMP sidecars | Move .xmp alongside main file |
| Virtual copies | Skip (share master's physical file) |
| Cross-filesystem moves | Copy+delete instead of rename |
Implementation Phases
| Phase | Deliverable |
|---|---|
| 1 | Foundation: models.py, constants.py, utils.py, catalog.py |
| 2 | Scanner: scanner.py + reporter.py + cli.py scan |
| 3 | Planner: planner.py + cli.py plan with export |
| 4 | Executor: validators.py + executor.py + cli.py apply/validate/restore |
| 5 | Tests: unit + integration tests |
| 6 | Optional: MCP server wrapper |
Real Catalog Analysis (2026-02-17, updated 2026-02-28)
Scanned against a production catalog (schema v1400000, LR Classic v14):
- 92,717 photos across 4,180 folders and 42 root folders
- 1,175 misplaced photos detected (capture date != folder date)
- 20,178 photos (22%) carry GPS coordinates (
AgHarvestedExifMetadata.gpsLatitude IS NOT NULL) - 21,531 files have DDMMYYYY-format name prefixes requiring normalisation to YYMMDD
GPS Detection Fix
The original implementation gated GPS data on AgHarvestedExifMetadata.hasGPS = 1. Field inspection of a production catalog revealed that hasGPS is NULL for the vast majority of photos, even when gpsLatitude/gpsLongitude are populated. This appears to be a Lightroom version-specific quirk (LR v14).
Decision: Remove hasGPS from all queries. Gate GPS processing exclusively on gpsLatitude IS NOT NULL. This correctly detects 20,178 GPS-tagged photos versus 0 with the old approach.
Folder Structure (Not What We Expected)
The initial design assumed YYYY/MM/ subfolders. The real catalog uses:
| Pattern | Count | Example |
|---|---|---|
YYYY-MM-DD in pathFromRoot |
1,663 | 2023-12-24/ |
| French dates in pathFromRoot | 978 | 1 avril 2016/ |
| Year only | 54 | 2017/ |
| Other (topical) | 1,443 | Italy/Corentin/ |
| Empty pathFromRoot | 42 | (files directly in root) |
The date is often encoded in the root folder path itself (e.g. /Volumes/photo/2016/) and the subfolder carries the day (e.g. 2016-10-30/). This means the scanner must parse dates from the full path (root + pathFromRoot), not just pathFromRoot.
Misplacement Patterns
- Many photos show
captureTime = 2023-10-29T18:07:xxwhile sitting in folders from 2016-2018. This indicates the file modification date was imported as capture time. - 4 videos have bogus
1904-01-01capture times (should be skipped). - Duplicate filenames like
IMG_0005.JPGappear up to 65 times across different folders (normal camera numbering resets, not a problem).
Required Scanner Updates
- Extract dates from the full path (root + pathFromRoot), not just pathFromRoot
- Support
YYYY-MM-DDdate folders (the dominant pattern) - Support French date folders (
D mois YYYY) - Filter out bogus dates (1904, far-future)
- Consider that topical folders (no date) should be skipped, not flagged
Consequences
Positive:
- Full automation of photo organization fixes
- Safe with mandatory backup and rollback
- Read-only scan mode for review before changes
- No dependency on Lightroom's limited SDK
Negative:
- Requires Lightroom to be closed during apply
- Risk of catalog corruption if SQLite schema changes between LR versions (mitigated by schema version check)
- Not officially supported by Adobe
Risks:
- Adobe could change the SQLite schema in future LR updates -> check
Adobe_variablesTablefor version - Partial disk failure during move -> rollback stack + backup restoration
References
- Lightroom Classic SDK - evaluated and rejected for this use case
- camerahacks/lightroom-database - unofficial schema documentation
- LightroomClassicCatalogReader - Python catalog reader
- Lightroom Queen Forums - Database schema
- Seachess - Dive into Lightroom catalogues
- Jeffrey Friedl - Accessing LR SQLite directly