Three-Pass Processing

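For IMAP providers, MailTag processes emails in three passes to keep classification fast. Each pass uses more expensive operations than the one before it, so the cheap header-only checks run first and only the emails they cannot classify reach the costlier passes.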

Overview

INBOX (N emails)
    |
    v
[Pass 1: Fast Parse] -- headers only --> classify via Signals 1-3
    |                                     (batch_size: 500)
    | remaining UIDs + cached headers
    v
[Pass 2: Domain]     -- headers only --> classify via domain rules
    |                                     (groups by commercial domain)
    | remaining UIDs
    v
[Pass 3: AI]         -- full body   --> classify via Signals 5-6
                                         (batch embeddings + LLM)
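
The hand-off between passes shown above can be sketched as a simple cascade: each pass sees only the UIDs that earlier passes left unclassified. This is a minimal illustration, not MailTag's actual code. The function name `run_passes`, the `Classifier` callable type, and the dictionary layout are assumptions, and the sketch ignores that Pass 3 fetches full bodies rather than headers.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical types: a UID is an int, headers map a header name to its value.
Headers = Dict[str, str]
# A classifier returns a category name, or None if it cannot decide.
Classifier = Callable[[int, Headers], Optional[str]]


def run_passes(
    headers_by_uid: Dict[int, Headers],
    passes: List[Tuple[str, Classifier]],
) -> Tuple[Dict[int, Tuple[str, str]], List[int]]:
    """Run each pass over the UIDs the previous passes left unclassified.

    Returns (results, remaining): results maps a UID to (pass_name, category),
    remaining lists the UIDs that no pass could classify.
    """
    results: Dict[int, Tuple[str, str]] = {}
    remaining = list(headers_by_uid)
    for name, classify in passes:
        still_unclassified = []
        for uid in remaining:
            category = classify(uid, headers_by_uid[uid])
            if category is None:
                still_unclassified.append(uid)  # hand over to the next pass
            else:
                results[uid] = (name, category)
        remaining = still_unclassified
    return results, remaining
```

Ordering the passes from cheapest to most expensive means the costly classifier is only ever called for emails the cheaper ones could not place.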

Pass 1: Fast Parse

Classifies emails from headers alone (sender, subject), using lookups against the validated and historical databases so that known senders are classified instantly.

  • Processes emails in configurable batches (default: 500)
  • Also runs on the Junk folder before INBOX
  • Returns unclassified UIDs and cached headers to Pass 2

Performance: Classifies known senders in microseconds. Typically handles 60-80% of emails.
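
A rough sketch of the batching and header lookup described in this pass follows. It assumes a hypothetical `fetch_headers` callable that returns headers for a list of UIDs in one round trip, and a plain dictionary `known_senders` standing in for the validated and historical databases. Neither name comes from MailTag itself.

```python
from email.utils import parseaddr
from typing import Callable, Dict, Iterator, List, Tuple

Headers = Dict[str, str]


def batched(uids: List[int], batch_size: int = 500) -> Iterator[List[int]]:
    """Yield consecutive slices of at most batch_size UIDs."""
    for start in range(0, len(uids), batch_size):
        yield uids[start:start + batch_size]


def fast_parse(
    uids: List[int],
    fetch_headers: Callable[[List[int]], Dict[int, Headers]],  # hypothetical
    known_senders: Dict[str, str],  # stand-in for the sender databases
    batch_size: int = 500,
) -> Tuple[Dict[int, str], List[int], Dict[int, Headers]]:
    """Classify by sender address; return what is left for the next pass."""
    classified: Dict[int, str] = {}
    unclassified: List[int] = []
    header_cache: Dict[int, Headers] = {}
    for batch in batched(uids, batch_size):
        headers = fetch_headers(batch)  # one fetch per batch, not per email
        for uid in batch:
            _, address = parseaddr(headers[uid].get("From", ""))
            category = known_senders.get(address.lower())
            if category is None:
                unclassified.append(uid)
                # Keep the headers so the next pass does not fetch them again.
                header_cache[uid] = headers[uid]
            else:
                classified[uid] = category
    return classified, unclassified, header_cache
```

Caching only the headers of unclassified emails is a choice made for this sketch; the text above does not say whether all headers are kept.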

Pass 2: Domain Classification

Groups remaining emails by commercial domain and applies domain-based rules in bulk.

  • Skips non-commercial domains (gmail.com, yahoo.com, etc.)
  • Reuses headers from Pass 1 (no duplicate IMAP fetch)
  • Generates data/pass3_manual_matching_*.json for review

Performance: One database lookup per domain, not per email. Handles 10-20% of remaining emails.
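
The grouping step can be illustrated as follows. This is a sketch under stated assumptions: `NON_COMMERCIAL_DOMAINS` is an illustrative two-entry subset, `domain_rules` is a plain dictionary standing in for the domain database, and subdomains are not normalized to a parent domain. The function names are hypothetical.

```python
from collections import defaultdict
from email.utils import parseaddr
from typing import Dict, List, Tuple

Headers = Dict[str, str]

# Illustrative subset only; the full list is not given in this document.
NON_COMMERCIAL_DOMAINS = {"gmail.com", "yahoo.com"}


def group_by_commercial_domain(
    uids: List[int], header_cache: Dict[int, Headers]
) -> Tuple[Dict[str, List[int]], List[int]]:
    """Group UIDs by sender domain, skipping non-commercial domains."""
    groups: Dict[str, List[int]] = defaultdict(list)
    skipped: List[int] = []
    for uid in uids:
        _, address = parseaddr(header_cache[uid].get("From", ""))
        domain = address.rpartition("@")[2].lower() if "@" in address else ""
        if not domain or domain in NON_COMMERCIAL_DOMAINS:
            skipped.append(uid)
        else:
            groups[domain].append(uid)
    return dict(groups), skipped


def classify_by_domain(
    uids: List[int],
    header_cache: Dict[int, Headers],
    domain_rules: Dict[str, str],  # stand-in for the domain database
) -> Tuple[Dict[int, str], List[int]]:
    """Apply one rule lookup per domain and return (classified, remaining)."""
    groups, remaining = group_by_commercial_domain(uids, header_cache)
    classified: Dict[int, str] = {}
    for domain, domain_uids in groups.items():
        category = domain_rules.get(domain)  # one lookup per domain
        if category is None:
            remaining.extend(domain_uids)
        else:
            for uid in domain_uids:
                classified[uid] = category
    return classified, sorted(remaining)
```

Because the lookup happens once per group, a sender domain with hundreds of emails costs the same single query as a domain with one.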

Pass 3: AI Classification

Fetches full email bodies and uses AI classification for remaining emails.

  • Batch embedding computation via route_batch() for Signal 5
  • Sequential LLM fallback for Signal 6
  • Batch IMAP moves accumulated by category

Performance: ~1-2 seconds per LLM call. Typically 5-15% of total emails reach this pass.
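
The flow of this pass might look like the sketch below. The real signature of `route_batch()` is not shown in this document, so the sketch assumes it takes a list of texts and returns one category (or `None`) per text. `llm_classify` and `move_batch` are hypothetical callables standing in for the LLM call and the batched IMAP move.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Optional


def ai_classify(
    bodies: Dict[int, str],
    route_batch: Callable[[List[str]], List[Optional[str]]],  # assumed shape
    llm_classify: Callable[[str], Optional[str]],  # hypothetical
    move_batch: Callable[[List[int], str], None],  # hypothetical
) -> Dict[str, List[int]]:
    """Classify full bodies, then move emails one category at a time."""
    uids = list(bodies)
    # One batched call covers the embedding signal for every email.
    routed = route_batch([bodies[uid] for uid in uids])
    moves: Dict[str, List[int]] = defaultdict(list)
    for uid, category in zip(uids, routed):
        if category is None:
            # Sequential fallback: the LLM is called only when needed.
            category = llm_classify(bodies[uid])
        if category is not None:
            moves[category].append(uid)
    # Accumulated moves are issued once per category, not once per email.
    for category, category_uids in moves.items():
        move_batch(category_uids, category)
    return dict(moves)
```

At roughly 1-2 seconds per LLM call, keeping the fallback behind the batched embedding step limits the slow calls to emails that the embedding signal could not place.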

Gmail Processing

Gmail providers use single-pass processing, with the full AMSC strategy applied to each email individually, because the Gmail API does not support the same batch header operations as IMAP.