Three-Pass Processing¶
For IMAP providers, MailTag processes emails in three passes for performance optimization. Each pass progressively uses more expensive operations.
Overview¶
INBOX (N emails)
|
v
[Pass 1: Fast Parse] -- headers only --> classify via Signals 1-3
| (batch_size: 500)
| remaining UIDs + cached headers
v
[Pass 2: Domain] -- headers only --> classify via domain rules
| (groups by commercial domain)
| remaining UIDs
v
[Pass 3: AI] -- full body --> classify via Signals 5-6
(batch embeddings + LLM)
Pass 1: Fast Parse¶
Processes emails using only headers (sender, subject). Uses validated and historical databases for instant classification.
- Processes emails in configurable batches (default: 500)
- Also runs on the Junk folder before INBOX
- Returns unclassified UIDs and cached headers to Pass 2
Performance: Classifies known senders in microseconds. Typically handles 60-80% of emails.
Pass 2: Domain Classification¶
Groups remaining emails by commercial domain and applies domain-based rules in bulk.
- Skips non-commercial domains (gmail.com, yahoo.com, etc.)
- Reuses headers from Pass 1 (no duplicate IMAP fetch)
- Generates
data/pass3_manual_matching_*.jsonfor review
Performance: One database lookup per domain, not per email. Handles 10-20% of remaining emails.
Pass 3: AI Classification¶
Fetches full email bodies and uses AI classification for remaining emails.
- Batch embedding computation via
route_batch()for Signal 5 - Sequential LLM fallback for Signal 6
- Batch IMAP moves accumulated by category
Performance: ~1-2 seconds per LLM call. Typically 5-15% of total emails reach this pass.
Gmail Processing¶
Gmail providers use single-pass processing with the full AMSC strategy applied per-email, since Gmail API doesn't support the same batch header operations as IMAP.