Architecture Documentation¶

Overview¶

CAMT-CSV follows a clean, layered architecture built on dependency injection principles. The system transforms various financial statement formats into standardized CSV files with intelligent categorization.

High-Level Architecture¶

┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   CLI Layer     │    │  Configuration  │    │   Logging       │
│   (cmd/)        │    │   (config/)     │    │  (logging/)     │
└─────────────────┘    └─────────────────┘    └─────────────────┘
         │                       │                       │
         ▼                       ▼                       ▼
┌─────────────────────────────────────────────────────────────────┐
│                    Dependency Container                         │
│                    (container/)                                 │
│  ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌───────────┐ │
│  │   Logger    │ │   Config    │ │    Store    │ │AIClient   │ │
│  └─────────────┘ └─────────────┘ └─────────────┘ └───────────┘ │
└─────────────────────────────────────────────────────────────────┘
         │
         ▼
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│    Parsers      │    │  Categorizer    │    │     Store       │
│   (parsers/)    │    │ (categorizer/)  │    │   (store/)      │
│  ┌───────────┐  │    │  ┌───────────┐  │    │                 │
│  │BaseParser │  │    │  │Strategies │  │    │                 │
│  └───────────┘  │    │  └───────────┘  │    │                 │
└─────────────────┘    └─────────────────┘    └─────────────────┘
         │                       │                       │
         ▼                       ▼                       ▼
┌─────────────────────────────────────────────────────────────────┐
│                      Core Models                                │
│                     (models/)                                   │
│  ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌───────────┐ │
│  │Transaction  │ │   Builder   │ │  Constants  │ │  Errors   │ │
│  │   Types     │ │   Pattern   │ │             │ │           │ │
│  └─────────────┘ └─────────────┘ └─────────────┘ └───────────┘ │
└─────────────────────────────────────────────────────────────────┘

Core Principles¶

1. Dependency Injection¶

All components receive their dependencies through constructors, eliminating global state and improving testability.

Container Pattern:

type Container struct {
    Logger      logging.Logger
    Config      *config.Config
    Store       *store.CategoryStore
    AIClient    categorizer.AIClient
    Categorizer *categorizer.Categorizer
    Parsers     map[parser.ParserType]parser.FullParser
}

// NewContainer creates and wires all dependencies
func NewContainer(cfg *config.Config) (*Container, error) {
    logger := logging.NewLogrusAdapter(cfg.Log.Level, cfg.Log.Format)

    // Named categoryStore so the store package is not shadowed further down
    categoryStore := store.NewCategoryStore(cfg.Categories.File, cfg.Categories.CreditorsFile, cfg.Categories.DebtorsFile)

    var aiClient categorizer.AIClient
    if cfg.AI.Enabled {
        aiClient = categorizer.NewGeminiClient(cfg.AI.APIKey, logger, cfg.AI.RequestsPerMinute)
    }

    cat := categorizer.NewCategorizer(aiClient, categoryStore, logger, cfg.Categorization.AutoLearn)

    // Wire staging store when AI is on but auto-learn is off
    if cfg.AI.Enabled && !cfg.Categorization.AutoLearn && cfg.Staging.Enabled {
        stagingStore := store.NewStagingStore(cfg.Staging.CreditorsFile, cfg.Staging.DebtorsFile)
        cat.SetStagingStore(stagingStore)
    }

    parsers := make(map[parser.ParserType]parser.FullParser)
    parsers[parser.CAMT] = camtparser.NewParser(logger)
    parsers[parser.PDF] = pdfparser.NewParser(logger)
    // ... other parsers

    return &Container{
        Logger:      logger,
        Config:      cfg,
        Store:       categoryStore,
        AIClient:    aiClient,
        Categorizer: cat,
        Parsers:     parsers,
    }, nil
}

Benefits:

- Explicit dependencies with no global state
- Easy testing with mock dependencies
- Runtime configuration flexibility
- Centralized dependency management
- Proper resource lifecycle management
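To make the testing benefit concrete, here is a minimal, self-contained sketch of constructor injection with a recording test double. The `Logger` interface is a trimmed-down stand-in for `logging.Logger`, and `Importer` and `recordingLogger` are hypothetical names used only for illustration; they are not part of the CAMT-CSV codebase.

```go
package main

import "fmt"

// Logger is a trimmed-down, hypothetical stand-in for logging.Logger.
type Logger interface {
	Info(msg string)
}

// recordingLogger is a test double that stores messages instead of printing them.
type recordingLogger struct {
	messages []string
}

func (l *recordingLogger) Info(msg string) {
	l.messages = append(l.messages, msg)
}

// Importer is a hypothetical component whose only dependency arrives
// through its constructor, so no global logger is needed.
type Importer struct {
	logger Logger
}

func NewImporter(logger Logger) *Importer {
	return &Importer{logger: logger}
}

// Run logs which file it would import; real parsing is omitted from this sketch.
func (i *Importer) Run(file string) {
	i.logger.Info("importing " + file)
}

func main() {
	rec := &recordingLogger{}
	NewImporter(rec).Run("statement.xml")
	fmt.Println(rec.messages)
}
```

Because `Importer` never reaches for a package-level logger, a test can assert on `rec.messages` directly without capturing output.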

2. Interface Segregation¶

Parsers implement segregated interfaces based on their capabilities:

type Parser interface {
    Parse(ctx context.Context, r io.Reader) ([]models.Transaction, error)
}

type Validator interface {
    ValidateFormat(filePath string) (bool, error)
}

type CSVConverter interface {
    ConvertToCSV(ctx context.Context, inputFile, outputFile string) error
}

type LoggerConfigurable interface {
    SetLogger(logger logging.Logger)
}

type CategorizerConfigurable interface {
    SetCategorizer(categorizer models.TransactionCategorizer)
}

type BatchConverter interface {
    BatchConvert(ctx context.Context, inputDir, outputDir string) (int, error)
}

type FullParser interface {
    Parser
    Validator
    CSVConverter
    LoggerConfigurable
    CategorizerConfigurable
    BatchConverter
}

Benefits:

- Clients depend only on needed interfaces
- Easy to implement new parsers
- Clear separation of concerns
- Flexible composition
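As a rough illustration of "clients depend only on needed interfaces", the sketch below detects a file format using nothing but the `Validator` interface. The `extensionValidator`, `namedValidator`, and `detectFormat` names are assumptions made for this example; a real validator would inspect file content rather than the extension.

```go
package main

import (
	"fmt"
	"strings"
)

// Validator mirrors the single-method interface from this section.
type Validator interface {
	ValidateFormat(filePath string) (bool, error)
}

// extensionValidator is a hypothetical implementation that only checks the extension.
type extensionValidator struct {
	ext string
}

func (v extensionValidator) ValidateFormat(filePath string) (bool, error) {
	return strings.HasSuffix(strings.ToLower(filePath), v.ext), nil
}

// namedValidator pairs a validator with a label for reporting.
type namedValidator struct {
	name string
	v    Validator
}

// detectFormat needs only Validator, so any parser that implements
// ValidateFormat can be passed in, including a FullParser.
func detectFormat(filePath string, candidates []namedValidator) (string, error) {
	for _, c := range candidates {
		ok, err := c.v.ValidateFormat(filePath)
		if err != nil {
			return "", fmt.Errorf("validating with %s: %w", c.name, err)
		}
		if ok {
			return c.name, nil
		}
	}
	return "", fmt.Errorf("no parser accepts %s", filePath)
}

func main() {
	candidates := []namedValidator{
		{name: "camt", v: extensionValidator{ext: ".xml"}},
		{name: "pdf", v: extensionValidator{ext: ".pdf"}},
	}
	name, err := detectFormat("statement.PDF", candidates)
	fmt.Println(name, err)
}
```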

3. BaseParser Foundation¶

All parsers embed BaseParser to eliminate code duplication:

type BaseParser struct {
    logger logging.Logger
}

func (b *BaseParser) SetLogger(logger logging.Logger) {
    b.logger = logger
}

func (b *BaseParser) WriteToCSV(transactions []models.Transaction, csvFile string) error {
    return common.WriteTransactionsToCSV(transactions, csvFile)
}

Benefits:

- Consistent behavior across parsers
- Shared functionality (logging, CSV writing)
- Reduced code duplication
- Easier maintenance
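The mechanism behind this is Go's struct embedding: methods of the embedded type are promoted to the outer type. The following sketch uses simplified stand-ins (`Logger`, `stdoutLogger`, `DemoParser` are hypothetical) to show that an embedding parser satisfies `LoggerConfigurable` without writing any logger code itself.

```go
package main

import "fmt"

// Logger is a trimmed-down, hypothetical stand-in for logging.Logger.
type Logger interface {
	Info(msg string)
}

type stdoutLogger struct{}

func (stdoutLogger) Info(msg string) { fmt.Println("INFO", msg) }

// LoggerConfigurable mirrors the interface from the Interface Segregation section.
type LoggerConfigurable interface {
	SetLogger(logger Logger)
}

// BaseParser holds the shared logger plumbing.
type BaseParser struct {
	logger Logger
}

func (b *BaseParser) SetLogger(logger Logger) { b.logger = logger }
func (b *BaseParser) GetLogger() Logger       { return b.logger }

// DemoParser is a hypothetical parser that embeds BaseParser and adds nothing else.
type DemoParser struct {
	BaseParser
}

// Compile-time check: the promoted SetLogger makes *DemoParser a LoggerConfigurable.
var _ LoggerConfigurable = (*DemoParser)(nil)

func main() {
	p := &DemoParser{}
	p.SetLogger(stdoutLogger{}) // promoted from BaseParser
	p.GetLogger().Info("parser ready")
}
```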

Component Architecture¶

Parser Layer¶

Structure:

internal/
├── parser/
│   ├── parser.go          # Interface definitions
│   ├── base.go           # BaseParser implementation
│   └── constitution.go   # Constitution loading
├── camtparser/           # CAMT.053 XML parser
├── pdfparser/           # PDF statement parser
├── revolutparser/       # Revolut CSV parser
├── revolutinvestmentparser/ # Revolut investment parser
├── selmaparser/         # Selma investment parser
└── debitparser/         # Generic debit CSV parser

Parser Implementation Pattern:

type MyParser struct {
    parser.BaseParser
    // parser-specific fields
}

func NewMyParser(logger logging.Logger) *MyParser {
    return &MyParser{
        BaseParser: parser.NewBaseParser(logger),
    }
}

func (p *MyParser) Parse(ctx context.Context, r io.Reader) ([]models.Transaction, error) {
    p.GetLogger().Info("Starting parse operation")
    // implementation
}

Categorization Layer¶

Strategy Pattern Implementation:

type CategorizationStrategy interface {
    Categorize(ctx context.Context, tx Transaction) (Category, bool, error)
    Name() string
}

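type Categorizer struct {
    strategies         []CategorizationStrategy
    store              CategoryStoreInterface // same type as the NewCategorizer parameter
    logger             logging.Logger
    isAutoLearnEnabled bool
    categories         []models.CategoryConfig // element type assumed from MockCategoryStore.Categories
    mu                 sync.RWMutex
}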

func NewCategorizer(aiClient AIClient, store CategoryStoreInterface, logger logging.Logger, autoLearnEnabled bool) *Categorizer {
    c := &Categorizer{
        store:               store,
        logger:              logger,
        isAutoLearnEnabled:  autoLearnEnabled,
    }

    // Initialize strategies in priority order
    c.strategies = []CategorizationStrategy{
        NewDirectMappingStrategy(store, logger),
        NewKeywordStrategy(store, logger),
        NewSemanticStrategy(aiClient, logger, c.categories),
        NewAIStrategy(aiClient, logger),
    }

    return c
}

func (c *Categorizer) Categorize(ctx context.Context, tx Transaction) (Category, error) {
    for _, strategy := range c.strategies {
        category, found, err := strategy.Categorize(ctx, tx)
        if err != nil {
            c.logger.Warn("Strategy failed", 
                logging.Field{Key: "strategy", Value: strategy.Name()},
                logging.Field{Key: "error", Value: err})
            continue
        }
        if found {
            c.logger.Debug("Transaction categorized",
                logging.Field{Key: "strategy", Value: strategy.Name()},
                logging.Field{Key: "category", Value: category.Name})
            return category, nil
        }
    }

    return UncategorizedCategory, nil
}

Four-Tier Strategy Approach:

1. DirectMappingStrategy: Exact name matches from creditors.yaml/debtors.yaml (fastest)
2. KeywordStrategy: Pattern matching from categories.yaml (local processing)
3. SemanticStrategy: Vector-based embedding similarity matching transactions to category concepts (local AI)
4. AIStrategy: Gemini API fallback with rate limiting (optional, controlled by autoLearnEnabled)

AI Result Persistence:

- Auto-learn ON: AI results saved directly to creditors.yaml/debtors.yaml (with backups)
- Auto-learn OFF (default): AI results saved to staging files (staging_creditors.yaml/staging_debtors.yaml) for manual review via StagingStore
- Staging is optional and controlled by staging.enabled config
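The persistence rules above amount to a three-way routing decision. The sketch below models it with in-memory sinks; `MappingSink`, `memorySink`, and `persistAIResult` are hypothetical names, and the assumption that a result is simply not persisted when auto-learn is off and staging is disabled is an inference from the rules, not confirmed behavior.

```go
package main

import "fmt"

// MappingSink is a hypothetical interface for anything that can record
// a party-to-category mapping (a YAML-backed store in the real system).
type MappingSink interface {
	Save(party, category string)
}

// memorySink keeps mappings in memory so the routing can be observed.
type memorySink struct {
	entries map[string]string
}

func newMemorySink() *memorySink {
	return &memorySink{entries: make(map[string]string)}
}

func (s *memorySink) Save(party, category string) {
	s.entries[party] = category
}

// persistAIResult routes an AI categorization result:
// auto-learn on goes to the learned mappings, auto-learn off goes to
// staging if a staging sink is wired, otherwise nothing is stored (assumed).
func persistAIResult(autoLearn bool, learned, staging MappingSink, party, category string) string {
	switch {
	case autoLearn:
		learned.Save(party, category)
		return "learned"
	case staging != nil:
		staging.Save(party, category)
		return "staged"
	default:
		return "discarded"
	}
}

func main() {
	learned, staging := newMemorySink(), newMemorySink()
	fmt.Println(persistAIResult(true, learned, staging, "Acme Corp", "Shopping"))
	fmt.Println(persistAIResult(false, learned, staging, "Migros", "Groceries"))
	fmt.Println(persistAIResult(false, learned, nil, "SBB", "Transport"))
}
```

Passing a literal `nil` for `staging` stands in for `staging.enabled: false`; a typed nil pointer would not compare equal to `nil` once stored in the interface.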

Data Layer¶

Transaction Model Decomposition:

// Money represents a monetary value with currency
type Money struct {
    Amount   decimal.Decimal
    Currency string
}

// Party represents a transaction party (payer or payee)
type Party struct {
    Name string
    IBAN string
}

// TransactionCore contains essential transaction data
type TransactionCore struct {
    ID            string
    Date          time.Time
    ValueDate     time.Time
    Amount        Money
    Description   string
    Status        string
    Reference     string
}

// TransactionWithParties adds party information
type TransactionWithParties struct {
    TransactionCore
    Payer         Party
    Payee         Party
    Direction     TransactionDirection // DEBIT or CREDIT
}

// CategorizedTransaction adds categorization data
type CategorizedTransaction struct {
    TransactionWithParties
    Category      string
    Type          string
    Fund          string
}

// Transaction maintains backward compatibility
type Transaction struct {
    CategorizedTransaction

    // Additional fields for specific formats
    BookkeepingNumber string
    RemittanceInfo    string
    PartyIBAN         string
    Investment        string
    NumberOfShares    int
    Fees              Money
    EntryReference    string
    AccountServicer   string
    BankTxCode        string
    OriginalAmount    Money
    ExchangeRate      decimal.Decimal

    // Tax-related fields
    AmountExclTax Money
    AmountTax     Money
    TaxRate       decimal.Decimal
}

Builder Pattern with Validation:

tx, err := NewTransactionBuilder().
    WithDate("2025-01-15").
    WithAmount(decimal.NewFromFloat(100.50), "CHF").
    WithPayer("John Doe", "CH1234567890").
    WithPayee("Acme Corp", "CH0987654321").
    AsDebit().
    Build()

if err != nil {
    return fmt.Errorf("transaction construction failed: %w", err)
}
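One way such a builder can validate input is to collect errors in each `With*` call and report them together from `Build`. The sketch below is an assumption about the general technique, not the project's implementation: `Tx` and `TxBuilder` are hypothetical, amounts are integer minor units instead of `decimal.Decimal` to stay within the standard library, and `errors.Join` requires Go 1.20 or later.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Tx is a hypothetical, pared-down transaction.
type Tx struct {
	Date     time.Time
	Cents    int64 // minor units; the real model uses decimal.Decimal
	Currency string
}

// TxBuilder accumulates validation errors so calls can still be chained.
type TxBuilder struct {
	tx   Tx
	errs []error
}

func NewTxBuilder() *TxBuilder { return &TxBuilder{} }

// WithDate parses an ISO date and records a validation error on failure.
func (b *TxBuilder) WithDate(s string) *TxBuilder {
	d, err := time.Parse("2006-01-02", s)
	if err != nil {
		b.errs = append(b.errs, fmt.Errorf("invalid date %q: %w", s, err))
		return b
	}
	b.tx.Date = d
	return b
}

// WithAmount accepts only three-letter currency codes.
func (b *TxBuilder) WithAmount(cents int64, currency string) *TxBuilder {
	if len(currency) != 3 {
		b.errs = append(b.errs, fmt.Errorf("invalid currency code %q", currency))
		return b
	}
	b.tx.Cents, b.tx.Currency = cents, currency
	return b
}

// Build returns the transaction, or every recorded error joined into one.
func (b *TxBuilder) Build() (Tx, error) {
	if err := errors.Join(b.errs...); err != nil {
		return Tx{}, err
	}
	return b.tx, nil
}

func main() {
	_, err := NewTxBuilder().WithDate("2025-13-40").WithAmount(10050, "CH").Build()
	fmt.Println(err)
}
```

Deferring the error to `Build` keeps the fluent chain intact while still surfacing every problem in a single call.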

Backward Compatibility Methods:

// Legacy accessor methods for backward compatibility
func (t *Transaction) GetPayee() string {
    return t.Payee.Name
}

func (t *Transaction) GetPayer() string {
    return t.Payer.Name
}

// Deprecated: Use Amount.Amount.Float64() instead
func (t *Transaction) GetAmountAsFloat() float64 {
    f, _ := t.Amount.Amount.Float64()
    return f
}

Error Handling Architecture¶

Custom Error Types¶

Comprehensive Error Hierarchy:

// Base parsing error with context
type ParseError struct {
    Parser string
    Field  string
    Value  string
    Err    error
}

func (e *ParseError) Error() string {
    return fmt.Sprintf("%s: failed to parse %s='%s': %v", 
        e.Parser, e.Field, e.Value, e.Err)
}

func (e *ParseError) Unwrap() error {
    return e.Err
}

// Format validation failures
type ValidationError struct {
    FilePath string
    Field    string
    Reason   string
}

func (e *ValidationError) Error() string {
    if e.Field != "" {
        return fmt.Sprintf("validation failed for %s field %s: %s", e.FilePath, e.Field, e.Reason)
    }
    return fmt.Sprintf("validation failed for %s: %s", e.FilePath, e.Reason)
}

// Invalid format detection
type InvalidFormatError struct {
    FilePath     string
    ExpectedType string
    Reason       string
}

func (e *InvalidFormatError) Error() string {
    return fmt.Sprintf("invalid %s format in %s: %s", e.ExpectedType, e.FilePath, e.Reason)
}

// Data extraction failures
type DataExtractionError struct {
    FilePath string
    Field    string
    RawData  string
    Reason   string
}

func (e *DataExtractionError) Error() string {
    return fmt.Sprintf("failed to extract %s from %s: %s (raw data: %s)", 
        e.Field, e.FilePath, e.Reason, e.RawData)
}

// Categorization failures
type CategorizationError struct {
    Transaction string
    Strategy    string
    Err         error
}

func (e *CategorizationError) Error() string {
    return fmt.Sprintf("categorization failed for %s using %s: %v",
        e.Transaction, e.Strategy, e.Err)
}

func (e *CategorizationError) Unwrap() error {
    return e.Err
}

Error Handling Patterns¶

Pattern 1: Unrecoverable Errors (Return)

if err := xml.Unmarshal(data, &document); err != nil {
    return nil, &parsererror.ParseError{
        Parser: "CAMT",
        Field:  "document",
        Err:    err,
    }
}

Pattern 2: Recoverable Errors (Log and Continue)

amount, err := decimal.NewFromString(entry.Amt.Value)
if err != nil {
    p.logger.Warn("Failed to parse amount, using zero",
        logging.Field{Key: "value", Value: entry.Amt.Value},
        logging.Field{Key: "error", Value: err})
    amount = decimal.Zero
}
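Because `ParseError` implements `Unwrap`, callers can inspect it with the standard `errors.As` and `errors.Is` helpers even after further wrapping. The sketch below reuses the `ParseError` shape from this section; `parseShares`, `convert`, and `failedField` are hypothetical helpers added for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
)

// ParseError follows the shape shown under Custom Error Types.
type ParseError struct {
	Parser, Field, Value string
	Err                  error
}

func (e *ParseError) Error() string {
	return fmt.Sprintf("%s: failed to parse %s='%s': %v", e.Parser, e.Field, e.Value, e.Err)
}

func (e *ParseError) Unwrap() error { return e.Err }

// parseShares is a hypothetical helper that wraps strconv failures in a ParseError.
func parseShares(raw string) (int, error) {
	n, err := strconv.Atoi(raw)
	if err != nil {
		return 0, &ParseError{Parser: "demo", Field: "shares", Value: raw, Err: err}
	}
	return n, nil
}

// convert adds caller context with %w, keeping the error chain intact.
func convert(raw string) error {
	if _, err := parseShares(raw); err != nil {
		return fmt.Errorf("converting statement: %w", err)
	}
	return nil
}

// failedField reports which field failed, if a ParseError is anywhere in the chain.
func failedField(err error) (string, bool) {
	var pe *ParseError
	if errors.As(err, &pe) {
		return pe.Field, true
	}
	return "", false
}

func main() {
	err := convert("12x")
	if field, ok := failedField(err); ok {
		fmt.Println("failed field:", field)
	}
	// The chain reaches through ParseError down to the strconv sentinel.
	fmt.Println("syntax error:", errors.Is(err, strconv.ErrSyntax))
}
```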

Logging Architecture¶

Framework-Agnostic Design¶

Logger Interface:

type Logger interface {
    Debug(msg string, fields ...Field)
    Info(msg string, fields ...Field)
    Warn(msg string, fields ...Field)
    Error(msg string, fields ...Field)
    WithError(err error) Logger
    WithField(key string, value interface{}) Logger
    WithFields(fields ...Field) Logger
}

Structured Logging:

logger.Info("Processing transaction",
    logging.Field{Key: "file", Value: filename},
    logging.Field{Key: "count", Value: len(transactions)})

Dependency Injection:

- All components receive logger through constructor
- BaseParser provides logger to all parsers
- Mock loggers for testing

Configuration Architecture¶

Hierarchical Configuration¶

Priority Order (highest to lowest):

1. CLI flags (--log-level debug)
2. Environment variables (CAMT_LOG_LEVEL=debug)
3. Config file (~/.camt-csv/config.yaml)
4. Default values
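The precedence rule can be pictured as "first non-empty source wins". The following standard-library sketch is only a model of that rule; the `resolve` function is hypothetical, and the project itself loads configuration through Viper rather than code like this.

```go
package main

import "fmt"

// resolve returns the first non-empty value in priority order:
// CLI flag, environment variable, config file, then the default.
func resolve(flagVal, envVal, fileVal, defaultVal string) string {
	for _, v := range []string{flagVal, envVal, fileVal} {
		if v != "" {
			return v
		}
	}
	return defaultVal
}

func main() {
	// No flag given, CAMT_LOG_LEVEL=debug set, config file says "warn".
	fmt.Println(resolve("", "debug", "warn", "info"))
	// Nothing set anywhere: the default applies.
	fmt.Println(resolve("", "", "", "info"))
}
```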

Configuration Structure:

log:
  level: "info"
  format: "text"
csv:
  delimiter: ","
  include_headers: true
ai:
  enabled: true
  model: "gemini-2.0-flash"

Testing Architecture¶

Dependency Injection for Testing¶

Mock Implementations:

type MockLogger struct {
    Entries []LogEntry
}

type MockAIClient struct {
    CategorizeFunc func(context.Context, models.Transaction) (models.Transaction, error)
}

type MockCategoryStore struct {
    Categories       []models.CategoryConfig
    CreditorMappings map[string]string
    DebtorMappings   map[string]string
}

Test Structure:

func TestCategorizer_Categorize(t *testing.T) {
    // Setup
    mockStore := &MockCategoryStore{...}
    mockLogger := &MockLogger{}

    // Argument order follows NewCategorizer(aiClient, store, logger, autoLearnEnabled)
    cat := NewCategorizer(nil, mockStore, mockLogger, false)

    // Execute & Assert
    result, err := cat.Categorize(context.Background(), transaction)
    assert.NoError(t, err)
    assert.Equal(t, expectedCategory, result.Name)
}

Performance Architecture¶

Optimization Strategies¶

String Operations with Builder Pattern:

// Before: Multiple string operations creating temporary strings
func (c *Categorizer) categorizeByMapping(tx Transaction) (Category, bool) {
    partyNameLower := strings.ToLower(tx.PartyName)
    normalized := strings.ReplaceAll(partyNameLower, " ", "")
    normalized = strings.ReplaceAll(normalized, "-", "")
    // Each operation allocates new strings
}

// After: Single-pass normalization with pre-allocated builder
func (c *Categorizer) categorizeByMapping(tx Transaction) (Category, bool) {
    var builder strings.Builder
    builder.Grow(len(tx.PartyName)) // Avoid reallocations

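    // Lowercase rune by rune (requires the unicode package) so no
    // intermediate lowercased string is allocated
    for _, r := range tx.PartyName {
        if r != ' ' && r != '-' {
            builder.WriteRune(unicode.ToLower(r))
        }
    }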
    normalized := builder.String()
    // 60-80% reduction in string allocations
}

Lazy Initialization with Thread Safety:

type Categorizer struct {
    aiClient     AIClient
    aiClientOnce sync.Once
    aiFactory    func() AIClient
    logger       logging.Logger
}

func (c *Categorizer) getAIClient() AIClient {
    c.aiClientOnce.Do(func() {
        if c.aiClient == nil && c.aiFactory != nil {
            c.aiClient = c.aiFactory()
            c.logger.Debug("AI client initialized lazily")
        }
    })
    return c.aiClient
}

Pre-allocation and Capacity Management:

// Pre-allocate slices with known capacity
transactions := make([]models.Transaction, 0, len(entries))

// Pre-allocate maps with size hints to reduce rehashing
mappings := make(map[string]string, len(items))

// For large datasets, consider batch processing
const batchSize = 1000
if len(entries) > batchSize {
    for i := 0; i < len(entries); i += batchSize {
        end := i + batchSize
        if end > len(entries) {
            end = len(entries)
        }
        batch := entries[i:end]
        processBatch(batch)
    }
}

Performance Benefits:

- Eliminates slice reallocations during growth
- Reduces map rehashing operations
- Controls memory usage for large datasets
- Improves cache locality through better memory layout

Security Architecture¶

Input Validation¶

- All file paths validated for directory traversal
- XML/CSV content validated before processing
- Amount values validated for reasonable ranges
- Date formats validated before parsing
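For the directory-traversal check, one common approach is to join the user-supplied path onto a base directory and reject any result that resolves outside it. The sketch below is an assumption about how such a check could look; `resolveInside` is a hypothetical helper, and it does not resolve symlinks, so it is a first line of defence only.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// resolveInside joins userPath onto baseDir and rejects results that escape baseDir.
func resolveInside(baseDir, userPath string) (string, error) {
	base, err := filepath.Abs(baseDir)
	if err != nil {
		return "", fmt.Errorf("resolving base directory: %w", err)
	}
	// Join also cleans "." and ".." segments.
	target := filepath.Join(base, userPath)
	rel, err := filepath.Rel(base, target)
	if err != nil {
		return "", fmt.Errorf("comparing paths: %w", err)
	}
	// A relative path that starts with ".." points outside the base directory.
	if rel == ".." || strings.HasPrefix(rel, ".."+string(filepath.Separator)) {
		return "", fmt.Errorf("path %q escapes %q", userPath, baseDir)
	}
	return target, nil
}

func main() {
	for _, p := range []string{"2025/january.xml", "../../etc/passwd"} {
		if _, err := resolveInside("statements", p); err != nil {
			fmt.Println("rejected:", err)
			continue
		}
		fmt.Println("accepted:", p)
	}
}
```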

Error Message Sanitization¶

- No sensitive data in error messages
- File paths relativized in logs
- API keys never logged
- Transaction details redacted in non-debug logs

File Permissions¶

- Config files: 0600 (owner read/write only)
- Directories: 0750 (owner full, group read/execute)
- Output files: 0644 (owner read/write, others read)

Migration and Compatibility¶

Backward Compatibility Strategy¶

Deprecated Code Marking:

// Deprecated: Use NewCategorizer with dependency injection instead.
// This function will be removed in v2.0.0.
func GetDefaultCategorizer() *Categorizer {
    // Provide backward compatible implementation
}

Adapter Pattern:

type LegacyTransactionAdapter struct {
    tx Transaction
}

func (a *LegacyTransactionAdapter) GetAmountAsFloat() float64 {
    f, _ := a.tx.Amount.Amount.Float64()
    return f
}

Migration Path¶

- Phase 1: Introduce new interfaces alongside existing code
- Phase 2: Add deprecation warnings to old APIs
- Phase 3: Migrate internal usage to new patterns
- Phase 4: Remove deprecated code in major version bump

Formatter Architecture¶

Output Formatter System¶

Formatter Interface:

type OutputFormatter interface {
    Header() []string
    Format(transactions []models.Transaction) ([][]string, error)
    Delimiter() rune
}

FormatterRegistry Pattern:

type FormatterRegistry struct {
    formatters map[string]OutputFormatter
    mu         sync.RWMutex
}

func (r *FormatterRegistry) Register(name string, formatter OutputFormatter) {
    r.mu.Lock()
    defer r.mu.Unlock()
    r.formatters[name] = formatter
}

func (r *FormatterRegistry) Get(name string) (OutputFormatter, error) {
    r.mu.RLock()
    defer r.mu.RUnlock()
    formatter, ok := r.formatters[name]
    if !ok {
        return nil, fmt.Errorf("formatter not found: %s", name)
    }
    return formatter, nil
}

Built-in Formatters:

- StandardFormatter: 29-column CSV with comma delimiter (default)
- iComptaFormatter: 10-column CSV with semicolon delimiter and dd.MM.yyyy date format

Usage:

registry := container.GetFormatterRegistry()
formatter, err := registry.Get("icompta")
if err != nil {
    return err
}

rows, err := formatter.Format(transactions)
csvFile.SetDelimiter(formatter.Delimiter())

Benefits:

- Cross-parser output formatting
- Easy addition of new export formats
- Consistent formatting across all parsers
- User-selectable output format via --format flag
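To show how a formatter's delimiter and date format feed into CSV output, here is a small sketch built on the standard `encoding/csv` writer. `Tx`, `semicolonFormatter`, `render`, and `sampleTxs` are hypothetical; the three-column layout is invented for brevity and is not the actual iCompta column set. The Go layout string `02.01.2006` corresponds to dd.MM.yyyy.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
	"time"
)

// Tx is a hypothetical, pared-down transaction.
type Tx struct {
	Date        time.Time
	Description string
	Amount      string
}

// OutputFormatter mirrors the interface from this section, using Tx instead of models.Transaction.
type OutputFormatter interface {
	Header() []string
	Format(transactions []Tx) ([][]string, error)
	Delimiter() rune
}

// semicolonFormatter is a hypothetical three-column formatter.
type semicolonFormatter struct{}

func (semicolonFormatter) Header() []string { return []string{"Date", "Description", "Amount"} }
func (semicolonFormatter) Delimiter() rune  { return ';' }

func (semicolonFormatter) Format(txs []Tx) ([][]string, error) {
	rows := make([][]string, 0, len(txs))
	for _, tx := range txs {
		rows = append(rows, []string{tx.Date.Format("02.01.2006"), tx.Description, tx.Amount})
	}
	return rows, nil
}

// render writes the header and rows using the formatter's own delimiter.
func render(f OutputFormatter, txs []Tx) (string, error) {
	rows, err := f.Format(txs)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	w := csv.NewWriter(&sb)
	w.Comma = f.Delimiter()
	if err := w.Write(f.Header()); err != nil {
		return "", err
	}
	if err := w.WriteAll(rows); err != nil { // WriteAll flushes the writer
		return "", err
	}
	return sb.String(), nil
}

// sampleTxs returns fixed demo data.
func sampleTxs() []Tx {
	return []Tx{{Date: time.Date(2025, time.January, 15, 0, 0, 0, 0, time.UTC), Description: "Coffee", Amount: "-4.50"}}
}

func main() {
	out, err := render(semicolonFormatter{}, sampleTxs())
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Print(out)
}
```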

Extension Points¶

Adding New Parsers¶

1. Create package: internal/<format>parser/
2. Embed BaseParser: parser.BaseParser
3. Implement interfaces: parser.Parser (minimum) with context.Context parameter
4. Use dependency injection: Accept logger in constructor
5. Follow error handling patterns: Use custom error types
6. Add comprehensive tests: Mock dependencies

Adding New Categorization Strategies¶

1. Implement CategorizationStrategy interface
2. Add to strategy list in Categorizer constructor
3. Ensure proper priority ordering
4. Add comprehensive tests with mock dependencies
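The steps above can be sketched with a toy strategy. Everything here apart from the `CategorizationStrategy` interface shape is hypothetical: `prefixStrategy`, the simplified `Transaction` and `Category` types, and the `categorize` and `demo` helpers exist only to show how list order decides which strategy wins.

```go
package main

import (
	"context"
	"fmt"
	"strings"
)

// Simplified stand-ins for the real model types.
type Transaction struct{ Description string }
type Category struct{ Name string }

// CategorizationStrategy mirrors the interface from the Categorization Layer section.
type CategorizationStrategy interface {
	Categorize(ctx context.Context, tx Transaction) (Category, bool, error)
	Name() string
}

// prefixStrategy is a hypothetical strategy that matches an upper-case description prefix.
type prefixStrategy struct {
	prefix   string
	category Category
}

func (s prefixStrategy) Name() string { return "prefix:" + s.prefix }

func (s prefixStrategy) Categorize(ctx context.Context, tx Transaction) (Category, bool, error) {
	if err := ctx.Err(); err != nil {
		return Category{}, false, err // respect cancellation
	}
	if strings.HasPrefix(strings.ToUpper(tx.Description), s.prefix) {
		return s.category, true, nil
	}
	return Category{}, false, nil
}

// categorize tries strategies in slice order; the first match wins.
func categorize(ctx context.Context, strategies []CategorizationStrategy, tx Transaction) (Category, string) {
	for _, s := range strategies {
		cat, found, err := s.Categorize(ctx, tx)
		if err != nil {
			continue // a failing strategy does not stop the chain
		}
		if found {
			return cat, s.Name()
		}
	}
	return Category{Name: "Uncategorized"}, ""
}

// demo places the more specific strategy first so it takes priority.
func demo() (Category, string) {
	strategies := []CategorizationStrategy{
		prefixStrategy{prefix: "ATM", category: Category{Name: "Cash"}},
		prefixStrategy{prefix: "A", category: Category{Name: "Other"}},
	}
	return categorize(context.Background(), strategies, Transaction{Description: "atm withdrawal"})
}

func main() {
	cat, via := demo()
	fmt.Println(cat.Name, via)
}
```

Swapping the two entries in the slice would make the broader `"A"` prefix match first, which is why priority ordering deserves its own step.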

Adding New Configuration Options¶

1. Add to config struct in internal/config/
2. Update Viper configuration loading
3. Add environment variable mapping
4. Update CLI flags if needed
5. Document in user guide

This architecture provides a solid foundation for maintainable, testable, and extensible financial data processing while ensuring reliability and performance.