ADR-001: Parser Interface Standardization¶

Status¶

Accepted

Context¶

The CAMT-CSV project supports multiple financial file formats (CAMT.053 XML, PDF statements, Revolut CSV, Selma CSV). Initially, each parser had different method signatures and behaviors, making it difficult to:

Add new parsers consistently
Test parsers uniformly
Maintain code quality across parsers
Provide consistent user experience

Decision¶

We will standardize all parsers using a segregated interface architecture with a common foundation:

Core Interfaces:

type Parser interface {
    Parse(r io.Reader) ([]models.Transaction, error)
}

type Validator interface {
    ValidateFormat(r io.Reader) error
}

type CSVConverter interface {
    ConvertToCSV(transactions []models.Transaction, csvFile string) error
}

type LoggerConfigurable interface {
    SetLogger(logger logging.Logger)
}

type CategorizerConfigurable interface {
    SetCategorizer(categorizer models.TransactionCategorizer)
}

type BatchConverter interface {
    BatchConvert(inputDir, outputDir string) (int, error)
}

type FullParser interface {
    Parser
    Validator
    CSVConverter
    LoggerConfigurable
    CategorizerConfigurable
    BatchConverter
}

BaseParser Foundation: All parsers embed a BaseParser struct that provides common functionality: - Logger management (implements LoggerConfigurable) - Categorizer management (implements CategorizerConfigurable) - CSV writing capability (implements CSVConverter) - Consistent initialization patterns

Implementation Pattern:

type MyParser struct {
    parser.BaseParser
    // parser-specific fields
}

func NewMyParser(logger logging.Logger) *MyParser {
    return &MyParser{
        BaseParser: parser.NewBaseParser(logger),
    }
}

Consequences¶

Positive¶

Consistency: All parsers behave identically from the caller's perspective
Code Reuse: BaseParser eliminates duplication of common functionality (logging, CSV writing)
Testability: Uniform test patterns across all parsers with shared test utilities
Maintainability: Easier to add new parsers following established patterns
Polymorphism: Can treat all parsers uniformly in the CLI layer
Segregated Interfaces: Parsers can implement only the interfaces they need
Documentation: Clear interface contracts to document and understand

Negative¶

Refactoring Cost: Required significant changes to existing parsers
Interface Constraints: Some parser-specific optimizations may be limited
Backward Compatibility: Breaking changes to internal APIs

Mitigation Strategies¶

Implemented changes incrementally, one parser at a time
Maintained comprehensive test coverage during refactoring
Used adapter pattern where necessary to fit existing code into new interface

Implementation Notes¶

All parsers embed BaseParser for common functionality
Segregated interfaces allow parsers to implement only needed capabilities
Error handling uses custom error types (InvalidFormatError, DataExtractionError)
Logging is standardized using abstraction layer with structured logging
CSV output format is unified across all parsers via shared WriteToCSV method
Dependency injection pattern used for testability (e.g., PDF extractor interface)

ADR-002: Hybrid categorization approach
ADR-003: Functional programming adoption

Date¶

2024-12-19

Authors¶

Development Team