Design Principles - CAMT-CSV Project¶
Overview¶
The CAMT-CSV project is built on a foundation of solid software engineering principles that prioritize maintainability, extensibility, and reliability. This document outlines the core design principles that guide the development and evolution of this financial data processing system.
Core Design Principles¶
1. Interface-Driven Design with Segregated Interfaces¶
Principle: All parsers implement segregated interfaces that separate concerns and provide only the capabilities they need.
Implementation:
- Core Interfaces: Parser, Validator, CSVConverter, LoggerConfigurable, CategorizerConfigurable, BatchConverter, FullParser
- BaseParser Foundation: All parsers embed the BaseParser struct for common functionality
- Composition over Inheritance: Parsers compose interfaces rather than inheriting from large base classes
- Single Responsibility per Interface: Each interface has one clear purpose
Example:
type MyParser struct {
    parser.BaseParser // Provides logging and CSV writing
    // parser-specific fields
}
Benefits:
- Easy to add new financial data formats following established patterns
- Eliminates code duplication through BaseParser
- Consistent API across all parsers with shared functionality
- Simplified testing with common test utilities
- Interface segregation principle compliance
2. Single Responsibility Principle¶
Principle: Each component has a single, well-defined responsibility.
Implementation:
- Parsers: Handle format-specific parsing logic
- Models: Define data structures and validation
- Categorizer: Handles transaction categorization logic
- Common: Provides shared utilities and CSV writing
- Store: Manages configuration and category storage
Benefits:
- Clear separation of concerns
- Easier debugging and testing
- Reduced coupling between components
3. Dependency Injection & Inversion of Control¶
Principle: Dependencies are injected rather than hard-coded, allowing for flexibility and testability.
Implementation:
- Container Pattern: A central Container struct manages all application dependencies
- Logger Injection: All parsers receive a logger through the BaseParser constructor
- Interface Dependencies: Components depend on the logging.Logger interface rather than concrete implementations
- PDF Extractor Injection: The PDF parser uses a PDFExtractor interface for testability
- Categorizer Injection: Transaction classification through an injected categorizer with strategy dependencies
- Store Injection: Configuration management through an injected store
- AI Client Injection: Optional AI client for categorization with lazy initialization
- Test-specific Injection: Mock dependencies in test suites
Container Pattern Example:
type Container struct {
    Logger      logging.Logger
    Config      *config.Config
    Store       *store.CategoryStore
    AIClient    categorizer.AIClient
    Categorizer *categorizer.Categorizer
    Parsers     map[parser.ParserType]parser.FullParser
}

func NewContainer(cfg *config.Config) (*Container, error) {
    logger := logging.NewLogrusAdapter(cfg.Log.Level, cfg.Log.Format)
    store := store.NewCategoryStore(cfg.Categories.File, cfg.Categories.CreditorsFile, cfg.Categories.DebtorsFile)

    var aiClient categorizer.AIClient
    if cfg.AI.Enabled {
        aiClient = categorizer.NewGeminiClient(cfg.AI.APIKey, logger, cfg.AI.RequestsPerMinute)
    }

    cat := categorizer.NewCategorizer(aiClient, store, logger, cfg.AI.AutoLearnEnabled)

    return &Container{
        Logger:      logger,
        Config:      cfg,
        Store:       store,
        AIClient:    aiClient,
        Categorizer: cat,
    }, nil
}
Parser Injection Example:
func NewMyParser(logger logging.Logger) *MyParser {
    return &MyParser{
        BaseParser: parser.NewBaseParser(logger),
    }
}
Benefits:
- Complete elimination of global mutable state
- Improved testability with mock dependencies
- Runtime configuration flexibility
- Cleaner separation between components
- Easier unit testing without shared state
- Consistent dependency management through BaseParser
- Centralized dependency lifecycle management
- Explicit dependency relationships
4. Strategy Pattern for Extensibility¶
Principle: Use the Strategy pattern to enable pluggable algorithms and easy extension of functionality.
Implementation:
- Categorization Strategies: Multiple algorithms for transaction categorization
- Strategy Interface: Common interface for all categorization approaches
- Priority-Based Execution: Strategies executed in order of efficiency and accuracy
- Independent Testing: Each strategy can be tested in isolation
Strategy Interface:
type CategorizationStrategy interface {
    Categorize(ctx context.Context, tx Transaction) (Category, bool, error)
    Name() string
}
Strategy Implementations:
- DirectMappingStrategy: Exact name matches from YAML files (fastest)
- KeywordStrategy: Pattern matching from configuration (local processing)
- SemanticStrategy: Vector-based embedding similarity matching to category concepts (local AI)
- AIStrategy: AI-based categorization (optional, controlled by autoLearnEnabled)
- StagingStore: Persists AI suggestions to staging YAML files when auto-learn is off (optional, injected via SetStagingStore)
Orchestration:
func (c *Categorizer) Categorize(ctx context.Context, tx Transaction) (Category, error) {
    for _, strategy := range c.strategies {
        category, found, err := strategy.Categorize(ctx, tx)
        if err != nil {
            c.logger.Warn("Strategy failed",
                logging.Field{Key: "strategy", Value: strategy.Name()})
            continue
        }
        if found {
            return category, nil
        }
    }
    return UncategorizedCategory, nil
}
Benefits:
- Easy addition of new categorization algorithms
- Independent testing and optimization of each strategy
- Clear separation of concerns between strategies
- Flexible priority ordering and configuration
- Improved maintainability through focused implementations
5. Fail-Fast with Graceful Degradation¶
Principle: Detect errors early but provide meaningful fallbacks when possible.
Implementation:
- Comprehensive input validation at parser entry points
- Early return on invalid file formats
- Graceful handling of malformed data with logging
- Default values for missing optional fields
- Custom error types with detailed context
Benefits:
- Better user experience with clear error messages
- System stability under adverse conditions
- Easier debugging with detailed logging
6. Immutable Data Structures¶
Principle: Core data models are designed to be immutable where possible.
Implementation:
- Transaction models with read-only fields after creation
- Decimal types for financial amounts (preventing floating-point errors)
- Configuration objects that don't change after initialization
Benefits:
- Thread safety
- Predictable behavior
- Reduced bugs from unexpected state changes
7. Comprehensive Logging & Observability¶
Principle: All significant operations are logged with appropriate detail levels using a framework-agnostic abstraction.
Implementation:
- Logging Abstraction Layer: The logging.Logger interface decouples the application from specific frameworks
- Dependency Injection: Logger instances injected through constructors via BaseParser
- Structured Logging: Consistent field names using the logging.Field struct for key-value pairs
- Multiple Log Levels: Debug, Info, Warn, Error, Fatal with appropriate usage
- Context-Rich Messages: Metadata attached using the WithField, WithFields, and WithError methods
- Default Implementation: LogrusAdapter wrapping logrus with JSON and text formatters
- Test Support: Mock logger implementations for unit testing
Example:
logger.Info("Processing transaction",
    logging.Field{Key: "file", Value: filename},
    logging.Field{Key: "count", Value: len(transactions)})
Benefits:
- Easy troubleshooting and debugging with structured data
- Production monitoring capabilities
- Audit trail for financial data processing
- Improved testability with mock loggers
- Flexibility to change logging implementations without modifying business logic
- Consistent logging patterns across all parsers through BaseParser
8. Test-Driven Quality Assurance¶
Principle: Comprehensive testing ensures reliability and prevents regressions.
Implementation:
- Unit tests for all parser packages
- Integration tests for end-to-end workflows
- Test data that covers edge cases and error conditions
- Consistent test structure across all packages
Benefits:
- High confidence in code changes
- Documentation through test examples
- Regression prevention
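The "consistent test structure" point usually means table-driven tests covering edge cases alongside the happy path. A sketch (the classify function is a hypothetical stand-in for parser logic, not a project API):

```go
package main

import "fmt"

// classify is a stand-in for the logic under test, mapping CAMT
// credit/debit indicator codes to a label.
func classify(code string) string {
	switch code {
	case "DBIT":
		return "debit"
	case "CRDT":
		return "credit"
	default:
		return "unknown"
	}
}

// A case table keeps happy-path and edge-case inputs side by side, the
// shape each parser package's unit tests follow.
var cases = []struct {
	in, want string
}{
	{"DBIT", "debit"},
	{"CRDT", "credit"},
	{"", "unknown"}, // edge case: missing indicator
}

func main() {
	for _, c := range cases {
		fmt.Printf("%q -> %q (want %q)\n", c.in, classify(c.in), c.want)
	}
}
```

Adding a regression case is then a one-line change to the table rather than a new test function.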
9. Configuration Over Convention¶
Principle: Behavior should be configurable rather than hard-coded.
Implementation:
- YAML-based configuration files for categories and mappings
- Configurable output formatters via the --format flag
- FormatterRegistry for runtime formatter selection
- Environment-specific settings
- Runtime parser selection
- Auto-learn toggle for AI categorization
Formatter System:
type OutputFormatter interface {
    Header() []string
    Format(transactions []models.Transaction) ([][]string, error)
    Delimiter() rune
}
Built-in Formatters:
- StandardFormatter: 29-column CSV with comma delimiter (default)
- iComptaFormatter: 10-column CSV with semicolon delimiter and dd.MM.yyyy date format
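A new output target only has to satisfy the three-method interface. This sketch implements a deliberately tiny three-column formatter (the real StandardFormatter emits 29 columns, and the real interface takes []models.Transaction; the local Transaction type here is a stand-in):

```go
package main

import "fmt"

// Transaction is a simplified stand-in for models.Transaction.
type Transaction struct {
	Date, Payee, Amount string
}

// OutputFormatter mirrors the interface shown above.
type OutputFormatter interface {
	Header() []string
	Format(transactions []Transaction) ([][]string, error)
	Delimiter() rune
}

// miniFormatter: a minimal three-column formatter for illustration.
type miniFormatter struct{}

func (miniFormatter) Header() []string { return []string{"Date", "Payee", "Amount"} }
func (miniFormatter) Delimiter() rune  { return ',' }

func (miniFormatter) Format(txs []Transaction) ([][]string, error) {
	rows := make([][]string, 0, len(txs))
	for _, t := range txs {
		rows = append(rows, []string{t.Date, t.Payee, t.Amount})
	}
	return rows, nil
}

func main() {
	var f OutputFormatter = miniFormatter{}
	rows, _ := f.Format([]Transaction{{"01.02.2024", "ACME", "19.99"}})
	fmt.Println(f.Header(), rows, string(f.Delimiter()))
}
```

Because the CSV writer only sees the interface, delimiter and column layout stay a per-formatter decision rather than a cross-cutting concern.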
Benefits:
- Adaptability to different environments and import targets
- User customization without code changes
- Easy deployment across different setups
- Cross-parser output format consistency
10. Error Handling & Recovery¶
Principle: Errors should be handled gracefully with clear communication to users using standardized error types.
Implementation:
- Custom Error Types: Comprehensive error types in internal/parsererror/:
  - ParseError: General parsing failures with parser, field, and value context
  - ValidationError: Format validation failures with file path and reason
  - CategorizationError: Transaction categorization failures with strategy context
  - InvalidFormatError: Files not matching the expected format, with content snippets
  - DataExtractionError: Field extraction failures with raw data context
- Error Wrapping: Proper error context using fmt.Errorf with the %w verb
- Error Inspection: Use of errors.Is and errors.As for error type checking
- Graceful Degradation: Log warnings for recoverable issues, return errors for unrecoverable ones
- Resource Cleanup: Proper cleanup in error scenarios with defer statements
Example:
if err != nil {
    return nil, &parsererror.ParseError{
        Parser: "CAMT",
        Field:  "amount",
        Value:  rawValue,
        Err:    err,
    }
}
Benefits:
- Better user experience with detailed error context
- System resilience through graceful degradation
- Easier troubleshooting with structured error information
- Consistent error handling patterns across all parsers
11. Performance & Resource Management¶
Principle: Efficient resource usage and performance optimization where needed.
Implementation:
- Streaming file processing for large datasets
- Proper resource cleanup (file handles, memory)
- Efficient data structures (decimal for financial calculations)
- Lazy loading where appropriate
Benefits:
- Scalability for large files
- Reduced memory footprint
- Better system resource utilization
Design Patterns Used¶
Strategy Pattern¶
- Different parser implementations for different file formats
- Pluggable categorization strategies
Adapter Pattern¶
- CAMT parser adapter to bridge different XML parsing approaches
- Common interface adaptation for different data sources
Factory Pattern¶
- Parser creation based on file type detection
- Configuration object creation
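A minimal sketch of factory-style selection by file type. The extension-to-parser mapping here is illustrative only; the real project's detection logic (which may also inspect file content) is not reproduced:

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// detectParserType picks a parser key from the file extension — the
// selection step a parser factory performs before construction.
// The mapping is hypothetical.
func detectParserType(path string) string {
	switch strings.ToLower(filepath.Ext(path)) {
	case ".xml":
		return "camt"
	case ".pdf":
		return "pdf"
	default:
		return "unknown"
	}
}

func main() {
	fmt.Println(detectParserType("statement.XML"))
}
```

The caller would then look the key up in the Container's Parsers map, so adding a format means registering one more entry rather than editing dispatch code.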
Template Method Pattern¶
- Common CSV writing logic with format-specific customizations
- Shared validation patterns with format-specific rules
Registry Pattern¶
- FormatterRegistry: Manages output formatters by name
- Thread-safe formatter registration and retrieval
- Extensible formatter system for different output formats
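The thread-safe registry boils down to a mutex-guarded map. This sketch stores plain strings to stay self-contained, where the real FormatterRegistry stores OutputFormatter implementations; names and signatures are illustrative:

```go
package main

import (
	"fmt"
	"sync"
)

// Registry: thread-safe name-to-formatter lookup. sync.RWMutex allows many
// concurrent readers while serializing registrations.
type Registry struct {
	mu    sync.RWMutex
	items map[string]string // real registry: map[string]OutputFormatter
}

func NewRegistry() *Registry { return &Registry{items: map[string]string{}} }

func (r *Registry) Register(name, f string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.items[name] = f
}

func (r *Registry) Get(name string) (string, bool) {
	r.mu.RLock()
	defer r.mu.RUnlock()
	f, ok := r.items[name]
	return f, ok
}

func main() {
	reg := NewRegistry()
	reg.Register("standard", "StandardFormatter")
	f, ok := reg.Get("standard")
	fmt.Println(f, ok)
}
```

Resolving the --format flag is then a single Get call, and registering a new formatter requires no changes to existing code.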
Anti-Patterns Avoided¶
God Objects¶
- No single class handles multiple responsibilities
- Clear separation between parsing, validation, and output
Tight Coupling¶
- Interfaces used to decouple components
- Dependency injection prevents hard dependencies
Magic Numbers/Strings¶
- Constants defined for configuration values
- Enum-like patterns for status codes and types
Premature Optimization¶
- Focus on correctness first, then performance
- Profiling-driven optimization decisions
Evolution Guidelines¶
Adding New Parsers¶
- Create Parser Package: Create the internal/<format>parser/ directory
- Embed BaseParser: The struct should embed parser.BaseParser for common functionality
- Implement Interfaces: Implement the parser.Parser interface (minimum requirement)
- Constructor Pattern: Use a NewMyParser(logger logging.Logger) constructor accepting a logger
- Error Handling: Use custom error types from internal/parsererror/
- Constants Usage: Use constants from internal/models/constants.go instead of magic strings
- Structured Logging: Use the injected logger with structured fields
- Testing: Include comprehensive tests with mock dependencies
- Documentation: Document format-specific considerations and usage examples
Example Structure:
type MyParser struct {
    parser.BaseParser
    // format-specific fields
}

func NewMyParser(logger logging.Logger) *MyParser {
    return &MyParser{
        BaseParser: parser.NewBaseParser(logger),
    }
}

func (p *MyParser) Parse(ctx context.Context, r io.Reader) ([]models.Transaction, error) {
    p.GetLogger().Info("Starting parse operation")

    // Use constants instead of magic strings
    transaction.CreditDebit = models.TransactionTypeDebit

    // Use custom error types with context
    if err != nil {
        return nil, &parsererror.ParseError{
            Parser: "MyParser",
            Field:  "amount",
            Value:  rawValue,
            Err:    err,
        }
    }

    return transactions, nil
}
Extending Functionality¶
- Consider impact on existing interfaces
- Maintain backward compatibility
- Add configuration options rather than hard-coding behavior
- Update all relevant documentation
Performance Improvements¶
- Profile before optimizing
- Maintain correctness while improving performance
- Consider memory vs. speed trade-offs
- Document performance characteristics
Conclusion¶
These design principles have created a robust, maintainable, and extensible financial data processing system. By adhering to these principles, the codebase remains clean, testable, and adaptable to changing requirements while ensuring the reliability needed for financial data processing.