Architecture Documentation¶
Overview¶
CAMT-CSV follows a clean, layered architecture built on dependency injection principles. The system transforms various financial statement formats into standardized CSV files with intelligent categorization.
High-Level Architecture¶
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ CLI Layer │ │ Configuration │ │ Logging │
│ (cmd/) │ │ (config/) │ │ (logging/) │
└─────────────────┘ └─────────────────┘ └─────────────────┘
│ │ │
▼ ▼ ▼
┌─────────────────────────────────────────────────────────────────┐
│ Dependency Container │
│ (container/) │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌───────────┐ │
│ │ Logger │ │ Config │ │ Store │ │AIClient │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ └───────────┘ │
└─────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ Parsers │ │ Categorizer │ │ Store │
│ (parsers/) │ │ (categorizer/) │ │ (store/) │
│ ┌───────────┐ │ │ ┌───────────┐ │ │ │
│ │BaseParser │ │ │ │Strategies │ │ │ │
│ └───────────┘ │ │ └───────────┘ │ │ │
└─────────────────┘ └─────────────────┘ └─────────────────┘
│ │ │
▼ ▼ ▼
┌─────────────────────────────────────────────────────────────────┐
│ Core Models │
│ (models/) │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌───────────┐ │
│ │Transaction │ │ Builder │ │ Constants │ │ Errors │ │
│ │ Types │ │ Pattern │ │ │ │ │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ └───────────┘ │
└─────────────────────────────────────────────────────────────────┘
Core Principles¶
1. Dependency Injection¶
All components receive their dependencies through constructors, eliminating global state and improving testability.
Container Pattern:
type Container struct {
Logger logging.Logger
Config *config.Config
Store *store.CategoryStore
AIClient categorizer.AIClient
Categorizer *categorizer.Categorizer
Parsers map[parser.ParserType]parser.FullParser
}
// NewContainer creates and wires all dependencies
func NewContainer(cfg *config.Config) (*Container, error) {
    logger := logging.NewLogrusAdapter(cfg.Log.Level, cfg.Log.Format)
    // Named catStore to avoid shadowing the store package below
    catStore := store.NewCategoryStore(cfg.Categories.File, cfg.Categories.CreditorsFile, cfg.Categories.DebtorsFile)
    var aiClient categorizer.AIClient
    if cfg.AI.Enabled {
        aiClient = categorizer.NewGeminiClient(cfg.AI.APIKey, logger, cfg.AI.RequestsPerMinute)
    }
    cat := categorizer.NewCategorizer(aiClient, catStore, logger, cfg.Categorization.AutoLearn)
    // Wire staging store when AI is on but auto-learn is off
    if cfg.AI.Enabled && !cfg.Categorization.AutoLearn && cfg.Staging.Enabled {
        stagingStore := store.NewStagingStore(cfg.Staging.CreditorsFile, cfg.Staging.DebtorsFile)
        cat.SetStagingStore(stagingStore)
    }
    parsers := make(map[parser.ParserType]parser.FullParser)
    parsers[parser.CAMT] = camtparser.NewParser(logger)
    parsers[parser.PDF] = pdfparser.NewParser(logger)
    // ... other parsers
    return &Container{
        Logger:      logger,
        Config:      cfg,
        Store:       catStore,
        AIClient:    aiClient,
        Categorizer: cat,
        Parsers:     parsers,
    }, nil
}
Benefits:
- Explicit dependencies with no global state
- Easy testing with mock dependencies
- Runtime configuration flexibility
- Centralized dependency management
- Proper resource lifecycle management
2. Interface Segregation¶
Parsers implement segregated interfaces based on their capabilities:
type Parser interface {
Parse(ctx context.Context, r io.Reader) ([]models.Transaction, error)
}
type Validator interface {
ValidateFormat(filePath string) (bool, error)
}
type CSVConverter interface {
ConvertToCSV(ctx context.Context, inputFile, outputFile string) error
}
type LoggerConfigurable interface {
SetLogger(logger logging.Logger)
}
type CategorizerConfigurable interface {
SetCategorizer(categorizer models.TransactionCategorizer)
}
type BatchConverter interface {
BatchConvert(ctx context.Context, inputDir, outputDir string) (int, error)
}
type FullParser interface {
Parser
Validator
CSVConverter
LoggerConfigurable
CategorizerConfigurable
BatchConverter
}
Benefits:
- Clients depend only on the interfaces they need
- Easy to implement new parsers
- Clear separation of concerns
- Flexible composition
3. BaseParser Foundation¶
All parsers embed BaseParser to eliminate code duplication:
type BaseParser struct {
logger logging.Logger
}
func (b *BaseParser) SetLogger(logger logging.Logger) {
b.logger = logger
}
func (b *BaseParser) WriteToCSV(transactions []models.Transaction, csvFile string) error {
return common.WriteTransactionsToCSV(transactions, csvFile)
}
Benefits:
- Consistent behavior across parsers
- Shared functionality (logging, CSV writing)
- Reduced code duplication
- Easier maintenance
Component Architecture¶
Parser Layer¶
Structure:
internal/
├── parser/
│ ├── parser.go # Interface definitions
│ ├── base.go # BaseParser implementation
│ └── constitution.go # Constitution loading
├── camtparser/ # CAMT.053 XML parser
├── pdfparser/ # PDF statement parser
├── revolutparser/ # Revolut CSV parser
├── revolutinvestmentparser/ # Revolut investment parser
├── selmaparser/ # Selma investment parser
└── debitparser/ # Generic debit CSV parser
Parser Implementation Pattern:
type MyParser struct {
parser.BaseParser
// parser-specific fields
}
func NewMyParser(logger logging.Logger) *MyParser {
return &MyParser{
BaseParser: parser.NewBaseParser(logger),
}
}
func (p *MyParser) Parse(ctx context.Context, r io.Reader) ([]models.Transaction, error) {
p.GetLogger().Info("Starting parse operation")
// implementation
}
Categorization Layer¶
Strategy Pattern Implementation:
type CategorizationStrategy interface {
Categorize(ctx context.Context, tx Transaction) (Category, bool, error)
Name() string
}
type Categorizer struct {
    strategies         []CategorizationStrategy
    store              CategoryStoreInterface
    logger             logging.Logger
    categories         []models.CategoryConfig
    isAutoLearnEnabled bool
    mu                 sync.RWMutex
}
func NewCategorizer(aiClient AIClient, store CategoryStoreInterface, logger logging.Logger, autoLearnEnabled bool) *Categorizer {
c := &Categorizer{
store: store,
logger: logger,
isAutoLearnEnabled: autoLearnEnabled,
}
// Initialize strategies in priority order
c.strategies = []CategorizationStrategy{
NewDirectMappingStrategy(store, logger),
NewKeywordStrategy(store, logger),
NewSemanticStrategy(aiClient, logger, c.categories),
NewAIStrategy(aiClient, logger),
}
return c
}
func (c *Categorizer) Categorize(ctx context.Context, tx Transaction) (Category, error) {
for _, strategy := range c.strategies {
category, found, err := strategy.Categorize(ctx, tx)
if err != nil {
c.logger.Warn("Strategy failed",
logging.Field{Key: "strategy", Value: strategy.Name()},
logging.Field{Key: "error", Value: err})
continue
}
if found {
c.logger.Debug("Transaction categorized",
logging.Field{Key: "strategy", Value: strategy.Name()},
logging.Field{Key: "category", Value: category.Name})
return category, nil
}
}
return UncategorizedCategory, nil
}
Four-Tier Strategy Approach:
1. DirectMappingStrategy: Exact name matches from creditors.yaml/debtors.yaml (fastest)
2. KeywordStrategy: Pattern matching from categories.yaml (local processing)
3. SemanticStrategy: Vector-based embedding similarity matching transactions to category concepts (local AI)
4. AIStrategy: Gemini API fallback with rate limiting (optional, enabled via the ai.enabled setting)
AI Result Persistence:
- Auto-learn ON: AI results saved directly to creditors.yaml/debtors.yaml (with backups)
- Auto-learn OFF (default): AI results saved to staging files (staging_creditors.yaml/staging_debtors.yaml) for manual review via StagingStore
- Staging is optional and controlled by staging.enabled config
Data Layer¶
Transaction Model Decomposition:
// Money represents a monetary value with currency
type Money struct {
Amount decimal.Decimal
Currency string
}
// Party represents a transaction party (payer or payee)
type Party struct {
Name string
IBAN string
}
// TransactionCore contains essential transaction data
type TransactionCore struct {
ID string
Date time.Time
ValueDate time.Time
Amount Money
Description string
Status string
Reference string
}
// TransactionWithParties adds party information
type TransactionWithParties struct {
TransactionCore
Payer Party
Payee Party
Direction TransactionDirection // DEBIT or CREDIT
}
// CategorizedTransaction adds categorization data
type CategorizedTransaction struct {
TransactionWithParties
Category string
Type string
Fund string
}
// Transaction maintains backward compatibility
type Transaction struct {
CategorizedTransaction
// Additional fields for specific formats
BookkeepingNumber string
RemittanceInfo string
PartyIBAN string
Investment string
NumberOfShares int
Fees Money
EntryReference string
AccountServicer string
BankTxCode string
OriginalAmount Money
ExchangeRate decimal.Decimal
// Tax-related fields
AmountExclTax Money
AmountTax Money
TaxRate decimal.Decimal
}
Builder Pattern with Validation:
tx, err := NewTransactionBuilder().
WithDate("2025-01-15").
WithAmount(decimal.NewFromFloat(100.50), "CHF").
WithPayer("John Doe", "CH1234567890").
WithPayee("Acme Corp", "CH0987654321").
AsDebit().
Build()
if err != nil {
return fmt.Errorf("transaction construction failed: %w", err)
}
Backward Compatibility Methods:
// Legacy accessor methods for backward compatibility
func (t *Transaction) GetPayee() string {
return t.Payee.Name
}
func (t *Transaction) GetPayer() string {
return t.Payer.Name
}
// Deprecated: Use Amount.Amount.Float64() instead
func (t *Transaction) GetAmountAsFloat() float64 {
f, _ := t.Amount.Amount.Float64()
return f
}
Error Handling Architecture¶
Custom Error Types¶
Comprehensive Error Hierarchy:
// Base parsing error with context
type ParseError struct {
Parser string
Field string
Value string
Err error
}
func (e *ParseError) Error() string {
return fmt.Sprintf("%s: failed to parse %s='%s': %v",
e.Parser, e.Field, e.Value, e.Err)
}
func (e *ParseError) Unwrap() error {
return e.Err
}
// Format validation failures
type ValidationError struct {
FilePath string
Field string
Reason string
}
func (e *ValidationError) Error() string {
if e.Field != "" {
return fmt.Sprintf("validation failed for %s field %s: %s", e.FilePath, e.Field, e.Reason)
}
return fmt.Sprintf("validation failed for %s: %s", e.FilePath, e.Reason)
}
// Invalid format detection
type InvalidFormatError struct {
FilePath string
ExpectedType string
Reason string
}
func (e *InvalidFormatError) Error() string {
return fmt.Sprintf("invalid %s format in %s: %s", e.ExpectedType, e.FilePath, e.Reason)
}
// Data extraction failures
type DataExtractionError struct {
FilePath string
Field string
RawData string
Reason string
}
func (e *DataExtractionError) Error() string {
return fmt.Sprintf("failed to extract %s from %s: %s (raw data: %s)",
e.Field, e.FilePath, e.Reason, e.RawData)
}
// Categorization failures
type CategorizationError struct {
Transaction string
Strategy string
Err error
}
func (e *CategorizationError) Error() string {
return fmt.Sprintf("categorization failed for %s using %s: %v",
e.Transaction, e.Strategy, e.Err)
}
func (e *CategorizationError) Unwrap() error {
return e.Err
}
Error Handling Patterns¶
Pattern 1: Unrecoverable Errors (Return)
if err := xml.Unmarshal(data, &document); err != nil {
return nil, &parsererror.ParseError{
Parser: "CAMT",
Field: "document",
Err: err,
}
}
Pattern 2: Recoverable Errors (Log and Continue)
amount, err := decimal.NewFromString(entry.Amt.Value)
if err != nil {
p.logger.Warn("Failed to parse amount, using zero",
logging.Field{Key: "value", Value: entry.Amt.Value},
logging.Field{Key: "error", Value: err})
amount = decimal.Zero
}
Logging Architecture¶
Framework-Agnostic Design¶
Logger Interface:
type Logger interface {
Debug(msg string, fields ...Field)
Info(msg string, fields ...Field)
Warn(msg string, fields ...Field)
Error(msg string, fields ...Field)
WithError(err error) Logger
WithField(key string, value interface{}) Logger
WithFields(fields ...Field) Logger
}
Structured Logging:
logger.Info("Processing transaction",
logging.Field{Key: "file", Value: filename},
logging.Field{Key: "count", Value: len(transactions)})
Dependency Injection:
- All components receive the logger through their constructor
- BaseParser provides the logger to all parsers
- Mock loggers for testing
Configuration Architecture¶
Hierarchical Configuration¶
Priority Order (highest to lowest):
1. CLI flags (--log-level debug)
2. Environment variables (CAMT_LOG_LEVEL=debug)
3. Config file (~/.camt-csv/config.yaml)
4. Default values
Configuration Structure:
log:
level: "info"
format: "text"
csv:
delimiter: ","
include_headers: true
ai:
enabled: true
model: "gemini-2.0-flash"
Testing Architecture¶
Dependency Injection for Testing¶
Mock Implementations:
type MockLogger struct {
Entries []LogEntry
}
type MockAIClient struct {
CategorizeFunc func(context.Context, models.Transaction) (models.Transaction, error)
}
type MockCategoryStore struct {
Categories []models.CategoryConfig
CreditorMappings map[string]string
DebtorMappings map[string]string
}
Test Structure:
func TestCategorizer_Categorize(t *testing.T) {
// Setup
mockStore := &MockCategoryStore{...}
mockLogger := &MockLogger{}
cat := NewCategorizer(nil, mockStore, mockLogger, false) // no AI client needed
// Execute & Assert
result, err := cat.Categorize(context.Background(), transaction)
assert.NoError(t, err)
assert.Equal(t, expectedCategory, result.Name)
}
Performance Architecture¶
Optimization Strategies¶
String Operations with Builder Pattern:
// Before: Multiple string operations creating temporary strings
func (c *Categorizer) categorizeByMapping(tx Transaction) (Category, bool) {
partyNameLower := strings.ToLower(tx.PartyName)
normalized := strings.ReplaceAll(partyNameLower, " ", "")
normalized = strings.ReplaceAll(normalized, "-", "")
// Each operation allocates new strings
}
// After: Single-pass normalization with pre-allocated builder
func (c *Categorizer) categorizeByMapping(tx Transaction) (Category, bool) {
var builder strings.Builder
builder.Grow(len(tx.PartyName)) // Avoid reallocations
for _, r := range strings.ToLower(tx.PartyName) {
if r != ' ' && r != '-' {
builder.WriteRune(r)
}
}
normalized := builder.String()
// 60-80% reduction in string allocations
}
Lazy Initialization with Thread Safety:
type Categorizer struct {
aiClient AIClient
aiClientOnce sync.Once
aiFactory func() AIClient
logger logging.Logger
}
func (c *Categorizer) getAIClient() AIClient {
c.aiClientOnce.Do(func() {
if c.aiClient == nil && c.aiFactory != nil {
c.aiClient = c.aiFactory()
c.logger.Debug("AI client initialized lazily")
}
})
return c.aiClient
}
Pre-allocation and Capacity Management:
// Pre-allocate slices with known capacity
transactions := make([]models.Transaction, 0, len(entries))
// Pre-allocate maps with size hints to reduce rehashing
mappings := make(map[string]string, len(items))
// For large datasets, consider batch processing
const batchSize = 1000
if len(entries) > batchSize {
for i := 0; i < len(entries); i += batchSize {
end := i + batchSize
if end > len(entries) {
end = len(entries)
}
batch := entries[i:end]
processBatch(batch)
}
}
Performance Benefits:
- Eliminates slice reallocations during growth
- Reduces map rehashing operations
- Controls memory usage for large datasets
- Improves cache locality through better memory layout
Security Architecture¶
Input Validation¶
- All file paths validated for directory traversal
- XML/CSV content validated before processing
- Amount values validated for reasonable ranges
- Date formats validated before parsing
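A directory-traversal check along these lines could look like the following sketch; `isWithinBase` is an illustrative helper, not the project's actual validation function:

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// isWithinBase reports whether path, once joined with base and cleaned,
// still resolves inside base. Cleaning first is the key step: it
// collapses any ".." segments before the containment check.
func isWithinBase(base, path string) bool {
	abs := filepath.Clean(filepath.Join(base, path))
	rel, err := filepath.Rel(base, abs)
	if err != nil {
		return false
	}
	// Any result that starts by escaping the base is rejected.
	return rel != ".." && !strings.HasPrefix(rel, ".."+string(filepath.Separator))
}

func main() {
	fmt.Println(isWithinBase("/data", "statements/jan.xml")) // true
	fmt.Println(isWithinBase("/data", "../../etc/passwd"))   // false
}
```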
Error Message Sanitization¶
- No sensitive data in error messages
- File paths relativized in logs
- API keys never logged
- Transaction details redacted in non-debug logs
File Permissions¶
- Config files: 0600 (owner read/write only)
- Directories: 0750 (owner full, group read/execute)
- Output files: 0644 (owner read/write, others read)
Migration and Compatibility¶
Backward Compatibility Strategy¶
Deprecated Code Marking:
// Deprecated: Use NewCategorizer with dependency injection instead.
// This function will be removed in v2.0.0.
func GetDefaultCategorizer() *Categorizer {
// Provide backward compatible implementation
}
Adapter Pattern:
type LegacyTransactionAdapter struct {
tx Transaction
}
func (a *LegacyTransactionAdapter) GetAmountAsFloat() float64 {
f, _ := a.tx.Amount.Amount.Float64()
return f
}
Migration Path¶
- Phase 1: Introduce new interfaces alongside existing code
- Phase 2: Add deprecation warnings to old APIs
- Phase 3: Migrate internal usage to new patterns
- Phase 4: Remove deprecated code in major version bump
Formatter Architecture¶
Output Formatter System¶
Formatter Interface:
type OutputFormatter interface {
Header() []string
Format(transactions []models.Transaction) ([][]string, error)
Delimiter() rune
}
FormatterRegistry Pattern:
type FormatterRegistry struct {
formatters map[string]OutputFormatter
mu sync.RWMutex
}
func (r *FormatterRegistry) Register(name string, formatter OutputFormatter) {
r.mu.Lock()
defer r.mu.Unlock()
r.formatters[name] = formatter
}
func (r *FormatterRegistry) Get(name string) (OutputFormatter, error) {
r.mu.RLock()
defer r.mu.RUnlock()
formatter, ok := r.formatters[name]
if !ok {
return nil, fmt.Errorf("formatter not found: %s", name)
}
return formatter, nil
}
Built-in Formatters:
- StandardFormatter: 29-column CSV with comma delimiter (default)
- iComptaFormatter: 10-column CSV with semicolon delimiter and dd.MM.yyyy date format
Usage:
registry := container.GetFormatterRegistry()
formatter, err := registry.Get("icompta")
if err != nil {
return err
}
rows, err := formatter.Format(transactions)
if err != nil {
    return err
}
csvFile.SetDelimiter(formatter.Delimiter())
Benefits:
- Cross-parser output formatting
- Easy addition of new export formats
- Consistent formatting across all parsers
- User-selectable output format via --format flag
Extension Points¶
Adding New Parsers¶
- Create package: internal/<format>parser/
- Embed BaseParser: parser.BaseParser
- Implement interfaces: parser.Parser (minimum) with a context.Context parameter
- Use dependency injection: accept logger in constructor
- Follow error handling patterns: Use custom error types
- Add comprehensive tests: Mock dependencies
Adding New Categorization Strategies¶
- Implement the CategorizationStrategy interface
- Add to the strategy list in the Categorizer constructor
- Ensure proper priority ordering
- Add comprehensive tests with mock dependencies
Adding New Configuration Options¶
- Add to the config struct in internal/config/
- Update Viper configuration loading
- Add environment variable mapping
- Update CLI flags if needed
- Document in user guide
This architecture provides a solid foundation for maintainable, testable, and extensible financial data processing while ensuring reliability and performance.