ADR-031: Server-Side File Validation with Magic Bytes

Status: Accepted

Date: 2026-02-19

Context

Users upload files that the server will parse. Accepting arbitrary file content without validation is a security risk and a source of confusing parse errors.

Decision

Validate uploaded files server-side with two checks before processing: an extension check and a content check. For XLSX, the content check is a magic byte test: the file must begin with the ZIP header (PK\x03\x04). CSV has no magic bytes, so its content check is that the file decodes as UTF-8. Files failing either check are rejected before any temp file is written.
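The two checks can be sketched with the standard library alone. This is a minimal illustration, not the project's actual code: the names `validate_upload`, `UploadRejected`, and `ZIP_MAGIC`, the rejection messages, and the assumption that the whole upload is available in memory as `bytes` are all hypothetical.

```python
from pathlib import PurePath

# Signature of the local file header that opens a non-empty ZIP archive.
ZIP_MAGIC = b"PK\x03\x04"


class UploadRejected(ValueError):
    """Hypothetical exception carrying a user-facing rejection message."""


def validate_upload(filename: str, data: bytes) -> str:
    """Return "xlsx" or "csv" if both checks pass; raise UploadRejected otherwise."""
    ext = PurePath(filename).suffix.lower()

    # Check 1: extension allow-list.
    if ext not in (".xlsx", ".csv"):
        raise UploadRejected(
            f"Unsupported file type {ext!r}; only .xlsx and .csv are accepted."
        )

    # Check 2: the content must match what the extension claims.
    if ext == ".xlsx":
        if not data.startswith(ZIP_MAGIC):
            raise UploadRejected(
                "File has an .xlsx extension but does not start with a ZIP header."
            )
        return "xlsx"

    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        raise UploadRejected(
            "File has a .csv extension but is not valid UTF-8 text."
        ) from None
    return "csv"
```

A caller would run this on the received bytes and hand the file to the parser only if no exception is raised. Decoding the full payload is the simplest way to test UTF-8 validity; for very large uploads an incremental decoder may be preferable.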

Rationale

  • Extension alone can be spoofed; magic bytes verify the actual file format
  • XLSX is a ZIP archive; the PK header check is a reliable minimum-cost assertion (it confirms a ZIP container, not that the archive is a valid workbook)
  • UTF-8 decodability for CSV catches binary uploads masquerading as text
  • Rejection before temp file write limits attack surface
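The ordering in the last point, validate first and touch disk only on success, could look roughly like the sketch below. The function name `store_if_valid`, its parameters, and the convention that the validator raises on rejection are assumptions made for illustration.

```python
import tempfile
from pathlib import PurePath
from typing import Callable


def store_if_valid(
    filename: str,
    data: bytes,
    validate: Callable[[str, bytes], object],
    tmp_dir: str,
) -> str:
    """Validate the in-memory bytes, then write a temp file only if that succeeds.

    `validate` is assumed to raise an exception when the upload is rejected.
    Returns the path of the temp file that was written.
    """
    # Validation runs on the bytes already in memory; if it raises,
    # execution never reaches the write below and nothing lands on disk.
    validate(filename, data)

    suffix = PurePath(filename).suffix.lower()
    with tempfile.NamedTemporaryFile(dir=tmp_dir, suffix=suffix, delete=False) as tmp:
        tmp.write(data)
        return tmp.name
```

Because the write is the last step, a rejected upload leaves no file behind that would need cleanup.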

Alternatives Considered

  • python-magic library: requires the libmagic system dependency, which complicates the Docker image; stdlib checks are sufficient for the two known formats

Consequences

  • Files with correct extension but wrong format produce a clear rejection message rather than a confusing parse error
  • Only XLSX and CSV formats are accepted; all other uploads are rejected regardless of content