
3. Epic News Architectural Patterns

This document describes key architectural patterns and solutions implemented in the Epic News system, with a strong focus on ensuring reliable, maintainable, and high-quality outputs.

1. HTML Rendering Architecture

The system employs a robust architecture for generating HTML reports, which has evolved to address several challenges. The core principle is to separate content generation from presentation, using deterministic Python-based rendering wherever possible.

1.1. Deterministic Python-Based Rendering

For crews with predictable and structured output (e.g., SAINT, POEM, FINDAILY), the system bypasses LLM-based HTML generation in favor of direct Python factory functions. This approach is faster, more reliable, and easier to maintain.

Pattern:

  1. Execute Crew: The CrewAI flow runs as usual to generate the core content. The .kickoff() method returns a CrewOutput object.
  2. Parse to Pydantic Model: The raw output from the crew is parsed into a structured Pydantic model (e.g., SaintData, FinancialReport).
  3. Render via Python Factory: The Pydantic model is passed to a dedicated Python factory function (e.g., saint_to_html). This function uses a TemplateManager and a universal HTML template to produce the final, consistently styled report.
# Example of the deterministic rendering flow
import json

# 1. Execute the crew; .kickoff() returns a CrewOutput object
report_content = SaintDailyCrew().crew().kickoff(inputs=inputs)

# 2. Parse the raw output into a structured Pydantic model
saint_model = SaintData.model_validate(json.loads(report_content.raw))

# 3. Render the final HTML using a dedicated factory
saint_to_html(saint_model, html_file="output/saint_daily/report.html")

1.2. Data Routing: From Factory to Renderer

A common issue arises when the data structure produced by a crew's "HTML factory" does not match what the "HTML renderer" expects.

Best Practice:

  • Ensure the data structure passed from the content-generating part of the crew to the renderer is consistent.
  • Create a clear contract (e.g., via Pydantic models) between the data source and the renderer.
  • Map fields explicitly to prevent mismatches (e.g., linkurl, publisheddate).
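Such a contract can be sketched with Pydantic field aliases, which map a crew's raw output keys onto the names the renderer expects (the model and field names below are hypothetical illustrations, not the project's actual schema):

```python
from pydantic import BaseModel, Field

# Hypothetical contract between the crew's factory output and the renderer.
# Aliases map the crew's raw keys (linkurl, publisheddate) onto the
# renderer's expected snake_case names, making the mapping explicit.
class ArticleItem(BaseModel):
    title: str
    link_url: str = Field(alias="linkurl")
    published_date: str = Field(alias="publisheddate")

class RendererPayload(BaseModel):
    items: list[ArticleItem]

raw = {
    "items": [
        {
            "title": "Q3 earnings recap",
            "linkurl": "https://example.com/q3",
            "publisheddate": "2024-05-01",
        }
    ]
}

payload = RendererPayload.model_validate(raw)
# The renderer now reads payload.items[0].link_url regardless of
# what key the crew originally emitted.
```

A validation error at this boundary fails fast with a clear message, instead of surfacing later as a blank or malformed report.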

1.3. HTML Rendering Best Practices

BeautifulSoup class Attribute Handling

Issue: Passing BeautifulSoup's class_ keyword to new_tag() can emit the literal attribute class_ in the output (<div class_="...">), which CSS class selectors silently ignore.

Solution: Always use the attrs dictionary or dictionary unpacking to set class attributes.

# ✅ CORRECT - Using attrs dictionary
tag = soup.new_tag("div")
tag.attrs["class"] = ["container", "my-class"]

# ✅ ALSO CORRECT - Using dictionary unpacking
tag = soup.new_tag("div", **{"class": "container my-class"})

# ❌ PROBLEMATIC - Avoid this
tag = soup.new_tag("div", class_="container")

CSS Theme Compatibility

Issue: Hard-coded colors lead to poor readability in different UI themes (light/dark).

Solution: Use CSS variables with fallbacks for all color properties.

/* ✅ CORRECT - Using CSS variables with fallbacks */
.element {
    color: var(--text-color, #343a40);
    background: var(--highlight-bg, #f8f9fa);
    border-color: var(--border-color, #dee2e6);
}

Markdown Link Parsing

Issue: An incorrect or overly greedy regex can fail to parse Markdown links of the form [text](url).

Solution: Use re.search() with a non-greedy pattern.

# ✅ CORRECT - Finds a link anywhere in the string
import re
match = re.search(r"\[(.*?)\]\((.*?)\)", text_with_link)
if match:
    link_text = match.group(1)
    link_url = match.group(2)

Empty State Handling

Issue: Renderers may fail or produce blank pages when data is missing.

Solution: Always check for empty data and render a user-friendly message.

if not data.get("items"):
    # Appending a raw string to a Tag escapes it as text; parse the fragment first
    empty = BeautifulSoup(
        '<div class="empty-state"><p>No data available for this report.</p></div>',
        "html.parser",
    )
    container.append(empty.div)

2. Information Retrieval Strategy

The project prioritizes data freshness over a static knowledge base. Instead of a traditional RAG that can become stale, agents use a suite of real-time tools to fetch information on demand.

2.1. Core Principle: Real-Time Retrieval

Financial markets are highly dynamic. To provide accurate and timely analysis, agents retrieve live data from the web for every task.

2.2. Key Retrieval Tools

  • HybridSearchTool: Cascading search with Perplexity (primary) → Brave → Serper fallback chain for reliable web searches.
  • ScraperFactory-selected scraper (get_scraper()): Centralized website scraping; defaults to ScrapeNinjaTool. Override via WEB_SCRAPER_PROVIDER (scrapeninja, firecrawl). Direct Firecrawl usage is deprecated in crews.
  • YahooFinanceNewsTool: For fetching the latest financial news for a specific ticker, providing timely market-moving information.
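The cascading-fallback idea behind HybridSearchTool can be sketched generically. The provider functions below are hypothetical stand-ins for the Perplexity, Brave, and Serper clients, not the actual tool's API:

```python
from typing import Callable

def cascading_search(
    query: str, providers: list[tuple[str, Callable[[str], str]]]
) -> str:
    """Try each search provider in order; fall through to the next on failure."""
    errors: list[str] = []
    for name, search in providers:
        try:
            return search(query)
        except Exception as exc:  # a real tool would catch narrower error types
            errors.append(f"{name}: {exc}")
    raise RuntimeError("All providers failed: " + "; ".join(errors))

# Hypothetical provider functions standing in for real search clients
def perplexity(q: str) -> str:
    raise TimeoutError("rate limited")

def brave(q: str) -> str:
    return f"brave results for {q!r}"

result = cascading_search(
    "NVDA earnings", [("perplexity", perplexity), ("brave", brave)]
)
```

The key property is that a single provider outage degrades gracefully instead of failing the whole crew run.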

2.3. Why Not a Traditional RAG?

  • Data Staleness: A vector database would require constant, resource-intensive updates.
  • Scope Limitation: A pre-populated database is limited, whereas live tools can access the entire public web.

The SaveToRagTool is used not as a permanent knowledge base, but as a short-term memory or "scratchpad" for agents to share information within a single crew execution.
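Conceptually, this scratchpad behaves like a per-run, in-memory store that is discarded when the crew finishes. The class below is a hypothetical illustration of the pattern, not SaveToRagTool's actual interface:

```python
class CrewScratchpad:
    """Short-term shared memory for agents within a single crew execution."""

    def __init__(self) -> None:
        self._notes: list[str] = []

    def save(self, note: str) -> None:
        """Record a finding so other agents in the same run can see it."""
        self._notes.append(note)

    def recall(self, keyword: str) -> list[str]:
        """Return all notes mentioning the keyword (case-insensitive)."""
        return [n for n in self._notes if keyword.lower() in n.lower()]

# One agent writes, another reads, within the same run
pad = CrewScratchpad()
pad.save("Researcher: NVDA beat earnings estimates by 8%")
hits = pad.recall("nvda")
```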

3. Case Study: Refactoring the Sales Prospecting Report

The evolution of the SalesProspectingReport provides a clear example of applying architectural principles to solve real-world data challenges.

3.1. The Initial Problem

The first version of the SalesProspectingReport relied on a highly complex, nested Pydantic model called StructuredDataReport. This model was designed to capture a wide array of metrics, KPIs, data series, and tables.

Challenge: The data generated by the LLM agents did not align with this rigid and complex structure. The output was a simple, flat JSON with fields like company_overview and key_contacts. As a result, the Pydantic validation failed, and the rendering process broke.

3.2. The Solution: Data-Centric Refactoring

Instead of forcing the agents to conform to an overly complex model, the architecture was adapted to the data.

  1. Model Simplification: The SalesProspectingReport Pydantic model was completely redesigned to match the actual data being produced. The dependency on StructuredDataReport was removed, and a new, simpler structure was implemented:

    • company_overview: str
    • key_contacts: List[KeyContact] (with a new KeyContact sub-model)
    • approach_strategy: str
    • remaining_information: str
  2. Renderer and Factory Update: The SalesProspectingRenderer and sales_prospecting_html_factory were rewritten to work with the new, simpler model. This involved:

    • Removing the logic for rendering metrics and KPIs.
    • Adding new sections for company_overview, key_contacts, and remaining_information.
    • Updating the CSS to create a modern, professional layout with cards for key contacts.
  3. Data File Correction: The debug/repair_attempt...json file was updated to conform to the new, simpler Pydantic model, ensuring that tests and local development would work correctly.

3.3. Architectural Lessons

  • Model the Data You Have: Design Pydantic models that reflect the actual data being generated, not an idealized version.
  • Simplicity Over Complexity: A simpler, flatter data structure is often more robust and easier to work with than a deeply nested one.
  • Decouple Rendering from Data Structure: While the renderer needs to understand the data, the refactoring was made easier because the rendering logic was contained within the SalesProspectingRenderer and not scattered across the application.