
# ComparAIson — Product Specification

## 1. Product Overview

**ComparAIson** is a self-hosted web application that enables users to compare two or more items using AI-powered deep research. The system performs multi-source research, generates structured comparison data, and presents results through interactive visualizations. Completed comparisons are saved as posts on user profiles, creating a browsable library of research.

## 2. Problem Statement

Comparing products, technologies, or services requires gathering data from multiple sources, synthesizing findings, and presenting them clearly. Done by hand, this is time-consuming and often produces inconsistent results. ComparAIson automates the process with LLM-powered research that produces structured, visual, and comparable outputs.

## 3. Target Users

- **Developers** comparing frameworks, tools, and cloud services
- **Consumers** comparing products before purchase
- **Researchers** comparing methodologies, papers, or approaches
- **Teams** evaluating options for technical decisions

## 4. Core Features

### 4.1 AI Research Engine
- Multi-item comparison (2-10 items)
- Multi-dimensional scoring (5-8 dimensions per comparison)
- Web search integration via Tavily API
- LLM synthesis via OpenAI GPT-4o-mini or Perplexity Sonar
- Automatic provider fallback chain
- Structured JSON output with validation
- Server-Sent Events for real-time progress

### 4.2 Interactive Visualizations
- **Radar/Spider Chart** — Multi-dimensional overlay showing all items
- **Grouped Bar Chart** — Side-by-side metric comparison
- **Comparison Table** — Feature matrix with color-coded cells
- **Score Cards** — Animated progress bars with overall + per-dimension scores
- **Pros/Cons Cards** — Expandable per-item breakdown

### 4.3 User System
- Email + password authentication (Better Auth)
- Session management (7-day expiry)
- Protected routes for compare/profile actions
- Public profile pages with comparison history

### 4.4 Social/Feed Features
- Public comparisons feed (Explore page)
- Per-comparison view count tracking
- Tag-based categorization and filtering
- Search across public comparisons
- Shareable URLs for each comparison

## 5. Technical Constraints

| Constraint | Value |
|---|---|
| Deployment target | Raspberry Pi ARM64, 8GB RAM |
| Concurrent users | Low (homelab, <20) |
| Total RAM budget | ~500MB-1GB (app + DB + reverse proxy) |
| Cost target | Minimal (free tier APIs where possible) |
| Network | Behind Traefik reverse proxy with HTTPS |

## 6. Data Model

### 6.1 Users (Better Auth managed)
```
users: id, name, email, emailVerified, image, createdAt, updatedAt
sessions: id, userId (FK), token, expiresAt, createdAt, updatedAt
```

### 6.2 Comparisons
```
comparisons: id, userId (FK), title, query, slug, status (researching|completed|failed),
             summary, overallData (JSONB), tags[], isPublic, viewCount, createdAt, updatedAt
```

### 6.3 Comparison Items
```
comparison_items: id, comparisonId (FK), name, description, imageUrl,
                  researchData (JSONB), scores (JSONB), pros[], cons[], order
```

### 6.4 Comparison Dimensions
```
comparison_dimensions: id, comparisonId (FK), name, description, weight, order
```
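
The `weight` column on `comparison_dimensions` suggests that per-dimension scores can be combined into an item's `overallScore`, but the spec does not state the formula. The sketch below assumes a weighted mean; the `Dimension` interface and the `weightedOverallScore` function are hypothetical names introduced only for illustration.

```typescript
// Hypothetical shape: only `name` and `weight` mirror columns from the spec.
interface Dimension {
  name: string;
  weight: number; // assumed non-negative relative importance
}

// Assumption: overall score = weighted mean of the per-dimension scores.
function weightedOverallScore(
  dimensions: Dimension[],
  scores: Record<string, number>, // dimension name -> score
): number {
  let weighted = 0;
  let totalWeight = 0;
  for (const dim of dimensions) {
    const score = scores[dim.name];
    if (score === undefined) continue; // skip dimensions without a score
    weighted += score * dim.weight;
    totalWeight += dim.weight;
  }
  if (totalWeight === 0) return 0;
  // Round to one decimal place, like the 8.5 shown in the JSONB example.
  return Math.round((weighted / totalWeight) * 10) / 10;
}
```

If the real pipeline lets the LLM assign `overallScore` directly, a function like this could still serve as a consistency check on the returned value.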

### 6.5 JSONB Schemas

**overallData (on comparisons):**
```json
{
  "title": "React vs Vue vs Svelte",
  "query": "for modern web development",
  "status": "completed",
  "summary": "...",
  "items": [
    {
      "name": "React",
      "description": "...",
      "overallScore": 8.5,
      "dimensions": {
        "Performance": { "score": 8, "summary": "...", "details": "...", "pros": [], "cons": [] }
      },
      "pros": ["..."],
      "cons": ["..."]
    }
  ],
  "dimensions": ["Performance", "Developer Experience", "Ecosystem", ...]
}
```
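
Because the pipeline validates the LLM's structured JSON before persisting it, a shape check along the following lines may be useful. This is a minimal hand-rolled sketch, not the project's actual validator: `validateOverallData` and its helper functions are hypothetical, and the rule that every item needs a numeric score for every listed dimension is an assumption drawn from the example above.

```typescript
function isRecord(v: unknown): v is Record<string, unknown> {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

function isStringArray(v: unknown): v is string[] {
  return Array.isArray(v) && v.every((x) => typeof x === "string");
}

// Returns a list of problems; an empty list means the payload passed.
function validateOverallData(data: unknown): string[] {
  if (!isRecord(data)) return ["payload is not an object"];
  const errors: string[] = [];

  for (const key of ["title", "query", "summary"]) {
    if (typeof data[key] !== "string") errors.push(`${key} must be a string`);
  }

  const dimensions = data["dimensions"];
  const dims: string[] = isStringArray(dimensions) ? dimensions : [];
  if (!isStringArray(dimensions)) errors.push("dimensions must be a string array");

  const items = data["items"];
  if (!Array.isArray(items) || items.length < 2) {
    errors.push("items must contain at least 2 entries");
    return errors;
  }

  items.forEach((item: unknown, i: number) => {
    if (!isRecord(item) || typeof item["name"] !== "string") {
      errors.push(`items[${i}] must be an object with a name`);
      return;
    }
    const itemDims = item["dimensions"];
    const perDim: Record<string, unknown> = isRecord(itemDims) ? itemDims : {};
    // Assumed rule: each item carries a numeric score for every dimension.
    for (const dim of dims) {
      const entry = perDim[dim];
      if (!isRecord(entry) || typeof entry["score"] !== "number") {
        errors.push(`items[${i}] is missing a numeric score for "${dim}"`);
      }
    }
  });
  return errors;
}
```

Returning a list of messages rather than throwing on the first problem makes it easier to log everything wrong with a malformed LLM response in one pass.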

**researchData (on comparison_items):**
Full `ItemResearch` object including dimensions, sources, and scores.

## 7. LLM Research Pipeline

### 7.1 Flow
```
User submits query
  → Parse request (validate 2 ≤ items ≤ 10)
  → Detect available providers (Tavily? Perplexity? OpenAI?)
  → If Tavily available: search each item individually
  → Synthesize via best available provider:
      Priority 1: Tavily search + Perplexity synthesis
      Priority 2: Tavily search + OpenAI synthesis
      Priority 3: OpenAI only (no web search)
  → Validate structured JSON output
  → Persist to database
  → Stream results to client
```
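
The priority order in the flow above can be sketched as a small selection function. The names `ProviderKeys`, `ResearchPlan`, and `planResearch` are hypothetical, and the sketch assumes that a provider counts as available when its API key is set. Perplexity without Tavily is not one of the three listed priorities, so the sketch treats that case as having no usable provider.

```typescript
type SynthesisProvider = "perplexity" | "openai";

// Hypothetical container for whichever API keys are configured.
interface ProviderKeys {
  tavily?: string;
  perplexity?: string;
  openai?: string;
}

interface ResearchPlan {
  useWebSearch: boolean; // true when Tavily search runs before synthesis
  synthesizer: SynthesisProvider;
}

function planResearch(keys: ProviderKeys): ResearchPlan {
  const hasTavily = Boolean(keys.tavily);
  // Priority 1: Tavily search + Perplexity synthesis
  if (hasTavily && keys.perplexity) {
    return { useWebSearch: true, synthesizer: "perplexity" };
  }
  // Priority 2: Tavily search + OpenAI synthesis
  if (hasTavily && keys.openai) {
    return { useWebSearch: true, synthesizer: "openai" };
  }
  // Priority 3: OpenAI only (no web search)
  if (keys.openai) {
    return { useWebSearch: false, synthesizer: "openai" };
  }
  throw new Error("No usable provider combination is configured");
}
```

Resolving the plan once, before any network call, keeps the fallback decision in one place and lets the `searching` stage be skipped cleanly when Tavily is absent.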

### 7.2 Provider Details

| Provider | Role | Model | Cost |
|---|---|---|---|
| Tavily | Web search | Search API | ~$0.005/search |
| Perplexity | Synthesis | Sonar | ~$0.002/query |
| OpenAI | Synthesis | GPT-4o-mini | ~$0.15/1M input tokens |
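
As a rough worked example of what these prices imply per comparison, the sketch below assumes one Tavily search per item (as in the flow in 7.1) and a single synthesis call. The function `estimateCostUsd` and the `openaiTokens` parameter are hypothetical, and the unit prices are the approximate figures from the table, not guaranteed rates.

```typescript
// Approximate unit prices copied from the table above (USD).
const TAVILY_PER_SEARCH = 0.005;
const PERPLEXITY_PER_QUERY = 0.002;
const OPENAI_PER_MILLION_TOKENS = 0.15;

function estimateCostUsd(
  itemCount: number,
  synthesizer: "perplexity" | "openai",
  useWebSearch: boolean,
  openaiTokens = 0, // hypothetical token count for the synthesis call
): number {
  // Assumption: exactly one search per compared item.
  const searchCost = useWebSearch ? itemCount * TAVILY_PER_SEARCH : 0;
  // Assumption: exactly one synthesis call per comparison.
  const synthesisCost =
    synthesizer === "perplexity"
      ? PERPLEXITY_PER_QUERY
      : (openaiTokens / 1_000_000) * OPENAI_PER_MILLION_TOKENS;
  return searchCost + synthesisCost;
}
```

Under these assumptions, a three-item comparison through Tavily and Perplexity comes to roughly 3 × $0.005 + $0.002 = $0.017.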

### 7.3 Progress Stages (SSE)
1. `parsing` — Validating query and extracting items
2. `searching` — Running web search for each item (Tavily only)
3. `researching` — Processing research per item
4. `synthesizing` — LLM generating structured comparison
5. `complete` — Final result with all data
6. `error` — Failure with error message
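
A progress update for these stages could be serialized for Server-Sent Events roughly as follows. The `ProgressEvent` fields other than `stage` are hypothetical, as is the choice to put the stage name in the SSE `event:` field; only the stage names come from the list above.

```typescript
type ProgressStage =
  | "parsing"
  | "searching"
  | "researching"
  | "synthesizing"
  | "complete"
  | "error";

// Hypothetical payload; the spec only fixes the stage names.
interface ProgressEvent {
  stage: ProgressStage;
  message: string;
  item?: string; // item currently being processed, if any
}

// Serialize one event in the text/event-stream wire format:
// an "event:" line, a "data:" line, then a blank line as terminator.
// JSON.stringify emits no raw newlines, so one data line is enough.
function formatSseEvent(event: ProgressEvent): string {
  return `event: ${event.stage}\ndata: ${JSON.stringify(event)}\n\n`;
}

// Assumption: `complete` and `error` both end the stream.
function isTerminalStage(stage: ProgressStage): boolean {
  return stage === "complete" || stage === "error";
}
```

On the client, an `EventSource` listener registered per stage name would receive these events, and the connection can be closed once a terminal stage arrives.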

## 8. Security Considerations

- Auth middleware protects `/compare` and `/profile` routes
- Session tokens stored in HTTP-only cookies
- API keys never exposed to client (server-only LLM calls)
- Input validation on all API endpoints; the compare endpoint enforces 2-10 items
- SQL injection prevented via Drizzle ORM parameterized queries
- CSRF protection via Better Auth
- Rate limiting not yet implemented (placeholder only in the compare API route)
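
The item-count rule for the compare endpoint can be sketched as a plain validation function. `validateCompareRequest` and the `items` field name are hypothetical, and trimming names and dropping case-insensitive duplicates before counting is an assumption that the spec does not state.

```typescript
const MIN_ITEMS = 2;
const MAX_ITEMS = 10;

type ValidationResult =
  | { ok: true; items: string[] }
  | { ok: false; error: string };

function validateCompareRequest(body: unknown): ValidationResult {
  if (typeof body !== "object" || body === null) {
    return { ok: false, error: "Request body must be a JSON object" };
  }
  const rawItems = (body as { items?: unknown }).items;
  if (!Array.isArray(rawItems)) {
    return { ok: false, error: "items must be an array" };
  }
  // Assumption: trim names and drop blanks and case-insensitive duplicates.
  const seen = new Set<string>();
  const items: string[] = [];
  for (const raw of rawItems) {
    if (typeof raw !== "string") {
      return { ok: false, error: "Each item must be a string" };
    }
    const name = raw.trim();
    const key = name.toLowerCase();
    if (name === "" || seen.has(key)) continue;
    seen.add(key);
    items.push(name);
  }
  if (items.length < MIN_ITEMS || items.length > MAX_ITEMS) {
    return {
      ok: false,
      error: `Provide between ${MIN_ITEMS} and ${MAX_ITEMS} distinct items`,
    };
  }
  return { ok: true, items };
}
```

Counting after deduplication means a request such as `["React", "react"]` is rejected rather than producing a comparison of an item with itself.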

## 9. Future Considerations

- [ ] OAuth providers (Google, GitHub)
- [ ] Comparison comments/likes
- [ ] Export to PDF/image
- [ ] Embeddable comparison widgets
- [ ] Comparison templates
- [ ] Batch comparison queue for heavy loads
- [ ] Local Ollama fallback for offline operation