# Content Pipeline - CLAUDE.md

## Overview
Automated SEO content pipeline generating German ANPR parking management articles.

## Quick Start
```bash
source venv/bin/activate
python scripts/migrate.py      # Create tables
python scripts/seed_topics.py  # Load 120 topics
python pipeline.py             # Run pipeline
```

## Architecture
- `pipeline.py` - Main orchestrator (13 steps)
- `modules/` - Topic selection, article generation, fact-checking, thumbnails, WP publishing
- `prompts/` - Gemini prompt templates (German)
- `database/` - MariaDB models with connection pool

## Key Dependencies
- `google-genai` (NOT `google-generativeai` - deprecated)
- `fal-client` for image generation
- `mysql-connector-python` for database

## API Patterns

### Text Generation (Model Fallback Chain with Exponential Backoff)
```python
import time

TEXT_MODELS = ["gemini-3-pro-preview", "gemini-3-flash-preview", "gemini-2.5-pro"]
# Internal linking: ["gemini-3-flash-preview", "gemini-2.5-flash"] (LLM-only, no regex fallback)

# `client`, `prompt`, INITIAL_DELAY and MAX_RETRIES are defined elsewhere in the pipeline.
response = None
for model in TEXT_MODELS:
    delay = INITIAL_DELAY
    for attempt in range(MAX_RETRIES):
        try:
            response = client.models.generate_content(model=model, contents=prompt)
            break  # Success
        except Exception as e:
            if "429" in str(e):  # Rate limited: back off, then retry the same model
                time.sleep(delay)
                delay *= 2
                continue
            break  # Non-retryable, try next model
    if response:
        break  # Stop at the first model that returned a response
```

### Image Generation
- Primary: `fal-ai/flux-2/turbo` via fal_client
- Fallback: `gemini-3-pro-image-preview` with `response_modalities=['Image']`
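
The primary/fallback order above can be sketched as a small helper. This is a minimal sketch rather than the pipeline's actual code: `generate_with_fallback` and the `ImageBackend` type are hypothetical names, and the real fal_client and Gemini calls are assumed to be wrapped in plain functions that take a prompt and return image bytes.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical type: a backend takes a prompt and returns raw image bytes.
ImageBackend = Callable[[str], bytes]


def generate_with_fallback(
    prompt: str,
    backends: Sequence[Tuple[str, ImageBackend]],
) -> Tuple[str, bytes]:
    """Try each (name, backend) pair in order and return the first success."""
    errors = []
    for name, backend in backends:
        try:
            image = backend(prompt)
        except Exception as exc:  # Any failure moves on to the next backend
            errors.append(f"{name}: {exc}")
            continue
        if image:
            return name, image
        errors.append(f"{name}: empty result")
    raise RuntimeError("All image backends failed: " + "; ".join(errors))
```

Passing the backends in as an ordered list keeps the fallback order in one place, so the helper can be exercised with stub functions without calling either API.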

## Database Tables
- `parketry_content_topics` - Topic queue with priority scoring
- `parketry_articles` - Generated articles with fact-check logs
- `parketry_published_content` - For semantic deduplication

## Validation Thresholds
- Min words: 1200 (enforced because LLMs tend to underdeliver on length)
- Fact-check: a `NEEDS_REVISION` verdict still passes when confidence is above 75%
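
The two thresholds above could be checked roughly as follows. This is a sketch under assumptions: the function names are hypothetical, confidence is assumed to be a fraction between 0 and 1, word count is approximated by whitespace splitting, and `"PASSED"` is an assumed label for a clean fact-check verdict (the source names only `NEEDS_REVISION`).

```python
MIN_WORDS = 1200
REVISION_CONFIDENCE = 0.75  # NEEDS_REVISION passes only above this confidence


def meets_word_count(text: str, min_words: int = MIN_WORDS) -> bool:
    """Count whitespace-separated tokens as an approximation of word count."""
    return len(text.split()) >= min_words


def fact_check_passes(verdict: str, confidence: float) -> bool:
    """Apply the NEEDS_REVISION confidence threshold to a fact-check result."""
    if verdict == "PASSED":  # Assumed label for a clean fact-check
        return True
    return verdict == "NEEDS_REVISION" and confidence > REVISION_CONFIDENCE
```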

## Cron

The daily scheduler runs at 00:05, picks a random publish time between 08:00 and 19:00, sleeps until then, and runs the pipeline:

```bash
5 0 * * * cd /var/www/html/content-pipeline && ./venv/bin/python daily_scheduler.py >> logs/scheduler.log 2>&1
```

Files:
- `daily_scheduler.py` - Waits until random time, then calls cron_runner
- `cron_runner.py` - Handles locking, logging, runs pipeline
- `logs/scheduler.log` - Scheduler output
- `logs/scheduled_time.txt` - Today's scheduled publish time
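
The source says only that `cron_runner.py` handles locking, not how. One common approach on Unix systems is an exclusive, non-blocking `flock` on a lock file, sketched below. The helper name `single_instance_lock` and any lock file path are hypothetical.

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def single_instance_lock(lock_path: str):
    """Hold an exclusive lock on lock_path; raise BlockingIOError if it is taken."""
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o644)
    try:
        # LOCK_NB makes the call fail immediately instead of waiting.
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        yield fd
    finally:
        os.close(fd)  # Closing the descriptor also releases the lock
```

Because the kernel drops the lock when the process exits, a crashed run does not leave a stale lock behind, which is the main advantage over checking for the existence of a PID file.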
