# Parketry Content Pipeline

Automated SEO content generation for parketry.de using Gemini AI with web search grounding.

## Quick Start

```bash
# 1. Create and activate virtual environment
python3 -m venv venv
source venv/bin/activate

# 2. Install dependencies
pip install -r requirements.txt

# 3. Configure environment
cp .env.example .env
# Edit .env with your API keys, WordPress credentials, and database settings

# 4. Run database migrations
python scripts/migrate.py

# 5. Seed topics
python scripts/seed_topics.py

# 6. Backfill existing articles for deduplication
python scripts/backfill_existing.py

# 7. Run pipeline
python pipeline.py
```

## Configuration

Required in `.env`:
- `GEMINI_API_KEY` - Google AI Studio API key
- `FAL_API_KEY` - Fal.ai API key for image generation
- `WP_USERNAME` / `WP_APPLICATION_PASSWORD` - WordPress credentials
- `DB_*` - Database connection details
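A startup check can catch missing settings before the pipeline makes any API calls. The following is a minimal sketch, not the project's actual code: `missing_config` is a hypothetical helper, and because the individual `DB_*` names are not listed here, it only checks that at least one non-empty `DB_`-prefixed variable exists. It takes a plain mapping, so it works however the pipeline loads `.env`.

```python
# Names taken from the list above. The DB_* variables are not enumerated
# in this README, so they are matched by prefix only (an assumption).
REQUIRED_VARS = (
    "GEMINI_API_KEY",
    "FAL_API_KEY",
    "WP_USERNAME",
    "WP_APPLICATION_PASSWORD",
)


def missing_config(env):
    """Return the names of required settings that are absent or blank."""
    missing = [name for name in REQUIRED_VARS if not env.get(name, "").strip()]
    # Hypothetical rule: at least one non-empty DB_* variable must be set.
    has_db = any(
        key.startswith("DB_") and value.strip() for key, value in env.items()
    )
    if not has_db:
        missing.append("DB_*")
    return missing
```

Calling `missing_config(dict(os.environ))` after the environment is loaded returns an empty list when everything is set, so the caller can abort with a clear message otherwise.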

## Cron Setup

```bash
# Daily at 06:00
0 6 * * * cd /path/to/content-pipeline && ./venv/bin/python cron_runner.py
```

## Pipeline Steps

1. **Topic Selection** - Picks the highest-priority pending topic
2. **Semantic Dedupe** - Checks for overlap with published content
3. **Article Generation** - Gemini Flash with web search grounding (2000-3000 words)
4. **Content Validation** - Checks word count and known error patterns
5. **Fact-Checking** - Verification with Gemini Pro (max 2 attempts)
6. **Thumbnail Generation** - Fal.ai FLUX.2, with Gemini as fallback
7. **WordPress Publish** - Publishes via the REST API with Yoast SEO
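Steps 4 and 5 can be sketched as follows. This is an illustration under stated assumptions, not the pipeline's real implementation: the function names and the error patterns are hypothetical, validation is assumed to enforce the 2000-3000 word range from step 3, and "max 2 attempts" is read as "check, revise once, check again". The `check` and `revise` callables stand in for the model calls.

```python
import re

MIN_WORDS, MAX_WORDS = 2000, 3000  # range stated in step 3
MAX_FACT_CHECK_ATTEMPTS = 2        # limit stated in step 5

# Hypothetical patterns; the real list is not given in this README.
ERROR_PATTERNS = (r"as an ai language model", r"\[insert [^\]]*\]")


def validate_content(text):
    """Return a list of problems; an empty list means the article passes."""
    problems = []
    words = len(text.split())
    if not MIN_WORDS <= words <= MAX_WORDS:
        problems.append(f"word count {words} outside {MIN_WORDS}-{MAX_WORDS}")
    for pattern in ERROR_PATTERNS:
        if re.search(pattern, text, flags=re.IGNORECASE):
            problems.append(f"matched error pattern: {pattern}")
    return problems


def fact_check_with_retries(article, check, revise):
    """Check the article up to MAX_FACT_CHECK_ATTEMPTS times.

    check(article) returns a list of issues; revise(article, issues)
    returns a corrected article. Returns (article, passed).
    """
    for attempt in range(1, MAX_FACT_CHECK_ATTEMPTS + 1):
        issues = check(article)
        if not issues:
            return article, True
        # Only revise if another check is still allowed.
        if attempt < MAX_FACT_CHECK_ATTEMPTS:
            article = revise(article, issues)
    return article, False
```

Returning a `passed` flag instead of raising leaves the decision to the caller, for example skipping publication and recording the failure in the run log.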

## Logs

- `logs/pipeline_YYYY-MM-DD.log` - Daily execution logs
- `logs/last_run.json` - Last run result
- `../logs/error.log` - Critical errors (written to the parent project's log directory)
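For monitoring scripts, the log locations above can be resolved programmatically. The helpers below are a hypothetical sketch: the function names are not part of the project, paths are assumed to be relative to the pipeline directory, and no assumption is made about the fields inside `last_run.json` beyond it being valid JSON.

```python
import json
from datetime import date
from pathlib import Path

LOG_DIR = Path("logs")  # relative to the pipeline directory (assumed)


def daily_log_path(day=None):
    """Build the daily log path following logs/pipeline_YYYY-MM-DD.log."""
    day = day or date.today()
    return LOG_DIR / f"pipeline_{day.isoformat()}.log"


def read_last_run(path=LOG_DIR / "last_run.json"):
    """Return the parsed last-run result, or None if the file is missing."""
    path = Path(path)
    if not path.exists():
        return None
    return json.loads(path.read_text(encoding="utf-8"))
```

A health check could call `read_last_run()` after the cron job and alert when it returns `None` or when today's `daily_log_path()` does not exist.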
