chore: ai agent config

Signed-off-by: Dmytro Stanchiev <git@dmytros.dev>
2026-04-21 20:19:05 -04:00
parent ffc4a2c5c5
commit 7cf21546e2
65 changed files with 10076 additions and 133 deletions


@@ -0,0 +1,481 @@
# Output Templates Reference
Complete formatting templates for all supported output formats.
Every output must be wrapped in a delivery envelope with metadata.
---
## Delivery Envelope (Required)
Every extraction result MUST include this metadata wrapper,
regardless of output format:
```markdown
## Extraction Results
**Source:** [Page Title](https://example.com/page)
**Date:** 2026-02-25 14:30 UTC
**Items:** 47 records
**Confidence:** HIGH
**Format:** Markdown Table
---
[DATA GOES HERE]
---
**Notes:**
- Any gaps, anomalies, or observations
- Filters or sorts applied
- Pages scraped (if paginated)
```
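As a minimal sketch of how the envelope above could be assembled in code, the helper below wraps an already-formatted data block in the metadata lines. The function name `wrap_in_envelope` and its parameters are hypothetical and not part of this reference; only the field labels come from the template.

```python
from datetime import datetime, timezone


def wrap_in_envelope(data_block, title, url, item_count, confidence, fmt,
                     notes=(), extracted_at=None):
    """Wrap a formatted data block in the delivery envelope (hypothetical helper)."""
    # Fall back to the current UTC time when no timestamp is supplied.
    when = extracted_at or datetime.now(timezone.utc)
    lines = [
        "## Extraction Results",
        f"**Source:** [{title}]({url})",
        f"**Date:** {when.strftime('%Y-%m-%d %H:%M')} UTC",
        f"**Items:** {item_count} records",
        f"**Confidence:** {confidence}",
        f"**Format:** {fmt}",
        "---",
        data_block,
        "---",
        "**Notes:**",
    ]
    # One bullet per note (gaps, filters applied, pages scraped).
    lines.extend(f"- {note}" for note in notes)
    return "\n".join(lines)
```

The caller is assumed to pass a UTC timestamp; the sketch does not convert time zones.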
---
## Markdown Table Format
### Standard Table
```markdown
| Name | Price | Rating | Availability |
|:---------------|---------:|:------:|:-------------|
| Product Alpha | $29.99 | 4.5 | In Stock |
| Product Beta | $49.99 | 4.2 | In Stock |
| Product Gamma | $119.00 | 4.8 | Pre-order |
| Product Delta | $15.50 | 3.9 | Out of Stock |
```
### Alignment Rules
| Data Type | Alignment | Markdown Syntax |
|:-------------|:----------|:----------------|
| Text | Left | `:---` |
| Numbers | Right | `---:` |
| Ratings/Flags | Center | `:---:` |
| Mixed/Status | Left | `:---` |
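The alignment rules above can be sketched as a small renderer that picks the separator per column. This is an illustrative assumption, not the skill's implementation: `separator_for` and `render_table` are hypothetical names, and the regular expression is one possible definition of a numeric cell. Centered columns are not inferred here and would need to be requested explicitly.

```python
import re

# Assumed notion of "numeric": optional sign and dollar sign, digits with
# optional thousands separators, optional decimal part.
_NUMERIC = re.compile(r"-?\$?\d[\d,]*(?:\.\d+)?")


def separator_for(values):
    """Return '---:' when every cell looks numeric, otherwise ':---'."""
    is_numeric = bool(values) and all(
        _NUMERIC.fullmatch(str(v).strip()) for v in values
    )
    return "---:" if is_numeric else ":---"


def render_table(headers, rows):
    """Render a Markdown table with per-column alignment markers."""
    # Transpose rows into columns so each column can be inspected as a whole.
    columns = list(zip(*rows)) if rows else [() for _ in headers]
    separators = [separator_for(col) for col in columns]
    lines = [
        "| " + " | ".join(headers) + " |",
        "|" + "|".join(separators) + "|",
    ]
    lines += ["| " + " | ".join(str(c) for c in row) + " |" for row in rows]
    return "\n".join(lines)
```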
### Table with Summary Row
```markdown
| Product | Units Sold | Revenue |
|:---------------|----------:|-----------:|
| Widget A | 1,234 | $12,340 |
| Widget B | 567 | $8,505 |
| Widget C | 2,890 | $57,800 |
| **Total** | **4,691** | **$78,645** |
```
### Wide Data (Split Tables)
When data has more than 10 columns, split into logical groups:
```markdown
### Basic Information
| Name | Category | Brand | SKU |
|:--------|:---------|:--------|:---------|
| Item A | Tools | Acme | ACM-001 |
### Pricing and Availability
| Name | Price | Sale Price | Stock | Ships In |
|:--------|--------:|-----------:|:------|:---------|
| Item A | $49.99 | $39.99 | 142 | 2 days |
```
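One way to sketch the split is shown below. It assumes the caller supplies the logical groups, since this reference does not define how columns are grouped. The name `split_wide` and its parameters are hypothetical; the key column (`Name` in the example above) is repeated in every group so the rows stay identifiable.

```python
def split_wide(records, groups, key="Name", max_columns=10):
    """Split wide records into named column groups (hypothetical helper).

    records: list of dicts sharing the same keys.
    groups:  mapping of section title -> list of column names.
    """
    columns = list(records[0]) if records else []
    # At or below the threshold the data stays in a single, untitled table.
    if len(columns) <= max_columns:
        return {"": records}
    tables = {}
    for title, fields in groups.items():
        # Repeat the key column first in each group.
        wanted = [key] + [f for f in fields if f != key]
        tables[title] = [{f: rec.get(f) for f in wanted} for rec in records]
    return tables
```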
### Multi-URL Comparison Table
```markdown
| Source | Product | Price | Rating |
|:-------------|:-----------|--------:|:------:|
| store-a.com | Laptop X | $999 | 4.3 |
| store-b.com | Laptop X | $949 | 4.5 |
| store-c.com | Laptop X | $1,029 | 4.1 |
```
### Truncation Rules
Truncate values exceeding 60 characters and mark the cut with a trailing `...`:
```markdown
| Title | Author |
|:------------------------------------------------------------|:--------|
| Introduction to Advanced Machine Learning Techni... | J. Smith |
```
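A minimal sketch of the truncation rule follows. This reference does not say exactly where the cut falls, so the sketch assumes the shortened value, including the `...` marker, should fit within the 60-character limit. The function name `truncate` is hypothetical.

```python
def truncate(value, limit=60, marker="..."):
    """Shorten long cell values; short values are returned unchanged."""
    text = str(value)
    if len(text) <= limit:
        return text
    # Assumption: the marker counts toward the limit.
    return text[: limit - len(marker)].rstrip() + marker
```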
---
## JSON Format
### Standard JSON Output
```json
{
"metadata": {
"source": "https://example.com/products",
"title": "Product Catalog - Example Store",
"extractedAt": "2026-02-25T14:30:00Z",
"itemCount": 3,
"confidence": "HIGH",
"fields": ["name", "price", "rating", "availability"],
"notes": []
},
"data": [
{
"name": "Product Alpha",
"price": 29.99,
"currency": "USD",
"rating": 4.5,
"availability": "In Stock"
},
{
"name": "Product Beta",
"price": 49.99,
"currency": "USD",
"rating": 4.2,
"availability": "In Stock"
},
{
"name": "Product Gamma",
"price": 119.00,
"currency": "USD",
"rating": 4.8,
"availability": "Pre-order"
}
]
}
```
### JSON Key Naming
| Rule | Example |
|:-----------------------|:----------------------------------|
| camelCase | `productName`, `unitPrice` |
| Numbers stay numeric | `29.99` not `"29.99"` |
| Booleans stay boolean | `true` not `"true"` |
| Missing = null | `null` not `""` or `"N/A"` |
| Arrays for multiples | `"tags": ["sale", "new"]` |
| ISO-8601 for dates | `"2026-02-25T14:30:00Z"` |
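The naming and typing rules above could be applied with a normalization pass like the following sketch. The helper names and the set of strings treated as missing are assumptions for illustration. Dates are left untouched here because the input date format is not specified in this reference.

```python
import json
import re

# Assumed placeholders that should become JSON null.
MISSING = {"", "N/A", "n/a", "-"}


def to_camel_case(key):
    """Convert 'Product Name' or 'unit_price' to 'productName' / 'unitPrice'."""
    parts = [p for p in re.split(r"[^A-Za-z0-9]+", key.strip()) if p]
    if not parts:
        return key
    head, *tail = parts
    # All-caps heads such as 'SKU' are lowercased entirely.
    head = head.lower() if head.isupper() else head[0].lower() + head[1:]
    return head + "".join(p[0].upper() + p[1:] for p in tail)


def coerce_value(value):
    """Turn scraped strings into numbers, booleans, or None where possible."""
    if not isinstance(value, str):
        return value  # already typed (number, bool, list, None)
    text = value.strip()
    if text in MISSING:
        return None
    if text.lower() in ("true", "false"):
        return text.lower() == "true"
    if re.fullmatch(r"-?\d+", text):
        return int(text)
    if re.fullmatch(r"-?\d+\.\d+", text):
        return float(text)
    return text


def normalize_record(raw):
    """Apply key naming and value typing to one scraped record."""
    return {to_camel_case(k): coerce_value(v) for k, v in raw.items()}


print(json.dumps(normalize_record({"Product Name": "Widget", "Phone": "N/A"})))
```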
### Nested JSON (Product with Details)
```json
{
"metadata": { "..." : "..." },
"data": [
{
"name": "Laptop Pro X",
"brand": "TechCo",
"pricing": {
"current": 999.99,
"original": 1299.99,
"currency": "USD",
"discount": "23%"
},
"rating": {
"score": 4.5,
"count": 1234
},
"specifications": {
"processor": "M3 Pro",
"ram": "16 GB",
"storage": "512 GB SSD",
"display": "14.2 inch Retina"
},
"availability": {
"inStock": true,
"shipsIn": "2-3 business days"
}
}
]
}
```
### Multi-URL JSON
```json
{
"metadata": {
"sources": [
"https://store-a.com/laptop-x",
"https://store-b.com/laptop-x"
],
"extractedAt": "2026-02-25T14:30:00Z",
"itemCount": 2,
"confidence": "HIGH"
},
"data": [
{
"source": "store-a.com",
"name": "Laptop X",
"price": 999,
"currency": "USD",
"rating": 4.3
},
{
"source": "store-b.com",
"name": "Laptop X",
"price": 949,
"currency": "USD",
"rating": 4.5
}
]
}
```
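As a rough sketch of how per-URL results might be merged into the shape above, the helper below tags each record with its host name. `merge_sources` is a hypothetical name, and `extractedAt` and `confidence` are left out because how they are derived is not described here.

```python
from urllib.parse import urlparse


def merge_sources(records_by_url):
    """Combine records from several URLs, tagging each with its source host."""
    merged = []
    for url, records in records_by_url.items():
        host = urlparse(url).netloc  # e.g. 'store-a.com'
        for rec in records:
            merged.append({"source": host, **rec})
    return {
        "metadata": {
            "sources": list(records_by_url),
            "itemCount": len(merged),
        },
        "data": merged,
    }
```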
---
## CSV Format
### Standard CSV
```csv
# Source: https://example.com/products
# Extracted: 2026-02-25 14:30 UTC
# Items: 3 | Confidence: HIGH
name,price,currency,rating,availability
"Product Alpha",29.99,USD,4.5,"In Stock"
"Product Beta",49.99,USD,4.2,"In Stock"
"Product Gamma",119.00,USD,4.8,"Pre-order"
```
### CSV Rules
| Rule | Example |
|:-------------------------------------|:-------------------------------|
| Always include header row | `name,price,rating` |
| Quote fields with commas | `"Smith, John"` |
| Quote fields with quotes (escape) | `"He said ""hello"""` |
| Quote fields with newlines | `"Line 1\nLine 2"` |
| UTF-8 encoding with BOM | `\xEF\xBB\xBF` prefix |
| Comma delimiter (standard) | `,` |
| Metadata as comments (# prefix) | `# Source: URL` |
| null/missing as empty field | `field1,,field3` |
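The CSV rules above map fairly directly onto Python's standard `csv` module. The sketch below is one possible rendering, with `write_csv` as a hypothetical helper name. It uses `csv.QUOTE_MINIMAL`, which quotes only fields containing commas, quotes, or newlines, so plain strings are not quoted the way they are in the Standard CSV example.

```python
import csv
import io


def write_csv(records, fields, comments=()):
    """Serialize records to UTF-8 CSV bytes with BOM and '#' metadata lines."""
    buf = io.StringIO(newline="")
    # Metadata goes first, one '# ...' comment line per entry.
    for line in comments:
        buf.write(f"# {line}\r\n")
    writer = csv.writer(buf, quoting=csv.QUOTE_MINIMAL)
    writer.writerow(fields)  # header row is always present
    for rec in records:
        # Missing or None values become empty fields.
        writer.writerow(["" if rec.get(f) is None else rec[f] for f in fields])
    # 'utf-8-sig' prepends the BOM bytes EF BB BF.
    return buf.getvalue().encode("utf-8-sig")
```

Note that `#` comment lines are not part of RFC 4180, so some CSV parsers will need to be told to skip them.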
### Multi-URL CSV
```csv
# Sources: store-a.com, store-b.com
# Extracted: 2026-02-25 14:30 UTC
source,name,price,currency,rating
"store-a.com","Laptop X",999,USD,4.3
"store-b.com","Laptop X",949,USD,4.5
```
---
## Summary Statistics Template
When extracted data contains numeric fields, include a summary block:
```markdown
### Summary Statistics
| Metric | Price | Rating |
|:----------|----------:|-------:|
| Count | 47 | 47 |
| Min | $12.99 | 2.1 |
| Max | $299.99 | 5.0 |
| Average | $67.42 | 4.1 |
| Median | $54.99 | 4.3 |
```
Include only when all of the following hold:
- The data has numeric columns
- More than 5 items were extracted
- The user would likely benefit from an aggregate view (prices, ratings, quantities)
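The metrics in the summary block can be computed with the standard `statistics` module. The sketch below is illustrative only: `summarize` is a hypothetical name, it covers the first two conditions (numeric values, more than 5 items), and it leaves the judgment about user benefit to the caller.

```python
from statistics import mean, median


def summarize(records, field):
    """Return count/min/max/average/median for one numeric field, or None."""
    # Skip the summary for small result sets (5 items or fewer).
    if len(records) <= 5:
        return None
    values = [
        r[field] for r in records
        if isinstance(r.get(field), (int, float))
        and not isinstance(r.get(field), bool)
    ]
    if not values:
        return None
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "average": round(mean(values), 2),
        "median": median(values),
    }
```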
---
## Contact Data Template
```markdown
| Name | Title | Email | Phone |
|:---------------|:-------------------|:---------------------|:---------------|
| Jane Smith | CEO | jane@example.com | +1-555-0101 |
| John Doe | CTO | john@example.com | +1-555-0102 |
| Alice Johnson | VP Engineering | alice@example.com | N/A |
```
---
## Article Extraction Template
```markdown
## Article: [Title]
**Author:** Author Name
**Published:** YYYY-MM-DD
**Source:** [Site Name](URL)
### Summary
[2-3 sentence summary of the article content]
### Key Data Points
- [Factual data point 1]
- [Factual data point 2]
- [Statistical finding]
### Tags
`tag1` `tag2` `tag3`
```
Note: Summarize article content; do not reproduce the full article text,
which may be protected by copyright.
---
## FAQ Extraction Template
```markdown
### FAQ: [Page Title]
**Source:** [Site Name](URL)
**Items:** 12 questions
| # | Question | Answer (excerpt) |
|--:|:---------|:-----------------|
| 1 | How do I reset my password? | Navigate to Settings > Security and click "Reset..." |
| 2 | What payment methods do you accept? | We accept Visa, Mastercard, PayPal, and bank transfer... |
```
Or as JSON (default for FAQ mode):
```json
{
"metadata": { "source": "URL", "itemCount": 12, "confidence": "HIGH" },
"data": [
{ "question": "How do I reset my password?", "answer": "Navigate to...", "category": "Account" },
{ "question": "What payment methods?", "answer": "We accept...", "category": "Billing" }
]
}
```
---
## Pricing Plans Template
```markdown
### Pricing: [Product Name]
**Source:** [Site Name](URL)
**Plans:** 3 tiers
| Plan | Monthly | Annual | Highlighted |
|:------------|----------:|----------:|:-----------:|
| Starter | $9/mo | $7/mo | |
| Pro | $29/mo | $24/mo | * |
| Enterprise | Custom | Custom | |
#### Feature Comparison
| Feature | Starter | Pro | Enterprise |
|:----------------------|:-------:|:---:|:----------:|
| Users | 1 | 10 | Unlimited |
| Storage | 5 GB | 50 GB | Unlimited |
| API Access | N/A | Yes | Yes |
| Priority Support | N/A | N/A | Yes |
```
---
## Job Listings Template
```markdown
| Title | Company | Location | Salary | Type | Posted |
|:-------------------|:------------|:---------------|:----------------|:----------|:-----------|
| Senior Engineer | TechCo | Remote, US | $150k - $200k | Full-time | 2026-02-20 |
| Product Manager | StartupXYZ | San Francisco | $130k - $160k | Full-time | 2026-02-18 |
| Data Analyst | DataCorp | London, UK | GBP 55k - 70k | Contract | 2026-02-22 |
```
---
## Events Template
```markdown
| Event | Date | Time | Location | Speakers |
|:-----------------------|:-----------|:--------|:------------------|:---------------|
| Opening Keynote | 2026-03-15 | 09:00 | Main Hall | J. Smith |
| Workshop: AI Basics | 2026-03-15 | 14:00 | Room 201 | A. Johnson |
| Networking Reception | 2026-03-15 | 18:00 | Rooftop Lounge | N/A |
```
---
## Differential (Diff) Output Template
When comparing the current extraction with a previous run:
```markdown
## Extraction Results (Diff)
**Source:** [Page Title](URL)
**Date:** 2026-02-25 14:30 UTC
**Compared to:** 2026-02-20 10:00 UTC
**Changes:** +5 new, -2 removed, 3 modified
---
### New Items (+5)
| Name | Price | Rating |
|:---------------|--------:|:------:|
| Product Eta | $39.99 | 4.6 |
| Product Theta | $24.99 | 4.1 |
| ... | | |
### Removed Items (-2)
| Name | Price | Rating |
|:---------------|--------:|:------:|
| ~~Product Alpha~~ | ~~$29.99~~ | ~~4.5~~ |
| ~~Product Beta~~ | ~~$49.99~~ | ~~4.2~~ |
### Modified Items (3)
| Name | Field | Was | Now |
|:---------------|:--------|:-----------|:-----------|
| Product Gamma | Price | $119.00 | $109.00 |
| Product Gamma | Rating | 4.8 | 4.9 |
| Product Delta | Stock | Out of Stock | In Stock |
---
**Summary:**
- 5 new products added since last extraction
- 2 products removed (possibly discontinued)
- Product Gamma had a price drop of $10 and a rating increase
- Product Delta is back in stock
```
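The three sections of the diff template could be derived with a comparison like the sketch below. It assumes records are matched by a unique `name` field, which this reference does not state; `diff_runs` and the shape of its return value are hypothetical. Each changed field yields one entry, matching the one-row-per-field layout of the Modified Items table.

```python
def diff_runs(previous, current, key="name"):
    """Compare two extraction runs keyed by an assumed unique field."""
    prev = {r[key]: r for r in previous}
    curr = {r[key]: r for r in current}
    added = [curr[k] for k in curr if k not in prev]
    removed = [prev[k] for k in prev if k not in curr]
    modified = []
    for k, rec in curr.items():
        if k not in prev:
            continue
        for field, now in rec.items():
            was = prev[k].get(field)
            # Record one entry per changed field, skipping the key itself.
            if field != key and was != now:
                modified.append(
                    {"item": k, "field": field, "was": was, "now": now}
                )
    return {"added": added, "removed": removed, "modified": modified}
```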
---
## Error / Partial Result Template
When extraction partially fails:
```markdown
## Extraction Results (Partial)
**Source:** [Page Title](URL)
**Date:** 2026-02-25 14:30 UTC
**Items:** 23 of ~50 expected records
**Confidence:** LOW
**Strategy:** A (WebFetch) -> escalated to B (Browser)
---
[PARTIAL DATA]
---
**Issues:**
- 27 items could not be extracted (content requires JavaScript rendering)
- Price field missing for 5 items (marked N/A)
- Auto-escalation from WebFetch to Browser recovered 15 additional items
**Suggestions:**
- Re-run with explicit Browser automation for complete results
- Check if site has an API endpoint for direct data access
- Try at a different time if rate-limited
```