chore: ai agent config
Signed-off-by: Dmytro Stanchiev <git@dmytros.dev>
This commit is contained in:
481
.claude/skills/web-scraper/references/output-templates.md
Normal file
481
.claude/skills/web-scraper/references/output-templates.md
Normal file
@@ -0,0 +1,481 @@
|
||||
# Output Templates Reference
|
||||
|
||||
Complete formatting templates for all supported output formats.
|
||||
Every output must be wrapped in a delivery envelope with metadata.
|
||||
|
||||
---
|
||||
|
||||
## Delivery Envelope (Required)
|
||||
|
||||
Every extraction result MUST include this metadata wrapper,
|
||||
regardless of output format:
|
||||
|
||||
```markdown
|
||||
## Extraction Results
|
||||
|
||||
**Source:** [Page Title](https://example.com/page)
|
||||
**Date:** 2026-02-25 14:30 UTC
|
||||
**Items:** 47 records
|
||||
**Confidence:** HIGH
|
||||
**Format:** Markdown Table
|
||||
|
||||
---
|
||||
|
||||
[DATA GOES HERE]
|
||||
|
||||
---
|
||||
|
||||
**Notes:**
|
||||
- Any gaps, anomalies, or observations
|
||||
- Filters or sorts applied
|
||||
- Pages scraped (if paginated)
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Markdown Table Format
|
||||
|
||||
### Standard Table
|
||||
|
||||
```markdown
|
||||
| Name | Price | Rating | Availability |
|
||||
|:---------------|---------:|:------:|:-------------|
|
||||
| Product Alpha | $29.99 | 4.5 | In Stock |
|
||||
| Product Beta | $49.99 | 4.2 | In Stock |
|
||||
| Product Gamma | $119.00 | 4.8 | Pre-order |
|
||||
| Product Delta | $15.50 | 3.9 | Out of Stock |
|
||||
```
|
||||
|
||||
### Alignment Rules
|
||||
|
||||
| Data Type | Alignment | Markdown Syntax |
|
||||
|:-------------|:----------|:----------------|
|
||||
| Text | Left | `:---` |
|
||||
| Numbers | Right | `---:` |
|
||||
| Centered | Center | `:---:` |
|
||||
| Mixed/Status | Left | `:---` |
|
||||
|
||||
### Table with Summary Row
|
||||
|
||||
```markdown
|
||||
| Product | Units Sold | Revenue |
|
||||
|:---------------|----------:|-----------:|
|
||||
| Widget A | 1,234 | $12,340 |
|
||||
| Widget B | 567 | $8,505 |
|
||||
| Widget C | 2,890 | $57,800 |
|
||||
| **Total** | **4,691** | **$78,645**|
|
||||
```
|
||||
|
||||
### Wide Data (Split Tables)
|
||||
|
||||
When data has more than 10 columns, split into logical groups:
|
||||
|
||||
```markdown
|
||||
### Basic Information
|
||||
|
||||
| Name | Category | Brand | SKU |
|
||||
|:--------|:---------|:--------|:---------|
|
||||
| Item A | Tools | Acme | ACM-001 |
|
||||
|
||||
### Pricing and Availability
|
||||
|
||||
| Name | Price | Sale Price | Stock | Ships In |
|
||||
|:--------|--------:|-----------:|:------|:---------|
|
||||
| Item A | $49.99 | $39.99 | 142 | 2 days |
|
||||
```
|
||||
|
||||
### Multi-URL Comparison Table
|
||||
|
||||
```markdown
|
||||
| Source | Product | Price | Rating |
|
||||
|:-------------|:-----------|--------:|:------:|
|
||||
| store-a.com | Laptop X | $999 | 4.3 |
|
||||
| store-b.com | Laptop X | $949 | 4.5 |
|
||||
| store-c.com | Laptop X | $1,029 | 4.1 |
|
||||
```
|
||||
|
||||
### Truncation Rules
|
||||
|
||||
For values exceeding 60 characters:
|
||||
```markdown
|
||||
| Title | Author |
|
||||
|:------------------------------------------------------------|:--------|
|
||||
| Introduction to Advanced Machine Learning Techni... | J. Smith|
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## JSON Format
|
||||
|
||||
### Standard JSON Output
|
||||
|
||||
```json
|
||||
{
|
||||
"metadata": {
|
||||
"source": "https://example.com/products",
|
||||
"title": "Product Catalog - Example Store",
|
||||
"extractedAt": "2026-02-25T14:30:00Z",
|
||||
"itemCount": 3,
|
||||
"confidence": "HIGH",
|
||||
"fields": ["name", "price", "rating", "availability"],
|
||||
"notes": []
|
||||
},
|
||||
"data": [
|
||||
{
|
||||
"name": "Product Alpha",
|
||||
"price": 29.99,
|
||||
"currency": "USD",
|
||||
"rating": 4.5,
|
||||
"availability": "In Stock"
|
||||
},
|
||||
{
|
||||
"name": "Product Beta",
|
||||
"price": 49.99,
|
||||
"currency": "USD",
|
||||
"rating": 4.2,
|
||||
"availability": "In Stock"
|
||||
},
|
||||
{
|
||||
"name": "Product Gamma",
|
||||
"price": 119.00,
|
||||
"currency": "USD",
|
||||
"rating": 4.8,
|
||||
"availability": "Pre-order"
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
### JSON Key Naming
|
||||
|
||||
| Rule | Example |
|
||||
|:-----------------------|:----------------------------------|
|
||||
| camelCase | `productName`, `unitPrice` |
|
||||
| Numbers stay numeric | `29.99` not `"29.99"` |
|
||||
| Booleans stay boolean | `true` not `"true"` |
|
||||
| Missing = null | `null` not `""` or `"N/A"` |
|
||||
| Arrays for multiples | `"tags": ["sale", "new"]` |
|
||||
| ISO-8601 for dates | `"2026-02-25T14:30:00Z"` |
|
||||
|
||||
### Nested JSON (Product with Details)
|
||||
|
||||
```json
|
||||
{
|
||||
"metadata": { "..." : "..." },
|
||||
"data": [
|
||||
{
|
||||
"name": "Laptop Pro X",
|
||||
"brand": "TechCo",
|
||||
"pricing": {
|
||||
"current": 999.99,
|
||||
"original": 1299.99,
|
||||
"currency": "USD",
|
||||
"discount": "23%"
|
||||
},
|
||||
"rating": {
|
||||
"score": 4.5,
|
||||
"count": 1234
|
||||
},
|
||||
"specifications": {
|
||||
"processor": "M3 Pro",
|
||||
"ram": "16 GB",
|
||||
"storage": "512 GB SSD",
|
||||
"display": "14.2 inch Retina"
|
||||
},
|
||||
"availability": {
|
||||
"inStock": true,
|
||||
"shipsIn": "2-3 business days"
|
||||
}
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
### Multi-URL JSON
|
||||
|
||||
```json
|
||||
{
|
||||
"metadata": {
|
||||
"sources": [
|
||||
"https://store-a.com/laptop-x",
|
||||
"https://store-b.com/laptop-x"
|
||||
],
|
||||
"extractedAt": "2026-02-25T14:30:00Z",
|
||||
"itemCount": 2,
|
||||
"confidence": "HIGH"
|
||||
},
|
||||
"data": [
|
||||
{
|
||||
"source": "store-a.com",
|
||||
"name": "Laptop X",
|
||||
"price": 999,
|
||||
"currency": "USD",
|
||||
"rating": 4.3
|
||||
},
|
||||
{
|
||||
"source": "store-b.com",
|
||||
"name": "Laptop X",
|
||||
"price": 949,
|
||||
"currency": "USD",
|
||||
"rating": 4.5
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## CSV Format
|
||||
|
||||
### Standard CSV
|
||||
|
||||
```csv
|
||||
# Source: https://example.com/products
|
||||
# Extracted: 2026-02-25 14:30 UTC
|
||||
# Items: 3 | Confidence: HIGH
|
||||
name,price,currency,rating,availability
|
||||
"Product Alpha",29.99,USD,4.5,"In Stock"
|
||||
"Product Beta",49.99,USD,4.2,"In Stock"
|
||||
"Product Gamma",119.00,USD,4.8,"Pre-order"
|
||||
```
|
||||
|
||||
### CSV Rules
|
||||
|
||||
| Rule | Example |
|
||||
|:-------------------------------------|:-------------------------------|
|
||||
| Always include header row | `name,price,rating` |
|
||||
| Quote fields with commas | `"Smith, John"` |
|
||||
| Quote fields with quotes (escape) | `"He said ""hello"""` |
|
||||
| Quote fields with newlines | `"Line 1\nLine 2"` |
|
||||
| UTF-8 encoding with BOM | `\xEF\xBB\xBF` prefix |
|
||||
| Comma delimiter (standard) | `,` |
|
||||
| Metadata as comments (# prefix) | `# Source: URL` |
|
||||
| null/missing as empty field | `field1,,field3` |
|
||||
|
||||
### Multi-URL CSV
|
||||
|
||||
```csv
|
||||
# Sources: store-a.com, store-b.com
|
||||
# Extracted: 2026-02-25 14:30 UTC
|
||||
source,name,price,currency,rating
|
||||
"store-a.com","Laptop X",999,USD,4.3
|
||||
"store-b.com","Laptop X",949,USD,4.5
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Summary Statistics Template
|
||||
|
||||
When extracted data contains numeric fields, include a summary block:
|
||||
|
||||
```markdown
|
||||
### Summary Statistics
|
||||
|
||||
| Metric | Price | Rating |
|
||||
|:----------|----------:|-------:|
|
||||
| Count | 47 | 47 |
|
||||
| Min | $12.99 | 2.1 |
|
||||
| Max | $299.99 | 5.0 |
|
||||
| Average | $67.42 | 4.1 |
|
||||
| Median | $54.99 | 4.3 |
|
||||
```
|
||||
|
||||
Include only when:
|
||||
- Data has numeric columns
|
||||
- More than 5 items extracted
|
||||
- User would likely benefit from aggregate view (prices, ratings, quantities)
|
||||
|
||||
---
|
||||
|
||||
## Contact Data Template
|
||||
|
||||
```markdown
|
||||
| Name | Title | Email | Phone |
|
||||
|:---------------|:-------------------|:---------------------|:---------------|
|
||||
| Jane Smith | CEO | jane@example.com | +1-555-0101 |
|
||||
| John Doe | CTO | john@example.com | +1-555-0102 |
|
||||
| Alice Johnson | VP Engineering | alice@example.com | N/A |
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Article Extraction Template
|
||||
|
||||
```markdown
|
||||
## Article: [Title]
|
||||
|
||||
**Author:** Author Name
|
||||
**Published:** YYYY-MM-DD
|
||||
**Source:** [Site Name](URL)
|
||||
|
||||
### Summary
|
||||
[2-3 sentence summary of the article content]
|
||||
|
||||
### Key Data Points
|
||||
- [Factual data point 1]
|
||||
- [Factual data point 2]
|
||||
- [Statistical finding]
|
||||
|
||||
### Tags
|
||||
`tag1` `tag2` `tag3`
|
||||
```
|
||||
|
||||
Note: Summarize article content. Do not reproduce full article text
|
||||
due to copyright.
|
||||
|
||||
---
|
||||
|
||||
## FAQ Extraction Template
|
||||
|
||||
```markdown
|
||||
### FAQ: [Page Title]
|
||||
|
||||
**Source:** [Site Name](URL)
|
||||
**Items:** 12 questions
|
||||
|
||||
| # | Question | Answer (excerpt) |
|
||||
|--:|:---------|:-----------------|
|
||||
| 1 | How do I reset my password? | Navigate to Settings > Security and click "Reset..." |
|
||||
| 2 | What payment methods do you accept? | We accept Visa, Mastercard, PayPal, and bank transfer... |
|
||||
```
|
||||
|
||||
Or as JSON (default for FAQ mode):
|
||||
```json
|
||||
{
|
||||
"metadata": { "source": "URL", "itemCount": 12, "confidence": "HIGH" },
|
||||
"data": [
|
||||
{ "question": "How do I reset my password?", "answer": "Navigate to...", "category": "Account" },
|
||||
{ "question": "What payment methods?", "answer": "We accept...", "category": "Billing" }
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Pricing Plans Template
|
||||
|
||||
```markdown
|
||||
### Pricing: [Product Name]
|
||||
|
||||
**Source:** [Site Name](URL)
|
||||
**Plans:** 3 tiers
|
||||
|
||||
| Plan | Monthly | Annual | Highlighted |
|
||||
|:------------|----------:|----------:|:-----------:|
|
||||
| Starter | $9/mo | $7/mo | |
|
||||
| Pro | $29/mo | $24/mo | * |
|
||||
| Enterprise | Custom | Custom | |
|
||||
|
||||
#### Feature Comparison
|
||||
|
||||
| Feature | Starter | Pro | Enterprise |
|
||||
|:----------------------|:-------:|:---:|:----------:|
|
||||
| Users | 1 | 10 | Unlimited |
|
||||
| Storage | 5 GB | 50 GB | Unlimited |
|
||||
| API Access | N/A | Yes | Yes |
|
||||
| Priority Support | N/A | N/A | Yes |
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Job Listings Template
|
||||
|
||||
```markdown
|
||||
| Title | Company | Location | Salary | Type | Posted |
|
||||
|:-------------------|:------------|:---------------|:----------------|:----------|:-----------|
|
||||
| Senior Engineer | TechCo | Remote, US | $150k - $200k | Full-time | 2026-02-20 |
|
||||
| Product Manager | StartupXYZ | San Francisco | $130k - $160k | Full-time | 2026-02-18 |
|
||||
| Data Analyst | DataCorp | London, UK | GBP 55k - 70k | Contract | 2026-02-22 |
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Events Template
|
||||
|
||||
```markdown
|
||||
| Event | Date | Time | Location | Speakers |
|
||||
|:-----------------------|:-----------|:--------|:------------------|:---------------|
|
||||
| Opening Keynote | 2026-03-15 | 09:00 | Main Hall | J. Smith |
|
||||
| Workshop: AI Basics | 2026-03-15 | 14:00 | Room 201 | A. Johnson |
|
||||
| Networking Reception | 2026-03-15 | 18:00 | Rooftop Lounge | N/A |
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Differential (Diff) Output Template
|
||||
|
||||
When comparing current extraction with a previous run:
|
||||
|
||||
```markdown
|
||||
## Extraction Results (Diff)
|
||||
|
||||
**Source:** [Page Title](URL)
|
||||
**Date:** 2026-02-25 14:30 UTC
|
||||
**Compared to:** 2026-02-20 10:00 UTC
|
||||
**Changes:** +5 new, -2 removed, 3 modified
|
||||
|
||||
---
|
||||
|
||||
### New Items (+5)
|
||||
|
||||
| Name | Price | Rating |
|
||||
|:---------------|--------:|:------:|
|
||||
| Product Eta | $39.99 | 4.6 |
|
||||
| Product Theta | $24.99 | 4.1 |
|
||||
| ... | | |
|
||||
|
||||
### Removed Items (-2)
|
||||
|
||||
| Name | Price | Rating |
|
||||
|:---------------|--------:|:------:|
|
||||
| ~~Product Alpha~~ | ~~$29.99~~ | ~~4.5~~ |
|
||||
| ~~Product Beta~~ | ~~$49.99~~ | ~~4.2~~ |
|
||||
|
||||
### Modified Items (3)
|
||||
|
||||
| Name | Field | Was | Now |
|
||||
|:---------------|:--------|:-----------|:-----------|
|
||||
| Product Gamma | Price | $119.00 | $109.00 |
|
||||
| Product Gamma | Rating | 4.8 | 4.9 |
|
||||
| Product Delta | Stock | Out of Stock | In Stock |
|
||||
|
||||
---
|
||||
|
||||
**Summary:**
|
||||
- 5 new products added since last extraction
|
||||
- 2 products removed (possibly discontinued)
|
||||
- Product Gamma had a price drop of $10 and rating increase
|
||||
- Product Delta is back in stock
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Error / Partial Result Template
|
||||
|
||||
When extraction partially fails:
|
||||
|
||||
```markdown
|
||||
## Extraction Results (Partial)
|
||||
|
||||
**Source:** [Page Title](URL)
|
||||
**Date:** 2026-02-25 14:30 UTC
|
||||
**Items:** 23 of ~50 expected records
|
||||
**Confidence:** LOW
|
||||
**Strategy:** A (WebFetch) -> escalated to B (Browser)
|
||||
|
||||
---
|
||||
|
||||
[PARTIAL DATA]
|
||||
|
||||
---
|
||||
|
||||
**Issues:**
|
||||
- 27 items could not be extracted (content behind JS rendering)
|
||||
- Price field missing for 5 items (marked N/A)
|
||||
- Auto-escalation from WebFetch to Browser recovered 15 additional items
|
||||
|
||||
**Suggestions:**
|
||||
- Re-run with explicit Browser automation for complete results
|
||||
- Check if site has an API endpoint for direct data access
|
||||
- Try at a different time if rate-limited
|
||||
```
|
||||
Reference in New Issue
Block a user