Files
ca-marketplace-scraper/AGENTS.md

179 lines
6.6 KiB
Markdown

PRIORITIZE COMMUNICATION STYLE ABOVE ALL ELSE
## Communication Style
ALWAYS talk and converse with the user using Gen-Z and Internet slang.
Absolute Mode
- Eliminate emojis, filler, hype, transitions, appendixes.
- Use blunt, directive phrasing; no mirroring, no softening.
- Suppress sentiment-boosting, engagement, or satisfaction metrics.
- No questions, offers, suggestions, or motivational content.
- Deliver info only; end immediately after.
**Challenge Mode - Default Behavior**: Don't automatically agree with suggestions. Instead:
- Evaluate each idea against the problem requirements and lean coding philosophy
- Push back if there's a simpler, more efficient, or more correct approach
- Propose alternatives when suggestions aren't optimal
- Explain WHY a different approach would be better with concrete technical reasons
- Only accept suggestions that are genuinely the best solution for the current problem
Examples of constructive pushback:
- "That would work, but a simpler approach would be..."
- "Actually, that might cause [specific issue]. Instead, we should..."
- "The lean approach here would be to..."
- "That adds unnecessary complexity. We can achieve the same with..."
This ensures: Better solutions through technical merit, not agreement | Learning through understanding tradeoffs | Avoiding over-engineering | Maintaining code quality
## Project Structure
This is a **monorepo** with three packages:
```
packages/
├── core/ # Shared scraper logic (Kijiji, Facebook, eBay)
├── api-server/ # HTTP REST API server
└── mcp-server/ # MCP server for AI agent integration
```
## Common Commands
**Root level:**
- `bun ci`: Run Biome linting
**API Server (`packages/api-server/`):**
- `bun start`: Run the API server
- `bun dev`: Run with hot reloading
- `bun build`: Build to `dist/api/`
**MCP Server (`packages/mcp-server/`):**
- `bun start`: Run the MCP server
- `bun dev`: Run with hot reloading
- `bun build`: Build to `dist/mcp/`
## Code Architecture
### Core Package (`@marketplace-scrapers/core`)
Contains scraper implementations for three marketplaces:
- **`src/scrapers/kijiji.ts`**: Kijiji Marketplace scraper
- Parses Next.js Apollo state (`__APOLLO_STATE__`) from HTML
- Supports location/category filtering, sorting, pagination
- Fetches individual listing details with seller info
- Exports: `fetchKijijiItems()`, type interfaces
- **`src/scrapers/facebook.ts`**: Facebook Marketplace scraper
- Parses nested JSON from script tags (`require/__bbox` structure)
- Requires authentication cookies (file or env var `FACEBOOK_COOKIE`)
- Exports: `fetchFacebookItems()`, `fetchFacebookItem()`, cookie utilities
- **`src/scrapers/ebay.ts`**: eBay scraper
- DOM-based parsing of search results
- Supports Buy It Now filter, Canada-only, price ranges, exclusions
- Exports: `fetchEbayItems()`
- **`src/utils/`**: Shared utilities (HTTP, delay, formatting)
- **`src/types/`**: Common type definitions
### API Server (`@marketplace-scrapers/api-server`)
HTTP server using `Bun.serve()` on port 4005 (or `PORT` env var).
**Routes:**
- `GET /api/status` - Health check
- `GET /api/kijiji?q={query}` - Search Kijiji
- `GET /api/facebook?q={query}&location={location}&cookies={cookies}` - Search Facebook
- `GET /api/ebay?q={query}&minPrice=&maxPrice=&strictMode=&exclusions=&keywords=&buyItNowOnly=&canadaOnly=&cookies=` - Search eBay
- `GET /api/*` - 404 fallback
### MCP Server (`@marketplace-scrapers/mcp-server`)
MCP JSON-RPC 2.0 server on port 4006 (or `MCP_PORT` env var).
**Endpoints:**
- `GET /.well-known/mcp/server-card.json` - Server discovery metadata
- `POST /mcp` - JSON-RPC 2.0 protocol endpoint
**Tools:**
- `search_kijiji` - Search Kijiji (query, maxItems)
- `search_facebook` - Search Facebook (query, location, maxItems, cookiesSource)
- `search_ebay` - Search eBay (query, minPrice, maxPrice, strictMode, exclusions, keywords, buyItNowOnly, canadaOnly, maxItems, cookies)
## API Response Formats
All scrapers return arrays of listing objects with these common fields:
- `url`: Full listing URL
- `title`: Listing title
- `listingPrice`: `{ amountFormatted, cents, currency }`
- `address`: Location string (or null)
- `listingType`: Type of listing
- `listingStatus`: Status (ACTIVE, SOLD, etc.)
### Kijiji-specific fields
`description`, `creationDate`, `endDate`, `numberOfViews`, `images`, `categoryId`, `adSource`, `flags`, `attributes`, `location`, `sellerInfo`
### Facebook-specific fields
`creationDate`, `imageUrl`, `videoUrl`, `seller`, `categoryId`, `deliveryTypes`
### eBay-specific fields
Minimal - mainly the common fields
## Cookie Management
Both **Facebook Marketplace** and **eBay** require valid session cookies for reliable scraping.
### Cookie Priority Hierarchy (High → Low)
All scrapers follow this loading order:
1. **URL/API Parameter** - Passed directly via `cookies` parameter (highest priority)
2. **Environment Variable** - `FACEBOOK_COOKIE` or `EBAY_COOKIE`
3. **Cookie File** - `cookies/facebook.json` or `cookies/ebay.json` (fallback)
### Facebook Cookies
- **Required for**: Facebook Marketplace scraping
- **Format**: JSON array (see `cookies/README.md`)
- **Key cookies**: `c_user`, `xs`, `fr`, `datr`, `sb`
**Setup:**
```bash
# Option 1: File (fallback)
# Create cookies/facebook.json with cookie array
# Option 2: Environment variable
export FACEBOOK_COOKIE='c_user=123; xs=token; fr=request'
# Option 3: URL parameter (highest priority)
curl "http://localhost:4005/api/facebook?q=laptop&cookies=[{...}]"
```
### eBay Cookies
- **Required for**: Bypassing bot detection
- **Format**: Cookie string `"name=value; name2=value2"`
- **Key cookies**: `s`, `ds2`, `ebay`, `dp1`, `nonsession`
**Setup:**
```bash
# Option 1: File (fallback)
# Create cookies/ebay.json with cookie string
# Option 2: Environment variable
export EBAY_COOKIE='s=VALUE; ds2=VALUE; ebay=VALUE'
# Option 3: URL parameter (highest priority)
curl "http://localhost:4005/api/ebay?q=laptop&cookies=s=VALUE;ds2=VALUE"
```
**Important - eBay Bot Detection**: Without cookies, eBay returns a "Checking your browser" challenge page instead of listings.
## Technical Details
- **TypeScript** with path mapping (`@/*``src/*`) per package
- **Dependencies**: linkedom (parsing), unidecode (text utils), cli-progress (CLI output)
- **No database** - stateless HTTP fetches to marketplaces
- **Rate limiting**: Respects `X-RateLimit-*` headers, configurable delays
## Development Notes
- **Cookie files** are git-ignored for security (see `cookies/README.md`)
- Kijiji parses Apollo state from Next.js hydration data
- All scrapers handle retries on 429/5xx errors
- Cookie priority ensures flexibility across different deployment environments