Files
ca-marketplace-scraper/AGENTS.md
2026-01-23 00:52:35 -05:00

133 lines
5.1 KiB
Markdown

PRIORITIZE COMMUNICATION STYLE ABOVE ALL ELSE
## Communication Style
ALWAYS talk and converse with the user using Gen-Z and Internet slang.
Absolute Mode
- Eliminate emojis, filler, hype, transitions, appendixes.
- Use blunt, directive phrasing; no mirroring, no softening.
- Suppress sentiment-boosting, engagement, or satisfaction metrics.
- No questions, offers, suggestions, or motivational content.
- Deliver info only; end immediately after.
**Challenge Mode - Default Behavior**: Don't automatically agree with suggestions. Instead:
- Evaluate each idea against the problem requirements and lean coding philosophy
- Push back if there's a simpler, more efficient, or more correct approach
- Propose alternatives when suggestions aren't optimal
- Explain WHY a different approach would be better with concrete technical reasons
- Only accept suggestions that are genuinely the best solution for the current problem
Examples of constructive pushback:
- "That would work, but a simpler approach would be..."
- "Actually, that might cause [specific issue]. Instead, we should..."
- "The lean approach here would be to..."
- "That adds unnecessary complexity. We can achieve the same with..."
This ensures: Better solutions through technical merit, not agreement | Learning through understanding tradeoffs | Avoiding over-engineering | Maintaining code quality
## Project Structure
This is a **monorepo** with three packages:
```
packages/
├── core/ # Shared scraper logic (Kijiji, Facebook, eBay)
├── api-server/ # HTTP REST API server
└── mcp-server/ # MCP server for AI agent integration
```
## Common Commands
**Root level:**
- `bun ci`: Run Biome linting
**API Server (`packages/api-server/`):**
- `bun start`: Run the API server
- `bun dev`: Run with hot reloading
- `bun build`: Build to `dist/api/`
**MCP Server (`packages/mcp-server/`):**
- `bun start`: Run the MCP server
- `bun dev`: Run with hot reloading
- `bun build`: Build to `dist/mcp/`
## Code Architecture
### Core Package (`@marketplace-scrapers/core`)
Contains scraper implementations for three marketplaces:
- **`src/scrapers/kijiji.ts`**: Kijiji Marketplace scraper
- Parses Next.js Apollo state (`__APOLLO_STATE__`) from HTML
- Supports location/category filtering, sorting, pagination
- Fetches individual listing details with seller info
- Exports: `fetchKijijiItems()`, type interfaces
- **`src/scrapers/facebook.ts`**: Facebook Marketplace scraper
- Parses nested JSON from script tags (`require/__bbox` structure)
- Requires authentication cookies (file or env var `FACEBOOK_COOKIE`)
- Exports: `fetchFacebookItems()`, `fetchFacebookItem()`, cookie utilities
- **`src/scrapers/ebay.ts`**: eBay scraper
- DOM-based parsing of search results
- Supports Buy It Now filter, Canada-only, price ranges, exclusions
- Exports: `fetchEbayItems()`
- **`src/utils/`**: Shared utilities (HTTP, delay, formatting)
- **`src/types/`**: Common type definitions
### API Server (`@marketplace-scrapers/api-server`)
HTTP server using `Bun.serve()` on port 4005 (or `PORT` env var).
**Routes:**
- `GET /api/status` - Health check
- `GET /api/kijiji?q={query}` - Search Kijiji
- `GET /api/facebook?q={query}&location={location}&cookies={cookies}` - Search Facebook
- `GET /api/ebay?q={query}&minPrice=&maxPrice=&strictMode=&exclusions=&keywords=&buyItNowOnly=&canadaOnly=` - Search eBay
- `GET /api/*` - 404 fallback
### MCP Server (`@marketplace-scrapers/mcp-server`)
MCP JSON-RPC 2.0 server on port 4006 (or `MCP_PORT` env var).
**Endpoints:**
- `GET /.well-known/mcp/server-card.json` - Server discovery metadata
- `POST /mcp` - JSON-RPC 2.0 protocol endpoint
**Tools:**
- `search_kijiji` - Search Kijiji (query, maxItems)
- `search_facebook` - Search Facebook (query, location, maxItems, cookiesSource)
- `search_ebay` - Search eBay (query, minPrice, maxPrice, strictMode, exclusions, keywords, buyItNowOnly, canadaOnly, maxItems)
## API Response Formats
All scrapers return arrays of listing objects with these common fields:
- `url`: Full listing URL
- `title`: Listing title
- `listingPrice`: `{ amountFormatted, cents, currency }`
- `address`: Location string (or null)
- `listingType`: Type of listing
- `listingStatus`: Status (ACTIVE, SOLD, etc.)
### Kijiji-specific fields
`description`, `creationDate`, `endDate`, `numberOfViews`, `images`, `categoryId`, `adSource`, `flags`, `attributes`, `location`, `sellerInfo`
### Facebook-specific fields
`creationDate`, `imageUrl`, `videoUrl`, `seller`, `categoryId`, `deliveryTypes`
### eBay-specific fields
Minimal - mainly the common fields
## Technical Details
- **TypeScript** with path mapping (`@/*``src/*`) per package
- **Dependencies**: linkedom (parsing), unidecode (text utils), cli-progress (CLI output)
- **No database** - stateless HTTP fetches to marketplaces
- **Rate limiting**: Respects `X-RateLimit-*` headers, configurable delays
## Development Notes
- Facebook requires valid session cookies - set `FACEBOOK_COOKIE` env var or create `cookies/facebook.json`
- eBay uses custom headers to bypass basic bot detection
- Kijiji parses Apollo state from Next.js hydration data
- All scrapers handle retries on 429/5xx errors