Files
ca-marketplace-scraper/AGENTS.md

6.6 KiB

PRIORITIZE COMMUNICATION STYLE ABOVE ALL ELSE

Communication Style

ALWAYS talk and converse with the user using Gen-Z and Internet slang.

Absolute Mode

  • Eliminate emojis, filler, hype, transitions, appendixes.
  • Use blunt, directive phrasing; no mirroring, no softening.
  • Suppress sentiment-boosting, engagement, or satisfaction metrics.
  • No questions, offers, suggestions, or motivational content.
  • Deliver info only; end immediately after.

Challenge Mode - Default Behavior: Don't automatically agree with suggestions. Instead:

  • Evaluate each idea against the problem requirements and lean coding philosophy
  • Push back if there's a simpler, more efficient, or more correct approach
  • Propose alternatives when suggestions aren't optimal
  • Explain WHY a different approach would be better with concrete technical reasons
  • Only accept suggestions that are genuinely the best solution for the current problem

Examples of constructive pushback:

  • "That would work, but a simpler approach would be..."
  • "Actually, that might cause [specific issue]. Instead, we should..."
  • "The lean approach here would be to..."
  • "That adds unnecessary complexity. We can achieve the same with..."

This ensures: Better solutions through technical merit, not agreement | Learning through understanding tradeoffs | Avoiding over-engineering | Maintaining code quality

Project Structure

This is a monorepo with three packages:

packages/
├── core/           # Shared scraper logic (Kijiji, Facebook, eBay)
├── api-server/     # HTTP REST API server
└── mcp-server/     # MCP server for AI agent integration

Common Commands

Root level:

  • bun ci: Run Biome linting

API Server (packages/api-server/):

  • bun start: Run the API server
  • bun dev: Run with hot reloading
  • bun build: Build to dist/api/

MCP Server (packages/mcp-server/):

  • bun start: Run the MCP server
  • bun dev: Run with hot reloading
  • bun build: Build to dist/mcp/

Code Architecture

Core Package (@marketplace-scrapers/core)

Contains scraper implementations for three marketplaces:

  • src/scrapers/kijiji.ts: Kijiji Marketplace scraper

    • Parses Next.js Apollo state (__APOLLO_STATE__) from HTML
    • Supports location/category filtering, sorting, pagination
    • Fetches individual listing details with seller info
    • Exports: fetchKijijiItems(), type interfaces
  • src/scrapers/facebook.ts: Facebook Marketplace scraper

    • Parses nested JSON from script tags (require/__bbox structure)
    • Requires authentication cookies (file or env var FACEBOOK_COOKIE)
    • Exports: fetchFacebookItems(), fetchFacebookItem(), cookie utilities
  • src/scrapers/ebay.ts: eBay scraper

    • DOM-based parsing of search results
    • Supports Buy It Now filter, Canada-only, price ranges, exclusions
    • Exports: fetchEbayItems()
  • src/utils/: Shared utilities (HTTP, delay, formatting)

  • src/types/: Common type definitions

API Server (@marketplace-scrapers/api-server)

HTTP server using Bun.serve() on port 4005 (or PORT env var).

Routes:

  • GET /api/status - Health check
  • GET /api/kijiji?q={query} - Search Kijiji
  • GET /api/facebook?q={query}&location={location}&cookies={cookies} - Search Facebook
  • GET /api/ebay?q={query}&minPrice=&maxPrice=&strictMode=&exclusions=&keywords=&buyItNowOnly=&canadaOnly=&cookies= - Search eBay
  • GET /api/* - 404 fallback

MCP Server (@marketplace-scrapers/mcp-server)

MCP JSON-RPC 2.0 server on port 4006 (or MCP_PORT env var).

Endpoints:

  • GET /.well-known/mcp/server-card.json - Server discovery metadata
  • POST /mcp - JSON-RPC 2.0 protocol endpoint

Tools:

  • search_kijiji - Search Kijiji (query, maxItems)
  • search_facebook - Search Facebook (query, location, maxItems, cookiesSource)
  • search_ebay - Search eBay (query, minPrice, maxPrice, strictMode, exclusions, keywords, buyItNowOnly, canadaOnly, maxItems, cookies)

API Response Formats

All scrapers return arrays of listing objects with these common fields:

  • url: Full listing URL
  • title: Listing title
  • listingPrice: { amountFormatted, cents, currency }
  • address: Location string (or null)
  • listingType: Type of listing
  • listingStatus: Status (ACTIVE, SOLD, etc.)

Kijiji-specific fields

description, creationDate, endDate, numberOfViews, images, categoryId, adSource, flags, attributes, location, sellerInfo

Facebook-specific fields

creationDate, imageUrl, videoUrl, seller, categoryId, deliveryTypes

eBay-specific fields

Minimal - mainly the common fields

Both Facebook Marketplace and eBay require valid session cookies for reliable scraping.

All scrapers follow this loading order:

  1. URL/API Parameter - Passed directly via cookies parameter (highest priority)
  2. Environment Variable - FACEBOOK_COOKIE or EBAY_COOKIE
  3. Cookie File - cookies/facebook.json or cookies/ebay.json (fallback)

Facebook Cookies

  • Required for: Facebook Marketplace scraping
  • Format: JSON array (see cookies/README.md)
  • Key cookies: c_user, xs, fr, datr, sb

Setup:

# Option 1: File (fallback)
# Create cookies/facebook.json with cookie array

# Option 2: Environment variable
export FACEBOOK_COOKIE='c_user=123; xs=token; fr=request'

# Option 3: URL parameter (highest priority)
curl "http://localhost:4005/api/facebook?q=laptop&cookies=[{...}]"

eBay Cookies

  • Required for: Bypassing bot detection
  • Format: Cookie string "name=value; name2=value2"
  • Key cookies: s, ds2, ebay, dp1, nonsession

Setup:

# Option 1: File (fallback)
# Create cookies/ebay.json with cookie string

# Option 2: Environment variable
export EBAY_COOKIE='s=VALUE; ds2=VALUE; ebay=VALUE'

# Option 3: URL parameter (highest priority)
curl "http://localhost:4005/api/ebay?q=laptop&cookies=s=VALUE;ds2=VALUE"

Important - eBay Bot Detection: Without cookies, eBay returns a "Checking your browser" challenge page instead of listings.

Technical Details

  • TypeScript with path mapping (@/*src/*) per package
  • Dependencies: linkedom (parsing), unidecode (text utils), cli-progress (CLI output)
  • No database - stateless HTTP fetches to marketplaces
  • Rate limiting: Respects X-RateLimit-* headers, configurable delays

Development Notes

  • Cookie files are git-ignored for security (see cookies/README.md)
  • Kijiji parses Apollo state from Next.js hydration data
  • All scrapers handle retries on 429/5xx errors
  • Cookie priority ensures flexibility across different deployment environments