ca-marketplace-scraper/CLAUDE.md
2025-10-02 10:39:49 -04:00

CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Common Commands

  • bun start: Run the server in production mode.
  • bun dev: Run the server with hot reloading for development.
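  • bun run build: Build the application into a single executable file. If build is defined as a package.json script like the commands above, the run prefix is required, because a bare bun build invokes Bun's built-in bundler instead.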

No linting or testing scripts are configured. To run a single test or a lint pass, first add the corresponding script to package.json.

Code Architecture

This is a lightweight Bun-based API server for scraping marketplace listings from Kijiji and Facebook Marketplace in the Greater Toronto Area (GTA).

  • Entry Point (src/index.ts): Implements a basic HTTP server using Bun.serve. Key routes:

    • GET /api/status: Health check returning "OK".
    • POST/GET /api/kijiji: Accepts a search query via the query request header or the q URL parameter, scrapes Kijiji for up to 5 results (configurable), and returns JSON with listing details (title, price, description, etc.).
    • POST/GET /api/facebook: Same interface as /api/kijiji, but for Facebook Marketplace. Accepts an optional location parameter (default "toronto"). Note: full access requires authentication cookies.
    • Fallback: 404 for unmatched routes.
  • Kijiji Scraping (src/kijiji.ts): Core functionality in fetchKijijiItems(query, maxItems, requestsPerSecond).

    • Slugifies the query using unidecode for URL-safe search terms.
    • Fetches the search page HTML, parses Next.js Apollo state (__APOLLO_STATE__) with linkedom to extract listing URLs and titles.
    • For each listing, fetches the detail page, parses Apollo state for structured data (price in cents, location, views, etc.).
    • Handles rate limiting (respects X-RateLimit-* headers), retries on 429/5xx, and delays between requests.
    • Uses cli-progress for console progress bar during batch fetches.
    • Filters results to include only priced items.
  • Facebook Scraping (src/facebook.ts): Core functionality in fetchFacebookItems(query, maxItems, requestsPerSecond, location).

    • Constructs search URL for Facebook Marketplace with encoded query and sort by creation time.
    • Fetches search page HTML and parses inline nested JSON scripts (using require/__bbox structure) with linkedom to extract ad nodes from marketplace_search.feed_units.edges.
    • Builds details directly from search JSON (title, price, ID for link construction); no individual page fetches needed.
    • Handles delays and retries in the same way as the Kijiji scraper.
    • Uses cli-progress for a console progress bar during fetches.
    • Filters results to include only priced items. Note: relies on public access or provided cookies, and may return limited results without login.
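
The routing described under Entry Point can be sketched as a pure function, which keeps it testable without starting a server. This is a minimal sketch, not the code in src/index.ts: matchRoute, the Route type, the badRequest case, and giving the q parameter precedence over the query header are all assumptions made for illustration.

```typescript
// Hypothetical routing sketch; names are illustrative, not taken from src/index.ts.
type Route =
  | { kind: "status" }
  | { kind: "search"; site: "kijiji" | "facebook"; query: string; location?: string }
  | { kind: "badRequest"; reason: string } // assumed behavior for a missing query
  | { kind: "notFound" };

function matchRoute(
  method: string,
  rawUrl: string,
  headers: Record<string, string | undefined> = {},
): Route {
  const url = new URL(rawUrl);
  const path = url.pathname;

  // Health check.
  if (method === "GET" && path === "/api/status") return { kind: "status" };

  const site = path === "/api/kijiji" ? "kijiji" : path === "/api/facebook" ? "facebook" : null;
  if (site && (method === "GET" || method === "POST")) {
    // Search text comes from the q parameter or the query header (precedence is assumed).
    const query = (url.searchParams.get("q") ?? headers["query"] ?? "").trim();
    if (!query) return { kind: "badRequest", reason: "missing search query" };
    if (site === "facebook") {
      // Only the Facebook route takes a location; it defaults to "toronto".
      const location = url.searchParams.get("location") ?? "toronto";
      return { kind: "search", site, query, location };
    }
    return { kind: "search", site, query };
  }

  // Fallback for unmatched routes.
  return { kind: "notFound" };
}
```

Inside the fetch handler passed to Bun.serve, a function like this could be called with req.method, req.url, and the relevant header, with the Response built from the returned descriptor.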
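
For the Kijiji flow, the slug, extraction, and filtering steps might look roughly like the following. It is a sketch under stated assumptions: the built-in String.prototype.normalize stands in for unidecode so the example has no dependencies, and the "Listing:" cache-key prefix and the url, title, and priceCents field names are hypothetical rather than confirmed against Kijiji's actual Apollo state.

```typescript
// Stand-in for unidecode: strip diacritics via Unicode normalization, then hyphenate.
function slugify(query: string): string {
  return query
    .normalize("NFD")
    .replace(/[\u0300-\u036f]/g, "") // drop combining accent marks
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse everything else into single hyphens
    .replace(/^-+|-+$/g, ""); // trim leading and trailing hyphens
}

// Hypothetical shape of one listing after parsing.
interface Listing {
  url: string;
  title: string;
  priceCents: number | null;
}

// Apollo's normalized cache keys entries as "TypeName:id"; the prefix used here is assumed.
function extractListings(state: Record<string, unknown>, prefix = "Listing:"): Listing[] {
  const listings: Listing[] = [];
  for (const [key, value] of Object.entries(state)) {
    if (!key.startsWith(prefix) || typeof value !== "object" || value === null) continue;
    const { url, title, priceCents } = value as {
      url?: unknown;
      title?: unknown;
      priceCents?: unknown;
    };
    if (typeof url !== "string" || typeof title !== "string") continue;
    listings.push({ url, title, priceCents: typeof priceCents === "number" ? priceCents : null });
  }
  return listings;
}

// Keep only listings that carry a numeric price.
function onlyPriced(listings: Listing[]): Listing[] {
  return listings.filter((listing) => listing.priceCents !== null);
}
```

Keeping the price as integer cents until display time avoids floating-point rounding when values are compared or summed.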
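
The rate-limit and retry behavior mentioned for both scrapers could be reduced to two small decisions: whether a status code is worth retrying, and how long to wait before the next request. The sketch below is illustrative only. The exact X-RateLimit-* header names, and the reading of the reset header as a Unix timestamp in seconds, are assumptions; the source only says such headers are respected.

```typescript
// Retry on 429 (Too Many Requests) and on any 5xx server error.
function shouldRetry(status: number): boolean {
  return status === 429 || (status >= 500 && status <= 599);
}

// Compute the pause before the next request, in milliseconds.
// Header names are assumed and expected in lowercase.
function nextDelayMs(
  headers: Record<string, string | undefined>,
  requestsPerSecond: number,
  nowMs: number = Date.now(),
): number {
  // Baseline spacing derived from the requestsPerSecond budget.
  const baseDelay = 1000 / requestsPerSecond;

  const remaining = headers["x-ratelimit-remaining"];
  const reset = headers["x-ratelimit-reset"];
  if (remaining !== undefined && reset !== undefined && Number(remaining) <= 0) {
    // Assumption: reset is a Unix timestamp in seconds.
    const waitMs = Number(reset) * 1000 - nowMs;
    if (Number.isFinite(waitMs)) return Math.max(baseDelay, waitMs);
  }
  return baseDelay;
}
```

Passing nowMs explicitly keeps the function deterministic, so the wait calculation can be checked without mocking the clock.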
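
For the Facebook flow, the search URL and the walk down to marketplace_search.feed_units.edges might be sketched as follows. Everything specific here is an assumption rather than the code in src/facebook.ts: the URL shape, the sortBy value, the item link format, and the helper names. Because the require/__bbox nesting is deep and may shift, the sketch searches recursively for the marketplace_search key instead of hard-coding a path; it takes already-parsed JSON, so the linkedom step is left out.

```typescript
// Assumed URL shape; the sortBy value is a guess at how "sort by creation time" is encoded.
function buildSearchUrl(query: string, location = "toronto"): string {
  const params = new URLSearchParams({ query, sortBy: "creation_time_descend" });
  const place = encodeURIComponent(location);
  return `https://www.facebook.com/marketplace/${place}/search?${params.toString()}`;
}

// Depth-first search for the first object that owns the given key.
function findKey(node: unknown, key: string): unknown {
  if (Array.isArray(node)) {
    for (const item of node) {
      const hit = findKey(item, key);
      if (hit !== undefined) return hit;
    }
    return undefined;
  }
  if (typeof node === "object" && node !== null) {
    const obj = node as Record<string, unknown>;
    if (key in obj) return obj[key];
    for (const value of Object.values(obj)) {
      const hit = findKey(value, key);
      if (hit !== undefined) return hit;
    }
  }
  return undefined;
}

// Pull the ad edges out of one parsed inline script payload.
function extractEdges(scriptJson: unknown): unknown[] {
  const search = findKey(scriptJson, "marketplace_search") as
    | { feed_units?: { edges?: unknown[] } }
    | undefined;
  return search?.feed_units?.edges ?? [];
}

// Assumed link format built from a listing ID.
function itemUrl(id: string): string {
  return `https://www.facebook.com/marketplace/item/${id}/`;
}
```

A script payload that does not contain marketplace_search simply yields an empty array, so a caller can run extractEdges over every inline script and concatenate the results.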

The project uses TypeScript with path mapping (@/* to src/*). Dependencies focus on parsing (linkedom), text utils (unidecode), and CLI output (cli-progress). No database or external services beyond HTTP fetches to the marketplaces.

Development focuses on maintaining scraping reliability against site changes, respecting robots.txt/terms of service, and handling anti-bot measures ethically. For Facebook, ensure compliance with authentication requirements.