ca-marketplace-scraper/CLAUDE.md
Dmytro Stanchiev 8c52efe5e7 feat(facebook): parse additional listing details like status, images, and seller info
Enhance Facebook scraping to extract listing status (ACTIVE/SOLD/PENDING/HIDDEN), primary image/video URLs, seller name/ID, category ID, and delivery options, improving response completeness.
2025-10-02 12:03:59 -04:00

CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Common Commands

  • bun start: Run the server in production mode.
  • bun dev: Run the server with hot reloading for development.
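  • bun run build: Build the application into a single executable file. A bare bun build invokes Bun's built-in bundler rather than the package.json script, so use the explicit bun run form here.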

No linting or testing scripts are configured. To run lint checks or individual tests, first add the corresponding scripts to package.json.

Code Architecture

This is a lightweight Bun-based API server for scraping marketplace listings from Kijiji and Facebook Marketplace in the Greater Toronto Area (GTA).

  • Entry Point (src/index.ts): Implements a basic HTTP server using Bun.serve. Key routes:
    • GET /api/status: Health check returning "OK".
    • GET /api/kijiji?q={query}: Scrapes Kijiji for listings matching the search query. Returns a JSON array of listing objects.
    • GET /api/facebook?q={query}&location={location}&cookies={cookies}: Scrapes Facebook Marketplace for listings matching the search query. Requires Facebook session cookies, supplied either through the cookies URL parameter or a cookies/facebook.json file. The location parameter is optional and defaults to "toronto". Returns a JSON array of listing objects.
    • Fallback: 404 for unmatched routes.
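
The route list above can be sketched as a small pure matching function. This is an illustration only, not the code in src/index.ts: the names RouteMatch and matchRoute are hypothetical, and the "badRequest" outcome for a missing q parameter is an assumption, since the behaviour for that case is not documented here.

```typescript
// Hypothetical result type: one variant per documented route, plus
// an assumed "badRequest" case for a missing search query.
type RouteMatch =
  | { route: "status" }
  | { route: "kijiji"; query: string }
  | { route: "facebook"; query: string; location: string; cookies: string | null }
  | { route: "badRequest"; reason: string }
  | { route: "notFound" };

function matchRoute(method: string, rawUrl: string): RouteMatch {
  const url = new URL(rawUrl);
  if (method !== "GET") return { route: "notFound" };

  // Both scraping routes take the search term in the q parameter.
  const query = url.searchParams.get("q")?.trim() ?? "";

  switch (url.pathname) {
    case "/api/status":
      return { route: "status" };
    case "/api/kijiji":
      return query
        ? { route: "kijiji", query }
        : { route: "badRequest", reason: "missing q parameter" };
    case "/api/facebook":
      if (!query) return { route: "badRequest", reason: "missing q parameter" };
      return {
        route: "facebook",
        query,
        // location is optional and defaults to "toronto".
        location: url.searchParams.get("location") ?? "toronto",
        // null here would mean "fall back to cookies/facebook.json".
        cookies: url.searchParams.get("cookies"),
      };
    default:
      return { route: "notFound" };
  }
}
```

Inside the fetch handler passed to Bun.serve, the result of such a function would then be turned into a Response (for example, a 404 for "notFound").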

API Response Formats

Both endpoints return arrays of listing objects, but the fields differ because each marketplace exposes different data.

Kijiji API Response Object

{
  "url": "https://www.kijiji.ca/v-laptops/city-of-toronto/...",
  "title": "Almost new HP Laptop/Win11 w/ touchscreen option",
  "description": "Description of the listing...",
  "listingPrice": {
    "amountFormatted": "149.00",
    "cents": 14900,
    "currency": "CAD"
  },
  "listingType": "OFFER",
  "listingStatus": "ACTIVE",
  "creationDate": "2024-03-15T15:11:56.000Z",
  "endDate": "3000-01-01T00:00:00.000Z",
  "numberOfViews": 2005,
  "address": "SPADINA AVENUE, Toronto, ON, M5T 2H7"
}

Facebook API Response Object

{
  "url": "https://www.facebook.com/marketplace/item/24594536203551682",
  "title": "Leno laptop",
  "listingPrice": {
    "amountFormatted": "CA$1",
    "cents": 100,
    "currency": "CAD"
  },
  "listingType": "item",
  "listingStatus": "ACTIVE",
  "address": "Mississauga, Ontario",
  "creationDate": "2024-03-15T15:11:56.000Z",
  "categoryId": "1792291877663080",
  "imageUrl": "https://scontent-yyz1-1.xx.fbcdn.net/...",
  "videoUrl": "https://www.facebook.com/1300609777949414/",
  "seller": {
    "name": "Joyce Diaz",
    "id": "100091799187797"
  },
  "deliveryTypes": ["IN_PERSON"]
}

Common Fields

  • url: Full URL to the listing
  • title: Listing title
  • listingPrice: Price object with amountFormatted (human-readable), cents (integer cents), currency (e.g., "CAD")
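  • address: Location string (or null if unavailable)
  • listingType: Kind of listing as reported by the marketplace ("OFFER" in the Kijiji sample, "item" in the Facebook sample)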
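
As a rough sketch, the shared fields could be typed as follows. The interface names and the toListingPrice helper are hypothetical and not taken from the repository. The helper formats amounts the way the Kijiji sample does ("149.00"); the Facebook sample uses a different display format ("CA$1").

```typescript
// Price object as it appears in both sample responses.
interface ListingPrice {
  amountFormatted: string; // human-readable amount
  cents: number;           // integer number of cents
  currency: string;        // e.g. "CAD"
}

// Fields documented as common to both marketplaces.
interface CommonListingFields {
  url: string;
  title: string;
  listingPrice: ListingPrice;
  address: string | null;
}

// Hypothetical helper: builds a price object from integer cents,
// using the plain "dollars.cents" style of the Kijiji sample.
function toListingPrice(cents: number, currency = "CAD"): ListingPrice {
  if (!Number.isInteger(cents) || cents < 0) {
    throw new RangeError("cents must be a non-negative integer");
  }
  const dollars = Math.floor(cents / 100);
  const remainder = String(cents % 100).padStart(2, "0");
  return { amountFormatted: `${dollars}.${remainder}`, cents, currency };
}

const sampleListing: CommonListingFields = {
  url: "https://www.kijiji.ca/v-laptops/city-of-toronto/...",
  title: "Almost new HP Laptop/Win11 w/ touchscreen option",
  listingPrice: toListingPrice(14900),
  address: null,
};
```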

Kijiji-Only Fields

  • description: Detailed description text (Facebook search results don't include descriptions)
  • endDate: When listing expires (Facebook doesn't have expiration dates in search results)
  • numberOfViews: View count (Facebook doesn't expose view metrics in search results)

Facebook-Only Fields

  • listingStatus: Derived from the is_live, is_pending, is_sold, and is_hidden flags; one of "ACTIVE", "SOLD", "PENDING", or "HIDDEN". (The Kijiji sample above also includes listingStatus; only this derivation is Facebook-specific.)
  • creationDate: When the listing was posted, if Facebook provides it. (The Kijiji sample above also includes creationDate.)
  • categoryId: Facebook Marketplace category identifier
  • imageUrl: Primary listing photo URL
  • videoUrl: Listing video URL (if a video exists)
  • seller: Object with the seller's name and Facebook user ID
  • deliveryTypes: Available delivery options (e.g., ["IN_PERSON", "SHIPPING"])
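
The listingStatus derivation could look roughly like the sketch below. The function name, the precedence order when several flags are set (sold, then pending, then hidden, then live), and the null result when no flag is set are all assumptions; the source only states which flags feed into which status values.

```typescript
type ListingStatus = "ACTIVE" | "SOLD" | "PENDING" | "HIDDEN";

// Flags named in the field description; any of them may be absent.
interface FacebookStateFlags {
  is_live?: boolean;
  is_pending?: boolean;
  is_sold?: boolean;
  is_hidden?: boolean;
}

// Assumed precedence: a sold or pending listing may still be flagged
// live, so the more specific states are checked first.
function deriveListingStatus(flags: FacebookStateFlags): ListingStatus | null {
  if (flags.is_sold) return "SOLD";
  if (flags.is_pending) return "PENDING";
  if (flags.is_hidden) return "HIDDEN";
  if (flags.is_live) return "ACTIVE";
  return null; // assumption: unknown when no flag is set
}
```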

Scraping Modules

  • Kijiji Scraping (src/kijiji.ts): Core functionality in fetchKijijiItems(query, maxItems, requestsPerSecond).
    • Slugifies the query using unidecode for URL-safe search terms.
    • Fetches the search page HTML, parses Next.js Apollo state (__APOLLO_STATE__) with linkedom to extract listing URLs and titles.
    • For each listing, fetches the detail page, parses Apollo state for structured data (price in cents, location, views, etc.).
    • Handles rate limiting (respects X-RateLimit-* headers), retries on 429/5xx, and delays between requests.
    • Uses cli-progress for console progress bar during batch fetches.
    • Filters results to include only priced items.
  • Facebook Scraping (src/facebook.ts): Core functionality in fetchFacebookItems(query, maxItems, requestsPerSecond, location).
    • Constructs the Facebook Marketplace search URL with the encoded query, sorted by creation time.
    • Fetches the search page HTML and parses the inline nested JSON scripts (using the require/__bbox structure) with linkedom to extract ad nodes from marketplace_search.feed_units.edges.
    • Builds listing details directly from the search JSON (title, price, ID for link construction); no individual page fetches are needed.
    • Handles delays and retries in the same way as the Kijiji scraper.
    • Uses cli-progress for a console progress bar.
    • Filters results to include only priced items.
    • Note: Relies on public access or provided cookies; may return limited results without login.
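
To illustrate the Facebook extraction step, the sketch below searches an already parsed JSON value for marketplace_search and then reads feed_units.edges from it. It assumes the inline script content has already been pulled out of the HTML and passed through JSON.parse. The helper names are hypothetical, and a generic depth-first search stands in for whatever exact traversal src/facebook.ts performs.

```typescript
type JsonObject = { [key: string]: Json };
type Json = null | boolean | number | string | Json[] | JsonObject;

function isJsonObject(value: Json | undefined): value is JsonObject {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Depth-first search for the first value stored under `key`,
// looking through nested arrays and objects.
function findByKey(value: Json, key: string): Json | undefined {
  if (Array.isArray(value)) {
    for (const item of value) {
      const hit = findByKey(item, key);
      if (hit !== undefined) return hit;
    }
    return undefined;
  }
  if (!isJsonObject(value)) return undefined;
  if (Object.prototype.hasOwnProperty.call(value, key)) return value[key];
  for (const child of Object.values(value)) {
    const hit = findByKey(child, key);
    if (hit !== undefined) return hit;
  }
  return undefined;
}

// Returns the edges array, or an empty array when any part of the
// marketplace_search.feed_units.edges path is missing.
function extractFeedEdges(payload: Json): Json[] {
  const search = findByKey(payload, "marketplace_search");
  if (!isJsonObject(search)) return [];
  const feedUnits = search["feed_units"];
  if (!isJsonObject(feedUnits)) return [];
  const edges = feedUnits["edges"];
  return Array.isArray(edges) ? edges : [];
}
```

Returning an empty array rather than throwing keeps a markup change on Facebook's side from crashing the request; the caller simply sees no listings.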

The project uses TypeScript with path mapping (@/* to src/*). Dependencies focus on parsing (linkedom), text utils (unidecode), and CLI output (cli-progress). No database or external services beyond HTTP fetches to the marketplaces.
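
As an illustration of the query slugification mentioned for the Kijiji scraper, here is a dependency-free stand-in. It is not the repository's implementation: it swaps unidecode for String.prototype.normalize, which only strips combining diacritics and does not transliterate non-Latin scripts the way unidecode does. The hyphen-separated output format is also an assumption.

```typescript
// Stand-in for a unidecode-based slugifier (see caveats above).
function slugify(query: string): string {
  return query
    .normalize("NFD")                 // split letters from their accents
    .replace(/[\u0300-\u036f]/g, "")  // drop the combining accent marks
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")      // collapse everything else to hyphens
    .replace(/^-+|-+$/g, "");         // trim leading/trailing hyphens
}
```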

Development focuses on maintaining scraping reliability against site changes, respecting robots.txt/terms of service, and handling anti-bot measures ethically. For Facebook, ensure compliance with authentication requirements.
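
Part of that reliability is retrying politely. The sketch below captures the retry decision described for both scrapers (retry on 429 and 5xx, back off between attempts) as a pure function. Everything specific in it is an assumption: the function name, the default limits, the exponential backoff, and in particular the reading of X-RateLimit-Reset as "seconds until the window resets", since the source only says that X-RateLimit-* headers are respected.

```typescript
// Minimal view of response headers, compatible with the Fetch API's Headers.
interface HeaderLookup {
  get(name: string): string | null;
}

// Returns how long to wait (in ms) before retrying, or null when the
// request should not be retried. `attempt` is zero-based.
function retryDelayMs(
  status: number,
  headers: HeaderLookup,
  attempt: number,
  maxRetries = 3,
  baseDelayMs = 1000,
): number | null {
  const retryable = status === 429 || (status >= 500 && status <= 599);
  if (!retryable || attempt >= maxRetries) return null;

  // Assumption: the header value is a number of seconds to wait.
  const raw = headers.get("X-RateLimit-Reset");
  const resetSeconds = raw === null ? NaN : Number(raw);
  if (Number.isFinite(resetSeconds) && resetSeconds > 0) {
    return Math.ceil(resetSeconds * 1000);
  }

  // Otherwise fall back to exponential backoff: 1s, 2s, 4s, ...
  return baseDelayMs * 2 ** attempt;
}
```

Keeping the decision separate from the actual fetch call makes it easy to unit-test without network access.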