CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
Common Commands
- bun start: Run the server in production mode.
- bun dev: Run the server with hot reloading for development.
- bun build: Build the application into a single executable file.
No linting or testing scripts are configured. To run lint checks or individual tests, first add the corresponding scripts to package.json.
Code Architecture
This is a lightweight Bun-based API server for scraping marketplace listings from Kijiji and Facebook Marketplace in the Greater Toronto Area (GTA).
- Entry Point (src/index.ts): Implements a basic HTTP server using Bun.serve. Key routes:
  - GET /api/status: Health check returning "OK".
  - GET /api/kijiji?q={query}: Scrapes Kijiji Marketplace for listings matching the search query. Returns JSON array of listing objects.
  - GET /api/facebook?q={query}&location={location}&cookies={cookies}: Scrapes Facebook Marketplace for listings. Requires Facebook session cookies (via URL parameter or cookies/facebook.json file). Optional location param (default "toronto"). Returns JSON array of listing objects.
  - Fallback: 404 for unmatched routes.
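The route table above can be sketched as a pure matching function. This is only an illustration, not the contents of src/index.ts: the Route type, the matchRoute name, and the choice to reject a missing q parameter are all assumptions made for the example.

```typescript
// Hypothetical route descriptor; the real src/index.ts may be organized differently.
type Route =
  | { kind: "status" }
  | { kind: "kijiji"; query: string }
  | { kind: "facebook"; query: string; location: string; cookies: string | null }
  | { kind: "bad-request"; reason: string }
  | { kind: "not-found" };

// Maps an HTTP method and absolute URL onto one of the documented routes.
function matchRoute(method: string, rawUrl: string): Route {
  if (method !== "GET") return { kind: "not-found" };
  const url = new URL(rawUrl);
  const query = url.searchParams.get("q");

  switch (url.pathname) {
    case "/api/status":
      return { kind: "status" };
    case "/api/kijiji":
      // Assumption: a missing or empty q is rejected rather than searched as "".
      if (!query) return { kind: "bad-request", reason: "missing q" };
      return { kind: "kijiji", query };
    case "/api/facebook":
      if (!query) return { kind: "bad-request", reason: "missing q" };
      return {
        kind: "facebook",
        query,
        // Documented default location is "toronto".
        location: url.searchParams.get("location") ?? "toronto",
        // null here would mean "fall back to cookies/facebook.json".
        cookies: url.searchParams.get("cookies"),
      };
    default:
      return { kind: "not-found" };
  }
}
```

A function like this could be called from the fetch handler passed to Bun.serve, with each Route kind translated into the matching Response (for example, a 404 for "not-found").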
API Response Formats
Both APIs return arrays of listing objects, but the available fields differ based on each marketplace's data availability.
Kijiji API Response Object
{
"url": "https://www.kijiji.ca/v-laptops/city-of-toronto/...",
"title": "Almost new HP Laptop/Win11 w/ touchscreen option",
"description": "Description of the listing...",
"listingPrice": {
"amountFormatted": "149.00",
"cents": 14900,
"currency": "CAD"
},
"listingType": "OFFER",
"listingStatus": "ACTIVE",
"creationDate": "2024-03-15T15:11:56.000Z",
"endDate": "3000-01-01T00:00:00.000Z",
"numberOfViews": 2005,
"address": "SPADINA AVENUE, Toronto, ON, M5T 2H7"
}
Facebook API Response Object
{
"url": "https://www.facebook.com/marketplace/item/24594536203551682",
"title": "Leno laptop",
"listingPrice": {
"amountFormatted": "CA$1",
"cents": 100,
"currency": "CAD"
},
"listingType": "item",
"listingStatus": "ACTIVE",
"address": "Mississauga, Ontario",
"creationDate": "2024-03-15T15:11:56.000Z",
"categoryId": "1792291877663080",
"imageUrl": "https://scontent-yyz1-1.xx.fbcdn.net/...",
"videoUrl": "https://www.facebook.com/1300609777949414/",
"seller": {
"name": "Joyce Diaz",
"id": "100091799187797"
},
"deliveryTypes": ["IN_PERSON"]
}
Common Fields
- url: Full URL to the listing
- title: Listing title
- listingPrice: Price object with amountFormatted (human-readable), cents (integer cents), currency (e.g., "CAD")
- address: Location string (or null if unavailable)
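As a rough illustration of the listingPrice shape, the sketch below types the object and builds it from an integer number of cents. The ListingPrice interface and priceFromCents helper are hypothetical names, and the two-decimal formatting follows the Kijiji example only; the Facebook example carries a differently formatted string ("CA$1").

```typescript
// Shape of the listingPrice object shown in both example responses.
interface ListingPrice {
  amountFormatted: string; // human-readable amount
  cents: number; // integer cents
  currency: string; // e.g. "CAD"
}

// Hypothetical helper: builds a price object from integer cents.
// Assumption: amountFormatted is rendered with two decimals, as in the Kijiji example.
function priceFromCents(cents: number, currency = "CAD"): ListingPrice {
  if (!Number.isInteger(cents) || cents < 0) {
    throw new RangeError(`cents must be a non-negative integer, got ${cents}`);
  }
  return { amountFormatted: (cents / 100).toFixed(2), cents, currency };
}
```

Keeping cents as the integer source of truth avoids floating-point drift when prices are compared or summed; the formatted string is derived only for display.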
Kijiji-Only Fields
- description: Detailed description text (Facebook search results don't include descriptions)
- endDate: When listing expires (Facebook doesn't have expiration dates in search results)
- numberOfViews: View count (Facebook doesn't expose view metrics in search results)
Facebook-Only Fields
- listingStatus: Derived from is_live, is_pending, is_sold, is_hidden states ("ACTIVE", "SOLD", "PENDING", "HIDDEN")
- creationDate: When listing was posted (when available)
- categoryId: Facebook marketplace category identifier
- imageUrl: Primary listing photo URL
- videoUrl: Listing video URL (if video exists)
- seller: Object with seller name and Facebook user ID
- deliveryTypes: Available delivery options (e.g., ["IN_PERSON", "SHIPPING"])
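One way the listingStatus derivation could look is sketched below. The flag names come from the description above, but the precedence between them (sold before pending before hidden) and the fallback to "ACTIVE" when no flag is set are assumptions; the actual logic in src/facebook.ts may differ.

```typescript
type ListingStatus = "ACTIVE" | "SOLD" | "PENDING" | "HIDDEN";

// State flags as named in the field description; all treated as optional here.
interface ListingFlags {
  is_live?: boolean;
  is_pending?: boolean;
  is_sold?: boolean;
  is_hidden?: boolean;
}

// Hypothetical derivation. Assumed precedence: SOLD > PENDING > HIDDEN > ACTIVE.
function deriveListingStatus(flags: ListingFlags): ListingStatus {
  if (flags.is_sold) return "SOLD";
  if (flags.is_pending) return "PENDING";
  if (flags.is_hidden) return "HIDDEN";
  // Assumption: anything else, including is_live or no flags at all, counts as active.
  return "ACTIVE";
}
```

Fixing an explicit precedence matters because the flags are independent booleans, so more than one could be true for the same listing.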
- Kijiji Scraping (src/kijiji.ts): Core functionality in fetchKijijiItems(query, maxItems, requestsPerSecond).
  - Slugifies the query using unidecode for URL-safe search terms.
  - Fetches the search page HTML, parses Next.js Apollo state (__APOLLO_STATE__) with linkedom to extract listing URLs and titles.
  - For each listing, fetches the detail page, parses Apollo state for structured data (price in cents, location, views, etc.).
  - Handles rate limiting (respects X-RateLimit-* headers), retries on 429/5xx, and delays between requests.
  - Uses cli-progress for console progress bar during batch fetches.
  - Filters results to include only priced items.
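The pacing and retry behaviour described for the scrapers can be sketched as two small pure functions. Everything here is illustrative: the function names are hypothetical, the specific header X-RateLimit-Reset and its interpretation as "seconds until the limit resets" are assumptions, and exponential backoff is a stand-in for whatever retry schedule the real code uses.

```typescript
// Minimal view of response headers; a fetch Response's headers object satisfies it.
interface HeaderLookup {
  get(name: string): string | null;
}

// Delay between consecutive requests for a given requestsPerSecond budget.
function pacingDelayMs(requestsPerSecond: number): number {
  if (requestsPerSecond <= 0) throw new RangeError("requestsPerSecond must be positive");
  return 1000 / requestsPerSecond;
}

// Returns how long to wait before retrying, or null if the status is not retryable.
// attempt is zero-based: 0 for the first retry, 1 for the second, and so on.
function retryDelayMs(
  status: number,
  headers: HeaderLookup,
  attempt: number,
  baseDelayMs = 1000,
): number | null {
  const retryable = status === 429 || (status >= 500 && status <= 599);
  if (!retryable) return null;

  // Assumption: the header holds the number of seconds until the limit resets.
  const raw = headers.get("X-RateLimit-Reset");
  const resetSeconds = raw === null ? NaN : Number(raw);
  if (Number.isFinite(resetSeconds) && resetSeconds > 0) {
    return resetSeconds * 1000;
  }

  // Fallback when the server gives no hint: exponential backoff.
  return baseDelayMs * 2 ** attempt;
}
```

A fetch loop would sleep for pacingDelayMs between requests and, after a failed response, for retryDelayMs before trying again, giving up once a maximum attempt count is reached.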
- Facebook Scraping (src/facebook.ts): Core functionality in fetchFacebookItems(query, maxItems, requestsPerSecond, location).
  - Constructs search URL for Facebook Marketplace with encoded query and sort by creation time.
  - Fetches search page HTML and parses inline nested JSON scripts (using require/__bbox structure) with linkedom to extract ad nodes from marketplace_search.feed_units.edges.
  - Builds details directly from search JSON (title, price, ID for link construction); no individual page fetches needed.
  - Handles delays and retries similar to Kijiji.
  - Uses cli-progress for progress.
  - Filters to priced items. Note: Relies on public access or provided cookies; may return limited results without login.
The project uses TypeScript with path mapping (@/* to src/*). Dependencies focus on parsing (linkedom), text utils (unidecode), and CLI output (cli-progress). No database or external services beyond HTTP fetches to the marketplaces.
Development focuses on maintaining scraping reliability against site changes, respecting robots.txt/terms of service, and handling anti-bot measures ethically. For Facebook, ensure compliance with authentication requirements.