From da23ca1c3f1d7d4a73aa676315f8f94b725f503d Mon Sep 17 00:00:00 2001 From: Dmytro Stanchiev Date: Fri, 23 Jan 2026 00:52:35 -0500 Subject: [PATCH] chore: agents update --- AGENTS.md | 111 +++++++++++++++++++++++++++++++++++++++++++++++++++--- CLAUDE.md | 111 +----------------------------------------------------- 2 files changed, 106 insertions(+), 116 deletions(-) mode change 100644 => 120000 CLAUDE.md diff --git a/AGENTS.md b/AGENTS.md index 1a015a5..91b6d91 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -1,9 +1,3 @@ -# AGENTS.md - -This file provides guidance to coding agents when working with code in this repository. - -The project uses TypeScript with path mapping (`@/*` to `src/*`). Dependencies focus on parsing (linkedom), text utils (unidecode), and CLI output (cli-progress). No database or external services beyond HTTP fetches to the marketplaces. - PRIORITIZE COMMUNICATION STYLE ABOVE ALL ELSE ## Communication Style @@ -31,3 +25,108 @@ Examples of constructive pushback: - "That adds unnecessary complexity. We can achieve the same with..." This ensures: Better solutions through technical merit, not agreement | Learning through understanding tradeoffs | Avoiding over-engineering | Maintaining code quality + +## Project Structure + +This is a **monorepo** with three packages: + +``` +packages/ +├── core/ # Shared scraper logic (Kijiji, Facebook, eBay) +├── api-server/ # HTTP REST API server +└── mcp-server/ # MCP server for AI agent integration +``` + +## Common Commands + +**Root level:** +- `bun ci`: Run Biome linting + +**API Server (`packages/api-server/`):** +- `bun start`: Run the API server +- `bun dev`: Run with hot reloading +- `bun build`: Build to `dist/api/` + +**MCP Server (`packages/mcp-server/`):** +- `bun start`: Run the MCP server +- `bun dev`: Run with hot reloading +- `bun build`: Build to `dist/mcp/` + +## Code Architecture + +### Core Package (`@marketplace-scrapers/core`) +Contains scraper implementations for three marketplaces: + +- **`src/scrapers/kijiji.ts`**: Kijiji Marketplace scraper + - Parses Next.js Apollo state (`__APOLLO_STATE__`) from HTML + - Supports location/category filtering, sorting, pagination + - Fetches individual listing details with seller info + - Exports: `fetchKijijiItems()`, type interfaces + +- **`src/scrapers/facebook.ts`**: Facebook Marketplace scraper + - Parses nested JSON from script tags (`require/__bbox` structure) + - Requires authentication cookies (file or env var `FACEBOOK_COOKIE`) + - Exports: `fetchFacebookItems()`, `fetchFacebookItem()`, cookie utilities + +- **`src/scrapers/ebay.ts`**: eBay scraper + - DOM-based parsing of search results + - Supports Buy It Now filter, Canada-only, price ranges, exclusions + - Exports: `fetchEbayItems()` + +- **`src/utils/`**: Shared utilities (HTTP, delay, formatting) +- **`src/types/`**: Common type definitions + +### API Server (`@marketplace-scrapers/api-server`) +HTTP server using `Bun.serve()` on port 4005 (or `PORT` env var). + +**Routes:** +- `GET /api/status` - Health check +- `GET /api/kijiji?q={query}` - Search Kijiji +- `GET /api/facebook?q={query}&location={location}&cookies={cookies}` - Search Facebook +- `GET /api/ebay?q={query}&minPrice=&maxPrice=&strictMode=&exclusions=&keywords=&buyItNowOnly=&canadaOnly=` - Search eBay +- `GET /api/*` - 404 fallback + +### MCP Server (`@marketplace-scrapers/mcp-server`) +MCP JSON-RPC 2.0 server on port 4006 (or `MCP_PORT` env var). + +**Endpoints:** +- `GET /.well-known/mcp/server-card.json` - Server discovery metadata +- `POST /mcp` - JSON-RPC 2.0 protocol endpoint + +**Tools:** +- `search_kijiji` - Search Kijiji (query, maxItems) +- `search_facebook` - Search Facebook (query, location, maxItems, cookiesSource) +- `search_ebay` - Search eBay (query, minPrice, maxPrice, strictMode, exclusions, keywords, buyItNowOnly, canadaOnly, maxItems) + +## API Response Formats + +All scrapers return arrays of listing objects with these common fields: +- `url`: Full listing URL +- `title`: Listing title +- `listingPrice`: `{ amountFormatted, cents, currency }` +- `address`: Location string (or null) +- `listingType`: Type of listing +- `listingStatus`: Status (ACTIVE, SOLD, etc.) + +### Kijiji-specific fields +`description`, `creationDate`, `endDate`, `numberOfViews`, `images`, `categoryId`, `adSource`, `flags`, `attributes`, `location`, `sellerInfo` + +### Facebook-specific fields +`creationDate`, `imageUrl`, `videoUrl`, `seller`, `categoryId`, `deliveryTypes` + +### eBay-specific fields +Minimal - mainly the common fields + +## Technical Details + +- **TypeScript** with path mapping (`@/*` → `src/*`) per package +- **Dependencies**: linkedom (parsing), unidecode (text utils), cli-progress (CLI output) +- **No database** - stateless HTTP fetches to marketplaces +- **Rate limiting**: Respects `X-RateLimit-*` headers, configurable delays + +## Development Notes + +- Facebook requires valid session cookies - set `FACEBOOK_COOKIE` env var or create `cookies/facebook.json` +- eBay uses custom headers to bypass basic bot detection +- Kijiji parses Apollo state from Next.js hydration data +- All scrapers handle retries on 429/5xx errors diff --git a/CLAUDE.md b/CLAUDE.md deleted file mode 100644 index dc5556f..0000000 --- a/CLAUDE.md +++ /dev/null @@ -1,110 +0,0 @@ -# CLAUDE.md - -This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository. - -## Common Commands - -- `bun start`: Run the server in production mode. -- `bun dev`: Run the server with hot reloading for development. -- `bun build`: Build the application into a single executable file. - -No linting or testing scripts are configured. For single tests or lint runs, add them to package.json scripts as needed. - -## Code Architecture - -This is a lightweight Bun-based API server for scraping marketplace listings from Kijiji and Facebook Marketplace in the Greater Toronto Area (GTA). - -- **Entry Point (`src/index.ts`)**: Implements a basic HTTP server using `Bun.serve`. Key routes: - - `GET /api/status`: Health check returning "OK". - - `GET /api/kijiji?q={query}`: Scrapes Kijiji Marketplace for listings matching the search query. Returns JSON array of listing objects. - - `GET /api/facebook?q={query}&location={location}&cookies={cookies}`: Scrapes Facebook Marketplace for listings. Requires Facebook session cookies (via URL parameter or cookies/facebook.json file). Optional `location` param (default "toronto"). Returns JSON array of listing objects. - - Fallback: 404 for unmatched routes. - -## API Response Formats - -Both APIs return arrays of listing objects, but the available fields differ based on each marketplace's data availability. - -### Kijiji API Response Object -```json -{ - "url": "https://www.kijiji.ca/v-laptops/city-of-toronto/...", - "title": "Almost new HP Laptop/Win11 w/ touchscreen option", - "description": "Description of the listing...", - "listingPrice": { - "amountFormatted": "149.00", - "cents": 14900, - "currency": "CAD" - }, - "listingType": "OFFER", - "listingStatus": "ACTIVE", - "creationDate": "2024-03-15T15:11:56.000Z", - "endDate": "3000-01-01T00:00:00.000Z", - "numberOfViews": 2005, - "address": "SPADINA AVENUE, Toronto, ON, M5T 2H7" -} -``` - -### Facebook API Response Object -```json -{ - "url": "https://www.facebook.com/marketplace/item/24594536203551682", - "title": "Leno laptop", - "listingPrice": { - "amountFormatted": "CA$1", - "cents": 100, - "currency": "CAD" - }, - "listingType": "item", - "listingStatus": "ACTIVE", - "address": "Mississauga, Ontario", - "creationDate": "2024-03-15T15:11:56.000Z", - "categoryId": "1792291877663080", - "imageUrl": "https://scontent-yyz1-1.xx.fbcdn.net/...", - "videoUrl": "https://www.facebook.com/1300609777949414/", - "seller": { - "name": "Joyce Diaz", - "id": "100091799187797" - }, - "deliveryTypes": ["IN_PERSON"] -} -``` - -### Common Fields -- `url`: Full URL to the listing -- `title`: Listing title -- `listingPrice`: Price object with `amountFormatted` (human-readable), `cents` (integer cents), `currency` (e.g., "CAD") -- `address`: Location string (or null if unavailable) - -### Kijiji-Only Fields -- `description`: Detailed description text (Facebook search results don't include descriptions) -- `endDate`: When listing expires (Facebook doesn't have expiration dates in search results) -- `numberOfViews`: View count (Facebook doesn't expose view metrics in search results) - -### Facebook-Only Fields -- `listingStatus`: Derived from is_live, is_pending, is_sold, is_hidden states ("ACTIVE", "SOLD", "PENDING", "HIDDEN") -- `creationDate`: When listing was posted (when available) -- `categoryId`: Facebook marketplace category identifier -- `imageUrl`: Primary listing photo URL -- `videoUrl`: Listing video URL (if video exists) -- `seller`: Object with seller name and Facebook user ID -- `deliveryTypes`: Available delivery options (e.g., ["IN_PERSON", "SHIPPING"]) - -- **Kijiji Scraping (`src/kijiji.ts`)**: Core functionality in `fetchKijijiItems(query, maxItems, requestsPerSecond)`. - - Slugifies the query using `unidecode` for URL-safe search terms. - - Fetches the search page HTML, parses Next.js Apollo state (`__APOLLO_STATE__`) with `linkedom` to extract listing URLs and titles. - - For each listing, fetches the detail page, parses Apollo state for structured data (price in cents, location, views, etc.). - - Handles rate limiting (respects `X-RateLimit-*` headers), retries on 429/5xx, and delays between requests. - - Uses `cli-progress` for console progress bar during batch fetches. - - Filters results to include only priced items. - -- **Facebook Scraping (`src/facebook.ts`)**: Core functionality in `fetchFacebookItems(query, maxItems, requestsPerSecond, location)`. - - Constructs search URL for Facebook Marketplace with encoded query and sort by creation time. - - Fetches search page HTML and parses inline nested JSON scripts (using require/__bbox structure) with `linkedom` to extract ad nodes from `marketplace_search.feed_units.edges`. - - Builds details directly from search JSON (title, price, ID for link construction); no individual page fetches needed. - - Handles delays and retries similar to Kijiji. - - Uses `cli-progress` for progress. - - Filters to priced items. Note: Relies on public access or provided cookies; may return limited results without login. - -The project uses TypeScript with path mapping (`@/*` to `src/*`). Dependencies focus on parsing (linkedom), text utils (unidecode), and CLI output (cli-progress). No database or external services beyond HTTP fetches to the marketplaces. - -Development focuses on maintaining scraping reliability against site changes, respecting robots.txt/terms of service, and handling anti-bot measures ethically. For Facebook, ensure compliance with authentication requirements. \ No newline at end of file diff --git a/CLAUDE.md b/CLAUDE.md new file mode 120000 index 0000000..47dc3e3 --- /dev/null +++ b/CLAUDE.md @@ -0,0 +1 @@ +AGENTS.md \ No newline at end of file