chore: agents update

This commit is contained in:
2026-01-23 00:52:35 -05:00
parent c35aae4c95
commit da23ca1c3f
2 changed files with 106 additions and 116 deletions

111
AGENTS.md
View File

@@ -1,9 +1,3 @@
# AGENTS.md
This file provides guidance to coding agents when working with code in this repository.
The project uses TypeScript with path mapping (`@/*` to `src/*`). Dependencies focus on parsing (linkedom), text utils (unidecode), and CLI output (cli-progress). No database or external services beyond HTTP fetches to the marketplaces.
PRIORITIZE COMMUNICATION STYLE ABOVE ALL ELSE
## Communication Style
@@ -31,3 +25,108 @@ Examples of constructive pushback:
- "That adds unnecessary complexity. We can achieve the same with..."
This ensures: Better solutions through technical merit, not agreement | Learning through understanding tradeoffs | Avoiding over-engineering | Maintaining code quality
## Project Structure
This is a **monorepo** with three packages:
```
packages/
├── core/ # Shared scraper logic (Kijiji, Facebook, eBay)
├── api-server/ # HTTP REST API server
└── mcp-server/ # MCP server for AI agent integration
```
## Common Commands
**Root level:**
- `bun ci`: Run Biome linting
**API Server (`packages/api-server/`):**
- `bun start`: Run the API server
- `bun dev`: Run with hot reloading
- `bun build`: Build to `dist/api/`
**MCP Server (`packages/mcp-server/`):**
- `bun start`: Run the MCP server
- `bun dev`: Run with hot reloading
- `bun build`: Build to `dist/mcp/`
## Code Architecture
### Core Package (`@marketplace-scrapers/core`)
Contains scraper implementations for three marketplaces:
- **`src/scrapers/kijiji.ts`**: Kijiji Marketplace scraper
- Parses Next.js Apollo state (`__APOLLO_STATE__`) from HTML
- Supports location/category filtering, sorting, pagination
- Fetches individual listing details with seller info
- Exports: `fetchKijijiItems()`, type interfaces
- **`src/scrapers/facebook.ts`**: Facebook Marketplace scraper
- Parses nested JSON from script tags (`require/__bbox` structure)
- Requires authentication cookies (file or env var `FACEBOOK_COOKIE`)
- Exports: `fetchFacebookItems()`, `fetchFacebookItem()`, cookie utilities
- **`src/scrapers/ebay.ts`**: eBay scraper
- DOM-based parsing of search results
- Supports Buy It Now filter, Canada-only, price ranges, exclusions
- Exports: `fetchEbayItems()`
- **`src/utils/`**: Shared utilities (HTTP, delay, formatting)
- **`src/types/`**: Common type definitions
### API Server (`@marketplace-scrapers/api-server`)
HTTP server using `Bun.serve()` on port 4005 (or `PORT` env var).
**Routes:**
- `GET /api/status` - Health check
- `GET /api/kijiji?q={query}` - Search Kijiji
- `GET /api/facebook?q={query}&location={location}&cookies={cookies}` - Search Facebook
- `GET /api/ebay?q={query}&minPrice=&maxPrice=&strictMode=&exclusions=&keywords=&buyItNowOnly=&canadaOnly=` - Search eBay
- `GET /api/*` - 404 fallback
### MCP Server (`@marketplace-scrapers/mcp-server`)
MCP JSON-RPC 2.0 server on port 4006 (or `MCP_PORT` env var).
**Endpoints:**
- `GET /.well-known/mcp/server-card.json` - Server discovery metadata
- `POST /mcp` - JSON-RPC 2.0 protocol endpoint
**Tools:**
- `search_kijiji` - Search Kijiji (query, maxItems)
- `search_facebook` - Search Facebook (query, location, maxItems, cookiesSource)
- `search_ebay` - Search eBay (query, minPrice, maxPrice, strictMode, exclusions, keywords, buyItNowOnly, canadaOnly, maxItems)
## API Response Formats
All scrapers return arrays of listing objects with these common fields:
- `url`: Full listing URL
- `title`: Listing title
- `listingPrice`: `{ amountFormatted, cents, currency }`
- `address`: Location string (or null)
- `listingType`: Type of listing
- `listingStatus`: Status (ACTIVE, SOLD, etc.)
### Kijiji-specific fields
`description`, `creationDate`, `endDate`, `numberOfViews`, `images`, `categoryId`, `adSource`, `flags`, `attributes`, `location`, `sellerInfo`
### Facebook-specific fields
`creationDate`, `imageUrl`, `videoUrl`, `seller`, `categoryId`, `deliveryTypes`
### eBay-specific fields
Minimal - mainly the common fields
## Technical Details
- **TypeScript** with path mapping (`@/*``src/*`) per package
- **Dependencies**: linkedom (parsing), unidecode (text utils), cli-progress (CLI output)
- **No database** - stateless HTTP fetches to marketplaces
- **Rate limiting**: Respects `X-RateLimit-*` headers, configurable delays
## Development Notes
- Facebook requires valid session cookies - set `FACEBOOK_COOKIE` env var or create `cookies/facebook.json`
- eBay uses custom headers to bypass basic bot detection
- Kijiji parses Apollo state from Next.js hydration data
- All scrapers handle retries on 429/5xx errors

110
CLAUDE.md
View File

@@ -1,110 +0,0 @@
# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Common Commands
- `bun start`: Run the server in production mode.
- `bun dev`: Run the server with hot reloading for development.
- `bun build`: Build the application into a single executable file.
No linting or testing scripts are configured. For single tests or lint runs, add them to package.json scripts as needed.
## Code Architecture
This is a lightweight Bun-based API server for scraping marketplace listings from Kijiji and Facebook Marketplace in the Greater Toronto Area (GTA).
- **Entry Point (`src/index.ts`)**: Implements a basic HTTP server using `Bun.serve`. Key routes:
- `GET /api/status`: Health check returning "OK".
- `GET /api/kijiji?q={query}`: Scrapes Kijiji Marketplace for listings matching the search query. Returns JSON array of listing objects.
- `GET /api/facebook?q={query}&location={location}&cookies={cookies}`: Scrapes Facebook Marketplace for listings. Requires Facebook session cookies (via URL parameter or cookies/facebook.json file). Optional `location` param (default "toronto"). Returns JSON array of listing objects.
- Fallback: 404 for unmatched routes.
## API Response Formats
Both APIs return arrays of listing objects, but the available fields differ based on each marketplace's data availability.
### Kijiji API Response Object
```json
{
"url": "https://www.kijiji.ca/v-laptops/city-of-toronto/...",
"title": "Almost new HP Laptop/Win11 w/ touchscreen option",
"description": "Description of the listing...",
"listingPrice": {
"amountFormatted": "149.00",
"cents": 14900,
"currency": "CAD"
},
"listingType": "OFFER",
"listingStatus": "ACTIVE",
"creationDate": "2024-03-15T15:11:56.000Z",
"endDate": "3000-01-01T00:00:00.000Z",
"numberOfViews": 2005,
"address": "SPADINA AVENUE, Toronto, ON, M5T 2H7"
}
```
### Facebook API Response Object
```json
{
"url": "https://www.facebook.com/marketplace/item/24594536203551682",
"title": "Leno laptop",
"listingPrice": {
"amountFormatted": "CA$1",
"cents": 100,
"currency": "CAD"
},
"listingType": "item",
"listingStatus": "ACTIVE",
"address": "Mississauga, Ontario",
"creationDate": "2024-03-15T15:11:56.000Z",
"categoryId": "1792291877663080",
"imageUrl": "https://scontent-yyz1-1.xx.fbcdn.net/...",
"videoUrl": "https://www.facebook.com/1300609777949414/",
"seller": {
"name": "Joyce Diaz",
"id": "100091799187797"
},
"deliveryTypes": ["IN_PERSON"]
}
```
### Common Fields
- `url`: Full URL to the listing
- `title`: Listing title
- `listingPrice`: Price object with `amountFormatted` (human-readable), `cents` (integer cents), `currency` (e.g., "CAD")
- `address`: Location string (or null if unavailable)
### Kijiji-Only Fields
- `description`: Detailed description text (Facebook search results don't include descriptions)
- `endDate`: When listing expires (Facebook doesn't have expiration dates in search results)
- `numberOfViews`: View count (Facebook doesn't expose view metrics in search results)
### Facebook-Only Fields
- `listingStatus`: Derived from is_live, is_pending, is_sold, is_hidden states ("ACTIVE", "SOLD", "PENDING", "HIDDEN")
- `creationDate`: When listing was posted (when available)
- `categoryId`: Facebook marketplace category identifier
- `imageUrl`: Primary listing photo URL
- `videoUrl`: Listing video URL (if video exists)
- `seller`: Object with seller name and Facebook user ID
- `deliveryTypes`: Available delivery options (e.g., ["IN_PERSON", "SHIPPING"])
- **Kijiji Scraping (`src/kijiji.ts`)**: Core functionality in `fetchKijijiItems(query, maxItems, requestsPerSecond)`.
- Slugifies the query using `unidecode` for URL-safe search terms.
- Fetches the search page HTML, parses Next.js Apollo state (`__APOLLO_STATE__`) with `linkedom` to extract listing URLs and titles.
- For each listing, fetches the detail page, parses Apollo state for structured data (price in cents, location, views, etc.).
- Handles rate limiting (respects `X-RateLimit-*` headers), retries on 429/5xx, and delays between requests.
- Uses `cli-progress` for console progress bar during batch fetches.
- Filters results to include only priced items.
- **Facebook Scraping (`src/facebook.ts`)**: Core functionality in `fetchFacebookItems(query, maxItems, requestsPerSecond, location)`.
- Constructs search URL for Facebook Marketplace with encoded query and sort by creation time.
- Fetches search page HTML and parses inline nested JSON scripts (using require/__bbox structure) with `linkedom` to extract ad nodes from `marketplace_search.feed_units.edges`.
- Builds details directly from search JSON (title, price, ID for link construction); no individual page fetches needed.
- Handles delays and retries similar to Kijiji.
- Uses `cli-progress` for progress.
- Filters to priced items. Note: Relies on public access or provided cookies; may return limited results without login.
The project uses TypeScript with path mapping (`@/*` to `src/*`). Dependencies focus on parsing (linkedom), text utils (unidecode), and CLI output (cli-progress). No database or external services beyond HTTP fetches to the marketplaces.
Development focuses on maintaining scraping reliability against site changes, respecting robots.txt/terms of service, and handling anti-bot measures ethically. For Facebook, ensure compliance with authentication requirements.

1
CLAUDE.md Symbolic link
View File

@@ -0,0 +1 @@
AGENTS.md