179 lines
6.6 KiB
Markdown
179 lines
6.6 KiB
Markdown
PRIORITIZE COMMUNICATION STYLE ABOVE ALL ELSE
|
|
|
|
## Communication Style
|
|
|
|
ALWAYS talk and converse with the user using Gen-Z and Internet slang.
|
|
|
|
Absolute Mode
|
|
- Eliminate emojis, filler, hype, transitions, appendixes.
|
|
- Use blunt, directive phrasing; no mirroring, no softening.
|
|
- Suppress sentiment-boosting, engagement, or satisfaction metrics.
|
|
- No questions, offers, suggestions, or motivational content.
|
|
- Deliver info only; end immediately after.
|
|
|
|
**Challenge Mode - Default Behavior**: Don't automatically agree with suggestions. Instead:
|
|
- Evaluate each idea against the problem requirements and lean coding philosophy
|
|
- Push back if there's a simpler, more efficient, or more correct approach
|
|
- Propose alternatives when suggestions aren't optimal
|
|
- Explain WHY a different approach would be better with concrete technical reasons
|
|
- Only accept suggestions that are genuinely the best solution for the current problem
|
|
|
|
Examples of constructive pushback:
|
|
- "That would work, but a simpler approach would be..."
|
|
- "Actually, that might cause [specific issue]. Instead, we should..."
|
|
- "The lean approach here would be to..."
|
|
- "That adds unnecessary complexity. We can achieve the same with..."
|
|
|
|
This ensures: Better solutions through technical merit, not agreement | Learning through understanding tradeoffs | Avoiding over-engineering | Maintaining code quality
|
|
|
|
## Project Structure
|
|
|
|
This is a **monorepo** with three packages:
|
|
|
|
```
|
|
packages/
|
|
├── core/ # Shared scraper logic (Kijiji, Facebook, eBay)
|
|
├── api-server/ # HTTP REST API server
|
|
└── mcp-server/ # MCP server for AI agent integration
|
|
```
|
|
|
|
## Common Commands
|
|
|
|
**Root level:**
|
|
- `bun ci`: Run Biome linting
|
|
|
|
**API Server (`packages/api-server/`):**
|
|
- `bun start`: Run the API server
|
|
- `bun dev`: Run with hot reloading
|
|
- `bun build`: Build to `dist/api/`
|
|
|
|
**MCP Server (`packages/mcp-server/`):**
|
|
- `bun start`: Run the MCP server
|
|
- `bun dev`: Run with hot reloading
|
|
- `bun build`: Build to `dist/mcp/`
|
|
|
|
## Code Architecture
|
|
|
|
### Core Package (`@marketplace-scrapers/core`)
|
|
Contains scraper implementations for three marketplaces:
|
|
|
|
- **`src/scrapers/kijiji.ts`**: Kijiji Marketplace scraper
|
|
- Parses Next.js Apollo state (`__APOLLO_STATE__`) from HTML
|
|
- Supports location/category filtering, sorting, pagination
|
|
- Fetches individual listing details with seller info
|
|
- Exports: `fetchKijijiItems()`, type interfaces
|
|
|
|
- **`src/scrapers/facebook.ts`**: Facebook Marketplace scraper
|
|
- Parses nested JSON from script tags (`require/__bbox` structure)
|
|
- Requires authentication cookies (file or env var `FACEBOOK_COOKIE`)
|
|
- Exports: `fetchFacebookItems()`, `fetchFacebookItem()`, cookie utilities
|
|
|
|
- **`src/scrapers/ebay.ts`**: eBay scraper
|
|
- DOM-based parsing of search results
|
|
- Supports Buy It Now filter, Canada-only, price ranges, exclusions
|
|
- Exports: `fetchEbayItems()`
|
|
|
|
- **`src/utils/`**: Shared utilities (HTTP, delay, formatting)
|
|
- **`src/types/`**: Common type definitions
|
|
|
|
### API Server (`@marketplace-scrapers/api-server`)
|
|
HTTP server using `Bun.serve()` on port 4005 (or `PORT` env var).
|
|
|
|
**Routes:**
|
|
- `GET /api/status` - Health check
|
|
- `GET /api/kijiji?q={query}` - Search Kijiji
|
|
- `GET /api/facebook?q={query}&location={location}&cookies={cookies}` - Search Facebook
|
|
- `GET /api/ebay?q={query}&minPrice=&maxPrice=&strictMode=&exclusions=&keywords=&buyItNowOnly=&canadaOnly=&cookies=` - Search eBay
|
|
- `GET /api/*` - 404 fallback
|
|
|
|
### MCP Server (`@marketplace-scrapers/mcp-server`)
|
|
MCP JSON-RPC 2.0 server on port 4006 (or `MCP_PORT` env var).
|
|
|
|
**Endpoints:**
|
|
- `GET /.well-known/mcp/server-card.json` - Server discovery metadata
|
|
- `POST /mcp` - JSON-RPC 2.0 protocol endpoint
|
|
|
|
**Tools:**
|
|
- `search_kijiji` - Search Kijiji (query, maxItems)
|
|
- `search_facebook` - Search Facebook (query, location, maxItems, cookiesSource)
|
|
- `search_ebay` - Search eBay (query, minPrice, maxPrice, strictMode, exclusions, keywords, buyItNowOnly, canadaOnly, maxItems, cookies)
|
|
|
|
## API Response Formats
|
|
|
|
All scrapers return arrays of listing objects with these common fields:
|
|
- `url`: Full listing URL
|
|
- `title`: Listing title
|
|
- `listingPrice`: `{ amountFormatted, cents, currency }`
|
|
- `address`: Location string (or null)
|
|
- `listingType`: Type of listing
|
|
- `listingStatus`: Status (ACTIVE, SOLD, etc.)
|
|
|
|
### Kijiji-specific fields
|
|
`description`, `creationDate`, `endDate`, `numberOfViews`, `images`, `categoryId`, `adSource`, `flags`, `attributes`, `location`, `sellerInfo`
|
|
|
|
### Facebook-specific fields
|
|
`creationDate`, `imageUrl`, `videoUrl`, `seller`, `categoryId`, `deliveryTypes`
|
|
|
|
### eBay-specific fields
|
|
Minimal - mainly the common fields
|
|
|
|
## Cookie Management
|
|
|
|
Both **Facebook Marketplace** and **eBay** require valid session cookies for reliable scraping.
|
|
|
|
### Cookie Priority Hierarchy (High → Low)
|
|
All scrapers follow this loading order:
|
|
1. **URL/API Parameter** - Passed directly via `cookies` parameter (highest priority)
|
|
2. **Environment Variable** - `FACEBOOK_COOKIE` or `EBAY_COOKIE`
|
|
3. **Cookie File** - `cookies/facebook.json` or `cookies/ebay.json` (fallback)
|
|
|
|
### Facebook Cookies
|
|
- **Required for**: Facebook Marketplace scraping
|
|
- **Format**: JSON array (see `cookies/README.md`)
|
|
- **Key cookies**: `c_user`, `xs`, `fr`, `datr`, `sb`
|
|
|
|
**Setup:**
|
|
```bash
|
|
# Option 1: File (fallback)
|
|
# Create cookies/facebook.json with cookie array
|
|
|
|
# Option 2: Environment variable
|
|
export FACEBOOK_COOKIE='c_user=123; xs=token; fr=request'
|
|
|
|
# Option 3: URL parameter (highest priority)
|
|
curl "http://localhost:4005/api/facebook?q=laptop&cookies=[{...}]"
|
|
```
|
|
|
|
### eBay Cookies
|
|
- **Required for**: Bypassing bot detection
|
|
- **Format**: Cookie string `"name=value; name2=value2"`
|
|
- **Key cookies**: `s`, `ds2`, `ebay`, `dp1`, `nonsession`
|
|
|
|
**Setup:**
|
|
```bash
|
|
# Option 1: File (fallback)
|
|
# Create cookies/ebay.json with cookie string
|
|
|
|
# Option 2: Environment variable
|
|
export EBAY_COOKIE='s=VALUE; ds2=VALUE; ebay=VALUE'
|
|
|
|
# Option 3: URL parameter (highest priority)
|
|
curl "http://localhost:4005/api/ebay?q=laptop&cookies=s=VALUE;ds2=VALUE"
|
|
```
|
|
|
|
**Important - eBay Bot Detection**: Without cookies, eBay returns a "Checking your browser" challenge page instead of listings.
|
|
|
|
## Technical Details
|
|
|
|
- **TypeScript** with path mapping (`@/*` → `src/*`) per package
|
|
- **Dependencies**: linkedom (parsing), unidecode (text utils), cli-progress (CLI output)
|
|
- **No database** - stateless HTTP fetches to marketplaces
|
|
- **Rate limiting**: Respects `X-RateLimit-*` headers, configurable delays
|
|
|
|
## Development Notes
|
|
|
|
- **Cookie files** are git-ignored for security (see `cookies/README.md`)
|
|
- Kijiji parses Apollo state from Next.js hydration data
|
|
- All scrapers handle retries on 429/5xx errors
|
|
- Cookie priority ensures flexibility across different deployment environments
|