Compare commits

..

1 Commits

Author SHA1 Message Date
25beba747a chore: update package.json
Signed-off-by: Dmytro Stanchiev <git@dmytros.dev>
2025-10-03 08:48:17 -04:00
23 changed files with 1323 additions and 5878 deletions

@@ -1,33 +0,0 @@
# AGENTS.md
This file provides guidance to coding agents when working with code in this repository.
The project uses TypeScript with path mapping (`@/*` to `src/*`). Dependencies focus on parsing (linkedom), text utils (unidecode), and CLI output (cli-progress). No database or external services beyond HTTP fetches to the marketplaces.
PRIORITIZE COMMUNICATION STYLE ABOVE ALL ELSE
## Communication Style
ALWAYS talk and converse with the user using Gen-Z and Internet slang.
Absolute Mode
- Eliminate emojis, filler, hype, transitions, appendixes.
- Use blunt, directive phrasing; no mirroring, no softening.
- Suppress sentiment-boosting, engagement, or satisfaction metrics.
- No questions, offers, suggestions, or motivational content.
- Deliver info only; end immediately after.
**Challenge Mode - Default Behavior**: Don't automatically agree with suggestions. Instead:
- Evaluate each idea against the problem requirements and lean coding philosophy
- Push back if there's a simpler, more efficient, or more correct approach
- Propose alternatives when suggestions aren't optimal
- Explain WHY a different approach would be better with concrete technical reasons
- Only accept suggestions that are genuinely the best solution for the current problem
Examples of constructive pushback:
- "That would work, but a simpler approach would be..."
- "Actually, that might cause [specific issue]. Instead, we should..."
- "The lean approach here would be to..."
- "That adds unnecessary complexity. We can achieve the same with..."
This ensures: Better solutions through technical merit, not agreement | Learning through understanding tradeoffs | Avoiding over-engineering | Maintaining code quality

110
CLAUDE.md Normal file

@@ -0,0 +1,110 @@
# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Common Commands
- `bun start`: Run the server in production mode.
- `bun dev`: Run the server with hot reloading for development.
- `bun build`: Build the application into a single executable file.
No linting or testing scripts are configured. For single tests or lint runs, add them to package.json scripts as needed.
## Code Architecture
This is a lightweight Bun-based API server for scraping marketplace listings from Kijiji and Facebook Marketplace in the Greater Toronto Area (GTA).
- **Entry Point (`src/index.ts`)**: Implements a basic HTTP server using `Bun.serve`. Key routes:
  - `GET /api/status`: Health check returning "OK".
  - `GET /api/kijiji?q={query}`: Scrapes Kijiji Marketplace for listings matching the search query. Returns JSON array of listing objects.
  - `GET /api/facebook?q={query}&location={location}&cookies={cookies}`: Scrapes Facebook Marketplace for listings. Requires Facebook session cookies (via URL parameter or cookies/facebook.json file). Optional `location` param (default "toronto"). Returns JSON array of listing objects.
  - Fallback: 404 for unmatched routes.
## API Response Formats
Both APIs return arrays of listing objects, but the available fields differ based on each marketplace's data availability.
### Kijiji API Response Object
```json
{
  "url": "https://www.kijiji.ca/v-laptops/city-of-toronto/...",
  "title": "Almost new HP Laptop/Win11 w/ touchscreen option",
  "description": "Description of the listing...",
  "listingPrice": {
    "amountFormatted": "149.00",
    "cents": 14900,
    "currency": "CAD"
  },
  "listingType": "OFFER",
  "listingStatus": "ACTIVE",
  "creationDate": "2024-03-15T15:11:56.000Z",
  "endDate": "3000-01-01T00:00:00.000Z",
  "numberOfViews": 2005,
  "address": "SPADINA AVENUE, Toronto, ON, M5T 2H7"
}
```
### Facebook API Response Object
```json
{
  "url": "https://www.facebook.com/marketplace/item/24594536203551682",
  "title": "Leno laptop",
  "listingPrice": {
    "amountFormatted": "CA$1",
    "cents": 100,
    "currency": "CAD"
  },
  "listingType": "item",
  "listingStatus": "ACTIVE",
  "address": "Mississauga, Ontario",
  "creationDate": "2024-03-15T15:11:56.000Z",
  "categoryId": "1792291877663080",
  "imageUrl": "https://scontent-yyz1-1.xx.fbcdn.net/...",
  "videoUrl": "https://www.facebook.com/1300609777949414/",
  "seller": {
    "name": "Joyce Diaz",
    "id": "100091799187797"
  },
  "deliveryTypes": ["IN_PERSON"]
}
```
### Common Fields
- `url`: Full URL to the listing
- `title`: Listing title
- `listingPrice`: Price object with `amountFormatted` (human-readable), `cents` (integer cents), `currency` (e.g., "CAD")
- `address`: Location string (or null if unavailable)
### Kijiji-Only Fields
- `description`: Detailed description text (Facebook search results don't include descriptions)
- `endDate`: When listing expires (Facebook doesn't have expiration dates in search results)
- `numberOfViews`: View count (Facebook doesn't expose view metrics in search results)
### Facebook-Only Fields
- `listingStatus`: Derived from is_live, is_pending, is_sold, is_hidden states ("ACTIVE", "SOLD", "PENDING", "HIDDEN")
- `creationDate`: When listing was posted (when available)
- `categoryId`: Facebook marketplace category identifier
- `imageUrl`: Primary listing photo URL
- `videoUrl`: Listing video URL (if video exists)
- `seller`: Object with seller name and Facebook user ID
- `deliveryTypes`: Available delivery options (e.g., ["IN_PERSON", "SHIPPING"])
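As an illustration of how `listingStatus` could be derived from the four boolean states, here is a minimal sketch. The function name `deriveListingStatus` and the precedence between flags (sold before pending before hidden before live) are assumptions; the scraper in `src/facebook.ts` may order the checks differently.

```typescript
// Status flags as they appear on a Facebook listing node.
interface StatusFlags {
  is_live?: boolean;
  is_pending?: boolean;
  is_sold?: boolean;
  is_hidden?: boolean;
}

type ListingStatus = "ACTIVE" | "SOLD" | "PENDING" | "HIDDEN";

// Hypothetical helper. Assumption: when several flags are set,
// SOLD wins over PENDING, which wins over HIDDEN, which wins over ACTIVE.
function deriveListingStatus(flags: StatusFlags): ListingStatus | null {
  if (flags.is_sold) return "SOLD";
  if (flags.is_pending) return "PENDING";
  if (flags.is_hidden) return "HIDDEN";
  if (flags.is_live) return "ACTIVE";
  return null; // no flag set: status unknown
}
```

Returning `null` when no flag is set keeps an unknown state distinguishable from an active listing.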
- **Kijiji Scraping (`src/kijiji.ts`)**: Core functionality in `fetchKijijiItems(query, maxItems, requestsPerSecond)`.
  - Slugifies the query using `unidecode` for URL-safe search terms.
  - Fetches the search page HTML, parses Next.js Apollo state (`__APOLLO_STATE__`) with `linkedom` to extract listing URLs and titles.
  - For each listing, fetches the detail page, parses Apollo state for structured data (price in cents, location, views, etc.).
  - Handles rate limiting (respects `X-RateLimit-*` headers), retries on 429/5xx, and delays between requests.
  - Uses `cli-progress` for console progress bar during batch fetches.
  - Filters results to include only priced items.
- **Facebook Scraping (`src/facebook.ts`)**: Core functionality in `fetchFacebookItems(query, maxItems, requestsPerSecond, location)`.
  - Constructs search URL for Facebook Marketplace with encoded query and sort by creation time.
  - Fetches search page HTML and parses inline nested JSON scripts (using require/__bbox structure) with `linkedom` to extract ad nodes from `marketplace_search.feed_units.edges`.
  - Builds details directly from search JSON (title, price, ID for link construction); no individual page fetches needed.
  - Handles delays and retries similar to Kijiji.
  - Uses `cli-progress` for progress.
  - Filters to priced items. Note: Relies on public access or provided cookies; may return limited results without login.
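The delay and retry behaviour described for both scrapers could be sketched roughly as follows. Both helper names are hypothetical, and the concrete header names (`x-ratelimit-remaining`, `x-ratelimit-reset` in seconds) are assumptions: the notes only say that `X-RateLimit-*` headers are respected.

```typescript
// Hypothetical helper: retry only on 429 and 5xx, as described above.
function isRetryableStatus(status: number): boolean {
  return status === 429 || (status >= 500 && status <= 599);
}

// Hypothetical helper: how long to wait before the next request.
// Assumptions: header keys are lower-cased, `x-ratelimit-reset` is the
// number of seconds until the window resets, and requestsPerSecond > 0.
function nextDelayMs(
  headers: Record<string, string | undefined>,
  requestsPerSecond: number,
): number {
  const baseDelay = Math.ceil(1000 / requestsPerSecond);
  const remaining = Number(headers["x-ratelimit-remaining"]);
  const resetSeconds = Number(headers["x-ratelimit-reset"]);
  // Budget exhausted: wait for the reset window if it is longer than the base delay.
  if (Number.isFinite(remaining) && remaining <= 0 && Number.isFinite(resetSeconds)) {
    return Math.max(baseDelay, resetSeconds * 1000);
  }
  return baseDelay;
}
```

Keeping these as pure functions makes the policy easy to test without issuing real requests.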
The project uses TypeScript with path mapping (`@/*` to `src/*`). Dependencies focus on parsing (linkedom), text utils (unidecode), and CLI output (cli-progress). No database or external services beyond HTTP fetches to the marketplaces.
Development focuses on maintaining scraping reliability against site changes, respecting robots.txt/terms of service, and handling anti-bot measures ethically. For Facebook, ensure compliance with authentication requirements.

@@ -15,7 +15,7 @@ COPY src ./src
 COPY tsconfig.json ./
 # Build the application for production
-RUN bun build ./src/index.ts --outdir ./dist --minify --target=bun
+RUN bun build ./src/index.ts --outdir ./dist --minify
 # Multi-stage build - runtime stage
 FROM oven/bun:latest AS runtime

@@ -1,382 +0,0 @@
# Facebook Marketplace API Reverse Engineering
## Overview
This document tracks findings from reverse-engineering Facebook Marketplace APIs for listing details.
## Current Implementation Status
- Search functionality: Implemented in `src/facebook.ts`
- Individual listing details: Not yet implemented
## Findings
### Step 1: Initial Setup
- Using Chrome DevTools to inspect Facebook Marketplace
- Need to authenticate with Facebook account to access marketplace data
- Cookies required for full access
- Current status: Successfully logged in and accessed marketplace data
### Step 2: Individual Listing Details Analysis - COMPLETED
- **Data Location**: Embedded in HTML script tags within `require` array structure
- **Path**: `require[0][3].__bbox.result.data.viewer.marketplace_product_details_page.target`
- **Authentication**: Required for full data access
- **Current Status**: Successfully reverse-engineered the API structure and data extraction method
### API Endpoints Discovered
#### Search Endpoint
- URL: `https://www.facebook.com/marketplace/{location}/search`
- Parameters: `query`, `sortBy`, `exact`
- Data embedded in HTML script tags with `require` structure
- Authentication: Required (cookies)
#### Listing Details Endpoint
- **URL Structure**: `https://www.facebook.com/marketplace/item/{listing_id}/`
- **Data Source**: Server-side rendered HTML with embedded JSON data in script tags
- **Data Structure**: Relay/GraphQL style data structure under `require[0][3].__bbox.require[...].__bbox.result.data.viewer.marketplace_product_details_page.target`
- **Extraction Method**: Parse JSON from script tags containing marketplace data, navigate to the target object
- **Authentication**: Required (cookies)
### Listing Data Structure Discovered (Current - 2026)
The current Facebook Marketplace API returns a comprehensive `GroupCommerceProductItem` object with the following key properties:
```typescript
interface FacebookMarketplaceItem {
  // Basic identification
  id: string;
  __typename: "GroupCommerceProductItem";
  // Listing content
  marketplace_listing_title: string;
  redacted_description: {
    text: string;
  };
  custom_title?: string;
  // Pricing
  formatted_price: {
    text: string;
  };
  listing_price: {
    amount: string;
    currency: string;
    amount_with_offset: string;
  };
  // Location
  location_text: {
    text: string;
  };
  location: {
    latitude: number;
    longitude: number;
    reverse_geocode_detailed: {
      country_alpha_two: string;
      postal_code_trimmed: string;
    };
  };
  // Status flags
  is_live: boolean;
  is_sold: boolean;
  is_pending: boolean;
  is_hidden: boolean;
  is_draft: boolean;
  // Timing
  creation_time: number;
  // Seller information
  marketplace_listing_seller: {
    __typename: "User";
    id: string;
    name: string;
    profile_picture?: {
      uri: string;
    };
    join_time?: number;
  };
  // Vehicle-specific fields (for automotive listings)
  vehicle_make_display_name?: string;
  vehicle_model_display_name?: string;
  vehicle_odometer_data?: {
    unit: "KILOMETERS" | "MILES";
    value: number;
  };
  vehicle_transmission_type?: "AUTOMATIC" | "MANUAL";
  vehicle_exterior_color?: string;
  vehicle_interior_color?: string;
  vehicle_condition?: "EXCELLENT" | "GOOD" | "FAIR" | "POOR";
  vehicle_fuel_type?: string;
  vehicle_trim_display_name?: string;
  // Category and commerce
  marketplace_listing_category_id: string;
  condition?: string;
  // Commerce features
  delivery_types?: string[];
  is_shipping_offered?: boolean;
  is_buy_now_enabled?: boolean;
  can_buyer_make_checkout_offer?: boolean;
  // Communication
  messaging_enabled?: boolean;
  first_message_suggested_value?: string;
  // Metadata
  logging_id: string;
  reportable_ent_id: string;
  origin_target?: {
    __typename: "Marketplace";
    id: string;
  };
  // Related listings (for part-out sellers)
  marketplace_listing_sets?: {
    edges: Array<{
      node: {
        canonical_listing: {
          id: string;
          marketplace_listing_title: string;
          is_live: boolean;
          is_sold: boolean;
          formatted_price: { text: string };
        };
      };
    }>;
  };
}
```
### Example Data Extracted (Current Structure)
```json
{
  "__typename": "GroupCommerceProductItem",
  "marketplace_listing_title": "2012 Mazda MAZDA 3 PART-OUT",
  "id": "1211645920845312",
  "redacted_description": {
    "text": "FOR PARTS ONLY!!!"
  },
  "custom_title": "2012 Mazda 3 part-out",
  "creation_time": 1760450080,
  "location_text": {
    "text": "Toronto, ON"
  },
  "is_live": true,
  "is_sold": false,
  "is_pending": false,
  "is_hidden": false,
  "formatted_price": {
    "text": "FREE"
  },
  "listing_price": {
    "amount_with_offset": "0",
    "currency": "CAD",
    "amount": "0.00"
  },
  "condition": "USED",
  "logging_id": "24676483845336407",
  "marketplace_listing_category_id": "807311116002614",
  "marketplace_listing_seller": {
    "__typename": "User",
    "id": "61570613529010",
    "name": "Jay Heshin",
    "profile_picture": {
      "uri": "https://scontent-yyz1-1.xx.fbcdn.net/v/t39.30808-1/480952111_122133462296687117_4145652046222010716_n.jpg?stp=cp6_dst-jpg_s50x50_tt6&_nc_cat=108&ccb=1-7&_nc_sid=e99d92&_nc_ohc=x_DTkeriVbgQ7kNvwEqT_x3&_nc_oc=Adnqnqf4YsZxgMIkR2mSFrdLb6-BDw4omCWqG_cqB-H0uXGgK1l4-T-fLSGB_CQJEKo&_nc_zt=24&_nc_ht=scontent-yyz1-1.xx&_nc_gid=7GnSwn4MSbllAgGWJy0RTQ&oh=00_AfpY66l8w-LvHvZ6tTgiD9Qh-Or_Udc-OaFiVL9pQ0YXsg&oe=697797CD"
    }
  },
  "vehicle_condition": "FAIR",
  "vehicle_exterior_color": "white",
  "vehicle_interior_color": "",
  "vehicle_make_display_name": "Mazda",
  "vehicle_model_display_name": "3 part-out",
  "vehicle_odometer_data": {
    "unit": "KILOMETERS",
    "value": 999999
  },
  "vehicle_transmission_type": "AUTOMATIC",
  "location": {
    "latitude": 43.651428222656,
    "longitude": -79.436645507812,
    "reverse_geocode_detailed": {
      "country_alpha_two": "CA",
      "postal_code_trimmed": "M6H 1C1"
    }
  },
  "delivery_types": ["IN_PERSON"],
  "messaging_enabled": true,
  "first_message_suggested_value": "Hi, is this available?",
  "marketplace_listing_sets": {
    "edges": [
      {
        "node": {
          "canonical_listing": {
            "id": "1435935788228627",
            "marketplace_listing_title": "2004 Land Rover LR2 PART-OUT",
            "is_live": true,
            "formatted_price": {"text": "FREE"}
          }
        }
      }
    ]
  }
}
```
## Data Extraction Method
### Current Method (2026)
Facebook Marketplace listing data is embedded in JSON within `<script>` tags in the HTML response. The extraction process:
1. **Find the Correct Script**: Look for script tags containing marketplace listing data by searching for key fields like `marketplace_listing_title`, `redacted_description`, and `formatted_price`.
2. **Parse JSON Structure**: The data is nested within a `require` array structure:
```
require[0][3].__bbox.require[3][3][1].__bbox.result.data.viewer.marketplace_product_details_page.target
```
3. **Navigate to Target Object**: The actual listing data is a `GroupCommerceProductItem` object containing comprehensive information about the listing, seller, and vehicle details.
4. **Handle Dynamic Structure**: Facebook may change the exact path, so robust extraction should search for the target object recursively within the parsed JSON.
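Step 4 can be sketched as a path-independent search over the parsed JSON. This is a minimal illustration, not the code in `src/facebook.ts`: the `Json` types and the name `findByTypename` are assumptions, and it returns whichever matching object it reaches first rather than guaranteeing document order.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };
type JsonObject = { [key: string]: Json };

// Hypothetical helper: iterative depth-first search for an object whose
// `__typename` equals the requested value, wherever it sits in the tree.
function findByTypename(root: Json, typename: string): JsonObject | null {
  const stack: Json[] = [root];
  while (stack.length > 0) {
    const current = stack.pop();
    if (current === null || current === undefined || typeof current !== "object") continue;
    if (Array.isArray(current)) {
      for (const child of current) stack.push(child);
      continue;
    }
    if (current["__typename"] === typename) return current;
    for (const child of Object.values(current)) stack.push(child);
  }
  return null;
}
```

Called as `findByTypename(parsedScriptJson, "GroupCommerceProductItem")`, this avoids hard-coding indices such as `require[3][3][1]`, at the cost of scanning the whole structure.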
### Authentication Requirements
- Valid Facebook session cookies are required
- User must be logged in to Facebook
- Marketplace access may be location-restricted
## Tools Used
- Chrome DevTools Protocol
- Network monitoring
- HTML/script parsing
- JSON structure analysis
## Implementation Status
- ✅ Successfully reverse-engineered Facebook Marketplace API for listing details
- ✅ Identified current data structure and extraction method (2026)
- ✅ Documented comprehensive GroupCommerceProductItem interface
- ✅ Implemented `extractFacebookItemData()` function with script parsing logic
- ✅ Implemented `parseFacebookItem()` function to convert GroupCommerceProductItem to ListingDetails
- ✅ Implemented `fetchFacebookItem()` function with authentication and error handling
- ✅ Updated TypeScript interfaces to match current API structure
- ✅ Added robust extraction with fallback methods for changing API paths
## Implementation Details
### Core Functions Implemented
1. **`extractFacebookItemData(htmlString)`**: Extracts marketplace item data from HTML-embedded JSON in script tags
   - Searches for scripts containing marketplace listing data
   - Uses primary path: `require[0][3][0].__bbox.require[3][3][1].__bbox.result.data.viewer.marketplace_product_details_page.target`
   - Falls back to recursive search for GroupCommerceProductItem objects
2. **`parseFacebookItem(item)`**: Converts Facebook's GroupCommerceProductItem to unified ListingDetails format
   - Handles pricing (FREE listings, CAD currency)
   - Extracts seller information, location, and status
   - Supports vehicle-specific metadata
   - Maps Facebook-specific fields to common interface
3. **`fetchFacebookItem(itemId, cookiesSource?)`**: Fetches individual listing details
   - Loads Facebook authentication cookies
   - Makes authenticated HTTP requests
   - Handles rate limiting and retries
   - Returns parsed ListingDetails or null on failure
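The pricing step of a function like `parseFacebookItem` might look like the sketch below. It assumes that `cents` is computed from `listing_price.amount` and that `amountFormatted` is taken from `formatted_price.text`; the helper name `toListingPrice` is hypothetical and the real implementation may use `amount_with_offset` instead.

```typescript
// Shape of `listing_price` as documented in the interface above.
interface FacebookListingPrice {
  amount: string;             // e.g. "3000.00"
  currency: string;           // e.g. "CAD"
  amount_with_offset: string; // e.g. "300000"
}

// Price object used in the unified API response.
interface ListingPrice {
  amountFormatted: string;
  cents: number;
  currency: string;
}

// Hypothetical helper: FREE listings ("0.00") simply map to 0 cents.
function toListingPrice(price: FacebookListingPrice, formattedText: string): ListingPrice | null {
  const amount = Number.parseFloat(price.amount);
  if (!Number.isFinite(amount)) return null; // unparseable amount: treat as unpriced
  return {
    amountFormatted: formattedText,
    cents: Math.round(amount * 100),
    currency: price.currency,
  };
}
```

`Math.round` guards against floating-point drift when converting a decimal string to integer cents.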
### Authentication Requirements
- Facebook session cookies required in `./cookies/facebook.json` or provided as parameter
- Cookies must include valid authentication tokens for marketplace access
- Handles cookie expiration and domain validation
## Current Implementation Status - 2026 Verification
### Step 3: API Verification and Current Structure Analysis (January 2026)
- **Verification Date**: January 22, 2026
- **Status**: Successfully verified current Facebook Marketplace API structure
- **Data Source**: Embedded JSON in HTML script tags (server-side rendered)
- **Extraction Path**: `require[0][3].__bbox.require[3][3][1].__bbox.result.data.viewer.marketplace_product_details_page.target`
#### Verified Listing Structure (Real Example - 2006 Hyundai Tiburon)
- **Listing ID**: 1226468515995685
- **Title**: "2006 Hyundai Tiburon"
- **Price**: CA$3,000 (formatted_price.text)
- **Raw Price Data**: {"amount_with_offset": "300000", "currency": "CAD", "amount": "3000.00"}
- **Location**: Hamilton, ON (with coordinates: 43.250427246094, -79.963989257812)
- **Description**: "As is" (redacted_description.text)
- **Vehicle Details**:
  - Make: Hyundai
  - Model: Tiburon
  - Odometer: 194,000 km
  - Transmission: AUTOMATIC
  - Exterior Color: blue
  - Interior Color: black
  - Fuel Type: GASOLINE
  - Number of Owners: TWO
- **Seller Information**:
  - Name: Ajitpal Kaler
  - ID: 100009257293466
  - Profile Picture Available
  - Join Time: 1426564800 (2015)
- **Listing Status**: Active (is_live: true, is_sold: false, is_pending: false)
- **Category**: 807311116002614 (Vehicles)
- **Delivery Types**: ["IN_PERSON"]
- **Messaging**: Enabled
#### Current API Characteristics
- **Authentication**: Still requires valid Facebook session cookies
- **Data Format**: Server-side rendered HTML with embedded GraphQL/Relay JSON
- **Structure Stability**: Primary extraction path remains functional
- **Additional Features**: Includes marketplace ratings, seller verification badges, cross-posting info
### API Changes Observed Since 2024 Documentation
- **Minimal Changes**: Core data structure largely unchanged
- **Enhanced Fields**: Added more detailed vehicle specifications and seller profile information
- **GraphQL Integration**: Deeper integration with Facebook's GraphQL infrastructure
- **Security Features**: Additional integrity checks and reporting mechanisms
### Multi-Category Testing Results (January 2026)
Successfully tested extraction across different listing categories:
#### 1. Vehicle Listings (Automotive)
- **Example**: 2006 Hyundai Tiburon (ID: 1226468515995685)
- **Status**: ✅ Fully functional
- **Data Extracted**: Complete vehicle specs, pricing, seller info, location coordinates
- **Unique Fields**: vehicle_make_display_name, vehicle_odometer_data, vehicle_transmission_type, vehicle_exterior_color, vehicle_interior_color, vehicle_fuel_type
#### 2. Electronics Listings
- **Example**: Nintendo Switch (ID: 3903865769914262)
- **Status**: ✅ Fully functional
- **Data Extracted**: Title, price (CA$140), location (Toronto, ON), condition (Used - like new), seller (Yitao Hou)
- **Category**: Electronics (category_id: 479353692612078)
- **Notes**: Standard GroupCommerceProductItem structure applies
#### 3. Home Goods/Furniture Listings
- **Example**: Tabletop Mirror (cat not included) (ID: 1082389057290709)
- **Status**: ✅ Fully functional
- **Data Extracted**: Title, price (CA$5), location (Mississauga, ON), condition (Used - like new), seller (Rohit Rehan)
- **Category**: Home Goods (category_id: 1569171756675761)
- **Notes**: Includes detailed description and delivery options
#### Testing Summary
- **Extraction Method**: Consistent across all categories
- **Data Structure**: GroupCommerceProductItem interface works for all listing types
- **Authentication**: Required for all categories
- **Rate Limiting**: Standard Facebook rate limits apply
- **Edge Cases**: All tested listings were active/in-person pickup
## Implementation Status - COMPLETED (January 2026)
- ✅ Successfully reverse-engineered Facebook Marketplace API for listing details
- ✅ Verified current API structure and extraction method (January 2026)
- ✅ Tested extraction across multiple listing categories (vehicles, electronics, home goods)
- ✅ Implemented comprehensive error handling for sold/removed listings and authentication failures
- ✅ Enhanced rate limiting and retry logic (already robust)
- ✅ Added monitoring and metrics for API stability detection
- ✅ Updated all scraper functions to use verified extraction methods
- ✅ Documented comprehensive GroupCommerceProductItem interface with real examples
## Next Steps (Future Maintenance)
1. Monitor extraction success rates for API change detection
2. Update extraction paths if Facebook changes their API structure
3. Add support for additional marketplace features as they become available
4. Implement caching mechanisms for improved performance
5. Add support for marketplace messaging and negotiation features

448
KIJIJI.md

@@ -1,448 +0,0 @@
# Kijiji API Findings
## Overview
Kijiji is a Canadian classifieds marketplace that uses a modern web application built with Next.js and Apollo GraphQL. The search results are powered by a GraphQL API with client-side state management.
## Initial Page Load (Homepage)
- **URL**: https://www.kijiji.ca/
- **Architecture**: Server-side rendered React application with Next.js
- **Data Sources**:
  - Static assets loaded from `webapp-static.ca-kijiji-production.classifiedscloud.io`
  - Image media served from `media.kijiji.ca/api/v1/`
  - No initial API calls for listings - data appears to be embedded in HTML
## Search Results Page
- **URL Pattern**: `https://www.kijiji.ca/b-[location]/[keywords]/k0l0`
- **Example**: `https://www.kijiji.ca/b-canada/iphone/k0l0`
- **Technology Stack**: Next.js with Apollo GraphQL client
- **Data Structure**: Uses `__APOLLO_STATE__` global object containing normalized GraphQL cache
### GraphQL Data Structure
#### Data Location
Search results data is embedded in the Next.js page props under `__NEXT_DATA__.props.pageProps.__APOLLO_STATE__`. The data is pre-rendered on the server and sent to the client. Each page (including pagination) has its own pre-rendered data.
#### Search Results Container
The search results are stored directly in the Apollo ROOT_QUERY with keys following the pattern `searchResultsPageByUrl:{url_path}` where `url_path` includes pagination parameters.
```json
{
  "searchResultsPageByUrl:/b-buy-sell/canada/iphone/k0c10l0": { ... },
  "searchResultsPageByUrl:/b-buy-sell/canada/iphone/k0c10l0?page=2": { ... }
}
```
#### Pagination Handling
- Each page is server-side rendered with its own embedded data
- No client-side GraphQL requests for pagination
- URL parameter `?page=N` controls which page data is embedded
- Offset in searchString corresponds to `(page-1) * limit`
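The page-to-offset relationship and the cache key pattern can be written down as two small helpers. Both function names are hypothetical; the default limit of 40 is the page size reported elsewhere in these findings, and the rule that page 1 carries no `?page=` suffix is inferred from the two example keys above.

```typescript
const RESULTS_PER_PAGE = 40; // observed page size

// offset = (page - 1) * limit, with 1-based page numbers.
function pageToOffset(page: number, limit: number = RESULTS_PER_PAGE): number {
  if (!Number.isInteger(page) || page < 1) {
    throw new RangeError(`page must be a positive integer, got ${page}`);
  }
  return (page - 1) * limit;
}

// Builds the ROOT_QUERY key for a given search path and page.
// Assumption: only pages after the first carry a `?page=N` suffix.
function searchResultsKey(urlPath: string, page: number = 1): string {
  const suffix = page > 1 ? `?page=${page}` : "";
  return `searchResultsPageByUrl:${urlPath}${suffix}`;
}
```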
#### Search Parameters in URL
- `k0c{CATEGORY}l{LOCATION}` - Category and location IDs
- `?page=N` - Page number (1-based)
- Data contains `offset` and `limit` for API-style pagination
#### Individual Listing Structure
```json
{
  "id": "1732061412",
  "title": "iPhone 13",
  "description": "iPhone 13, always had a screen protector on it...",
  "imageCount": 3,
  "imageUrls": ["https://media.kijiji.ca/api/v1/ca-prod-fsbo-ads/images/..."],
  "categoryId": 760,
  "url": "https://www.kijiji.ca/v-cell-phone/...",
  "activationDate": "2026-01-21T16:51:16.000Z",
  "sortingDate": "2026-01-21T16:51:16.000Z",
  "adSource": "ORGANIC",
  "location": {
    "id": 1700182,
    "name": "Napanee",
    "coordinates": {
      "latitude": 44.48774,
      "longitude": -76.99519
    }
  },
  "price": {
    "type": "FIXED",
    "amount": 35000
  },
  "flags": {
    "topAd": false,
    "priceDrop": false
  },
  "posterInfo": {
    "posterId": "1000764154",
    "rating": 5
  },
  "attributes": [
    {
      "canonicalName": "forsaleby",
      "canonicalValues": ["ownr"]
    },
    {
      "canonicalName": "phonecarrier",
      "canonicalValues": ["unlck"]
    }
  ]
}
```
### URL Parameters
- `sort=MATCH` - Sort by relevance
- `order=DESC` - Descending order
- `type=OFFER` - Show offerings (not wanted ads)
- `offset=0` - Pagination offset
- `limit=40` - Results per page
- `topAdCount=6` - Number of promoted ads
- `keywords=iphone` - Search keywords
- `category=0` - Category ID (0 = All Categories)
- `location=0` - Location ID (0 = Canada)
- `eaTopAdPosition=1` - Purpose unknown
### Image API
- **Endpoint**: `https://media.kijiji.ca/api/v1/`
- **Pattern**: `/ca-prod-fsbo-ads/images/{uuid}?rule=kijijica-{size}-jpg`
- **Sizes**: 200, 300, 400, 500 pixels
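A small helper can assemble image URLs from the endpoint, pattern, and sizes listed above. The function and type names are hypothetical, and treating the size as a closed set of four values is an assumption based only on the sizes observed.

```typescript
const KIJIJI_IMAGE_BASE = "https://media.kijiji.ca/api/v1";

// Sizes observed in the findings; other values may exist.
type KijijiImageSize = 200 | 300 | 400 | 500;

// Hypothetical helper: fills in the `{uuid}` and `{size}` placeholders.
function buildKijijiImageUrl(uuid: string, size: KijijiImageSize): string {
  return `${KIJIJI_IMAGE_BASE}/ca-prod-fsbo-ads/images/${encodeURIComponent(uuid)}?rule=kijijica-${size}-jpg`;
}
```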
### Categories and Locations
#### Category Structure
Categories are hierarchical with parent-child relationships. The main categories under "Buy & Sell" include:
| ID | Name | Total Results (iPhone search) |
|----|------|------------------------------|
| 10 | Buy & Sell | 19956 |
| 12 | Arts & Collectibles | 149 |
| 767 | Audio | 481 |
| 253 | Baby Items | 13 |
| 931 | Bags & Luggage | 8 |
| 644 | Bikes | 46 |
| 109 | Books | 21 |
| 103 | Cameras & Camcorders | 101 |
| 104 | CDs, DVDs & Blu-ray | 102 |
| 274 | Clothing | 83 |
| 16 | Computers | 285 |
| 128 | Computer Accessories | 363 |
| 29659001 | Electronics | 2006 |
| 17220001 | Free Stuff | 23 |
| 235 | Furniture | 29 |
| 638 | Garage Sales | 5 |
| 140 | Health & Special Needs | 30 |
| 139 | Hobbies & Crafts | 10 |
| 107 | Home Appliances | 23 |
| 717 | Home - Indoor | 27 |
| 727 | Home Renovation Materials | 14 |
| 133 | Jewellery & Watches | 83 |
| 17 | Musical Instruments | 34 |
| 132 | Phones | 15518 |
| 111 | Sporting Goods & Exercise | 30 |
| 110 | Tools | 25 |
| 108 | Toys & Games | 38 |
| 15093001 | TVs & Video | 15 |
| 141 | Video Games & Consoles | 96 |
| 26 | Other | 286 |
#### Location Structure
Locations are also hierarchical, with provinces and territories under the main "Canada" location:
| ID | Name | Total Results (iPhone search) |
|----|------|------------------------------|
| 0 | Canada | - |
| 9001 | Québec | 2516 |
| 9002 | Nova Scotia | 875 |
| 9003 | Alberta | 2317 |
| 9004 | Ontario | 12507 |
| 9005 | New Brunswick | 118 |
| 9006 | Manitoba | 919 |
| 9007 | British Columbia | 306 |
| 9008 | Newfoundland | 27 |
| 9009 | Saskatchewan | 336 |
| 9010 | Territories | 7 |
| 9011 | Prince Edward Island | 31 |
#### URL Patterns
- Categories: `/b-{category-slug}/canada/{keywords}/k0c{CATEGORY_ID}l0`
- Locations: `/b-buy-sell/{location-slug}/iphone/k0c10l{LOCATION_ID}`
- Combined: `/b-{category-slug}/{location-slug}/{keywords}/k0c{CATEGORY_ID}l{LOCATION_ID}`
### Pagination
- Uses offset-based pagination
- 40 results per page
- Total count provided in pagination metadata
## Authentication & User Management
- **Authentication System**: OAuth2-based using CIS (Customer Identity Service)
- **Identity Provider**: `id.kijiji.ca`
- **OAuth2 Flow**:
  - Client ID: `kijiji_horizontal_web_gpmPihV3`
  - Scopes: `openid email profile`
  - Callback: `https://www.kijiji.ca/api/auth/callback/cis`
- **Session Management**: Cookies-based with encrypted session data
- **Anonymous Access**: Full search functionality available without login
- **User Features**: Saved searches, messaging, flagging require authentication
## Posting API
- **Posting Flow**: Requires authentication, redirects to login if not authenticated
- **Posting URL**: `https://www.kijiji.ca/p-post-ad.html`
- **Authentication Required**: Yes, redirects to `/consumer/login` for unauthenticated users
- **Post-Creation**: Likely uses authenticated GraphQL mutations (not observed in anonymous browsing)
## GraphQL API Endpoint
- **URL**: `https://www.kijiji.ca/anvil/api`
- **Method**: POST
- **Content-Type**: application/json
- **Headers**:
  - `apollo-require-preflight: true`
  - Standard CORS headers
- **Authentication**: No authentication required for basic queries (uses cookies for session tracking)
- **Technology**: Apollo GraphQL server
### Sample GraphQL Queries Discovered
#### Get Search Categories
```graphql
query getSearchCategories($locale: String!) {
  searchCategories {
    id
    localizedName(locale: $locale)
    parentId
    __typename
  }
}
```
Variables: `{"locale": "en-CA"}`
Response includes hierarchical category structure with IDs and localized names.
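To show how this query could be sent to the endpoint described above, here is a sketch that only builds the request. The helper name and the `GraphQLRequest` shape are assumptions, as is the inclusion of `operationName` in the body (conventional for GraphQL over HTTP, but not confirmed in these notes).

```typescript
const KIJIJI_GRAPHQL_URL = "https://www.kijiji.ca/anvil/api";

const GET_SEARCH_CATEGORIES = `
query getSearchCategories($locale: String!) {
  searchCategories {
    id
    localizedName(locale: $locale)
    parentId
    __typename
  }
}`;

interface GraphQLRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// Hypothetical helper: returns the URL and options for a fetch() call
// without performing any network I/O.
function buildSearchCategoriesRequest(locale: string = "en-CA"): GraphQLRequest {
  return {
    url: KIJIJI_GRAPHQL_URL,
    init: {
      method: "POST",
      headers: {
        "content-type": "application/json",
        "apollo-require-preflight": "true",
      },
      body: JSON.stringify({
        operationName: "getSearchCategories", // assumption, see lead-in
        query: GET_SEARCH_CATEGORIES,
        variables: { locale },
      }),
    },
  };
}
```

The result can be passed straight to `fetch(request.url, request.init)`.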
#### Get Geocode from IP (fails for current IP)
```graphql
query GetGeocodeReverseFromIp {
  geocodeReverseFromIp {
    city
    province
    locationId
    __typename
  }
}
```
This query fails for the current IP address, suggesting geolocation-based features may not work or require different IP ranges.
#### Get Category Path
```graphql
query GetCategoryPath($categoryId: Int!, $locale: String, $locationId: Int) {
  category(id: $categoryId) {
    id
    localizedName(locale: $locale)
    parentId
    searchSeoUrl(locationId: $locationId)
    categoryPaths {
      id
      localizedName(locale: $locale)
      parentId
      searchSeoUrl(locationId: $locationId)
      __typename
    }
    __typename
  }
}
```
Variables: `{"categoryId": 10, "locationId": 0, "locale": "en-CA"}`
## Latest Findings (2026-01-21)
### Client-Side GraphQL Queries Observed
- **getSearchCategories**: Retrieves category hierarchy for search filters
- **GetGeocodeReverseFromIp**: Attempts to geolocate user (fails for current IP)
### GraphQL Schema Insights
Testing direct GraphQL queries revealed:
- Field "searchResults" does not exist on Query type
- Suggested alternatives: "searchResultsPage" or "searchUrl"
- This suggests the search functionality may use different GraphQL operations than direct queries
The embedded Apollo state approach appears to be the primary method for accessing search data, with GraphQL used for auxiliary operations like categories and geolocation.
### Server-Side Rendering Architecture
Search results are fully server-side rendered with data embedded in HTML. Each page (including pagination) contains its own pre-rendered data. No client-side GraphQL requests are made for:
- Initial search results
- Pagination navigation
- Search result data
### Network Analysis Findings
- GraphQL endpoint: `https://www.kijiji.ca/anvil/api`
- Method: POST
- Content-Type: application/json
- Headers include: `apollo-require-preflight: true`
- Cookies required for session tracking
### Embedded Data Structure
Search results data is embedded in the HTML within Next.js `__NEXT_DATA__.props.pageProps.__APOLLO_STATE__` object. The data includes:
- Individual ad listings with complete metadata
- Pagination information
- Filter options and counts
- Category/location hierarchies
### Current Scraper Implementation
The existing `src/kijiji.ts` implementation correctly parses the embedded Apollo state:
- Uses `extractApolloState()` to parse `__NEXT_DATA__` from HTML
- Filters Apollo keys containing "Listing" to find ad data
- Extracts `url`, `title`, and other metadata from each listing
- Successfully scrapes listings without needing API authentication
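The parsing steps listed above can be approximated as follows. This is not the repository's `extractApolloState()`: it swaps `linkedom` for a regular expression so the example stays dependency-free, and the function names, the `ApolloState` type, and the assumption that the script tag carries `id="__NEXT_DATA__"` are all illustrative.

```typescript
type ApolloState = Record<string, unknown>;

// Hypothetical helper: pulls the __NEXT_DATA__ JSON out of the HTML and
// returns props.pageProps.__APOLLO_STATE__, or null if anything is missing.
function extractApolloStateFromHtml(html: string): ApolloState | null {
  const match = html.match(/<script[^>]*id="__NEXT_DATA__"[^>]*>([\s\S]*?)<\/script>/);
  if (!match) return null;
  try {
    const nextData = JSON.parse(match[1] ?? "");
    const state = nextData?.props?.pageProps?.__APOLLO_STATE__;
    return state && typeof state === "object" ? (state as ApolloState) : null;
  } catch {
    return null; // malformed JSON
  }
}

// Keeps only cache entries whose key contains "Listing".
function listingEntries(state: ApolloState): unknown[] {
  return Object.entries(state)
    .filter(([key]) => key.includes("Listing"))
    .map(([, value]) => value);
}
```

A DOM parser such as `linkedom` is more robust than a regex against attribute reordering or quoting changes, which is presumably why the scraper uses one.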
### Authentication Status
- **Search functionality**: No authentication required - all search and listing data accessible anonymously
- **Posting functionality**: Requires authentication (redirects to login)
- **User features**: Saved searches, messaging require authentication
- **Rate limiting**: May apply but not observed in anonymous browsing
### Pagination Implementation
- Each page is a separate server-rendered route
- URL pattern: `/b-{location}/{keywords}/page-{number}/k0{category}l{location_id}`
- No client-side pagination API calls
- 40 results per page (observed)
- Example: `/b-canada/iphone/page-2/k0l0` for page 2 of iPhone search
## URL Pattern Analysis
### Search URL Structure
`https://www.kijiji.ca/b-{category_slug}/{location_slug}/{keywords}/k0c{category_id}l{location_id}`
#### Examples Observed:
- All categories, Canada: `/b-canada/iphone/k0l0` (no `c` segment = All Categories, `l0` = Canada)
- Cell phones category: `/b-cell-phones/canada/iphone/k0c132l0` (c132 = Cell Phones)
- With pagination: `/b-canada/iphone/page-2/k0l0`
#### URL Components:
- `c{CATEGORY_ID}`: Category ID (0 = All Categories, 132 = Cell Phones, etc.)
- `l{LOCATION_ID}`: Location ID (0 = Canada, 1700272 = GTA, etc.)
- `page-{N}`: Pagination (1-based, optional)
- Keywords are slugified in URL path
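A small sketch of a path builder that reproduces the three observed examples. The `SearchUrlParts` shape and `buildSearchPath` name are hypothetical, and combining a category with a `page-{N}` segment is an extrapolation from the examples rather than something observed.

```typescript
interface SearchUrlParts {
  keywords: string; // already slugified, e.g. "iphone"
  locationSlug: string; // e.g. "canada"
  locationId: number; // e.g. 0 for Canada
  categorySlug?: string; // e.g. "cell-phones"; omit for All Categories
  categoryId?: number; // e.g. 132; omit or 0 for All Categories
  page?: number; // 1-based; page 1 carries no page segment
}

function buildSearchPath(p: SearchUrlParts): string {
  const hasCategory =
    p.categorySlug !== undefined &&
    p.categoryId !== undefined &&
    p.categoryId !== 0;
  // Category searches put the category slug before the location slug.
  const prefix = hasCategory
    ? `/b-${p.categorySlug}/${p.locationSlug}`
    : `/b-${p.locationSlug}`;
  const pageSegment = p.page !== undefined && p.page > 1 ? `/page-${p.page}` : "";
  // The c{id} part is dropped entirely for All Categories (k0l0, not k0c0l0).
  const suffix = hasCategory
    ? `k0c${p.categoryId}l${p.locationId}`
    : `k0l${p.locationId}`;
  return `${prefix}/${encodeURIComponent(p.keywords)}${pageSegment}/${suffix}`;
}
```

For instance, `buildSearchPath({ keywords: "iphone", locationSlug: "canada", locationId: 0, page: 2 })` yields `/b-canada/iphone/page-2/k0l0`, matching the pagination example above.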
### Current Implementation Status
The existing scraper in `src/kijiji.ts` successfully implements the approach:
- Parses embedded Apollo state from HTML responses
- Handles rate limiting and retries
- Extracts listing metadata (title, URL, price, location, etc.)
- Works without authentication for search operations
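One way the rate-limit handling could compute its wait time is sketched below. It assumes the server may send a reset hint in seconds on a 429 response and falls back to a linear backoff otherwise; the function name, the parameter names, and the idea that `src/kijiji.ts` works exactly this way are all assumptions.

```typescript
// Hypothetical helper: decide how long to wait before retrying a 429 response.
function computeRetryDelayMs(
  resetHeader: string | null, // raw reset hint in seconds, if the server sent one
  attempt: number, // zero-based retry attempt
  retryBaseMs: number, // base delay for the fallback backoff
): number {
  const resetSeconds = resetHeader ? Number(resetHeader) : Number.NaN;
  // Prefer the server's hint; otherwise back off linearly with the attempt count.
  return Number.isFinite(resetSeconds)
    ? Math.max(0, resetSeconds * 1000)
    : (attempt + 1) * retryBaseMs;
}
```

Clamping with `Math.max(0, ...)` guards against a negative hint turning into a negative timer value.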
## Listing Details Page
### Overview
Similar to search results, listing details pages use server-side rendering with embedded Apollo GraphQL state in the HTML. No dedicated API endpoint serves individual listing data - all information is pre-rendered on the server.
### Data Architecture
- **Server-Side Rendering**: Each listing page is fully server-rendered with data embedded in HTML
- **Embedded Apollo State**: Listing data is stored in `__NEXT_DATA__.props.pageProps.__APOLLO_STATE__`
- **Client-Side GraphQL**: Additional data (categories, campaigns, similar listings, user profiles) fetched via GraphQL API
### Listing Data Structure
The main listing data follows the same pattern as search results:
```json
{
"id": "1705585530",
"title": "We Pay top cash for iPhone 17 pro max, iPhone 17 pro, iPhone Air",
"description": "Buying All Brand new Apple iPhones sealed/Unsealed...",
"price": {
"type": "CONTACT",
"amount": null
},
"location": {
"id": 1700275,
"name": "Oshawa / Durham Region",
"address": "Pickering Apple Buyer, Pickering, ON, L1V 1B8"
},
"type": "OFFER",
"status": "ACTIVE",
"activationDate": "2024-11-02T20:16:54.000Z",
"endDate": "3000-01-01T00:00:00.000Z",
"metrics": {
"views": 1720
},
"posterInfo": {
"posterId": "1044934581",
"rating": null
},
"attributes": [
{
"canonicalName": "forsaleby",
"canonicalValues": ["business"]
},
{
"canonicalName": "phonecarrier",
"canonicalValues": ["unlocked"]
}
]
}
```
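The JSON above could be typed along the following lines. This is a partial sketch covering only some of the fields shown; the interface and helper names are hypothetical, and the unit of `price.amount` (dollars or cents) is not established by these notes, so the helper returns it unconverted.

```typescript
interface ListingPrice {
  type: string; // "CONTACT" in the sample above; other values not documented here
  amount: number | null;
}

interface ListingSketch {
  id: string;
  title: string;
  price: ListingPrice;
  status: string;
  attributes: Array<{ canonicalName: string; canonicalValues: string[] }>;
}

// Contact-based listings carry no amount, so show a label instead.
function priceLabel(price: ListingPrice): string {
  if (price.type === "CONTACT" || price.amount === null) return "Please Contact";
  return String(price.amount); // Units unverified: no conversion is attempted.
}

// Look up the first value of a named attribute, e.g. "forsaleby".
function attributeValue(listing: ListingSketch, name: string): string | undefined {
  return listing.attributes.find((a) => a.canonicalName === name)
    ?.canonicalValues[0];
}
```

With the sample listing, `priceLabel` gives "Please Contact" and `attributeValue(listing, "phonecarrier")` gives "unlocked".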
### Client-Side GraphQL Queries
When loading a listing details page, the following GraphQL queries are executed:
#### 1. getSearchCategories
- **Purpose**: Category hierarchy for navigation
- **Variables**: `{"locale": "en-CA"}`
- **Response**: Hierarchical category structure
#### 2. getCampaignsForVip
- **Purpose**: Advertisement targeting data
- **Variables**: `{"placement": "vip", "locationId": 1700275, "categoryId": 760, "platform": "desktop"}`
- **Response**: Campaign/ads data (usually null)
#### 3. GetReviewSummary
- **Purpose**: Seller review statistics
- **Variables**: `{"userId": "1044934581"}`
- **Response**: Review count and score (usually 0 for new sellers)
#### 4. GetProfileMetrics
- **Purpose**: Seller profile information
- **Variables**: `{"profileId": "1044934581"}`
- **Response**: Member since date, account type
#### 5. GetListingsSimilar
- **Purpose**: Similar listings for cross-selling
- **Variables**: `{"listingId": "1705585530", "limit": 10, "isExternalId": false}`
- **Response**: Array of similar listings with basic metadata
#### 6. GetGeocodeReverseFromIp
- **Purpose**: Geolocation-based features
- **Variables**: `{}`
- **Response**: Fails with 404 for most IPs
### Implementation Status
The existing `parseListing()` function in `src/kijiji.ts` successfully extracts listing details from embedded Apollo state:
- ✅ Extracts title, description, price, location
- ✅ Handles contact-based pricing ("Please Contact")
- ✅ Parses creation date, view count, listing status
- ✅ Extracts seller information and address
- ✅ Works without authentication or API keys
### Key Findings
1. **No Dedicated Listing API**: As with search results, there's no separate GraphQL query for individual listing data
2. **Complete Data Available**: All listing information is embedded in the initial HTML response
3. **Additional Context Fetched**: Secondary GraphQL queries provide complementary data (reviews, similar listings)
4. **Consistent Architecture**: Same Apollo state embedding pattern as search pages
### Current Scraper Implementation
The scraper successfully extracts listing details by:
1. Fetching the listing URL HTML
2. Parsing embedded `__NEXT_DATA__` Apollo state
3. Extracting the `Listing:{id}` object from Apollo cache
4. Mapping fields to typed `ListingDetails` interface
This approach works reliably without requiring authentication or dealing with rate limiting on individual listing fetches.
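Steps 3 and 4 above might look roughly like this. The `Listing:{id}` key format comes from the notes; `readListing`, `ListingDetailsSketch`, and the chosen subset of fields are hypothetical and do not reflect the real `ListingDetails` interface.

```typescript
type ApolloCache = Record<string, unknown>;

// Hypothetical, reduced stand-in for the real ListingDetails type.
interface ListingDetailsSketch {
  id: string;
  title: string;
  description: string | null;
  views: number | null;
}

function readListing(
  cache: ApolloCache,
  listingId: string,
): ListingDetailsSketch | null {
  // Apollo normalizes entities under "<Typename>:<id>" keys.
  const raw = cache[`Listing:${listingId}`];
  if (typeof raw !== "object" || raw === null) return null;
  const entry = raw as Record<string, unknown>;

  const title = entry["title"];
  if (typeof title !== "string") return null; // A listing without a title is unusable.

  const description = entry["description"];
  const metrics = entry["metrics"] as { views?: unknown } | null | undefined;
  const views = metrics?.views;

  return {
    id: listingId,
    title,
    description: typeof description === "string" ? description : null,
    views: typeof views === "number" ? views : null,
  };
}
```

Validating each field at runtime, rather than casting the whole entry, means an upstream schema change degrades to `null` fields instead of silently mistyped data.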
## Next Steps
- Explore posting/authentication APIs (requires user login)
- Investigate if GraphQL API can be used for programmatic access with proper authentication
- Test rate limiting patterns and optimal scraping strategies
- Document additional category and location ID mappings

View File

@@ -1 +1 @@
# ca-marketplace-scraper # sone4ka-tok

View File

@@ -1,30 +0,0 @@
{
"$schema": "https://biomejs.dev/schemas/1.9.4/schema.json",
"vcs": {
"enabled": false,
"clientKind": "git",
"useIgnoreFile": false
},
"files": {
"ignoreUnknown": false,
"ignore": []
},
"formatter": {
"enabled": true,
"indentStyle": "space"
},
"organizeImports": {
"enabled": true
},
"linter": {
"enabled": true,
"rules": {
"recommended": true
}
},
"javascript": {
"formatter": {
"quoteStyle": "double"
}
}
}

View File

@@ -1,10 +1,10 @@
{ {
"lockfileVersion": 1, "lockfileVersion": 1,
"configVersion": 0,
"workspaces": { "workspaces": {
"": { "": {
"name": "sone4ka-tok", "name": "sone4ka-tok",
"dependencies": { "dependencies": {
"@types/cli-progress": "^3.11.6",
"cli-progress": "^3.12.0", "cli-progress": "^3.12.0",
"linkedom": "^0.18.12", "linkedom": "^0.18.12",
"unidecode": "^1.1.0", "unidecode": "^1.1.0",
@@ -13,7 +13,6 @@
"@anthropic-ai/claude-code": "^2.0.1", "@anthropic-ai/claude-code": "^2.0.1",
"@musistudio/claude-code-router": "^1.0.53", "@musistudio/claude-code-router": "^1.0.53",
"@types/bun": "latest", "@types/bun": "latest",
"@types/cli-progress": "^3.11.6",
"@types/unidecode": "^1.1.0", "@types/unidecode": "^1.1.0",
}, },
"peerDependencies": { "peerDependencies": {

View File

@@ -1,3 +0,0 @@
[test]
# Test configuration
preload = ["./test/setup.ts"]

View File

@@ -1,6 +1,5 @@
services: services:
ca-marketplace-scraper: marketplace-scraper:
container_name: ca-marketplace-scraper
build: . build: .
ports: ports:
- "4005:4005" - "4005:4005"
@@ -14,9 +13,3 @@ services:
retries: 3 retries: 3
start_period: 5s start_period: 5s
restart: unless-stopped restart: unless-stopped
networks:
- internal
networks:
internal:
driver: bridge
name: ca-marketplace-scraper-network

View File

@@ -1,27 +0,0 @@
{
"$schema": "https://opencode.ai/config.json",
"mcp": {
"chrome-devtools": {
"type": "local",
"command": [
"bunx",
"--bun",
"chrome-devtools-mcp@latest",
"--log-file",
"./debug.log",
"--headless=false",
"--isolated=false",
"-e",
"/nix/store/lz8ajxhnkkw2llj752bdz41wqr645h9c-google-chrome-dev-146.0.7635.0/bin/google-chrome-unstable",
"--ignore-default-chrome-arg='--disable-extensions'"
],
"enabled": false
},
"bun-docs": {
"type": "remote",
"url": "https://bun.com/docs/mcp",
"timeout": 3000,
"enabled": false
}
}
}

View File

@@ -1,183 +0,0 @@
#!/usr/bin/env bun
/**
* Facebook Cookie Parser CLI
*
* Parses Facebook cookie strings into JSON format for the marketplace scraper
*
* Usage:
* bun run scripts/parse-facebook-cookies.ts "c_user=123; xs=abc"
* bun run scripts/parse-facebook-cookies.ts --input cookies.txt
* echo "c_user=123; xs=abc" | bun run scripts/parse-facebook-cookies.ts
* bun run scripts/parse-facebook-cookies.ts "cookie_string" --output my-cookies.json
*/
import { parseFacebookCookieString } from "../src/facebook";
interface Cookie {
name: string;
value: string;
domain: string;
path: string;
secure?: boolean;
httpOnly?: boolean;
sameSite?: "strict" | "lax" | "none" | "unspecified";
expirationDate?: number;
storeId?: string;
}
function parseFacebookCookieStringCLI(cookieString: string): Cookie[] {
if (!cookieString || !cookieString.trim()) {
console.error("❌ Error: Empty or invalid cookie string provided");
process.exit(1);
}
const cookies = parseFacebookCookieString(cookieString);
if (cookies.length === 0) {
console.error("❌ Error: No valid cookies found in input string");
console.error('Expected format: "name1=value1; name2=value2;"');
process.exit(1);
}
return cookies;
}
async function main() {
const args = process.argv.slice(2);
if (args.length === 0 && process.stdin.isTTY === false) {
// Read from stdin
let input = "";
for await (const chunk of process.stdin) {
input += chunk;
}
input = input.trim();
if (!input) {
console.error("❌ Error: No input provided via stdin");
process.exit(1);
}
const cookies = parseFacebookCookieStringCLI(input);
await writeOutput(cookies, "./cookies/facebook.json");
return;
}
let cookieString = "";
let outputPath = "./cookies/facebook.json";
let inputPath = "";
// Parse command line arguments
for (let i = 0; i < args.length; i++) {
const arg = args[i];
if (arg === "--input" || arg === "-i") {
inputPath = args[i + 1];
i++; // Skip next arg
} else if (arg === "--output" || arg === "-o") {
outputPath = args[i + 1];
i++; // Skip next arg
} else if (arg === "--help" || arg === "-h") {
showHelp();
return;
} else if (!arg.startsWith("-")) {
// Assume this is the cookie string
cookieString = arg;
} else {
console.error(`❌ Unknown option: ${arg}`);
showHelp();
process.exit(1);
}
}
// Read from file if specified
if (inputPath) {
try {
const file = Bun.file(inputPath);
if (!(await file.exists())) {
console.error(`❌ Error: Input file not found: ${inputPath}`);
process.exit(1);
}
cookieString = await file.text();
} catch (error) {
console.error(`❌ Error reading input file: ${error}`);
process.exit(1);
}
}
if (!cookieString.trim()) {
console.error("❌ Error: No cookie string provided");
console.error(
"Provide cookie string as argument, --input file, or via stdin",
);
showHelp();
process.exit(1);
}
const cookies = parseFacebookCookieStringCLI(cookieString);
await writeOutput(cookies, outputPath);
}
async function writeOutput(cookies: Cookie[], outputPath: string) {
try {
await Bun.write(outputPath, JSON.stringify(cookies, null, 2));
console.log(`✅ Successfully parsed ${cookies.length} Facebook cookies`);
console.log(`📁 Saved to: ${outputPath}`);
// Show summary of parsed cookies
console.log("\n📋 Parsed cookies:");
for (const cookie of cookies) {
console.log(
`${cookie.name}: ${cookie.value.substring(0, 20)}${cookie.value.length > 20 ? "..." : ""}`,
);
}
} catch (error) {
console.error(`❌ Error writing to output file: ${error}`);
process.exit(1);
}
}
function showHelp() {
console.log(`
Facebook Cookie Parser CLI
Parses Facebook cookie strings into JSON format for the marketplace scraper.
USAGE:
bun run scripts/parse-facebook-cookies.ts [OPTIONS] [COOKIE_STRING]
EXAMPLES:
# Parse from command line argument
bun run scripts/parse-facebook-cookies.ts "c_user=123; xs=abc"
# Parse from file
bun run scripts/parse-facebook-cookies.ts --input cookies.txt
# Parse from stdin
echo "c_user=123; xs=abc" | bun run scripts/parse-facebook-cookies.ts
# Output to custom file
bun run scripts/parse-facebook-cookies.ts "cookie_string" --output my-cookies.json
OPTIONS:
-i, --input FILE Read cookie string from file
-o, --output FILE Output file path (default: ./cookies/facebook.json)
-h, --help Show this help message
COOKIE FORMAT:
Semicolon-separated name=value pairs
Example: "c_user=123456789; xs=abcdef123456; fr=xyz789"
OUTPUT:
JSON array of cookie objects saved to ./cookies/facebook.json
`);
}
// Run the CLI
if (import.meta.main) {
main().catch((error) => {
console.error(`❌ Unexpected error: ${error}`);
process.exit(1);
});
}

View File

@@ -1,6 +1,6 @@
import cliProgress from "cli-progress";
/* eslint-disable @typescript-eslint/no-explicit-any */ /* eslint-disable @typescript-eslint/no-explicit-any */
import { parseHTML } from "linkedom"; import { parseHTML } from "linkedom";
import cliProgress from "cli-progress";
// ----------------------------- Types ----------------------------- // ----------------------------- Types -----------------------------
@@ -55,10 +55,8 @@ function formatCentsToCurrency(
/** /**
* Parse eBay currency string like "$1.50 CAD" or "CA $1.50" into cents * Parse eBay currency string like "$1.50 CAD" or "CA $1.50" into cents
*/ */
function parseEbayPrice( function parseEbayPrice(priceText: string): { cents: number; currency: string } | null {
priceText: string, if (!priceText || typeof priceText !== 'string') return null;
): { cents: number; currency: string } | null {
if (!priceText || typeof priceText !== "string") return null;
// Clean up the price text and extract currency and amount // Clean up the price text and extract currency and amount
const cleaned = priceText.trim(); const cleaned = priceText.trim();
@@ -67,23 +65,19 @@ function parseEbayPrice(
const numberMatches = cleaned.match(/[\d,]+\.?\d*/); const numberMatches = cleaned.match(/[\d,]+\.?\d*/);
if (!numberMatches) return null; if (!numberMatches) return null;
const amountStr = numberMatches[0].replace(/,/g, ""); const amountStr = numberMatches[0].replace(/,/g, '');
const dollars = Number.parseFloat(amountStr); const dollars = parseFloat(amountStr);
if (Number.isNaN(dollars)) return null; if (isNaN(dollars)) return null;
const cents = Math.round(dollars * 100); const cents = Math.round(dollars * 100);
// Extract currency - look for common formats like "CAD", "USD", "C $", "$CA", etc. // Extract currency - look for common formats like "CAD", "USD", "C $", "$CA", etc.
let currency = "USD"; // Default let currency = 'USD'; // Default
if ( if (cleaned.toUpperCase().includes('CAD') || cleaned.includes('CA$') || cleaned.includes('C $')) {
cleaned.toUpperCase().includes("CAD") || currency = 'CAD';
cleaned.includes("CA$") || } else if (cleaned.toUpperCase().includes('USD') || cleaned.includes('$')) {
cleaned.includes("C $") currency = 'USD';
) {
currency = "CAD";
} else if (cleaned.toUpperCase().includes("USD") || cleaned.includes("$")) {
currency = "USD";
} }
return { cents, currency }; return { cents, currency };
@@ -141,9 +135,7 @@ async function fetchHtml(
if (!res.ok) { if (!res.ok) {
// Respect 429 reset if provided // Respect 429 reset if provided
if (res.status === 429) { if (res.status === 429) {
const resetSeconds = rateLimitReset const resetSeconds = rateLimitReset ? Number(rateLimitReset) : NaN;
? Number(rateLimitReset)
: Number.NaN;
const waitMs = Number.isFinite(resetSeconds) const waitMs = Number.isFinite(resetSeconds)
? Math.max(0, resetSeconds * 1000) ? Math.max(0, resetSeconds * 1000)
: (attempt + 1) * retryBaseMs; : (attempt + 1) * retryBaseMs;
@@ -184,7 +176,7 @@ function parseEbayListings(
htmlString: HTMLString, htmlString: HTMLString,
keywords: string[], keywords: string[],
exclusions: string[], exclusions: string[],
strictMode: boolean, strictMode: boolean
): ListingDetails[] { ): ListingDetails[] {
const { document } = parseHTML(htmlString); const { document } = parseHTML(htmlString);
const results: ListingDetails[] = []; const results: ListingDetails[] = [];
@@ -192,17 +184,16 @@ function parseEbayListings(
// Find all listing links by looking for eBay item URLs (/itm/) // Find all listing links by looking for eBay item URLs (/itm/)
const linkElements = document.querySelectorAll('a[href*="itm/"]'); const linkElements = document.querySelectorAll('a[href*="itm/"]');
for (const linkElement of linkElements) { for (const linkElement of linkElements) {
try { try {
// Get href attribute // Get href attribute
let href = linkElement.getAttribute("href"); let href = linkElement.getAttribute('href');
if (!href) continue; if (!href) continue;
// Make href absolute // Make href absolute
if (!href.startsWith("http")) { if (!href.startsWith('http')) {
href = href.startsWith("//") href = href.startsWith('//') ? `https:${href}` : `https://www.ebay.com${href}`;
? `https:${href}`
: `https://www.ebay.com${href}`;
} }
// Find the container - go up several levels to find the item container // Find the container - go up several levels to find the item container
@@ -216,23 +207,15 @@ function parseEbayListings(
// Extract title - look for heading or title-related elements near the link // Extract title - look for heading or title-related elements near the link
// Modern eBay often uses h3, span, or div with text content near the link // Modern eBay often uses h3, span, or div with text content near the link
let titleElement = container.querySelector( let titleElement = container.querySelector('h3, [role="heading"], .s-item__title span');
'h3, [role="heading"], .s-item__title span',
);
// If no direct title element, try finding text content around the link // If no direct title element, try finding text content around the link
if (!titleElement) { if (!titleElement) {
// Look for spans or divs with text near this link // Look for spans or divs with text near this link
const nearbySpans = container.querySelectorAll("span, div"); const nearbySpans = container.querySelectorAll('span, div');
for (const span of nearbySpans) { for (const span of nearbySpans) {
const text = span.textContent?.trim(); const text = span.textContent?.trim();
if ( if (text && text.length > 10 && text.length < 200 && !text.includes('$') && !text.includes('item')) {
text &&
text.length > 10 &&
text.length < 200 &&
!text.includes("$") &&
!text.includes("item")
) {
titleElement = span; titleElement = span;
break; break;
} }
@@ -245,12 +228,12 @@ function parseEbayListings(
if (title) { if (title) {
// Remove common eBay UI strings that appear at the end of titles // Remove common eBay UI strings that appear at the end of titles
const uiStrings = [ const uiStrings = [
"Opens in a new window", 'Opens in a new window',
"Opens in a new tab", 'Opens in a new tab',
"Opens in a new window or tab", 'Opens in a new window or tab',
"opens in a new window", 'opens in a new window',
"opens in a new tab", 'opens in a new tab',
"opens in a new window or tab", 'opens in a new window or tab'
]; ];
for (const uiString of uiStrings) { for (const uiString of uiStrings) {
@@ -273,27 +256,17 @@ function parseEbayListings(
if (title === "Shop on eBay" || title.length < 3) continue; if (title === "Shop on eBay" || title.length < 3) continue;
// Extract price - look for eBay's price classes, preferring sale/discount prices // Extract price - look for eBay's price classes, preferring sale/discount prices
let priceElement = container.querySelector( let priceElement = container.querySelector('[class*="s-item__price"], .s-item__price, [class*="price"]');
'[class*="s-item__price"], .s-item__price, [class*="price"]',
);
// If no direct price class, look for spans containing $ (but not titles) // If no direct price class, look for spans containing $ (but not titles)
if (!priceElement) { if (!priceElement) {
const spansAndElements = container.querySelectorAll( const spansAndElements = container.querySelectorAll('span, div, b, em, strong');
"span, div, b, em, strong",
);
for (const el of spansAndElements) { for (const el of spansAndElements) {
const text = el.textContent?.trim(); const text = el.textContent?.trim();
// Must contain $, be reasonably short (price shouldn't be paragraph), and not contain product words // Must contain $, be reasonably short (price shouldn't be paragraph), and not contain product words
if ( if (text && text.includes('$') && text.length < 100 &&
text?.includes("$") && !text.includes('laptop') && !text.includes('computer') && !text.includes('intel') &&
text.length < 100 && !text.includes('core') && !text.includes('ram') && !text.includes('ssd') &&
!text.includes("laptop") &&
!text.includes("computer") &&
!text.includes("intel") &&
!text.includes("core") &&
!text.includes("ram") &&
!text.includes("ssd") &&
! /\d{4}/.test(text) && // Avoid years like "2024" ! /\d{4}/.test(text) && // Avoid years like "2024"
!text.includes('"') // Avoid measurements !text.includes('"') // Avoid measurements
) { ) {
@@ -307,26 +280,17 @@ function parseEbayListings(
// Prefer sale/current price over original/strikethrough price // Prefer sale/current price over original/strikethrough price
if (priceElement) { if (priceElement) {
// Check if this element or its parent contains multiple price elements // Check if this element or its parent contains multiple price elements
const priceContainer = const priceContainer = priceElement.closest('[class*="s-item__price"]') || priceElement.parentElement;
priceElement.closest('[class*="s-item__price"]') ||
priceElement.parentElement;
if (priceContainer) { if (priceContainer) {
// Look for all price elements within this container, including strikethrough prices // Look for all price elements within this container, including strikethrough prices
const allPriceElements = priceContainer.querySelectorAll( const allPriceElements = priceContainer.querySelectorAll('[class*="s-item__price"], span, b, em, strong, s, del, strike');
'[class*="s-item__price"], span, b, em, strong, s, del, strike',
);
// Filter to only elements that actually contain prices (not labels) // Filter to only elements that actually contain prices (not labels)
const actualPrices: HTMLElement[] = []; const actualPrices: HTMLElement[] = [];
for (const el of allPriceElements) { for (const el of allPriceElements) {
const text = el.textContent?.trim(); const text = el.textContent?.trim();
if ( if (text && /^\s*[\$£¥]/u.test(text) && text.length < 50 && !/\d{4}/.test(text)) {
text &&
/^\s*[\$£¥]/u.test(text) &&
text.length < 50 &&
!/\d{4}/.test(text)
) {
actualPrices.push(el); actualPrices.push(el);
} }
} }
@@ -334,18 +298,11 @@ function parseEbayListings(
// Prefer non-strikethrough prices (sale prices) over strikethrough ones (original prices) // Prefer non-strikethrough prices (sale prices) over strikethrough ones (original prices)
if (actualPrices.length > 1) { if (actualPrices.length > 1) {
// First, look for prices that are NOT struck through // First, look for prices that are NOT struck through
const nonStrikethroughPrices = actualPrices.filter((el) => { const nonStrikethroughPrices = actualPrices.filter(el => {
const tagName = el.tagName.toLowerCase(); const tagName = el.tagName.toLowerCase();
const styles = const styles = el.classList.contains('s-strikethrough') || el.classList.contains('u-flStrike') ||
el.classList.contains("s-strikethrough") || el.closest('s, del, strike');
el.classList.contains("u-flStrike") || return tagName !== 's' && tagName !== 'del' && tagName !== 'strike' && !styles;
el.closest("s, del, strike");
return (
tagName !== "s" &&
tagName !== "del" &&
tagName !== "strike" &&
!styles
);
}); });
if (nonStrikethroughPrices.length > 0) { if (nonStrikethroughPrices.length > 0) {
@@ -360,7 +317,7 @@ function parseEbayListings(
} }
} }
const priceText = priceElement?.textContent?.trim(); let priceText = priceElement?.textContent?.trim();
if (!priceText) continue; if (!priceText) continue;
@@ -369,21 +326,12 @@ function parseEbayListings(
if (!priceInfo) continue; if (!priceInfo) continue;
// Apply exclusion filters // Apply exclusion filters
if ( if (exclusions.some(exclusion => title.toLowerCase().includes(exclusion.toLowerCase()))) {
exclusions.some((exclusion) =>
title.toLowerCase().includes(exclusion.toLowerCase()),
)
) {
continue; continue;
} }
// Apply strict mode filter (title must contain at least one keyword) // Apply strict mode filter (title must contain at least one keyword)
if ( if (strictMode && !keywords.some(keyword => title!.toLowerCase().includes(keyword.toLowerCase()))) {
strictMode &&
!keywords.some((keyword) =>
title?.toLowerCase().includes(keyword.toLowerCase()),
)
) {
continue; continue;
} }
@@ -403,6 +351,7 @@ function parseEbayListings(
results.push(listing); results.push(listing);
} catch (err) { } catch (err) {
console.warn(`Error parsing eBay listing: ${err}`); console.warn(`Error parsing eBay listing: ${err}`);
continue;
} }
} }
@@ -427,7 +376,7 @@ export default async function fetchEbayItems(
maxPrice = Number.MAX_SAFE_INTEGER, maxPrice = Number.MAX_SAFE_INTEGER,
strictMode = false, strictMode = false,
exclusions = [], exclusions = [],
keywords = [SEARCH_QUERY], // Default to search query if no keywords provided keywords = [SEARCH_QUERY] // Default to search query if no keywords provided
} = opts; } = opts;
// Build eBay search URL - use Canadian site and tracking parameters like real browser // Build eBay search URL - use Canadian site and tracking parameters like real browser
@@ -440,19 +389,18 @@ export default async function fetchEbayItems(
try { try {
// Use custom headers modeled after real browser requests to bypass bot detection // Use custom headers modeled after real browser requests to bypass bot detection
const headers: Record<string, string> = { const headers: Record<string, string> = {
"User-Agent": 'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:141.0) Gecko/20100101 Firefox/141.0',
"Mozilla/5.0 (X11; Linux x86_64; rv:141.0) Gecko/20100101 Firefox/141.0", 'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
Accept: "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8", 'Accept-Language': 'en-US,en;q=0.5',
"Accept-Language": "en-US,en;q=0.5", 'Accept-Encoding': 'gzip, deflate, br',
"Accept-Encoding": "gzip, deflate, br", 'Referer': 'https://www.ebay.ca/',
Referer: "https://www.ebay.ca/", 'Connection': 'keep-alive',
Connection: "keep-alive", 'Upgrade-Insecure-Requests': '1',
"Upgrade-Insecure-Requests": "1", 'Sec-Fetch-Dest': 'document',
"Sec-Fetch-Dest": "document", 'Sec-Fetch-Mode': 'navigate',
"Sec-Fetch-Mode": "navigate", 'Sec-Fetch-Site': 'same-origin',
"Sec-Fetch-Site": "same-origin", 'Sec-Fetch-User': '?1',
"Sec-Fetch-User": "?1", 'Priority': 'u=0, i'
Priority: "u=0, i",
}; };
const res = await fetch(searchUrl, { const res = await fetch(searchUrl, {
@@ -472,23 +420,19 @@ export default async function fetchEbayItems(
// Respect per-request delay to keep at or under REQUESTS_PER_SECOND // Respect per-request delay to keep at or under REQUESTS_PER_SECOND
await delay(DELAY_MS); await delay(DELAY_MS);
console.log("\nParsing eBay listings..."); console.log(`\nParsing eBay listings...`);
const listings = parseEbayListings( const listings = parseEbayListings(searchHtml, keywords, exclusions, strictMode);
searchHtml,
keywords,
exclusions,
strictMode,
);
// Filter by price range (additional safety check) // Filter by price range (additional safety check)
const filteredListings = listings.filter((listing) => { const filteredListings = listings.filter(listing => {
const cents = listing.listingPrice?.cents; const cents = listing.listingPrice?.cents;
return cents && cents >= minPrice && cents <= maxPrice; return cents && cents >= minPrice && cents <= maxPrice;
}); });
console.log(`Parsed ${filteredListings.length} eBay listings.`); console.log(`Parsed ${filteredListings.length} eBay listings.`);
return filteredListings; return filteredListings;
} catch (err) { } catch (err) {
if (err instanceof HttpError) { if (err instanceof HttpError) {
console.error( console.error(

View File

@@ -1,6 +1,6 @@
import cliProgress from "cli-progress";
/* eslint-disable @typescript-eslint/no-explicit-any */ /* eslint-disable @typescript-eslint/no-explicit-any */
import { parseHTML } from "linkedom"; import { parseHTML } from "linkedom";
import cliProgress from "cli-progress";
/** /**
* Facebook Marketplace Scraper * Facebook Marketplace Scraper
@@ -24,7 +24,7 @@ interface Cookie {
sameSite?: "strict" | "lax" | "none" | "unspecified"; sameSite?: "strict" | "lax" | "none" | "unspecified";
session?: boolean; session?: boolean;
expirationDate?: number; expirationDate?: number;
partitionKey?: Record<string, unknown>; partitionKey?: any;
storeId?: string; storeId?: string;
} }
@@ -68,112 +68,6 @@ interface FacebookRequireData {
[k: string]: unknown; [k: string]: unknown;
} }
interface FacebookMarketplaceItem {
// Basic identification
id: string;
__typename: "GroupCommerceProductItem";
// Listing content
marketplace_listing_title: string;
redacted_description?: {
text: string;
};
custom_title?: string;
// Pricing
formatted_price?: {
text: string;
};
listing_price?: {
amount: string;
currency: string;
amount_with_offset: string;
};
// Location
location_text?: {
text: string;
};
location?: {
latitude: number;
longitude: number;
reverse_geocode_detailed?: {
country_alpha_two: string;
postal_code_trimmed: string;
};
};
// Status flags
is_live?: boolean;
is_sold?: boolean;
is_pending?: boolean;
is_hidden?: boolean;
is_draft?: boolean;
// Timing
creation_time?: number;
// Seller information
marketplace_listing_seller?: {
__typename: "User";
id: string;
name: string;
profile_picture?: {
uri: string;
};
join_time?: number;
};
// Vehicle-specific fields (for automotive listings)
vehicle_make_display_name?: string;
vehicle_model_display_name?: string;
vehicle_odometer_data?: {
unit: "KILOMETERS" | "MILES";
value: number;
};
vehicle_transmission_type?: "AUTOMATIC" | "MANUAL";
vehicle_exterior_color?: string;
vehicle_interior_color?: string;
vehicle_condition?: "EXCELLENT" | "GOOD" | "FAIR" | "POOR";
vehicle_fuel_type?: string;
vehicle_trim_display_name?: string;
// Category and commerce
marketplace_listing_category_id?: string;
condition?: string;
// Commerce features
delivery_types?: string[];
is_shipping_offered?: boolean;
is_buy_now_enabled?: boolean;
can_buyer_make_checkout_offer?: boolean;
// Communication
messaging_enabled?: boolean;
first_message_suggested_value?: string;
// Metadata
logging_id?: string;
reportable_ent_id?: string;
// Related listings (for part-out sellers)
marketplace_listing_sets?: {
edges: Array<{
node: {
canonical_listing: {
id: string;
marketplace_listing_title: string;
is_live: boolean;
is_sold: boolean;
formatted_price: { text: string };
};
};
}>;
};
[k: string]: unknown;
}
type ListingDetails = { type ListingDetails = {
url: string; url: string;
title: string; title: string;
@@ -213,10 +107,7 @@ async function delay(ms: number): Promise<void> {
/** /**
* Load Facebook cookies from file or string * Load Facebook cookies from file or string
*/ */
async function loadFacebookCookies( async function loadFacebookCookies(cookiesSource?: string): Promise<Cookie[]> {
cookiesSource?: string,
cookiePath = "./cookies/facebook.json",
): Promise<Cookie[]> {
// First try to load from provided string parameter // First try to load from provided string parameter
if (cookiesSource) { if (cookiesSource) {
try { try {
@@ -234,9 +125,9 @@ async function loadFacebookCookies(
} }
} }
// Try to load from specified path // Try to load from ./cookies/facebook.json
try { try {
const cookiesPath = cookiePath; const cookiesPath = "./cookies/facebook.json";
const file = Bun.file(cookiesPath); const file = Bun.file(cookiesPath);
if (await file.exists()) { if (await file.exists()) {
const content = await file.text(); const content = await file.text();
@@ -257,89 +148,6 @@ async function loadFacebookCookies(
return []; return [];
} }
/**
* Parse Facebook cookie string into Cookie array format
*/
function parseFacebookCookieString(cookieString: string): Cookie[] {
if (!cookieString || !cookieString.trim()) {
return [];
}
return cookieString
.split(";")
.map((pair) => pair.trim())
.filter((pair) => pair.includes("="))
.map((pair) => {
const [name, value] = pair.split("=", 2);
const trimmedName = name.trim();
const trimmedValue = value.trim();
// Skip empty names or values
if (!trimmedName || !trimmedValue) {
return null;
}
return {
name: trimmedName,
value: decodeURIComponent(trimmedValue),
domain: ".facebook.com",
path: "/",
secure: true,
httpOnly: false,
sameSite: "lax" as const,
expirationDate: undefined, // Session cookies
};
})
.filter((cookie): cookie is Cookie => cookie !== null);
}
/**
* Ensure Facebook cookies are available, parsing from env var if needed
*/
async function ensureFacebookCookies(
cookiePath = "./cookies/facebook.json",
): Promise<Cookie[]> {
// First try to load existing cookies
try {
const existing = await loadFacebookCookies(undefined, cookiePath);
if (existing.length > 0) {
return existing;
}
} catch (error) {
// File doesn't exist or is invalid, continue to check env var
}
// Try to parse from environment variable
const cookieString = process.env.FACEBOOK_COOKIE;
if (!cookieString || !cookieString.trim()) {
throw new Error(
"No valid Facebook cookies found. Either:\n" +
" 1. Set FACEBOOK_COOKIE environment variable with cookie string, or\n" +
" 2. Create ./cookies/facebook.json manually with cookie array",
);
}
// Parse the cookie string
const cookies = parseFacebookCookieString(cookieString);
if (cookies.length === 0) {
throw new Error(
"FACEBOOK_COOKIE environment variable contains no valid cookies. " +
'Expected format: "name1=value1; name2=value2;"',
);
}
// Save to file for future use
try {
await Bun.write(cookiePath, JSON.stringify(cookies, null, 2));
console.log(`✅ Saved ${cookies.length} Facebook cookies to ${cookiePath}`);
} catch (error) {
console.warn(`! Could not save cookies to ${cookiePath}: ${error}`);
// Continue anyway, we have the cookies in memory
}
return cookies;
}
/** /**
* Format cookies array into Cookie header string * Format cookies array into Cookie header string
*/ */
@@ -353,9 +161,10 @@ function formatCookiesForHeader(cookies: Cookie[], domain: string): string {
domain.endsWith(cookie.domain.slice(1)) || domain.endsWith(cookie.domain.slice(1)) ||
domain === cookie.domain.slice(1) domain === cookie.domain.slice(1)
); );
} } else {
// Host-only cookie // Host-only cookie
return cookie.domain === domain; return cookie.domain === domain;
}
}) })
.filter((cookie) => { .filter((cookie) => {
// Check expiration // Check expiration
@@ -421,7 +230,7 @@ async function fetchHtml(
// Add cookies if provided // Add cookies if provided
if (opts?.cookies) { if (opts?.cookies) {
headers.cookie = opts.cookies; headers["cookie"] = opts.cookies;
} }
const res = await fetch(url, { const res = await fetch(url, {
@@ -436,9 +245,7 @@ async function fetchHtml(
if (!res.ok) { if (!res.ok) {
// Respect 429 reset if provided // Respect 429 reset if provided
if (res.status === 429) { if (res.status === 429) {
const resetSeconds = rateLimitReset const resetSeconds = rateLimitReset ? Number(rateLimitReset) : NaN;
? Number(rateLimitReset)
: Number.NaN;
const waitMs = Number.isFinite(resetSeconds) const waitMs = Number.isFinite(resetSeconds)
? Math.max(0, resetSeconds * 1000) ? Math.max(0, resetSeconds * 1000)
: (attempt + 1) * retryBaseMs; : (attempt + 1) * retryBaseMs;
@@ -493,7 +300,7 @@ function extractFacebookMarketplaceData(
let marketplaceData: FacebookMarketplaceSearch | null = null; let marketplaceData: FacebookMarketplaceSearch | null = null;
// Find the script containing the require data with marketplace_search // Find the script containing the require data with marketplace_search
for (const script of Array.from(scripts) as HTMLScriptElement[]) { for (const script of scripts as unknown as HTMLScriptElement[]) {
const scriptText = script.textContent; const scriptText = script.textContent;
if (!scriptText) continue; if (!scriptText) continue;
@@ -505,114 +312,58 @@ function extractFacebookMarketplaceData(
// Try multiple navigation paths to find marketplace_search // Try multiple navigation paths to find marketplace_search
const paths = [ const paths = [
// Original path from example // Original path from example
() => () => parsed.require[0][3][0]['__bbox']['require'][0][3][1]['__bbox']['result']['data']['marketplace_search'],
parsed.require[0][3][0].__bbox.require[0][3][1].__bbox.result.data
.marketplace_search,
// Alternative path structure // Alternative path structure
() => () => parsed.require[0][3][1]?.__bbox?.result?.data?.marketplace_search,
parsed.require[0][3][1]?.__bbox?.result?.data?.marketplace_search,
// Another variation // Another variation
() => parsed.require[0][3][0].__bbox.result.data.marketplace_search, () => parsed.require[0][3][0]['__bbox']['result']['data']['marketplace_search'],
// Direct access for some responses // Direct access for some responses
() => { () => {
for (const item of parsed.require) { for (const item of parsed.require) {
if (item && item.length >= 4 && item[3]) { if (item && item.length >= 4 && item[3]) {
const bbox = item[3]?.__bbox?.result?.data?.marketplace_search; const bbox = item[3]?.['__bbox']?.result?.data?.marketplace_search;
if (bbox) return bbox; if (bbox) return bbox;
} }
} }
return null; return null;
}, }
]; ];
for (const getData of paths) { for (const getData of paths) {
try { try {
const result = getData(); const result = getData();
if ( if (result && isRecord(result) && result.feed_units?.edges) {
result &&
isRecord(result) &&
result.feed_units?.edges?.length > 0
) {
marketplaceData = result as FacebookMarketplaceSearch; marketplaceData = result as FacebookMarketplaceSearch;
break; break;
} }
} catch {} } catch {
continue;
}
} }
if (marketplaceData) break; if (marketplaceData) break;
} }
// Also check for direct marketplace_search in the parsed data // Also check for direct marketplace_search in the parsed data
if (parsed.marketplace_search && isRecord(parsed.marketplace_search)) { if (parsed.marketplace_search && isRecord(parsed.marketplace_search) && parsed.marketplace_search.feed_units?.edges) {
const searchData = marketplaceData = parsed.marketplace_search as FacebookMarketplaceSearch;
parsed.marketplace_search as FacebookMarketplaceSearch;
if (searchData.feed_units?.edges?.length > 0) {
marketplaceData = searchData;
break; break;
} }
}
} catch { } catch {
// Ignore parsing errors for other scripts // Ignore parsing errors for other scripts
continue;
} }
} }
if (!marketplaceData?.feed_units?.edges?.length) { if (!marketplaceData?.feed_units?.edges) {
console.warn("No marketplace data found in HTML response"); console.warn("No marketplace data found in HTML response");
return null; return null;
} }
console.log( console.log(`Successfully parsed ${marketplaceData.feed_units.edges.length} Facebook marketplace listings`);
`Successfully parsed ${marketplaceData.feed_units.edges.length} Facebook marketplace listings`,
);
return marketplaceData.feed_units.edges.map((edge) => ({ node: edge.node })); return marketplaceData.feed_units.edges.map((edge) => ({ node: edge.node }));
} }
/**
* Monitor API extraction success/failure for detecting changes
*/
const extractionStats = {
totalExtractions: 0,
successfulExtractions: 0,
failedExtractions: 0,
lastApiChangeDetected: null as Date | null,
};
/**
* Log extraction metrics for monitoring API stability
*/
function logExtractionMetrics(success: boolean, itemId?: string) {
extractionStats.totalExtractions++;
if (success) {
extractionStats.successfulExtractions++;
} else {
extractionStats.failedExtractions++;
}
// Log warning if extraction success rate drops below 80%
const successRate =
extractionStats.successfulExtractions / extractionStats.totalExtractions;
if (
extractionStats.totalExtractions > 10 &&
successRate < 0.8 &&
!extractionStats.lastApiChangeDetected
) {
console.warn(
"! Facebook Marketplace API extraction success rate dropped below 80%. This may indicate API changes.",
);
extractionStats.lastApiChangeDetected = new Date();
}
if (success) {
console.log(
`📊 Facebook API extraction stats: ${extractionStats.successfulExtractions}/${extractionStats.totalExtractions} successful`,
);
} else {
console.warn(
`❌ Facebook API extraction failed for item ${itemId || "unknown"}`,
);
}
}
/** /**
* Turns cents to localized currency string. * Turns cents to localized currency string.
*/ */
@@ -625,8 +376,6 @@ function formatCentsToCurrency(
if (Number.isNaN(cents)) return ""; if (Number.isNaN(cents)) return "";
const dollars = cents / 100; const dollars = cents / 100;
const formatter = new Intl.NumberFormat(locale, { const formatter = new Intl.NumberFormat(locale, {
style: "currency",
currency: "USD",
minimumFractionDigits: 2, minimumFractionDigits: 2,
maximumFractionDigits: 2, maximumFractionDigits: 2,
useGrouping: true, useGrouping: true,
@@ -634,145 +383,6 @@ function formatCentsToCurrency(
return formatter.format(dollars); return formatter.format(dollars);
} }
/**
Extract marketplace item details from Facebook item page HTML
Updated for 2026 Facebook Marketplace API structure with multiple extraction paths
*/
function extractFacebookItemData(
htmlString: HTMLString,
): FacebookMarketplaceItem | null {
const { document } = parseHTML(htmlString);
const scripts = document.querySelectorAll("script");
for (const script of scripts) {
const scriptText = script.textContent;
if (!scriptText) continue;
try {
const parsed = JSON.parse(scriptText);
// Check for the 2026 require structure with marketplace product details
if (parsed.require && Array.isArray(parsed.require)) {
// Try multiple extraction paths discovered from reverse engineering
const extractionPaths = [
// Path 1: Primary path from current API structure
() =>
parsed.require[0][3].__bbox.result.data.viewer
.marketplace_product_details_page.target,
// Path 2: Alternative path with nested require
() =>
parsed.require[0][3][0].__bbox.require[3][3][1].__bbox.result.data
.viewer.marketplace_product_details_page.target,
// Path 3: Variation without the [0] index
() =>
parsed.require[0][3].__bbox.require[3][3][1].__bbox.result.data
.viewer.marketplace_product_details_page.target,
// Path 4-5: Additional fallback paths for edge cases
() =>
parsed.require[0][3][1]?.__bbox?.result?.data?.viewer
?.marketplace_product_details_page?.target,
() =>
parsed.require[0][3][2]?.__bbox?.result?.data?.viewer
?.marketplace_product_details_page?.target,
];
let pathIndex = 0;
for (const getPath of extractionPaths) {
try {
const targetData = getPath();
if (
targetData &&
typeof targetData === "object" &&
targetData.id &&
targetData.marketplace_listing_title &&
targetData.__typename === "GroupCommerceProductItem"
) {
console.log(
`Successfully extracted Facebook item data using extraction path ${pathIndex + 1}`,
);
return targetData as FacebookMarketplaceItem;
}
} catch {
// Path not found or invalid, try next path
}
pathIndex++;
}
// Fallback: Search recursively for marketplace data in the parsed structure
const findMarketplaceData = (
obj: unknown,
depth = 0,
maxDepth = 10,
): FacebookMarketplaceItem | null => {
if (depth > maxDepth) return null; // Prevent infinite recursion
if (isRecord(obj)) {
// Check if this object matches the expected marketplace item structure
if (
obj.marketplace_listing_title &&
obj.id &&
obj.__typename === "GroupCommerceProductItem" &&
obj.redacted_description
) {
return obj as FacebookMarketplaceItem;
}
// Recursively search nested objects and arrays
for (const key in obj) {
const value = obj[key];
if (isRecord(value) || Array.isArray(value)) {
const result = findMarketplaceData(value, depth + 1, maxDepth);
if (result) return result;
}
}
} else if (Array.isArray(obj)) {
// Search through arrays
for (const item of obj) {
const result = findMarketplaceData(item, depth + 1, maxDepth);
if (result) return result;
}
}
return null;
};
// Search through the entire require structure
const recursiveResult = findMarketplaceData(parsed.require);
if (recursiveResult) {
console.log(
"Successfully extracted Facebook item data using recursive search",
);
return recursiveResult;
}
// Additional search in other potential locations
if (
parsed.__bbox?.result?.data?.viewer?.marketplace_product_details_page
?.target
) {
const bboxData =
parsed.__bbox.result.data.viewer.marketplace_product_details_page
.target;
if (
bboxData &&
typeof bboxData === "object" &&
bboxData.id &&
bboxData.marketplace_listing_title &&
bboxData.__typename === "GroupCommerceProductItem"
) {
console.log(
"Successfully extracted Facebook item data from __bbox structure",
);
return bboxData as FacebookMarketplaceItem;
}
}
}
} catch (error) {
// Log parsing errors for debugging but continue to next script
console.debug(`Failed to parse script for Facebook item data: ${error}`);
}
}
return null;
}
/** /**
Parse Facebook marketplace search results into ListingDetails[] Parse Facebook marketplace search results into ListingDetails[]
*/ */
@@ -796,8 +406,7 @@ function parseFacebookAds(ads: FacebookAdNode[]): ListingDetails[] {
// - formatted_amount: human-readable price (like "CA$1") // - formatted_amount: human-readable price (like "CA$1")
let cents: number; let cents: number;
if (priceObj.amount != null) { if (priceObj.amount != null) {
const dollars = const dollars = typeof priceObj.amount === 'string'
typeof priceObj.amount === "string"
? Number.parseFloat(priceObj.amount) ? Number.parseFloat(priceObj.amount)
: priceObj.amount; : priceObj.amount;
cents = Math.round(dollars * 100); cents = Math.round(dollars * 100);
@@ -811,7 +420,7 @@ function parseFacebookAds(ads: FacebookAdNode[]): ListingDetails[] {
if (priceObj.formatted_amount) { if (priceObj.formatted_amount) {
const match = priceObj.formatted_amount.match(/[\d,]+\.?\d*/); const match = priceObj.formatted_amount.match(/[\d,]+\.?\d*/);
if (match) { if (match) {
const dollars = Number.parseFloat(match[0].replace(",", "")); const dollars = Number.parseFloat(match[0].replace(',', ''));
if (!Number.isNaN(dollars)) { if (!Number.isNaN(dollars)) {
cents = Math.round(dollars * 100); cents = Math.round(dollars * 100);
} else { } else {
@@ -856,24 +465,19 @@ function parseFacebookAds(ads: FacebookAdNode[]): ListingDetails[] {
// Extract image and video URLs // Extract image and video URLs
const imageUrl = listing.primary_listing_photo?.image?.uri; const imageUrl = listing.primary_listing_photo?.image?.uri;
const videoUrl = listing.listing_video const videoUrl = listing.listing_video ? `https://www.facebook.com/${listing.listing_video.id}/` : undefined;
? `https://www.facebook.com/${listing.listing_video.id}/`
: undefined;
// Extract seller information // Extract seller information
const seller = listing.marketplace_listing_seller const seller = listing.marketplace_listing_seller ? {
? {
name: listing.marketplace_listing_seller.name, name: listing.marketplace_listing_seller.name,
id: listing.marketplace_listing_seller.id, id: listing.marketplace_listing_seller.id
} } : undefined;
: undefined;
const listingDetails: ListingDetails = { const listingDetails: ListingDetails = {
url, url,
title, title,
listingPrice: { listingPrice: {
amountFormatted: amountFormatted: priceObj.formatted_amount || formatCentsToCurrency(cents),
priceObj.formatted_amount || formatCentsToCurrency(cents),
cents, cents,
currency: priceObj.currency || "CAD", // Facebook marketplace often uses CAD currency: priceObj.currency || "CAD", // Facebook marketplace often uses CAD
}, },
@@ -889,121 +493,15 @@ function parseFacebookAds(ads: FacebookAdNode[]): ListingDetails[] {
}; };
results.push(listingDetails); results.push(listingDetails);
} catch {} } catch {
// Skip malformed ads
continue;
}
} }
return results; return results;
} }
/**
Parse Facebook marketplace item details into ListingDetails format
Updated for 2026 GroupCommerceProductItem structure
*/
function parseFacebookItem(
item: FacebookMarketplaceItem,
): ListingDetails | null {
try {
const title = item.marketplace_listing_title || item.custom_title;
if (!title) return null;
const url = `https://www.facebook.com/marketplace/item/${item.id}`;
// Extract price information
let cents = 0;
let currency = "CAD"; // Default
let amountFormatted = item.formatted_price?.text || "FREE";
if (item.listing_price) {
currency = item.listing_price.currency || "CAD";
if (item.listing_price.amount && item.listing_price.amount !== "0.00") {
const amount = Number.parseFloat(item.listing_price.amount);
if (!Number.isNaN(amount)) {
cents = Math.round(amount * 100);
amountFormatted =
item.formatted_price?.text || formatCentsToCurrency(cents);
}
}
}
// Extract description
const description = item.redacted_description?.text;
// Extract location
const address = item.location_text?.text || null;
// Extract seller information
const seller = item.marketplace_listing_seller
? {
name: item.marketplace_listing_seller.name,
id: item.marketplace_listing_seller.id,
}
: undefined;
// Determine listing status
let listingStatus: string | undefined;
if (item.is_sold) {
listingStatus = "SOLD";
} else if (item.is_pending) {
listingStatus = "PENDING";
} else if (item.is_live) {
listingStatus = "ACTIVE";
} else if (item.is_hidden) {
listingStatus = "HIDDEN";
}
// Format creation date
const creationDate = item.creation_time
? new Date(item.creation_time * 1000).toISOString()
: undefined;
// Determine listing type based on category or vehicle data
let listingType = "item";
if (item.vehicle_make_display_name || item.vehicle_odometer_data) {
listingType = "vehicle";
} else if (item.marketplace_listing_category_id) {
// Could map category IDs to types, but keeping simple for now
listingType = "item";
}
const listingDetails: ListingDetails = {
url,
title,
description,
listingPrice: {
amountFormatted,
cents,
currency,
},
address,
creationDate,
listingType,
listingStatus,
categoryId: item.marketplace_listing_category_id,
seller,
deliveryTypes: item.delivery_types,
};
return listingDetails;
} catch (error) {
console.warn(`Failed to parse Facebook item ${item.id}:`, error);
return null;
}
}
// ----------------------------- Exports for Testing -----------------------------
// Export internal functions for comprehensive testing
export {
extractFacebookItemData,
extractFacebookMarketplaceData,
parseFacebookItem,
parseFacebookAds,
formatCentsToCurrency,
loadFacebookCookies,
formatCookiesForHeader,
parseFacebookCookieString,
ensureFacebookCookies,
};
// ----------------------------- Main ----------------------------- // ----------------------------- Main -----------------------------
export default async function fetchFacebookItems( export default async function fetchFacebookItems(
@@ -1012,18 +510,9 @@ export default async function fetchFacebookItems(
LOCATION = "toronto", LOCATION = "toronto",
MAX_ITEMS = 25, MAX_ITEMS = 25,
cookiesSource?: string, cookiesSource?: string,
cookiePath?: string,
) { ) {
// Load Facebook cookies - required for Facebook Marketplace access // Load Facebook cookies - required for Facebook Marketplace access
let cookies: Cookie[]; const cookies = await loadFacebookCookies(cookiesSource);
if (cookiesSource) {
// Use provided cookie source (backward compatibility)
cookies = await loadFacebookCookies(cookiesSource);
} else {
// Auto-load from file or parse from env var
cookies = await ensureFacebookCookies(cookiePath);
}
if (cookies.length === 0) { if (cookies.length === 0) {
throw new Error( throw new Error(
"Facebook cookies are required for marketplace access. " + "Facebook cookies are required for marketplace access. " +
@@ -1057,7 +546,8 @@ export default async function fetchFacebookItems(
onRateInfo: (remaining, reset) => { onRateInfo: (remaining, reset) => {
if (remaining && reset) { if (remaining && reset) {
console.log( console.log(
`\nFacebook - Rate limit remaining: ${remaining}, reset in: ${reset}s`, "\n" +
`Facebook - Rate limit remaining: ${remaining}, reset in: ${reset}s`,
); );
} }
}, },
@@ -1091,7 +581,7 @@ export default async function fetchFacebookItems(
cliProgress.Presets.shades_classic, cliProgress.Presets.shades_classic,
); );
const totalProgress = ads.length; const totalProgress = ads.length;
const currentProgress = 0; let currentProgress = 0;
progressBar.start(totalProgress, currentProgress); progressBar.start(totalProgress, currentProgress);
const items = parseFacebookAds(ads); const items = parseFacebookAds(ads);
@@ -1107,158 +597,3 @@ export default async function fetchFacebookItems(
console.log(`\nParsed ${pricedItems.length} Facebook marketplace listings.`); console.log(`\nParsed ${pricedItems.length} Facebook marketplace listings.`);
return pricedItems.slice(0, MAX_ITEMS); // Limit results return pricedItems.slice(0, MAX_ITEMS); // Limit results
} }
/**
* Fetch individual Facebook marketplace item details with enhanced error handling
*/
export async function fetchFacebookItem(
itemId: string,
cookiesSource?: string,
cookiePath?: string,
): Promise<ListingDetails | null> {
// Load Facebook cookies - required for Facebook Marketplace access
let cookies: Cookie[];
if (cookiesSource) {
// Use provided cookie source (backward compatibility)
cookies = await loadFacebookCookies(cookiesSource);
} else {
// Auto-load from file or parse from env var
cookies = await ensureFacebookCookies(cookiePath);
}
if (cookies.length === 0) {
throw new Error(
"Facebook cookies are required for marketplace access. " +
"Please provide cookies via 'cookies' parameter or create ./cookies/facebook.json file with valid Facebook session cookies.",
);
}
// Format cookies for HTTP header
const domain = "www.facebook.com";
const cookiesHeader = formatCookiesForHeader(cookies, domain);
if (!cookiesHeader) {
throw new Error(
"No valid Facebook cookies found. Please check that cookies are not expired and apply to facebook.com domain.",
);
}
const itemUrl = `https://www.facebook.com/marketplace/item/${itemId}/`;
console.log(`Fetching Facebook marketplace item: ${itemUrl}`);
let itemHtml: string;
try {
itemHtml = await fetchHtml(itemUrl, 1000, {
onRateInfo: (remaining, reset) => {
if (remaining && reset) {
console.log(
`\nFacebook - Rate limit remaining: ${remaining}, reset in: ${reset}s`,
);
}
},
cookies: cookiesHeader,
});
} catch (err) {
if (err instanceof HttpError) {
console.warn(
`\nFacebook marketplace item access failed (${err.status}): ${err.message}`,
);
// Enhanced error handling based on status codes
switch (err.status) {
case 400:
case 401:
case 403:
console.warn(
"Authentication error: Invalid or expired cookies. Please update ./cookies/facebook.json with fresh session cookies.",
);
console.warn(
"Try logging out and back into Facebook, then export fresh cookies.",
);
break;
case 404:
console.warn(
"Listing not found: The marketplace item may have been removed, sold, or the URL is invalid.",
);
break;
case 429:
console.warn(
"Rate limited: Too many requests. Facebook is blocking access temporarily.",
);
break;
case 500:
case 502:
case 503:
console.warn(
"Facebook server error: Marketplace may be temporarily unavailable.",
);
break;
default:
console.warn(`Unexpected error status: ${err.status}`);
}
return null;
}
throw err;
}
const itemData = extractFacebookItemData(itemHtml);
if (!itemData) {
logExtractionMetrics(false, itemId);
// Enhanced checking for specific failure scenarios
if (
itemHtml.includes("This listing is no longer available") ||
itemHtml.includes("listing has been removed") ||
itemHtml.includes("This item has been sold")
) {
console.warn(
`Item ${itemId} appears to be sold or removed from marketplace.`,
);
return null;
}
if (
itemHtml.includes("log in to Facebook") ||
itemHtml.includes("You must log in") ||
itemHtml.includes("authentication required")
) {
console.warn(
`Authentication failed for item ${itemId}. Cookies may be expired.`,
);
return null;
}
console.warn(
`No item data found in Facebook marketplace page for item ${itemId}. This may indicate:`,
);
console.warn(" - The listing was removed or sold");
console.warn(" - Authentication issues");
console.warn(" - Facebook changed their API structure");
console.warn(" - Network or parsing issues");
return null;
}
logExtractionMetrics(true, itemId);
console.log(`Successfully extracted data for item ${itemId}`);
const parsedItem = parseFacebookItem(itemData);
if (!parsedItem) {
console.warn(`Failed to parse item ${itemId}: Invalid data structure`);
return null;
}
// Check for sold/removed status in the parsed data with proper precedence
if (itemData.is_sold) {
console.warn(`Item ${itemId} is marked as sold in the marketplace.`);
// Still return the data but mark it as sold
parsedItem.listingStatus = "SOLD";
} else if (!itemData.is_live) {
console.warn(`Item ${itemId} is not live/active in the marketplace.`);
parsedItem.listingStatus = itemData.is_hidden
? "HIDDEN"
: itemData.is_pending
? "PENDING"
: "INACTIVE";
}
return parsedItem;
}
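
Reviewer note on the cookie-string parsing removed above: in JavaScript, `pair.split("=", 2)` keeps only the first two pieces, so a cookie value that itself contains `=` gets truncated, and `decodeURIComponent` throws on malformed percent-escapes. If the `FACEBOOK_COOKIE` path is ever reintroduced, a more tolerant parser might look like the minimal sketch below. The `SimpleCookie` shape and the `safeDecode` and `parseCookieHeader` names are hypothetical and are not taken from the repository.

```typescript
// Hypothetical, simplified cookie shape used only for this illustration.
interface SimpleCookie {
  name: string;
  value: string;
}

// Decode percent-escapes, falling back to the raw text when they are malformed.
function safeDecode(raw: string): string {
  try {
    return decodeURIComponent(raw);
  } catch {
    return raw;
  }
}

// Parse "name1=value1; name2=value2" into cookies, splitting each pair on the
// FIRST "=" only so that values containing "=" survive intact.
function parseCookieHeader(cookieString: string): SimpleCookie[] {
  const cookies: SimpleCookie[] = [];
  for (const pair of cookieString.split(";")) {
    const eq = pair.indexOf("=");
    if (eq === -1) continue; // no "=" at all: not a name=value pair
    const name = pair.slice(0, eq).trim();
    const value = pair.slice(eq + 1).trim();
    if (!name || !value) continue; // skip empty names or values
    cookies.push({ name, value: safeDecode(value) });
  }
  return cookies;
}
```

Splitting on the first `=` by index avoids the truncation, and the fallback in `safeDecode` keeps one bad escape sequence from discarding the whole cookie string.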


@@ -1,6 +1,6 @@
import fetchEbayItems from "@/ebay";
import fetchFacebookItems from "@/facebook";
import fetchKijijiItems from "@/kijiji"; import fetchKijijiItems from "@/kijiji";
import fetchFacebookItems from "@/facebook";
import fetchEbayItems from "@/ebay";
const PORT = process.env.PORT || 4005; const PORT = process.env.PORT || 4005;
@@ -26,77 +26,13 @@ const server = Bun.serve({
{ status: 400 }, { status: 400 },
); );
// Parse optional parameters with enhanced defaults const items = await fetchKijijiItems(SEARCH_QUERY, 5);
const location = reqUrl.searchParams.get("location"); if (!items)
const category = reqUrl.searchParams.get("category");
const maxPagesParam = reqUrl.searchParams.get("maxPages");
const maxPages = maxPagesParam ? Number.parseInt(maxPagesParam, 10) : 5; // Default: 5 pages
const sortBy = reqUrl.searchParams.get("sortBy") as
| "relevancy"
| "date"
| "price"
| "distance"
| undefined;
const sortOrder = reqUrl.searchParams.get("sortOrder") as
| "asc"
| "desc"
| undefined;
// Build search options
const locationValue = location
? /^\d+$/.test(location)
? Number(location)
: location
: 1700272;
const categoryValue = category
? /^\d+$/.test(category)
? Number(category)
: category
: 0;
const searchOptions: import("@/kijiji").SearchOptions = {
location: locationValue,
category: categoryValue,
keywords: SEARCH_QUERY,
sortBy: sortBy || "relevancy",
sortOrder: sortOrder || "desc",
maxPages,
};
// Build listing fetch options with enhanced defaults
const listingOptions: import("@/kijiji").ListingFetchOptions = {
includeImages: true, // Always include full image arrays
sellerDataDepth: "detailed", // Default: detailed seller info
includeClientSideData: false, // GraphQL reviews disabled by default
};
try {
const items = await fetchKijijiItems(
SEARCH_QUERY,
1,
undefined,
searchOptions,
listingOptions,
);
if (!items || items.length === 0)
return Response.json( return Response.json(
{ message: "Search didn't return any results!" }, { message: "Search didn't return any results!" },
{ status: 404 }, { status: 404 },
); );
return Response.json(items, { status: 200 }); return Response.json(items, { status: 200 });
} catch (error) {
console.error("Kijiji scraping error:", error);
const errorMessage =
error instanceof Error ? error.message : "Unknown error occurred";
return Response.json(
{
message: `Scraping failed: ${errorMessage}`,
query: SEARCH_QUERY,
options: { searchOptions, listingOptions },
},
{ status: 500 },
);
}
}, },
"/api/facebook": async (req: Request) => { "/api/facebook": async (req: Request) => {
@@ -117,14 +53,7 @@ const server = Bun.serve({
const COOKIES_SOURCE = reqUrl.searchParams.get("cookies") || undefined; const COOKIES_SOURCE = reqUrl.searchParams.get("cookies") || undefined;
try { try {
const items = await fetchFacebookItems( const items = await fetchFacebookItems(SEARCH_QUERY, 5, LOCATION, 25, COOKIES_SOURCE);
SEARCH_QUERY,
5,
LOCATION,
25,
COOKIES_SOURCE,
"./cookies/facebook.json",
);
if (!items || items.length === 0) if (!items || items.length === 0)
return Response.json( return Response.json(
{ message: "Search didn't return any results!" }, { message: "Search didn't return any results!" },
@@ -133,9 +62,11 @@ const server = Bun.serve({
return Response.json(items, { status: 200 }); return Response.json(items, { status: 200 });
} catch (error) { } catch (error) {
console.error("Facebook scraping error:", error); console.error("Facebook scraping error:", error);
const errorMessage = const errorMessage = error instanceof Error ? error.message : "Unknown error occurred";
error instanceof Error ? error.message : "Unknown error occurred"; return Response.json(
return Response.json({ message: errorMessage }, { status: 400 }); { message: errorMessage },
{ status: 400 },
);
} }
}, },
@@ -154,23 +85,17 @@ const server = Bun.serve({
); );
// Parse optional parameters with defaults // Parse optional parameters with defaults
const minPriceParam = reqUrl.searchParams.get("minPrice"); const minPrice = reqUrl.searchParams.get("minPrice")
const minPrice = minPriceParam ? parseInt(reqUrl.searchParams.get("minPrice")!)
? Number.parseInt(minPriceParam, 10)
: undefined; : undefined;
const maxPriceParam = reqUrl.searchParams.get("maxPrice"); const maxPrice = reqUrl.searchParams.get("maxPrice")
const maxPrice = maxPriceParam ? parseInt(reqUrl.searchParams.get("maxPrice")!)
? Number.parseInt(maxPriceParam, 10)
: undefined; : undefined;
const strictMode = reqUrl.searchParams.get("strictMode") === "true"; const strictMode = reqUrl.searchParams.get("strictMode") === "true";
const exclusionsParam = reqUrl.searchParams.get("exclusions"); const exclusionsParam = reqUrl.searchParams.get("exclusions");
const exclusions = exclusionsParam const exclusions = exclusionsParam ? exclusionsParam.split(",").map(s => s.trim()) : [];
? exclusionsParam.split(",").map((s) => s.trim())
: [];
const keywordsParam = reqUrl.searchParams.get("keywords"); const keywordsParam = reqUrl.searchParams.get("keywords");
const keywords = keywordsParam const keywords = keywordsParam ? keywordsParam.split(",").map(s => s.trim()) : [SEARCH_QUERY];
? keywordsParam.split(",").map((s) => s.trim())
: [SEARCH_QUERY];
try { try {
const items = await fetchEbayItems(SEARCH_QUERY, 5, { const items = await fetchEbayItems(SEARCH_QUERY, 5, {
@@ -188,9 +113,11 @@ const server = Bun.serve({
return Response.json(items, { status: 200 }); return Response.json(items, { status: 200 });
} catch (error) { } catch (error) {
console.error("eBay scraping error:", error); console.error("eBay scraping error:", error);
const errorMessage = const errorMessage = error instanceof Error ? error.message : "Unknown error occurred";
error instanceof Error ? error.message : "Unknown error occurred"; return Response.json(
return Response.json({ message: errorMessage }, { status: 400 }); { message: errorMessage },
{ status: 400 },
);
} }
}, },
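
Reviewer note on the `minPrice`/`maxPrice` handling in the route above: both the old and the new version return `NaN` for non-numeric input such as `?minPrice=abc`, because the parameter is truthy and `parseInt("abc")` is `NaN`. That value would then presumably be forwarded to the scraper. A minimal sketch of a stricter helper follows. The `parseOptionalInt` name and the example URL are hypothetical and are not part of the repository.

```typescript
// Hypothetical helper: parse an optional integer query parameter.
// Returns undefined when the parameter is absent, empty, or not a plain integer.
function parseOptionalInt(raw: string | null): number | undefined {
  if (raw === null) return undefined;
  const trimmed = raw.trim();
  // Accept only an optional minus sign followed by digits.
  if (!/^-?\d+$/.test(trimmed)) return undefined;
  const n = Number.parseInt(trimmed, 10);
  return Number.isSafeInteger(n) ? n : undefined;
}

// Usage with the standard URL API (example URL is made up).
const exampleParams = new URL("http://example.test/search?minPrice=50&maxPrice=abc")
  .searchParams;
const exampleMinPrice = parseOptionalInt(exampleParams.get("minPrice"));
const exampleMaxPrice = parseOptionalInt(exampleParams.get("maxPrice"));
```

Returning `undefined` for invalid input lets the caller treat a malformed filter the same way as a missing one. Rejecting the request with a 400 status would be an equally reasonable design.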


@@ -1,7 +1,7 @@
import cliProgress from "cli-progress";
/* eslint-disable @typescript-eslint/no-explicit-any */ /* eslint-disable @typescript-eslint/no-explicit-any */
import { parseHTML } from "linkedom"; import { parseHTML } from "linkedom";
import unidecode from "unidecode"; import unidecode from "unidecode";
import cliProgress from "cli-progress";
// const unidecode = require("unidecode"); // const unidecode = require("unidecode");
@@ -26,29 +26,16 @@ interface ApolloListingRoot {
url?: string; url?: string;
title?: string; title?: string;
description?: string; description?: string;
price?: { amount?: number | string; currency?: string; type?: string }; price?: { amount?: number | string; currency?: string };
type?: string; type?: string;
status?: string; status?: string;
activationDate?: string; activationDate?: string;
endDate?: string; endDate?: string;
metrics?: { views?: number | string }; metrics?: { views?: number | string };
location?: { location?: { address?: string | null };
address?: string | null;
id?: number;
name?: string;
coordinates?: { latitude: number; longitude: number };
};
imageUrls?: string[];
imageCount?: number;
categoryId?: number;
adSource?: string;
flags?: { topAd?: boolean; priceDrop?: boolean };
posterInfo?: { posterId?: string; rating?: number };
attributes?: Array<{ canonicalName?: string; canonicalValues?: string[] }>;
[k: string]: unknown; [k: string]: unknown;
} }
// Keep existing interface for backward compatibility
type ListingDetails = { type ListingDetails = {
url: string; url: string;
title: string; title: string;
@@ -66,181 +53,10 @@ type ListingDetails = {
address?: string | null; address?: string | null;
}; };
// New comprehensive interface for detailed listings
interface DetailedListing extends ListingDetails {
images: string[];
categoryId: number;
adSource: string;
flags: {
topAd: boolean;
priceDrop: boolean;
};
attributes: Record<string, string[]>;
location: {
id: number;
name: string;
coordinates?: {
latitude: number;
longitude: number;
};
};
sellerInfo?: {
posterId: string;
rating?: number;
accountType?: string;
memberSince?: string;
reviewCount?: number;
reviewScore?: number;
};
}
// Configuration interfaces
interface SearchOptions {
location?: number | string; // Location ID or name
category?: number | string; // Category ID or name
keywords?: string;
sortBy?: "relevancy" | "date" | "price" | "distance";
sortOrder?: "desc" | "asc";
maxPages?: number; // Default: 5
priceMin?: number;
priceMax?: number;
}
interface ListingFetchOptions {
includeImages?: boolean; // Default: true
sellerDataDepth?: "basic" | "detailed" | "full"; // Default: 'detailed'
includeClientSideData?: boolean; // Default: false
}
// ----------------------------- Constants & Mappings -----------------------------
// Location mappings from KIJIJI.md
const LOCATION_MAPPINGS: Record<string, number> = {
canada: 0,
ontario: 9004,
toronto: 1700273,
gta: 1700272,
oshawa: 1700275,
quebec: 9001,
"nova scotia": 9002,
alberta: 9003,
"new brunswick": 9005,
manitoba: 9006,
"british columbia": 9007,
newfoundland: 9008,
saskatchewan: 9009,
territories: 9010,
pei: 9011,
"prince edward island": 9011,
};
// Category mappings from KIJIJI.md (Buy & Sell main categories)
const CATEGORY_MAPPINGS: Record<string, number> = {
all: 0,
"buy-sell": 10,
"arts-collectibles": 12,
audio: 767,
"baby-items": 253,
"bags-luggage": 931,
bikes: 644,
books: 109,
cameras: 103,
cds: 104,
clothing: 274,
computers: 16,
"computer-accessories": 128,
electronics: 29659001,
"free-stuff": 17220001,
furniture: 235,
"garage-sales": 638,
"health-special-needs": 140,
"hobbies-crafts": 139,
"home-appliances": 107,
"home-indoor": 717,
"home-outdoor": 727,
jewellery: 133,
"musical-instruments": 17,
phones: 132,
"sporting-goods": 111,
tools: 110,
"toys-games": 108,
"tvs-video": 15093001,
"video-games": 141,
other: 26,
};
// Sort parameter mappings
const SORT_MAPPINGS: Record<string, string> = {
relevancy: "MATCH",
date: "DATE",
price: "PRICE",
distance: "DISTANCE",
};
// ----------------------------- Exports for Testing -----------------------------
// Note: These are exported for testing purposes only
export { resolveLocationId, resolveCategoryId, buildSearchUrl };
export { extractApolloState, parseSearch };
export { parseDetailedListing };
export { HttpError, NetworkError, ParseError, RateLimitError, ValidationError };
// ----------------------------- Utilities ----------------------------- // ----------------------------- Utilities -----------------------------
const SEPS = new Set([" ", "", "—", "/", ":", ";", ",", ".", "-"]); const SEPS = new Set([" ", "", "—", "/", ":", ";", ",", ".", "-"]);
/**
* Resolve location ID from name or return numeric ID
*/
function resolveLocationId(location?: number | string): number {
if (typeof location === "number") return location;
if (typeof location === "string") {
const normalized = location.toLowerCase().replace(/\s+/g, "-");
return LOCATION_MAPPINGS[normalized] ?? 0; // Default to Canada (0)
}
return 0; // Default to Canada
}
/**
* Resolve category ID from name or return numeric ID
*/
function resolveCategoryId(category?: number | string): number {
if (typeof category === "number") return category;
if (typeof category === "string") {
const normalized = category.toLowerCase().replace(/\s+/g, "-");
return CATEGORY_MAPPINGS[normalized] ?? 0; // Default to all categories
}
return 0; // Default to all categories
}
/**
* Build search URL with enhanced parameters
*/
function buildSearchUrl(
keywords: string,
options: SearchOptions & { page?: number },
BASE_URL = "https://www.kijiji.ca",
): string {
const locationId = resolveLocationId(options.location);
const categoryId = resolveCategoryId(options.category);
const categorySlug = categoryId === 0 ? "buy-sell" : "buy-sell"; // Could be enhanced
const locationSlug = locationId === 0 ? "canada" : "canada"; // Could be enhanced
let url = `${BASE_URL}/b-${categorySlug}/${locationSlug}/${slugify(keywords)}/k0c${categoryId}l${locationId}`;
const sortParam = options.sortBy
? `&sort=${SORT_MAPPINGS[options.sortBy]}`
: "";
const sortOrder = options.sortOrder === "asc" ? "ASC" : "DESC";
const pageParam =
options.page && options.page > 1 ? `&page=${options.page}` : "";
url += `?sort=relevancyDesc&view=list${sortParam}&order=${sortOrder}${pageParam}`;
return url;
}
/** /**
* Slugifies a string for search * Slugifies a string for search
*/ */
@@ -251,14 +67,13 @@ export function slugify(input: string): string {
for (let i = 0; i < s.length; i++) { for (let i = 0; i < s.length; i++) {
const ch = s[i]; const ch = s[i];
if (!ch) continue; const code = ch!.charCodeAt(0);
const code = ch.charCodeAt(0);
// a-z or 0-9 // a-z or 0-9
if ((code >= 97 && code <= 122) || (code >= 48 && code <= 57)) { if ((code >= 97 && code <= 122) || (code >= 48 && code <= 57)) {
out.push(ch); out.push(ch!);
lastHyphen = false; lastHyphen = false;
} else if (SEPS.has(ch)) { } else if (SEPS.has(ch!)) {
if (!lastHyphen) { if (!lastHyphen) {
out.push("-"); out.push("-");
lastHyphen = true; lastHyphen = true;
@@ -272,7 +87,7 @@ export function slugify(input: string): string {
/** /**
* Turns cents to localized currency string. * Turns cents to localized currency string.
*/ */
export function formatCentsToCurrency( function formatCentsToCurrency(
num: number | string | undefined, num: number | string | undefined,
locale = "en-US", locale = "en-US",
): string { ): string {
@@ -281,24 +96,21 @@ export function formatCentsToCurrency(
if (Number.isNaN(cents)) return ""; if (Number.isNaN(cents)) return "";
const dollars = cents / 100; const dollars = cents / 100;
const formatter = new Intl.NumberFormat(locale, { const formatter = new Intl.NumberFormat(locale, {
style: "currency",
currency: "USD",
minimumFractionDigits: 2, minimumFractionDigits: 2,
maximumFractionDigits: 2, maximumFractionDigits: 2,
useGrouping: true,
}); });
return formatter.format(dollars); return formatter.format(dollars);
} }
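Reviewer note on the `formatCentsToCurrency` hunk above: the formatter options differ between the two columns, which changes the returned string. The sketch below is a minimal illustration, assuming that `style: "currency"` and `currency: "USD"` belong to the `-` side and `useGrouping: true` to the `+` side (the flattened view does not make this explicit). The function names `formatWithCurrency` and `formatPlain` are hypothetical and do not appear in the commit.

```typescript
// Hypothetical helper mirroring the options assumed to be on the "-" side.
function formatWithCurrency(cents: number, locale = "en-US"): string {
  return new Intl.NumberFormat(locale, {
    style: "currency",
    currency: "USD",
    minimumFractionDigits: 2,
    maximumFractionDigits: 2,
  }).format(cents / 100);
}

// Hypothetical helper mirroring the options assumed to be on the "+" side.
function formatPlain(cents: number, locale = "en-US"): string {
  return new Intl.NumberFormat(locale, {
    minimumFractionDigits: 2,
    maximumFractionDigits: 2,
    useGrouping: true,
  }).format(cents / 100);
}

console.log(formatWithCurrency(29900)); // "$299.00"
console.log(formatPlain(29900)); // "299.00"
console.log(formatPlain(123456)); // "1,234.56"
```

If that column assignment is right, callers that compared `amountFormatted` against strings such as `"$299.00"` would need to be checked after this change.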
function isRecord(value: unknown): value is Record<string, unknown> { function isRecord(value: unknown): value is Record<string, unknown> {
return typeof value === "object" && value !== null && !Array.isArray(value); return typeof value === "object" && value !== null;
} }
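Reviewer note on the `isRecord` change above: dropping `!Array.isArray(value)` means arrays now pass the guard. A small sketch of the difference follows; `isRecordStrict` and `isRecordLoose` are illustrative names for the left-column and right-column bodies, not identifiers from the commit.

```typescript
// Left-column body: plain objects only, arrays are rejected.
function isRecordStrict(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Right-column body: any non-null object passes, including arrays.
function isRecordLoose(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null;
}

const samples: unknown[] = [{ a: 1 }, [1, 2], null, "text"];
console.log(samples.map((v) => isRecordStrict(v))); // [true, false, false, false]
console.log(samples.map((v) => isRecordLoose(v))); // [true, true, false, false]
```

Whether this matters depends on whether any caller can receive an array where a keyed object is expected, which this diff alone does not show.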
async function delay(ms: number): Promise<void> { async function delay(ms: number): Promise<void> {
await new Promise((resolve) => setTimeout(resolve, ms)); await new Promise((resolve) => setTimeout(resolve, ms));
} }
// ----------------------------- Error Classes -----------------------------
class HttpError extends Error { class HttpError extends Error {
constructor( constructor(
message: string, message: string,
@@ -310,52 +122,12 @@ class HttpError extends Error {
} }
} }
class NetworkError extends Error {
constructor(
message: string,
public readonly url: string,
public readonly cause?: Error,
) {
super(message);
this.name = "NetworkError";
}
}
class ParseError extends Error {
constructor(
message: string,
public readonly data?: unknown,
) {
super(message);
this.name = "ParseError";
}
}
class RateLimitError extends Error {
constructor(
message: string,
public readonly url: string,
public readonly resetTime?: number,
) {
super(message);
this.name = "RateLimitError";
}
}
class ValidationError extends Error {
constructor(message: string) {
super(message);
this.name = "ValidationError";
}
}
// ----------------------------- HTTP Client ----------------------------- // ----------------------------- HTTP Client -----------------------------
/** /**
Fetch HTML with enhanced retry strategy and exponential backoff. Fetch HTML with a basic retry strategy and simple rate-limit delay between calls.
- Retries on 429, 5xx, and network errors - Retries on 429 and 5xx
- Respects X-RateLimit-Reset when present (seconds) - Respects X-RateLimit-Reset when present (seconds)
- Exponential backoff with jitter
*/ */
async function fetchHtml( async function fetchHtml(
url: string, url: string,
@@ -367,13 +139,11 @@ async function fetchHtml(
}, },
): Promise<HTMLString> { ): Promise<HTMLString> {
const maxRetries = opts?.maxRetries ?? 3; const maxRetries = opts?.maxRetries ?? 3;
const retryBaseMs = opts?.retryBaseMs ?? 1000; const retryBaseMs = opts?.retryBaseMs ?? 500;
for (let attempt = 0; attempt <= maxRetries; attempt++) { for (let attempt = 0; attempt <= maxRetries; attempt++) {
try { try {
const controller = new AbortController(); // console.log(`Fetching: `, url);
const timeoutId = setTimeout(() => controller.abort(), 30000); // 30s timeout
const res = await fetch(url, { const res = await fetch(url, {
method: "GET", method: "GET",
headers: { headers: {
@@ -385,42 +155,27 @@ async function fetchHtml(
"user-agent": "user-agent":
"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120 Safari/537.36", "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120 Safari/537.36",
}, },
signal: controller.signal,
}); });
clearTimeout(timeoutId);
const rateLimitRemaining = res.headers.get("X-RateLimit-Remaining"); const rateLimitRemaining = res.headers.get("X-RateLimit-Remaining");
const rateLimitReset = res.headers.get("X-RateLimit-Reset"); const rateLimitReset = res.headers.get("X-RateLimit-Reset");
opts?.onRateInfo?.(rateLimitRemaining, rateLimitReset); opts?.onRateInfo?.(rateLimitRemaining, rateLimitReset);
if (!res.ok) { if (!res.ok) {
// Handle rate limiting // Respect 429 reset if provided
if (res.status === 429) { if (res.status === 429) {
const resetSeconds = rateLimitReset const resetSeconds = rateLimitReset ? Number(rateLimitReset) : NaN;
? Number(rateLimitReset)
: Number.NaN;
const waitMs = Number.isFinite(resetSeconds) const waitMs = Number.isFinite(resetSeconds)
? Math.max(0, resetSeconds * 1000) ? Math.max(0, resetSeconds * 1000)
: calculateBackoffDelay(attempt, retryBaseMs); : (attempt + 1) * retryBaseMs;
if (attempt < maxRetries) {
await delay(waitMs); await delay(waitMs);
continue; continue;
} }
throw new RateLimitError( // Retry on 5xx
`Rate limit exceeded for ${url}`,
url,
resetSeconds,
);
}
// Retry on server errors
if (res.status >= 500 && res.status < 600 && attempt < maxRetries) { if (res.status >= 500 && res.status < 600 && attempt < maxRetries) {
await delay(calculateBackoffDelay(attempt, retryBaseMs)); await delay((attempt + 1) * retryBaseMs);
continue; continue;
} }
throw new HttpError( throw new HttpError(
`Request failed with status ${res.status}`, `Request failed with status ${res.status}`,
res.status, res.status,
@@ -429,189 +184,16 @@ async function fetchHtml(
} }
const html = await res.text(); const html = await res.text();
// Respect per-request delay to keep at or under REQUESTS_PER_SECOND
// Respect per-request delay to maintain rate limiting
await delay(DELAY_MS); await delay(DELAY_MS);
return html; return html;
} catch (err) { } catch (err) {
// Handle different error types if (attempt >= maxRetries) throw err;
if (err instanceof RateLimitError || err instanceof HttpError) { await delay((attempt + 1) * retryBaseMs);
throw err; // Re-throw known errors
}
if (err instanceof Error && err.name === "AbortError") {
if (attempt < maxRetries) {
await delay(calculateBackoffDelay(attempt, retryBaseMs));
continue;
}
throw new NetworkError(`Request timeout for ${url}`, url, err);
}
// Network or other errors
if (attempt < maxRetries) {
await delay(calculateBackoffDelay(attempt, retryBaseMs));
continue;
}
throw new NetworkError(
`Network error fetching ${url}: ${err instanceof Error ? err.message : String(err)}`,
url,
err instanceof Error ? err : undefined,
);
} }
} }
throw new NetworkError(`Exhausted retries without response for ${url}`, url); throw new Error("Exhausted retries without response");
}
/**
* Calculate exponential backoff delay with jitter
*/
function calculateBackoffDelay(attempt: number, baseMs: number): number {
const exponentialDelay = baseMs * 2 ** attempt;
const jitter = Math.random() * 0.1 * exponentialDelay; // 10% jitter
return Math.min(exponentialDelay + jitter, 30000); // Cap at 30 seconds
}
// ----------------------------- GraphQL Client -----------------------------
/**
* Fetch additional data via GraphQL API
*/
async function fetchGraphQLData(
query: string,
variables: Record<string, unknown>,
BASE_URL = "https://www.kijiji.ca",
): Promise<unknown> {
const endpoint = `${BASE_URL}/anvil/api`;
try {
const response = await fetch(endpoint, {
method: "POST",
headers: {
"Content-Type": "application/json",
"apollo-require-preflight": "true",
},
body: JSON.stringify({
query,
variables,
}),
});
if (!response.ok) {
throw new HttpError(
`GraphQL request failed with status ${response.status}`,
response.status,
endpoint,
);
}
const result = await response.json();
if (result.errors) {
throw new ParseError(
`GraphQL errors: ${JSON.stringify(result.errors)}`,
result.errors,
);
}
return result.data;
} catch (err) {
if (err instanceof HttpError || err instanceof ParseError) {
throw err;
}
throw new NetworkError(
`Failed to fetch GraphQL data: ${err instanceof Error ? err.message : String(err)}`,
endpoint,
err instanceof Error ? err : undefined,
);
}
}
// GraphQL response interfaces
interface GraphQLReviewResponse {
user?: {
reviewSummary?: {
count?: number;
score?: number;
};
};
}
interface GraphQLProfileResponse {
user?: {
memberSince?: string;
accountType?: string;
};
}
// GraphQL queries from KIJIJI.md
const GRAPHQL_QUERIES = {
getReviewSummary: `
query GetReviewSummary($userId: String!) {
user(id: $userId) {
reviewSummary {
count
score
__typename
}
__typename
}
}
`,
getProfileMetrics: `
query GetProfileMetrics($profileId: String!) {
user(id: $profileId) {
memberSince
accountType
__typename
}
}
`,
} as const;
/**
* Fetch additional seller data via GraphQL
*/
async function fetchSellerDetails(
posterId: string,
BASE_URL = "https://www.kijiji.ca",
): Promise<{
reviewCount?: number;
reviewScore?: number;
memberSince?: string;
accountType?: string;
}> {
try {
const [reviewData, profileData] = await Promise.all([
fetchGraphQLData(
GRAPHQL_QUERIES.getReviewSummary,
{ userId: posterId },
BASE_URL,
),
fetchGraphQLData(
GRAPHQL_QUERIES.getProfileMetrics,
{ profileId: posterId },
BASE_URL,
),
]);
const reviewResponse = reviewData as GraphQLReviewResponse;
const profileResponse = profileData as GraphQLProfileResponse;
return {
reviewCount: reviewResponse?.user?.reviewSummary?.count,
reviewScore: reviewResponse?.user?.reviewSummary?.score,
memberSince: profileResponse?.user?.memberSince,
accountType: profileResponse?.user?.accountType,
};
} catch (err) {
// Silently fail for GraphQL errors - not critical for basic functionality
console.warn(
`Failed to fetch seller details for ${posterId}:`,
err instanceof Error ? err.message : String(err),
);
return {};
}
} }
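Reviewer note on the `fetchHtml` retry change above: the `-` side waits `calculateBackoffDelay(attempt, retryBaseMs)` (exponential, 10% jitter, capped at 30 s, base 1000 ms), while the `+` side waits `(attempt + 1) * retryBaseMs` (linear, base 500 ms). The sketch below compares the two schedules. The names `linearDelay` and `exponentialDelay` are hypothetical, and the injectable `random` parameter is an assumption added here only so the output is deterministic.

```typescript
/** Linear delay, as on the "+" side: (attempt + 1) * baseMs. */
function linearDelay(attempt: number, baseMs: number): number {
  return (attempt + 1) * baseMs;
}

/**
 * Exponential delay with up to 10% jitter, capped at 30 s, following the
 * removed calculateBackoffDelay. `random` is injectable for this sketch only.
 */
function exponentialDelay(
  attempt: number,
  baseMs: number,
  random: () => number = Math.random,
): number {
  const exponential = baseMs * 2 ** attempt;
  const jitter = random() * 0.1 * exponential;
  return Math.min(exponential + jitter, 30_000);
}

// Schedules for the default maxRetries = 3 (attempts 0..3), jitter disabled.
const linearSchedule = [0, 1, 2, 3].map((a) => linearDelay(a, 500));
const exponentialSchedule = [0, 1, 2, 3].map((a) =>
  exponentialDelay(a, 1000, () => 0),
);
console.log(linearSchedule); // [500, 1000, 1500, 2000]
console.log(exponentialSchedule); // [1000, 2000, 4000, 8000]
```

With these defaults the linear schedule waits 5 s in total across four attempts, against 15 s (before jitter) for the exponential one, so the `+` side retries sooner against a server that is already returning 429 or 5xx.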
// ----------------------------- Parsing ----------------------------- // ----------------------------- Parsing -----------------------------
@@ -717,8 +299,7 @@ function parseListing(
listingPrice: amountFormatted listingPrice: amountFormatted
? { ? {
amountFormatted, amountFormatted,
cents: cents: Number.isFinite(cents!) ? cents : undefined,
cents !== undefined && Number.isFinite(cents) ? cents : undefined,
currency: price?.currency, currency: price?.currency,
} }
: undefined, : undefined,
@@ -726,212 +307,29 @@ function parseListing(
listingStatus: status, listingStatus: status,
creationDate: activationDate, creationDate: activationDate,
endDate, endDate,
numberOfViews: numberOfViews: Number.isFinite(numberOfViews!) ? numberOfViews : undefined,
numberOfViews !== undefined && Number.isFinite(numberOfViews)
? numberOfViews
: undefined,
address: location?.address ?? null, address: location?.address ?? null,
}; };
} }
/**
* Parse a listing page into a detailed object with all available fields
*/
async function parseDetailedListing(
htmlString: HTMLString,
BASE_URL: string,
options: ListingFetchOptions = {},
): Promise<DetailedListing | null> {
const apolloState = extractApolloState(htmlString);
if (!apolloState) return null;
// Find the listing root key
const listingKey = Object.keys(apolloState).find((k) =>
k.includes("Listing"),
);
if (!listingKey) return null;
const root = apolloState[listingKey];
if (!isRecord(root)) return null;
const {
url,
title,
description,
price,
type,
status,
activationDate,
endDate,
metrics,
location,
imageUrls,
imageCount,
categoryId,
adSource,
flags,
posterInfo,
attributes,
} = root as ApolloListingRoot;
const cents = price?.amount != null ? Number(price.amount) : undefined;
const amountFormatted = formatCentsToCurrency(cents);
const numberOfViews =
metrics?.views != null ? Number(metrics.views) : undefined;
const listingUrl =
typeof url === "string"
? url.startsWith("http")
? url
: `${BASE_URL}${url}`
: "";
if (!listingUrl || !title) return null;
// Only include fixed-price listings
if (!amountFormatted || cents === undefined) return null;
// Extract images if requested
const images =
options.includeImages !== false && Array.isArray(imageUrls)
? imageUrls.filter((url): url is string => typeof url === "string")
: [];
// Extract attributes as key-value pairs
const attributeMap: Record<string, string[]> = {};
if (Array.isArray(attributes)) {
for (const attr of attributes) {
if (attr?.canonicalName && Array.isArray(attr.canonicalValues)) {
attributeMap[attr.canonicalName] = attr.canonicalValues;
}
}
}
// Extract seller info based on depth setting
let sellerInfo: DetailedListing["sellerInfo"];
const depth = options.sellerDataDepth ?? "detailed";
if (posterInfo?.posterId) {
sellerInfo = {
posterId: posterInfo.posterId,
rating:
typeof posterInfo.rating === "number" ? posterInfo.rating : undefined,
};
// Add more detailed info if requested and client-side data is enabled
if (
(depth === "detailed" || depth === "full") &&
options.includeClientSideData
) {
try {
const additionalData = await fetchSellerDetails(
posterInfo.posterId,
BASE_URL,
);
sellerInfo = {
...sellerInfo,
...additionalData,
};
} catch (err) {
// Silently fail - GraphQL data is optional
console.warn(
`Failed to fetch additional seller data for ${posterInfo.posterId}`,
);
}
}
}
return {
url: listingUrl,
title,
description,
listingPrice: {
amountFormatted,
cents,
currency: price?.currency,
},
listingType: type,
listingStatus: status,
creationDate: activationDate,
endDate,
numberOfViews:
numberOfViews !== undefined && Number.isFinite(numberOfViews)
? numberOfViews
: undefined,
address: location?.address ?? null,
images,
categoryId: typeof categoryId === "number" ? categoryId : 0,
adSource: typeof adSource === "string" ? adSource : "UNKNOWN",
flags: {
topAd: flags?.topAd === true,
priceDrop: flags?.priceDrop === true,
},
attributes: attributeMap,
location: {
id: typeof location?.id === "number" ? location.id : 0,
name: typeof location?.name === "string" ? location.name : "Unknown",
coordinates: location?.coordinates
? {
latitude: location.coordinates.latitude,
longitude: location.coordinates.longitude,
}
: undefined,
},
sellerInfo,
};
}
// ----------------------------- Main ----------------------------- // ----------------------------- Main -----------------------------
export default async function fetchKijijiItems( export default async function fetchKijijiItems(
SEARCH_QUERY: string, SEARCH_QUERY: string,
REQUESTS_PER_SECOND = 1, REQUESTS_PER_SECOND = 1,
BASE_URL = "https://www.kijiji.ca", BASE_URL = "https://www.kijiji.ca",
searchOptions: SearchOptions = {},
listingOptions: ListingFetchOptions = {},
) { ) {
const DELAY_MS = Math.max(1, Math.floor(1000 / REQUESTS_PER_SECOND)); const DELAY_MS = Math.max(1, Math.floor(1000 / REQUESTS_PER_SECOND));
// Set defaults for configuration const searchUrl = `${BASE_URL}/b-gta-greater-toronto-area/${slugify(SEARCH_QUERY)}/k0l1700272?sort=relevancyDesc&view=list`;
const finalSearchOptions: Required<SearchOptions> = {
location: searchOptions.location ?? 1700272, // Default to GTA
category: searchOptions.category ?? 0, // Default to all categories
keywords: searchOptions.keywords ?? SEARCH_QUERY,
sortBy: searchOptions.sortBy ?? "relevancy",
sortOrder: searchOptions.sortOrder ?? "desc",
maxPages: searchOptions.maxPages ?? 5, // Default to 5 pages
priceMin: searchOptions.priceMin,
priceMax: searchOptions.priceMax,
};
const finalListingOptions: Required<ListingFetchOptions> = { console.log(`Fetching search: ${searchUrl}`);
includeImages: listingOptions.includeImages ?? true,
sellerDataDepth: listingOptions.sellerDataDepth ?? "detailed",
includeClientSideData: listingOptions.includeClientSideData ?? false,
};
const allListings: DetailedListing[] = [];
const seenUrls = new Set<string>();
// Fetch multiple pages
for (let page = 1; page <= finalSearchOptions.maxPages; page++) {
const searchUrl = buildSearchUrl(
finalSearchOptions.keywords,
{
...finalSearchOptions,
// Add page parameter for pagination
...(page > 1 && { page }),
},
BASE_URL,
);
console.log(`Fetching search page ${page}: ${searchUrl}`);
const searchHtml = await fetchHtml(searchUrl, DELAY_MS, { const searchHtml = await fetchHtml(searchUrl, DELAY_MS, {
onRateInfo: (remaining, reset) => { onRateInfo: (remaining, reset) => {
if (remaining && reset) { if (remaining && reset) {
console.log( console.log(
`\nSearch - Rate limit remaining: ${remaining}, reset in: ${reset}s`, "\n" +
`Search - Rate limit remaining: ${remaining}, reset in: ${reset}s`,
); );
} }
}, },
@@ -939,61 +337,53 @@ export default async function fetchKijijiItems(
const searchResults = parseSearch(searchHtml, BASE_URL); const searchResults = parseSearch(searchHtml, BASE_URL);
if (searchResults.length === 0) { if (searchResults.length === 0) {
console.log( console.warn("No search results parsed from page.");
`No more results found on page ${page}. Stopping pagination.`, return;
);
break;
} }
// Deduplicate links across pages // Deduplicate links
const newListingLinks = searchResults const listingLinks = Array.from(
.map((r) => r.listingLink) new Set(searchResults.map((r) => r.listingLink)),
.filter((link) => !seenUrls.has(link)); );
for (const link of newListingLinks) { console.log(
seenUrls.add(link); "\n" + `Found ${listingLinks.length} listing links. Fetching details...`,
}
console.log(
`\nFound ${newListingLinks.length} new listing links on page ${page}. Total unique: ${seenUrls.size}`,
); );
// Fetch details for this page's listings
const progressBar = new cliProgress.SingleBar( const progressBar = new cliProgress.SingleBar(
{}, {},
cliProgress.Presets.shades_classic, cliProgress.Presets.shades_classic,
); );
const totalProgress = newListingLinks.length; const totalProgress = listingLinks.length;
let currentProgress = 0; let currentProgress = 0;
progressBar.start(totalProgress, currentProgress); progressBar.start(totalProgress, currentProgress);
for (const link of newListingLinks) { const items: ListingDetails[] = [];
for (const link of listingLinks) {
try { try {
const html = await fetchHtml(link, DELAY_MS, { const html = await fetchHtml(link, DELAY_MS, {
onRateInfo: (remaining, reset) => { onRateInfo: (remaining, reset) => {
if (remaining && reset) { if (remaining && reset) {
console.log( console.log(
`\nItem - Rate limit remaining: ${remaining}, reset in: ${reset}s`, "\n" +
`Item - Rate limit remaining: ${remaining}, reset in: ${reset}s`,
); );
} }
}, },
}); });
const parsed = await parseDetailedListing( const parsed = parseListing(html, BASE_URL);
html,
BASE_URL,
finalListingOptions,
);
if (parsed) { if (parsed) {
allListings.push(parsed); if (parsed.listingPrice?.cents) items.push(parsed);
} }
} catch (err) { } catch (err) {
if (err instanceof HttpError) { if (err instanceof HttpError) {
console.error( console.error(
`\nFailed to fetch ${link}\n - ${err.status} ${err.message}`, "\n" + `Failed to fetch ${link}\n - ${err.status} ${err.message}`,
); );
} else { } else {
console.error( console.error(
`\nFailed to fetch ${link}\n - ${String((err as Error)?.message || err)}`, "\n" +
`Failed to fetch ${link}\n - ${String((err as Error)?.message || err)}`,
); );
} }
} finally { } finally {
@@ -1002,14 +392,6 @@ export default async function fetchKijijiItems(
} }
} }
progressBar.stop(); console.log("\n" + `Parsed ${items.length} listings.`);
return items;
// If we got fewer results than expected (40 per page), we've reached the end
if (searchResults.length < 40) {
break;
}
}
console.log(`\nParsed ${allListings.length} detailed listings.`);
return allListings;
} }
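Reviewer note on `if (parsed.listingPrice?.cents) items.push(parsed);` in the `+` side of `fetchKijijiItems` above: the check is a truthiness test, so a listing whose `cents` value is `0` would be skipped along with listings that have no price at all. The sketch below shows the difference against an explicit type check. The `Item` and `Price` types are simplified stand-ins for illustration, not the `ListingDetails` type from the file.

```typescript
// Simplified stand-in types (assumption, not the real ListingDetails).
type Price = { cents?: number };
type Item = { title: string; listingPrice?: Price };

const parsedItems: Item[] = [
  { title: "priced", listingPrice: { cents: 29900 } },
  { title: "free", listingPrice: { cents: 0 } },
  { title: "no price" },
];

// Truthiness check, as in the diff: 0 is falsy, so "free" is dropped.
const byTruthiness = parsedItems.filter((i) => i.listingPrice?.cents);

// Explicit check: keeps every listing whose cents is a number, including 0.
const byDefined = parsedItems.filter(
  (i) => typeof i.listingPrice?.cents === "number",
);

console.log(byTruthiness.map((i) => i.title)); // ["priced"]
console.log(byDefined.map((i) => i.title)); // ["priced", "free"]
```

If excluding zero-priced listings is intended, the truthiness form is fine; the explicit form is only needed when free items should be kept.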


@@ -1,834 +0,0 @@
import { afterEach, beforeEach, describe, expect, mock, test } from "bun:test";
import {
extractFacebookItemData,
extractFacebookMarketplaceData,
fetchFacebookItem,
formatCentsToCurrency,
formatCookiesForHeader,
loadFacebookCookies,
parseFacebookAds,
parseFacebookCookieString,
parseFacebookItem,
} from "../src/facebook";
// Mock fetch globally
const originalFetch = global.fetch;
describe("Facebook Marketplace Scraper Core Tests", () => {
beforeEach(() => {
global.fetch = mock(() => {
throw new Error("fetch should be mocked in individual tests");
});
});
afterEach(() => {
global.fetch = originalFetch;
});
describe("Cookie Parsing", () => {
describe("parseFacebookCookieString", () => {
test("should parse valid cookie string", () => {
const cookieString = "c_user=123456789; xs=abcdef123456; fr=xyz789";
const result = parseFacebookCookieString(cookieString);
expect(result).toHaveLength(3);
expect(result[0]).toEqual({
name: "c_user",
value: "123456789",
domain: ".facebook.com",
path: "/",
secure: true,
httpOnly: false,
sameSite: "lax",
expirationDate: undefined,
});
expect(result[1]).toEqual({
name: "xs",
value: "abcdef123456",
domain: ".facebook.com",
path: "/",
secure: true,
httpOnly: false,
sameSite: "lax",
expirationDate: undefined,
});
});
test("should handle URL-encoded values", () => {
const cookieString = "c_user=123%2B456; xs=abc%3Ddef";
const result = parseFacebookCookieString(cookieString);
expect(result[0].value).toBe("123+456");
expect(result[1].value).toBe("abc=def");
});
test("should filter out malformed cookies", () => {
const cookieString = "c_user=123; invalid; xs=abc; =empty";
const result = parseFacebookCookieString(cookieString);
expect(result).toHaveLength(2);
expect(result.map((c) => c.name)).toEqual(["c_user", "xs"]);
});
test("should handle empty input", () => {
expect(parseFacebookCookieString("")).toEqual([]);
expect(parseFacebookCookieString(" ")).toEqual([]);
});
test("should handle extra whitespace", () => {
const cookieString = " c_user = 123 ; xs=abc ";
const result = parseFacebookCookieString(cookieString);
expect(result).toHaveLength(2);
expect(result[0].name).toBe("c_user");
expect(result[0].value).toBe("123");
expect(result[1].name).toBe("xs");
expect(result[1].value).toBe("abc");
});
});
});
describe("Facebook Item Fetching", () => {
describe("fetchFacebookItem", () => {
const mockCookies = JSON.stringify([
{ name: "c_user", value: "12345", domain: ".facebook.com" },
{ name: "xs", value: "abc123", domain: ".facebook.com" },
]);
test("should handle authentication errors", async () => {
global.fetch = mock(() =>
Promise.resolve({
ok: false,
status: 401,
text: () => Promise.resolve("Authentication required"),
headers: {
get: () => null,
},
}),
);
const result = await fetchFacebookItem("123", mockCookies);
expect(result).toBeNull();
});
test("should handle item not found", async () => {
global.fetch = mock(() =>
Promise.resolve({
ok: false,
status: 404,
text: () => Promise.resolve("Not found"),
headers: {
get: () => null,
},
}),
);
const result = await fetchFacebookItem("nonexistent", mockCookies);
expect(result).toBeNull();
});
test("should handle rate limiting", async () => {
let attempts = 0;
global.fetch = mock(() => {
attempts++;
if (attempts === 1) {
return Promise.resolve({
ok: false,
status: 429,
headers: {
get: (header: string) => {
if (header === "X-RateLimit-Reset") return "1";
return null;
},
},
text: () => Promise.resolve("Rate limited"),
});
}
const mockData = {
require: [
[
null,
null,
null,
{
__bbox: {
result: {
data: {
viewer: {
marketplace_product_details_page: {
target: {
id: "123",
__typename: "GroupCommerceProductItem",
marketplace_listing_title: "Test Item",
is_live: true,
},
},
},
},
},
},
},
],
],
};
return Promise.resolve({
ok: true,
text: () =>
Promise.resolve(
`<html><body><script>${JSON.stringify(mockData)}</script></body></html>`,
),
headers: {
get: () => null,
},
});
});
const result = await fetchFacebookItem("123", mockCookies);
expect(attempts).toBe(2);
// Should eventually succeed after retry
});
test("should handle sold items", async () => {
const mockData = {
require: [
[
null,
null,
null,
{
__bbox: {
result: {
data: {
viewer: {
marketplace_product_details_page: {
target: {
id: "456",
__typename: "GroupCommerceProductItem",
marketplace_listing_title: "Sold Item",
is_sold: true,
is_live: false,
},
},
},
},
},
},
},
],
],
};
global.fetch = mock(() =>
Promise.resolve({
ok: true,
text: () =>
Promise.resolve(
`<html><body><script>${JSON.stringify(mockData)}</script></body></html>`,
),
headers: {
get: () => null,
},
}),
);
const result = await fetchFacebookItem("456", mockCookies);
expect(result?.listingStatus).toBe("SOLD");
});
test("should handle missing authentication cookies", async () => {
// Use a test-specific cookie file that doesn't exist
const testCookiePath = "./cookies/facebook-test.json";
// Test with no cookies available (test file doesn't exist)
await expect(
fetchFacebookItem("123", undefined, testCookiePath),
).rejects.toThrow("No valid Facebook cookies found");
});
test("should handle successful item extraction", async () => {
const mockData = {
require: [
[
null,
null,
null,
{
__bbox: {
result: {
data: {
viewer: {
marketplace_product_details_page: {
target: {
id: "789",
__typename: "GroupCommerceProductItem",
marketplace_listing_title: "Working Item",
formatted_price: { text: "$299.00" },
listing_price: {
amount: "299.00",
currency: "CAD",
},
is_live: true,
creation_time: 1640995200,
},
},
},
},
},
},
},
],
],
};
global.fetch = mock(() =>
Promise.resolve({
ok: true,
text: () =>
Promise.resolve(
`<html><body><script>${JSON.stringify(mockData)}</script></body></html>`,
),
headers: {
get: () => null,
},
}),
);
const result = await fetchFacebookItem("789", mockCookies);
expect(result).not.toBeNull();
expect(result?.title).toBe("Working Item");
expect(result?.listingPrice?.amountFormatted).toBe("$299.00");
expect(result?.listingStatus).toBe("ACTIVE");
});
test("should handle server errors", async () => {
global.fetch = mock(() =>
Promise.resolve({
ok: false,
status: 500,
text: () => Promise.resolve("Internal Server Error"),
headers: {
get: () => null,
},
}),
);
const result = await fetchFacebookItem("error", mockCookies);
expect(result).toBeNull();
});
});
});
describe("Data Extraction", () => {
describe("extractFacebookItemData", () => {
test("should extract item data from standard require structure", () => {
const mockItemData = {
id: "123456",
__typename: "GroupCommerceProductItem",
marketplace_listing_title: "Test Item",
formatted_price: { text: "$100.00" },
listing_price: { amount: "100.00", currency: "CAD" },
is_live: true,
};
const mockData = {
require: [
[
null,
null,
null,
{
__bbox: {
result: {
data: {
viewer: {
marketplace_product_details_page: {
target: mockItemData,
},
},
},
},
},
},
],
],
};
const html = `<html><body><script>${JSON.stringify(mockData)}</script></body></html>`;
const result = extractFacebookItemData(html);
expect(result).not.toBeNull();
expect(result?.id).toBe("123456");
expect(result?.marketplace_listing_title).toBe("Test Item");
});
test("should handle missing item data", () => {
const mockData = {
require: [
[
null,
null,
null,
{
__bbox: {
result: {
data: {
viewer: {
marketplace_product_details_page: {},
},
},
},
},
},
],
],
};
const html = `<html><body><script>${JSON.stringify(mockData)}</script></body></html>`;
const result = extractFacebookItemData(html);
expect(result).toBeNull();
});
test("should handle malformed HTML", () => {
const result = extractFacebookItemData(
"<html><body>Invalid HTML</body></html>",
);
expect(result).toBeNull();
});
test("should handle invalid JSON in script tags", () => {
const html =
"<html><body><script>{invalid: json}</script></body></html>";
const result = extractFacebookItemData(html);
expect(result).toBeNull();
});
test("should extract item with vehicle data", () => {
const mockVehicleItem = {
id: "789",
__typename: "GroupCommerceProductItem",
marketplace_listing_title: "2006 Honda Civic",
formatted_price: { text: "$5,000" },
listing_price: { amount: "5000.00", currency: "CAD" },
vehicle_make_display_name: "Honda",
vehicle_model_display_name: "Civic",
vehicle_odometer_data: { unit: "KILOMETERS", value: 150000 },
vehicle_transmission_type: "AUTOMATIC",
is_live: true,
};
const mockData = {
require: [
[
null,
null,
null,
{
__bbox: {
result: {
data: {
viewer: {
marketplace_product_details_page: {
target: mockVehicleItem,
},
},
},
},
},
},
],
],
};
const html = `<html><body><script>${JSON.stringify(mockData)}</script></body></html>`;
const result = extractFacebookItemData(html);
expect(result).not.toBeNull();
expect(result?.vehicle_make_display_name).toBe("Honda");
expect(result?.vehicle_odometer_data?.value).toBe(150000);
});
});
describe("extractFacebookMarketplaceData", () => {
test("should extract search results from marketplace data", () => {
const mockMarketplaceData = {
feed_units: {
edges: [
{
node: {
listing: {
id: "1",
marketplace_listing_title: "Item 1",
listing_price: { amount: "10.00", currency: "CAD" },
},
},
},
{
node: {
listing: {
id: "2",
marketplace_listing_title: "Item 2",
listing_price: { amount: "20.00", currency: "CAD" },
},
},
},
],
},
};
const mockData = {
require: [
[
null,
null,
null,
{
__bbox: {
result: {
data: {
marketplace_search: mockMarketplaceData,
},
},
},
},
],
],
};
const html = `<html><body><script>${JSON.stringify(mockData)}</script></body></html>`;
const result = extractFacebookMarketplaceData(html);
expect(result).not.toBeNull();
expect(result).toHaveLength(2);
expect(result?.[0].node.listing.marketplace_listing_title).toBe(
"Item 1",
);
});
test("should handle empty search results", () => {
const mockData = {
require: [
[
null,
null,
null,
{
__bbox: {
result: {
data: {
marketplace_search: {
feed_units: { edges: [] },
},
},
},
},
},
],
],
};
const html = `<html><body><script>${JSON.stringify(mockData)}</script></body></html>`;
const result = extractFacebookMarketplaceData(html);
expect(result).toBeNull();
});
});
});
describe("Data Parsing", () => {
describe("parseFacebookItem", () => {
test("should parse complete item with all fields", () => {
const item = {
id: "123456",
__typename: "GroupCommerceProductItem" as const,
marketplace_listing_title: "iPhone 13 Pro",
redacted_description: { text: "Excellent condition" },
formatted_price: { text: "$800.00" },
listing_price: { amount: "800.00", currency: "CAD" },
location_text: { text: "Toronto, ON" },
is_live: true,
creation_time: 1640995200,
marketplace_listing_seller: {
id: "seller1",
name: "John Doe",
},
delivery_types: ["IN_PERSON"],
};
const result = parseFacebookItem(item);
expect(result).not.toBeNull();
expect(result?.title).toBe("iPhone 13 Pro");
expect(result?.description).toBe("Excellent condition");
expect(result?.listingPrice?.amountFormatted).toBe("$800.00");
expect(result?.listingPrice?.cents).toBe(80000);
expect(result?.listingPrice?.currency).toBe("CAD");
expect(result?.address).toBe("Toronto, ON");
expect(result?.listingStatus).toBe("ACTIVE");
expect(result?.seller?.name).toBe("John Doe");
expect(result?.deliveryTypes).toEqual(["IN_PERSON"]);
});
test("should parse FREE items", () => {
const item = {
id: "789",
__typename: "GroupCommerceProductItem" as const,
marketplace_listing_title: "Free Sofa",
formatted_price: { text: "FREE" },
listing_price: { amount: "0.00", currency: "CAD" },
is_live: true,
};
const result = parseFacebookItem(item);
expect(result).not.toBeNull();
expect(result?.title).toBe("Free Sofa");
expect(result?.listingPrice?.amountFormatted).toBe("FREE");
expect(result?.listingPrice?.cents).toBe(0);
});
test("should handle missing optional fields", () => {
const item = {
id: "456",
__typename: "GroupCommerceProductItem" as const,
marketplace_listing_title: "Minimal Item",
};
const result = parseFacebookItem(item);
expect(result).not.toBeNull();
expect(result?.title).toBe("Minimal Item");
expect(result?.description).toBeUndefined();
expect(result?.seller).toBeUndefined();
});
test("should identify vehicle listings", () => {
const vehicleItem = {
id: "999",
__typename: "GroupCommerceProductItem" as const,
marketplace_listing_title: "2012 Mazda 3",
formatted_price: { text: "$8,000" },
listing_price: { amount: "8000.00", currency: "CAD" },
vehicle_make_display_name: "Mazda",
vehicle_model_display_name: "3",
is_live: true,
};
const result = parseFacebookItem(vehicleItem);
expect(result?.listingType).toBe("vehicle");
});
test("should handle different listing statuses", () => {
const soldItem = {
id: "111",
__typename: "GroupCommerceProductItem" as const,
marketplace_listing_title: "Sold Item",
is_sold: true,
is_live: false,
};
const pendingItem = {
id: "222",
__typename: "GroupCommerceProductItem" as const,
marketplace_listing_title: "Pending Item",
is_pending: true,
is_live: true,
};
const hiddenItem = {
id: "333",
__typename: "GroupCommerceProductItem" as const,
marketplace_listing_title: "Hidden Item",
is_hidden: true,
is_live: false,
};
expect(parseFacebookItem(soldItem)?.listingStatus).toBe("SOLD");
expect(parseFacebookItem(pendingItem)?.listingStatus).toBe("PENDING");
expect(parseFacebookItem(hiddenItem)?.listingStatus).toBe("HIDDEN");
});
test("should return null for items without title", () => {
const invalidItem = {
id: "invalid",
__typename: "GroupCommerceProductItem" as const,
is_live: true,
};
const result = parseFacebookItem(invalidItem);
expect(result).toBeNull();
});
});
describe("parseFacebookAds", () => {
test("should parse search result ads", () => {
const ads = [
{
node: {
listing: {
id: "1",
marketplace_listing_title: "Ad 1",
listing_price: {
amount: "50.00",
formatted_amount: "$50.00",
currency: "CAD",
},
location: {
reverse_geocode: { city_page: { display_name: "Toronto" } },
},
creation_time: 1640995200,
is_live: true,
},
},
},
{
node: {
listing: {
id: "2",
marketplace_listing_title: "Ad 2",
listing_price: {
amount: "75.00",
formatted_amount: "$75.00",
currency: "CAD",
},
location: {
reverse_geocode: { city_page: { display_name: "Ottawa" } },
},
creation_time: 1640995300,
is_live: true,
},
},
},
];
const results = parseFacebookAds(ads);
expect(results).toHaveLength(2);
expect(results[0].title).toBe("Ad 1");
expect(results[0].listingPrice?.cents).toBe(5000);
expect(results[0].address).toBe("Toronto");
expect(results[1].title).toBe("Ad 2");
expect(results[1].address).toBe("Ottawa");
});
test("should filter out ads without price", () => {
const ads = [
{
node: {
listing: {
id: "1",
marketplace_listing_title: "With Price",
listing_price: {
amount: "100.00",
formatted_amount: "$100.00",
currency: "CAD",
},
is_live: true,
},
},
},
{
node: {
listing: {
id: "2",
marketplace_listing_title: "No Price",
is_live: true,
},
},
},
];
const results = parseFacebookAds(ads);
expect(results).toHaveLength(1);
expect(results[0].title).toBe("With Price");
});
test("should handle malformed ads gracefully", () => {
const ads = [
{
node: {
listing: {
id: "1",
marketplace_listing_title: "Valid Ad",
listing_price: {
amount: "50.00",
formatted_amount: "$50.00",
currency: "CAD",
},
is_live: true,
},
},
},
{
node: {
// Missing listing
},
} as { node: { listing?: unknown } },
];
const results = parseFacebookAds(ads);
expect(results).toHaveLength(1);
expect(results[0].title).toBe("Valid Ad");
});
});
});
describe("Utility Functions", () => {
describe("formatCentsToCurrency", () => {
test("should format cents to currency string", () => {
expect(formatCentsToCurrency(100)).toBe("$1.00");
expect(formatCentsToCurrency(1000)).toBe("$10.00");
expect(formatCentsToCurrency(9999)).toBe("$99.99");
expect(formatCentsToCurrency(123456)).toBe("$1,234.56");
});
test("should handle string inputs", () => {
expect(formatCentsToCurrency("100")).toBe("$1.00");
expect(formatCentsToCurrency("1000")).toBe("$10.00");
});
test("should handle zero", () => {
expect(formatCentsToCurrency(0)).toBe("$0.00");
});
test("should handle null and undefined", () => {
expect(formatCentsToCurrency(null)).toBe("");
expect(formatCentsToCurrency(undefined)).toBe("");
});
test("should handle invalid inputs", () => {
expect(formatCentsToCurrency("invalid")).toBe("");
expect(formatCentsToCurrency(Number.NaN)).toBe("");
});
});
describe("formatCookiesForHeader", () => {
const mockCookies = [
{ name: "c_user", value: "123456", domain: ".facebook.com", path: "/" },
{ name: "xs", value: "abcdef", domain: ".facebook.com", path: "/" },
{ name: "session_id", value: "xyz", domain: "other.com", path: "/" },
];
test("should format cookies for header string", () => {
const result = formatCookiesForHeader(mockCookies, "www.facebook.com");
expect(result).toBe("c_user=123456; xs=abcdef");
});
test("should filter expired cookies", () => {
const cookiesWithExpiration = [
...mockCookies,
{
name: "expired",
value: "old",
domain: ".facebook.com",
path: "/",
expirationDate: Date.now() / 1000 - 1000,
},
];
const result = formatCookiesForHeader(
cookiesWithExpiration,
"www.facebook.com",
);
expect(result).not.toContain("expired");
});
test("should handle no matching cookies", () => {
const result = formatCookiesForHeader(mockCookies, "www.google.com");
expect(result).toBe("");
});
test("should handle empty cookie array", () => {
const result = formatCookiesForHeader([], "www.facebook.com");
expect(result).toBe("");
});
});
});
});
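
The `formatCookiesForHeader` tests above imply three behaviours: domain matching against the request host, dropping expired cookies, and joining the rest as `name=value` pairs. A minimal sketch of that logic follows. It is not the project's implementation; `CookieSketch` and `formatCookiesSketch` are hypothetical names, and treating `expirationDate` as Unix seconds is an assumption taken from the test data.

```typescript
// Hypothetical cookie shape, mirroring the fields used in the tests above.
interface CookieSketch {
  name: string;
  value: string;
  domain: string;
  path: string;
  expirationDate?: number; // assumed to be Unix time in seconds
}

// Sketch only: keep unexpired cookies whose domain matches the host,
// then join them as "name=value; name=value".
function formatCookiesSketch(cookies: CookieSketch[], host: string): string {
  const nowSeconds = Date.now() / 1000;
  return cookies
    .filter((cookie) => {
      // A leading dot (".facebook.com") means "this domain and its subdomains".
      const domain = cookie.domain.startsWith(".")
        ? cookie.domain.slice(1)
        : cookie.domain;
      const domainMatches = host === domain || host.endsWith(`.${domain}`);
      const expired =
        cookie.expirationDate !== undefined &&
        cookie.expirationDate < nowSeconds;
      return domainMatches && !expired;
    })
    .map((cookie) => `${cookie.name}=${cookie.value}`)
    .join("; ");
}
```

Path matching is ignored here because none of the tests exercise it.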

@@ -1,712 +0,0 @@
import { afterEach, beforeEach, describe, expect, mock, test } from "bun:test";
import fetchFacebookItems, { fetchFacebookItem } from "../src/facebook";
// Keep a reference to the real fetch so it can be restored after each test
const originalFetch = global.fetch;
describe("Facebook Marketplace Scraper Integration Tests", () => {
beforeEach(() => {
global.fetch = mock(() => {
throw new Error("fetch should be mocked in individual tests");
});
});
afterEach(() => {
global.fetch = originalFetch;
});
describe("Main Search Function", () => {
const mockCookies = JSON.stringify([
{ name: "c_user", value: "12345", domain: ".facebook.com", path: "/" },
{ name: "xs", value: "abc123", domain: ".facebook.com", path: "/" },
]);
test("should successfully fetch search results", async () => {
const mockSearchData = {
require: [
[
null,
null,
null,
{
__bbox: {
result: {
data: {
marketplace_search: {
feed_units: {
edges: [
{
node: {
listing: {
id: "1",
marketplace_listing_title: "iPhone 13 Pro",
listing_price: {
amount: "800.00",
formatted_amount: "$800.00",
currency: "CAD",
},
location: {
reverse_geocode: {
city_page: { display_name: "Toronto" },
},
},
creation_time: 1640995200,
is_live: true,
},
},
},
{
node: {
listing: {
id: "2",
marketplace_listing_title: "Samsung Galaxy",
listing_price: {
amount: "600.00",
formatted_amount: "$600.00",
currency: "CAD",
},
location: {
reverse_geocode: {
city_page: { display_name: "Mississauga" },
},
},
creation_time: 1640995300,
is_live: true,
},
},
},
],
},
},
},
},
},
},
],
],
};
global.fetch = mock(() =>
Promise.resolve({
ok: true,
text: () =>
Promise.resolve(
`<html><body><script>${JSON.stringify(mockSearchData)}</script></body></html>`,
),
headers: {
get: () => null,
},
}),
);
const results = await fetchFacebookItems(
"iPhone",
1,
"toronto",
25,
mockCookies,
);
expect(results).toHaveLength(2);
expect(results[0].title).toBe("iPhone 13 Pro");
expect(results[1].title).toBe("Samsung Galaxy");
});
test("should filter out items without price", async () => {
const mockSearchData = {
require: [
[
null,
null,
null,
{
__bbox: {
result: {
data: {
marketplace_search: {
feed_units: {
edges: [
{
node: {
listing: {
id: "1",
marketplace_listing_title: "With Price",
listing_price: {
amount: "100.00",
formatted_amount: "$100.00",
currency: "CAD",
},
is_live: true,
},
},
},
{
node: {
listing: {
id: "2",
marketplace_listing_title: "No Price",
is_live: true,
},
},
},
],
},
},
},
},
},
},
],
],
};
global.fetch = mock(() =>
Promise.resolve({
ok: true,
text: () =>
Promise.resolve(
`<html><body><script>${JSON.stringify(mockSearchData)}</script></body></html>`,
),
headers: {
get: () => null,
},
}),
);
const results = await fetchFacebookItems(
"test",
1,
"toronto",
25,
mockCookies,
);
expect(results).toHaveLength(1);
expect(results[0].title).toBe("With Price");
});
test("should respect MAX_ITEMS parameter", async () => {
const mockSearchData = {
require: [
[
null,
null,
null,
{
__bbox: {
result: {
data: {
marketplace_search: {
feed_units: {
edges: Array.from({ length: 10 }, (_, i) => ({
node: {
listing: {
id: String(i),
marketplace_listing_title: `Item ${i}`,
listing_price: {
amount: `${(i + 1) * 10}.00`,
formatted_amount: `$${(i + 1) * 10}.00`,
currency: "CAD",
},
is_live: true,
},
},
})),
},
},
},
},
},
},
],
],
};
global.fetch = mock(() =>
Promise.resolve({
ok: true,
text: () =>
Promise.resolve(
`<html><body><script>${JSON.stringify(mockSearchData)}</script></body></html>`,
),
headers: {
get: () => null,
},
}),
);
const results = await fetchFacebookItems(
"test",
1,
"toronto",
5,
mockCookies,
);
expect(results).toHaveLength(5);
});
test("should return empty array for no results", async () => {
const mockSearchData = {
require: [
[
null,
null,
null,
{
__bbox: {
result: {
data: {
marketplace_search: {
feed_units: {
edges: [],
},
},
},
},
},
},
],
],
};
global.fetch = mock(() =>
Promise.resolve({
ok: true,
text: () =>
Promise.resolve(
`<html><body><script>${JSON.stringify(mockSearchData)}</script></body></html>`,
),
headers: {
get: () => null,
},
}),
);
const results = await fetchFacebookItems(
"nonexistent query",
1,
"toronto",
25,
mockCookies,
);
expect(results).toEqual([]);
});
test("should handle authentication errors gracefully", async () => {
global.fetch = mock(() =>
Promise.resolve({
ok: false,
status: 401,
text: () => Promise.resolve("Unauthorized"),
headers: {
get: () => null,
},
}),
);
const results = await fetchFacebookItems(
"test",
1,
"toronto",
25,
mockCookies,
);
expect(results).toEqual([]);
});
test("should handle network errors", async () => {
global.fetch = mock(() => Promise.reject(new Error("Network error")));
await expect(
fetchFacebookItems("test", 1, "toronto", 25, mockCookies),
).rejects.toThrow("Network error");
});
test("should handle rate limiting with retry", async () => {
let attempts = 0;
global.fetch = mock(() => {
attempts++;
if (attempts === 1) {
return Promise.resolve({
ok: false,
status: 429,
headers: {
get: (header: string) => {
if (header === "X-RateLimit-Reset") return "1";
return null;
},
},
text: () => Promise.resolve("Rate limited"),
});
}
const mockSearchData = {
require: [
[
null,
null,
null,
{
__bbox: {
result: {
data: {
marketplace_search: {
feed_units: {
edges: [
{
node: {
listing: {
id: "1",
marketplace_listing_title: "Item 1",
listing_price: {
amount: "100.00",
formatted_amount: "$100.00",
currency: "CAD",
},
is_live: true,
},
},
},
],
},
},
},
},
},
},
],
],
};
return Promise.resolve({
ok: true,
text: () =>
Promise.resolve(
`<html><body><script>${JSON.stringify(mockSearchData)}</script></body></html>`,
),
headers: {
get: () => null,
},
});
});
const results = await fetchFacebookItems(
"test",
1,
"toronto",
25,
mockCookies,
);
expect(attempts).toBe(2);
expect(results).toHaveLength(1);
});
});
describe("Vehicle Listing Integration", () => {
const mockCookies = JSON.stringify([
{ name: "c_user", value: "12345", domain: ".facebook.com", path: "/" },
{ name: "xs", value: "abc123", domain: ".facebook.com", path: "/" },
]);
test("should correctly identify and parse vehicle listings", async () => {
const mockSearchData = {
require: [
[
null,
null,
null,
{
__bbox: {
result: {
data: {
marketplace_search: {
feed_units: {
edges: [
{
node: {
listing: {
id: "1",
marketplace_listing_title: "2006 Honda Civic",
listing_price: {
amount: "8000.00",
formatted_amount: "$8,000.00",
currency: "CAD",
},
is_live: true,
},
},
},
{
node: {
listing: {
id: "2",
marketplace_listing_title: "iPhone 13",
listing_price: {
amount: "800.00",
formatted_amount: "$800.00",
currency: "CAD",
},
is_live: true,
},
},
},
],
},
},
},
},
},
},
],
],
};
global.fetch = mock(() =>
Promise.resolve({
ok: true,
text: () =>
Promise.resolve(
`<html><body><script>${JSON.stringify(mockSearchData)}</script></body></html>`,
),
headers: {
get: () => null,
},
}),
);
const results = await fetchFacebookItems(
"cars",
1,
"toronto",
25,
mockCookies,
);
expect(results).toHaveLength(2);
// Both should be classified as "item" type in search results (vehicle detection is for item details)
expect(results[0].title).toBe("2006 Honda Civic");
expect(results[1].title).toBe("iPhone 13");
});
});
describe("Different Categories", () => {
const mockCookies = JSON.stringify([
{ name: "c_user", value: "12345", domain: ".facebook.com", path: "/" },
{ name: "xs", value: "abc123", domain: ".facebook.com", path: "/" },
]);
test("should handle electronics listings", async () => {
const mockSearchData = {
require: [
[
null,
null,
null,
{
__bbox: {
result: {
data: {
marketplace_search: {
feed_units: {
edges: [
{
node: {
listing: {
id: "1",
marketplace_listing_title: "Nintendo Switch",
listing_price: {
amount: "250.00",
formatted_amount: "$250.00",
currency: "CAD",
},
location: {
reverse_geocode: {
city_page: { display_name: "Toronto" },
},
},
marketplace_listing_category_id:
"479353692612078",
condition: "USED",
is_live: true,
},
},
},
],
},
},
},
},
},
},
],
],
};
global.fetch = mock(() =>
Promise.resolve({
ok: true,
text: () =>
Promise.resolve(
`<html><body><script>${JSON.stringify(mockSearchData)}</script></body></html>`,
),
headers: {
get: () => null,
},
}),
);
const results = await fetchFacebookItems(
"nintendo switch",
1,
"toronto",
25,
mockCookies,
);
expect(results).toHaveLength(1);
expect(results[0].title).toBe("Nintendo Switch");
expect(results[0].categoryId).toBe("479353692612078");
});
test("should handle home goods/furniture listings", async () => {
const mockSearchData = {
require: [
[
null,
null,
null,
{
__bbox: {
result: {
data: {
marketplace_search: {
feed_units: {
edges: [
{
node: {
listing: {
id: "1",
marketplace_listing_title: "Dining Table",
listing_price: {
amount: "150.00",
formatted_amount: "$150.00",
currency: "CAD",
},
location: {
reverse_geocode: {
city_page: { display_name: "Mississauga" },
},
},
marketplace_listing_category_id:
"1569171756675761",
condition: "USED",
is_live: true,
},
},
},
],
},
},
},
},
},
},
],
],
};
global.fetch = mock(() =>
Promise.resolve({
ok: true,
text: () =>
Promise.resolve(
`<html><body><script>${JSON.stringify(mockSearchData)}</script></body></html>`,
),
headers: {
get: () => null,
},
}),
);
const results = await fetchFacebookItems(
"table",
1,
"toronto",
25,
mockCookies,
);
expect(results).toHaveLength(1);
expect(results[0].title).toBe("Dining Table");
expect(results[0].categoryId).toBe("1569171756675761");
});
});
describe("Error Scenarios", () => {
const mockCookies = JSON.stringify([
{ name: "c_user", value: "12345", domain: ".facebook.com", path: "/" },
{ name: "xs", value: "abc123", domain: ".facebook.com", path: "/" },
]);
test("should handle malformed HTML responses", async () => {
global.fetch = mock(() =>
Promise.resolve({
ok: true,
text: () =>
Promise.resolve(
"<html><body>Invalid HTML without JSON data</body></html>",
),
headers: {
get: () => null,
},
}),
);
const results = await fetchFacebookItems(
"test",
1,
"toronto",
25,
mockCookies,
);
expect(results).toEqual([]);
});
test("should handle 404 errors gracefully", async () => {
global.fetch = mock(() =>
Promise.resolve({
ok: false,
status: 404,
text: () => Promise.resolve("Not found"),
headers: {
get: () => null,
},
}),
);
const results = await fetchFacebookItems(
"test",
1,
"toronto",
25,
mockCookies,
);
expect(results).toEqual([]);
});
test("should handle 500 errors gracefully", async () => {
global.fetch = mock(() =>
Promise.resolve({
ok: false,
status: 500,
text: () => Promise.resolve("Internal Server Error"),
headers: {
get: () => null,
},
}),
);
const results = await fetchFacebookItems(
"test",
1,
"toronto",
25,
mockCookies,
);
expect(results).toEqual([]);
});
});
});
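
Every test in this file rebuilds the same `require` / `__bbox` envelope and the same fake `Response` object by hand. One possible way to cut that repetition is sketched below. `buildMockHtml` and `mockHtmlResponse` are hypothetical helpers that do not exist in the repository; the envelope shape is copied from the fixtures above.

```typescript
// Hypothetical helper: wraps a payload in the require/__bbox envelope
// that the fixtures above construct manually.
function buildMockHtml(data: Record<string, unknown>): string {
  const envelope = {
    require: [[null, null, null, { __bbox: { result: { data } } }]],
  };
  return `<html><body><script>${JSON.stringify(envelope)}</script></body></html>`;
}

// Hypothetical helper: the minimal Response-like object the tests pass to mock().
function mockHtmlResponse(html: string, status = 200) {
  return {
    ok: status >= 200 && status < 300,
    status,
    text: () => Promise.resolve(html),
    headers: { get: (_name: string): string | null => null },
  };
}
```

With these in place, a fixture could shrink to something like `mock(() => Promise.resolve(mockHtmlResponse(buildMockHtml({ marketplace_search: { feed_units: { edges: [] } } }))))`.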

@@ -1,166 +0,0 @@
import { describe, expect, test } from "bun:test";
import {
HttpError,
NetworkError,
ParseError,
RateLimitError,
ValidationError,
buildSearchUrl,
resolveCategoryId,
resolveLocationId,
} from "../src/kijiji";
describe("Location and Category Resolution", () => {
describe("resolveLocationId", () => {
test("should return numeric IDs as-is", () => {
expect(resolveLocationId(1700272)).toBe(1700272);
expect(resolveLocationId(0)).toBe(0);
});
test("should resolve string location names", () => {
expect(resolveLocationId("canada")).toBe(0);
expect(resolveLocationId("ontario")).toBe(9004);
expect(resolveLocationId("toronto")).toBe(1700273);
expect(resolveLocationId("gta")).toBe(1700272);
});
test("should handle case insensitive matching", () => {
expect(resolveLocationId("Canada")).toBe(0);
expect(resolveLocationId("ONTARIO")).toBe(9004);
});
test("should default to Canada for unknown locations", () => {
expect(resolveLocationId("unknown")).toBe(0);
expect(resolveLocationId("")).toBe(0);
});
test("should handle undefined input", () => {
expect(resolveLocationId(undefined)).toBe(0);
});
});
describe("resolveCategoryId", () => {
test("should return numeric IDs as-is", () => {
expect(resolveCategoryId(132)).toBe(132);
expect(resolveCategoryId(0)).toBe(0);
});
test("should resolve string category names", () => {
expect(resolveCategoryId("all")).toBe(0);
expect(resolveCategoryId("phones")).toBe(132);
expect(resolveCategoryId("electronics")).toBe(29659001);
expect(resolveCategoryId("buy-sell")).toBe(10);
});
test("should handle case insensitive matching", () => {
expect(resolveCategoryId("All")).toBe(0);
expect(resolveCategoryId("PHONES")).toBe(132);
});
test("should default to all categories for unknown categories", () => {
expect(resolveCategoryId("unknown")).toBe(0);
expect(resolveCategoryId("")).toBe(0);
});
test("should handle undefined input", () => {
expect(resolveCategoryId(undefined)).toBe(0);
});
});
});
describe("URL Construction", () => {
describe("buildSearchUrl", () => {
test("should build basic search URL", () => {
const url = buildSearchUrl("iphone", {
location: 1700272,
category: 132,
sortBy: "relevancy",
sortOrder: "desc",
});
expect(url).toContain("b-buy-sell/canada/iphone/k0c132l1700272");
expect(url).toContain("sort=relevancyDesc");
expect(url).toContain("order=DESC");
});
test("should handle pagination", () => {
const url = buildSearchUrl("iphone", {
location: 1700272,
category: 132,
page: 2,
});
expect(url).toContain("&page=2");
});
test("should handle different sort options", () => {
const dateUrl = buildSearchUrl("iphone", {
sortBy: "date",
sortOrder: "asc",
});
expect(dateUrl).toContain("sort=DATE");
expect(dateUrl).toContain("order=ASC");
const priceUrl = buildSearchUrl("iphone", {
sortBy: "price",
sortOrder: "desc",
});
expect(priceUrl).toContain("sort=PRICE");
expect(priceUrl).toContain("order=DESC");
});
test("should handle string location/category inputs", () => {
const url = buildSearchUrl("iphone", {
location: "toronto",
category: "phones",
});
expect(url).toContain("k0c132l1700273"); // phones + toronto
});
});
});
describe("Error Classes", () => {
test("HttpError should store status and URL", () => {
const error = new HttpError("Not found", 404, "https://example.com");
expect(error.message).toBe("Not found");
expect(error.status).toBe(404);
expect(error.url).toBe("https://example.com");
expect(error.name).toBe("HttpError");
});
test("NetworkError should store URL and cause", () => {
const cause = new Error("Connection failed");
const error = new NetworkError(
"Network error",
"https://example.com",
cause,
);
expect(error.message).toBe("Network error");
expect(error.url).toBe("https://example.com");
expect(error.cause).toBe(cause);
expect(error.name).toBe("NetworkError");
});
test("ParseError should store data", () => {
const data = { invalid: "json" };
const error = new ParseError("Invalid JSON", data);
expect(error.message).toBe("Invalid JSON");
expect(error.data).toBe(data);
expect(error.name).toBe("ParseError");
});
test("RateLimitError should store URL and reset time", () => {
const error = new RateLimitError("Rate limited", "https://example.com", 60);
expect(error.message).toBe("Rate limited");
expect(error.url).toBe("https://example.com");
expect(error.resetTime).toBe(60);
expect(error.name).toBe("RateLimitError");
});
test("ValidationError should work without field", () => {
const error = new ValidationError("Invalid value");
expect(error.message).toBe("Invalid value");
expect(error.name).toBe("ValidationError");
});
});
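
The error-class tests above pin down the constructor argument order and the `name` of each error. A minimal sketch of two classes that would satisfy those assertions is shown here. The `Sketch` suffix marks these as hypothetical; the real classes in `../src/kijiji` may carry extra fields or behaviour that the tests do not reveal.

```typescript
// Sketch only: an HTTP error carrying the status code and request URL.
class HttpErrorSketch extends Error {
  status: number;
  url: string;

  constructor(message: string, status: number, url: string) {
    super(message);
    this.name = "HttpError";
    this.status = status;
    this.url = url;
  }
}

// Sketch only: a rate-limit error carrying the URL and an optional reset time.
// The unit of resetTime (seconds) is an assumption.
class RateLimitErrorSketch extends Error {
  url: string;
  resetTime?: number;

  constructor(message: string, url: string, resetTime?: number) {
    super(message);
    this.name = "RateLimitError";
    this.url = url;
    this.resetTime = resetTime;
  }
}
```

Setting `name` explicitly matters because a subclass of `Error` otherwise reports `"Error"`.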

@@ -1,363 +0,0 @@
import { afterEach, beforeEach, describe, expect, mock, test } from "bun:test";
import {
extractApolloState,
parseDetailedListing,
parseSearch,
} from "../src/kijiji";
// Keep a reference to the real fetch so it can be restored after each test
const originalFetch = global.fetch;
describe("HTML Parsing Integration", () => {
beforeEach(() => {
// Mock fetch for all tests
global.fetch = mock(() => {
throw new Error("fetch should be mocked in individual tests");
});
});
afterEach(() => {
global.fetch = originalFetch;
});
describe("extractApolloState", () => {
test("should extract Apollo state from valid HTML", () => {
const mockHtml =
'<html><head><script id="__NEXT_DATA__" type="application/json">{"props":{"pageProps":{"__APOLLO_STATE__":{"ROOT_QUERY":{"test":"value"}}}}}</script></head></html>';
const result = extractApolloState(mockHtml);
expect(result).toEqual({
ROOT_QUERY: { test: "value" },
});
});
test("should return null for HTML without Apollo state", () => {
const mockHtml = "<html><body>No data here</body></html>";
const result = extractApolloState(mockHtml);
expect(result).toBeNull();
});
test("should return null for malformed JSON", () => {
const mockHtml =
'<html><script id="__NEXT_DATA__" type="application/json">{"invalid": json}</script></html>';
const result = extractApolloState(mockHtml);
expect(result).toBeNull();
});
test("should handle missing __NEXT_DATA__ element", () => {
const mockHtml = "<html><body><div>Content</div></body></html>";
const result = extractApolloState(mockHtml);
expect(result).toBeNull();
});
});
describe("parseSearch", () => {
test("should parse search results from HTML", () => {
const mockHtml = `
<html>
<script id="__NEXT_DATA__" type="application/json">
${JSON.stringify({
props: {
pageProps: {
__APOLLO_STATE__: {
"Listing:123": {
url: "/v-iphone/k0l0",
title: "iPhone 13 Pro",
},
"Listing:456": {
url: "/v-samsung/k0l0",
title: "Samsung Galaxy",
},
ROOT_QUERY: { test: "value" },
},
},
},
})}
</script>
</html>
`;
const results = parseSearch(mockHtml, "https://www.kijiji.ca");
expect(results).toHaveLength(2);
expect(results[0]).toEqual({
name: "iPhone 13 Pro",
listingLink: "https://www.kijiji.ca/v-iphone/k0l0",
});
expect(results[1]).toEqual({
name: "Samsung Galaxy",
listingLink: "https://www.kijiji.ca/v-samsung/k0l0",
});
});
test("should handle absolute URLs", () => {
const mockHtml = `
<html>
<script id="__NEXT_DATA__" type="application/json">
${JSON.stringify({
props: {
pageProps: {
__APOLLO_STATE__: {
"Listing:123": {
url: "https://www.kijiji.ca/v-iphone/k0l0",
title: "iPhone 13 Pro",
},
},
},
},
})}
</script>
</html>
`;
const results = parseSearch(mockHtml, "https://www.kijiji.ca");
expect(results[0].listingLink).toBe(
"https://www.kijiji.ca/v-iphone/k0l0",
);
});
test("should filter out invalid listings", () => {
const mockHtml = `
<html>
<script id="__NEXT_DATA__" type="application/json">
${JSON.stringify({
props: {
pageProps: {
__APOLLO_STATE__: {
"Listing:123": {
url: "/v-iphone/k0l0",
title: "iPhone 13 Pro",
},
"Listing:456": {
url: "/v-samsung/k0l0",
// Missing title
},
"Other:789": {
url: "/v-other/k0l0",
title: "Other Item",
},
},
},
},
})}
</script>
</html>
`;
const results = parseSearch(mockHtml, "https://www.kijiji.ca");
expect(results).toHaveLength(1);
expect(results[0].name).toBe("iPhone 13 Pro");
});
test("should return empty array for invalid HTML", () => {
const results = parseSearch(
"<html><body>Invalid</body></html>",
"https://www.kijiji.ca",
);
expect(results).toEqual([]);
});
});
describe("parseDetailedListing", () => {
test("should parse detailed listing with all fields", async () => {
const mockHtml = `
<html>
<script id="__NEXT_DATA__" type="application/json">
${JSON.stringify({
props: {
pageProps: {
__APOLLO_STATE__: {
"Listing:123": {
url: "/v-iphone-13-pro/k0l0",
title: "iPhone 13 Pro 256GB",
description: "Excellent condition iPhone 13 Pro",
price: {
amount: 80000,
currency: "CAD",
type: "FIXED",
},
type: "OFFER",
status: "ACTIVE",
activationDate: "2024-01-15T10:00:00.000Z",
endDate: "2025-01-15T10:00:00.000Z",
metrics: { views: 150 },
location: {
address: "Toronto, ON",
id: 1700273,
name: "Toronto",
coordinates: {
latitude: 43.6532,
longitude: -79.3832,
},
},
imageUrls: [
"https://media.kijiji.ca/api/v1/image1.jpg",
"https://media.kijiji.ca/api/v1/image2.jpg",
],
imageCount: 2,
categoryId: 132,
adSource: "ORGANIC",
flags: {
topAd: false,
priceDrop: true,
},
posterInfo: {
posterId: "user123",
rating: 4.8,
},
attributes: [
{
canonicalName: "forsaleby",
canonicalValues: ["ownr"],
},
{
canonicalName: "phonecarrier",
canonicalValues: ["unlocked"],
},
],
},
},
},
},
})}
</script>
</html>
`;
const result = await parseDetailedListing(
mockHtml,
"https://www.kijiji.ca",
);
expect(result).toEqual({
url: "https://www.kijiji.ca/v-iphone-13-pro/k0l0",
title: "iPhone 13 Pro 256GB",
description: "Excellent condition iPhone 13 Pro",
listingPrice: {
amountFormatted: "$800.00",
cents: 80000,
currency: "CAD",
},
listingType: "OFFER",
listingStatus: "ACTIVE",
creationDate: "2024-01-15T10:00:00.000Z",
endDate: "2025-01-15T10:00:00.000Z",
numberOfViews: 150,
address: "Toronto, ON",
images: [
"https://media.kijiji.ca/api/v1/image1.jpg",
"https://media.kijiji.ca/api/v1/image2.jpg",
],
categoryId: 132,
adSource: "ORGANIC",
flags: {
topAd: false,
priceDrop: true,
},
attributes: {
forsaleby: ["ownr"],
phonecarrier: ["unlocked"],
},
location: {
id: 1700273,
name: "Toronto",
coordinates: {
latitude: 43.6532,
longitude: -79.3832,
},
},
sellerInfo: {
posterId: "user123",
rating: 4.8,
},
});
});
test("should return null for contact-based pricing", async () => {
const mockHtml = `
<html>
<script id="__NEXT_DATA__" type="application/json">
${JSON.stringify({
props: {
pageProps: {
__APOLLO_STATE__: {
"Listing:123": {
url: "/v-iphone/k0l0",
title: "iPhone for Sale",
price: {
type: "CONTACT",
amount: null,
},
},
},
},
},
})}
</script>
</html>
`;
const result = await parseDetailedListing(
mockHtml,
"https://www.kijiji.ca",
);
expect(result).toBeNull();
});
test("should handle missing optional fields", async () => {
const mockHtml = `
<html>
<script id="__NEXT_DATA__" type="application/json">
${JSON.stringify({
props: {
pageProps: {
__APOLLO_STATE__: {
"Listing:123": {
url: "/v-iphone/k0l0",
title: "iPhone 13",
price: { amount: 50000 },
},
},
},
},
})}
</script>
</html>
`;
const result = await parseDetailedListing(
mockHtml,
"https://www.kijiji.ca",
);
expect(result).toEqual({
url: "https://www.kijiji.ca/v-iphone/k0l0",
title: "iPhone 13",
description: undefined,
listingPrice: {
amountFormatted: "$500.00",
cents: 50000,
currency: undefined,
},
listingType: undefined,
listingStatus: undefined,
creationDate: undefined,
endDate: undefined,
numberOfViews: undefined,
address: null,
images: [],
categoryId: 0,
adSource: "UNKNOWN",
flags: {
topAd: false,
priceDrop: false,
},
attributes: {},
location: {
id: 0,
name: "Unknown",
coordinates: undefined,
},
sellerInfo: undefined,
});
});
});
});

@@ -1,54 +0,0 @@
import { describe, expect, test } from "bun:test";
import { formatCentsToCurrency, slugify } from "../src/kijiji";
describe("Utility Functions", () => {
describe("slugify", () => {
test("should convert basic strings to slugs", () => {
expect(slugify("Hello World")).toBe("hello-world");
expect(slugify("iPhone 13 Pro")).toBe("iphone-13-pro");
});
test("should handle special characters", () => {
expect(slugify("Café & Restaurant")).toBe("cafe-restaurant");
expect(slugify("100% New")).toBe("100-new");
});
test("should handle empty and edge cases", () => {
expect(slugify("")).toBe("");
expect(slugify(" ")).toBe("-");
expect(slugify("---")).toBe("-");
});
test("should preserve numbers and valid characters", () => {
expect(slugify("iPhone 13")).toBe("iphone-13");
expect(slugify("item123")).toBe("item123");
});
});
describe("formatCentsToCurrency", () => {
test("should format valid cent values", () => {
expect(formatCentsToCurrency(100)).toBe("$1.00");
expect(formatCentsToCurrency(1999)).toBe("$19.99");
expect(formatCentsToCurrency(0)).toBe("$0.00");
});
test("should handle string inputs", () => {
expect(formatCentsToCurrency("100")).toBe("$1.00");
expect(formatCentsToCurrency("1999")).toBe("$19.99");
});
test("should handle null/undefined inputs", () => {
expect(formatCentsToCurrency(null)).toBe("");
expect(formatCentsToCurrency(undefined)).toBe("");
});
test("should handle invalid inputs", () => {
expect(formatCentsToCurrency("invalid")).toBe("");
expect(formatCentsToCurrency(Number.NaN)).toBe("");
});
test("should use en-US locale formatting", () => {
expect(formatCentsToCurrency(123456)).toBe("$1,234.56");
});
});
});
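For readers without access to `../src/kijiji`, the behaviour these tests pin down can be approximated with the sketch below. It is not the repository's code: `slugifySketch` and `formatCentsSketch` are hypothetical names, the diacritic stripping uses `String.prototype.normalize` as a stand-in (the project may rely on a transliteration library instead), and the USD currency code is an assumption inferred from the `$` in the expected strings.

```typescript
// Hypothetical stand-ins for slugify and formatCentsToCurrency,
// written only to satisfy the expectations in the deleted test file.

// Lowercase, strip combining accents, and collapse every run of
// non-alphanumeric characters into a single hyphen. Leading and trailing
// hyphens are kept, which is why " " and "---" both become "-".
function slugifySketch(input: string): string {
  return input
    .normalize("NFD")
    .replace(/[\u0300-\u036f]/g, "")
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-");
}

// Format an integer number of cents as an en-US currency string.
// Null, undefined, and non-numeric inputs produce an empty string.
function formatCentsSketch(
  cents: number | string | null | undefined,
): string {
  if (cents === null || cents === undefined) return "";
  const value = typeof cents === "string" ? Number.parseFloat(cents) : cents;
  if (!Number.isFinite(value)) return "";
  return new Intl.NumberFormat("en-US", {
    style: "currency",
    currency: "USD", // assumption: the tests only show a "$" prefix
  }).format(value / 100);
}
```

Returning an empty string rather than throwing keeps the formatter safe to call on optional price fields, which appears to be the intent behind the null and invalid-input tests.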


@@ -1,14 +0,0 @@
// Test setup for Bun test runner
import { expect } from "bun:test";

// Global test setup
// This file is loaded before any tests run due to bunfig.toml preload

// Fallback only: if the environment provides no global fetch,
// install a stub that throws as soon as it is called
global.fetch =
  global.fetch ||
  (() => {
    throw new Error("fetch is not available in test environment");
  });

// Add any global test utilities here


@@ -7,21 +7,25 @@
"moduleDetection": "force",
"jsx": "react-jsx",
"allowJs": true,
// Bundler mode
"moduleResolution": "bundler",
"allowImportingTsExtensions": true,
"verbatimModuleSyntax": true,
"noEmit": true,
// Best practices
"strict": true,
"skipLibCheck": true,
"noFallthroughCasesInSwitch": true,
"noUncheckedIndexedAccess": true,
"noImplicitAny": true,
// Some stricter flags (disabled by default)
"noUnusedLocals": false,
"noUnusedParameters": false,
"noPropertyAccessFromIndexSignature": false,
"paths": {
  "@/*": ["./src/*"]
}