Compare commits

...

12 Commits

Author SHA1 Message Date
ec545723bb feat(facebook): add challenge detection and session warming utilities
facebook-challenge.ts: session warmup, header construction, and challenge type detection. Spec document for the anti-bot challenge solver design.
2026-05-02 19:03:00 -04:00
0a246a29bf feat(facebook): add session warming and challenge detection
Facebook Marketplace no longer requires authentication cookies.
Session warming sends proper browser headers. Checkpoint and
login-wall challenges are detected and handled gracefully.
Added marketplace_product_details_page.target extraction path
for current item page structure.
2026-05-02 18:58:53 -04:00
7ab33d0b02 chore: format markdown
Signed-off-by: Dmytro Stanchiev <git@dmytros.dev>
2026-05-01 11:42:54 -04:00
d2c3c07e7d docs: price filtering schema adjustments
Signed-off-by: Dmytro Stanchiev <git@dmytros.dev>
2026-04-30 23:18:49 -04:00
0470a7bec7 docs(mcp): clarify price filters are dollars 2026-04-30 23:17:59 -04:00
89ad1c521f fix(api): parse price filters as dollars 2026-04-30 23:17:56 -04:00
5c732287c5 test: guard live listing prices 2026-04-30 22:46:48 -04:00
20fb46190a test: add live parser script 2026-04-30 22:46:07 -04:00
e791fc5478 test(facebook): add live parser suite 2026-04-30 22:44:28 -04:00
c1fa5168dc test(kijiji): add live parser suite 2026-04-30 22:43:52 -04:00
ec2a26cedf test(ebay): add live parser suite 2026-04-30 22:42:32 -04:00
5d99e984e0 docs: plan live parser tests 2026-04-30 22:41:41 -04:00
28 changed files with 1862 additions and 451 deletions


@@ -1,44 +1,56 @@
# Facebook Marketplace API Reverse Engineering
## Overview
This document tracks findings from reverse-engineering Facebook Marketplace APIs for
listing details.
## Current Implementation Status
- Search functionality: Implemented in `src/facebook.ts`
- Individual listing details: Not yet implemented
## Findings
### Step 1: Initial Setup
- Using Chrome DevTools to inspect Facebook Marketplace
- Need to authenticate with Facebook account to access marketplace data
- Cookies required for full access
- Current status: Successfully logged in and accessed marketplace data
### Step 2: Individual Listing Details Analysis - COMPLETED
- **Data Location**: Embedded in HTML script tags within `require` array structure
- **Path**:
`require[0][3].__bbox.result.data.viewer.marketplace_product_details_page.target`
- **Authentication**: Required for full data access
- **Current Status**: Successfully reverse-engineered the API structure and data
extraction method
### API Endpoints Discovered
#### Search Endpoint
- URL: `https://www.facebook.com/marketplace/{location}/search`
- Parameters: `query`, `sortBy`, `exact`
- Data embedded in HTML script tags with `require` structure
- Authentication: Required (cookies)
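The following is a minimal sketch of building a search URL from the pattern above. The helper name and the `toronto` location slug are illustrative assumptions. The accepted values for `sortBy` are not documented here, so the helper passes them through unchanged.

```typescript
// Hypothetical helper: builds a Marketplace search URL from the documented
// pattern https://www.facebook.com/marketplace/{location}/search.
// `location` is assumed to be a city slug such as "toronto".
function buildFacebookSearchUrl(
  location: string,
  query: string,
  options: { sortBy?: string; exact?: boolean } = {},
): string {
  const url = new URL(
    `https://www.facebook.com/marketplace/${encodeURIComponent(location)}/search`,
  );
  url.searchParams.set("query", query);
  // Only emit the optional parameters when the caller sets them.
  if (options.sortBy !== undefined) url.searchParams.set("sortBy", options.sortBy);
  if (options.exact !== undefined) url.searchParams.set("exact", String(options.exact));
  return url.toString();
}
```

`URLSearchParams` handles the encoding, so multi-word queries such as `nintendo switch` need no manual escaping.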
#### Listing Details Endpoint
- **URL Structure**: `https://www.facebook.com/marketplace/item/{listing_id}/`
- **Data Source**: Server-side rendered HTML with embedded JSON data in script tags
- **Data Structure**: Relay/GraphQL style data structure under
`require[0][3].__bbox.require[...].__bbox.result.data.viewer.marketplace_product_details_page.target`
- **Extraction Method**: Parse JSON from script tags containing marketplace data,
navigate to the target object
- **Authentication**: Required (cookies)
### Listing Data Structure Discovered (Current - 2026)
The current Facebook Marketplace API returns a comprehensive `GroupCommerceProductItem`
object with the following key properties:
```typescript
interface FacebookMarketplaceItem {
@@ -151,6 +163,7 @@ interface FacebookMarketplaceItem {
```
### Example Data Extracted (Current Structure)
```json
{
"__typename": "GroupCommerceProductItem",
@@ -228,36 +241,47 @@ interface FacebookMarketplaceItem {
## Data Extraction Method
### Current Method (2026)
Facebook Marketplace listing data is embedded in JSON within `<script>` tags in the HTML
response. The extraction process:
1. **Find the Correct Script**: Look for script tags containing marketplace listing data
by searching for key fields like `marketplace_listing_title`, `redacted_description`,
and `formatted_price`.
2. **Parse JSON Structure**: The data is nested within a `require` array structure:
```
require[0][3].__bbox.require[3][3][1].__bbox.result.data.viewer.marketplace_product_details_page.target
```
3. **Navigate to Target Object**: The actual listing data is a
`GroupCommerceProductItem` object containing comprehensive information about the
listing, seller, and vehicle details.
4. **Handle Dynamic Structure**: Facebook may change the exact path, so robust
extraction should search for the target object recursively within the parsed JSON.
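The four steps above can be sketched in TypeScript as follows. This is a hypothetical illustration, not `extractFacebookItemData()` itself. It makes three choices:

- It assumes each candidate `<script>` body is plain JSON.
- It uses the primary path from step 2 (the document records slight variants of that path).
- It falls back to a depth-first search for a `GroupCommerceProductItem`.

```typescript
type JsonObject = Record<string, unknown>;

// Primary path from step 2, split into property names and array indices.
const PRIMARY_PATH: Array<string | number> = [
  "require", 0, 3, "__bbox", "require", 3, 3, 1, "__bbox",
  "result", "data", "viewer", "marketplace_product_details_page", "target",
];

// Step 1 markers: only scripts mentioning one of these are parsed.
const MARKERS = ["marketplace_listing_title", "redacted_description", "formatted_price"];

// Walks the path; returns undefined as soon as a hop is missing.
function getAtPath(root: unknown, path: Array<string | number>): unknown {
  let node: unknown = root;
  for (const key of path) {
    if (node === null || typeof node !== "object") return undefined;
    node = (node as Record<string | number, unknown>)[key];
  }
  return node;
}

// Step 4 fallback: iterative depth-first search by __typename.
function findByTypename(root: unknown, typename: string): JsonObject | undefined {
  const stack: unknown[] = [root];
  while (stack.length > 0) {
    const node = stack.pop();
    if (node === null || typeof node !== "object") continue;
    if (!Array.isArray(node) && (node as JsonObject).__typename === typename) {
      return node as JsonObject;
    }
    for (const child of Object.values(node)) stack.push(child);
  }
  return undefined;
}

function extractListingFromHtml(html: string): JsonObject | undefined {
  const scriptRe = /<script[^>]*>([\s\S]*?)<\/script>/g;
  for (const match of html.matchAll(scriptRe)) {
    const body = match[1] ?? "";
    if (!MARKERS.some((marker) => body.includes(marker))) continue;
    let parsed: unknown;
    try {
      parsed = JSON.parse(body); // Step 2: parse the embedded JSON
    } catch {
      continue; // not plain JSON, skip this script
    }
    // Step 3: try the known path first, then search recursively.
    const direct = getAtPath(parsed, PRIMARY_PATH);
    if (direct !== null && typeof direct === "object") return direct as JsonObject;
    const found = findByTypename(parsed, "GroupCommerceProductItem");
    if (found) return found;
  }
  return undefined;
}
```

Using a regex avoids an HTML parser dependency. The trade-off is that it misses payloads that are not embedded as plain JSON (for example, JSON wrapped in a JavaScript call), which would need an extra unwrapping step.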
### Authentication Requirements
- Valid Facebook session cookies are required
- User must be logged in to Facebook
- Marketplace access may be location-restricted
## Tools Used
- Chrome DevTools Protocol
- Network monitoring
- HTML/script parsing
- JSON structure analysis
## Implementation Status
- ✅ Successfully reverse-engineered Facebook Marketplace API for listing details
- ✅ Identified current data structure and extraction method (2026)
- ✅ Documented comprehensive GroupCommerceProductItem interface
- ✅ Implemented `extractFacebookItemData()` function with script parsing logic
- ✅ Implemented `parseFacebookItem()` function to convert GroupCommerceProductItem to
ListingDetails
- ✅ Implemented `fetchFacebookItem()` function with authentication and error handling
- ✅ Updated TypeScript interfaces to match current API structure
- ✅ Added robust extraction with fallback methods for changing API paths
@@ -266,12 +290,15 @@ Facebook Marketplace listing data is embedded in JSON within `<script>` tags in
### Core Functions Implemented
1. **`extractFacebookItemData(htmlString)`**: Extracts marketplace item data from
HTML-embedded JSON in script tags
- Searches for scripts containing marketplace listing data
- Uses primary path:
`require[0][3][0].__bbox.require[3][3][1].__bbox.result.data.viewer.marketplace_product_details_page.target`
- Falls back to recursive search for GroupCommerceProductItem objects
2. **`parseFacebookItem(item)`**: Converts Facebook's GroupCommerceProductItem to unified ListingDetails format
- Handles pricing (FREE listings, CAD currency)
- Extracts seller information, location, and status
- Supports vehicle-specific metadata
@@ -284,25 +311,31 @@ Facebook Marketplace listing data is embedded in JSON within `<script>` tags in
- Returns parsed ListingDetails or null on failure
### Authentication Requirements
- Facebook session cookies required in `./cookies/facebook.json` or provided as
parameter
- Cookies must include valid authentication tokens for marketplace access
- Handles cookie expiration and domain validation
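Below is a sketch of the cookie handling described above. It assumes `./cookies/facebook.json` holds a browser-export style array with `name`, `value`, `domain`, and an optional `expirationDate` in Unix seconds. Those field names are an assumption; adjust them to whatever the export tool writes.

```typescript
import { readFileSync } from "node:fs";

// Assumed on-disk cookie shape (browser-export style); not confirmed by this document.
interface StoredCookie {
  name: string;
  value: string;
  domain: string;
  expirationDate?: number; // Unix seconds; undefined means a session cookie
}

// Keeps unexpired cookies scoped to facebook.com and joins them into a Cookie header.
function buildCookieHeader(cookies: StoredCookie[], nowSeconds = Date.now() / 1000): string {
  return cookies
    .filter((c) => {
      const domain = c.domain.replace(/^\./, ""); // ".facebook.com" -> "facebook.com"
      return domain === "facebook.com" || domain.endsWith(".facebook.com");
    })
    .filter((c) => c.expirationDate === undefined || c.expirationDate > nowSeconds)
    .map((c) => `${c.name}=${c.value}`)
    .join("; ");
}

// Reads the cookie file from the default location used in this document.
function loadCookieHeader(path = "./cookies/facebook.json"): string {
  const cookies = JSON.parse(readFileSync(path, "utf8")) as StoredCookie[];
  return buildCookieHeader(cookies);
}
```

An empty header after filtering is a useful early signal that the stored session has expired.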
## Current Implementation Status - 2026 Verification
### Step 3: API Verification and Current Structure Analysis (January 2026)
- **Verification Date**: January 22, 2026
- **Status**: Successfully verified current Facebook Marketplace API structure
- **Data Source**: Embedded JSON in HTML script tags (server-side rendered)
- **Extraction Path**:
`require[0][3].__bbox.require[3][3][1].__bbox.result.data.viewer.marketplace_product_details_page.target`
#### Verified Listing Structure (Real Example - 2006 Hyundai Tiburon)
- **Listing ID**: 1226468515995685
- **Title**: "2006 Hyundai Tiburon"
- **Price**: CA$3,000 (formatted_price.text)
- **Raw Price Data**: {"amount_with_offset": "300000", "currency": "CAD", "amount": "3000.00"}
- **Location**: Hamilton, ON (with coordinates: 43.250427246094, -79.963989257812)
- **Description**: "As is" (redacted_description.text)
- **Vehicle Details**:
- Make: Hyundai
- Model: Tiburon
@@ -323,41 +356,54 @@ Facebook Marketplace listing data is embedded in JSON within `<script>` tags in
- **Messaging**: Enabled
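The raw price fields in this example suggest that `amount_with_offset` is the amount in cents (300000 = 3000.00 * 100). Under that assumption, the following hedged sketch normalizes a price to integer cents. It prefers `amount` and falls back to `amount_with_offset`; the interface and function names are illustrative.

```typescript
// Field names come from the verified example. Reading amount_with_offset as
// cents is inferred from 300000 vs "3000.00" and is unconfirmed.
interface FacebookRawPrice {
  amount?: string; // e.g. "3000.00" (dollars)
  amount_with_offset?: string | number; // e.g. "300000" (assumed cents)
  currency?: string; // e.g. "CAD"
}

function priceToCents(price: FacebookRawPrice | undefined): number | null {
  if (price === undefined) return null;
  // Prefer the decimal dollar string; round to avoid float artifacts like 1998.999...
  if (price.amount !== undefined) {
    const dollars = Number.parseFloat(price.amount);
    if (Number.isFinite(dollars)) return Math.round(dollars * 100);
  }
  // Fall back to the offset field, assumed to already be in cents.
  if (price.amount_with_offset !== undefined) {
    const cents = Number(price.amount_with_offset);
    if (Number.isFinite(cents)) return Math.round(cents);
  }
  return null;
}
```

A `0` result would correspond to a free listing. This example does not show how FREE listings are actually encoded, so treat that mapping as unverified.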
#### Current API Characteristics
- **Authentication**: Still requires valid Facebook session cookies
- **Data Format**: Server-side rendered HTML with embedded GraphQL/Relay JSON
- **Structure Stability**: Primary extraction path remains functional
- **Additional Features**: Includes marketplace ratings, seller verification badges,
cross-posting info
### API Changes Observed Since 2024 Documentation
- **Minimal Changes**: Core data structure largely unchanged
- **Enhanced Fields**: Added more detailed vehicle specifications and seller profile information
- **GraphQL Integration**: Deeper integration with Facebook's GraphQL infrastructure
- **Security Features**: Additional integrity checks and reporting mechanisms
### Multi-Category Testing Results (January 2026)
Successfully tested extraction across different listing categories:
#### 1. Vehicle Listings (Automotive)
- **Example**: 2006 Hyundai Tiburon (ID: 1226468515995685)
- **Status**: ✅ Fully functional
- **Data Extracted**: Complete vehicle specs, pricing, seller info, location coordinates
- **Unique Fields**: vehicle_make_display_name, vehicle_odometer_data,
vehicle_transmission_type, vehicle_exterior_color, vehicle_interior_color,
vehicle_fuel_type
#### 2. Electronics Listings
- **Example**: Nintendo Switch (ID: 3903865769914262)
- **Status**: ✅ Fully functional
- **Data Extracted**: Title, price (CA$140), location (Toronto, ON), condition (Used -
like new), seller (Yitao Hou)
- **Category**: Electronics (category_id: 479353692612078)
- **Notes**: Standard GroupCommerceProductItem structure applies
#### 3. Home Goods/Furniture Listings
- **Example**: Tabletop Mirror (cat not included) (ID: 1082389057290709)
- **Status**: ✅ Fully functional
- **Data Extracted**: Title, price (CA$5), location (Mississauga, ON), condition (Used -
like new), seller (Rohit Rehan)
- **Category**: Home Goods (category_id: 1569171756675761)
- **Notes**: Includes detailed description and delivery options
#### Testing Summary
- **Extraction Method**: Consistent across all categories
- **Data Structure**: GroupCommerceProductItem interface works for all listing types
- **Authentication**: Required for all categories
@@ -365,16 +411,20 @@ Successfully tested extraction across different listing categories:
- **Edge Cases**: All tested listings were active/in-person pickup
## Implementation Status - COMPLETED (January 2026)
- ✅ Successfully reverse-engineered Facebook Marketplace API for listing details
- ✅ Verified current API structure and extraction method (January 2026)
- ✅ Tested extraction across multiple listing categories (vehicles, electronics, home
goods)
- ✅ Implemented comprehensive error handling for sold/removed listings and
authentication failures
- ✅ Enhanced rate limiting and retry logic (already robust)
- ✅ Added monitoring and metrics for API stability detection
- ✅ Updated all scraper functions to use verified extraction methods
- ✅ Documented comprehensive GroupCommerceProductItem interface with real examples
## Next Steps (Future Maintenance)
1. Monitor extraction success rates for API change detection
2. Update extraction paths if Facebook changes their API structure
3. Add support for additional marketplace features as they become available

KIJIJI.md

@@ -1,9 +1,13 @@
# Kijiji API Findings
## Overview
Kijiji is a Canadian classifieds marketplace that uses a modern web application built
with Next.js and Apollo GraphQL. The search results are powered by a GraphQL API with
client-side state management.
## Initial Page Load (Homepage)
- **URL**: https://www.kijiji.ca/
- **Architecture**: Server-side rendered React application with Next.js
- **Data Sources**:
@@ -12,18 +16,27 @@ Kijiji is a Canadian classifieds marketplace that uses a modern web application
- No initial API calls for listings - data appears to be embedded in HTML
## Search Results Page
- **URL Pattern**: `https://www.kijiji.ca/b-[location]/[keywords]/k0l0`
- **Example**: `https://www.kijiji.ca/b-canada/iphone/k0l0`
- **Technology Stack**: Next.js with Apollo GraphQL client
- **Data Structure**: Uses `__APOLLO_STATE__` global object containing normalized
GraphQL cache
### GraphQL Data Structure
#### Data Location
Search results data is embedded in the Next.js page props under
`__NEXT_DATA__.props.pageProps.__APOLLO_STATE__`. The data is pre-rendered on the server
and sent to the client.
Each page (including pagination) has its own pre-rendered data.
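Here is a minimal sketch of pulling the Apollo cache out of a page. It assumes the standard Next.js `<script id="__NEXT_DATA__" type="application/json">` tag. `readApolloState` is an illustrative name, not the repository's `extractApolloState()`.

```typescript
// Returns __NEXT_DATA__.props.pageProps.__APOLLO_STATE__, or null if the tag
// is missing, the JSON is malformed, or the state is absent.
function readApolloState(html: string): Record<string, unknown> | null {
  const match = html.match(/<script[^>]*\bid=["']__NEXT_DATA__["'][^>]*>([\s\S]*?)<\/script>/);
  if (!match) return null;
  try {
    const nextData = JSON.parse(match[1] ?? "") as {
      props?: { pageProps?: { __APOLLO_STATE__?: Record<string, unknown> } };
    };
    return nextData.props?.pageProps?.__APOLLO_STATE__ ?? null;
  } catch {
    return null;
  }
}
```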
#### Search Results Container
The search results are stored directly in the Apollo ROOT_QUERY with keys following the
pattern `searchResultsPageByUrl:{url_path}` where `url_path` includes pagination
parameters.
```json
{
@@ -33,17 +46,20 @@ The search results are stored directly in the Apollo ROOT_QUERY with keys follow
```
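Given the parsed Apollo state, the search-result entries can be located by prefix. This sketch matches on the `searchResultsPageByUrl` field name rather than the full key, so it does not depend on exactly how the `url_path` is serialized after it. The function name is hypothetical.

```typescript
// Returns every ROOT_QUERY entry whose key starts with the search-results field name.
function findSearchResultsEntries(
  apolloState: Record<string, unknown>,
): Array<{ key: string; value: unknown }> {
  const rootQuery = apolloState["ROOT_QUERY"];
  if (rootQuery === null || typeof rootQuery !== "object") return [];
  return Object.entries(rootQuery as Record<string, unknown>)
    .filter(([key]) => key.startsWith("searchResultsPageByUrl"))
    .map(([key, value]) => ({ key, value }));
}
```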
#### Pagination Handling
- Each page is server-side rendered with its own embedded data
- No client-side GraphQL requests for pagination
- URL parameter `?page=N` controls which page data is embedded
- Offset in searchString corresponds to `(page-1) * limit`
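The page-to-offset relationship above can be written as a small helper. The default `limit` of 40 comes from the "40 results per page" note in this document; treat it as an assumption in case Kijiji changes the page size. The function names are hypothetical.

```typescript
// Page size taken from this document's pagination notes.
const RESULTS_PER_PAGE = 40;

// Reads the 1-based ?page=N parameter; defaults to page 1 when absent or invalid.
function pageFromUrl(url: string): number {
  const raw = new URL(url).searchParams.get("page");
  const page = raw === null ? 1 : Number.parseInt(raw, 10);
  return Number.isInteger(page) && page >= 1 ? page : 1;
}

// offset = (page - 1) * limit
function pageToOffset(page: number, limit = RESULTS_PER_PAGE): number {
  if (!Number.isInteger(page) || page < 1) {
    throw new RangeError(`page must be a positive integer, got ${page}`);
  }
  return (page - 1) * limit;
}
```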
#### Search Parameters in URL
- `k0c{CATEGORY}l{LOCATION}` - Category and location IDs
- `?page=N` - Page number (1-based)
- Data contains `offset` and `limit` for API-style pagination
#### Individual Listing Structure
```json
{
"id": "1732061412",
@@ -90,6 +106,7 @@ The search results are stored directly in the Apollo ROOT_QUERY with keys follow
```
### URL Parameters
- `sort=MATCH` - Sort by relevance
- `order=DESC` - Descending order
- `type=OFFER` - Show offerings (not wanted ads)
@@ -102,6 +119,7 @@ The search results are stored directly in the Apollo ROOT_QUERY with keys follow
- `eaTopAdPosition=1` - ?
### Image API
- **Endpoint**: `https://media.kijiji.ca/api/v1/`
- **Pattern**: `/ca-prod-fsbo-ads/images/{uuid}?rule=kijijica-{size}-jpg`
- **Sizes**: 200, 300, 400, 500 pixels
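This sketch assembles an image URL from the endpoint and pattern above. The helper name is hypothetical, and the size is restricted to the four observed values.

```typescript
// Sizes observed in the image API notes.
type KijijiImageSize = 200 | 300 | 400 | 500;

const KIJIJI_IMAGE_BASE = "https://media.kijiji.ca/api/v1/ca-prod-fsbo-ads/images/";

// Builds /ca-prod-fsbo-ads/images/{uuid}?rule=kijijica-{size}-jpg under the media endpoint.
function kijijiImageUrl(uuid: string, size: KijijiImageSize = 400): string {
  return `${KIJIJI_IMAGE_BASE}${encodeURIComponent(uuid)}?rule=kijijica-${size}-jpg`;
}
```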
@@ -109,10 +127,12 @@ The search results are stored directly in the Apollo ROOT_QUERY with keys follow
### Categories and Locations
#### Category Structure
Categories are hierarchical with parent-child relationships.
The main categories under “Buy & Sell” include:
| ID | Name | Total Results (iPhone search) |
| --- | --- | --- |
| 10 | Buy & Sell | 19956 |
| 12 | Arts & Collectibles | 149 |
| 767 | Audio | 481 |
@@ -145,10 +165,11 @@ Categories are hierarchical with parent-child relationships. The main categories
| 26 | Other | 286 |
#### Location Structure
Locations are also hierarchical, with provinces/states under the main “Canada” location:
| ID | Name | Total Results (iPhone search) |
| --- | --- | --- |
| 0 | Canada | - |
| 9001 | Québec | 2516 |
| 9002 | Nova Scotia | 875 |
@@ -163,16 +184,20 @@ Locations are also hierarchical, with provinces/states under the main "Canada" l
| 9011 | Prince Edward Island | 31 |
#### URL Patterns
- Categories: `/b-{category-slug}/canada/{keywords}/k0c{CATEGORY_ID}l0`
- Locations: `/b-buy-sell/{location-slug}/iphone/k0c10l{LOCATION_ID}`
- Combined:
`/b-{category-slug}/{location-slug}/{keywords}/k0c{CATEGORY_ID}l{LOCATION_ID}`
### Pagination
- Uses offset-based pagination
- 40 results per page
- Total count provided in pagination metadata
## Authentication & User Management
- **Authentication System**: OAuth2-based using CIS (Customer Identity Service)
- **Identity Provider**: `id.kijiji.ca`
- **OAuth2 Flow**:
@@ -184,24 +209,30 @@ Locations are also hierarchical, with provinces/states under the main "Canada" l
- **User Features**: Saved searches, messaging, flagging require authentication
## Posting API
- **Posting Flow**: Requires authentication, redirects to login if not authenticated
- **Posting URL**: `https://www.kijiji.ca/p-post-ad.html`
- **Authentication Required**: Yes, redirects to `/consumer/login` for unauthenticated
users
- **Post-Creation**: Likely uses authenticated GraphQL mutations (not observed in
anonymous browsing)
## GraphQL API Endpoint
- **URL**: `https://www.kijiji.ca/anvil/api`
- **Method**: POST
- **Content-Type**: application/json
- **Headers**:
- `apollo-require-preflight: true`
- Standard CORS headers
- **Authentication**: No authentication required for basic queries (uses cookies for
session tracking)
- **Technology**: Apollo GraphQL server
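The following is a hedged sketch of a request builder for this endpoint, using only the method and headers listed above. The caller supplies the operation text because the full query bodies are not reproduced here. Whether anonymous requests need session cookies beyond this is untested, and the builder's name is hypothetical.

```typescript
const KIJIJI_GRAPHQL_URL = "https://www.kijiji.ca/anvil/api";

interface PreparedRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// Builds (but does not send) a POST to the Apollo endpoint described above.
function buildKijijiGraphQLRequest(
  operationName: string,
  query: string,
  variables: Record<string, unknown> = {},
): PreparedRequest {
  return {
    url: KIJIJI_GRAPHQL_URL,
    init: {
      method: "POST",
      headers: {
        "content-type": "application/json",
        // Observed on browser requests; Apollo Server accepts it for CSRF prevention.
        "apollo-require-preflight": "true",
      },
      body: JSON.stringify({ operationName, query, variables }),
    },
  };
}
```

Usage: `const { url, init } = buildKijijiGraphQLRequest("getSearchCategories", queryText, { locale: "en-CA" });` followed by `await fetch(url, init)`, where `queryText` is the full operation captured from the browser.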
### Sample GraphQL Queries Discovered
#### Get Search Categories
```graphql
query getSearchCategories($locale: String!) {
searchCategories {
@@ -218,6 +249,7 @@ Variables: `{"locale": "en-CA"}`
Response includes hierarchical category structure with IDs and localized names.
#### Get Geocode from IP (fails for current IP)
```graphql
query GetGeocodeReverseFromIp {
geocodeReverseFromIp {
@@ -229,9 +261,11 @@ query GetGeocodeReverseFromIp {
}
```
This query fails for the current IP address, suggesting geolocation-based features may
not work or require different IP ranges.
#### Get Category Path
```graphql
query GetCategoryPath($categoryId: Int!, $locale: String, $locationId: Int) {
category(id: $categoryId) {
@@ -256,25 +290,33 @@ Variables: `{"categoryId": 10, "locationId": 0, "locale": "en-CA"}`
## Latest Findings (2026-01-21)
### Client-Side GraphQL Queries Observed
- **getSearchCategories**: Retrieves category hierarchy for search filters
- **GetGeocodeReverseFromIp**: Attempts to geolocate user (fails for current IP)
### GraphQL Schema Insights
Testing direct GraphQL queries revealed:
- Field `searchResults` does not exist on the `Query` type
- Suggested alternatives: `searchResultsPage` or `searchUrl`
- This suggests the search functionality may use different GraphQL operations than
direct queries
The embedded Apollo state approach appears to be the primary method for accessing search
data, with GraphQL used for auxiliary operations like categories and geolocation.
### Server-Side Rendering Architecture
Search results are fully server-side rendered with data embedded in HTML. Each page
(including pagination) contains its own pre-rendered data.
No client-side GraphQL requests are made for:
- Initial search results
- Pagination navigation
- Search result data
### Network Analysis Findings
- GraphQL endpoint: `https://www.kijiji.ca/anvil/api`
- Method: POST
- Content-Type: application/json
@@ -282,7 +324,10 @@ Search results are fully server-side rendered with data embedded in HTML. Each p
- Cookies required for session tracking
### Embedded Data Structure
Search results data is embedded in the HTML within Next.js
`__NEXT_DATA__.props.pageProps.__APOLLO_STATE__` object.
The data includes:
- Individual ad listings with complete metadata
- Pagination information
@@ -290,20 +335,24 @@ Search results data is embedded in the HTML within Next.js `__NEXT_DATA__.props.
- Category/location hierarchies
### Current Scraper Implementation
The existing `src/kijiji.ts` implementation correctly parses the embedded Apollo state:
- Uses `extractApolloState()` to parse `__NEXT_DATA__` from HTML
- Filters Apollo keys containing "Listing" to find ad data
- Extracts `url`, `title`, and other metadata from each listing
- Successfully scrapes listings without needing API authentication
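The key-filtering step described above might look like this sketch. It keeps any Apollo entry whose key contains `Listing` and that carries string `title` and `url` fields. The summary shape and function name are illustrative assumptions, not the scraper's actual types.

```typescript
// Hypothetical summary shape for illustration.
interface ListingSummary {
  key: string;
  title: string;
  url: string;
}

// Scans the normalized Apollo cache for listing objects.
function listingSummaries(apolloState: Record<string, unknown>): ListingSummary[] {
  const out: ListingSummary[] = [];
  for (const [key, value] of Object.entries(apolloState)) {
    if (!key.includes("Listing") || value === null || typeof value !== "object") continue;
    const { title, url } = value as { title?: unknown; url?: unknown };
    // Skip partial entries rather than emitting half-filled summaries.
    if (typeof title === "string" && typeof url === "string") out.push({ key, title, url });
  }
  return out;
}
```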
### Authentication Status
- **Search functionality**: No authentication required - all search and listing data
accessible anonymously
- **Posting functionality**: Requires authentication (redirects to login)
- **User features**: Saved searches, messaging require authentication
- **Rate limiting**: May apply but not observed in anonymous browsing
### Pagination Implementation
- Each page is a separate server-rendered route
- URL pattern: `/b-{location}/{keywords}/page-{number}/k0{category}l{location_id}`
- No client-side pagination API calls
@@ -313,20 +362,24 @@ The existing `src/kijiji.ts` implementation correctly parses the embedded Apollo
## URL Pattern Analysis
### Search URL Structure
`https://www.kijiji.ca/b-{category_slug}/{location_slug}/{keywords}/k0c{category_id}l{location_id}`
#### Examples Observed:
- All categories, Canada: `/b-canada/iphone/k0l0` (c0 = All Categories, l0 = Canada)
- Cell phones category: `/b-cell-phones/canada/iphone/k0c132l0` (c132 = Cell Phones)
- With pagination: `/b-canada/iphone/page-2/k0l0`
#### URL Components:
- `c{CATEGORY_ID}`: Category ID (0 = All Categories, 132 = Cell Phones, etc.)
- `l{LOCATION_ID}`: Location ID (0 = Canada, 1700272 = GTA, etc.)
- `page-{N}`: Pagination (1-based, optional)
- Keywords are slugified in URL path
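The URL components above can be combined into a path builder. This sketch reproduces the three observed examples. The slugify rule (lowercase, with runs of non-alphanumeric characters collapsed to `-`) is an assumption about how Kijiji slugs keywords, and the names are hypothetical.

```typescript
interface KijijiSearch {
  keywords: string;
  locationSlug: string; // e.g. "canada"
  locationId: number; // e.g. 0 = Canada
  categorySlug?: string; // e.g. "cell-phones"
  categoryId?: number; // e.g. 132 = Cell Phones; 0 or undefined = All Categories
  page?: number; // 1-based; page 1 has no page segment
}

// Assumed slug rule, not confirmed against Kijiji.
function slugify(text: string): string {
  return text
    .trim()
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

function buildKijijiSearchPath(s: KijijiSearch): string {
  const hasCategory = s.categorySlug !== undefined && !!s.categoryId;
  // All-categories searches drop the category slug: /b-{location}/{keywords}/...
  const segments = hasCategory
    ? [`b-${s.categorySlug}`, s.locationSlug]
    : [`b-${s.locationSlug}`];
  segments.push(slugify(s.keywords));
  if (s.page !== undefined && s.page > 1) segments.push(`page-${s.page}`);
  const categoryPart = hasCategory ? `c${s.categoryId}` : "";
  segments.push(`k0${categoryPart}l${s.locationId}`);
  return `/${segments.join("/")}`;
}
```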
### Current Implementation Status
The existing scraper in `src/kijiji.ts` successfully implements the approach:
- Parses embedded Apollo state from HTML responses
- Handles rate limiting and retries
@@ -336,14 +389,22 @@ The existing scraper in `src/kijiji.ts` successfully implements the approach:
## Listing Details Page
### Overview
Similar to search results, listing details pages use server-side rendering with embedded
Apollo GraphQL state in the HTML. No dedicated API endpoint serves individual listing
data - all information is pre-rendered on the server.
### Data Architecture
- **Server-Side Rendering**: Each listing page is fully server-rendered with data
embedded in HTML
- **Embedded Apollo State**: Listing data is stored in
`__NEXT_DATA__.props.pageProps.__APOLLO_STATE__`
- **Client-Side GraphQL**: Additional data (categories, campaigns, similar listings,
user profiles) fetched via GraphQL API
### Listing Data Structure
The main listing data follows the same pattern as search results:
```json
@@ -385,40 +446,50 @@ The main listing data follows the same pattern as search results:
```
### Client-Side GraphQL Queries
When loading a listing details page, the following GraphQL queries are executed:
#### 1. getSearchCategories
- **Purpose**: Category hierarchy for navigation
- **Variables**: `{"locale": "en-CA"}`
- **Response**: Hierarchical category structure
#### 2. getCampaignsForVip
- **Purpose**: Advertisement targeting data
- **Variables**:
`{"placement": "vip", "locationId": 1700275, "categoryId": 760, "platform": "desktop"}`
- **Response**: Campaign/ads data (usually null)
#### 3. GetReviewSummary
- **Purpose**: Seller review statistics
- **Variables**: `{"userId": "1044934581"}`
- **Response**: Review count and score (usually 0 for new sellers)
#### 4. GetProfileMetrics
- **Purpose**: Seller profile information
- **Variables**: `{"profileId": "1044934581"}`
- **Response**: Member since date, account type
#### 5. GetListingsSimilar
- **Purpose**: Similar listings for cross-selling
- **Variables**: `{"listingId": "1705585530", "limit": 10, "isExternalId": false}`
- **Response**: Array of similar listings with basic metadata
#### 6. GetGeocodeReverseFromIp
- **Purpose**: Geolocation-based features
- **Variables**: `{}`
- **Response**: Fails with 404 for most IPs
### Implementation Status
The existing `parseListing()` function in `src/kijiji.ts` successfully extracts listing
details from embedded Apollo state:
- ✅ Extracts title, description, price, location
- ✅ Handles contact-based pricing ("Please Contact")
@@ -427,22 +498,30 @@ The existing `parseListing()` function in `src/kijiji.ts` successfully extracts
- ✅ Works without authentication or API keys
### Key Findings
1. **No Dedicated Listing API**: Unlike search results, there's no separate GraphQL query for individual listing data
2. **Complete Data Available**: All listing information is embedded in the initial HTML response
3. **Additional Context Fetched**: Secondary GraphQL queries provide complementary data (reviews, similar listings)
4. **Consistent Architecture**: Same Apollo state embedding pattern as search pages
### Current Scraper Implementation
The scraper successfully extracts listing details by:
1. Fetching the listing URL HTML
2. Parsing embedded `__NEXT_DATA__` Apollo state
3. Extracting the `Listing:{id}` object from Apollo cache
4. Mapping fields to typed `ListingDetails` interface
This approach works reliably without requiring authentication or dealing with rate
limiting on individual listing fetches.
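Steps 2 and 3 above reduce to a key lookup once the Apollo state is parsed. This sketch assumes the listing ID is the trailing numeric segment of the listing URL, which is an assumption about Kijiji's URL format. Both function names are hypothetical.

```typescript
type ApolloState = Record<string, unknown>;

// Assumes the ad ID is the last all-digits path segment of the listing URL.
function listingIdFromUrl(listingUrl: string): string | null {
  const match = new URL(listingUrl).pathname.match(/\/(\d+)\/?$/);
  return match ? (match[1] ?? null) : null;
}

// Step 3: the listing is stored under the normalized cache key "Listing:{id}".
function getListingFromApollo(
  state: ApolloState,
  listingId: string,
): Record<string, unknown> | null {
  const entry = state[`Listing:${listingId}`];
  return entry !== null && typeof entry === "object" ? (entry as Record<string, unknown>) : null;
}
```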
## Next Steps
- Explore posting/authentication APIs (requires user login)
- Investigate if GraphQL API can be used for programmatic access with proper
authentication
- Test rate limiting patterns and optimal scraping strategies
- Document additional category and location ID mappings


@@ -1 +1,2 @@
# ca-marketplace-scraper


@@ -1,14 +1,21 @@
# opencode Monorepo Config Adoption Implementation Plan
> **For agentic workers:** REQUIRED SUB-SKILL: Use
> superpowers:subagent-driven-development (recommended) or superpowers:executing-plans
> to implement this plan task-by-task.
> Steps use checkbox (`- [ ]`) syntax for tracking.
**Goal:** Adopt opencode-style monorepo config: Turbo task orchestration, workspace dep
catalog, shared root tsconfig, bunfig.toml, and `exports` field in all packages.
**Architecture:** Pure config changes across 10 files — no source code touched.
Root config files are added/updated first, then per-package files updated to reference
them. Changes are independent within each task and safe to commit atomically.
**Tech Stack:** Bun workspaces, Turbo 2.x, @tsconfig/bun, TypeScript (tsgo /
@typescript/native-preview)
* * *
## File Map
@@ -25,14 +32,16 @@
| `packages/api-server/tsconfig.json` | Modify | Slim — extends root, paths only |
| `packages/mcp-server/tsconfig.json` | Modify | Slim — extends root, paths only |
* * *
### Task 1: Add `bunfig.toml` and `turbo.json`
Two new root config files with no dependencies on other tasks.
**Files:**
- Create: `bunfig.toml`
- Create: `turbo.json`
- [ ] **Step 1: Create `bunfig.toml`**
@@ -83,13 +92,15 @@ git add bunfig.toml turbo.json
git commit -m "chore: add bunfig.toml and turbo.json"
```
* * *
### Task 2: Create root `tsconfig.json`
Shared base tsconfig all packages will extend.
Extracts the common options currently duplicated in all 3 per-package tsconfigs.
**Files:**
- Create: `tsconfig.json`
- [ ] **Step 1: Create root `tsconfig.json`**
@@ -130,13 +141,15 @@ git add tsconfig.json
git commit -m "chore: add shared root tsconfig.json"
```
* * *
### Task 3: Update root `package.json`
Add workspace catalog, `turbo` + `@tsconfig/bun` devDependencies, and update scripts to
use `turbo run`.
**Files:**
- Modify: `package.json`
- [ ] **Step 1: Replace root `package.json`**
@@ -180,7 +193,11 @@ Write this complete file:
}
```
> **Note on catalog versions:** The catalog pins exact versions.
> The values above are taken from the current package installs.
> If `@types/bun` was `latest`, check `node_modules/@types/bun/package.json` for the
> actual installed version and use that.
> Same for `@typescript/native-preview`.
- [ ] **Step 2: Check actual installed versions**
@@ -208,7 +225,8 @@ Expected: lock file updated, `turbo` and `@tsconfig/bun` appear in `node_modules
bunx turbo run typecheck --dry
```
Expected: output lists the `typecheck` task for each package (even if no `typecheck`
script exists yet — turbo will note them as skipped/missing).
- [ ] **Step 5: Commit**
@@ -217,15 +235,19 @@ git add package.json bun.lock
git commit -m "chore: add workspace catalog and turbo to root package.json"
```
* * *
### Task 4: Update per-package `package.json` files
Rename `type:check` → `typecheck`, replace `main`/`module` with `exports`, swap pinned
dep versions for `catalog:` references.
**Files:**
- Modify: `packages/core/package.json`
- Modify: `packages/api-server/package.json`
- Modify: `packages/mcp-server/package.json`
- [ ] **Step 1: Replace `packages/core/package.json`**
@@ -325,7 +347,9 @@ Rename `type:check` → `typecheck`, replace `main`/`module` with `exports`, swa
bun install
```
Expected: no errors.
Catalog refs resolved.
`bun.lock` updated.
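To confirm that the catalog refs were actually adopted, and not just resolved, a small check like the one below can compare each package's dependency map against the root catalog. `findUncatalogedDeps` is a hypothetical helper. The sketch only assumes that the catalog and the dependency maps are plain name-to-specifier objects read from the relevant `package.json` files.

```ts
type DepMap = Record<string, string>;

// Hypothetical check: a dependency pinned in the root catalog should be
// referenced through the catalog protocol, not with its own version string.
function findUncatalogedDeps(catalog: DepMap, deps: DepMap): string[] {
  return Object.entries(deps)
    .filter(
      ([name, spec]) =>
        // Only names the catalog pins are checked; other deps are left alone.
        Object.prototype.hasOwnProperty.call(catalog, name) &&
        // Accepts "catalog:" as well as named catalogs such as "catalog:dev".
        !spec.startsWith("catalog:"),
    )
    .map(([name, spec]) => `${name}@${spec}`);
}
```

In Bun, the inputs can be loaded with `await Bun.file("packages/core/package.json").json()`. Run the check against each package's `dependencies` and `devDependencies`. An empty result means every catalog-pinned dependency goes through `catalog:`.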
- [ ] **Step 5: Verify typecheck still works per-package**
@@ -345,15 +369,19 @@ git add packages/core/package.json packages/api-server/package.json packages/mcp
git commit -m "chore: use exports field and catalog refs in all packages"
```
* * *
### Task 5: Slim per-package `tsconfig.json` files
Replace the duplicated full tsconfig in each package with a slim `extends`-based one
pointing to root.
**Files:**
- Modify: `packages/core/tsconfig.json`
- Modify: `packages/api-server/tsconfig.json`
- Modify: `packages/mcp-server/tsconfig.json`
- [ ] **Step 1: Replace `packages/core/tsconfig.json`**
@@ -400,7 +428,8 @@ Replace the duplicated full tsconfig in each package with a slim `extends`-based
- [ ] **Step 4: Verify `@tsconfig/bun` is resolvable**
The root tsconfig extends `@tsconfig/bun/tsconfig.json`. Confirm the package is
installed:
```bash
ls node_modules/@tsconfig/bun/tsconfig.json
@@ -414,7 +443,8 @@ Expected: file exists.
bun run typecheck
```
Expected: Turbo runs `typecheck` for all 3 packages in parallel, all pass (or same
pre-existing errors — no new ones).
- [ ] **Step 6: Commit**
@@ -423,7 +453,7 @@ git add packages/core/tsconfig.json packages/api-server/tsconfig.json packages/m
git commit -m "chore: slim per-package tsconfigs to extend root"
```
* * *
### Task 6: Smoke test full build pipeline
@@ -437,7 +467,8 @@ Verify everything works end-to-end.
bun run typecheck
```
Expected: Turbo runs `typecheck` across all packages.
Exit 0.
- [ ] **Step 2: Run full build**
@@ -445,7 +476,8 @@ Expected: Turbo runs `typecheck` across all packages. Exit 0.
bun run build
```
Expected: `dist/` cleaned, Turbo runs `build` (core first, then api-server and
mcp-server in parallel), build artifacts appear in `dist/api/` and `dist/mcp/`.
- [ ] **Step 3: Verify dist artifacts**
@@ -461,7 +493,9 @@ Expected: compiled output files in both directories.
grep -c '\^' bun.lock | head -5
```
With `exact = true` in bunfig.toml, new installs won't add `^` ranges.
Existing `^` ranges in `bun.lock` from before are fine; they'll be resolved to exact on
the next fresh install.
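The `grep` count above only shows that carets exist somewhere in the lockfile. A more targeted check, sketched here under the assumption that pinning is judged from the `package.json` specifiers, lists exactly which entries still use `^` or `~`. `findRangeSpecifiers` is a hypothetical helper.

```ts
type DepMap = Record<string, string>;

interface PackageJsonLike {
  dependencies?: DepMap;
  devDependencies?: DepMap;
  peerDependencies?: DepMap;
}

const DEP_FIELDS = ["dependencies", "devDependencies", "peerDependencies"] as const;

// Hypothetical helper: report every specifier that is a ^ or ~ range.
// Protocol specifiers such as "catalog:" or "workspace:*" do not start
// with ^ or ~, so they are never flagged.
function findRangeSpecifiers(pkg: PackageJsonLike): string[] {
  const offenders: string[] = [];
  for (const field of DEP_FIELDS) {
    for (const [name, spec] of Object.entries(pkg[field] ?? {})) {
      if (spec.startsWith("^") || spec.startsWith("~")) {
        offenders.push(`${field}.${name}@${spec}`);
      }
    }
  }
  return offenders;
}
```

Pointed at the root catalog entries, the same idea confirms that the catalog itself pins exact versions.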
- [ ] **Step 5: Final commit if any loose files**

@@ -1,53 +1,64 @@
# Cookie Env-Only Implementation Plan
> **For agentic workers:** REQUIRED SUB-SKILL: Use
> superpowers:subagent-driven-development (recommended) or superpowers:executing-plans
> to implement this plan task-by-task.
> Steps use checkbox (`- [ ]`) syntax for tracking.
**Goal:** Remove cookie files and request-provided cookie overrides so all authenticated
marketplace scraping reads raw `Cookie` header strings only from environment variables.
**Architecture:** Collapse shared cookie loading to a single env-var reader in
`packages/core/src/utils/cookies.ts`, then tighten Facebook and eBay core signatures to
stop accepting request/file cookie inputs.
Update the API and MCP adapters so they no longer advertise or forward cookie
parameters, and rewrite docs/tests to match the env-only contract.
**Tech Stack:** Bun, TypeScript, Bun test, Biome, workspace package exports
* * *
## File Map
- Modify: `packages/core/src/utils/cookies.ts`
  - Purpose: remove JSON/file/request-source loading and keep env-only cookie
    parsing/formatting.
- Modify: `packages/core/src/scrapers/facebook.ts`
  - Purpose: drop the `cookiesSource` / `cookiePath` arguments and switch to env-only
    error text.
- Modify: `packages/core/src/scrapers/ebay.ts`
  - Purpose: remove the `opts.cookies` request override and use env-only cookie loading.
- Modify: `packages/core/src/index.ts`
  - Purpose: keep exports aligned with the tightened core signatures.
- Modify: `packages/core/test/facebook-core.test.ts`
  - Purpose: replace missing-file coverage with env-only auth tests.
- Create: `packages/core/test/ebay-core.test.ts`
  - Purpose: add dedicated eBay auth regression coverage instead of mixing it into
    Facebook tests.
- Modify: `packages/api-server/src/routes/facebook.ts`
  - Purpose: stop parsing/forwarding `cookies` query params.
- Modify: `packages/api-server/src/routes/ebay.ts`
  - Purpose: stop parsing/forwarding `cookies` query params.
- Create: `packages/api-server/test/routes.test.ts`
  - Purpose: verify Facebook/eBay routes ignore cookie query params and still call core
    correctly.
- Modify: `packages/mcp-server/src/protocol/tools.ts`
  - Purpose: remove Facebook/eBay cookie tool inputs and descriptions.
- Modify: `packages/mcp-server/src/protocol/handler.ts`
  - Purpose: stop mapping removed cookie tool inputs into API URLs.
- Create: `packages/mcp-server/test/protocol.test.ts`
  - Purpose: verify tool schemas and handler URL building no longer include Facebook/eBay
    cookie fields.
- Modify: `cookies/AGENTS.md`
  - Purpose: document env vars as the only supported cookie input.
### Task 1: Lock core cookie utilities to env-only loading
**Files:**
- Modify: `packages/core/src/utils/cookies.ts:19-227`
- Test: `packages/core/test/facebook-core.test.ts`
- [ ] **Step 1: Write the failing test**
Add or replace the auth-source test block in `packages/core/test/facebook-core.test.ts`
with env-only expectations:
```ts
test("should load Facebook cookies from FACEBOOK_COOKIE env var", async () => {
@@ -85,12 +96,14 @@ test("should reject missing Facebook auth env var", async () => {
- [ ] **Step 2: Run test to verify it fails**
Run: `bun test packages/core/test/facebook-core.test.ts`
Expected: FAIL because the current implementation still allows missing env values to
fall through to file/request-based behavior and does not emit the new env-only error.
- [ ] **Step 3: Write minimal implementation**
Replace the multi-source loader in `packages/core/src/utils/cookies.ts` with an env-only
loader. The target shape is:
```ts
export interface CookieConfig {
@@ -129,8 +142,8 @@ Delete the now-dead helpers and types that exist only for JSON/file/request load
- [ ] **Step 4: Run test to verify it passes**
Run: `bun test packages/core/test/facebook-core.test.ts`
Expected: PASS for the new env-only tests.
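The env-only contract from this task can be illustrated apart from the real `cookies.ts` module. This minimal sketch assumes the variable holds a raw `Cookie` header string such as `c_user=123; xs=token`. `readCookieHeader`, `parseCookieHeader`, and `formatCookieHeader` are hypothetical names; they are not the `ensureCookies(config)` loader this plan refers to.

```ts
type Env = Record<string, string | undefined>;

// Hypothetical env-only reader: the raw Cookie header comes from a single
// environment variable and nowhere else (no files, no request overrides).
function readCookieHeader(varName: string, env: Env): string {
  const raw = env[varName]?.trim();
  if (!raw) {
    throw new Error(`${varName} is not set; export it as a raw Cookie header string`);
  }
  return raw;
}

// Split "a=1; b=2" into name/value pairs, keeping any "=" inside values intact.
function parseCookieHeader(header: string): Map<string, string> {
  const cookies = new Map<string, string>();
  for (const part of header.split(";")) {
    const eq = part.indexOf("=");
    if (eq <= 0) continue; // skip empty or nameless fragments
    cookies.set(part.slice(0, eq).trim(), part.slice(eq + 1).trim());
  }
  return cookies;
}

// Re-serialize pairs into a Cookie header value.
function formatCookieHeader(cookies: Map<string, string>): string {
  return Array.from(cookies, ([name, value]) => `${name}=${value}`).join("; ");
}
```

In the scrapers this would be called as `readCookieHeader("FACEBOOK_COOKIE", process.env)`. Passing `env` explicitly keeps the tests free of global state. The strict throw matches the Facebook expectation ("should reject missing Facebook auth env var"). eBay, per Task 3, warns and continues instead.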
- [ ] **Step 5: Commit**
@@ -142,10 +155,15 @@ git commit -m "refactor: make cookie loading env-only"
### Task 2: Tighten Facebook core APIs to the new contract
**Files:**
- Modify: `packages/core/src/scrapers/facebook.ts:23-29`
- Modify: `packages/core/src/scrapers/facebook.ts:214-228`
- Modify: `packages/core/src/scrapers/facebook.ts:823-929`
- Modify: `packages/core/src/index.ts:5-15`
- Test: `packages/core/test/facebook-core.test.ts`
- [ ] **Step 1: Write the failing test**
@@ -171,8 +189,9 @@ test("should fail Facebook item fetch when FACEBOOK_COOKIE is unset", async () =
- [ ] **Step 2: Run test to verify it fails**
Run: `bun test packages/core/test/facebook-core.test.ts`
Expected: FAIL because the current function signatures and error text still mention
parameter/file-based auth paths.
- [ ] **Step 3: Write minimal implementation**
@@ -206,12 +225,14 @@ console.warn(
);
```
Remove the extra cookie arguments from `fetchFacebookItem(...)` and keep
`packages/core/src/index.ts` exporting the tightened functions without the old parameter
contract.
- [ ] **Step 4: Run test to verify it passes**
Run: `bun test packages/core/test/facebook-core.test.ts`
Expected: PASS with the new env-only Facebook API surface.
- [ ] **Step 5: Commit**
@@ -223,8 +244,11 @@ git commit -m "refactor: remove facebook cookie overrides"
### Task 3: Tighten eBay core APIs to env-only auth
**Files:**
- Modify: `packages/core/src/scrapers/ebay.ts:9-15`
- Modify: `packages/core/src/scrapers/ebay.ts:337-389`
- Create: `packages/core/test/ebay-core.test.ts`
- [ ] **Step 1: Write the failing test**
@@ -249,8 +273,8 @@ test("should warn and continue without eBay cookies when EBAY_COOKIE is unset",
- [ ] **Step 2: Run test to verify it fails**
Run: `bun test packages/core/test/ebay-core.test.ts`
Expected: FAIL because `loadEbayCookies` still accepts request overrides and mentions
file/json sources.
- [ ] **Step 3: Write minimal implementation**
@@ -276,12 +300,13 @@ async function loadEbayCookies(): Promise<string | undefined> {
}
```
Then remove `cookies` from `fetchEbayItems(..., opts)` and the destructuring that feeds
it into `loadEbayCookies()`.
- [ ] **Step 4: Run test to verify it passes**
Run: `bun test packages/core/test/ebay-core.test.ts`
Expected: PASS for the eBay env-only regression coverage.
- [ ] **Step 5: Commit**
@@ -293,13 +318,17 @@ git commit -m "refactor: make ebay auth env-only"
### Task 4: Remove cookie query parameters from the API adapter
**Files:**
- Modify: `packages/api-server/src/routes/facebook.ts:3-33`
- Modify: `packages/api-server/src/routes/ebay.ts:3-52`
- Create: `packages/api-server/test/routes.test.ts`
- [ ] **Step 1: Write the failing test**
Create `packages/api-server/test/routes.test.ts` and mock `@marketplace-scrapers/core`
so the route contract is explicit:
```ts
import { afterEach, describe, expect, mock, test } from "bun:test";
@@ -347,8 +376,9 @@ test("ebayRoute ignores cookies query parameter", async () => {
- [ ] **Step 2: Run test to verify it fails**
Run: `bun test packages/api-server/test/routes.test.ts`
Expected: FAIL because the current routes still parse `reqUrl.searchParams.get("cookies")`
and forward it downstream.
- [ ] **Step 3: Write minimal implementation**
@@ -383,8 +413,8 @@ const items = await fetchEbayItems(SEARCH_QUERY, 1, {
- [ ] **Step 4: Run test to verify it passes**
Run: `bun test packages/api-server/test/routes.test.ts`
Expected: PASS for route coverage and no remaining adapter references to `cookies` for
Facebook/eBay.
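The route-side half of this contract can be sketched as an explicit parse step that reads only the parameters forwarded to core. `parseSearchRequest`, `SearchOptions`, and the `q` / `maxItems` parameter names are hypothetical stand-ins for whatever the real routes accept. The point is that `cookies` is never read.

```ts
interface SearchOptions {
  query: string;
  maxItems?: number;
}

type ParseResult = SearchOptions | { error: string };

// Hypothetical sketch: build core options from an explicit set of query
// parameters; anything else on the URL (including "cookies") is ignored.
function parseSearchRequest(reqUrl: URL): ParseResult {
  const query = reqUrl.searchParams.get("q")?.trim();
  if (!query) return { error: "Missing required query parameter: q" };

  const options: SearchOptions = { query };
  const maxRaw = reqUrl.searchParams.get("maxItems");
  if (maxRaw !== null) {
    const parsed = Number.parseInt(maxRaw, 10);
    // Ignore non-numeric or non-positive values instead of forwarding NaN.
    if (Number.isFinite(parsed) && parsed > 0) options.maxItems = parsed;
  }
  return options;
}
```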
- [ ] **Step 5: Commit**
@@ -396,13 +426,17 @@ git commit -m "refactor: remove api cookie query overrides"
### Task 5: Remove cookie inputs from MCP tool schemas and request mapping
**Files:**
- Modify: `packages/mcp-server/src/protocol/tools.ts:65-148`
- Modify: `packages/mcp-server/src/protocol/handler.ts:154-211`
- Create: `packages/mcp-server/test/protocol.test.ts`
- [ ] **Step 1: Write the failing test**
Create `packages/mcp-server/test/protocol.test.ts` with schema and URL-building
assertions:
```ts
import { expect, mock, test } from "bun:test";
@@ -445,8 +479,8 @@ expect(calledUrl).not.toContain("cookies=");
- [ ] **Step 2: Run test to verify it fails**
Run: `bun test packages/mcp-server/test/protocol.test.ts`
Expected: FAIL because the current MCP schema and handler still expose and forward those
inputs.
- [ ] **Step 3: Write minimal implementation**
@@ -465,12 +499,13 @@ Delete the Facebook/eBay cookie tool properties and handler mapping:
// if (args.cookies) params.append("cookies", args.cookies);
```
Leave Kijiji alone; this plan only changes Facebook/eBay env-only auth paths defined by
the approved spec.
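On the MCP side, one way to keep handler URL building in sync with the tool schema is an allowlist. Only the arguments a tool still declares are copied into the query string, so a stale `cookies` argument cannot leak through. `buildApiUrl`, `FACEBOOK_PARAMS`, the parameter names, and the base URL are all hypothetical.

```ts
// Hypothetical allowlist mirroring the Facebook tool's remaining inputs.
const FACEBOOK_PARAMS = ["q", "location", "maxItems"] as const;

// Copy only allowlisted, non-empty arguments into the API query string.
function buildApiUrl(
  baseUrl: string,
  path: string,
  args: Record<string, unknown>,
  allowed: readonly string[],
): string {
  const url = new URL(path, baseUrl);
  for (const key of allowed) {
    const value = args[key];
    if (value === undefined || value === null || value === "") continue;
    url.searchParams.append(key, String(value));
  }
  return url.toString();
}
```

Deriving the allowlist from the tool's `inputSchema.properties` keys, instead of keeping a second list, would make schema/handler drift impossible by construction.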
- [ ] **Step 4: Run test to verify it passes**
Run: `bun test packages/mcp-server/test/protocol.test.ts`
Expected: PASS with MCP definitions and handler mapping in sync.
- [ ] **Step 5: Commit**
@@ -482,12 +517,16 @@ git commit -m "refactor: remove mcp cookie parameters"
### Task 6: Rewrite cookie documentation and run full verification
**Files:**
- Modify: `cookies/AGENTS.md:9-85`
- Modify: `docs/superpowers/specs/2026-04-21-cookie-env-only-design.md` only if
implementation reveals a spec mismatch
- [ ] **Step 1: Write the failing test**
Treat docs drift as a contract failure.
Capture the required state before editing:
```md
- Cookie setup docs mention env vars only for Facebook and eBay
@@ -497,14 +536,14 @@ Treat docs drift as a contract failure. Capture the required state before editin
- [ ] **Step 2: Run verification to prove current docs are stale**
Run: `rg -n "facebook\.json|ebay\.json|cookies=" cookies/AGENTS.md`
Expected: matches found
- [ ] **Step 3: Write minimal implementation**
Rewrite the cookie setup doc so Facebook and eBay each show only env-var setup:
````md
## Cookie Configuration
All supported authenticated scrapers read cookies only from environment variables.
@@ -513,14 +552,14 @@ All supported authenticated scrapers read cookies only from environment variable
```bash
export FACEBOOK_COOKIE='c_user=123; xs=token; fr=request'
```
### eBay
```bash
export EBAY_COOKIE='s=VALUE; ds2=VALUE; ebay=VALUE'
```
````
Remove the file-based and request-parameter sections entirely.
@@ -534,10 +573,14 @@ Expected: all commands pass
```bash
git add cookies/AGENTS.md docs/superpowers/specs/2026-04-21-cookie-env-only-design.md
git commit -m "docs: align cookie setup with env-only auth"
```
## Self-Review
- Spec coverage check: shared cookie utils, Facebook, eBay, API adapter, MCP adapter,
tests, and docs each have explicit tasks.
- Placeholder scan: concrete test files are now named for eBay core, API routes, and MCP
protocol coverage.
- Type consistency check: `ensureCookies(config)` is the single shared loader name used
across Tasks 1-3, and Facebook/eBay route signatures stay aligned with the core
changes.

@@ -1,34 +1,49 @@
# Facebook Comet Rewrite Implementation Plan
> **For agentic workers:** REQUIRED SUB-SKILL: Use
> superpowers:subagent-driven-development (recommended) or superpowers:executing-plans
> to implement this plan task-by-task.
> Steps use checkbox (`- [ ]`) syntax for tracking.
**Goal:** Replace the legacy Facebook Marketplace scraper with a route-aware hybrid
Comet-bootstrap parser for both search and item routes.
**Architecture:** Keep authenticated direct HTTP fetches as the transport.
Classify each Facebook response first, then parse route-specific Comet bootstrap/state
candidates, and fall back to rendered-HTML extraction only when bootstrap decoding
cannot produce the expected search or item shape.
**Tech Stack:** Bun, TypeScript, `bun:test`, `linkedom`, existing shared cookie/http
helpers
* * *
## File Structure
- Modify: `packages/core/src/scrapers/facebook.ts`
- Owns Facebook fetch flow, response classification, bootstrap candidate extraction,
search parsing, item parsing, and HTML fallbacks.
- Modify: `packages/core/test/facebook-core.test.ts`
- Owns unit coverage for response classification, bootstrap parsing, fallback parsing,
and route-aware item/search extraction behavior.
- Modify: `packages/core/test/facebook-integration.test.ts`
- Owns higher-level fetch flow tests, auth/degradation behavior, and result shaping
for search/item entrypoints.
### Task 1: Add Route Classification Coverage
**Files:**
- Modify: `packages/core/test/facebook-core.test.ts`
- Modify: `packages/core/src/scrapers/facebook.ts`
- Test: `packages/core/test/facebook-core.test.ts`
- [ ] **Step 1: Write the failing tests**
Add these tests near the Facebook parser tests in
`packages/core/test/facebook-core.test.ts`:
```ts
test("classifies Comet search responses", () => {
@@ -89,12 +104,14 @@ test("classifies unavailable item responses", () => {
- [ ] **Step 2: Run test to verify it fails**
Run:
`bun test packages/core/test/facebook-core.test.ts --test-name-pattern "classifies"`
Expected: FAIL because `classifyFacebookResponse` does not exist yet.
- [ ] **Step 3: Write minimal implementation**
Add this type and function near the parsing section in
`packages/core/src/scrapers/facebook.ts`:
```ts
type FacebookResponseKind = "search" | "item" | "auth_gated" | "unavailable" | "unknown";
@@ -128,7 +145,8 @@ export function classifyFacebookResponse(htmlString: HTMLString, responseUrl: st
- [ ] **Step 4: Run test to verify it passes**
Run:
`bun test packages/core/test/facebook-core.test.ts --test-name-pattern "classifies"`
Expected: PASS
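A classifier of this kind could look roughly like the sketch below. It restates the `FacebookResponseKind` union from Step 3. The routing heuristics are assumptions that have not been verified against live Facebook pages: `/login` or `/checkpoint` paths mean auth-gated, `/marketplace/item/` means item, and other `/marketplace/` paths mean search. The "no longer available" marker text is also an assumption. `classifyResponseSketch` is a hypothetical name, distinct from the plan's `classifyFacebookResponse`.

```ts
type FacebookResponseKind = "search" | "item" | "auth_gated" | "unavailable" | "unknown";

// Assumed marker text; confirm against captured fixtures before relying on it.
const UNAVAILABLE_MARKERS = ["This listing is no longer available", "isn't available"];

function classifyResponseSketch(html: string, responseUrl: string): FacebookResponseKind {
  let path: string;
  try {
    path = new URL(responseUrl).pathname;
  } catch {
    return "unknown"; // an unparseable URL cannot be route-classified
  }
  // Redirects onto login or checkpoint pages indicate a gated session.
  if (path.startsWith("/login") || path.startsWith("/checkpoint")) return "auth_gated";
  // Item routes are checked before the broader /marketplace/ prefix.
  if (path.startsWith("/marketplace/item/")) {
    return UNAVAILABLE_MARKERS.some((m) => html.includes(m)) ? "unavailable" : "item";
  }
  if (path.startsWith("/marketplace/")) return "search";
  return "unknown";
}
```

Classifying on the final URL first, and only then on body markers, keeps the cheap and reliable signal ahead of the fragile one.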
- [ ] **Step 5: Commit**
@@ -141,8 +159,11 @@ git commit -m "refactor: add facebook response classification"
### Task 2: Add Bootstrap Candidate Extraction
**Files:**
- Modify: `packages/core/test/facebook-core.test.ts`
- Modify: `packages/core/src/scrapers/facebook.ts`
- Test: `packages/core/test/facebook-core.test.ts`
- [ ] **Step 1: Write the failing tests**
@@ -185,7 +206,8 @@ test("keeps candidate order stable for later scoring", () => {
- [ ] **Step 2: Run test to verify it fails**
Run:
`bun test packages/core/test/facebook-core.test.ts --test-name-pattern "bootstrap candidates"`
Expected: FAIL because `extractFacebookBootstrapCandidates` does not exist.
- [ ] **Step 3: Write minimal implementation**
@@ -218,7 +240,8 @@ export function extractFacebookBootstrapCandidates(htmlString: HTMLString): Reco
- [ ] **Step 4: Run test to verify it passes**
Run:
`bun test packages/core/test/facebook-core.test.ts --test-name-pattern "bootstrap candidates"`
Expected: PASS
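Candidate extraction can be sketched as collecting every `<script type="application/json">` payload in document order and keeping the ones that decode to objects. It assumes Comet bootstrap state ships in such script tags. `extractBootstrapCandidatesSketch` is a hypothetical name, and a regex stands in for a DOM parser to keep the sketch dependency-free.

```ts
// Assumes bootstrap payloads live in <script type="application/json"> tags.
const JSON_SCRIPT_RE = /<script\b[^>]*\btype="application\/json"[^>]*>([\s\S]*?)<\/script>/g;

function extractBootstrapCandidatesSketch(html: string): Record<string, unknown>[] {
  const candidates: Record<string, unknown>[] = [];
  // matchAll yields matches in document order, so candidate order is stable.
  for (const match of html.matchAll(JSON_SCRIPT_RE)) {
    const payload = match[1];
    if (payload === undefined) continue;
    try {
      const parsed: unknown = JSON.parse(payload);
      // Only plain objects are useful candidates; arrays and scalars are skipped.
      if (parsed !== null && typeof parsed === "object" && !Array.isArray(parsed)) {
        candidates.push(parsed as Record<string, unknown>);
      }
    } catch {
      // Undecodable payloads are skipped; the HTML fallbacks cover that case.
    }
  }
  return candidates;
}
```

Stable ordering matters because the later scoring steps can break ties by position, as the "keeps candidate order stable for later scoring" test expects.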
- [ ] **Step 5: Commit**
@@ -231,10 +254,15 @@ git commit -m "refactor: add facebook bootstrap candidate extraction"
### Task 3: Replace Search Parsing With Candidate Scoring
**Files:**
- Modify: `packages/core/test/facebook-core.test.ts`
- Modify: `packages/core/test/facebook-integration.test.ts`
- Modify: `packages/core/src/scrapers/facebook.ts`
- Test: `packages/core/test/facebook-core.test.ts`
- Test: `packages/core/test/facebook-integration.test.ts`
- [ ] **Step 1: Write the failing tests**
@@ -323,12 +351,15 @@ const mockSearchHtml = `
- [ ] **Step 2: Run test to verify it fails**
Run:
`bun test packages/core/test/facebook-core.test.ts --test-name-pattern "Comet bootstrap candidates"`
Expected: FAIL because the current search extractor only understands legacy
`marketplace_search` shapes.
- [ ] **Step 3: Write minimal implementation**
Replace the search extraction internals in `extractFacebookMarketplaceData()` with
candidate scoring like this:
```ts
function findSearchEdges(candidate: unknown): FacebookEdge[] | null {
@@ -383,7 +414,8 @@ export function extractFacebookMarketplaceData(htmlString: HTMLString): Facebook
- [ ] **Step 4: Run test to verify it passes**
Run:
`bun test packages/core/test/facebook-core.test.ts packages/core/test/facebook-integration.test.ts`
Expected: PASS for the rewritten search fixtures and existing unaffected tests.
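Candidate scoring for search can be sketched as a depth-first walk that looks for the largest `edges` array whose entries look like listings. The candidate with the most such edges wins. The `node.listing.id` edge shape is an assumption about the payload, and `findListingEdges` / `pickSearchEdges` are hypothetical names, not the plan's `findSearchEdges`.

```ts
type Edge = { node?: { listing?: { id?: string } } };

// Depth-first search for the largest "edges" array of listing-like entries.
// The depth cap guards against pathologically nested payloads.
function findListingEdges(value: unknown, depth = 0): Edge[] {
  if (depth > 50 || value === null || typeof value !== "object") return [];
  let best: Edge[] = [];
  for (const [key, child] of Object.entries(value as Record<string, unknown>)) {
    if (key === "edges" && Array.isArray(child)) {
      const listings = child.filter(
        (e): e is Edge => typeof (e as Edge)?.node?.listing?.id === "string",
      );
      if (listings.length > best.length) best = listings;
    }
    const nested = findListingEdges(child, depth + 1);
    if (nested.length > best.length) best = nested;
  }
  return best;
}

// Score candidates by listing-edge count; strict ">" lets earlier candidates win ties.
function pickSearchEdges(candidates: unknown[]): Edge[] | null {
  let best: Edge[] = [];
  for (const candidate of candidates) {
    const edges = findListingEdges(candidate);
    if (edges.length > best.length) best = edges;
  }
  return best.length > 0 ? best : null;
}
```

Filtering out non-listing edges (ads, separators) before counting keeps promoted or structural entries from inflating a candidate's score.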
- [ ] **Step 5: Commit**
@@ -396,8 +428,11 @@ git commit -m "refactor: rewrite facebook search parser for comet bootstrap"
### Task 4: Replace Item Parsing With Candidate Scoring
**Files:**
- Modify: `packages/core/test/facebook-core.test.ts`
- Modify: `packages/core/src/scrapers/facebook.ts`
- Test: `packages/core/test/facebook-core.test.ts`
- [ ] **Step 1: Write the failing tests**
@@ -438,7 +473,8 @@ test("extracts item details from Comet permalink bootstrap candidates", () => {
- [ ] **Step 2: Run test to verify it fails**
Run:
`bun test packages/core/test/facebook-core.test.ts --test-name-pattern "Comet permalink bootstrap"`
Expected: FAIL because the current item extractor depends on legacy permalink markers.
- [ ] **Step 3: Write minimal implementation**
@@ -491,8 +527,8 @@ export function extractFacebookItemData(htmlString: HTMLString): FacebookMarketp
- [ ] **Step 4: Run test to verify it passes**
Run: `bun test packages/core/test/facebook-core.test.ts`
Expected: PASS for current-shape item tests and remaining parser tests.
- [ ] **Step 5: Commit**
@@ -504,8 +540,11 @@ git commit -m "refactor: rewrite facebook item parser for comet bootstrap"
### Task 5: Add HTML Fallback Extraction
**Files:**
- Modify: `packages/core/test/facebook-core.test.ts`
- Modify: `packages/core/src/scrapers/facebook.ts`
- Test: `packages/core/test/facebook-core.test.ts`
- [ ] **Step 1: Write the failing tests**
@@ -549,8 +588,10 @@ test("falls back to rendered item HTML when bootstrap payloads are undecodable",
- [ ] **Step 2: Run test to verify it fails**
Run:
`bun test packages/core/test/facebook-core.test.ts --test-name-pattern "falls back"`
Expected: FAIL because the extractor currently returns `null` without a structured
candidate.
- [ ] **Step 3: Write minimal implementation**
@@ -607,11 +648,13 @@ function extractItemFallback(htmlString: HTMLString): FacebookMarketplaceItem |
}
```
Then call these helpers as the last fallback inside `extractFacebookMarketplaceData()` and `extractFacebookItemData()`.
Then call these helpers as the last fallback inside `extractFacebookMarketplaceData()`
and `extractFacebookItemData()`.
- [ ] **Step 4: Run test to verify it passes**
Run: `bun test packages/core/test/facebook-core.test.ts --test-name-pattern "falls back"`
Run:
`bun test packages/core/test/facebook-core.test.ts --test-name-pattern "falls back"`
Expected: PASS
- [ ] **Step 5: Commit**
@@ -624,8 +667,11 @@ git commit -m "refactor: add facebook html fallbacks"
### Task 6: Wire Route-Aware Failures Into Entry Points
**Files:**
- Modify: `packages/core/test/facebook-integration.test.ts`
- Modify: `packages/core/src/scrapers/facebook.ts`
- Test: `packages/core/test/facebook-integration.test.ts`
- [ ] **Step 1: Write the failing tests**
@@ -664,8 +710,10 @@ test("returns null for unavailable item responses", async () => {
- [ ] **Step 2: Run test to verify it fails**
Run: `bun test packages/core/test/facebook-integration.test.ts --test-name-pattern "auth-gated|unavailable"`
Expected: FAIL because the entrypoints do not yet classify successful HTML responses by route/auth state.
Run:
`bun test packages/core/test/facebook-integration.test.ts --test-name-pattern "auth-gated|unavailable"`
Expected: FAIL because the entrypoints do not yet classify successful HTML responses by
route/auth state.
- [ ] **Step 3: Write minimal implementation**
@@ -690,12 +738,13 @@ if (itemResponseClass.kind === "unavailable") {
}
```
Use the actual response URL from `fetchHtml` plumbing if that helper is extended to return both HTML and final URL; otherwise start by threading final URL support through the fetch helper in the same task.
Use the actual response URL from `fetchHtml` plumbing if that helper is extended to
return both HTML and final URL; otherwise start by threading final URL support through
the fetch helper in the same task.
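Once `fetchHtml` can report the final URL, a successful HTML response can be classified by where the request actually landed. The sketch below is hypothetical and is not the plan's `classifyFacebookResponse`. It rests on three assumptions:

- login walls redirect to a `/login` path;
- checkpoints redirect to `/checkpoint`;
- removed listings contain an illustrative marker string.

None of these signals has been verified against live Facebook pages.

```typescript
// Hypothetical classifier; the real classifyFacebookResponse may use other signals.
type ResponseClass =
  | { kind: "ok" }
  | { kind: "auth-gated"; reason: "login" | "checkpoint" }
  | { kind: "unavailable" };

// Assumed marker text, for illustration only.
const UNAVAILABLE_MARKERS = ["This listing is no longer available"];

function classifyResponseSketch(html: string, finalUrl: string): ResponseClass {
  const path = new URL(finalUrl).pathname;
  // A 200 response that ended on a login or checkpoint route is a wall,
  // not listing content, so it must not reach the parsers.
  if (path.startsWith("/login")) return { kind: "auth-gated", reason: "login" };
  if (path.startsWith("/checkpoint")) return { kind: "auth-gated", reason: "checkpoint" };
  if (UNAVAILABLE_MARKERS.some((m) => html.includes(m))) return { kind: "unavailable" };
  return { kind: "ok" };
}

console.log(classifyResponseSketch("", "https://www.facebook.com/login/").kind); // auth-gated
```

Checking the URL before the HTML keeps the cheap, unambiguous signal first. Text markers are more brittle and are only consulted when the route looks normal.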
- [ ] **Step 4: Run test to verify it passes**
Run: `bun test packages/core/test/facebook-integration.test.ts`
Expected: PASS
Run: `bun test packages/core/test/facebook-integration.test.ts` Expected: PASS
- [ ] **Step 5: Commit**
@@ -707,19 +756,22 @@ git commit -m "refactor: handle facebook route-aware failure states"
### Task 7: Run Full Verification And Live Probe
**Files:**
- Modify: `packages/core/src/scrapers/facebook.ts` if small cleanup is required
- Modify: `packages/core/test/facebook-core.test.ts` if small cleanup is required
- Modify: `packages/core/test/facebook-integration.test.ts` if small cleanup is required
- [ ] **Step 1: Run focused Facebook tests**
Run: `bun test packages/core/test/facebook-core.test.ts packages/core/test/facebook-integration.test.ts`
Run:
`bun test packages/core/test/facebook-core.test.ts packages/core/test/facebook-integration.test.ts`
Expected: PASS
- [ ] **Step 2: Run broader core tests**
Run: `bun test packages/core/test`
Expected: PASS
Run: `bun test packages/core/test` Expected: PASS
- [ ] **Step 3: Run live authenticated Facebook probe**
@@ -742,11 +794,14 @@ if (results[0]?.url) {
Expected:
- search returns at least one result
- item fetch returns non-null for the first live result when the route is not stale/unavailable
- item fetch returns non-null for the first live result when the route is not
stale/unavailable
- [ ] **Step 4: Make any minimal cleanup needed to keep tests and live probe green**
If cleanup is needed, keep it limited to naming, dead-code removal caused by the rewrite, or small parser corrections directly exposed by the verification commands.
If cleanup is needed, keep it limited to naming, dead-code removal caused by the
rewrite, or small parser corrections directly exposed by the verification commands.
- [ ] **Step 5: Re-run verification**
@@ -767,6 +822,11 @@ git commit -m "refactor: complete facebook comet scraper rewrite"
## Self-Review
- Spec coverage: the plan covers classification, route-aware search parsing, route-aware item parsing, HTML fallbacks, explicit failure-state handling, test replacement, and live verification.
- Placeholder scan: no `TODO`, `TBD`, or unspecified “handle appropriately” steps remain.
- Type consistency: all planned functions and types use the same names across tasks: `classifyFacebookResponse`, `extractFacebookBootstrapCandidates`, `extractFacebookMarketplaceData`, and `extractFacebookItemData`.
- Spec coverage: the plan covers classification, route-aware search parsing, route-aware
item parsing, HTML fallbacks, explicit failure-state handling, test replacement, and
live verification.
- Placeholder scan: no `TODO`, `TBD`, or unspecified “handle appropriately” steps
remain.
- Type consistency: all planned functions and types use the same names across tasks:
`classifyFacebookResponse`, `extractFacebookBootstrapCandidates`,
`extractFacebookMarketplaceData`, and `extractFacebookItemData`.

View File

@@ -1,63 +1,75 @@
# Unstable Listing Mode Implementation Plan
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
> **For agentic workers:** REQUIRED SUB-SKILL: Use
> superpowers:subagent-driven-development (recommended) or superpowers:executing-plans
> to implement this plan task-by-task.
> Steps use checkbox (`- [ ]`) syntax for tracking.
**Goal:** Add an optional shared mode across Facebook, eBay, and Kijiji that moves listings priced below 80% of the median into `unstableResults`, while preserving current default response shapes.
**Goal:** Add an optional shared mode across Facebook, eBay, and Kijiji that moves
listings priced below 80% of the median into `unstableResults`, while preserving current
default response shapes.
**Architecture:** Introduce a shared generic classifier in `packages/core` that splits any listing array into `results` and `unstableResults` using the same median-based rule. Then thread one opt-in flag through the scraper entrypoints, API routes, and MCP tool definitions so all surfaces expose the same behavior without changing existing defaults.
**Architecture:** Introduce a shared generic classifier in `packages/core` that splits
any listing array into `results` and `unstableResults` using the same median-based rule.
Then thread one opt-in flag through the scraper entrypoints, API routes, and MCP tool
definitions so all surfaces expose the same behavior without changing existing defaults.
**Tech Stack:** Bun, TypeScript, Bun test, workspace packages, JSON-RPC MCP server
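The median rule in the Goal can be sketched as a generic split. This is a minimal illustration, not the plan's `classifyUnstableListings`; the name `classifyUnstableSketch` and its signature are hypothetical. It makes these assumptions:

- prices are plain numbers, such as cents;
- listings without a price stay in `results`;
- a price exactly at the 80% cutoff counts as stable.

```typescript
interface UnstableBuckets<T> {
  results: T[];
  unstableResults: T[];
}

// Hypothetical sketch of the median-based rule; not the repo's implementation.
function classifyUnstableSketch<T>(
  items: readonly T[],
  getPrice: (item: T) => number | null | undefined,
  ratio = 0.8,
): UnstableBuckets<T> {
  const prices = items
    .map(getPrice)
    .filter((p): p is number => typeof p === "number" && Number.isFinite(p))
    .sort((a, b) => a - b);
  // No priced listings means no median, so nothing can be flagged.
  if (prices.length === 0) return { results: [...items], unstableResults: [] };

  const mid = Math.floor(prices.length / 2);
  const median =
    prices.length % 2 === 0 ? (prices[mid - 1] + prices[mid]) / 2 : prices[mid];
  const cutoff = median * ratio;

  const results: T[] = [];
  const unstableResults: T[] = [];
  for (const item of items) {
    const price = getPrice(item);
    // Strictly below the cutoff is unstable; unpriced items stay in results.
    if (typeof price === "number" && Number.isFinite(price) && price < cutoff) {
      unstableResults.push(item);
    } else {
      results.push(item);
    }
  }
  return { results, unstableResults };
}

const listings = [
  { title: "a", priceCents: 1000 },
  { title: "b", priceCents: 200 },
  { title: "c", priceCents: 1100 },
  { title: "d", priceCents: 5000 },
  { title: "e", priceCents: 1000 },
];
const { results, unstableResults } = classifyUnstableSketch(listings, (l) => l.priceCents);
console.log(results.length, unstableResults.map((l) => l.title).join(",")); // 4 b
```

In the example the median is 1000, so the cutoff is 800 and only listing `b` is moved. The split preserves the original item order within each bucket.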
---
* * *
## File Map
- Create: `packages/core/src/utils/unstable.ts`
Purpose: shared generic median/cutoff classifier for listing arrays.
- Modify: `packages/core/src/types/common.ts`
Purpose: add shared mode types used by scrapers and adapters.
- Modify: `packages/core/src/index.ts`
Purpose: export the new shared classifier/types.
- Modify: `packages/core/src/scrapers/facebook.ts`
Purpose: add the optional mode flag and return bucketed results when enabled.
- Modify: `packages/core/src/scrapers/ebay.ts`
Purpose: add the optional mode flag and return bucketed results when enabled.
- Modify: `packages/core/src/scrapers/kijiji.ts`
Purpose: add the optional mode flag and return bucketed results when enabled.
- Create: `packages/core/test/unstable-listing-mode.test.ts`
Purpose: lock the shared classifier behavior with direct unit tests.
- Modify: `packages/core/test/facebook-core.test.ts`
Purpose: prove Facebook preserves default arrays and returns buckets when enabled.
- Modify: `packages/core/test/ebay-core.test.ts`
Purpose: prove eBay preserves default arrays and returns buckets when enabled.
- Modify: `packages/core/test/kijiji-core.test.ts`
Purpose: prove Kijiji preserves default arrays and returns buckets when enabled.
- Modify: `packages/api-server/src/routes/facebook.ts`
Purpose: expose a shared opt-in query parameter and preserve default response shape.
- Modify: `packages/api-server/src/routes/ebay.ts`
Purpose: expose the same query parameter and preserve default response shape.
- Modify: `packages/api-server/src/routes/kijiji.ts`
Purpose: expose the same query parameter and preserve default response shape.
- Modify: `packages/api-server/test/routes.test.ts`
Purpose: verify route forwarding and route response-shape switching.
- Modify: `packages/mcp-server/src/protocol/tools.ts`
Purpose: document the optional unstable mode in all search tools.
- Modify: `packages/mcp-server/src/protocol/handler.ts`
Purpose: forward the optional mode to API routes for all search tools.
- Modify: `packages/mcp-server/test/protocol.test.ts`
Purpose: verify MCP tool metadata and forwarded URLs include the new option.
- Create: `packages/core/src/utils/unstable.ts` Purpose: shared generic median/cutoff
classifier for listing arrays.
- Modify: `packages/core/src/types/common.ts` Purpose: add shared mode types used by
scrapers and adapters.
- Modify: `packages/core/src/index.ts` Purpose: export the new shared classifier/types.
- Modify: `packages/core/src/scrapers/facebook.ts` Purpose: add the optional mode flag
and return bucketed results when enabled.
- Modify: `packages/core/src/scrapers/ebay.ts` Purpose: add the optional mode flag and
return bucketed results when enabled.
- Modify: `packages/core/src/scrapers/kijiji.ts` Purpose: add the optional mode flag and
return bucketed results when enabled.
- Create: `packages/core/test/unstable-listing-mode.test.ts` Purpose: lock the shared
classifier behavior with direct unit tests.
- Modify: `packages/core/test/facebook-core.test.ts` Purpose: prove Facebook preserves
default arrays and returns buckets when enabled.
- Modify: `packages/core/test/ebay-core.test.ts` Purpose: prove eBay preserves default
arrays and returns buckets when enabled.
- Modify: `packages/core/test/kijiji-core.test.ts` Purpose: prove Kijiji preserves
default arrays and returns buckets when enabled.
- Modify: `packages/api-server/src/routes/facebook.ts` Purpose: expose a shared opt-in
query parameter and preserve default response shape.
- Modify: `packages/api-server/src/routes/ebay.ts` Purpose: expose the same query
parameter and preserve default response shape.
- Modify: `packages/api-server/src/routes/kijiji.ts` Purpose: expose the same query
parameter and preserve default response shape.
- Modify: `packages/api-server/test/routes.test.ts` Purpose: verify route forwarding and
route response-shape switching.
- Modify: `packages/mcp-server/src/protocol/tools.ts` Purpose: document the optional
unstable mode in all search tools.
- Modify: `packages/mcp-server/src/protocol/handler.ts` Purpose: forward the optional
mode to API routes for all search tools.
- Modify: `packages/mcp-server/test/protocol.test.ts` Purpose: verify MCP tool metadata
and forwarded URLs include the new option.
### Task 1: Add the shared unstable-listing classifier
**Files:**
- Create: `packages/core/src/utils/unstable.ts`
- Modify: `packages/core/src/types/common.ts`
- Modify: `packages/core/src/index.ts`
- Test: `packages/core/test/unstable-listing-mode.test.ts`
- [ ] **Step 1: Write the failing test**
Create `packages/core/test/unstable-listing-mode.test.ts` with focused shared-behavior coverage:
Create `packages/core/test/unstable-listing-mode.test.ts` with focused shared-behavior
coverage:
```ts
import { describe, expect, test } from "bun:test";
@@ -127,8 +139,8 @@ describe("classifyUnstableListings", () => {
- [ ] **Step 2: Run test to verify it fails**
Run: `bun test packages/core/test/unstable-listing-mode.test.ts`
Expected: FAIL because `classifyUnstableListings` and the shared mode types do not exist yet.
Run: `bun test packages/core/test/unstable-listing-mode.test.ts` Expected: FAIL because
`classifyUnstableListings` and the shared mode types do not exist yet.
- [ ] **Step 3: Write minimal implementation**
@@ -202,8 +214,8 @@ export { classifyUnstableListings } from "./utils/unstable";
- [ ] **Step 4: Run test to verify it passes**
Run: `bun test packages/core/test/unstable-listing-mode.test.ts`
Expected: PASS with 4 passing tests.
Run: `bun test packages/core/test/unstable-listing-mode.test.ts` Expected: PASS with 4
passing tests.
- [ ] **Step 5: Commit**
@@ -215,16 +227,24 @@ git commit -m "feat: add shared unstable listing classifier"
### Task 2: Thread the optional mode through all core scrapers
**Files:**
- Modify: `packages/core/src/scrapers/facebook.ts`
- Modify: `packages/core/src/scrapers/ebay.ts`
- Modify: `packages/core/src/scrapers/kijiji.ts`
- Modify: `packages/core/test/facebook-core.test.ts`
- Modify: `packages/core/test/ebay-core.test.ts`
- Modify: `packages/core/test/kijiji-core.test.ts`
- [ ] **Step 1: Write the failing tests**
Add one focused opt-in test per scraper. Use the new shared classifier through the public scraper entrypoints instead of testing internal helpers.
Add one focused opt-in test per scraper.
Use the new shared classifier through the public scraper entrypoints instead of testing
internal helpers.
In `packages/core/test/facebook-core.test.ts`, add:
@@ -286,7 +306,8 @@ test("fetchKijijiItems returns stable and unstable buckets when unstable mode is
});
```
Also add one default-mode assertion in one existing scraper test file, for example in `packages/core/test/facebook-core.test.ts`:
Also add one default-mode assertion in one existing scraper test file, for example in
`packages/core/test/facebook-core.test.ts`:
```ts
test("fetchFacebookItems keeps returning an array by default", async () => {
@@ -307,8 +328,10 @@ test("fetchFacebookItems keeps returning an array by default", async () => {
- [ ] **Step 2: Run tests to verify they fail**
Run: `bun test packages/core/test/facebook-core.test.ts packages/core/test/ebay-core.test.ts packages/core/test/kijiji-core.test.ts`
Expected: FAIL because the scraper signatures do not yet accept the new option and still always return arrays.
Run:
`bun test packages/core/test/facebook-core.test.ts packages/core/test/ebay-core.test.ts packages/core/test/kijiji-core.test.ts`
Expected: FAIL because the scraper signatures do not yet accept the new option and still
always return arrays.
- [ ] **Step 3: Write minimal implementation**
@@ -322,7 +345,8 @@ import {
} from "../index";
```
In `packages/core/src/scrapers/facebook.ts`, extend the default export signature and branch at the end:
In `packages/core/src/scrapers/facebook.ts`, extend the default export signature and
branch at the end:
```ts
export default async function fetchFacebookItems(
@@ -371,7 +395,8 @@ export default async function fetchEbayItems(
}
```
In `packages/core/src/scrapers/kijiji.ts`, add the same final argument after `listingOptions`:
In `packages/core/src/scrapers/kijiji.ts`, add the same final argument after
`listingOptions`:
```ts
export default async function fetchKijijiItems(
@@ -392,12 +417,15 @@ export default async function fetchKijijiItems(
}
```
Keep the default branch untouched in all three files so existing callers still receive arrays.
Keep the default branch untouched in all three files so existing callers still receive
arrays.
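Keeping the default branch untouched can also be enforced at the type level. The sketch below uses TypeScript overloads so that a default call stays typed as an array and only the opt-in call is typed as buckets. Everything in it is hypothetical:

- `finishResults`, `Listing`, and `priceCents` are illustrative names;
- only the option name `hideUnstableResults` comes from this plan;
- the inline median split stands in for the shared classifier.

```typescript
interface Listing {
  title: string;
  priceCents: number;
}

interface Buckets<T> {
  results: T[];
  unstableResults: T[];
}

function medianOf(values: number[]): number {
  const s = [...values].sort((a, b) => a - b);
  const mid = Math.floor(s.length / 2);
  return s.length % 2 === 0 ? (s[mid - 1] + s[mid]) / 2 : s[mid];
}

// Overloads: existing callers keep Listing[]; the opt-in call gets buckets.
function finishResults(items: Listing[], opts?: { hideUnstableResults?: false }): Listing[];
function finishResults(items: Listing[], opts: { hideUnstableResults: true }): Buckets<Listing>;
function finishResults(
  items: Listing[],
  opts: { hideUnstableResults?: boolean },
): Listing[] | Buckets<Listing>;
function finishResults(
  items: Listing[],
  opts: { hideUnstableResults?: boolean } = {},
): Listing[] | Buckets<Listing> {
  // Default branch: return the array exactly as before.
  if (!opts.hideUnstableResults) return items;
  if (items.length === 0) return { results: [], unstableResults: [] };
  const cutoff = medianOf(items.map((i) => i.priceCents)) * 0.8;
  return {
    results: items.filter((i) => i.priceCents >= cutoff),
    unstableResults: items.filter((i) => i.priceCents < cutoff),
  };
}

const sample = [
  { title: "a", priceCents: 1000 },
  { title: "b", priceCents: 200 },
  { title: "c", priceCents: 1100 },
];
console.log(finishResults(sample).length); // 3
console.log(finishResults(sample, { hideUnstableResults: true }).unstableResults.length); // 1
```

The third overload covers callers that pass a runtime `boolean`. Those calls get the union type and must narrow it themselves.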
- [ ] **Step 4: Run tests to verify they pass**
Run: `bun test packages/core/test/unstable-listing-mode.test.ts packages/core/test/facebook-core.test.ts packages/core/test/ebay-core.test.ts packages/core/test/kijiji-core.test.ts`
Expected: PASS, including the new opt-in bucket assertions and the default-array regression assertion.
Run:
`bun test packages/core/test/unstable-listing-mode.test.ts packages/core/test/facebook-core.test.ts packages/core/test/ebay-core.test.ts packages/core/test/kijiji-core.test.ts`
Expected: PASS, including the new opt-in bucket assertions and the default-array
regression assertion.
- [ ] **Step 5: Commit**
@@ -409,14 +437,19 @@ git commit -m "feat: add unstable mode to scraper results"
### Task 3: Expose unstable mode in API routes
**Files:**
- Modify: `packages/api-server/src/routes/facebook.ts`
- Modify: `packages/api-server/src/routes/ebay.ts`
- Modify: `packages/api-server/src/routes/kijiji.ts`
- Modify: `packages/api-server/test/routes.test.ts`
- [ ] **Step 1: Write the failing tests**
Extend `packages/api-server/test/routes.test.ts` with route-forwarding coverage for the new query parameter:
Extend `packages/api-server/test/routes.test.ts` with route-forwarding coverage for the
new query parameter:
```ts
test("facebookRoute forwards unstableFilter=true to core", async () => {
@@ -480,8 +513,8 @@ test("kijijiRoute forwards unstableFilter=true to core", async () => {
- [ ] **Step 2: Run tests to verify they fail**
Run: `bun test packages/api-server/test/routes.test.ts`
Expected: FAIL because the routes do not yet parse or forward `unstableFilter`.
Run: `bun test packages/api-server/test/routes.test.ts` Expected: FAIL because the
routes do not yet parse or forward `unstableFilter`.
- [ ] **Step 3: Write minimal implementation**
@@ -533,12 +566,14 @@ const items = await fetchKijijiItems(
);
```
Do not add any response wrapper logic in the routes; simply return whatever the core scraper returns so the default array path remains unchanged.
Do not add any response wrapper logic in the routes; simply return whatever the core
scraper returns so the default array path remains unchanged.
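Forwarding `unstableFilter` means turning a query-string value into a boolean before it reaches core. The plan does not specify the accepted spellings, so the sketch below assumes the following:

- `true`/`1` enable the mode;
- `false`/`0` or an absent parameter leave it off;
- any other value is rejected, so a typo never silently falls back to the default shape.

The `parseFlag` helper and its result type are hypothetical.

```typescript
type FlagParse = { ok: true; value: boolean | undefined } | { ok: false; error: string };

// Hypothetical query-flag parser; the accepted spellings are an assumption.
function parseFlag(name: string, raw: string | null): FlagParse {
  if (raw === null) return { ok: true, value: undefined };
  const v = raw.trim().toLowerCase();
  if (v === "true" || v === "1") return { ok: true, value: true };
  if (v === "false" || v === "0") return { ok: true, value: false };
  return { ok: false, error: `${name} must be true or false` };
}

const reqUrl = new URL("http://localhost/api/facebook?q=bike&unstableFilter=true");
const flag = parseFlag("unstableFilter", reqUrl.searchParams.get("unstableFilter"));
if (flag.ok) console.log(flag.value); // true
```

A route using this would respond with a 400 on `ok: false`. Otherwise it would pass `value` straight to the scraper and return whatever the scraper returns.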
- [ ] **Step 4: Run tests to verify they pass**
Run: `bun test packages/api-server/test/routes.test.ts`
Expected: PASS, including existing cookie-parameter regression tests and the new unstable-mode forwarding assertions.
Run: `bun test packages/api-server/test/routes.test.ts` Expected: PASS, including
existing cookie-parameter regression tests and the new unstable-mode forwarding
assertions.
- [ ] **Step 5: Commit**
@@ -550,13 +585,17 @@ git commit -m "feat: expose unstable mode in api routes"
### Task 4: Document and forward unstable mode in MCP tools
**Files:**
- Modify: `packages/mcp-server/src/protocol/tools.ts`
- Modify: `packages/mcp-server/src/protocol/handler.ts`
- Modify: `packages/mcp-server/test/protocol.test.ts`
- [ ] **Step 1: Write the failing tests**
Extend `packages/mcp-server/test/protocol.test.ts` with metadata and forwarding coverage:
Extend `packages/mcp-server/test/protocol.test.ts` with metadata and forwarding
coverage:
```ts
test("search tools document unstable listing mode", () => {
@@ -601,12 +640,14 @@ Mirror the forwarding assertion for `search_kijiji` and `search_ebay` in the sam
- [ ] **Step 2: Run tests to verify they fail**
Run: `bun test packages/mcp-server/test/protocol.test.ts`
Expected: FAIL because the tools do not yet describe `unstableFilter` and the handler does not append it to API URLs.
Run: `bun test packages/mcp-server/test/protocol.test.ts` Expected: FAIL because the
tools do not yet describe `unstableFilter` and the handler does not append it to API
URLs.
- [ ] **Step 3: Write minimal implementation**
In `packages/mcp-server/src/protocol/tools.ts`, add the same optional property to all three tools:
In `packages/mcp-server/src/protocol/tools.ts`, add the same optional property to all
three tools:
```ts
unstableFilter: {
@@ -617,7 +658,8 @@ unstableFilter: {
},
```
In `packages/mcp-server/src/protocol/handler.ts`, append the shared flag in each search branch:
In `packages/mcp-server/src/protocol/handler.ts`, append the shared flag in each search
branch:
```ts
if (args.unstableFilter !== undefined) {
@@ -629,8 +671,8 @@ Add that snippet to the `search_kijiji`, `search_facebook`, and `search_ebay` br
- [ ] **Step 4: Run tests to verify they pass**
Run: `bun test packages/mcp-server/test/protocol.test.ts`
Expected: PASS, including the new tool-schema assertions and URL-forwarding assertions.
Run: `bun test packages/mcp-server/test/protocol.test.ts` Expected: PASS, including the
new tool-schema assertions and URL-forwarding assertions.
- [ ] **Step 5: Commit**
@@ -642,21 +684,23 @@ git commit -m "docs: expose unstable mode in mcp tools"
### Task 5: Verify the full cross-package feature end to end
**Files:**
- No code changes expected.
- [ ] **Step 1: Run the focused package tests**
Run: `bun test packages/core/test/unstable-listing-mode.test.ts packages/core/test/facebook-core.test.ts packages/core/test/ebay-core.test.ts packages/core/test/kijiji-core.test.ts packages/api-server/test/routes.test.ts packages/mcp-server/test/protocol.test.ts`
Run:
`bun test packages/core/test/unstable-listing-mode.test.ts packages/core/test/facebook-core.test.ts packages/core/test/ebay-core.test.ts packages/core/test/kijiji-core.test.ts packages/api-server/test/routes.test.ts packages/mcp-server/test/protocol.test.ts`
Expected: PASS with zero failing tests.
- [ ] **Step 2: Run the broader workspace verification**
Run: `bun run ci`
Expected: PASS with clean workspace validation.
Run: `bun run ci` Expected: PASS with clean workspace validation.
- [ ] **Step 3: Commit verification-only follow-ups if needed**
If verification forced any tiny fixes, commit them immediately after the fix with a focused message, for example:
If verification forced any tiny fixes, commit them immediately after the fix with a
focused message, for example:
```bash
git add <exact files changed>
@@ -667,6 +711,8 @@ If no files changed during verification, skip this commit step.
## Self-Review
- Spec coverage: shared classifier, all three scrapers, API exposure, MCP documentation, and tests are each mapped to a task.
- Placeholder scan: no `TODO`, `TBD`, or "write tests later" placeholders remain.
- Type consistency: the plan uses one shared flag name, `unstableFilter`, and one shared core option, `hideUnstableResults`, across all tasks.
- Spec coverage: shared classifier, all three scrapers, API exposure, MCP documentation,
and tests are each mapped to a task.
- Placeholder scan: no `TODO`, `TBD`, or “write tests later” placeholders remain.
- Type consistency: the plan uses one shared flag name, `unstableFilter`, and one shared
core option, `hideUnstableResults`, across all tasks.

View File

@@ -1,14 +1,22 @@
# Code Smell Cleanup Implementation Plan
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
> **For agentic workers:** REQUIRED SUB-SKILL: Use
> superpowers:subagent-driven-development (recommended) or superpowers:executing-plans
> to implement this plan task-by-task.
> Steps use checkbox (`- [ ]`) syntax for tracking.
**Goal:** Fix concrete code smells found in repo review without changing marketplace behavior or relaxing lint/type rules.
**Goal:** Fix concrete code smells found in repo review without changing marketplace
behavior or relaxing lint/type rules.
**Architecture:** Start with correctness bugs at transport boundaries, then remove secret-leaking query/log paths, then reduce duplicate parsing and HTTP code. Keep marketplace behavior inside `packages/core`, API routes thin, and MCP as JSON-RPC transport only.
**Architecture:** Start with correctness bugs at transport boundaries, then remove
secret-leaking query/log paths, then reduce duplicate parsing and HTTP code.
Keep marketplace behavior inside `packages/core`, API routes thin, and MCP as JSON-RPC
transport only.
**Tech Stack:** Bun `1.3.13`, TypeScript strict mode, `bun:test`, Biome, framework-free `Bun.serve` adapters.
**Tech Stack:** Bun `1.3.13`, TypeScript strict mode, `bun:test`, Biome, framework-free
`Bun.serve` adapters.
---
* * *
## File Structure
@@ -18,7 +26,8 @@
- Extract shared API call/query-param helpers.
- Stop logging full URLs with cookie-bearing params.
- Modify: `packages/mcp-server/src/protocol/tools.ts`
- Remove `cookies` from Kijiji MCP schema or mark it as unsupported after API route no longer accepts it.
- Remove `cookies` from Kijiji MCP schema or mark it as unsupported after API route no
longer accepts it.
- Modify: `packages/mcp-server/test/protocol.test.ts`
- Add coverage for `id: 0`.
- Add coverage for zero-valued numeric args.
@@ -53,12 +62,15 @@
- Replace `console.error` with repo logger.
- Modify: `packages/core/test/setup.ts`
- Remove redundant comments and make fetch-mock policy explicit.
- Test: existing package tests under `packages/core/test`, `packages/api-server/test`, `packages/mcp-server/test`.
- Test: existing package tests under `packages/core/test`, `packages/api-server/test`,
`packages/mcp-server/test`.
## Task 1: Fix MCP JSON-RPC `id: 0` Handling
**Files:**
- Modify: `packages/mcp-server/src/protocol/handler.ts:61-74`
- Test: `packages/mcp-server/test/protocol.test.ts`
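JSON-RPC 2.0 allows `0` as a request id, and the response must echo it back. One likely way to lose it is a truthiness fallback such as `id || null`; the handler's actual code is not shown here, so this is an assumption. The sketch below contrasts that pattern with nullish coalescing, which only replaces a missing id. The types and helper names are hypothetical.

```typescript
type JsonRpcId = string | number | null;

interface JsonRpcRequest {
  jsonrpc: "2.0";
  id?: JsonRpcId;
  method: string;
  params?: unknown;
}

interface JsonRpcSuccess {
  jsonrpc: "2.0";
  id: JsonRpcId;
  result: unknown;
}

// Buggy pattern: `||` treats 0 (and "") as missing and replaces it with null.
function responseIdBuggy(req: JsonRpcRequest): JsonRpcId {
  return req.id || null;
}

// Fixed pattern: `??` only falls back when id is null or undefined.
function responseId(req: JsonRpcRequest): JsonRpcId {
  return req.id ?? null;
}

function makeSuccess(req: JsonRpcRequest, result: unknown): JsonRpcSuccess {
  return { jsonrpc: "2.0", id: responseId(req), result };
}

const req: JsonRpcRequest = { jsonrpc: "2.0", id: 0, method: "tools/list" };
console.log(responseIdBuggy(req)); // null
console.log(makeSuccess(req, {}).id); // 0
```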
- [ ] **Step 1: Write failing test for `id: 0`**
@@ -137,7 +149,9 @@ git commit -m "fix: preserve zero json-rpc ids"
## Task 2: Preserve Zero Numeric MCP Arguments
**Files:**
- Modify: `packages/mcp-server/src/protocol/handler.ts:107-216`
- Test: `packages/mcp-server/test/protocol.test.ts`
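Zero-valued numeric arguments disappear when forwarding checks truthiness instead of presence. The sketch below shows the difference when building query parameters. `toParams`, `ToolArgs`, and the argument names are hypothetical and do not come from the real handler; they only illustrate the pattern.

```typescript
// Hypothetical tool-argument shape; the real MCP schema may differ.
type ToolArgs = Record<string, string | number | boolean | undefined>;

// Buggy pattern: a truthiness check silently drops 0, false, and "".
function toParamsBuggy(args: ToolArgs): URLSearchParams {
  const params = new URLSearchParams();
  for (const [key, value] of Object.entries(args)) {
    if (value) params.set(key, String(value));
  }
  return params;
}

// Fixed pattern: skip only values that were never provided.
function toParams(args: ToolArgs): URLSearchParams {
  const params = new URLSearchParams();
  for (const [key, value] of Object.entries(args)) {
    if (value !== undefined) params.set(key, String(value));
  }
  return params;
}

const args: ToolArgs = { query: "bike", minPrice: 0, maxItems: undefined };
console.log(toParamsBuggy(args).toString()); // query=bike
console.log(toParams(args).toString()); // query=bike&minPrice=0
```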
- [ ] **Step 1: Write failing tests for zero-valued params**
@@ -288,10 +302,15 @@ git commit -m "fix: forward zero-valued mcp params"
## Task 3: Remove Cookie Query Path From MCP and API
**Files:**
- Modify: `packages/mcp-server/src/protocol/tools.ts:55-59`
- Modify: `packages/mcp-server/src/protocol/handler.ts:119`
- Modify: `packages/api-server/src/routes/kijiji.ts:65`
- Test: `packages/mcp-server/test/protocol.test.ts`
- Test: `packages/api-server/test/routes.test.ts`
- [ ] **Step 1: Update MCP tests for no cookie exposure**
@@ -341,7 +360,8 @@ test("search_kijiji should not forward cookies query parameters", async () => {
- [ ] **Step 2: Update API test expectation**
In `packages/api-server/test/routes.test.ts`, replace `kijijiRoute passes cookies query parameter` test with:
In `packages/api-server/test/routes.test.ts`, replace
`kijijiRoute passes cookies query parameter` test with:
```ts
test("kijijiRoute ignores cookies query parameter", async () => {
@@ -374,13 +394,15 @@ test("kijijiRoute ignores cookies query parameter", async () => {
- [ ] **Step 3: Run tests to verify failure**
Run: `bun test packages/mcp-server/test/protocol.test.ts packages/api-server/test/routes.test.ts`
Run:
`bun test packages/mcp-server/test/protocol.test.ts packages/api-server/test/routes.test.ts`
Expected: FAIL because Kijiji cookie query is still exposed/forwarded.
- [ ] **Step 4: Remove Kijiji cookie schema and forwarding**
Delete `cookies` property from `search_kijiji` in `packages/mcp-server/src/protocol/tools.ts`.
Delete `cookies` property from `search_kijiji` in
`packages/mcp-server/src/protocol/tools.ts`.
Delete this line from `packages/mcp-server/src/protocol/handler.ts`:
@@ -396,7 +418,8 @@ cookies: reqUrl.searchParams.get("cookies") || undefined,
- [ ] **Step 5: Run tests**
Run: `bun test packages/mcp-server/test/protocol.test.ts packages/api-server/test/routes.test.ts`
Run:
`bun test packages/mcp-server/test/protocol.test.ts packages/api-server/test/routes.test.ts`
Expected: PASS.
@@ -410,10 +433,15 @@ git commit -m "fix: remove cookie query forwarding"
## Task 4: Add Strict API Integer Parsing
**Files:**
- Create: `packages/api-server/src/routes/helpers.ts`
- Modify: `packages/api-server/src/routes/facebook.ts`
- Modify: `packages/api-server/src/routes/ebay.ts`
- Modify: `packages/api-server/src/routes/kijiji.ts`
- Test: `packages/api-server/test/routes.test.ts`
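Strict parsing matters because `Number.parseInt` stops at the first non-digit and returns what it has read so far, so `"12abc"` becomes `12`. Below is a minimal sketch of a stricter helper. The helper name, the result type, and the `maxItems` parameter are hypothetical, and rejecting negative values is an assumption.

```typescript
type IntParse = { ok: true; value: number | undefined } | { ok: false; error: string };

// Hypothetical strict parser: optional, non-negative, base-10, no trailing junk.
function parseStrictInt(name: string, raw: string | null): IntParse {
  if (raw === null || raw === "") return { ok: true, value: undefined };
  if (!/^\d+$/.test(raw)) {
    return { ok: false, error: `${name} must be a non-negative integer` };
  }
  const value = Number(raw);
  if (!Number.isSafeInteger(value)) {
    return { ok: false, error: `${name} is too large` };
  }
  return { ok: true, value };
}

const url = new URL("http://localhost/api/ebay?q=bike&maxItems=12abc");
console.log(Number.parseInt("12abc", 10)); // 12
console.log(parseStrictInt("maxItems", url.searchParams.get("maxItems")).ok); // false
```

Returning a result object rather than throwing lets each route map `ok: false` to a 400 response in one place.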
- [ ] **Step 1: Write failing API validation tests**
@@ -560,7 +588,9 @@ git commit -m "fix: strictly parse route integers"
## Task 5: De-Duplicate MCP API Calls
**Files:**
- Modify: `packages/mcp-server/src/protocol/handler.ts`
- Test: `packages/mcp-server/test/protocol.test.ts`
- [ ] **Step 1: Add regression test for successful tool result after helper extraction**
@@ -645,7 +675,8 @@ Use `"facebook"` and `"ebay"` in their branches.
- [ ] **Step 4: Run MCP tests and build**
Run: `bun test packages/mcp-server/test/protocol.test.ts && bun run --cwd packages/mcp-server build`
Run:
`bun test packages/mcp-server/test/protocol.test.ts && bun run --cwd packages/mcp-server build`
Expected: PASS.
@@ -659,11 +690,17 @@ git commit -m "refactor: share mcp api calls"
## Task 6: Consolidate Core HTTP Fetching
**Files:**
- Modify: `packages/core/src/utils/http.ts`
- Modify: `packages/core/src/scrapers/facebook.ts`
- Modify: `packages/core/src/scrapers/ebay.ts`
- Test: `packages/core/test/http.test.ts`
- Test: `packages/core/test/facebook-core.test.ts`
- Test: `packages/core/test/ebay-core.test.ts`
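The shared helper needs two things: a response URL the callers can use for route classification, and an error carrying `statusCode`, which the plan switches to from `status`. The sketch below assumes a few things:

- the `fetchImpl` parameter is a hypothetical injection point, added so tests can avoid patching `global.fetch`;
- the function signature and the `HtmlResult` shape are illustrative;
- the plan's deterministic jitter is omitted.

`Response.url` is the final URL after redirects, but it is an empty string on a hand-built `Response`, hence the fallback.

```typescript
class HttpError extends Error {
  readonly statusCode: number;
  constructor(statusCode: number, message: string) {
    super(message);
    this.name = "HttpError";
    this.statusCode = statusCode;
  }
}

interface HtmlResult {
  html: string;
  // Final URL after redirects, for route/auth classification.
  url: string;
}

type FetchLike = (input: string, init?: RequestInit) => Promise<Response>;

// Hypothetical shared fetch helper; not the repo's actual signature.
async function fetchHtml(
  target: string,
  headers: Record<string, string> = {},
  fetchImpl: FetchLike = fetch,
): Promise<HtmlResult> {
  const res = await fetchImpl(target, { headers, redirect: "follow" });
  if (!res.ok) throw new HttpError(res.status, `HTTP ${res.status} for ${target}`);
  // Mocked Responses report url as "", so fall back to the requested URL.
  return { html: await res.text(), url: res.url || target };
}
```

In a test, pass something like `async () => new Response("<html></html>")` as `fetchImpl` instead of reassigning `global.fetch`.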
- [ ] **Step 1: Add shared HTTP test for response URL and deterministic jitter**
@@ -695,7 +732,8 @@ test("fetchHtml can return response URL", async () => {
});
```
If current `Response.url` cannot be set in Bun tests, use a mocked object cast to `Response` instead:
If `Response.url` cannot be set in Bun tests, use a mocked object cast to
`Response` instead:
```ts
global.fetch = mock(() =>
@@ -827,7 +865,8 @@ Update error property reads from `err.status` to `err.statusCode`.
- [ ] **Step 5: Replace eBay direct fetch with shared helper**
In `packages/core/src/scrapers/ebay.ts`, import `fetchHtml` and `HttpError` from `../utils/http`.
In `packages/core/src/scrapers/ebay.ts`, import `fetchHtml` and `HttpError` from
`../utils/http`.
Replace direct `fetch` block with:
@@ -845,7 +884,8 @@ logger.error(`Failed to fetch eBay search (${err.statusCode}): ${err.message}`);
- [ ] **Step 6: Run core tests**
Run: `bun test packages/core/test/http.test.ts packages/core/test/facebook-core.test.ts packages/core/test/ebay-core.test.ts`
Run:
`bun test packages/core/test/http.test.ts packages/core/test/facebook-core.test.ts packages/core/test/ebay-core.test.ts`
Expected: PASS.
@@ -865,15 +905,20 @@ git commit -m "refactor: share scraper http fetching"
## Task 7: Clean Kijiji Dead Code and Logging
**Files:**
- Modify: `packages/core/src/scrapers/kijiji.ts`
- Test: `packages/core/test/kijiji-core.test.ts`
- Test: `packages/core/test/kijiji-integration.test.ts`
- [ ] **Step 1: Verify `_parseListing` has no callers**
Run: `rg "_parseListing|parseListing" packages/core packages/api-server packages/mcp-server`
Run:
`rg "_parseListing|parseListing" packages/core packages/api-server packages/mcp-server`
Expected: only `_parseListing` definition appears. If any caller appears, stop and update this task to preserve behavior.
Expected: only the `_parseListing` definition appears.
If any caller appears, stop and update this task to preserve behavior.
- [ ] **Step 2: Delete dead function**
@@ -911,7 +956,8 @@ Replace `console.error(...)` calls with `logger.error(...)` preserving message t
- [ ] **Step 4: Run Kijiji tests**
Run: `bun test packages/core/test/kijiji-core.test.ts packages/core/test/kijiji-integration.test.ts`
Run:
`bun test packages/core/test/kijiji-core.test.ts packages/core/test/kijiji-integration.test.ts`
Expected: PASS.
@@ -925,7 +971,9 @@ git commit -m "refactor: clean kijiji scraper internals"
## Task 8: Clean Test Setup Comments and Enforce Fetch Mocking
**Files:**
- Modify: `packages/core/test/setup.ts`
- Test: core test suite
- [ ] **Step 1: Update setup file**
@@ -942,7 +990,8 @@ global.fetch = (() => {
Run: `bun test packages/core/test`
Expected: PASS. If failures occur, fix individual tests by mocking `global.fetch` in `beforeEach` and restoring in `afterEach`.
Expected: PASS. If failures occur, fix individual tests by mocking `global.fetch` in
`beforeEach` and restoring in `afterEach`.
- [ ] **Step 3: Commit**
@@ -954,6 +1003,7 @@ git commit -m "test: require explicit fetch mocks"
## Task 9: Final Verification
**Files:**
- Verify all touched packages.
- [ ] **Step 1: Run full deterministic tests**
@@ -991,9 +1041,13 @@ git commit -m "chore: finish code smell cleanup"
## Self-Review
- Spec coverage: all reviewed smells are covered: JSON-RPC id bug, zero args, cookie query leak, strict integer parsing, duplicate route/MCP helper code, duplicate HTTP clients, dead Kijiji function, direct timers/logging, stale setup comments.
- Placeholder scan: no TBD/TODO/fill-in placeholders remain. Each task has target files, code snippets, commands, and expected results.
- Type consistency: route helper names, MCP helper names, and shared HTTP option names are used consistently across tasks.
- Spec coverage: all reviewed smells are covered: JSON-RPC id bug, zero args, cookie
query leak, strict integer parsing, duplicate route/MCP helper code, duplicate HTTP
clients, dead Kijiji function, direct timers/logging, stale setup comments.
- Placeholder scan: no TBD/TODO/fill-in placeholders remain.
Each task has target files, code snippets, commands, and expected results.
- Type consistency: route helper names, MCP helper names, and shared HTTP option names
are used consistently across tasks.
## Execution Handoff
@@ -1001,5 +1055,7 @@ Plan complete and saved to `docs/superpowers/plans/2026-04-28-code-smell-cleanup
Two execution options:
1. Subagent-Driven (recommended) - dispatch fresh subagent per task, review between
tasks, fast iteration.
2. Inline Execution - execute tasks in this session using executing-plans, batch
execution with checkpoints.

@@ -0,0 +1,110 @@
# Marketplace Dollar Price Inputs Implementation Plan
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:executing-plans to
> implement this plan task-by-task.
> Steps use checkbox (`- [ ]`) syntax for tracking.
**Goal:** Make public marketplace price inputs use dollars while preserving core scraper
cent-based filtering.
**Architecture:** API server owns HTTP query parsing and converts dollar amounts to
cents before calling core.
MCP server keeps forwarding numeric dollar values as query params.
Core scraper internals remain unchanged because parsed listing prices already use cents.
This applies to eBay `minPrice`/`maxPrice` and Kijiji `priceMin`/`priceMax`; Facebook
exposes no price filter inputs.
**Tech Stack:** Bun, TypeScript, `bun:test`, MCP JSON-RPC adapter, framework-free Bun
HTTP routes.
* * *
### Task 1: API Dollar Parsing
**Files:**
- Modify: `packages/api-server/src/routes/helpers.ts`
- Modify: `packages/api-server/src/routes/ebay.ts`
- Modify: `packages/api-server/src/routes/kijiji.ts`
- Test: `packages/api-server/test/routes.test.ts`
- [ ] **Step 1: Add failing API route tests**
Add tests proving eBay `minPrice=999.99` / `maxPrice=1000` and Kijiji `priceMin=999.99`
/ `priceMax=1000` are forwarded to core as `99999` and `100000` cents.
Add validation tests for empty, whitespace, negative, hex, mixed text, and malformed
decimal price values.
Run: `bun test packages/api-server/test/routes.test.ts`
Expected: new forwarding tests fail because route currently rejects decimals and
forwards integer dollars unchanged.
- [ ] **Step 2: Implement dollar parser helper**
Add `parseDollarPriceParam(searchParams, name)` in
`packages/api-server/src/routes/helpers.ts`. Accept `0`, `1000`, `999.99`, and `0.99`.
Reject values that do not match `^\d+(?:\.\d{1,2})?$`. Convert to cents with
`Math.round(Number(rawValue) * 100)`.
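A minimal TypeScript sketch of the helper, assuming it returns `undefined` when the parameter is absent and throws on invalid input; the real `helpers.ts` contract may instead report a 400 response. The regex and rounding rule come from the step above; the error message is illustrative.

```ts
// Hypothetical sketch of parseDollarPriceParam; the undefined-when-absent and
// throw-on-invalid behavior are assumptions about the helper's contract.
const DOLLAR_PRICE = /^\d+(?:\.\d{1,2})?$/;

export function parseDollarPriceParam(
  searchParams: URLSearchParams,
  name: string,
): number | undefined {
  const rawValue = searchParams.get(name);
  // Parameter not supplied: no price filter.
  if (rawValue === null) return undefined;
  // Rejects empty, whitespace, negative, hex, mixed-text, and malformed decimal input.
  if (!DOLLAR_PRICE.test(rawValue)) {
    throw new Error(`${name} must be a non-negative dollar amount with at most two decimals`);
  }
  // Dollars to cents; rounding absorbs binary floating-point error in the multiply.
  return Math.round(Number(rawValue) * 100);
}
```

With this rule, `minPrice=999.99` becomes `99999` and `maxPrice=1000` becomes `100000`, matching the forwarding tests in Step 1.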
- [ ] **Step 3: Use dollar parser in eBay and Kijiji routes**
Replace `parseNonNegativeIntegerParam` calls for eBay `minPrice`/`maxPrice` and Kijiji
`priceMin`/`priceMax` with `parseDollarPriceParam`. Keep pagination/count params on
integer parsing.
- [ ] **Step 4: Verify API tests**
Run: `bun test packages/api-server/test/routes.test.ts`
Expected: all API route tests pass.
### Task 2: MCP Schema Contract
**Files:**
- Modify: `packages/mcp-server/src/protocol/tools.ts`
- Test: `packages/mcp-server/test/protocol.test.ts`
- [ ] **Step 1: Add MCP schema/forwarding tests**
Add tests that `search_ebay` describes `minPrice` and `maxPrice` as dollar filters and
forwards numeric dollar values unchanged in API query params.
Run: `bun test packages/mcp-server/test/protocol.test.ts`
Expected: description test fails until schema text changes; forwarding behavior should
already pass or reveal mapping gaps.
- [ ] **Step 2: Update tool descriptions**
Change eBay `minPrice` and Kijiji `priceMin` descriptions to `Minimum price in dollars`.
Change eBay `maxPrice` and Kijiji `priceMax` descriptions to `Maximum price in dollars`.
- [ ] **Step 3: Verify MCP tests**
Run: `bun test packages/mcp-server/test/protocol.test.ts`
Expected: all MCP protocol tests pass.
### Task 3: Cross-Package Verification
**Files:**
- No additional edits expected.
- [ ] **Step 1: Run relevant package tests**
Run: `bun test packages/api-server/test packages/mcp-server/test`
Expected: all tests pass.
- [ ] **Step 2: Run CI**
Run: `bun run ci`
Expected: typecheck and Biome pass without changing lint config.

@@ -0,0 +1,187 @@
# Live Parser Tests Implementation Plan
> **For agentic workers:** REQUIRED SUB-SKILL: Use
> superpowers:subagent-driven-development (recommended) or superpowers:executing-plans
> to implement this plan task-by-task.
> Steps use checkbox (`- [ ]`) syntax for tracking.
**Goal:** Add explicit live endpoint test suites for each core marketplace scraper,
excluded from default tests and runnable through one script.
**Architecture:** Live tests live under `packages/core/test/live/` and import public
scraper entry points directly.
Normal package tests remain offline because the new files are outside current explicit
test commands and run only through `bun run test:live`.
**Tech Stack:** Bun `1.3.13`, `bun:test`, TypeScript, existing core scraper APIs.
* * *
## File Structure
- Create `packages/core/test/live/ebay.live.test.ts`: live eBay search smoke test
against `fetchEbayItems`.
- Create `packages/core/test/live/kijiji.live.test.ts`: live Kijiji search smoke test
against `fetchKijijiItems`.
- Create `packages/core/test/live/facebook.live.test.ts`: strict live Facebook search
smoke test against `fetchFacebookItems` and `FACEBOOK_COOKIE`.
- Modify `package.json`: add root script `test:live` running all files under
`packages/core/test/live`.
### Task 1: Add eBay Live Suite
**Files:**
- Create: `packages/core/test/live/ebay.live.test.ts`
- [ ] **Step 1: Write the live test file**
```ts
import { describe, expect, test } from "bun:test";
import fetchEbayItems from "../../src/scrapers/ebay";
describe("eBay live parser", () => {
test("scrapes live search results into listing details", async () => {
const results = await fetchEbayItems("iphone", 1, { maxItems: 3 });
expect(results.length).toBeGreaterThan(0);
for (const listing of results) {
expect(listing.url).toStartWith("https://");
expect(listing.title.length).toBeGreaterThan(0);
expect(listing.listingPrice.cents).toBeGreaterThanOrEqual(0);
expect(listing.listingPrice.currency.length).toBeGreaterThan(0);
}
});
});
```
- [ ] **Step 2: Run eBay live test**
Run: `bun test packages/core/test/live/ebay.live.test.ts`
Expected: PASS when eBay returns parseable search results; FAIL on
endpoint/rate-limit/parser breakage.
### Task 2: Add Kijiji Live Suite
**Files:**
- Create: `packages/core/test/live/kijiji.live.test.ts`
- [ ] **Step 1: Write the live test file**
```ts
import { describe, expect, test } from "bun:test";
import fetchKijijiItems from "../../src/scrapers/kijiji";
describe("Kijiji live parser", () => {
test("scrapes live search results into detailed listings", async () => {
const results = await fetchKijijiItems(
"iphone",
1,
"https://www.kijiji.ca",
{ maxPages: 1 },
{ includeImages: false, sellerDataDepth: "basic" },
);
expect(results.length).toBeGreaterThan(0);
for (const listing of results) {
expect(listing.url).toStartWith("https://www.kijiji.ca/");
expect(listing.title.length).toBeGreaterThan(0);
expect(listing.listingPrice.cents).toBeGreaterThanOrEqual(0);
expect(listing.listingPrice.currency.length).toBeGreaterThan(0);
}
});
});
```
- [ ] **Step 2: Run Kijiji live test**
Run: `bun test packages/core/test/live/kijiji.live.test.ts`
Expected: PASS when Kijiji returns parseable search and detail pages; FAIL on
endpoint/parser breakage.
### Task 3: Add Facebook Live Suite
**Files:**
- Create: `packages/core/test/live/facebook.live.test.ts`
- [ ] **Step 1: Write the live test file**
```ts
import { describe, expect, test } from "bun:test";
import fetchFacebookItems from "../../src/scrapers/facebook";
describe("Facebook live parser", () => {
test("requires FACEBOOK_COOKIE for strict live testing", () => {
expect(process.env.FACEBOOK_COOKIE?.trim().length ?? 0).toBeGreaterThan(0);
});
test("scrapes live marketplace search results into listing details", async () => {
const results = await fetchFacebookItems("iphone", 1, "toronto", 3);
expect(results.length).toBeGreaterThan(0);
for (const listing of results) {
expect(listing.url).toStartWith("https://www.facebook.com/marketplace/item/");
expect(listing.title.length).toBeGreaterThan(0);
expect(listing.listingPrice.cents).toBeGreaterThanOrEqual(0);
expect(listing.listingPrice.currency.length).toBeGreaterThan(0);
}
});
});
```
- [ ] **Step 2: Run Facebook live test**
Run: `bun test packages/core/test/live/facebook.live.test.ts`
Expected: PASS with valid `FACEBOOK_COOKIE`; FAIL when `FACEBOOK_COOKIE` is missing,
expired, or parser output is empty.
### Task 4: Add Root Live Test Script
**Files:**
- Modify: `package.json`
- [ ] **Step 1: Add script**
Change root `scripts` to include:
```json
{
"test:live": "bun test packages/core/test/live"
}
```
- [ ] **Step 2: Run all live tests through script**
Run: `bun run test:live`
Expected: runs eBay, Kijiji, and Facebook live suites.
Facebook fails if `FACEBOOK_COOKIE` is unset.
### Task 5: Verify Default Suite Exclusion
**Files:**
- No code files modified.
- [ ] **Step 1: Run existing core tests**
Run: `bun test packages/core/test`
Expected: existing mocked tests run.
If Bun discovers `packages/core/test/live`, change normal verification command to
explicit glob `bun test packages/core/test/*.test.ts` and document that in final notes.
- [ ] **Step 2: Run static checks**
Run: `bun run ci`
Expected: typecheck and Biome pass.
Fix code issues without changing lint or TypeScript rules.
## Commit Note
Do not commit during execution unless user explicitly requests a commit.
This repo session policy overrides generic plan commit steps.
## Self-Review
- Spec coverage: eBay, Kijiji, Facebook live suites; explicit script; strict Facebook
auth; excluded from default flow.
- Placeholder scan: no `TBD`, `TODO`, or underspecified implementation steps.
- Type consistency: tests use current exported scraper signatures and shared listing
fields from `ListingDetails`.

@@ -1,12 +1,13 @@
# Design: Adopt opencode Monorepo Config
**Date:** 2025-07-14\
**Status:** Approved\
**Approach:** Full adoption (A)
## Context
Current repo (`marketplace-scrapers-monorepo`) has basic bun workspaces with 3 packages
(`core`, `api-server`, `mcp-server`). Reference: `anomalyco/opencode` monorepo patterns.
**Gaps vs opencode:**
- No Turbo (task orchestration, caching, dep graph)
@@ -20,7 +21,8 @@ Current repo (`marketplace-scrapers-monorepo`) has basic bun workspaces with 3 p
### 1. Root `package.json`
- Add `workspaces.catalog` block with shared deps:
- `@typescript/native-preview`, `@types/bun`, `@types/unidecode`,
`@types/cli-progress`
- Add `turbo` to `devDependencies`
- Add `@tsconfig/bun` to `devDependencies` + catalog
- Update root scripts: `typecheck` and `build` delegate to `turbo run`
@@ -93,7 +95,8 @@ exact = true
root = "./do-not-run-tests-from-root"
```
Exact installs = reproducible.
Root test guard prevents accidental root-level test runs.
### 6. Package `exports` field
@@ -102,7 +105,8 @@ Replace `main`/`module` with `exports` in all 3 packages:
"exports": { ".": "./src/index.ts" }
```
Remove `main` and `module` fields.
Bun resolves `.ts` directly.
### 7. Catalog references in per-package `package.json`

@@ -3,7 +3,9 @@
## Summary
Remove all file-based and request-provided cookie inputs across the repo.
The only supported authentication input becomes a raw `Cookie` header string supplied
through scraper-specific environment variables such as `FACEBOOK_COOKIE` and
`EBAY_COOKIE`.
## Goals
@@ -17,7 +19,8 @@ The only supported authentication input becomes a raw `Cookie` header string sup
- Changing scraper behavior unrelated to authentication input.
- Adding new cookie formats or migration helpers.
- Preserving backward compatibility for cookie files, JSON cookie arrays, or request
overrides.
## Current State
@@ -27,27 +30,33 @@ The current shared cookie utilities support three sources in priority order:
2. Environment variable
3. Cookie file
`packages/core/src/utils/cookies.ts` includes file loading, JSON array parsing, and
auto-detection between JSON and header-string formats.
Facebook also exposes deprecated `cookiePath` arguments that still reach shared loading
logic. Docs in `cookies/AGENTS.md` still describe file-based setup and request-level
overrides.
## Chosen Approach
Use the hard-reset approach.
Delete the shared multi-source cookie-loading model and reduce the cookie surface to
env-header parsing only.
This is a larger diff than a surgical removal, but it avoids leaving behind abstractions
that imply unsupported inputs still exist.
## Design
### Shared Cookie Utilities
`packages/core/src/utils/cookies.ts` will keep only the pieces needed for
env-header-based auth:
- `Cookie` type
- A reduced cookie config shape containing only `name`, `domain`, and `envVar`
- `parseCookieString()` for raw `Cookie` header strings
- `formatCookiesForHeader()` for domain filtering and request formatting
- An env-only loader that reads `process.env[config.envVar]`, parses it, and throws a
targeted error when missing or invalid
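A sketch of the reduced surface, assuming a `Cookie` shape of `{ name, value, domain }` and reading the config's `name` as a scraper label for error messages. `loadCookiesFromEnv` is a hypothetical name for the env-only loader, and the exact-or-subdomain matching in `formatCookiesForHeader` is one plausible filtering rule, not the repo's confirmed behavior.

```ts
// Hypothetical sketch of the env-only cookie surface; shapes are assumptions.
interface Cookie {
  name: string;
  value: string;
  domain: string;
}

interface CookieConfig {
  name: string; // scraper label used in error messages, e.g. "Facebook"
  domain: string; // domain the cookies belong to, e.g. ".facebook.com"
  envVar: string; // env var holding a raw Cookie header, e.g. "FACEBOOK_COOKIE"
}

// Parses "a=1; b=2" into cookies; values may themselves contain "=".
function parseCookieString(header: string, domain: string): Cookie[] {
  return header
    .split(";")
    .map((pair) => pair.trim())
    .filter((pair) => pair.length > 0)
    .map((pair) => {
      const eq = pair.indexOf("=");
      if (eq <= 0) throw new Error(`Malformed cookie pair: "${pair}"`);
      return { name: pair.slice(0, eq).trim(), value: pair.slice(eq + 1).trim(), domain };
    });
}

// Keeps cookies whose domain matches the request host (exact or subdomain)
// and joins them into a Cookie header value.
function formatCookiesForHeader(cookies: Cookie[], host: string): string {
  return cookies
    .filter((cookie) => {
      const bare = cookie.domain.replace(/^\./, "");
      return host === bare || host.endsWith(`.${bare}`);
    })
    .map((cookie) => `${cookie.name}=${cookie.value}`)
    .join("; ");
}

// Env-only loader: no file or request fallback; fails loudly when unset or unparseable.
function loadCookiesFromEnv(config: CookieConfig): Cookie[] {
  const raw = process.env[config.envVar]?.trim();
  if (!raw) {
    throw new Error(`${config.name} auth missing: set ${config.envVar} to a raw Cookie header string`);
  }
  const cookies = parseCookieString(raw, config.domain);
  if (cookies.length === 0) {
    throw new Error(`${config.envVar} does not contain any cookies`);
  }
  return cookies;
}
```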
The following shared utilities will be removed:
@@ -68,15 +77,18 @@ For Facebook this means:
For eBay this means:
- Remove any remaining fallback/file-oriented behavior from shared calls and error
strings
- Keep the existing env-var auth path, but make it the only path
### Public API Surface
Exports from `packages/core/src/index.ts` should reflect the new contract.
If exported functions currently advertise cookie-source or cookie-path arguments, their
signatures will be tightened so callers cannot pass unsupported inputs.
Downstream adapter packages should continue calling core through the simplified
signatures without adding their own cookie-loading behavior.
### Error Handling
@@ -93,8 +105,8 @@ Errors should be blunt and specific:
### Testing Strategy
Follow TDD. Start by changing or adding core tests so the old file/request behavior is
no longer accepted.
Coverage targets:
@@ -102,7 +114,8 @@ Coverage targets:
2. Missing env vars fail with the new env-only error.
3. Invalid env strings fail without falling back to files or request data.
4. Facebook APIs no longer expose or honor cookie-path/request-cookie behavior.
5. Existing tests that depended on missing files or JSON cookie arrays are rewritten to
the env-only contract.
Verification target after implementation:
@@ -121,11 +134,15 @@ Update cookie-related docs to match the new contract:
## Risks
- External callers using request cookie overrides will break at compile time or runtime,
depending on how they consume the package.
- Recent work added support for custom Facebook cookie paths, so removing that path
intentionally reverses a newly introduced behavior.
- Tests that currently model missing-file behavior must be rewritten rather than
preserved.
## Rollout Notes
This is an intentional contract break.
The code, tests, and docs should all land together so there is no mixed messaging about
supported cookie sources.

@@ -2,35 +2,46 @@
## Summary
Replace the legacy Facebook Marketplace scraper with a route-aware implementation built
around current Comet bootstrap markers and route-specific extraction.
The new scraper will keep authenticated direct HTTP fetches as the primary transport,
but it will stop treating legacy `require`, `__bbox`, and
`marketplace_product_details_page` structures as the main parsing contract.
## Goals
- Replace both Facebook search and item-detail extraction with a current-shape parser.
- Keep authenticated direct HTTP requests as the primary fetch strategy.
- Parse route-specific Comet bootstrap/state payloads before falling back to
rendered-HTML extraction.
- Detect auth-gated, unavailable, and unknown responses explicitly.
- Update tests so they model current route markers and failure modes instead of legacy
page objects.
## Non-Goals
- Reworking non-Facebook scrapers.
- Converting the scraper to browser-only automation.
- Preserving old parser behavior for `marketplace_product_details_page` or
`__bbox`-driven item extraction.
- Reverse-engineering every internal Facebook bootstrap payload shape exhaustively
before implementation.
## Current State
The current implementation in `packages/core/src/scrapers/facebook.ts` still uses
authenticated HTTP requests, which remains correct.
The search path parses embedded script JSON and looks for
`marketplace_search.feed_units.edges`. The item-detail path is centered on legacy
extraction paths such as:
- `parsed.require[0][3].__bbox.result.data.viewer.marketplace_product_details_page.target`
- nested `__bbox.require[...]` variations
- recursive search through `parsed.require`
Live evidence gathered earlier in this session and by the isolated research subagent
shows that current Facebook Marketplace pages are Comet route-driven and expose markers
such as:
- `XCometMarketplaceSearchController`
- `XCometMarketplacePermalinkController`
@@ -41,7 +52,9 @@ Live evidence gathered earlier in this session and by the isolated research suba
- `data-sjs`
- `data-btmanifest`
The same live investigation also showed that authenticated item pages no longer expose
the old `marketplace_product_details_page` marker reliably, while live search still
returns usable results.
## Chosen Approach
@@ -52,9 +65,11 @@ The scraper will:
1. Fetch authenticated HTML directly.
2. Classify the response using current route and auth markers.
3. Parse inline bootstrap/state payloads using route-specific probes.
4. Fall back to rendered-HTML extraction only when bootstrap markers are present but the
payload cannot be decoded into the expected search or item shape.
This keeps the cheaper direct-HTTP transport while shifting the parser contract from
legacy page-object names to current Comet route structure.
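Step 2 (classify the response) could look roughly like the sketch below. Only the two controller names come from the live evidence gathered for this spec; the login and unavailable markers are placeholder assumptions that would need confirming against real responses, and `classifyFacebookResponse` is a hypothetical name.

```ts
// Hypothetical sketch of route-aware response classification.
type FacebookResponseClass = "search" | "item" | "login" | "unavailable" | "unknown";

// ASSUMED markers: placeholders, not observed Facebook strings.
const LOGIN_MARKERS = ['id="login_form"', "/login/?next="];
const UNAVAILABLE_MARKERS = ["This listing is no longer available"];

function classifyFacebookResponse(html: string): FacebookResponseClass {
  // Auth and availability gates are checked first so they never reach the parser.
  if (LOGIN_MARKERS.some((marker) => html.includes(marker))) return "login";
  if (UNAVAILABLE_MARKERS.some((marker) => html.includes(marker))) return "unavailable";
  // Route controller names observed on current Comet pages.
  if (html.includes("XCometMarketplacePermalinkController")) return "item";
  if (html.includes("XCometMarketplaceSearchController")) return "search";
  return "unknown";
}
```

A caller would then map `login`, `unavailable`, and `unknown` to distinct error messages rather than a generic expired-cookie error.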
## Design
@@ -88,7 +103,8 @@ Primary behavior:
- fetch the Marketplace search HTML with auth cookies
- confirm the response class is `search`
- extract inline bootstrap/state blobs from script tags and page attributes
- probe for route-specific search payloads associated with
`XCometMarketplaceSearchController`
- map decoded search results into summary listing records
Search summary fields should remain aligned with the current public output shape:
@@ -102,7 +118,8 @@ Search summary fields should remain aligned with the current public output shape
Fallback behavior:
- if search route markers are present but structured payload decoding fails, extract
listing summaries from rendered HTML anchors and text patterns
- use item links matching `/marketplace/item/<id>` as the anchor for fallback extraction
- treat fallback results as summary-only data, not rich detail data
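A rough sketch of the anchor-based search fallback, assuming listing IDs in `/marketplace/item/<id>` links are numeric. `extractItemLinks` is hypothetical and collects only deduplicated IDs and canonical URLs; extracting title and price text around each anchor is left out.

```ts
// Hypothetical sketch; assumes numeric IDs in /marketplace/item/<id> links.
interface FallbackListingLink {
  id: string;
  url: string;
}

function extractItemLinks(html: string): FallbackListingLink[] {
  const ids = new Set<string>(); // Set keeps first-seen order and drops repeats
  for (const match of html.matchAll(/\/marketplace\/item\/(\d+)/g)) {
    const id = match[1];
    if (id) ids.add(id);
  }
  return Array.from(ids, (id) => ({
    id,
    url: `https://www.facebook.com/marketplace/item/${id}/`,
  }));
}
```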
@@ -132,9 +149,12 @@ Priority item fields:
Fallback behavior:
- if permalink route markers are present but no stable payload object is decodable,
extract data from rendered HTML text structure
- prioritize title, price, condition, description, location text, and seller module
content
- return partial item data when core user-facing fields are present rather than failing
solely because deeper commerce metadata is missing
### Bootstrap Parsing Strategy
@@ -151,11 +171,14 @@ Candidate discovery inputs:
- `ServerJS` / `Bootloader` inline blobs
- route controller names
Candidate scoring for search should favor objects that contain repeated result-card
semantics, item IDs, listing links, titles, prices, or location summaries.
Candidate scoring for item pages should favor objects that contain singular listing
semantics, title, price, condition, description, location, seller, or permalink context.
The parser should not depend on one hard-coded object name surviving forever.
Instead, it should look for route-specific semantic clusters and choose the strongest
candidate.
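One way to express semantic-cluster scoring, sketched for item pages: walk the decoded JSON, count how many listing-related key fragments each object carries, and keep the highest-scoring object. The fragment list and substring matching are illustrative assumptions rather than observed Facebook field names. A search variant would instead favor arrays whose elements carry item IDs and links.

```ts
// Hypothetical sketch; key fragments are illustrative, not observed field names.
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };
type JsonObject = { [key: string]: Json };

const ITEM_KEY_FRAGMENTS = ["title", "price", "condition", "description", "location", "seller"];

// Counts how many fragments appear in at least one of the object's own keys.
function scoreObject(obj: JsonObject, fragments: string[]): number {
  const keys = Object.keys(obj).map((key) => key.toLowerCase());
  return fragments.filter((fragment) => keys.some((key) => key.includes(fragment))).length;
}

// Iterative walk over every nested object; returns the highest-scoring one.
function bestCandidate(root: Json, fragments: string[]): JsonObject | null {
  let best: JsonObject | null = null;
  let bestScore = 0;
  const stack: Json[] = [root];
  while (stack.length > 0) {
    const node = stack.pop();
    if (node === undefined || node === null || typeof node !== "object") continue;
    if (Array.isArray(node)) {
      for (const child of node) stack.push(child);
      continue;
    }
    const score = scoreObject(node, fragments);
    if (score > bestScore) {
      best = node;
      bestScore = score;
    }
    for (const child of Object.values(node)) stack.push(child);
  }
  return best;
}
```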
### Legacy Removal
@@ -166,7 +189,9 @@ Specifically:
- delete legacy-first `require` / `__bbox` navigation tables
- delete tests whose only purpose is to preserve those legacy paths
If a minimal legacy compatibility branch remains, it must be a last-resort fallback
behind the new route-aware parser and should not shape test fixtures or design
decisions.
### Error Handling
@@ -178,7 +203,8 @@ Facebook responses should now fail with explicit route-aware outcomes:
4. Search or item route detected, but no decodable data found.
5. Unknown response shape.
Error messages should name the actual class of failure instead of implying that every
parse miss is caused by expired cookies.
### Testing Strategy
@@ -190,11 +216,15 @@ Coverage targets:
1. Search responses classify correctly from current Comet controller markers.
2. Item responses classify correctly from current Comet controller markers.
3. Login-gated and unavailable responses are detected before parsing.
4. Search bootstrap parsing produces summary listing results from current-shape
fixtures.
5. Item bootstrap parsing produces rich listing details from current-shape fixtures.
6. Search fallback extraction works when route markers exist but structured payload
decoding fails.
7. Item fallback extraction works when route markers exist but structured payload
decoding fails.
8. Old legacy-only item fixtures are removed or rewritten so they no longer define the
contract.
Verification target after implementation:
@@ -204,23 +234,30 @@ Verification target after implementation:
## Public API Surface
Keep the current public function names unless the rewrite proves that a signature change
is required:
- `fetchFacebookItems(...)`
- `fetchFacebookItem(...)`
- `extractFacebookMarketplaceData(...)`
- `extractFacebookItemData(...)`
The internals should change substantially, but callers should not need a new integration
surface for this rewrite.
## Risks
- Facebook may change bootstrap payload naming again, so route/controller markers are
more stable than exact nested object paths but still not guaranteed.
- Search and item pages may each contain multiple partial payloads, making candidate
ranking important.
- Fallback rendered-HTML extraction may be noisier than bootstrap decoding and needs
clear precedence rules.
- Live fixtures can drift from production quickly, so tests must model route semantics
rather than exact one-off payloads where possible.
## Rollout Notes
The code, fixtures, and tests should change together.
There should be no mixed state where the implementation is Comet-aware but the tests
still encode `marketplace_product_details_page` as the primary contract.

@@ -2,15 +2,18 @@
## Summary
Add an optional shared result mode across Facebook, eBay, and Kijiji that moves
suspiciously cheap listings out of the main results into a separate `unstableResults`
bucket. Listings are considered unstable when their price is more than 20% below the
median price of the scraper's priced search results.
## Goals
- Support the same optional unstable-listing mode across all scrapers.
- Keep current default scraper and route behavior unchanged unless the mode is enabled.
- Hide unstable listings from the main results while still returning them separately.
- Implement the rule once in shared core code instead of duplicating
marketplace-specific logic.
- Document the option in MCP tool descriptions so callers can discover it.
## Non-Goals
@@ -24,7 +27,8 @@ Listings are considered unstable when their price is more than 20% below the med
`packages/core` currently returns plain arrays from scraper search functions.
`packages/api-server` forwards those scraper results directly from marketplace routes.
`packages/mcp-server` documents search tools per marketplace, but does not expose or
describe any result-stability mode.
There is no shared result-classification utility today.
Price filtering exists in some scrapers, but not a cross-marketplace median-based split.
@@ -33,11 +37,14 @@ Price filtering exists in some scrapers, but not a cross-marketplace median-base
Use a shared core utility plus per-route and per-tool opt-in.
The shared utility will accept parsed listings, compute the median from valid positive
prices, and split the data into `results` and `unstableResults`. Each scraper will opt
into that utility when the caller enables unstable-listing mode.
API routes and MCP tools will expose the same optional mode so the feature is
consistently available everywhere scraper search is surfaced.
This keeps the heuristic centralized, minimizes duplicated logic, and preserves existing consumers by leaving the default path unchanged.
This keeps the heuristic centralized, minimizes duplicated logic, and preserves existing
consumers by leaving the default path unchanged.
## Design
@@ -48,14 +55,16 @@ Add a shared utility in `packages/core` for listing stability classification.
Responsibilities:
- accept parsed listing arrays with `listingPrice.cents`
- ignore listings whose price is missing, non-numeric, or non-positive when computing the median
- ignore listings whose price is missing, non-numeric, or non-positive when computing
the median
- compute the median price from valid priced listings
- classify listings as unstable when `listingPrice.cents < median * 0.8`
- return an object with:
- `results`: listings that remain in the main bucket
- `unstableResults`: listings moved out of the main bucket
Listings excluded from median computation because their price is missing or non-positive remain in `results` unchanged.
Listings excluded from median computation because their price is missing or non-positive
remain in `results` unchanged.
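A minimal TypeScript sketch of the classifier described above. The names `classifyUnstableListingsSketch`, `PricedListing`, and `StabilitySplit` are hypothetical, and the real utility in `packages/core` may differ; the sketch only assumes that listings expose an optional `listingPrice.cents`. It also shows why the single-listing rule needs no special case: a lone price is its own median, so it can never fall below 80% of itself.

```typescript
// Hypothetical minimal listing shape: only the field the classifier reads.
interface PricedListing {
  listingPrice?: { cents?: number } | null;
}

interface StabilitySplit<T> {
  results: T[];
  unstableResults: T[];
}

// A price counts toward the median only if it is a finite, positive number.
function validCents(listing: PricedListing): number | null {
  const cents = listing.listingPrice?.cents;
  return typeof cents === "number" && Number.isFinite(cents) && cents > 0
    ? cents
    : null;
}

// Median of a non-empty list; even-sized lists average the two middle values.
function median(values: number[]): number {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]!
    : (sorted[mid - 1]! + sorted[mid]!) / 2;
}

function classifyUnstableListingsSketch<T extends PricedListing>(
  listings: T[],
  ratio = 0.8,
): StabilitySplit<T> {
  const prices = listings
    .map(validCents)
    .filter((cents): cents is number => cents !== null);

  // No usable prices: there is nothing to compare against, so keep everything.
  if (prices.length === 0) {
    return { results: [...listings], unstableResults: [] };
  }

  const cutoff = median(prices) * ratio;
  const results: T[] = [];
  const unstableResults: T[] = [];
  for (const listing of listings) {
    const cents = validCents(listing);
    // Strict "<": a price exactly at the cutoff stays in the main bucket.
    // Missing or non-positive prices also stay in the main bucket.
    if (cents !== null && cents < cutoff) {
      unstableResults.push(listing);
    } else {
      results.push(listing);
    }
  }
  return { results, unstableResults };
}
```

Because unpriced listings are skipped when computing the median but still routed through the same loop, they land in `results` without any extra branch.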
### Scraper Integration
@@ -68,7 +77,8 @@ Default behavior:
Opt-in behavior:
- run the shared classification utility after parsing search results
- classify before final result limiting so unstable items do not consume main-result slots
- classify before final result limiting so unstable items do not consume main-result
slots
- return an object shaped like:
```ts
@@ -82,7 +92,8 @@ Each scraper will use its existing concrete listing subtype for these arrays.
### API Surface
Marketplace API routes will expose an optional query parameter for unstable-listing mode.
Marketplace API routes will expose an optional query parameter for unstable-listing
mode.
Requirements:
@@ -90,7 +101,8 @@ Requirements:
- when enabled, return the object payload with `results` and `unstableResults`
- use the same semantics across Facebook, eBay, and Kijiji routes
The exact parameter name should be consistent across routes and intentionally describe the behavior, for example `unstableFilter=true`.
The exact parameter name should be consistent across routes and intentionally describe
the behavior, for example `unstableFilter=true`.
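One way a route could parse the flag, sketched under the assumption that only the literal strings `true` and `false` are accepted. `parseBooleanParam` is a hypothetical helper; it mirrors the return convention of the existing `parseDollarPriceParam` (parsed value, `undefined` when absent, or a 400 `Response` when malformed).

```typescript
// Hypothetical helper: strict parsing keeps typos like "ture" from silently
// falling back to the default response shape.
function parseBooleanParam(
  searchParams: URLSearchParams,
  name: string,
): boolean | undefined | Response {
  const rawValue = searchParams.get(name);
  if (rawValue === null) {
    return undefined;
  }
  if (rawValue === "true") {
    return true;
  }
  if (rawValue === "false") {
    return false;
  }
  return new Response(JSON.stringify({ message: `Invalid ${name} parameter` }), {
    status: 400,
    headers: { "content-type": "application/json" },
  });
}
```

A route would then forward `unstableFilter: true` to the scraper only when the helper returns `true`, so absent or `false` values leave the default plain-array response untouched.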
### MCP Surface
@@ -100,34 +112,43 @@ Tool descriptions should explicitly document:
- that the option is optional
- that it moves listings priced more than 20% below the median into `unstableResults`
- that enabling it changes the response shape from a plain list to an object with `results` and `unstableResults`
- that enabling it changes the response shape from a plain list to an object with
`results` and `unstableResults`
- that the behavior is available for Facebook, eBay, and Kijiji search tools
The wording should be aligned across all three tools so the feature reads as one shared capability.
The wording should be aligned across all three tools so the feature reads as one shared
capability.
### Error Handling
The unstable-listing mode should be best-effort and non-failing.
- If there are no valid positive prices, return all listings in `results` and an empty `unstableResults` array.
- If there are no valid positive prices, return all listings in `results` and an empty
`unstableResults` array.
- If there is only one valid priced listing, do not classify it as unstable.
- Parsing failures remain governed by existing scraper behavior; the classification layer should not introduce new scraper-specific errors.
- Parsing failures remain governed by existing scraper behavior; the classification
layer should not introduce new scraper-specific errors.
### Testing Strategy
Follow TDD.
Start with shared utility tests, then wire the option through scraper and route tests.
Follow TDD. Start with shared utility tests, then wire the option through scraper and
route tests.
Coverage targets:
1. Median calculation for odd-sized valid price sets.
2. Median calculation for even-sized valid price sets.
3. Strict cutoff behavior where only listings with `price < median * 0.8` move to `unstableResults`.
4. Missing, invalid, zero, or negative prices are excluded from median computation and remain in `results`.
3. Strict cutoff behavior where only listings with `price < median * 0.8` move to
`unstableResults`.
4. Missing, invalid, zero, or negative prices are excluded from median computation and
remain in `results`.
5. Default scraper behavior still returns plain arrays when the option is disabled.
6. Enabled scraper behavior returns `{ results, unstableResults }` for Facebook, eBay, and Kijiji.
7. API routes preserve existing response shapes by default and switch to the object payload only when enabled.
8. MCP tool metadata documents the new optional mode for all three marketplace search tools.
6. Enabled scraper behavior returns `{ results, unstableResults }` for Facebook, eBay,
and Kijiji.
7. API routes preserve existing response shapes by default and switch to the object
payload only when enabled.
8. MCP tool metadata documents the new optional mode for all three marketplace search
tools.
Verification target after implementation:
@@ -138,11 +159,15 @@ Verification target after implementation:
## Risks
- The optional mode introduces a union return shape for scraper callers, which can ripple into downstream TypeScript signatures.
- Applying classification before final limiting changes which items appear in the main bucket compared with a naive post-limit split.
- Kijiji and eBay may have different mixes of priced and unpriced results, so excluding non-positive prices from the median must remain explicit and tested.
- The optional mode introduces a union return shape for scraper callers, which can
ripple into downstream TypeScript signatures.
- Applying classification before final limiting changes which items appear in the main
bucket compared with a naive post-limit split.
- Kijiji and eBay may have different mixes of priced and unpriced results, so excluding
non-positive prices from the median must remain explicit and tested.
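One way to contain the union-return ripple is TypeScript function overloads keyed on the flag's literal type. This is a sketch with hypothetical names (`searchSketch`, `Listing`, `SearchOptions`, `isUnstableSplit`), and classification is stubbed out so only the typing is shown. Callers passing a literal `true` get the split object, callers omitting the flag keep `Listing[]`, and code forwarding a runtime boolean (such as an API route) gets the union plus a narrowing guard.

```typescript
// Hypothetical types; each real scraper uses its own concrete listing subtype.
interface Listing {
  title: string;
  listingPrice?: { cents: number };
}

interface UnstableSplit<T> {
  results: T[];
  unstableResults: T[];
}

interface SearchOptions {
  maxItems?: number;
  unstableFilter?: boolean;
}

// Literal `true` selects the split object.
function searchSketch(
  query: string,
  options: SearchOptions & { unstableFilter: true },
): Promise<UnstableSplit<Listing>>;
// Omitted or literal `false` keeps the existing plain-array contract.
function searchSketch(
  query: string,
  options?: SearchOptions & { unstableFilter?: false },
): Promise<Listing[]>;
// A runtime boolean falls back to the union.
function searchSketch(
  query: string,
  options?: SearchOptions,
): Promise<Listing[] | UnstableSplit<Listing>>;
async function searchSketch(
  query: string,
  options: SearchOptions = {},
): Promise<Listing[] | UnstableSplit<Listing>> {
  // Placeholder data; a real scraper would fetch and parse here.
  const listings: Listing[] = [
    { title: `${query} A`, listingPrice: { cents: 10_000 } },
    { title: `${query} B` },
  ];
  if (!options.unstableFilter) return listings;
  // Classification stubbed out: the shared classifier would split `listings` here.
  return { results: listings, unstableResults: [] };
}

// Narrowing helper for callers that receive the union.
function isUnstableSplit<T>(value: T[] | UnstableSplit<T>): value is UnstableSplit<T> {
  return !Array.isArray(value);
}
```

The overload order matters: TypeScript picks the first matching signature, so the literal-`true` and literal-`false` cases must precede the catch-all union signature.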
## Rollout Notes
Land the shared classifier, scraper wiring, route wiring, tests, and MCP description updates together.
That avoids a partial rollout where the feature exists in one surface but is undocumented or inconsistent elsewhere.
Land the shared classifier, scraper wiring, route wiring, tests, and MCP description
updates together. That avoids a partial rollout where the feature exists in one surface
but is undocumented or inconsistent elsewhere.

@@ -0,0 +1,44 @@
# Live Parser Tests Design
## Summary
Add explicit live endpoint tests for each core scraper parser path.
These tests are excluded from normal deterministic test commands and run only through a
dedicated package script.
## Scope
- Add one live suite per parser: eBay, Kijiji, Facebook.
- Place suites under `packages/core/test/live/` so normal
`bun test packages/core/test/*.test.ts` patterns do not include them accidentally.
- Add a root `test:live` script that runs all live suites together.
- Keep existing mocked tests unchanged.
## Behavior
- Each suite calls the public scraper entry point for that marketplace with a narrow
query and low max item count.
- Assertions verify scrape output shape and parser viability, not exact listing
identity.
- eBay and Kijiji require live network access and fail on endpoint/parser breakage.
- Facebook is strict: missing or expired `FACEBOOK_COOKIE` fails the live suite instead
of skipping.
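These checks could be factored into one shared assertion helper so all three suites fail the same way. This sketch assumes a minimal listing shape (`LiveListing` is hypothetical, not the core `ListingDetails` type) and adds a hypothetical `requirePrice` switch, since the scraper notes in this change say Facebook search results hide prices without authentication.

```typescript
// Hypothetical minimal shape; the real listing type likely has more fields.
interface LiveListing {
  url: string;
  title: string;
  listingPrice?: { cents?: number; currency?: string } | null;
}

function assertListingShape(listing: LiveListing, requirePrice: boolean): void {
  if (!listing.url.startsWith("https://")) {
    throw new Error(`Expected an https URL, got ${listing.url}`);
  }
  if (listing.title.trim().length === 0) {
    throw new Error(`Expected a title for ${listing.url}`);
  }
  if (!requirePrice) return;
  const price = listing.listingPrice;
  if (!price || typeof price.cents !== "number" || price.cents < 0) {
    throw new Error(`Expected non-negative listing cents for ${listing.url}`);
  }
  if (!price.currency) {
    throw new Error(`Expected a listing currency for ${listing.url}`);
  }
}

// An empty array is a failure: the live parser produced no usable listings.
// Only shape is checked, never exact titles, prices, IDs, or ordering.
function assertLiveResults(results: LiveListing[], requirePrice = true): void {
  if (results.length === 0) {
    throw new Error("Live scrape returned no listings");
  }
  for (const listing of results) {
    assertListingShape(listing, requirePrice);
  }
}
```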
## Test Data
- Use stable broad Canadian queries such as `iphone` or `laptop` to reduce empty-result
risk.
- Use low limits to avoid unnecessary load and rate-limit pressure.
- Avoid exact prices, titles, listing IDs, or ordering assumptions.
## Failure Meaning
- Empty result arrays fail because live parser logic did not produce usable listings.
- Missing required fields fail because adapter contracts depend on those fields.
- Authentication failures fail for Facebook because selected scope is strict.
## Verification
- Normal suite remains offline: `bun test packages/core/test`.
- Live suite runs by explicit script: `bun run test:live`.
- Full static checks remain via `bun run ci`.

@@ -0,0 +1,173 @@
# Facebook Marketplace Anti-Bot Challenge Solver Design
## Summary
Add a challenge-detection and challenge-solving layer to the Facebook Marketplace
scraper so it can handle anti-bot gates (checkpoint pages, token rotation, cookie
requirements) programmatically.
Build the solver in pure Bun — no browser automation in production.
Use `agent-browser` only for one-time debug reconnaissance.
## Goals
- Identify which anti-bot challenge(s) Facebook Marketplace triggers against
programmatic HTTP requests.
- Implement detection + solving for each discovered challenge type.
- Wire the solver into `fetchFacebookItems` and `fetchFacebookItem` so challenges are
handled transparently.
- Follow the same pattern as the existing `ebay-challenge.ts` (detect → solve → retry
with clearance).
- Zero browser automation at runtime: pure `fetch`, Bun APIs, and npm packages only.
## Non-Goals
- Solving login/auth-wall challenges (those require fresh cookies — not solvable
programmatically).
- Full account login automation (cookies must be provided by the user).
- Browser-based scraping or Puppeteer/Playwright integration.
- Solving challenges for non-Marketplace Facebook endpoints.
## Current State
The Facebook scraper (`packages/core/src/scrapers/facebook.ts`) fetches Marketplace
search and item pages via authenticated `fetch` with cookies from the `FACEBOOK_COOKIE`
environment variable. It:
- Sends a browser-like header set (`sec-ch-ua`, `user-agent`, etc.)
- Parses SSR HTML for embedded JSON in script tags
- Has no challenge detection — if Facebook returns a challenge page, the scraper
silently fails (no listings are parsed and the response is classified as `unknown`)
- Depends entirely on cookie freshness
The eBay scraper already follows the challenge-solver pattern in this codebase:
`ebay.ts` uses `warmEbaySession()`, `isChallengeRedirect()`, `isChallengeHtml()`, and
`solveEbayChallenge()` from `ebay-challenge.ts`.
## Chosen Approach
**Reconnaissance-first development:**
1. Use `agent-browser` (debug only) to capture a real Facebook Marketplace browsing
session via HAR.
2. Probe programmatic `fetch` to see what Facebook returns without a browser.
3. Diff the two to identify the gap (missing headers? missing cookies? missing JS
execution?).
4. Build a modular solver in `packages/core/src/utils/facebook-challenge.ts` that
detects each challenge type and applies the appropriate fix.
5. Wire it into `facebook.ts` following the eBay pattern.
## Design
### File Plan
| File | Purpose |
| --- | --- |
| `packages/core/src/utils/facebook-challenge.ts` | Challenge detection, solving, and cookie/session utilities |
| `packages/core/src/scrapers/facebook.ts` | Modified: warmup, challenge detection before parsing, retry loop |
| `packages/core/test/facebook-challenge.test.ts` | Unit tests with mock challenge HTML fixtures |
### Flow
```
fetchFacebookItems(searchUrl)
├── warmFacebookSession() → GET facebook.com/ (collect datr + Akamai cookies)
├── fetchHtml(searchUrl) → receives response
├── detectFacebookChallenge(response)
│ ├── checkpoint/challenge HTML → solveCheckpointChallenge()
│ ├── redirect to /login → fail (cookies expired)
│ ├── missing required cookies → regenerate session
│ ├── 429 rate limit → backoff + retry (existing http.ts handles this)
│ └── no challenge → proceed to parsing
├── if solveCheckpointChallenge succeeds → retry fetchHtml with clearance cookie
└── parse results
```
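The flow above can be sketched as a bounded detect, solve, and retry loop. Dependencies are injected so the loop runs offline; `ChallengeFlowDeps`, `FetchedPage`, and `fetchWithChallengeHandling` are hypothetical names, while `ChallengeType` mirrors the union in `facebook-challenge.ts` and `ChallengeResult` follows the type proposed in this design. Solver exceptions degrade to `null`, matching the never-throws rule under Error Handling.

```typescript
type ChallengeType =
  | "login_wall"
  | "checkpoint"
  | "bad_headers"
  | "rate_limited"
  | "none";

interface ChallengeResult {
  solved: boolean;
  cookies?: Record<string, string>;
  token?: string;
  error?: string;
}

interface FetchedPage {
  status: number;
  html: string;
  url: string;
}

// Hypothetical injected dependencies keep the loop testable without network access.
interface ChallengeFlowDeps {
  fetchPage(url: string, cookies: Record<string, string>): Promise<FetchedPage>;
  detect(status: number, html: string, url: string): ChallengeType;
  solveCheckpoint(html: string, cookies: Record<string, string>): Promise<ChallengeResult>;
  warn(message: string): void;
}

// Returns page HTML, or null when a challenge cannot be handled.
async function fetchWithChallengeHandling(
  url: string,
  warmupCookies: Record<string, string>,
  deps: ChallengeFlowDeps,
  maxSolveAttempts = 1,
): Promise<string | null> {
  let cookies = { ...warmupCookies };
  for (let attempt = 0; attempt <= maxSolveAttempts; attempt++) {
    const page = await deps.fetchPage(url, cookies);
    const challenge = deps.detect(page.status, page.html, page.url);
    if (challenge === "none") return page.html;

    if (challenge === "checkpoint" && attempt < maxSolveAttempts) {
      let result: ChallengeResult;
      try {
        result = await deps.solveCheckpoint(page.html, cookies);
      } catch (err) {
        result = { solved: false, error: err instanceof Error ? err.message : String(err) };
      }
      if (result.solved) {
        // Retry with clearance cookies layered over the current jar.
        cookies = { ...cookies, ...result.cookies };
        continue;
      }
      deps.warn(`Checkpoint solver failed: ${result.error ?? "no reason given"}`);
      return null;
    }

    // login_wall needs fresh user cookies, rate limits are left to the HTTP
    // layer's backoff, and a checkpoint that survives a retry gives up here.
    deps.warn(`Unhandled Facebook challenge "${challenge}" at ${page.url}`);
    return null;
  }
  return null;
}
```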
### Challenge Types (to be confirmed by reconnaissance)
| Type | Expected Signal | Solving Strategy |
| --- | --- | --- |
| Login wall | Redirect to `/login` or HTML `"You must log in"` | Fail — user must provide fresh cookies |
| Checkpoint page | HTML contains `checkpoint` or `challenge` path | Parse hidden form fields, compute proof-of-work if present, submit them to the answer endpoint |
| `datr` cookie missing | No `datr` in cookie jar → request fails | Fetch homepage first to obtain `datr` (session warmup) |
| DTSG token needed | Form submissions fail with CSRF error | Extract `fb_dtsg` from page HTML, include in request body |
| GraphQL header check | Request blocked without internal headers | Extract `x-fb-friendly-name` from browser HAR, replicate |
| Akamai/bot-manager | Redirect loops or blank pages without Akamai cookies | Homepage warmup to collect `bm_sv`, `bm_mi`, etc. |
### Key Modules
**`facebook-challenge.ts`:**
```
// Session warmup — fetch homepage to prime cookies
warmFacebookSession(): Promise<Record<string, string>>
// Challenge detection
detectFacebookChallenge(html, status, url, headers): ChallengeType | null
// Checkpoint solver
solveCheckpointChallenge(html, cookies): Promise<ChallengeResult>
// DTSG token extraction
extractDtsg(html): string | null
// Cookie jar management (shared with ebay.ts pattern)
mergeCookies(...): Record<string, string>
```
**`ChallengeResult` type:**
```ts
interface ChallengeResult {
solved: boolean;
cookies?: Record<string, string>; // clearance cookies to replay
token?: string; // challenge response token
error?: string; // why it failed
}
```
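Sketches of the two smaller helpers listed above. `mergeCookies` lets later jars override earlier ones, so clearance cookies win over warmup cookies with the same name. The `fb_dtsg` locations in `extractDtsg` (an inline `DTSGInitialData` token and a hidden form input) are assumptions that the reconnaissance fixtures would need to confirm.

```typescript
// Later jars win, so clearance cookies override warmup cookies with the same name.
function mergeCookies(
  ...jars: Array<Record<string, string> | undefined>
): Record<string, string> {
  const merged: Record<string, string> = {};
  for (const jar of jars) {
    if (jar) Object.assign(merged, jar);
  }
  return merged;
}

// ASSUMED token locations; confirm against the reconnaissance HTML fixtures.
const DTSG_PATTERNS: RegExp[] = [
  /"DTSGInitialData",\[\],\{"token":"([^"]+)"/,
  /name="fb_dtsg"\s+value="([^"]+)"/,
];

function extractDtsg(html: string): string | null {
  for (const pattern of DTSG_PATTERNS) {
    const token = pattern.exec(html)?.[1];
    if (token) return token;
  }
  return null;
}
```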
### Error Handling
- Solver failure → return `ChallengeResult { solved: false, error: "..." }`, scraper
logs warning and returns empty results (never throws).
- Unrecognized challenge → log the response URL and HTML snippet for future analysis.
- Rate limits → handled by existing `http.ts` exponential backoff (no change needed).
- Solver timeout → 30s cap on any challenge computation, fall back to `solved: false`.
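The 30-second cap could be enforced with `Promise.race`, as in this sketch (`solveWithTimeout` is a hypothetical name). One caveat: the race only stops waiting; it does not cancel the solver. A synchronous proof-of-work loop would block the event loop so the timer could never fire, so a real solver would also need to check a deadline or yield periodically.

```typescript
interface ChallengeResult {
  solved: boolean;
  cookies?: Record<string, string>;
  token?: string;
  error?: string;
}

async function solveWithTimeout(
  solve: () => Promise<ChallengeResult>,
  timeoutMs = 30_000,
): Promise<ChallengeResult> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<ChallengeResult>((resolve) => {
    timer = setTimeout(
      () => resolve({ solved: false, error: `solver timed out after ${timeoutMs}ms` }),
      timeoutMs,
    );
  });
  // Starting via then() also catches synchronous throws; any solver error
  // becomes `solved: false` instead of propagating.
  const attempt = Promise.resolve()
    .then(solve)
    .catch(
      (err: unknown): ChallengeResult => ({
        solved: false,
        error: err instanceof Error ? err.message : String(err),
      }),
    );
  try {
    return await Promise.race([attempt, timeout]);
  } finally {
    // Clear the timer so a finished solver does not keep the process alive.
    clearTimeout(timer);
  }
}
```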
### Testing
| Test | What It Verifies |
| --- | --- |
| `detectFacebookChallenge` with sample checkpoint HTML | Correctly identifies checkpoint challenge |
| `detectFacebookChallenge` with normal search HTML | Returns null (no false positives) |
| `detectFacebookChallenge` with login redirect | Identifies the login wall (auth-gated response) |
| `solveCheckpointChallenge` with known PoW params | Produces correct answer |
| `warmFacebookSession` with mocked fetch | Collects expected cookies |
| `extractDtsg` with sample page HTML | Extracts the DTSG token |
| Integration: fetch → challenge → solve → retry → results | End-to-end mock flow |
| Solver throws → scraper returns empty, no crash | Graceful fallback |
| Solver unknown challenge → logs warning, returns empty | No unhandled challenge crashes |
Test data will use anonymized HTML fixtures (no real user data).
## Reconnaissance Steps (debug-only, one-time)
1. **Probe programmatically:** `fetch` Marketplace search with/without cookies, record
status code and HTML.
2. **Browser session:** `agent-browser` → log into Facebook → navigate Marketplace →
record HAR.
3. **Diff analysis:** Compare browser request headers vs. our programmatic headers.
4. **Cookie inventory:** List all cookies from browser session, identify which are
essential.
5. **Challenge trigger:** Identify what change in request signature triggers a
challenge.
6. **Replay test:** Replay the browser's exact request via `fetch` to confirm
headers/cookies are the differentiator.
All reconnaissance artifacts saved under `docs/facebook-challenge/`.
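Step 1 could use a small probe that records what a plain `fetch` receives. This is a debug-only sketch with hypothetical names (`probeMarketplace`, `ProbeRecord`); the fetch implementation is injectable so the probe can be exercised offline. Running it once with and once without a `cookie` header and diffing the two records covers the with/without comparison.

```typescript
interface ProbeRecord {
  requestedUrl: string;
  finalUrl: string;
  status: number;
  redirected: boolean;
  htmlLength: number;
  title: string | null;
}

// Records status, redirect target, and page title for later diffing.
async function probeMarketplace(
  url: string,
  headers: Record<string, string>,
  fetchImpl: typeof fetch = fetch,
): Promise<ProbeRecord> {
  const res = await fetchImpl(url, { headers, redirect: "follow" });
  const html = await res.text();
  const title = /<title>([^<]*)<\/title>/i.exec(html)?.[1] ?? null;
  return {
    requestedUrl: url,
    // Constructed or mocked responses may report an empty url.
    finalUrl: res.url || url,
    status: res.status,
    redirected: res.redirected,
    htmlLength: html.length,
    title,
  };
}
```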
## Decisions Deferred to Post-Reconnaissance
- Exact challenge types and solving strategies (depends on what Facebook actually uses).
- Whether a PoW solver, CAPTCHA solver, or token-extraction approach is needed.
- npm package dependencies (only add what the reconnaissance proves necessary).

@@ -12,6 +12,7 @@
"build:mcp": "bun build ./packages/mcp-server/src/index.ts --target=bun --outdir=./dist/mcp --minify",
"build:all": "bun run build:api && bun run build:mcp",
"ci": "bun run typecheck && biome check --write",
"test:live": "bun test --cwd packages/core test/live",
"clean": "rm -rf dist",
"start": "./scripts/start.sh"
},

@@ -3,6 +3,7 @@ import { logger } from "../logger";
import {
emptySearchResponse,
getRequiredSearchQuery,
parseDollarPriceParam,
parseNonNegativeIntegerParam,
} from "./helpers";
@@ -18,17 +19,11 @@ export async function ebayRoute(req: Request): Promise<Response> {
return SEARCH_QUERY;
}
const minPrice = parseNonNegativeIntegerParam(
reqUrl.searchParams,
"minPrice",
);
const minPrice = parseDollarPriceParam(reqUrl.searchParams, "minPrice");
if (minPrice instanceof Response) {
return minPrice;
}
const maxPrice = parseNonNegativeIntegerParam(
reqUrl.searchParams,
"maxPrice",
);
const maxPrice = parseDollarPriceParam(reqUrl.searchParams, "maxPrice");
if (maxPrice instanceof Response) {
return maxPrice;
}

@@ -39,6 +39,23 @@ export function parseNonNegativeIntegerParam(
return Number(rawValue);
}
export function parseDollarPriceParam(
searchParams: URLSearchParams,
name: string,
): number | undefined | Response {
const rawValue = searchParams.get(name);
if (rawValue === null) {
return undefined;
}
if (!/^\d+(?:\.\d{1,2})?$/.test(rawValue)) {
return Response.json(
{ message: `Invalid ${name} parameter` },
{ status: 400 },
);
}
return Math.round(Number(rawValue) * 100);
}
export function emptySearchResponse(hint?: string): Response {
const message = hint
? `Search didn't return any results! ${hint}`

@@ -3,6 +3,7 @@ import { logger } from "../logger";
import {
emptySearchResponse,
getRequiredSearchQuery,
parseDollarPriceParam,
parseNonNegativeIntegerParam,
} from "./helpers";
@@ -26,17 +27,11 @@ export async function kijijiRoute(req: Request): Promise<Response> {
if (maxPages instanceof Response) {
return maxPages;
}
const priceMin = parseNonNegativeIntegerParam(
reqUrl.searchParams,
"priceMin",
);
const priceMin = parseDollarPriceParam(reqUrl.searchParams, "priceMin");
if (priceMin instanceof Response) {
return priceMin;
}
const priceMax = parseNonNegativeIntegerParam(
reqUrl.searchParams,
"priceMax",
);
const priceMax = parseDollarPriceParam(reqUrl.searchParams, "priceMax");
if (priceMax instanceof Response) {
return priceMax;
}

@@ -282,6 +282,24 @@ describe("API routes", () => {
);
});
test("kijijiRoute forwards dollar price filters to core as cents", async () => {
const { kijijiRoute } = await import("../src/routes/kijiji");
await kijijiRoute(
new Request(
"http://localhost/api/kijiji?q=laptop&priceMin=999.99&priceMax=1000",
),
);
expect(fetchKijijiItems).toHaveBeenCalledWith(
"laptop",
4,
"https://www.kijiji.ca",
expect.objectContaining({ priceMin: 99_999, priceMax: 100_000 }),
{},
);
});
test("kijijiRoute does not forward unstableFilter when false", async () => {
const { kijijiRoute } = await import("../src/routes/kijiji");
@@ -414,6 +432,24 @@ describe("API routes", () => {
);
});
test("ebayRoute forwards dollar price filters to core as cents", async () => {
const { ebayRoute } = await import("../src/routes/ebay");
fetchEbayItems.mockImplementation(() => Promise.resolve([{ title: "a" }]));
await ebayRoute(
new Request(
"http://localhost/api/ebay?q=macbook&minPrice=999.99&maxPrice=1000",
),
);
expect(fetchEbayItems).toHaveBeenCalledWith(
"macbook",
1,
expect.objectContaining({ minPrice: 99_999, maxPrice: 100_000 }),
);
});
test("ebayRoute passes through scraper payload unchanged in unstable mode", async () => {
const { ebayRoute } = await import("../src/routes/ebay");
@@ -730,16 +766,18 @@ describe("API routes", () => {
expect(body.message).toBe("Invalid minPrice parameter");
});
test("ebayRoute returns 400 for decimal minPrice", async () => {
test("ebayRoute accepts decimal minPrice", async () => {
const { ebayRoute } = await import("../src/routes/ebay");
const response = await ebayRoute(
await ebayRoute(
new Request("http://localhost/api/ebay?q=laptop&minPrice=1.5"),
);
expect(response.status).toBe(400);
const body = await response.json();
expect(body.message).toBe("Invalid minPrice parameter");
expect(fetchEbayItems).toHaveBeenCalledWith(
"laptop",
1,
expect.objectContaining({ minPrice: 150 }),
);
});
test("ebayRoute returns 400 for non-integer maxPrice", async () => {
@@ -766,16 +804,18 @@ describe("API routes", () => {
expect(body.message).toBe("Invalid maxPrice parameter");
});
test("ebayRoute returns 400 for decimal maxPrice", async () => {
test("ebayRoute accepts decimal maxPrice", async () => {
const { ebayRoute } = await import("../src/routes/ebay");
const response = await ebayRoute(
await ebayRoute(
new Request("http://localhost/api/ebay?q=laptop&maxPrice=1.5"),
);
expect(response.status).toBe(400);
const body = await response.json();
expect(body.message).toBe("Invalid maxPrice parameter");
expect(fetchEbayItems).toHaveBeenCalledWith(
"laptop",
1,
expect.objectContaining({ maxPrice: 150 }),
);
});
test("kijijiRoute returns 400 for decimal maxPages", async () => {
@@ -862,16 +902,20 @@ describe("API routes", () => {
expect(body.message).toBe("Invalid priceMin parameter");
});
test("kijijiRoute returns 400 for decimal priceMin", async () => {
test("kijijiRoute accepts decimal priceMin", async () => {
const { kijijiRoute } = await import("../src/routes/kijiji");
const response = await kijijiRoute(
await kijijiRoute(
new Request("http://localhost/api/kijiji?q=laptop&priceMin=1.5"),
);
expect(response.status).toBe(400);
const body = await response.json();
expect(body.message).toBe("Invalid priceMin parameter");
expect(fetchKijijiItems).toHaveBeenCalledWith(
"laptop",
4,
"https://www.kijiji.ca",
expect.objectContaining({ priceMin: 150 }),
{},
);
});
test("kijijiRoute returns 400 for non-integer priceMin", async () => {
@@ -934,16 +978,20 @@ describe("API routes", () => {
expect(body.message).toBe("Invalid priceMax parameter");
});
test("kijijiRoute returns 400 for decimal priceMax", async () => {
test("kijijiRoute accepts decimal priceMax", async () => {
const { kijijiRoute } = await import("../src/routes/kijiji");
const response = await kijijiRoute(
await kijijiRoute(
new Request("http://localhost/api/kijiji?q=laptop&priceMax=1.5"),
);
expect(response.status).toBe(400);
const body = await response.json();
expect(body.message).toBe("Invalid priceMax parameter");
expect(fetchKijijiItems).toHaveBeenCalledWith(
"laptop",
4,
"https://www.kijiji.ca",
expect.objectContaining({ priceMax: 150 }),
{},
);
});
test("kijijiRoute returns 400 for non-integer priceMax", async () => {

@@ -10,8 +10,14 @@ import {
type CookieConfig,
ensureCookies,
formatCookiesForHeader,
loadCookiesOptional,
parseCookieString,
} from "../utils/cookies";
import {
buildFacebookHeaders,
detectFacebookChallenge,
warmFacebookSession,
} from "../utils/facebook-challenge";
import { formatCentsToCurrency } from "../utils/format";
import { fetchHtml, HttpError, isRecord, RateLimitError } from "../utils/http";
import { logger } from "../utils/logger";
@@ -20,9 +26,10 @@ import { classifyUnstableListings } from "../utils/unstable";
/**
* Facebook Marketplace Scraper
*
* Note: Facebook Marketplace requires authentication cookies for full access.
* This implementation will return limited or no results without proper authentication.
* This is by design to respect Facebook's authentication requirements.
* Facebook Marketplace returns search results without authentication when
* proper browser headers are sent. Prices and seller details are hidden on
* search results but are available on individual item pages even without
* auth cookies. For full-price search results, provide FACEBOOK_COOKIE.
*/
// Facebook cookie configuration
@@ -263,20 +270,14 @@ function logExtractionMetrics(success: boolean, itemId?: string) {
// ----------------------------- HTTP Client -----------------------------
function createFacebookHeaders(cookies: string): Record<string, string> {
return {
accept:
"text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7",
"accept-language": "en-GB,en-US;q=0.9,en;q=0.8",
"cache-control": "no-cache",
"upgrade-insecure-requests": "1",
"sec-fetch-dest": "document",
"sec-fetch-mode": "navigate",
"sec-fetch-site": "none",
"sec-fetch-user": "?1",
"user-agent":
"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
cookie: cookies,
};
const jar: Record<string, string> = {};
if (cookies) {
for (const pair of cookies.split(";")) {
const [name, ...rest] = pair.trim().split("=");
if (name && rest.length > 0) jar[name.trim()] = rest.join("=").trim();
}
}
return buildFacebookHeaders(jar);
}
// ----------------------------- Parsing -----------------------------
@@ -286,13 +287,29 @@ export type FacebookResponseKind =
| "item"
| "auth_gated"
| "unavailable"
| "checkpoint"
| "unknown";
export function classifyFacebookResponse(
htmlString: HTMLString,
responseUrl: string,
status = 200,
) {
const challengeType = detectFacebookChallenge(
status,
htmlString,
responseUrl,
);
if (challengeType === "checkpoint") {
return {
kind: "checkpoint" as const,
authGated: false,
unavailable: false,
};
}
const authGated =
challengeType === "login_wall" ||
responseUrl.includes("/login/") ||
htmlString.includes("You must log in") ||
htmlString.includes("log in to continue");
@@ -764,6 +781,22 @@ export function extractFacebookItemData(
return bestMatch.item;
}
// Try marketplace_product_details_page.target path (current item page structure)
for (const candidate of candidates) {
const detailsPage = findKeyInObject(
candidate,
"marketplace_product_details_page",
) as Record<string, unknown> | undefined;
const target = detailsPage?.target as Record<string, unknown> | undefined;
if (
target &&
typeof target.id === "string" &&
typeof target.marketplace_listing_title === "string"
) {
return target as unknown as FacebookMarketplaceItem;
}
}
if (htmlString.includes("XCometMarketplacePermalinkController")) {
return extractFacebookItemHtmlFallback(htmlString);
}
@@ -771,6 +804,25 @@ export function extractFacebookItemData(
return null;
}
function findKeyInObject(obj: unknown, targetKey: string): unknown {
if (obj == null) return undefined;
if (Array.isArray(obj)) {
for (const item of obj) {
const found = findKeyInObject(item, targetKey);
if (found !== undefined) return found;
}
return undefined;
}
if (typeof obj !== "object") return undefined;
const record = obj as Record<string, unknown>;
if (targetKey in record) return record[targetKey];
for (const [, value] of Object.entries(record)) {
const found = findKeyInObject(value, targetKey);
if (found !== undefined) return found;
}
return undefined;
}
/**
Parse Facebook marketplace search results into ListingDetails[]
*/
@@ -1027,16 +1079,18 @@ export default async function fetchFacebookItems(
};
};
const cookies = await ensureFacebookCookies();
const warmupCookies = await warmFacebookSession();
const warmupHeader = Object.entries(warmupCookies)
.map(([k, v]) => `${k}=${v}`)
.join("; ");
const userCookies = await loadCookiesOptional(FACEBOOK_COOKIE_CONFIG);
// Format cookies for HTTP header
const domain = "www.facebook.com";
const cookiesHeader = formatCookiesForHeader(cookies, domain);
if (!cookiesHeader) {
throw new Error(
"No valid Facebook cookies found. Please check that cookies are not expired and apply to facebook.com domain.",
);
}
const userCookiesHeader = formatCookiesForHeader(userCookies, domain);
const cookiesHeader = [warmupHeader, userCookiesHeader]
.filter(Boolean)
.join("; ");
const DELAY_MS = Math.max(1, Math.floor(1000 / requestsPerSecond));
@@ -1047,7 +1101,9 @@ export default async function fetchFacebookItems(
const searchUrl = `https://www.facebook.com/marketplace/${LOCATION}/search?query=${encodedQuery}&sortBy=creation_time_descend&exact=false`;
logger.log(`Fetching Facebook marketplace: ${searchUrl}`);
logger.log(`Using ${cookies.length} cookies for authentication`);
if (userCookies.length > 0) {
logger.log(`Using ${userCookies.length} cookies for authentication`);
}
let searchHtml: string;
let searchResponseUrl = searchUrl;
@@ -1100,6 +1156,13 @@ export default async function fetchFacebookItems(
return finalizeResults([]);
}
if (classification.kind === "checkpoint") {
logger.warn(
"Facebook marketplace returned a checkpoint challenge. This may require manual verification.",
);
return finalizeResults([]);
}
if (classification.unavailable) {
logger.warn("Facebook marketplace search returned an unavailable route.");
return finalizeResults([]);
@@ -1149,15 +1212,8 @@ export default async function fetchFacebookItems(
export async function fetchFacebookItem(
itemId: string,
): Promise<FacebookListingDetails | null> {
const cookies = await ensureFacebookCookies();
// Format cookies for HTTP header
const cookiesHeader = formatCookiesForHeader(cookies, "www.facebook.com");
if (!cookiesHeader) {
throw new Error(
"No valid Facebook cookies found. Please check that cookies are not expired and apply to facebook.com domain.",
);
}
const userCookies = await loadCookiesOptional(FACEBOOK_COOKIE_CONFIG);
const cookiesHeader = formatCookiesForHeader(userCookies, "www.facebook.com");
const itemUrl = `https://www.facebook.com/marketplace/item/${itemId}/`;
@@ -1230,6 +1286,14 @@ export async function fetchFacebookItem(
const classification = classifyFacebookResponse(itemHtml, itemResponseUrl);
if (classification.kind === "checkpoint") {
logExtractionMetrics(false, itemId);
logger.warn(
`Checkpoint challenge detected for item ${itemId}. Facebook may be limiting access.`,
);
return null;
}
if (classification.authGated) {
logExtractionMetrics(false, itemId);
logger.warn(

@@ -0,0 +1,128 @@
// Facebook Marketplace session & challenge utilities
// ------------------ Types ------------------
export type ChallengeType =
| "login_wall"
| "checkpoint"
  | "bad_headers"
  | "rate_limited"
  | "none";

// ------------------ Constants ------------------
// Mimics a desktop Chrome 131 top-level navigation request on Linux.
const FACEBOOK_BROWSER_HEADERS: Record<string, string> = {
  accept:
    "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7",
  "accept-language": "en-GB,en-US;q=0.9,en;q=0.8",
  "cache-control": "no-cache",
  "upgrade-insecure-requests": "1",
  "sec-fetch-dest": "document",
  "sec-fetch-mode": "navigate",
  "sec-fetch-site": "none",
  "sec-fetch-user": "?1",
  "sec-ch-ua":
    '"Google Chrome";v="131", "Chromium";v="131", "Not_A Brand";v="24"',
  "sec-ch-ua-mobile": "?0",
  "sec-ch-ua-platform": '"Linux"',
  "user-agent":
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36",
};

// ------------------ Cookie Management ------------------
// Extracts name=value pairs from Set-Cookie header values. Attributes such as
// Path or Expires are ignored, as are cookies with an empty value.
function parseSetCookies(setCookieHeaders: string[]): Record<string, string> {
  const cookies: Record<string, string> = {};
  for (const header of setCookieHeaders) {
    const parts = header.split(";");
    const firstPart = parts[0]?.trim();
    if (!firstPart) continue;
    const eqIdx = firstPart.indexOf("=");
    if (eqIdx === -1) continue;
    const name = firstPart.slice(0, eqIdx).trim();
    const value = firstPart.slice(eqIdx + 1).trim();
    if (name && value) {
      cookies[name] = value;
    }
  }
  return cookies;
}

// Serializes a cookie jar into a single Cookie request header value.
function cookiesToHeader(cookies: Record<string, string>): string {
  return Object.entries(cookies)
    .map(([name, value]) => `${name}=${value}`)
    .join("; ");
}

// ------------------ Session Warmup ------------------
// Requests the Facebook homepage with browser-like headers and returns the
// cookies it sets. Redirects are not followed, so only the first response's
// cookies are collected. Network errors and the 10 s timeout yield an empty jar.
export async function warmFacebookSession(): Promise<Record<string, string>> {
  try {
    const res = await fetch("https://www.facebook.com/", {
      method: "GET",
      headers: FACEBOOK_BROWSER_HEADERS,
      redirect: "manual",
      signal: AbortSignal.timeout(10000),
    });
    const setCookies = res.headers.getSetCookie?.() ?? [];
    return parseSetCookies(setCookies);
  } catch {
    return {};
  }
}

// ------------------ Challenge Detection ------------------
// Classifies a response. Checks run in order (HTTP status, login redirect or
// prompt, checkpoint page) and the first match wins.
export function detectFacebookChallenge(
  status: number,
  html: string,
  responseUrl: string,
): ChallengeType {
  if (status === 400) {
    return "bad_headers";
  }
  if (status === 429) {
    return "rate_limited";
  }
  if (responseUrl.includes("/login/")) {
    return "login_wall";
  }
  if (html.includes("You must log in") || html.includes("log in to continue")) {
    return "login_wall";
  }
  if (
    responseUrl.includes("/checkpoint/") ||
    (html.includes("checkpoint") && html.includes("challenge"))
  ) {
    return "checkpoint";
  }
  return "none";
}

// ------------------ Header Construction ------------------
// Combines the base browser headers with the cookie jar. extraHeaders are
// applied last, so they can override any default, including cookie.
export function buildFacebookHeaders(
  cookieJar: Record<string, string>,
  extraHeaders?: Record<string, string>,
): Record<string, string> {
  const headers: Record<string, string> = {
    ...FACEBOOK_BROWSER_HEADERS,
  };
  const cookieString = cookiesToHeader(cookieJar);
  if (cookieString) {
    headers.cookie = cookieString;
  }
  if (extraHeaders) {
    Object.assign(headers, extraHeaders);
  }
  return headers;
}
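`detectFacebookChallenge` only classifies a response. The commit notes say checkpoint and login-wall challenges are "handled gracefully", but the caller's handling is not part of this file. The following is a minimal sketch of one possible retry policy. It assumes the `ChallengeType` union holds exactly the five values returned above. `planChallengeResponse`, `MAX_ATTEMPTS`, and the backoff constants are hypothetical:

```typescript
// Assumed to match the ChallengeType union in facebook-challenge.ts.
type ChallengeType =
  | "login_wall"
  | "checkpoint"
  | "bad_headers"
  | "rate_limited"
  | "none";

// Hypothetical follow-up actions a scraper could take.
type ChallengeAction =
  | { kind: "proceed" }
  | { kind: "rewarm_and_retry" }
  | { kind: "backoff"; delayMs: number }
  | { kind: "abort"; reason: ChallengeType };

// Hypothetical limits, not taken from the source.
const MAX_ATTEMPTS = 3;
const BASE_DELAY_MS = 1_000;

// Maps a detected challenge and a zero-based attempt count to the next step.
function planChallengeResponse(
  challenge: ChallengeType,
  attempt: number,
): ChallengeAction {
  if (challenge === "none") {
    return { kind: "proceed" };
  }
  if (attempt >= MAX_ATTEMPTS) {
    return { kind: "abort", reason: challenge };
  }
  if (challenge === "rate_limited") {
    // Exponential backoff: 1 s, 2 s, 4 s for attempts 0, 1, 2.
    return { kind: "backoff", delayMs: BASE_DELAY_MS * 2 ** attempt };
  }
  if (challenge === "bad_headers") {
    // The detector maps HTTP 400 to bad_headers, so rebuild headers from a fresh session.
    return { kind: "rewarm_and_retry" };
  }
  // login_wall and checkpoint are not solvable by retrying without an account.
  return { kind: "abort", reason: challenge };
}
```

A caller would pass `detectFacebookChallenge(res.status, html, res.url)` into this function. On `rewarm_and_retry`, it would call `warmFacebookSession()` and `buildFacebookHeaders()` again before the next request.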

@@ -0,0 +1,35 @@
import { describe, expect, test } from "bun:test";
import fetchEbayItems from "../../src/scrapers/ebay";

const LIVE_RESULT_LIMIT = 3;
const LIVE_TEST_TIMEOUT_MS = 30_000;

describe("eBay live parser", () => {
  test(
    "scrapes live search results into listing details",
    async () => {
      const results = await fetchEbayItems("iphone", 1, {
        maxItems: LIVE_RESULT_LIMIT,
      });

      expect(results.length).toBeGreaterThan(0);

      for (const listing of results) {
        if (!listing.listingPrice) {
          throw new Error(`Expected listing price for ${listing.url}`);
        }
        if (typeof listing.listingPrice.cents !== "number") {
          throw new Error(`Expected listing cents for ${listing.url}`);
        }
        if (!listing.listingPrice.currency) {
          throw new Error(`Expected listing currency for ${listing.url}`);
        }
        expect(listing.url).toStartWith("https://");
        expect(listing.title.length).toBeGreaterThan(0);
        expect(listing.listingPrice.cents).toBeGreaterThanOrEqual(0);
        expect(listing.listingPrice.currency.length).toBeGreaterThan(0);
      }
    },
    LIVE_TEST_TIMEOUT_MS,
  );
});

@@ -0,0 +1,44 @@
import { describe, expect, test } from "bun:test";
import fetchFacebookItems from "../../src/scrapers/facebook";

const LIVE_RESULT_LIMIT = 3;
const LIVE_TEST_TIMEOUT_MS = 30_000;

describe("Facebook live parser", () => {
  test(
    "scrapes live marketplace search results into listing details",
    async () => {
      if (!process.env.FACEBOOK_COOKIE?.trim()) {
        throw new Error("FACEBOOK_COOKIE is required for Facebook live tests");
      }

      const results = await fetchFacebookItems(
        "iphone",
        1,
        "toronto",
        LIVE_RESULT_LIMIT,
      );

      expect(results.length).toBeGreaterThan(0);

      for (const listing of results) {
        if (!listing.listingPrice) {
          throw new Error(`Expected listing price for ${listing.url}`);
        }
        if (typeof listing.listingPrice.cents !== "number") {
          throw new Error(`Expected listing cents for ${listing.url}`);
        }
        if (!listing.listingPrice.currency) {
          throw new Error(`Expected listing currency for ${listing.url}`);
        }
        expect(listing.url).toStartWith(
          "https://www.facebook.com/marketplace/item/",
        );
        expect(listing.title.length).toBeGreaterThan(0);
        expect(listing.listingPrice.cents).toBeGreaterThanOrEqual(0);
        expect(listing.listingPrice.currency.length).toBeGreaterThan(0);
      }
    },
    LIVE_TEST_TIMEOUT_MS,
  );
});
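The Facebook suite reads a raw cookie string from `FACEBOOK_COOKIE`, while `buildFacebookHeaders` takes a name-to-value jar. The following is a minimal sketch of bridging the two. It assumes the variable holds a standard `Cookie` header value (`name=value; name2=value2`). `parseCookieHeader` and `mergeCookieJars` are hypothetical helpers, not part of this diff:

```typescript
type CookieJar = Record<string, string>;

// Splits a Cookie request header into a jar, skipping malformed or empty pairs.
function parseCookieHeader(header: string): CookieJar {
  const jar: CookieJar = {};
  for (const pair of header.split(";")) {
    const eqIdx = pair.indexOf("=");
    if (eqIdx === -1) continue;
    const name = pair.slice(0, eqIdx).trim();
    const value = pair.slice(eqIdx + 1).trim();
    if (name && value) {
      jar[name] = value;
    }
  }
  return jar;
}

// Later jars win, so user-supplied cookies override ones collected during warmup.
function mergeCookieJars(...jars: CookieJar[]): CookieJar {
  return Object.assign({}, ...jars);
}
```

A caller could then use `buildFacebookHeaders(mergeCookieJars(await warmFacebookSession(), parseCookieHeader(process.env.FACEBOOK_COOKIE ?? "")))`. Letting the supplied cookies win is a design choice, on the assumption that they carry state warmup cannot obtain.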

@@ -0,0 +1,38 @@
import { describe, expect, test } from "bun:test";
import fetchKijijiItems from "../../src/scrapers/kijiji";

const LIVE_TEST_TIMEOUT_MS = 30_000;

describe("Kijiji live parser", () => {
  test(
    "scrapes live search results into detailed listings",
    async () => {
      const results = await fetchKijijiItems(
        "iphone",
        1,
        "https://www.kijiji.ca",
        { maxPages: 1 },
        { includeImages: false, sellerDataDepth: "basic" },
      );

      expect(results.length).toBeGreaterThan(0);

      for (const listing of results) {
        if (!listing.listingPrice) {
          throw new Error(`Expected listing price for ${listing.url}`);
        }
        if (typeof listing.listingPrice.cents !== "number") {
          throw new Error(`Expected listing cents for ${listing.url}`);
        }
        if (!listing.listingPrice.currency) {
          throw new Error(`Expected listing currency for ${listing.url}`);
        }
        expect(listing.url).toStartWith("https://www.kijiji.ca/");
        expect(listing.title.length).toBeGreaterThan(0);
        expect(listing.listingPrice.cents).toBeGreaterThanOrEqual(0);
        expect(listing.listingPrice.currency.length).toBeGreaterThan(0);
      }
    },
    LIVE_TEST_TIMEOUT_MS,
  );
});
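All three live suites repeat the same per-listing checks. The following is a minimal sketch that collects them in one helper, which reports every problem instead of stopping at the first. The `Listing` shape is inferred from the fields the tests read, and `listingProblems` is hypothetical:

```typescript
// Inferred from the fields the live tests access; the real type may have more.
interface ListingPrice {
  cents: number;
  currency: string;
}

interface Listing {
  url: string;
  title: string;
  listingPrice?: ListingPrice | null;
}

// Returns human-readable problems for one listing; an empty array means it passed.
function listingProblems(listing: Listing, urlPrefix: string): string[] {
  const problems: string[] = [];
  if (!listing.url.startsWith(urlPrefix)) {
    problems.push(`${listing.url}: URL does not start with ${urlPrefix}`);
  }
  if (listing.title.length === 0) {
    problems.push(`${listing.url}: empty title`);
  }
  const price = listing.listingPrice;
  if (!price) {
    problems.push(`${listing.url}: missing listingPrice`);
    return problems;
  }
  // Number.isFinite also rejects NaN and Infinity, which a typeof check lets through.
  if (!Number.isFinite(price.cents) || price.cents < 0) {
    problems.push(`${listing.url}: invalid cents ${price.cents}`);
  }
  if (!price.currency) {
    problems.push(`${listing.url}: missing currency`);
  }
  return problems;
}
```

A suite could then assert `expect(results.flatMap((l) => listingProblems(l, "https://www.kijiji.ca/"))).toEqual([])`, which reports every failing listing in a single run.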

@@ -50,11 +50,11 @@ export const tools = [
         },
         priceMin: {
           type: "number",
-          description: "Minimum price in cents",
+          description: "Minimum price in dollars",
         },
         priceMax: {
           type: "number",
-          description: "Maximum price in cents",
+          description: "Maximum price in dollars",
         },
         unstableFilter: {
           type: "boolean",
@@ -107,11 +107,11 @@ export const tools = [
         },
         minPrice: {
           type: "number",
-          description: "Minimum price filter",
+          description: "Minimum price in dollars",
         },
         maxPrice: {
           type: "number",
-          description: "Maximum price filter",
+          description: "Maximum price in dollars",
         },
         strictMode: {
           type: "boolean",
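The schema change above documents `priceMin`/`priceMax` and `minPrice`/`maxPrice` in dollars, while scraped listings carry `listingPrice.cents`. Comparing the two needs a conversion somewhere. The following is a minimal sketch that assumes filters arrive as dollar amounts like the `999.99` used in the MCP tests. `dollarsToCents` and `withinPriceRange` are hypothetical, and the API's actual parsing is not shown in this diff:

```typescript
// Converts a dollar amount to integer cents.
function dollarsToCents(dollars: number): number {
  if (!Number.isFinite(dollars) || dollars < 0) {
    throw new RangeError(`Invalid dollar amount: ${dollars}`);
  }
  // dollars * 100 can carry binary floating-point error, so round to the
  // nearest cent instead of truncating.
  return Math.round(dollars * 100);
}

// Inclusive range check; an omitted bound is treated as unbounded.
function withinPriceRange(
  cents: number,
  minDollars?: number,
  maxDollars?: number,
): boolean {
  if (minDollars !== undefined && cents < dollarsToCents(minDollars)) {
    return false;
  }
  if (maxDollars !== undefined && cents > dollarsToCents(maxDollars)) {
    return false;
  }
  return true;
}
```

Converting the filter once to cents keeps the comparison in integers, matching the representation the scrapers already return.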

@@ -128,6 +128,46 @@ describe("MCP protocol unstableFilter", () => {
    expect(String(calledUrl)).toContain("unstableFilter=true");
  });

  test("search_kijiji should document price filters as dollars", () => {
    const tool = tools.find((candidate) => candidate.name === "search_kijiji");
    const priceMin = tool?.inputSchema.properties.priceMin as {
      description: string;
    };
    const priceMax = tool?.inputSchema.properties.priceMax as {
      description: string;
    };

    expect(priceMin.description).toContain("dollars");
    expect(priceMax.description).toContain("dollars");
  });

  test("handler should forward Kijiji dollar price filters to API", async () => {
    await handleMcpRequest(
      new Request("http://localhost", {
        method: "POST",
        body: JSON.stringify({
          jsonrpc: "2.0",
          id: 1,
          method: "tools/call",
          params: {
            name: "search_kijiji",
            arguments: {
              query: "macbook",
              priceMin: 999.99,
              priceMax: 1000,
            },
          },
        }),
      }),
    );

    const calledUrl = (global.fetch as unknown as ReturnType<typeof mock>).mock
      .calls[0]?.[0];
    expect(String(calledUrl)).toContain("priceMin=999.99");
    expect(String(calledUrl)).toContain("priceMax=1000");
  });

  test("handler should forward unstableFilter=true for search_facebook", async () => {
    await handleMcpRequest(
      new Request("http://localhost", {
@@ -204,4 +244,44 @@ describe("MCP protocol unstableFilter", () => {
      .calls[0]?.[0];
    expect(String(calledUrl)).toContain("unstableFilter=true");
  });

  test("search_ebay should document price filters as dollars", () => {
    const tool = tools.find((candidate) => candidate.name === "search_ebay");
    const minPrice = tool?.inputSchema.properties.minPrice as {
      description: string;
    };
    const maxPrice = tool?.inputSchema.properties.maxPrice as {
      description: string;
    };

    expect(minPrice.description).toContain("dollars");
    expect(maxPrice.description).toContain("dollars");
  });

  test("handler should forward eBay dollar price filters to API", async () => {
    await handleMcpRequest(
      new Request("http://localhost", {
        method: "POST",
        body: JSON.stringify({
          jsonrpc: "2.0",
          id: 1,
          method: "tools/call",
          params: {
            name: "search_ebay",
            arguments: {
              query: "macbook",
              minPrice: 999.99,
              maxPrice: 1000,
            },
          },
        }),
      }),
    );

    const calledUrl = (global.fetch as unknown as ReturnType<typeof mock>).mock
      .calls[0]?.[0];
    expect(String(calledUrl)).toContain("minPrice=999.99");
    expect(String(calledUrl)).toContain("maxPrice=1000");
  });
});
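The two forwarding tests expect `999.99` to reach the API URL unchanged, with no conversion to cents on the MCP side. The following is a minimal sketch showing that plain `URLSearchParams` serialization satisfies them. The `/api/kijiji` path, the `q` parameter, and `buildSearchUrl` are hypothetical, since the handler's real URL construction is not part of this diff:

```typescript
interface PriceSearchArgs {
  query: string;
  priceMin?: number;
  priceMax?: number;
}

// Builds a search URL, forwarding dollar filters exactly as the caller gave them.
function buildSearchUrl(baseUrl: string, args: PriceSearchArgs): string {
  const url = new URL("/api/kijiji", baseUrl);
  url.searchParams.set("q", args.query);
  // Compare against undefined rather than truthiness so a bound of 0 is kept.
  if (args.priceMin !== undefined) {
    url.searchParams.set("priceMin", String(args.priceMin));
  }
  if (args.priceMax !== undefined) {
    url.searchParams.set("priceMax", String(args.priceMax));
  }
  return url.toString();
}
```

For example, `buildSearchUrl("http://localhost", { query: "macbook", priceMin: 999.99, priceMax: 1000 })` returns `"http://localhost/api/kijiji?q=macbook&priceMin=999.99&priceMax=1000"`, which contains both substrings the tests check for.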