Compare commits
6 Commits
5c732287c5
...
main
| Author | SHA1 | Date |
|---|---|---|
| | ec545723bb | |
| | 0a246a29bf | |
| | 7ab33d0b02 | |
| | d2c3c07e7d | |
| | 0470a7bec7 | |
| | 89ad1c521f | |
104
FMARKETPLACE.md
@@ -1,44 +1,56 @@
|
||||
# Facebook Marketplace API Reverse Engineering
|
||||
|
||||
## Overview
|
||||
|
||||
This document tracks findings from reverse-engineering Facebook Marketplace APIs for
|
||||
listing details.
|
||||
|
||||
## Current Implementation Status
|
||||
|
||||
- Search functionality: Implemented in `src/facebook.ts`
|
||||
- Individual listing details: Not yet implemented
|
||||
|
||||
## Findings
|
||||
|
||||
### Step 1: Initial Setup
|
||||
|
||||
- Using Chrome DevTools to inspect Facebook Marketplace
|
||||
- Need to authenticate with Facebook account to access marketplace data
|
||||
- Cookies required for full access
|
||||
- Current status: Successfully logged in and accessed marketplace data
|
||||
|
||||
### Step 2: Individual Listing Details Analysis - COMPLETED
|
||||
|
||||
- **Data Location**: Embedded in HTML script tags within `require` array structure
|
||||
- **Path**:
|
||||
`require[0][3].__bbox.result.data.viewer.marketplace_product_details_page.target`
|
||||
- **Authentication**: Required for full data access
|
||||
- **Current Status**: Successfully reverse-engineered the API structure and data
|
||||
extraction method
|
||||
|
||||
### API Endpoints Discovered
|
||||
|
||||
#### Search Endpoint
|
||||
|
||||
- URL: `https://www.facebook.com/marketplace/{location}/search`
|
||||
- Parameters: `query`, `sortBy`, `exact`
|
||||
- Data embedded in HTML script tags with `require` structure
|
||||
- Authentication: Required (cookies)
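
A minimal sketch of how the search URL above could be assembled. The helper name `buildMarketplaceSearchUrl` and the value types of `sortBy` and `exact` are assumptions made for illustration; only the URL shape and the three parameter names come from these notes.

```typescript
// Hypothetical parameter bag: names are from the notes, value types are assumed.
interface MarketplaceSearchParams {
  query: string;
  sortBy?: string;
  exact?: boolean;
}

// Builds https://www.facebook.com/marketplace/{location}/search?... as a string.
function buildMarketplaceSearchUrl(
  location: string,
  params: MarketplaceSearchParams,
): string {
  const pairs: string[] = [`query=${encodeURIComponent(params.query)}`];
  if (params.sortBy !== undefined) {
    pairs.push(`sortBy=${encodeURIComponent(params.sortBy)}`);
  }
  if (params.exact !== undefined) {
    pairs.push(`exact=${String(params.exact)}`);
  }
  const base = `https://www.facebook.com/marketplace/${encodeURIComponent(location)}/search`;
  return `${base}?${pairs.join("&")}`;
}
```

The request itself still needs the session cookies noted above; this helper only produces the URL.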
|
||||
|
||||
#### Listing Details Endpoint
|
||||
|
||||
- **URL Structure**: `https://www.facebook.com/marketplace/item/{listing_id}/`
|
||||
- **Data Source**: Server-side rendered HTML with embedded JSON data in script tags
|
||||
- **Data Structure**: Relay/GraphQL style data structure under
|
||||
`require[0][3].__bbox.require[...].__bbox.result.data.viewer.marketplace_product_details_page.target`
|
||||
- **Extraction Method**: Parse JSON from script tags containing marketplace data,
|
||||
navigate to the target object
|
||||
- **Authentication**: Required (cookies)
|
||||
|
||||
### Listing Data Structure Discovered (Current - 2026)
|
||||
|
||||
The current Facebook Marketplace API returns a comprehensive `GroupCommerceProductItem`
|
||||
object with the following key properties:
|
||||
|
||||
```typescript
|
||||
interface FacebookMarketplaceItem {
|
||||
@@ -151,6 +163,7 @@ interface FacebookMarketplaceItem {
|
||||
```
|
||||
|
||||
### Example Data Extracted (Current Structure)
|
||||
|
||||
```json
|
||||
{
|
||||
"__typename": "GroupCommerceProductItem",
|
||||
@@ -228,36 +241,47 @@ interface FacebookMarketplaceItem {
|
||||
## Data Extraction Method
|
||||
|
||||
### Current Method (2026)
|
||||
|
||||
Facebook Marketplace listing data is embedded in JSON within `<script>` tags in the HTML
|
||||
response. The extraction process:
|
||||
|
||||
1. **Find the Correct Script**: Look for script tags containing marketplace listing data
|
||||
by searching for key fields like `marketplace_listing_title`, `redacted_description`,
|
||||
and `formatted_price`.
|
||||
|
||||
2. **Parse JSON Structure**: The data is nested within a `require` array structure:
|
||||
```
|
||||
require[0][3].__bbox.require[3][3][1].__bbox.result.data.viewer.marketplace_product_details_page.target
|
||||
```
|
||||
|
||||
3. **Navigate to Target Object**: The actual listing data is a
|
||||
`GroupCommerceProductItem` object containing comprehensive information about the
|
||||
listing, seller, and vehicle details.
|
||||
|
||||
4. **Handle Dynamic Structure**: Facebook may change the exact path, so robust
|
||||
extraction should search for the target object recursively within the parsed JSON.
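
The four steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in `src/facebook.ts`: the helper names `findByTypename` and `extractListingFromHtml` are hypothetical, and the sketch assumes the relevant script body is pure JSON that `JSON.parse` accepts.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Depth-first search for the first object whose __typename matches.
function findByTypename(node: Json, typename: string): { [key: string]: Json } | null {
  if (node === null || typeof node !== "object") return null;
  if (Array.isArray(node)) {
    for (const child of node) {
      const hit = findByTypename(child, typename);
      if (hit) return hit;
    }
    return null;
  }
  if (node["__typename"] === typename) return node;
  for (const key of Object.keys(node)) {
    const child = node[key];
    if (child === undefined) continue;
    const hit = findByTypename(child, typename);
    if (hit) return hit;
  }
  return null;
}

// Marker fields named in step 1; a script must mention at least one of them.
const LISTING_MARKERS = ["marketplace_listing_title", "redacted_description", "formatted_price"];

function extractListingFromHtml(html: string): { [key: string]: Json } | null {
  const scriptPattern = /<script[^>]*>([\s\S]*?)<\/script>/g;
  let match: RegExpExecArray | null;
  while ((match = scriptPattern.exec(html)) !== null) {
    const body = match[1] ?? "";
    if (!LISTING_MARKERS.some((marker) => body.includes(marker))) continue;
    try {
      // Assumption: the script body is a single JSON document.
      const hit = findByTypename(JSON.parse(body) as Json, "GroupCommerceProductItem");
      if (hit) return hit;
    } catch {
      // Body was not valid JSON; try the next script tag.
    }
  }
  return null;
}
```

Searching by `__typename` rather than by a fixed index path trades a little extra traversal for resilience when the `require` nesting shifts.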
|
||||
|
||||
### Authentication Requirements
|
||||
|
||||
- Valid Facebook session cookies are required
|
||||
- User must be logged in to Facebook
|
||||
- Marketplace access may be location-restricted
|
||||
|
||||
## Tools Used
|
||||
|
||||
- Chrome DevTools Protocol
|
||||
- Network monitoring
|
||||
- HTML/script parsing
|
||||
- JSON structure analysis
|
||||
|
||||
## Implementation Status
|
||||
|
||||
- ✅ Successfully reverse-engineered Facebook Marketplace API for listing details
|
||||
- ✅ Identified current data structure and extraction method (2026)
|
||||
- ✅ Documented comprehensive GroupCommerceProductItem interface
|
||||
- ✅ Implemented `extractFacebookItemData()` function with script parsing logic
|
||||
- ✅ Implemented `parseFacebookItem()` function to convert GroupCommerceProductItem to
|
||||
ListingDetails
|
||||
- ✅ Implemented `fetchFacebookItem()` function with authentication and error handling
|
||||
- ✅ Updated TypeScript interfaces to match current API structure
|
||||
- ✅ Added robust extraction with fallback methods for changing API paths
|
||||
@@ -266,12 +290,15 @@ Facebook Marketplace listing data is embedded in JSON within `<script>` tags in
|
||||
|
||||
### Core Functions Implemented
|
||||
|
||||
1. **`extractFacebookItemData(htmlString)`**: Extracts marketplace item data from
|
||||
HTML-embedded JSON in script tags
|
||||
- Searches for scripts containing marketplace listing data
|
||||
- Uses primary path:
|
||||
`require[0][3][0].__bbox.require[3][3][1].__bbox.result.data.viewer.marketplace_product_details_page.target`
|
||||
- Falls back to recursive search for GroupCommerceProductItem objects
|
||||
|
||||
2. **`parseFacebookItem(item)`**: Converts Facebook’s GroupCommerceProductItem to
|
||||
unified ListingDetails format
|
||||
- Handles pricing (FREE listings, CAD currency)
|
||||
- Extracts seller information, location, and status
|
||||
- Supports vehicle-specific metadata
|
||||
@@ -284,25 +311,31 @@ Facebook Marketplace listing data is embedded in JSON within `<script>` tags in
|
||||
- Returns parsed ListingDetails or null on failure
|
||||
|
||||
### Authentication Requirements
|
||||
|
||||
- Facebook session cookies required in `./cookies/facebook.json` or provided as
|
||||
parameter
|
||||
- Cookies must include valid authentication tokens for marketplace access
|
||||
- Handles cookie expiration and domain validation
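
As an illustration of the expiration and domain checks mentioned above, the sketch below turns a list of stored cookies into a `Cookie` header value. The `StoredCookie` shape (in particular `expires` as epoch seconds) and the helper name `buildCookieHeader` are assumptions; the actual layout of `./cookies/facebook.json` is not shown in these notes.

```typescript
// Hypothetical cookie record; the real file format may differ.
interface StoredCookie {
  name: string;
  value: string;
  domain: string;
  expires?: number; // epoch seconds; undefined means a session cookie
}

function buildCookieHeader(cookies: StoredCookie[], host: string, nowSeconds: number): string {
  return cookies
    // Drop cookies whose expiry time has already passed.
    .filter((c) => c.expires === undefined || c.expires > nowSeconds)
    // Keep cookies whose domain equals the host or is a parent domain of it.
    .filter((c) => {
      const domain = c.domain.replace(/^\./, "");
      return host === domain || host.endsWith("." + domain);
    })
    .map((c) => `${c.name}=${c.value}`)
    .join("; ");
}
```

An empty result is a useful signal that the stored session needs to be refreshed before any request is attempted.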
|
||||
|
||||
## Current Implementation Status - 2026 Verification
|
||||
|
||||
### Step 3: API Verification and Current Structure Analysis (January 2026)
|
||||
|
||||
- **Verification Date**: January 22, 2026
|
||||
- **Status**: Successfully verified current Facebook Marketplace API structure
|
||||
- **Data Source**: Embedded JSON in HTML script tags (server-side rendered)
|
||||
- **Extraction Path**:
|
||||
`require[0][3].__bbox.require[3][3][1].__bbox.result.data.viewer.marketplace_product_details_page.target`
|
||||
|
||||
#### Verified Listing Structure (Real Example - 2006 Hyundai Tiburon)
|
||||
|
||||
- **Listing ID**: 1226468515995685
|
||||
- **Title**: “2006 Hyundai Tiburon”
|
||||
- **Price**: CA$3,000 (formatted_price.text)
|
||||
- **Raw Price Data**: {"amount_with_offset": "300000", "currency": "CAD", "amount": "3000.00"}
|
||||
- **Location**: Hamilton, ON (with coordinates: 43.250427246094, -79.963989257812)
|
||||
- **Description**: “As is” (redacted_description.text)
|
||||
- **Vehicle Details**:
|
||||
- Make: Hyundai
|
||||
- Model: Tiburon
|
||||
@@ -323,41 +356,54 @@ Facebook Marketplace listing data is embedded in JSON within `<script>` tags in
|
||||
- **Messaging**: Enabled
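
The raw price data in the verified listing suggests how a numeric price could be derived. The sketch below is an assumption-laden illustration: the offset of 100 is inferred only from this single example (`"300000"` next to `"3000.00"`), and `toNumericPrice` is a hypothetical helper, not the logic in `parseFacebookItem()`.

```typescript
interface RawPrice {
  amount_with_offset: string;
  currency: string;
  amount: string;
}

// Assumption: two implied decimal places, inferred from the one example above.
const ASSUMED_OFFSET = 100;

function toNumericPrice(raw: RawPrice): { value: number; currency: string } {
  const fromAmount = Number(raw.amount);
  // Prefer the explicit decimal amount; fall back to the offset form if it is unusable.
  const usable = raw.amount.trim() !== "" && Number.isFinite(fromAmount);
  const value = usable ? fromAmount : Number(raw.amount_with_offset) / ASSUMED_OFFSET;
  return { value, currency: raw.currency };
}
```

Keeping the currency alongside the number avoids silently treating CAD amounts as another currency downstream.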
|
||||
|
||||
#### Current API Characteristics
|
||||
|
||||
- **Authentication**: Still requires valid Facebook session cookies
|
||||
- **Data Format**: Server-side rendered HTML with embedded GraphQL/Relay JSON
|
||||
- **Structure Stability**: Primary extraction path remains functional
|
||||
- **Additional Features**: Includes marketplace ratings, seller verification badges,
|
||||
cross-posting info
|
||||
|
||||
### API Changes Observed Since 2024 Documentation
|
||||
|
||||
- **Minimal Changes**: Core data structure largely unchanged
|
||||
- **Enhanced Fields**: Added more detailed vehicle specifications and seller profile
|
||||
information
|
||||
- **GraphQL Integration**: Deeper integration with Facebook’s GraphQL infrastructure
|
||||
- **Security Features**: Additional integrity checks and reporting mechanisms
|
||||
|
||||
### Multi-Category Testing Results (January 2026)
|
||||
|
||||
Successfully tested extraction across different listing categories:
|
||||
|
||||
#### 1. Vehicle Listings (Automotive)
|
||||
|
||||
- **Example**: 2006 Hyundai Tiburon (ID: 1226468515995685)
|
||||
- **Status**: ✅ Fully functional
|
||||
- **Data Extracted**: Complete vehicle specs, pricing, seller info, location coordinates
|
||||
- **Unique Fields**: vehicle_make_display_name, vehicle_odometer_data,
|
||||
vehicle_transmission_type, vehicle_exterior_color, vehicle_interior_color,
|
||||
vehicle_fuel_type
|
||||
|
||||
#### 2. Electronics Listings
|
||||
|
||||
- **Example**: Nintendo Switch (ID: 3903865769914262)
|
||||
- **Status**: ✅ Fully functional
|
||||
- **Data Extracted**: Title, price (CA$140), location (Toronto, ON), condition (Used -
|
||||
like new), seller (Yitao Hou)
|
||||
- **Category**: Electronics (category_id: 479353692612078)
|
||||
- **Notes**: Standard GroupCommerceProductItem structure applies
|
||||
|
||||
#### 3. Home Goods/Furniture Listings
|
||||
|
||||
- **Example**: Tabletop Mirror (cat not included) (ID: 1082389057290709)
|
||||
- **Status**: ✅ Fully functional
|
||||
- **Data Extracted**: Title, price (CA$5), location (Mississauga, ON), condition (Used -
|
||||
like new), seller (Rohit Rehan)
|
||||
- **Category**: Home Goods (category_id: 1569171756675761)
|
||||
- **Notes**: Includes detailed description and delivery options
|
||||
|
||||
#### Testing Summary
|
||||
|
||||
- **Extraction Method**: Consistent across all categories
|
||||
- **Data Structure**: GroupCommerceProductItem interface works for all listing types
|
||||
- **Authentication**: Required for all categories
|
||||
@@ -365,16 +411,20 @@ Successfully tested extraction across different listing categories:
|
||||
- **Edge Cases**: All tested listings were active/in-person pickup
|
||||
|
||||
## Implementation Status - COMPLETED (January 2026)
|
||||
|
||||
- ✅ Successfully reverse-engineered Facebook Marketplace API for listing details
|
||||
- ✅ Verified current API structure and extraction method (January 2026)
|
||||
- ✅ Tested extraction across multiple listing categories (vehicles, electronics, home
|
||||
goods)
|
||||
- ✅ Implemented comprehensive error handling for sold/removed listings and
|
||||
authentication failures
|
||||
- ✅ Enhanced rate limiting and retry logic (already robust)
|
||||
- ✅ Added monitoring and metrics for API stability detection
|
||||
- ✅ Updated all scraper functions to use verified extraction methods
|
||||
- ✅ Documented comprehensive GroupCommerceProductItem interface with real examples
|
||||
|
||||
## Next Steps (Future Maintenance)
|
||||
|
||||
1. Monitor extraction success rates for API change detection
|
||||
2. Update extraction paths if Facebook changes their API structure
|
||||
3. Add support for additional marketplace features as they become available
|
||||
|
||||
145
KIJIJI.md
@@ -1,9 +1,13 @@
|
||||
# Kijiji API Findings
|
||||
|
||||
## Overview
|
||||
|
||||
Kijiji is a Canadian classifieds marketplace that uses a modern web application built
|
||||
with Next.js and Apollo GraphQL. The search results are powered by a GraphQL API with
|
||||
client-side state management.
|
||||
|
||||
## Initial Page Load (Homepage)
|
||||
|
||||
- **URL**: https://www.kijiji.ca/
|
||||
- **Architecture**: Server-side rendered React application with Next.js
|
||||
- **Data Sources**:
|
||||
@@ -12,18 +16,27 @@ Kijiji is a Canadian classifieds marketplace that uses a modern web application
|
||||
- No initial API calls for listings - data appears to be embedded in HTML
|
||||
|
||||
## Search Results Page
|
||||
|
||||
- **URL Pattern**: `https://www.kijiji.ca/b-[location]/[keywords]/k0l0`
|
||||
- **Example**: `https://www.kijiji.ca/b-canada/iphone/k0l0`
|
||||
- **Technology Stack**: Next.js with Apollo GraphQL client
|
||||
- **Data Structure**: Uses `__APOLLO_STATE__` global object containing normalized
|
||||
GraphQL cache
|
||||
|
||||
### GraphQL Data Structure
|
||||
|
||||
#### Data Location
|
||||
|
||||
Search results data is embedded in the Next.js page props under
|
||||
`__NEXT_DATA__.props.pageProps.__APOLLO_STATE__`. The data is pre-rendered on the server
|
||||
and sent to the client.
|
||||
Each page (including pagination) has its own pre-rendered data.
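
A minimal sketch of reading that embedded state from a fetched page. It assumes the usual Next.js convention of a `<script id="__NEXT_DATA__">` tag with `id` as the first attribute; `readApolloState` is a hypothetical name and is not the `extractApolloState()` found in `src/kijiji.ts`.

```typescript
type ApolloState = Record<string, unknown>;

interface NextData {
  props?: { pageProps?: { __APOLLO_STATE__?: ApolloState } };
}

function readApolloState(html: string): ApolloState | null {
  // Assumption: the tag is written as <script id="__NEXT_DATA__" ...>.
  const match = /<script id="__NEXT_DATA__"[^>]*>([\s\S]*?)<\/script>/.exec(html);
  if (!match || match[1] === undefined) return null;
  try {
    const data = JSON.parse(match[1]) as NextData;
    return data.props?.pageProps?.__APOLLO_STATE__ ?? null;
  } catch {
    // Malformed JSON: treat the page as having no usable state.
    return null;
  }
}
```

Returning `null` for both a missing tag and malformed JSON keeps the caller's error path simple.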
|
||||
|
||||
#### Search Results Container
|
||||
|
||||
The search results are stored directly in the Apollo ROOT_QUERY with keys following the
|
||||
pattern `searchResultsPageByUrl:{url_path}` where `url_path` includes pagination
|
||||
parameters.
|
||||
|
||||
```json
|
||||
{
|
||||
@@ -33,17 +46,20 @@ The search results are stored directly in the Apollo ROOT_QUERY with keys follow
|
||||
```
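
To illustrate the key pattern, the sketch below collects every `ROOT_QUERY` entry whose key starts with `searchResultsPageByUrl`. The helper name `findSearchResultEntries` is hypothetical, and matching on the prefix alone is an assumption about how the keys are serialized.

```typescript
interface SearchResultEntry {
  key: string;
  value: unknown;
}

function findSearchResultEntries(state: Record<string, unknown>): SearchResultEntry[] {
  const root = state["ROOT_QUERY"];
  if (typeof root !== "object" || root === null) return [];
  const rootRecord = root as Record<string, unknown>;
  return Object.keys(rootRecord)
    // Prefix match only; the text after the prefix carries the URL path.
    .filter((key) => key.startsWith("searchResultsPageByUrl"))
    .map((key) => ({ key, value: rootRecord[key] }));
}
```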
|
||||
|
||||
#### Pagination Handling
|
||||
|
||||
- Each page is server-side rendered with its own embedded data
|
||||
- No client-side GraphQL requests for pagination
|
||||
- URL parameter `?page=N` controls which page data is embedded
|
||||
- Offset in searchString corresponds to `(page-1) * limit`
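
The offset relationship can be written as a small helper. `pageToOffset` is a hypothetical name; the default limit of 40 is the results-per-page figure recorded elsewhere in these notes.

```typescript
const RESULTS_PER_PAGE = 40;

// offset = (page - 1) * limit, with pages numbered from 1.
function pageToOffset(page: number, limit: number = RESULTS_PER_PAGE): number {
  if (!Number.isInteger(page) || page < 1) {
    throw new RangeError(`page must be a positive integer, got ${page}`);
  }
  return (page - 1) * limit;
}
```

With the default limit, page 1 maps to offset 0 and page 3 maps to offset 80.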
|
||||
|
||||
#### Search Parameters in URL
|
||||
|
||||
- `k0c{CATEGORY}l{LOCATION}` - Category and location IDs
|
||||
- `?page=N` - Page number (1-based)
|
||||
- Data contains `offset` and `limit` for API-style pagination
|
||||
|
||||
#### Individual Listing Structure
|
||||
|
||||
```json
|
||||
{
|
||||
"id": "1732061412",
|
||||
@@ -90,6 +106,7 @@ The search results are stored directly in the Apollo ROOT_QUERY with keys follow
|
||||
```
|
||||
|
||||
### URL Parameters
|
||||
|
||||
- `sort=MATCH` - Sort by relevance
|
||||
- `order=DESC` - Descending order
|
||||
- `type=OFFER` - Show offerings (not wanted ads)
|
||||
@@ -102,6 +119,7 @@ The search results are stored directly in the Apollo ROOT_QUERY with keys follow
|
||||
- `eaTopAdPosition=1` - ?
|
||||
|
||||
### Image API
|
||||
|
||||
- **Endpoint**: `https://media.kijiji.ca/api/v1/`
|
||||
- **Pattern**: `/ca-prod-fsbo-ads/images/{uuid}?rule=kijijica-{size}-jpg`
|
||||
- **Sizes**: 200, 300, 400, 500 pixels
|
||||
@@ -109,10 +127,12 @@ The search results are stored directly in the Apollo ROOT_QUERY with keys follow
|
||||
### Categories and Locations
|
||||
|
||||
#### Category Structure
|
||||
|
||||
Categories are hierarchical with parent-child relationships.
|
||||
The main categories under “Buy & Sell” include:
|
||||
|
||||
| ID | Name | Total Results (iPhone search) |
|
||||
| --- | --- | --- |
|
||||
| 10 | Buy & Sell | 19956 |
|
||||
| 12 | Arts & Collectibles | 149 |
|
||||
| 767 | Audio | 481 |
|
||||
@@ -145,10 +165,11 @@ Categories are hierarchical with parent-child relationships. The main categories
|
||||
| 26 | Other | 286 |
|
||||
|
||||
#### Location Structure
|
||||
|
||||
Locations are also hierarchical, with provinces under the main “Canada” location:
|
||||
|
||||
| ID | Name | Total Results (iPhone search) |
|
||||
| --- | --- | --- |
|
||||
| 0 | Canada | - |
|
||||
| 9001 | Québec | 2516 |
|
||||
| 9002 | Nova Scotia | 875 |
|
||||
@@ -163,16 +184,20 @@ Locations are also hierarchical, with provinces/states under the main "Canada" l
|
||||
| 9011 | Prince Edward Island | 31 |
|
||||
|
||||
#### URL Patterns
|
||||
|
||||
- Categories: `/b-{category-slug}/canada/{keywords}/k0c{CATEGORY_ID}l0`
|
||||
- Locations: `/b-buy-sell/{location-slug}/iphone/k0c10l{LOCATION_ID}`
|
||||
- Combined:
|
||||
`/b-{category-slug}/{location-slug}/{keywords}/k0c{CATEGORY_ID}l{LOCATION_ID}`
|
||||
|
||||
### Pagination
|
||||
|
||||
- Uses offset-based pagination
|
||||
- 40 results per page
|
||||
- Total count provided in pagination metadata
|
||||
|
||||
## Authentication & User Management
|
||||
|
||||
- **Authentication System**: OAuth2-based using CIS (Customer Identity Service)
|
||||
- **Identity Provider**: `id.kijiji.ca`
|
||||
- **OAuth2 Flow**:
|
||||
@@ -184,24 +209,30 @@ Locations are also hierarchical, with provinces/states under the main "Canada" l
|
||||
- **User Features**: Saved searches, messaging, flagging require authentication
|
||||
|
||||
## Posting API
|
||||
|
||||
- **Posting Flow**: Requires authentication, redirects to login if not authenticated
|
||||
- **Posting URL**: `https://www.kijiji.ca/p-post-ad.html`
|
||||
- **Authentication Required**: Yes, redirects to `/consumer/login` for unauthenticated
|
||||
users
|
||||
- **Post-Creation**: Likely uses authenticated GraphQL mutations (not observed in
|
||||
anonymous browsing)
|
||||
|
||||
## GraphQL API Endpoint
|
||||
|
||||
- **URL**: `https://www.kijiji.ca/anvil/api`
|
||||
- **Method**: POST
|
||||
- **Content-Type**: application/json
|
||||
- **Headers**:
|
||||
- `apollo-require-preflight: true`
|
||||
- Standard CORS headers
|
||||
- **Authentication**: No authentication required for basic queries (uses cookies for
|
||||
session tracking)
|
||||
- **Technology**: Apollo GraphQL server
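
A sketch of how a request to this endpoint might be assembled from the details above. The `{ operationName, query, variables }` body is the conventional GraphQL-over-HTTP shape and is an assumption here, since the exact payload is not recorded in these notes; `buildGraphqlRequest` is a hypothetical helper.

```typescript
interface GraphqlRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

const KIJIJI_GRAPHQL_URL = "https://www.kijiji.ca/anvil/api";

function buildGraphqlRequest(
  operationName: string,
  query: string,
  variables: Record<string, unknown> = {},
): GraphqlRequest {
  return {
    url: KIJIJI_GRAPHQL_URL,
    init: {
      method: "POST",
      headers: {
        "content-type": "application/json",
        "apollo-require-preflight": "true",
      },
      // Assumed body shape; adjust if the endpoint expects something else.
      body: JSON.stringify({ operationName, query, variables }),
    },
  };
}
```

The result can be passed straight to `fetch(request.url, request.init)`; building it separately keeps the payload easy to inspect without sending anything.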
|
||||
|
||||
### Sample GraphQL Queries Discovered
|
||||
|
||||
#### Get Search Categories
|
||||
|
||||
```graphql
|
||||
query getSearchCategories($locale: String!) {
|
||||
searchCategories {
|
||||
@@ -218,6 +249,7 @@ Variables: `{"locale": "en-CA"}`
|
||||
Response includes hierarchical category structure with IDs and localized names.
|
||||
|
||||
#### Get Geocode from IP (fails for current IP)
|
||||
|
||||
```graphql
|
||||
query GetGeocodeReverseFromIp {
|
||||
geocodeReverseFromIp {
|
||||
@@ -229,9 +261,11 @@ query GetGeocodeReverseFromIp {
|
||||
}
|
||||
```
|
||||
|
||||
This query fails for the current IP address, suggesting geolocation-based features may
|
||||
not work or require different IP ranges.
|
||||
|
||||
#### Get Category Path
|
||||
|
||||
```graphql
|
||||
query GetCategoryPath($categoryId: Int!, $locale: String, $locationId: Int) {
|
||||
category(id: $categoryId) {
|
||||
@@ -256,25 +290,33 @@ Variables: `{"categoryId": 10, "locationId": 0, "locale": "en-CA"}`
|
||||
## Latest Findings (2026-01-21)
|
||||
|
||||
### Client-Side GraphQL Queries Observed
|
||||
|
||||
- **getSearchCategories**: Retrieves category hierarchy for search filters
|
||||
- **GetGeocodeReverseFromIp**: Attempts to geolocate user (fails for current IP)
|
||||
|
||||
### GraphQL Schema Insights
|
||||
|
||||
Testing direct GraphQL queries revealed:
|
||||
- Field “searchResults” does not exist on Query type
|
||||
- Suggested alternatives: “searchResultsPage” or “searchUrl”
|
||||
- This suggests the search functionality may use different GraphQL operations than
|
||||
direct queries
|
||||
|
||||
The embedded Apollo state approach appears to be the primary method for accessing search
|
||||
data, with GraphQL used for auxiliary operations like categories and geolocation.
|
||||
|
||||
### Server-Side Rendering Architecture
|
||||
|
||||
Search results are fully server-side rendered with data embedded in HTML. Each page
|
||||
(including pagination) contains its own pre-rendered data.
|
||||
No client-side GraphQL requests are made for:
|
||||
|
||||
- Initial search results
|
||||
- Pagination navigation
|
||||
- Search result data
|
||||
|
||||
### Network Analysis Findings
|
||||
|
||||
- GraphQL endpoint: `https://www.kijiji.ca/anvil/api`
|
||||
- Method: POST
|
||||
- Content-Type: application/json
|
||||
@@ -282,7 +324,10 @@ Search results are fully server-side rendered with data embedded in HTML. Each p
|
||||
- Cookies required for session tracking
|
||||
|
||||
### Embedded Data Structure
|
||||
|
||||
Search results data is embedded in the HTML within Next.js
|
||||
`__NEXT_DATA__.props.pageProps.__APOLLO_STATE__` object.
|
||||
The data includes:
|
||||
|
||||
- Individual ad listings with complete metadata
|
||||
- Pagination information
|
||||
@@ -290,20 +335,24 @@ Search results data is embedded in the HTML within Next.js `__NEXT_DATA__.props.
|
||||
- Category/location hierarchies
|
||||
|
||||
### Current Scraper Implementation
|
||||
|
||||
The existing `src/kijiji.ts` implementation correctly parses the embedded Apollo state:
|
||||
|
||||
- Uses `extractApolloState()` to parse `__NEXT_DATA__` from HTML
|
||||
- Filters Apollo keys containing “Listing” to find ad data
|
||||
- Extracts `url`, `title`, and other metadata from each listing
|
||||
- Successfully scrapes listings without needing API authentication
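
The key filtering described above can be sketched as follows. This is not the code in `src/kijiji.ts`; `collectListings` and the `ListingSummary` shape are hypothetical, and only the `url` and `title` fields named in these notes are read.

```typescript
interface ListingSummary {
  cacheKey: string;
  url: string;
  title: string;
}

function collectListings(state: Record<string, unknown>): ListingSummary[] {
  const listings: ListingSummary[] = [];
  for (const cacheKey of Object.keys(state)) {
    // Keep only cache entries whose key mentions "Listing".
    if (!cacheKey.includes("Listing")) continue;
    const entry = state[cacheKey];
    if (typeof entry !== "object" || entry === null) continue;
    const { url, title } = entry as { url?: unknown; title?: unknown };
    // Skip entries that lack the two string fields we rely on.
    if (typeof url === "string" && typeof title === "string") {
      listings.push({ cacheKey, url, title });
    }
  }
  return listings;
}
```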
|
||||
|
||||
### Authentication Status
|
||||
|
||||
- **Search functionality**: No authentication required - all search and listing data
|
||||
accessible anonymously
|
||||
- **Posting functionality**: Requires authentication (redirects to login)
|
||||
- **User features**: Saved searches, messaging require authentication
|
||||
- **Rate limiting**: May apply but not observed in anonymous browsing
|
||||
|
||||
### Pagination Implementation
|
||||
|
||||
- Each page is a separate server-rendered route
|
||||
- URL pattern: `/b-{location}/{keywords}/page-{number}/k0{category}l{location_id}`
|
||||
- No client-side pagination API calls
|
||||
@@ -313,20 +362,24 @@ The existing `src/kijiji.ts` implementation correctly parses the embedded Apollo
|
||||
## URL Pattern Analysis
|
||||
|
||||
### Search URL Structure
|
||||
|
||||
`https://www.kijiji.ca/b-{category_slug}/{location_slug}/{keywords}/k0c{category_id}l{location_id}`
|
||||
|
||||
#### Examples Observed:
|
||||
|
||||
- All categories, Canada: `/b-canada/iphone/k0l0` (c0 = All Categories, l0 = Canada)
|
||||
- Cell phones category: `/b-cell-phones/canada/iphone/k0c132l0` (c132 = Cell Phones)
|
||||
- With pagination: `/b-canada/iphone/page-2/k0l0`
|
||||
|
||||
#### URL Components:
|
||||
|
||||
- `c{CATEGORY_ID}`: Category ID (0 = All Categories, 132 = Cell Phones, etc.)
|
||||
- `l{LOCATION_ID}`: Location ID (0 = Canada, 1700272 = GTA, etc.)
|
||||
- `page-{N}`: Pagination (1-based, optional)
|
||||
- Keywords are slugified in URL path
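
The URL components above can be combined into a path builder along these lines. The slug rule (lowercase, runs of other characters collapsed to `-`) is an assumption, as is placing `page-{N}` immediately before the `k0...` segment, which follows the one paginated example observed. `buildKijijiSearchPath` is a hypothetical helper covering the case where a category slug and ID are present.

```typescript
interface KijijiSearch {
  categorySlug: string;
  locationSlug: string;
  keywords: string;
  categoryId: number;
  locationId: number;
  page?: number; // 1-based; omitted or 1 means the first page
}

// Assumed slug rule: lowercase, non-alphanumeric runs become a single hyphen.
function slugifyKeywords(keywords: string): string {
  return keywords
    .trim()
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

function buildKijijiSearchPath(search: KijijiSearch): string {
  const pageSegment = search.page !== undefined && search.page > 1 ? `page-${search.page}/` : "";
  return (
    `/b-${search.categorySlug}/${search.locationSlug}/${slugifyKeywords(search.keywords)}/` +
    `${pageSegment}k0c${search.categoryId}l${search.locationId}`
  );
}
```

For the Cell Phones example above, this yields `/b-cell-phones/canada/iphone/k0c132l0`.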
|
||||
|
||||
### Current Implementation Status
|
||||
|
||||
The existing scraper in `src/kijiji.ts` successfully implements the approach:
|
||||
- Parses embedded Apollo state from HTML responses
|
||||
- Handles rate limiting and retries
|
||||
@@ -336,14 +389,22 @@ The existing scraper in `src/kijiji.ts` successfully implements the approach:
|
||||
## Listing Details Page
|
||||
|
||||
### Overview
|
||||
|
||||
Similar to search results, listing details pages use server-side rendering with embedded
|
||||
Apollo GraphQL state in the HTML. No dedicated API endpoint serves individual listing
|
||||
data - all information is pre-rendered on the server.
|
||||
|
||||
### Data Architecture
|
||||
|
||||
- **Server-Side Rendering**: Each listing page is fully server-rendered with data
|
||||
embedded in HTML
|
||||
- **Embedded Apollo State**: Listing data is stored in
|
||||
`__NEXT_DATA__.props.pageProps.__APOLLO_STATE__`
|
||||
- **Client-Side GraphQL**: Additional data (categories, campaigns, similar listings,
|
||||
user profiles) fetched via GraphQL API
|
||||
|
||||
### Listing Data Structure
|
||||
|
||||
The main listing data follows the same pattern as search results:
|
||||
|
||||
```json
|
||||
@@ -385,40 +446,50 @@ The main listing data follows the same pattern as search results:
|
||||
```
|
||||
|
||||
### Client-Side GraphQL Queries
|
||||
|
||||
When loading a listing details page, the following GraphQL queries are executed:
|
||||
|
||||
#### 1. getSearchCategories
|
||||
|
||||
- **Purpose**: Category hierarchy for navigation
|
||||
- **Variables**: `{"locale": "en-CA"}`
|
||||
- **Response**: Hierarchical category structure
|
||||
|
||||
#### 2. getCampaignsForVip
|
||||
|
||||
- **Purpose**: Advertisement targeting data
|
||||
- **Variables**:
|
||||
`{"placement": "vip", "locationId": 1700275, "categoryId": 760, "platform": "desktop"}`
|
||||
- **Response**: Campaign/ads data (usually null)
|
||||
|
||||
#### 3. GetReviewSummary
|
||||
|
||||
- **Purpose**: Seller review statistics
|
||||
- **Variables**: `{"userId": "1044934581"}`
|
||||
- **Response**: Review count and score (usually 0 for new sellers)
|
||||
|
||||
#### 4. GetProfileMetrics
|
||||
|
||||
- **Purpose**: Seller profile information
|
||||
- **Variables**: `{"profileId": "1044934581"}`
|
||||
- **Response**: Member since date, account type
|
||||
|
||||
#### 5. GetListingsSimilar
|
||||
|
||||
- **Purpose**: Similar listings for cross-selling
|
||||
- **Variables**: `{"listingId": "1705585530", "limit": 10, "isExternalId": false}`
|
||||
- **Response**: Array of similar listings with basic metadata
|
||||
|
||||
#### 6. GetGeocodeReverseFromIp
|
||||
|
||||
- **Purpose**: Geolocation-based features
|
||||
- **Variables**: `{}`
|
||||
- **Response**: Fails with 404 for most IPs
|
||||
|
||||
### Implementation Status
|
||||
|
||||
The existing `parseListing()` function in `src/kijiji.ts` successfully extracts listing
|
||||
details from embedded Apollo state:
|
||||
|
||||
- ✅ Extracts title, description, price, location
|
||||
- ✅ Handles contact-based pricing ("Please Contact")
|
||||
@@ -427,22 +498,30 @@ The existing `parseListing()` function in `src/kijiji.ts` successfully extracts
|
||||
- ✅ Works without authentication or API keys
|
||||
|
||||
### Key Findings
|
||||
1. **No Dedicated Listing API**: Unlike search results, there's no separate GraphQL query for individual listing data
|
||||
2. **Complete Data Available**: All listing information is embedded in the initial HTML response
|
||||
3. **Additional Context Fetched**: Secondary GraphQL queries provide complementary data (reviews, similar listings)
|
||||
|
||||
1. **No Dedicated Listing API**: Unlike search results, there’s no separate GraphQL
|
||||
query for individual listing data
|
||||
2. **Complete Data Available**: All listing information is embedded in the initial HTML
|
||||
response
|
||||
3. **Additional Context Fetched**: Secondary GraphQL queries provide complementary data
|
||||
(reviews, similar listings)
|
||||
4. **Consistent Architecture**: Same Apollo state embedding pattern as search pages
|
||||
|
||||
### Current Scraper Implementation
|
||||
|
||||
The scraper successfully extracts listing details by:
|
||||
1. Fetching the listing URL HTML
|
||||
2. Parsing embedded `__NEXT_DATA__` Apollo state
|
||||
3. Extracting the `Listing:{id}` object from Apollo cache
|
||||
4. Mapping fields to typed `ListingDetails` interface
|
||||
|
||||
This approach works reliably without requiring authentication or dealing with rate limiting on individual listing fetches.
|
||||
This approach works reliably without requiring authentication or dealing with rate
|
||||
limiting on individual listing fetches.
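
The four steps above can be sketched roughly as follows. This is a minimal illustration, not the project's `parseListing()`: the helper names are hypothetical, and because the exact location of the Apollo cache inside `__NEXT_DATA__` is not stated here, the sketch searches the parsed JSON for the `Listing:{id}` key instead of assuming a fixed path.

```typescript
// Hypothetical helpers for illustration; not the real parseListing().
type Json = Record<string, unknown>;

/** Pull the JSON payload out of the <script id="__NEXT_DATA__"> tag. */
function extractNextData(html: string): unknown {
  const match = html.match(
    /<script[^>]*id="__NEXT_DATA__"[^>]*>([\s\S]*?)<\/script>/,
  );
  const json = match?.[1];
  if (json === undefined) throw new Error("__NEXT_DATA__ script tag not found");
  return JSON.parse(json);
}

/** Depth-first search for an object that holds the given cache key. */
function findCacheEntry(node: unknown, key: string): Json | undefined {
  if (typeof node !== "object" || node === null) return undefined;
  const record = node as Json;
  const hit = record[key];
  if (typeof hit === "object" && hit !== null) return hit as Json;
  for (const child of Object.values(record)) {
    const found = findCacheEntry(child, key);
    if (found) return found;
  }
  return undefined;
}

/** Steps 2 and 3: parse the embedded state, then pick out `Listing:{id}`. */
function extractListing(html: string, id: string): Json | undefined {
  return findCacheEntry(extractNextData(html), `Listing:${id}`);
}
```

Mapping the returned object onto the typed `ListingDetails` interface (step 4) is left out, since the field names in the cache entry are not documented in this section.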
|
||||
|
||||
## Next Steps
|
||||
|
||||
- Explore posting/authentication APIs (requires user login)
|
||||
- Investigate if GraphQL API can be used for programmatic access with proper authentication
|
||||
- Investigate if GraphQL API can be used for programmatic access with proper
|
||||
authentication
|
||||
- Test rate limiting patterns and optimal scraping strategies
|
||||
- Document additional category and location ID mappings
|
||||
|
||||
@@ -1,14 +1,21 @@
|
||||
# opencode Monorepo Config Adoption Implementation Plan
|
||||
|
||||
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
|
||||
> **For agentic workers:** REQUIRED SUB-SKILL: Use
|
||||
> superpowers:subagent-driven-development (recommended) or superpowers:executing-plans
|
||||
> to implement this plan task-by-task.
|
||||
> Steps use checkbox (`- [ ]`) syntax for tracking.
|
||||
|
||||
**Goal:** Adopt opencode-style monorepo config: Turbo task orchestration, workspace dep catalog, shared root tsconfig, bunfig.toml, and `exports` field in all packages.
|
||||
**Goal:** Adopt opencode-style monorepo config: Turbo task orchestration, workspace dep
|
||||
catalog, shared root tsconfig, bunfig.toml, and `exports` field in all packages.
|
||||
|
||||
**Architecture:** Pure config changes across 10 files — no source code touched. Root config files are added/updated first, then per-package files updated to reference them. Changes are independent within each task and safe to commit atomically.
|
||||
**Architecture:** Pure config changes across 10 files — no source code touched.
|
||||
Root config files are added/updated first, then per-package files updated to reference
|
||||
them. Changes are independent within each task and safe to commit atomically.
|
||||
|
||||
**Tech Stack:** Bun workspaces, Turbo 2.x, @tsconfig/bun, TypeScript (tsgo / @typescript/native-preview)
|
||||
**Tech Stack:** Bun workspaces, Turbo 2.x, @tsconfig/bun, TypeScript (tsgo /
|
||||
@typescript/native-preview)
|
||||
|
||||
---
|
||||
* * *
|
||||
|
||||
## File Map
|
||||
|
||||
@@ -25,14 +32,16 @@
|
||||
| `packages/api-server/tsconfig.json` | Modify | Slim — extends root, paths only |
|
||||
| `packages/mcp-server/tsconfig.json` | Modify | Slim — extends root, paths only |
|
||||
|
||||
---
|
||||
* * *
|
||||
|
||||
### Task 1: Add `bunfig.toml` and `turbo.json`
|
||||
|
||||
Two new root config files with no dependencies on other tasks.
|
||||
|
||||
**Files:**
|
||||
|
||||
- Create: `bunfig.toml`
|
||||
|
||||
- Create: `turbo.json`
|
||||
|
||||
- [ ] **Step 1: Create `bunfig.toml`**
|
||||
@@ -83,13 +92,15 @@ git add bunfig.toml turbo.json
|
||||
git commit -m "chore: add bunfig.toml and turbo.json"
|
||||
```
|
||||
|
||||
---
|
||||
* * *
|
||||
|
||||
### Task 2: Create root `tsconfig.json`
|
||||
|
||||
Shared base tsconfig all packages will extend. Extracts the common options currently duplicated in all 3 per-package tsconfigs.
|
||||
Shared base tsconfig all packages will extend.
|
||||
Extracts the common options currently duplicated in all 3 per-package tsconfigs.
|
||||
|
||||
**Files:**
|
||||
|
||||
- Create: `tsconfig.json`
|
||||
|
||||
- [ ] **Step 1: Create root `tsconfig.json`**
|
||||
@@ -130,13 +141,15 @@ git add tsconfig.json
|
||||
git commit -m "chore: add shared root tsconfig.json"
|
||||
```
|
||||
|
||||
---
|
||||
* * *
|
||||
|
||||
### Task 3: Update root `package.json`
|
||||
|
||||
Add workspace catalog, `turbo` + `@tsconfig/bun` devDependencies, and update scripts to use `turbo run`.
|
||||
Add workspace catalog, `turbo` + `@tsconfig/bun` devDependencies, and update scripts to
|
||||
use `turbo run`.
|
||||
|
||||
**Files:**
|
||||
|
||||
- Modify: `package.json`
|
||||
|
||||
- [ ] **Step 1: Replace root `package.json`**
|
||||
@@ -180,7 +193,11 @@ Write this complete file:
|
||||
}
|
||||
```
|
||||
|
||||
> **Note on catalog versions:** The catalog pins exact versions. The values above are taken from the current package installs. If `@types/bun` was `latest`, check `node_modules/@types/bun/package.json` for the actual installed version and use that. Same for `@typescript/native-preview`.
|
||||
> **Note on catalog versions:** The catalog pins exact versions.
|
||||
> The values above are taken from the current package installs.
|
||||
> If `@types/bun` was `latest`, check `node_modules/@types/bun/package.json` for the
|
||||
> actual installed version and use that.
|
||||
> Same for `@typescript/native-preview`.
|
||||
|
||||
- [ ] **Step 2: Check actual installed versions**
|
||||
|
||||
@@ -208,7 +225,8 @@ Expected: lock file updated, `turbo` and `@tsconfig/bun` appear in `node_modules
|
||||
bunx turbo run typecheck --dry
|
||||
```
|
||||
|
||||
Expected: output lists the `typecheck` task for each package (even if no `typecheck` script exists yet — turbo will note them as skipped/missing).
|
||||
Expected: output lists the `typecheck` task for each package (even if no `typecheck`
|
||||
script exists yet — turbo will note them as skipped/missing).
|
||||
|
||||
- [ ] **Step 5: Commit**
|
||||
|
||||
@@ -217,15 +235,19 @@ git add package.json bun.lock
|
||||
git commit -m "chore: add workspace catalog and turbo to root package.json"
|
||||
```
|
||||
|
||||
---
|
||||
* * *
|
||||
|
||||
### Task 4: Update per-package `package.json` files
|
||||
|
||||
Rename `type:check` → `typecheck`, replace `main`/`module` with `exports`, swap pinned dep versions for `catalog:` references.
|
||||
Rename `type:check` → `typecheck`, replace `main`/`module` with `exports`, swap pinned
|
||||
dep versions for `catalog:` references.
|
||||
|
||||
**Files:**
|
||||
|
||||
- Modify: `packages/core/package.json`
|
||||
|
||||
- Modify: `packages/api-server/package.json`
|
||||
|
||||
- Modify: `packages/mcp-server/package.json`
|
||||
|
||||
- [ ] **Step 1: Replace `packages/core/package.json`**
|
||||
@@ -325,7 +347,9 @@ Rename `type:check` → `typecheck`, replace `main`/`module` with `exports`, swa
|
||||
bun install
|
||||
```
|
||||
|
||||
Expected: no errors. Catalog refs resolved. `bun.lock` updated.
|
||||
Expected: no errors.
|
||||
Catalog refs resolved.
|
||||
`bun.lock` updated.
|
||||
|
||||
- [ ] **Step 5: Verify typecheck still works per-package**
|
||||
|
||||
@@ -345,15 +369,19 @@ git add packages/core/package.json packages/api-server/package.json packages/mcp
|
||||
git commit -m "chore: use exports field and catalog refs in all packages"
|
||||
```
|
||||
|
||||
---
|
||||
* * *
|
||||
|
||||
### Task 5: Slim per-package `tsconfig.json` files
|
||||
|
||||
Replace the duplicated full tsconfig in each package with a slim `extends`-based one pointing to root.
|
||||
Replace the duplicated full tsconfig in each package with a slim `extends`-based one
|
||||
pointing to root.
|
||||
|
||||
**Files:**
|
||||
|
||||
- Modify: `packages/core/tsconfig.json`
|
||||
|
||||
- Modify: `packages/api-server/tsconfig.json`
|
||||
|
||||
- Modify: `packages/mcp-server/tsconfig.json`
|
||||
|
||||
- [ ] **Step 1: Replace `packages/core/tsconfig.json`**
|
||||
@@ -400,7 +428,8 @@ Replace the duplicated full tsconfig in each package with a slim `extends`-based
|
||||
|
||||
- [ ] **Step 4: Verify `@tsconfig/bun` is resolvable**
|
||||
|
||||
The root tsconfig extends `@tsconfig/bun/tsconfig.json`. Confirm the package is installed:
|
||||
The root tsconfig extends `@tsconfig/bun/tsconfig.json`. Confirm the package is
|
||||
installed:
|
||||
|
||||
```bash
|
||||
ls node_modules/@tsconfig/bun/tsconfig.json
|
||||
@@ -414,7 +443,8 @@ Expected: file exists.
|
||||
bun run typecheck
|
||||
```
|
||||
|
||||
Expected: Turbo runs `typecheck` for all 3 packages in parallel, all pass (or same pre-existing errors — no new ones).
|
||||
Expected: Turbo runs `typecheck` for all 3 packages in parallel, all pass (or same
|
||||
pre-existing errors — no new ones).
|
||||
|
||||
- [ ] **Step 6: Commit**
|
||||
|
||||
@@ -423,7 +453,7 @@ git add packages/core/tsconfig.json packages/api-server/tsconfig.json packages/m
|
||||
git commit -m "chore: slim per-package tsconfigs to extend root"
|
||||
```
|
||||
|
||||
---
|
||||
* * *
|
||||
|
||||
### Task 6: Smoke test full build pipeline
|
||||
|
||||
@@ -437,7 +467,8 @@ Verify everything works end-to-end.
|
||||
bun run typecheck
|
||||
```
|
||||
|
||||
Expected: Turbo runs `typecheck` across all packages. Exit 0.
|
||||
Expected: Turbo runs `typecheck` across all packages.
|
||||
Exit 0.
|
||||
|
||||
- [ ] **Step 2: Run full build**
|
||||
|
||||
@@ -445,7 +476,8 @@ Expected: Turbo runs `typecheck` across all packages. Exit 0.
|
||||
bun run build
|
||||
```
|
||||
|
||||
Expected: `dist/` cleaned, Turbo runs `build` (core first, then api-server and mcp-server in parallel), build artifacts appear in `dist/api/` and `dist/mcp/`.
|
||||
Expected: `dist/` cleaned, Turbo runs `build` (core first, then api-server and
|
||||
mcp-server in parallel), build artifacts appear in `dist/api/` and `dist/mcp/`.
|
||||
|
||||
- [ ] **Step 3: Verify dist artifacts**
|
||||
|
||||
@@ -461,7 +493,9 @@ Expected: compiled output files in both directories.
|
||||
grep -c '\^' bun.lock
|
||||
```
|
||||
|
||||
With `exact = true` in bunfig.toml, new installs won't add `^` ranges. Existing `^` ranges in `bun.lock` from before are fine — they'll be resolved to exact on next fresh install.
|
||||
With `exact = true` in bunfig.toml, new installs won’t add `^` ranges.
|
||||
Existing `^` ranges in `bun.lock` from before are fine — they’ll be resolved to exact on
|
||||
next fresh install.
|
||||
|
||||
- [ ] **Step 5: Final commit if any loose files**
@@ -1,53 +1,64 @@
|
||||
# Cookie Env-Only Implementation Plan
|
||||
|
||||
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
|
||||
> **For agentic workers:** REQUIRED SUB-SKILL: Use
|
||||
> superpowers:subagent-driven-development (recommended) or superpowers:executing-plans
|
||||
> to implement this plan task-by-task.
|
||||
> Steps use checkbox (`- [ ]`) syntax for tracking.
|
||||
|
||||
**Goal:** Remove cookie files and request-provided cookie overrides so all authenticated marketplace scraping reads raw `Cookie` header strings only from environment variables.
|
||||
**Goal:** Remove cookie files and request-provided cookie overrides so all authenticated
|
||||
marketplace scraping reads raw `Cookie` header strings only from environment variables.
|
||||
|
||||
**Architecture:** Collapse shared cookie loading to a single env-var reader in `packages/core/src/utils/cookies.ts`, then tighten Facebook and eBay core signatures to stop accepting request/file cookie inputs. Update the API and MCP adapters so they no longer advertise or forward cookie parameters, and rewrite docs/tests to match the env-only contract.
|
||||
**Architecture:** Collapse shared cookie loading to a single env-var reader in
|
||||
`packages/core/src/utils/cookies.ts`, then tighten Facebook and eBay core signatures to
|
||||
stop accepting request/file cookie inputs.
|
||||
Update the API and MCP adapters so they no longer advertise or forward cookie
|
||||
parameters, and rewrite docs/tests to match the env-only contract.
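
As a rough sketch of what a single env-var reader could look like: the names below (`EnvCookieSource`, `readCookieHeader`) are hypothetical and are not the plan's `ensureCookies(config)`. The environment is passed in as a plain object so the function stays easy to test, and the sketch shows only the strict variant that rejects a missing variable.

```typescript
// Hypothetical names for illustration; the plan's shared loader is ensureCookies(config).
interface EnvCookieSource {
  service: string; // label used in error messages, e.g. "Facebook"
  envVar: string; // e.g. "FACEBOOK_COOKIE"
}

type Env = Record<string, string | undefined>;

/** Return the raw Cookie header string from the environment, or throw. */
function readCookieHeader(source: EnvCookieSource, env: Env): string {
  const raw = env[source.envVar]?.trim();
  if (!raw) {
    throw new Error(
      `${source.service} cookies missing: set ${source.envVar} to a raw Cookie header string`,
    );
  }
  // A Cookie header is a list of name=value pairs separated by ";".
  const hasPair = raw.split(";").some((part) => /^[^=\s]+=/.test(part.trim()));
  if (!hasPair) {
    throw new Error(`${source.envVar} does not look like a Cookie header`);
  }
  return raw;
}
```

A caller would pass `process.env` as the second argument. Note that the eBay task in this plan expects a warning rather than a failure when `EBAY_COOKIE` is unset, so a real implementation would need a non-throwing path as well.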
|
||||
|
||||
**Tech Stack:** Bun, TypeScript, Bun test, Biome, workspace package exports
|
||||
|
||||
---
|
||||
* * *
|
||||
|
||||
## File Map
|
||||
|
||||
- Modify: `packages/core/src/utils/cookies.ts`
|
||||
Purpose: remove JSON/file/request-source loading and keep env-only cookie parsing/formatting.
|
||||
- Modify: `packages/core/src/scrapers/facebook.ts`
|
||||
Purpose: drop `cookiesSource` / `cookiePath` arguments and switch to env-only error text.
|
||||
- Modify: `packages/core/src/scrapers/ebay.ts`
|
||||
Purpose: remove `opts.cookies` request override and use env-only cookie loading.
|
||||
- Modify: `packages/core/src/index.ts`
|
||||
Purpose: keep exports aligned with tightened core signatures.
|
||||
- Modify: `packages/core/test/facebook-core.test.ts`
|
||||
Purpose: replace missing-file coverage with env-only auth tests.
|
||||
- Create: `packages/core/test/ebay-core.test.ts`
|
||||
Purpose: add dedicated eBay auth regression coverage instead of mixing it into Facebook tests.
|
||||
- Modify: `packages/api-server/src/routes/facebook.ts`
|
||||
Purpose: stop parsing/forwarding `cookies` query params.
|
||||
- Modify: `packages/api-server/src/routes/ebay.ts`
|
||||
Purpose: stop parsing/forwarding `cookies` query params.
|
||||
- Create: `packages/api-server/test/routes.test.ts`
|
||||
Purpose: verify Facebook/eBay routes ignore cookie query params and still call core correctly.
|
||||
- Modify: `packages/mcp-server/src/protocol/tools.ts`
|
||||
Purpose: remove Facebook/eBay cookie tool inputs and descriptions.
|
||||
- Modify: `packages/mcp-server/src/protocol/handler.ts`
|
||||
Purpose: stop mapping removed cookie tool inputs into API URLs.
|
||||
- Create: `packages/mcp-server/test/protocol.test.ts`
|
||||
Purpose: verify tool schemas and handler URL building no longer include Facebook/eBay cookie fields.
|
||||
- Modify: `cookies/AGENTS.md`
|
||||
Purpose: document env vars as the only supported cookie input.
|
||||
- Modify: `packages/core/src/utils/cookies.ts` Purpose: remove JSON/file/request-source
|
||||
loading and keep env-only cookie parsing/formatting.
|
||||
- Modify: `packages/core/src/scrapers/facebook.ts` Purpose: drop `cookiesSource` /
|
||||
`cookiePath` arguments and switch to env-only error text.
|
||||
- Modify: `packages/core/src/scrapers/ebay.ts` Purpose: remove `opts.cookies` request
|
||||
override and use env-only cookie loading.
|
||||
- Modify: `packages/core/src/index.ts` Purpose: keep exports aligned with tightened core
|
||||
signatures.
|
||||
- Modify: `packages/core/test/facebook-core.test.ts` Purpose: replace missing-file
|
||||
coverage with env-only auth tests.
|
||||
- Create: `packages/core/test/ebay-core.test.ts` Purpose: add dedicated eBay auth
|
||||
regression coverage instead of mixing it into Facebook tests.
|
||||
- Modify: `packages/api-server/src/routes/facebook.ts` Purpose: stop parsing/forwarding
|
||||
`cookies` query params.
|
||||
- Modify: `packages/api-server/src/routes/ebay.ts` Purpose: stop parsing/forwarding
|
||||
`cookies` query params.
|
||||
- Create: `packages/api-server/test/routes.test.ts` Purpose: verify Facebook/eBay routes
|
||||
ignore cookie query params and still call core correctly.
|
||||
- Modify: `packages/mcp-server/src/protocol/tools.ts` Purpose: remove Facebook/eBay
|
||||
cookie tool inputs and descriptions.
|
||||
- Modify: `packages/mcp-server/src/protocol/handler.ts` Purpose: stop mapping removed
|
||||
cookie tool inputs into API URLs.
|
||||
- Create: `packages/mcp-server/test/protocol.test.ts` Purpose: verify tool schemas and
|
||||
handler URL building no longer include Facebook/eBay cookie fields.
|
||||
- Modify: `cookies/AGENTS.md` Purpose: document env vars as the only supported cookie
|
||||
input.
|
||||
|
||||
### Task 1: Lock core cookie utilities to env-only loading
|
||||
|
||||
**Files:**
|
||||
|
||||
- Modify: `packages/core/src/utils/cookies.ts:19-227`
|
||||
|
||||
- Test: `packages/core/test/facebook-core.test.ts`
|
||||
|
||||
- [ ] **Step 1: Write the failing test**
|
||||
|
||||
Add or replace the auth-source test block in `packages/core/test/facebook-core.test.ts` with env-only expectations:
|
||||
Add or replace the auth-source test block in `packages/core/test/facebook-core.test.ts`
|
||||
with env-only expectations:
|
||||
|
||||
```ts
|
||||
test("should load Facebook cookies from FACEBOOK_COOKIE env var", async () => {
|
||||
@@ -85,12 +96,14 @@ test("should reject missing Facebook auth env var", async () => {
|
||||
|
||||
- [ ] **Step 2: Run test to verify it fails**
|
||||
|
||||
Run: `bun test packages/core/test/facebook-core.test.ts`
|
||||
Expected: FAIL because the current implementation still allows missing env values to fall through to file/request-based behavior and does not emit the new env-only error.
|
||||
Run: `bun test packages/core/test/facebook-core.test.ts` Expected: FAIL because the
|
||||
current implementation still allows missing env values to fall through to
|
||||
file/request-based behavior and does not emit the new env-only error.
|
||||
|
||||
- [ ] **Step 3: Write minimal implementation**
|
||||
|
||||
Replace the multi-source loader in `packages/core/src/utils/cookies.ts` with an env-only loader. The target shape is:
|
||||
Replace the multi-source loader in `packages/core/src/utils/cookies.ts` with an env-only
|
||||
loader. The target shape is:
|
||||
|
||||
```ts
|
||||
export interface CookieConfig {
|
||||
@@ -129,8 +142,8 @@ Delete the now-dead helpers and types that exist only for JSON/file/request load
|
||||
|
||||
- [ ] **Step 4: Run test to verify it passes**
|
||||
|
||||
Run: `bun test packages/core/test/facebook-core.test.ts`
|
||||
Expected: PASS for the new env-only tests.
|
||||
Run: `bun test packages/core/test/facebook-core.test.ts` Expected: PASS for the new
|
||||
env-only tests.
|
||||
|
||||
- [ ] **Step 5: Commit**
|
||||
|
||||
@@ -142,10 +155,15 @@ git commit -m "refactor: make cookie loading env-only"
|
||||
### Task 2: Tighten Facebook core APIs to the new contract
|
||||
|
||||
**Files:**
|
||||
|
||||
- Modify: `packages/core/src/scrapers/facebook.ts:23-29`
|
||||
|
||||
- Modify: `packages/core/src/scrapers/facebook.ts:214-228`
|
||||
|
||||
- Modify: `packages/core/src/scrapers/facebook.ts:823-929`
|
||||
|
||||
- Modify: `packages/core/src/index.ts:5-15`
|
||||
|
||||
- Test: `packages/core/test/facebook-core.test.ts`
|
||||
|
||||
- [ ] **Step 1: Write the failing test**
|
||||
@@ -171,8 +189,9 @@ test("should fail Facebook item fetch when FACEBOOK_COOKIE is unset", async () =
|
||||
|
||||
- [ ] **Step 2: Run test to verify it fails**
|
||||
|
||||
Run: `bun test packages/core/test/facebook-core.test.ts`
|
||||
Expected: FAIL because the current function signatures and error text still mention parameter/file-based auth paths.
|
||||
Run: `bun test packages/core/test/facebook-core.test.ts` Expected: FAIL because the
|
||||
current function signatures and error text still mention parameter/file-based auth
|
||||
paths.
|
||||
|
||||
- [ ] **Step 3: Write minimal implementation**
|
||||
|
||||
@@ -206,12 +225,14 @@ console.warn(
|
||||
);
|
||||
```
|
||||
|
||||
Remove the extra cookie arguments from `fetchFacebookItem(...)` and keep `packages/core/src/index.ts` exporting the tightened functions without the old parameter contract.
|
||||
Remove the extra cookie arguments from `fetchFacebookItem(...)` and keep
|
||||
`packages/core/src/index.ts` exporting the tightened functions without the old parameter
|
||||
contract.
|
||||
|
||||
- [ ] **Step 4: Run test to verify it passes**
|
||||
|
||||
Run: `bun test packages/core/test/facebook-core.test.ts`
|
||||
Expected: PASS with the new env-only Facebook API surface.
|
||||
Run: `bun test packages/core/test/facebook-core.test.ts` Expected: PASS with the new
|
||||
env-only Facebook API surface.
|
||||
|
||||
- [ ] **Step 5: Commit**
|
||||
|
||||
@@ -223,8 +244,11 @@ git commit -m "refactor: remove facebook cookie overrides"
|
||||
### Task 3: Tighten eBay core APIs to env-only auth
|
||||
|
||||
**Files:**
|
||||
|
||||
- Modify: `packages/core/src/scrapers/ebay.ts:9-15`
|
||||
|
||||
- Modify: `packages/core/src/scrapers/ebay.ts:337-389`
|
||||
|
||||
- Create: `packages/core/test/ebay-core.test.ts`
|
||||
|
||||
- [ ] **Step 1: Write the failing test**
|
||||
@@ -249,8 +273,8 @@ test("should warn and continue without eBay cookies when EBAY_COOKIE is unset",
|
||||
|
||||
- [ ] **Step 2: Run test to verify it fails**
|
||||
|
||||
Run: `bun test packages/core/test/ebay-core.test.ts`
|
||||
Expected: FAIL because `loadEbayCookies` still accepts request overrides and mentions file/json sources.
|
||||
Run: `bun test packages/core/test/ebay-core.test.ts` Expected: FAIL because
|
||||
`loadEbayCookies` still accepts request overrides and mentions file/json sources.
|
||||
|
||||
- [ ] **Step 3: Write minimal implementation**
|
||||
|
||||
@@ -276,12 +300,13 @@ async function loadEbayCookies(): Promise<string | undefined> {
|
||||
}
|
||||
```
|
||||
|
||||
Then remove `cookies` from `fetchEbayItems(..., opts)` and the destructuring that feeds it into `loadEbayCookies()`.
|
||||
Then remove `cookies` from `fetchEbayItems(..., opts)` and the destructuring that feeds
|
||||
it into `loadEbayCookies()`.
|
||||
|
||||
- [ ] **Step 4: Run test to verify it passes**
|
||||
|
||||
Run: `bun test packages/core/test/ebay-core.test.ts`
|
||||
Expected: PASS for the eBay env-only regression coverage.
|
||||
Run: `bun test packages/core/test/ebay-core.test.ts` Expected: PASS for the eBay
|
||||
env-only regression coverage.
|
||||
|
||||
- [ ] **Step 5: Commit**
|
||||
|
||||
@@ -293,13 +318,17 @@ git commit -m "refactor: make ebay auth env-only"
|
||||
### Task 4: Remove cookie query parameters from the API adapter
|
||||
|
||||
**Files:**
|
||||
|
||||
- Modify: `packages/api-server/src/routes/facebook.ts:3-33`
|
||||
|
||||
- Modify: `packages/api-server/src/routes/ebay.ts:3-52`
|
||||
|
||||
- Create: `packages/api-server/test/routes.test.ts`
|
||||
|
||||
- [ ] **Step 1: Write the failing test**
|
||||
|
||||
Create `packages/api-server/test/routes.test.ts` and mock `@marketplace-scrapers/core` so the route contract is explicit:
|
||||
Create `packages/api-server/test/routes.test.ts` and mock `@marketplace-scrapers/core`
|
||||
so the route contract is explicit:
|
||||
|
||||
```ts
|
||||
import { afterEach, describe, expect, mock, test } from "bun:test";
|
||||
@@ -347,8 +376,9 @@ test("ebayRoute ignores cookies query parameter", async () => {
|
||||
|
||||
- [ ] **Step 2: Run test to verify it fails**
|
||||
|
||||
Run: `bun test packages/api-server/test/routes.test.ts`
|
||||
Expected: FAIL because the current routes still parse `reqUrl.searchParams.get("cookies")` and forward it downstream.
|
||||
Run: `bun test packages/api-server/test/routes.test.ts` Expected: FAIL because the
|
||||
current routes still parse `reqUrl.searchParams.get("cookies")` and forward it
|
||||
downstream.
|
||||
|
||||
- [ ] **Step 3: Write minimal implementation**
|
||||
|
||||
@@ -383,8 +413,8 @@ const items = await fetchEbayItems(SEARCH_QUERY, 1, {
|
||||
|
||||
- [ ] **Step 4: Run test to verify it passes**
|
||||
|
||||
Run: `bun test packages/api-server/test/routes.test.ts`
|
||||
Expected: PASS for route coverage and no remaining adapter references to `cookies` for Facebook/eBay.
|
||||
Run: `bun test packages/api-server/test/routes.test.ts` Expected: PASS for route
|
||||
coverage and no remaining adapter references to `cookies` for Facebook/eBay.
|
||||
|
||||
- [ ] **Step 5: Commit**
|
||||
|
||||
@@ -396,13 +426,17 @@ git commit -m "refactor: remove api cookie query overrides"
|
||||
### Task 5: Remove cookie inputs from MCP tool schemas and request mapping
|
||||
|
||||
**Files:**
|
||||
|
||||
- Modify: `packages/mcp-server/src/protocol/tools.ts:65-148`
|
||||
|
||||
- Modify: `packages/mcp-server/src/protocol/handler.ts:154-211`
|
||||
|
||||
- Create: `packages/mcp-server/test/protocol.test.ts`
|
||||
|
||||
- [ ] **Step 1: Write the failing test**
|
||||
|
||||
Create `packages/mcp-server/test/protocol.test.ts` with schema and URL-building assertions:
|
||||
Create `packages/mcp-server/test/protocol.test.ts` with schema and URL-building
|
||||
assertions:
|
||||
|
||||
```ts
|
||||
import { expect, mock, test } from "bun:test";
|
||||
@@ -445,8 +479,8 @@ expect(calledUrl).not.toContain("cookies=");
|
||||
|
||||
- [ ] **Step 2: Run test to verify it fails**
|
||||
|
||||
Run: `bun test packages/mcp-server/test/protocol.test.ts`
|
||||
Expected: FAIL because the current MCP schema and handler still expose and forward those inputs.
|
||||
Run: `bun test packages/mcp-server/test/protocol.test.ts` Expected: FAIL because the
|
||||
current MCP schema and handler still expose and forward those inputs.
|
||||
|
||||
- [ ] **Step 3: Write minimal implementation**
|
||||
|
||||
@@ -465,12 +499,13 @@ Delete the Facebook/eBay cookie tool properties and handler mapping:
|
||||
// if (args.cookies) params.append("cookies", args.cookies);
|
||||
```
|
||||
|
||||
Leave Kijiji alone; this plan only changes Facebook/eBay env-only auth paths defined by the approved spec.
|
||||
Leave Kijiji alone; this plan only changes Facebook/eBay env-only auth paths defined by
|
||||
the approved spec.
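
One way to keep removed inputs from creeping back in, sketched here as an assumption rather than as the handler's actual code, is to build the API URL from an explicit allowlist of argument names. `buildApiUrl` and `ToolArgs` are hypothetical names; the real handler may keep its per-parameter `params.append(...)` calls instead.

```typescript
// Hypothetical shape; the real handler and tool argument names may differ.
type ToolArgs = Record<string, string | number | boolean | undefined>;

/** Copy only allowlisted arguments into the query string. */
function buildApiUrl(
  base: string,
  path: string,
  args: ToolArgs,
  allowed: readonly string[],
): string {
  const url = new URL(path, base);
  for (const key of allowed) {
    const value = args[key];
    if (value !== undefined) url.searchParams.append(key, String(value));
  }
  return url.toString();
}
```

With this shape, a stray `cookies` argument is dropped silently unless someone adds it to the allowlist, which is the behaviour the protocol test above asserts with `expect(calledUrl).not.toContain("cookies=")`.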
|
||||
|
||||
- [ ] **Step 4: Run test to verify it passes**
|
||||
|
||||
Run: `bun test packages/mcp-server/test/protocol.test.ts`
|
||||
Expected: PASS with MCP definitions and handler mapping in sync.
|
||||
Run: `bun test packages/mcp-server/test/protocol.test.ts` Expected: PASS with MCP
|
||||
definitions and handler mapping in sync.
|
||||
|
||||
- [ ] **Step 5: Commit**
|
||||
|
||||
@@ -482,12 +517,16 @@ git commit -m "refactor: remove mcp cookie parameters"
|
||||
### Task 6: Rewrite cookie documentation and run full verification
|
||||
|
||||
**Files:**
|
||||
|
||||
- Modify: `cookies/AGENTS.md:9-85`
|
||||
- Modify: `docs/superpowers/specs/2026-04-21-cookie-env-only-design.md` only if implementation reveals a spec mismatch
|
||||
|
||||
- Modify: `docs/superpowers/specs/2026-04-21-cookie-env-only-design.md` only if
|
||||
implementation reveals a spec mismatch
|
||||
|
||||
- [ ] **Step 1: Write the failing test**
|
||||
|
||||
Treat docs drift as a contract failure. Capture the required state before editing:
|
||||
Treat docs drift as a contract failure.
|
||||
Capture the required state before editing:
|
||||
|
||||
```md
|
||||
- Cookie setup docs mention env vars only for Facebook and eBay
|
||||
@@ -497,14 +536,14 @@ Treat docs drift as a contract failure. Capture the required state before editin
|
||||
|
||||
- [ ] **Step 2: Run verification to prove current docs are stale**
|
||||
|
||||
Run: `rg -n "facebook\.json|ebay\.json|cookies=" cookies/AGENTS.md`
|
||||
Expected: matches found
|
||||
Run: `rg -n "facebook\.json|ebay\.json|cookies=" cookies/AGENTS.md` Expected: matches
|
||||
found
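
If the same check needs to run inside a test rather than from the shell, the `rg` pattern can be mirrored in a few lines. This is a sketch with a hypothetical function name; it reports 1-based line numbers, as `rg -n` does, and takes the file contents as a string so that reading `cookies/AGENTS.md` is left to the caller.

```typescript
// Mirrors the rg pattern above; the function name is hypothetical.
const STALE_COOKIE_PATTERN = /facebook\.json|ebay\.json|cookies=/;

/** Return the 1-based line numbers that still mention file or query-param cookies. */
function findStaleCookieReferences(markdown: string): number[] {
  return markdown
    .split("\n")
    .flatMap((line, index) => (STALE_COOKIE_PATTERN.test(line) ? [index + 1] : []));
}
```

An empty result corresponds to `rg` finding no matches, which is the state the docs should reach after Step 3.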
|
||||
|
||||
- [ ] **Step 3: Write minimal implementation**
|
||||
|
||||
Rewrite the cookie setup doc so Facebook and eBay each show only env-var setup:
|
||||
|
||||
```md
|
||||
````md
|
||||
## Cookie Configuration
|
||||
|
||||
All supported authenticated scrapers read cookies only from environment variables.
|
||||
@@ -513,14 +552,14 @@ All supported authenticated scrapers read cookies only from environment variable
|
||||
|
||||
```bash
|
||||
export FACEBOOK_COOKIE='c_user=123; xs=token; fr=request'
|
||||
```
|
||||
````
|
||||
|
||||
### eBay
|
||||
|
||||
```bash
|
||||
export EBAY_COOKIE='s=VALUE; ds2=VALUE; ebay=VALUE'
|
||||
```
|
||||
```
|
||||
````
|
||||
|
||||
Remove the file-based and request-parameter sections entirely.
|
||||
|
||||
@@ -534,10 +573,14 @@ Expected: all commands pass
|
||||
```bash
|
||||
git add cookies/AGENTS.md docs/superpowers/specs/2026-04-21-cookie-env-only-design.md
|
||||
git commit -m "docs: align cookie setup with env-only auth"
|
||||
```
|
||||
````
|
||||
|
||||
## Self-Review
|
||||
|
||||
- Spec coverage check: shared cookie utils, Facebook, eBay, API adapter, MCP adapter, tests, and docs each have explicit tasks.
|
||||
- Placeholder scan: concrete test files are now named for eBay core, API routes, and MCP protocol coverage.
|
||||
- Type consistency check: `ensureCookies(config)` is the single shared loader name used across Tasks 1-3, and Facebook/eBay route signatures stay aligned with the core changes.
|
||||
- Spec coverage check: shared cookie utils, Facebook, eBay, API adapter, MCP adapter,
|
||||
tests, and docs each have explicit tasks.
|
||||
- Placeholder scan: concrete test files are now named for eBay core, API routes, and MCP
|
||||
protocol coverage.
|
||||
- Type consistency check: `ensureCookies(config)` is the single shared loader name used
|
||||
across Tasks 1-3, and Facebook/eBay route signatures stay aligned with the core
|
||||
changes.
@@ -1,34 +1,49 @@
|
||||
# Facebook Comet Rewrite Implementation Plan
|
||||
|
||||
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
|
||||
> **For agentic workers:** REQUIRED SUB-SKILL: Use
|
||||
> superpowers:subagent-driven-development (recommended) or superpowers:executing-plans
|
||||
> to implement this plan task-by-task.
|
||||
> Steps use checkbox (`- [ ]`) syntax for tracking.
|
||||
|
||||
**Goal:** Replace the legacy Facebook Marketplace scraper with a route-aware hybrid Comet-bootstrap parser for both search and item routes.
|
||||
**Goal:** Replace the legacy Facebook Marketplace scraper with a route-aware hybrid
|
||||
Comet-bootstrap parser for both search and item routes.
|
||||
|
||||
**Architecture:** Keep authenticated direct HTTP fetches as the transport. Classify each Facebook response first, then parse route-specific Comet bootstrap/state candidates, and fall back to rendered-HTML extraction only when bootstrap decoding cannot produce the expected search or item shape.
|
||||
**Architecture:** Keep authenticated direct HTTP fetches as the transport.
|
||||
Classify each Facebook response first, then parse route-specific Comet bootstrap/state
|
||||
candidates, and fall back to rendered-HTML extraction only when bootstrap decoding
|
||||
cannot produce the expected search or item shape.
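
The "bootstrap first, rendered HTML only as a fallback" control flow can be sketched independently of any Facebook-specific decoding. The types and names below are hypothetical, and the classification rules themselves are deliberately left out because they are defined by the tasks that follow.

```typescript
// Hypothetical types and names; only the fallback ordering is illustrated.
type Parser<T> = (html: string) => T | null;

/** Try each parsing strategy in order and return the first non-null result. */
function parseWithFallback<T>(
  html: string,
  parsers: readonly Parser<T>[],
): T | null {
  for (const parse of parsers) {
    try {
      const result = parse(html);
      if (result !== null) return result;
    } catch {
      // A decoding failure in one strategy should not abort the others.
    }
  }
  return null;
}
```

In this arrangement the bootstrap parser would be listed first and the HTML extractor last, so the slower or less reliable path runs only when the earlier one returns `null` or throws.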
|
||||
|
||||
**Tech Stack:** Bun, TypeScript, `bun:test`, `linkedom`, existing shared cookie/http helpers
|
||||
**Tech Stack:** Bun, TypeScript, `bun:test`, `linkedom`, existing shared cookie/http
|
||||
helpers
|
||||
|
||||
---
|
||||
* * *
|
||||
|
||||
## File Structure
|
||||
|
||||
- Modify: `packages/core/src/scrapers/facebook.ts`
|
||||
- Owns Facebook fetch flow, response classification, bootstrap candidate extraction, search parsing, item parsing, and HTML fallbacks.
|
||||
- Owns Facebook fetch flow, response classification, bootstrap candidate extraction,
|
||||
search parsing, item parsing, and HTML fallbacks.
|
||||
- Modify: `packages/core/test/facebook-core.test.ts`
|
||||
- Owns unit coverage for response classification, bootstrap parsing, fallback parsing, and route-aware item/search extraction behavior.
|
||||
- Owns unit coverage for response classification, bootstrap parsing, fallback parsing,
|
||||
and route-aware item/search extraction behavior.
|
||||
- Modify: `packages/core/test/facebook-integration.test.ts`
|
||||
- Owns higher-level fetch flow tests, auth/degradation behavior, and result shaping for search/item entrypoints.
|
||||
- Owns higher-level fetch flow tests, auth/degradation behavior, and result shaping
|
||||
for search/item entrypoints.
|
||||
|
||||
### Task 1: Add Route Classification Coverage
|
||||
|
||||
**Files:**
|
||||
|
||||
- Modify: `packages/core/test/facebook-core.test.ts`
|
||||
|
||||
- Modify: `packages/core/src/scrapers/facebook.ts`
|
||||
|
||||
- Test: `packages/core/test/facebook-core.test.ts`
|
||||
|
||||
- [ ] **Step 1: Write the failing tests**
|
||||
|
||||
Add these tests near the Facebook parser tests in `packages/core/test/facebook-core.test.ts`:
|
||||
Add these tests near the Facebook parser tests in
|
||||
`packages/core/test/facebook-core.test.ts`:
|
||||
|
||||
```ts
|
||||
test("classifies Comet search responses", () => {
|
||||
@@ -89,12 +104,14 @@ test("classifies unavailable item responses", () => {
|
||||
|
||||
- [ ] **Step 2: Run test to verify it fails**
|
||||
|
||||
Run: `bun test packages/core/test/facebook-core.test.ts --test-name-pattern "classifies"`
|
||||
Run:
|
||||
`bun test packages/core/test/facebook-core.test.ts --test-name-pattern "classifies"`
|
||||
Expected: FAIL because `classifyFacebookResponse` does not exist yet.
|
||||
|
||||
- [ ] **Step 3: Write minimal implementation**
|
||||
|
||||
Add this type and function near the parsing section in `packages/core/src/scrapers/facebook.ts`:
|
||||
Add this type and function near the parsing section in
|
||||
`packages/core/src/scrapers/facebook.ts`:
|
||||
|
||||
```ts
|
||||
type FacebookResponseKind = "search" | "item" | "auth_gated" | "unavailable" | "unknown";
|
||||
@@ -128,7 +145,8 @@ export function classifyFacebookResponse(htmlString: HTMLString, responseUrl: st
|
||||
|
||||
- [ ] **Step 4: Run test to verify it passes**
|
||||
|
||||
Run: `bun test packages/core/test/facebook-core.test.ts --test-name-pattern "classifies"`
|
||||
Run:
|
||||
`bun test packages/core/test/facebook-core.test.ts --test-name-pattern "classifies"`
|
||||
Expected: PASS
|
||||
|
||||
- [ ] **Step 5: Commit**
|
||||
@@ -141,8 +159,11 @@ git commit -m "refactor: add facebook response classification"
|
||||
### Task 2: Add Bootstrap Candidate Extraction
|
||||
|
||||
**Files:**
|
||||
|
||||
- Modify: `packages/core/test/facebook-core.test.ts`
|
||||
|
||||
- Modify: `packages/core/src/scrapers/facebook.ts`
|
||||
|
||||
- Test: `packages/core/test/facebook-core.test.ts`
|
||||
|
||||
- [ ] **Step 1: Write the failing tests**
|
||||
@@ -185,7 +206,8 @@ test("keeps candidate order stable for later scoring", () => {
|
||||
|
||||
- [ ] **Step 2: Run test to verify it fails**
|
||||
|
||||
Run: `bun test packages/core/test/facebook-core.test.ts --test-name-pattern "bootstrap candidates"`
|
||||
Run:
|
||||
`bun test packages/core/test/facebook-core.test.ts --test-name-pattern "bootstrap candidates"`
|
||||
Expected: FAIL because `extractFacebookBootstrapCandidates` does not exist.
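
For orientation, candidate extraction of this kind could look roughly like the sketch below. It is not the plan's implementation: `collectJsonScriptCandidates` is a hypothetical name, the sketch assumes the payloads sit in `<script type="application/json">` tags, and it uses a regular expression rather than `linkedom` so that it stays dependency-free.

```ts
// Hypothetical sketch: collect decodable JSON objects from script tags,
// preserving document order and skipping payloads that fail to parse.
type JsonObject = Record<string, unknown>;

function collectJsonScriptCandidates(html: string): JsonObject[] {
  const scriptPattern =
    /<script\b[^>]*\btype=["']application\/json["'][^>]*>([\s\S]*?)<\/script>/gi;
  const candidates: JsonObject[] = [];
  let match: RegExpExecArray | null;

  while ((match = scriptPattern.exec(html)) !== null) {
    try {
      const parsed: unknown = JSON.parse(match[1] ?? "");
      // Keep plain objects only; arrays and primitives are not candidates.
      if (typeof parsed === "object" && parsed !== null && !Array.isArray(parsed)) {
        candidates.push(parsed as JsonObject);
      }
    } catch {
      // Undecodable payload: skip it and keep scanning.
    }
  }
  return candidates;
}
```

Because candidates are pushed in the order the scripts appear, a later scoring pass can break ties by position.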
|
||||
|
||||
- [ ] **Step 3: Write minimal implementation**
|
||||
@@ -218,7 +240,8 @@ export function extractFacebookBootstrapCandidates(htmlString: HTMLString): Reco
|
||||
|
||||
- [ ] **Step 4: Run test to verify it passes**
|
||||
|
||||
Run: `bun test packages/core/test/facebook-core.test.ts --test-name-pattern "bootstrap candidates"`
|
||||
Run:
|
||||
`bun test packages/core/test/facebook-core.test.ts --test-name-pattern "bootstrap candidates"`
|
||||
Expected: PASS
|
||||
|
||||
- [ ] **Step 5: Commit**
|
||||
@@ -231,10 +254,15 @@ git commit -m "refactor: add facebook bootstrap candidate extraction"
|
||||
### Task 3: Replace Search Parsing With Candidate Scoring
|
||||
|
||||
**Files:**
|
||||
|
||||
- Modify: `packages/core/test/facebook-core.test.ts`
|
||||
|
||||
- Modify: `packages/core/test/facebook-integration.test.ts`
|
||||
|
||||
- Modify: `packages/core/src/scrapers/facebook.ts`
|
||||
|
||||
- Test: `packages/core/test/facebook-core.test.ts`
|
||||
|
||||
- Test: `packages/core/test/facebook-integration.test.ts`
|
||||
|
||||
- [ ] **Step 1: Write the failing tests**
|
||||
@@ -323,12 +351,15 @@ const mockSearchHtml = `
|
||||
|
||||
- [ ] **Step 2: Run test to verify it fails**
|
||||
|
||||
Run: `bun test packages/core/test/facebook-core.test.ts --test-name-pattern "Comet bootstrap candidates"`
|
||||
Expected: FAIL because the current search extractor only understands legacy `marketplace_search` shapes.
|
||||
Run:
|
||||
`bun test packages/core/test/facebook-core.test.ts --test-name-pattern "Comet bootstrap candidates"`
|
||||
Expected: FAIL because the current search extractor only understands legacy
|
||||
`marketplace_search` shapes.
|
||||
|
||||
- [ ] **Step 3: Write minimal implementation**
|
||||
|
||||
Replace the search extraction internals in `extractFacebookMarketplaceData()` with candidate scoring like this:
|
||||
Replace the search extraction internals in `extractFacebookMarketplaceData()` with
|
||||
candidate scoring like this:
|
||||
|
||||
```ts
|
||||
function findSearchEdges(candidate: unknown): FacebookEdge[] | null {
|
||||
@@ -383,7 +414,8 @@ export function extractFacebookMarketplaceData(htmlString: HTMLString): Facebook
|
||||
|
||||
- [ ] **Step 4: Run test to verify it passes**
|
||||
|
||||
Run: `bun test packages/core/test/facebook-core.test.ts packages/core/test/facebook-integration.test.ts`
|
||||
Run:
|
||||
`bun test packages/core/test/facebook-core.test.ts packages/core/test/facebook-integration.test.ts`
|
||||
Expected: PASS for the rewritten search fixtures and existing unaffected tests.
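
As a rough illustration of what "candidate scoring" in Step 3 might mean, the sketch below walks each decoded candidate, collects every array whose elements all look like `{ node: {...} }` edges, and keeps the largest one. The names `Edge`, `collectEdgeArrays`, and `pickBestEdges`, the edge shape, and the "largest array wins" rule are all assumptions made for this example; the plan's own `findSearchEdges` may score differently.

```ts
// Hypothetical sketch of shape-based candidate scoring.
interface Edge {
  node: Record<string, unknown>;
}

function isEdge(value: unknown): value is Edge {
  if (typeof value !== "object" || value === null) return false;
  const node = (value as { node?: unknown }).node;
  return typeof node === "object" && node !== null && !Array.isArray(node);
}

// Depth-limited walk that records every non-empty array made only of edges.
function collectEdgeArrays(value: unknown, found: Edge[][] = [], depth = 0): Edge[][] {
  if (depth > 25 || typeof value !== "object" || value === null) return found;
  if (Array.isArray(value)) {
    if (value.length > 0 && value.every(isEdge)) found.push(value as Edge[]);
    for (const item of value) collectEdgeArrays(item, found, depth + 1);
    return found;
  }
  const record = value as Record<string, unknown>;
  for (const key of Object.keys(record)) collectEdgeArrays(record[key], found, depth + 1);
  return found;
}

// Score = number of edges; on a tie the earliest candidate wins.
function pickBestEdges(candidates: unknown[]): Edge[] | null {
  let best: Edge[] | null = null;
  for (const candidate of candidates) {
    for (const edges of collectEdgeArrays(candidate)) {
      if (best === null || edges.length > best.length) best = edges;
    }
  }
  return best;
}
```

Scoring by shape rather than by a fixed key path is what lets a parser survive payload renames, at the cost of occasionally matching an unrelated edge list; a real implementation would likely add further checks on the node fields.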
|
||||
|
||||
- [ ] **Step 5: Commit**
|
||||
@@ -396,8 +428,11 @@ git commit -m "refactor: rewrite facebook search parser for comet bootstrap"
|
||||
### Task 4: Replace Item Parsing With Candidate Scoring
|
||||
|
||||
**Files:**
|
||||
|
||||
- Modify: `packages/core/test/facebook-core.test.ts`
|
||||
|
||||
- Modify: `packages/core/src/scrapers/facebook.ts`
|
||||
|
||||
- Test: `packages/core/test/facebook-core.test.ts`
|
||||
|
||||
- [ ] **Step 1: Write the failing tests**
|
||||
@@ -438,7 +473,8 @@ test("extracts item details from Comet permalink bootstrap candidates", () => {
|
||||
|
||||
- [ ] **Step 2: Run test to verify it fails**
|
||||
|
||||
Run: `bun test packages/core/test/facebook-core.test.ts --test-name-pattern "Comet permalink bootstrap"`
|
||||
Run:
|
||||
`bun test packages/core/test/facebook-core.test.ts --test-name-pattern "Comet permalink bootstrap"`
|
||||
Expected: FAIL because the current item extractor depends on legacy permalink markers.
|
||||
|
||||
- [ ] **Step 3: Write minimal implementation**
|
||||
@@ -491,8 +527,8 @@ export function extractFacebookItemData(htmlString: HTMLString): FacebookMarketp
|
||||
|
||||
- [ ] **Step 4: Run test to verify it passes**
|
||||
|
||||
Run: `bun test packages/core/test/facebook-core.test.ts`
|
||||
Expected: PASS for current-shape item tests and remaining parser tests.
|
||||
Run: `bun test packages/core/test/facebook-core.test.ts` Expected: PASS for
|
||||
current-shape item tests and remaining parser tests.
|
||||
|
||||
- [ ] **Step 5: Commit**
|
||||
|
||||
@@ -504,8 +540,11 @@ git commit -m "refactor: rewrite facebook item parser for comet bootstrap"
|
||||
### Task 5: Add HTML Fallback Extraction
|
||||
|
||||
**Files:**
|
||||
|
||||
- Modify: `packages/core/test/facebook-core.test.ts`
|
||||
|
||||
- Modify: `packages/core/src/scrapers/facebook.ts`
|
||||
|
||||
- Test: `packages/core/test/facebook-core.test.ts`
|
||||
|
||||
- [ ] **Step 1: Write the failing tests**
|
||||
@@ -549,8 +588,10 @@ test("falls back to rendered item HTML when bootstrap payloads are undecodable",
|
||||
|
||||
- [ ] **Step 2: Run test to verify it fails**
|
||||
|
||||
Run: `bun test packages/core/test/facebook-core.test.ts --test-name-pattern "falls back"`
|
||||
Expected: FAIL because the extractor currently returns `null` without a structured candidate.
|
||||
Run:
|
||||
`bun test packages/core/test/facebook-core.test.ts --test-name-pattern "falls back"`
|
||||
Expected: FAIL because the extractor currently returns `null` without a structured
|
||||
candidate.
|
||||
|
||||
- [ ] **Step 3: Write minimal implementation**
|
||||
|
||||
@@ -607,11 +648,13 @@ function extractItemFallback(htmlString: HTMLString): FacebookMarketplaceItem |
|
||||
}
|
||||
```
|
||||
|
||||
Then call these helpers as the last fallback inside `extractFacebookMarketplaceData()` and `extractFacebookItemData()`.
|
||||
Then call these helpers as the last fallback inside `extractFacebookMarketplaceData()`
|
||||
and `extractFacebookItemData()`.
|
||||
|
||||
- [ ] **Step 4: Run test to verify it passes**
|
||||
|
||||
Run: `bun test packages/core/test/facebook-core.test.ts --test-name-pattern "falls back"`
|
||||
Run:
|
||||
`bun test packages/core/test/facebook-core.test.ts --test-name-pattern "falls back"`
|
||||
Expected: PASS
|
||||
|
||||
- [ ] **Step 5: Commit**
|
||||
@@ -624,8 +667,11 @@ git commit -m "refactor: add facebook html fallbacks"
|
||||
### Task 6: Wire Route-Aware Failures Into Entry Points
|
||||
|
||||
**Files:**
|
||||
|
||||
- Modify: `packages/core/test/facebook-integration.test.ts`
|
||||
|
||||
- Modify: `packages/core/src/scrapers/facebook.ts`
|
||||
|
||||
- Test: `packages/core/test/facebook-integration.test.ts`
|
||||
|
||||
- [ ] **Step 1: Write the failing tests**
|
||||
@@ -664,8 +710,10 @@ test("returns null for unavailable item responses", async () => {
|
||||
|
||||
- [ ] **Step 2: Run test to verify it fails**
|
||||
|
||||
Run: `bun test packages/core/test/facebook-integration.test.ts --test-name-pattern "auth-gated|unavailable"`
|
||||
Expected: FAIL because the entrypoints do not yet classify successful HTML responses by route/auth state.
|
||||
Run:
|
||||
`bun test packages/core/test/facebook-integration.test.ts --test-name-pattern "auth-gated|unavailable"`
|
||||
Expected: FAIL because the entrypoints do not yet classify successful HTML responses by
|
||||
route/auth state.
|
||||
|
||||
- [ ] **Step 3: Write minimal implementation**
|
||||
|
||||
@@ -690,12 +738,13 @@ if (itemResponseClass.kind === "unavailable") {
|
||||
}
|
||||
```
|
||||
|
||||
Use the actual response URL from `fetchHtml` plumbing if that helper is extended to return both HTML and final URL; otherwise start by threading final URL support through the fetch helper in the same task.
|
||||
Use the actual response URL from `fetchHtml` plumbing if that helper is extended to
|
||||
return both HTML and final URL; otherwise start by threading final URL support through
|
||||
the fetch helper in the same task.
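
One way the final-URL threading could look is sketched below. The helper name `fetchHtmlWithFinalUrl` and the `FetchedPage`, `MinimalResponse`, and `FetchLike` types are hypothetical and are not the project's `fetchHtml`; the only standard behavior relied on is that a Fetch API `Response` exposes the post-redirect URL as `response.url`.

```ts
// Hypothetical sketch: return both the body and the final (post-redirect) URL.
interface FetchedPage {
  html: string;
  finalUrl: string;
}

// Minimal structural subset of the Fetch API Response used here.
interface MinimalResponse {
  ok: boolean;
  status: number;
  url: string;
  text(): Promise<string>;
}

type FetchLike = (url: string) => Promise<MinimalResponse>;

async function fetchHtmlWithFinalUrl(url: string, fetchImpl: FetchLike): Promise<FetchedPage> {
  const response = await fetchImpl(url);
  if (!response.ok) {
    throw new Error("Request failed with status " + response.status);
  }
  const html = await response.text();
  // `response.url` is the URL after redirects; fall back to the requested
  // URL if the implementation leaves it empty.
  return { html, finalUrl: response.url !== "" ? response.url : url };
}
```

Injecting `fetchImpl` keeps the helper testable without network access; in production code the global `fetch` should satisfy `FetchLike`. The returned `finalUrl` is what a classifier would inspect to spot, for example, a redirect to a login route.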
|
||||
|
||||
- [ ] **Step 4: Run test to verify it passes**
|
||||
|
||||
Run: `bun test packages/core/test/facebook-integration.test.ts`
|
||||
Expected: PASS
|
||||
Run: `bun test packages/core/test/facebook-integration.test.ts` Expected: PASS
|
||||
|
||||
- [ ] **Step 5: Commit**
|
||||
|
||||
@@ -707,19 +756,22 @@ git commit -m "refactor: handle facebook route-aware failure states"
|
||||
### Task 7: Run Full Verification And Live Probe
|
||||
|
||||
**Files:**
|
||||
|
||||
- Modify: `packages/core/src/scrapers/facebook.ts` if small cleanup is required
|
||||
|
||||
- Modify: `packages/core/test/facebook-core.test.ts` if small cleanup is required
|
||||
|
||||
- Modify: `packages/core/test/facebook-integration.test.ts` if small cleanup is required
|
||||
|
||||
- [ ] **Step 1: Run focused Facebook tests**
|
||||
|
||||
Run: `bun test packages/core/test/facebook-core.test.ts packages/core/test/facebook-integration.test.ts`
|
||||
Run:
|
||||
`bun test packages/core/test/facebook-core.test.ts packages/core/test/facebook-integration.test.ts`
|
||||
Expected: PASS
|
||||
|
||||
- [ ] **Step 2: Run broader core tests**
|
||||
|
||||
Run: `bun test packages/core/test`
|
||||
Expected: PASS
|
||||
Run: `bun test packages/core/test` Expected: PASS
|
||||
|
||||
- [ ] **Step 3: Run live authenticated Facebook probe**
|
||||
|
||||
@@ -742,11 +794,14 @@ if (results[0]?.url) {
|
||||
Expected:
|
||||
|
||||
- search returns at least one result
|
||||
- item fetch returns non-null for the first live result when the route is not stale/unavailable
|
||||
|
||||
- item fetch returns non-null for the first live result when the route is not
|
||||
stale/unavailable
|
||||
|
||||
- [ ] **Step 4: Make any minimal cleanup needed to keep tests and live probe green**
|
||||
|
||||
If cleanup is needed, keep it limited to naming, dead-code removal caused by the rewrite, or small parser corrections directly exposed by the verification commands.
|
||||
If cleanup is needed, keep it limited to naming, dead-code removal caused by the
|
||||
rewrite, or small parser corrections directly exposed by the verification commands.
|
||||
|
||||
- [ ] **Step 5: Re-run verification**
|
||||
|
||||
@@ -767,6 +822,11 @@ git commit -m "refactor: complete facebook comet scraper rewrite"
|
||||
|
||||
## Self-Review
|
||||
|
||||
- Spec coverage: the plan covers classification, route-aware search parsing, route-aware item parsing, HTML fallbacks, explicit failure-state handling, test replacement, and live verification.
|
||||
- Placeholder scan: no `TODO`, `TBD`, or unspecified “handle appropriately” steps remain.
|
||||
- Type consistency: all planned functions and types use the same names across tasks: `classifyFacebookResponse`, `extractFacebookBootstrapCandidates`, `extractFacebookMarketplaceData`, and `extractFacebookItemData`.
|
||||
- Spec coverage: the plan covers classification, route-aware search parsing, route-aware
|
||||
item parsing, HTML fallbacks, explicit failure-state handling, test replacement, and
|
||||
live verification.
|
||||
- Placeholder scan: no `TODO`, `TBD`, or unspecified “handle appropriately” steps
|
||||
remain.
|
||||
- Type consistency: all planned functions and types use the same names across tasks:
|
||||
`classifyFacebookResponse`, `extractFacebookBootstrapCandidates`,
|
||||
`extractFacebookMarketplaceData`, and `extractFacebookItemData`.
|
||||
|
||||
@@ -1,63 +1,75 @@
|
||||
# Unstable Listing Mode Implementation Plan
|
||||
|
||||
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
|
||||
> **For agentic workers:** REQUIRED SUB-SKILL: Use
|
||||
> superpowers:subagent-driven-development (recommended) or superpowers:executing-plans
|
||||
> to implement this plan task-by-task.
|
||||
> Steps use checkbox (`- [ ]`) syntax for tracking.
|
||||
|
||||
**Goal:** Add an optional shared mode across Facebook, eBay, and Kijiji that moves listings priced below 80% of the median into `unstableResults`, while preserving current default response shapes.
|
||||
**Goal:** Add an optional shared mode across Facebook, eBay, and Kijiji that moves
|
||||
listings priced below 80% of the median into `unstableResults`, while preserving current
|
||||
default response shapes.
|
||||
|
||||
**Architecture:** Introduce a shared generic classifier in `packages/core` that splits any listing array into `results` and `unstableResults` using the same median-based rule. Then thread one opt-in flag through the scraper entrypoints, API routes, and MCP tool definitions so all surfaces expose the same behavior without changing existing defaults.
|
||||
**Architecture:** Introduce a shared generic classifier in `packages/core` that splits
|
||||
any listing array into `results` and `unstableResults` using the same median-based rule.
|
||||
Then thread one opt-in flag through the scraper entrypoints, API routes, and MCP tool
|
||||
definitions so all surfaces expose the same behavior without changing existing defaults.
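
The median rule from the goal statement can be illustrated with a small sketch. It is deliberately not the plan's `classifyUnstableListings`, whose body is not shown here: `splitByMedianCutoff` and the `Priced` shape are hypothetical, and the sketch assumes a numeric `price` field, a strict "below the cutoff" comparison, the usual even-count median (mean of the two middle values), and that listings without a usable price stay in `results`.

```ts
// Hypothetical sketch of a median-based split; all names are assumptions.
interface Priced {
  price?: number | null;
}

interface Buckets<T> {
  results: T[];
  unstableResults: T[];
}

function median(values: number[]): number | null {
  if (values.length === 0) return null;
  const sorted = values.slice().sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

function hasPrice(listing: Priced): listing is { price: number } {
  return typeof listing.price === "number" && isFinite(listing.price);
}

function splitByMedianCutoff<T extends Priced>(listings: T[], ratio = 0.8): Buckets<T> {
  const prices: number[] = [];
  for (const listing of listings) {
    if (hasPrice(listing)) prices.push(listing.price);
  }

  const buckets: Buckets<T> = { results: [], unstableResults: [] };
  const middle = median(prices);
  if (middle === null) {
    // No usable prices: nothing can be judged unstable.
    buckets.results = listings.slice();
    return buckets;
  }

  const cutoff = middle * ratio;
  for (const listing of listings) {
    const unstable = hasPrice(listing) && listing.price < cutoff;
    (unstable ? buckets.unstableResults : buckets.results).push(listing);
  }
  return buckets;
}
```

For prices 50, 100, 110, and 120 the median is 105 and the cutoff is 84, so only the 50 listing would move to `unstableResults` under these assumptions.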
|
||||
|
||||
**Tech Stack:** Bun, TypeScript, Bun test, workspace packages, JSON-RPC MCP server
|
||||
|
||||
---
|
||||
* * *
|
||||
|
||||
## File Map
|
||||
|
||||
- Create: `packages/core/src/utils/unstable.ts`
|
||||
Purpose: shared generic median/cutoff classifier for listing arrays.
|
||||
- Modify: `packages/core/src/types/common.ts`
|
||||
Purpose: add shared mode types used by scrapers and adapters.
|
||||
- Modify: `packages/core/src/index.ts`
|
||||
Purpose: export the new shared classifier/types.
|
||||
- Modify: `packages/core/src/scrapers/facebook.ts`
|
||||
Purpose: add the optional mode flag and return bucketed results when enabled.
|
||||
- Modify: `packages/core/src/scrapers/ebay.ts`
|
||||
Purpose: add the optional mode flag and return bucketed results when enabled.
|
||||
- Modify: `packages/core/src/scrapers/kijiji.ts`
|
||||
Purpose: add the optional mode flag and return bucketed results when enabled.
|
||||
- Create: `packages/core/test/unstable-listing-mode.test.ts`
|
||||
Purpose: lock the shared classifier behavior with direct unit tests.
|
||||
- Modify: `packages/core/test/facebook-core.test.ts`
|
||||
Purpose: prove Facebook preserves default arrays and returns buckets when enabled.
|
||||
- Modify: `packages/core/test/ebay-core.test.ts`
|
||||
Purpose: prove eBay preserves default arrays and returns buckets when enabled.
|
||||
- Modify: `packages/core/test/kijiji-core.test.ts`
|
||||
Purpose: prove Kijiji preserves default arrays and returns buckets when enabled.
|
||||
- Modify: `packages/api-server/src/routes/facebook.ts`
|
||||
Purpose: expose a shared opt-in query parameter and preserve default response shape.
|
||||
- Modify: `packages/api-server/src/routes/ebay.ts`
|
||||
Purpose: expose the same query parameter and preserve default response shape.
|
||||
- Modify: `packages/api-server/src/routes/kijiji.ts`
|
||||
Purpose: expose the same query parameter and preserve default response shape.
|
||||
- Modify: `packages/api-server/test/routes.test.ts`
|
||||
Purpose: verify route forwarding and route response-shape switching.
|
||||
- Modify: `packages/mcp-server/src/protocol/tools.ts`
|
||||
Purpose: document the optional unstable mode in all search tools.
|
||||
- Modify: `packages/mcp-server/src/protocol/handler.ts`
|
||||
Purpose: forward the optional mode to API routes for all search tools.
|
||||
- Modify: `packages/mcp-server/test/protocol.test.ts`
|
||||
Purpose: verify MCP tool metadata and forwarded URLs include the new option.
|
||||
- Create: `packages/core/src/utils/unstable.ts` Purpose: shared generic median/cutoff
|
||||
classifier for listing arrays.
|
||||
- Modify: `packages/core/src/types/common.ts` Purpose: add shared mode types used by
|
||||
scrapers and adapters.
|
||||
- Modify: `packages/core/src/index.ts` Purpose: export the new shared classifier/types.
|
||||
- Modify: `packages/core/src/scrapers/facebook.ts` Purpose: add the optional mode flag
|
||||
and return bucketed results when enabled.
|
||||
- Modify: `packages/core/src/scrapers/ebay.ts` Purpose: add the optional mode flag and
|
||||
return bucketed results when enabled.
|
||||
- Modify: `packages/core/src/scrapers/kijiji.ts` Purpose: add the optional mode flag and
|
||||
return bucketed results when enabled.
|
||||
- Create: `packages/core/test/unstable-listing-mode.test.ts` Purpose: lock the shared
|
||||
classifier behavior with direct unit tests.
|
||||
- Modify: `packages/core/test/facebook-core.test.ts` Purpose: prove Facebook preserves
|
||||
default arrays and returns buckets when enabled.
|
||||
- Modify: `packages/core/test/ebay-core.test.ts` Purpose: prove eBay preserves default
|
||||
arrays and returns buckets when enabled.
|
||||
- Modify: `packages/core/test/kijiji-core.test.ts` Purpose: prove Kijiji preserves
|
||||
default arrays and returns buckets when enabled.
|
||||
- Modify: `packages/api-server/src/routes/facebook.ts` Purpose: expose a shared opt-in
|
||||
query parameter and preserve default response shape.
|
||||
- Modify: `packages/api-server/src/routes/ebay.ts` Purpose: expose the same query
|
||||
parameter and preserve default response shape.
|
||||
- Modify: `packages/api-server/src/routes/kijiji.ts` Purpose: expose the same query
|
||||
parameter and preserve default response shape.
|
||||
- Modify: `packages/api-server/test/routes.test.ts` Purpose: verify route forwarding and
|
||||
route response-shape switching.
|
||||
- Modify: `packages/mcp-server/src/protocol/tools.ts` Purpose: document the optional
|
||||
unstable mode in all search tools.
|
||||
- Modify: `packages/mcp-server/src/protocol/handler.ts` Purpose: forward the optional
|
||||
mode to API routes for all search tools.
|
||||
- Modify: `packages/mcp-server/test/protocol.test.ts` Purpose: verify MCP tool metadata
|
||||
and forwarded URLs include the new option.
|
||||
|
||||
### Task 1: Add the shared unstable-listing classifier
|
||||
|
||||
**Files:**
|
||||
|
||||
- Create: `packages/core/src/utils/unstable.ts`
|
||||
|
||||
- Modify: `packages/core/src/types/common.ts`
|
||||
|
||||
- Modify: `packages/core/src/index.ts`
|
||||
|
||||
- Test: `packages/core/test/unstable-listing-mode.test.ts`
|
||||
|
||||
- [ ] **Step 1: Write the failing test**
|
||||
|
||||
Create `packages/core/test/unstable-listing-mode.test.ts` with focused shared-behavior coverage:
|
||||
Create `packages/core/test/unstable-listing-mode.test.ts` with focused shared-behavior
|
||||
coverage:
|
||||
|
||||
```ts
|
||||
import { describe, expect, test } from "bun:test";
|
||||
@@ -127,8 +139,8 @@ describe("classifyUnstableListings", () => {
|
||||
|
||||
- [ ] **Step 2: Run test to verify it fails**
|
||||
|
||||
Run: `bun test packages/core/test/unstable-listing-mode.test.ts`
|
||||
Expected: FAIL because `classifyUnstableListings` and the shared mode types do not exist yet.
|
||||
Run: `bun test packages/core/test/unstable-listing-mode.test.ts` Expected: FAIL because
|
||||
`classifyUnstableListings` and the shared mode types do not exist yet.
|
||||
|
||||
- [ ] **Step 3: Write minimal implementation**
|
||||
|
||||
@@ -202,8 +214,8 @@ export { classifyUnstableListings } from "./utils/unstable";
|
||||
|
||||
- [ ] **Step 4: Run test to verify it passes**
|
||||
|
||||
Run: `bun test packages/core/test/unstable-listing-mode.test.ts`
|
||||
Expected: PASS with 4 passing tests.
|
||||
Run: `bun test packages/core/test/unstable-listing-mode.test.ts` Expected: PASS with 4
|
||||
passing tests.
|
||||
|
||||
- [ ] **Step 5: Commit**
|
||||
|
||||
@@ -215,16 +227,24 @@ git commit -m "feat: add shared unstable listing classifier"
|
||||
### Task 2: Thread the optional mode through all core scrapers
|
||||
|
||||
**Files:**
|
||||
|
||||
- Modify: `packages/core/src/scrapers/facebook.ts`
|
||||
|
||||
- Modify: `packages/core/src/scrapers/ebay.ts`
|
||||
|
||||
- Modify: `packages/core/src/scrapers/kijiji.ts`
|
||||
|
||||
- Modify: `packages/core/test/facebook-core.test.ts`
|
||||
|
||||
- Modify: `packages/core/test/ebay-core.test.ts`
|
||||
|
||||
- Modify: `packages/core/test/kijiji-core.test.ts`
|
||||
|
||||
- [ ] **Step 1: Write the failing tests**
|
||||
|
||||
Add one focused opt-in test per scraper. Use the new shared classifier through the public scraper entrypoints instead of testing internal helpers.
|
||||
Add one focused opt-in test per scraper.
|
||||
Use the new shared classifier through the public scraper entrypoints instead of testing
|
||||
internal helpers.
|
||||
|
||||
In `packages/core/test/facebook-core.test.ts`, add:
|
||||
|
||||
@@ -286,7 +306,8 @@ test("fetchKijijiItems returns stable and unstable buckets when unstable mode is
|
||||
});
|
||||
```
|
||||
|
||||
Also add one default-mode assertion in one existing scraper test file, for example in `packages/core/test/facebook-core.test.ts`:
|
||||
Also add one default-mode assertion in one existing scraper test file, for example in
|
||||
`packages/core/test/facebook-core.test.ts`:
|
||||
|
||||
```ts
|
||||
test("fetchFacebookItems keeps returning an array by default", async () => {
|
||||
@@ -307,8 +328,10 @@ test("fetchFacebookItems keeps returning an array by default", async () => {
|
||||
|
||||
- [ ] **Step 2: Run tests to verify they fail**
|
||||
|
||||
Run: `bun test packages/core/test/facebook-core.test.ts packages/core/test/ebay-core.test.ts packages/core/test/kijiji-core.test.ts`
|
||||
Expected: FAIL because the scraper signatures do not yet accept the new option and still always return arrays.
|
||||
Run:
|
||||
`bun test packages/core/test/facebook-core.test.ts packages/core/test/ebay-core.test.ts packages/core/test/kijiji-core.test.ts`
|
||||
Expected: FAIL because the scraper signatures do not yet accept the new option and still
|
||||
always return arrays.
|
||||
|
||||
- [ ] **Step 3: Write minimal implementation**
|
||||
|
||||
@@ -322,7 +345,8 @@ import {
|
||||
} from "../index";
|
||||
```
|
||||
|
||||
In `packages/core/src/scrapers/facebook.ts`, extend the default export signature and branch at the end:
|
||||
In `packages/core/src/scrapers/facebook.ts`, extend the default export signature and
|
||||
branch at the end:
|
||||
|
||||
```ts
|
||||
export default async function fetchFacebookItems(
|
||||
@@ -371,7 +395,8 @@ export default async function fetchEbayItems(
|
||||
}
|
||||
```
|
||||
|
||||
In `packages/core/src/scrapers/kijiji.ts`, add the same final argument after `listingOptions`:
|
||||
In `packages/core/src/scrapers/kijiji.ts`, add the same final argument after
|
||||
`listingOptions`:
|
||||
|
||||
```ts
|
||||
export default async function fetchKijijiItems(
|
||||
@@ -392,12 +417,15 @@ export default async function fetchKijijiItems(
|
||||
}
|
||||
```
|
||||
|
||||
Keep the default branch untouched in all three files so existing callers still receive arrays.
|
||||
Keep the default branch untouched in all three files so existing callers still receive
|
||||
arrays.
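
One way to keep the default array shape visible to the type checker is a set of function overloads, sketched below. `shapeResults`, `Splitter`, and `ModeOptions` are hypothetical names; only the option name `hideUnstableResults` and the `results` / `unstableResults` bucket names come from the plan, and the real scraper signatures take more arguments than this.

```ts
// Hypothetical sketch: overloads that return an array by default and
// buckets only when the opt-in flag is literally `true`.
interface Buckets<T> {
  results: T[];
  unstableResults: T[];
}

interface ModeOptions {
  hideUnstableResults?: boolean;
}

type Splitter<T> = (items: T[]) => Buckets<T>;

function shapeResults<T>(items: T[], split: Splitter<T>): T[];
function shapeResults<T>(
  items: T[],
  split: Splitter<T>,
  options: { hideUnstableResults: true },
): Buckets<T>;
function shapeResults<T>(items: T[], split: Splitter<T>, options?: ModeOptions): T[] | Buckets<T>;
function shapeResults<T>(
  items: T[],
  split: Splitter<T>,
  options: ModeOptions = {},
): T[] | Buckets<T> {
  // Opt-in branch: hand the items to the shared classifier.
  if (options.hideUnstableResults === true) return split(items);
  // Default branch: untouched, so existing callers keep receiving arrays.
  return items;
}
```

With this shape, a caller that passes no options is typed as receiving `T[]`, so existing call sites compile unchanged, while a caller that opts in gets the bucketed type without a cast.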
|
||||
|
||||
- [ ] **Step 4: Run tests to verify they pass**
|
||||
|
||||
Run: `bun test packages/core/test/unstable-listing-mode.test.ts packages/core/test/facebook-core.test.ts packages/core/test/ebay-core.test.ts packages/core/test/kijiji-core.test.ts`
|
||||
Expected: PASS, including the new opt-in bucket assertions and the default-array regression assertion.
|
||||
Run:
|
||||
`bun test packages/core/test/unstable-listing-mode.test.ts packages/core/test/facebook-core.test.ts packages/core/test/ebay-core.test.ts packages/core/test/kijiji-core.test.ts`
|
||||
Expected: PASS, including the new opt-in bucket assertions and the default-array
|
||||
regression assertion.
|
||||
|
||||
- [ ] **Step 5: Commit**
|
||||
|
||||
@@ -409,14 +437,19 @@ git commit -m "feat: add unstable mode to scraper results"
|
||||
### Task 3: Expose unstable mode in API routes
|
||||
|
||||
**Files:**
|
||||
|
||||
- Modify: `packages/api-server/src/routes/facebook.ts`
|
||||
|
||||
- Modify: `packages/api-server/src/routes/ebay.ts`
|
||||
|
||||
- Modify: `packages/api-server/src/routes/kijiji.ts`
|
||||
|
||||
- Modify: `packages/api-server/test/routes.test.ts`
|
||||
|
||||
- [ ] **Step 1: Write the failing tests**
|
||||
|
||||
Extend `packages/api-server/test/routes.test.ts` with route-forwarding coverage for the new query parameter:
|
||||
Extend `packages/api-server/test/routes.test.ts` with route-forwarding coverage for the
|
||||
new query parameter:
|
||||
|
||||
```ts
|
||||
test("facebookRoute forwards unstableFilter=true to core", async () => {
|
||||
@@ -480,8 +513,8 @@ test("kijijiRoute forwards unstableFilter=true to core", async () => {
|
||||
|
||||
- [ ] **Step 2: Run tests to verify they fail**
|
||||
|
||||
Run: `bun test packages/api-server/test/routes.test.ts`
|
||||
Expected: FAIL because the routes do not yet parse or forward `unstableFilter`.
|
||||
Run: `bun test packages/api-server/test/routes.test.ts` Expected: FAIL because the
|
||||
routes do not yet parse or forward `unstableFilter`.
|
||||
|
||||
- [ ] **Step 3: Write minimal implementation**
|
||||
|
||||
@@ -533,12 +566,14 @@ const items = await fetchKijijiItems(
|
||||
);
|
||||
```
|
||||
|
||||
Do not add any response wrapper logic in the routes; simply return whatever the core scraper returns so the default array path remains unchanged.
|
||||
Do not add any response wrapper logic in the routes; simply return whatever the core
|
||||
scraper returns so the default array path remains unchanged.
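
Parsing the opt-in query parameter can be as small as the sketch below. The helper name `parseUnstableFilter` and the example URLs are hypothetical; the sketch also assumes that only the literal string `true` enables the mode, which the plan's test name (`unstableFilter=true`) suggests but does not fully specify.

```ts
// Hypothetical sketch: read the opt-in flag from a request URL.
function parseUnstableFilter(requestUrl: string): boolean {
  const value = new URL(requestUrl).searchParams.get("unstableFilter");
  // Assumption: anything other than the exact string "true" leaves the
  // default (array-returning) behavior in place.
  return value === "true";
}
```

Treating a missing or unrecognized value as `false` means existing clients that never send the parameter keep the default response shape.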
|
||||
|
||||
- [ ] **Step 4: Run tests to verify they pass**
|
||||
|
||||
Run: `bun test packages/api-server/test/routes.test.ts`
|
||||
Expected: PASS, including existing cookie-parameter regression tests and the new unstable-mode forwarding assertions.
|
||||
Run: `bun test packages/api-server/test/routes.test.ts` Expected: PASS, including
|
||||
existing cookie-parameter regression tests and the new unstable-mode forwarding
|
||||
assertions.
|
||||
|
||||
- [ ] **Step 5: Commit**
|
||||
|
||||
@@ -550,13 +585,17 @@ git commit -m "feat: expose unstable mode in api routes"
|
||||
### Task 4: Document and forward unstable mode in MCP tools
|
||||
|
||||
**Files:**
|
||||
|
||||
- Modify: `packages/mcp-server/src/protocol/tools.ts`
|
||||
|
||||
- Modify: `packages/mcp-server/src/protocol/handler.ts`
|
||||
|
||||
- Modify: `packages/mcp-server/test/protocol.test.ts`
|
||||
|
||||
- [ ] **Step 1: Write the failing tests**
|
||||
|
||||
Extend `packages/mcp-server/test/protocol.test.ts` with metadata and forwarding coverage:
|
||||
Extend `packages/mcp-server/test/protocol.test.ts` with metadata and forwarding
|
||||
coverage:
|
||||
|
||||
```ts
|
||||
test("search tools document unstable listing mode", () => {
|
||||
@@ -601,12 +640,14 @@ Mirror the forwarding assertion for `search_kijiji` and `search_ebay` in the sam
|
||||
|
||||
- [ ] **Step 2: Run tests to verify they fail**
|
||||
|
||||
Run: `bun test packages/mcp-server/test/protocol.test.ts`
|
||||
Expected: FAIL because the tools do not yet describe `unstableFilter` and the handler does not append it to API URLs.
|
||||
Run: `bun test packages/mcp-server/test/protocol.test.ts` Expected: FAIL because the
|
||||
tools do not yet describe `unstableFilter` and the handler does not append it to API
|
||||
URLs.
|
||||
|
||||
- [ ] **Step 3: Write minimal implementation**
|
||||
|
||||
In `packages/mcp-server/src/protocol/tools.ts`, add the same optional property to all three tools:
|
||||
In `packages/mcp-server/src/protocol/tools.ts`, add the same optional property to all
|
||||
three tools:
|
||||
|
||||
```ts
|
||||
unstableFilter: {
|
||||
@@ -617,7 +658,8 @@ unstableFilter: {
|
||||
},
|
||||
```
|
||||
|
||||
In `packages/mcp-server/src/protocol/handler.ts`, append the shared flag in each search branch:
|
||||
In `packages/mcp-server/src/protocol/handler.ts`, append the shared flag in each search
|
||||
branch:
|
||||
|
||||
```ts
|
||||
if (args.unstableFilter !== undefined) {
|
||||
@@ -629,8 +671,8 @@ Add that snippet to the `search_kijiji`, `search_facebook`, and `search_ebay` br
|
||||
|
||||
- [ ] **Step 4: Run tests to verify they pass**
|
||||
|
||||
Run: `bun test packages/mcp-server/test/protocol.test.ts`
|
||||
Expected: PASS, including the new tool-schema assertions and URL-forwarding assertions.
|
||||
Run: `bun test packages/mcp-server/test/protocol.test.ts` Expected: PASS, including the
|
||||
new tool-schema assertions and URL-forwarding assertions.
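
The forwarding behavior these assertions check could be sketched as follows. `buildSearchUrl`, `SearchArgs`, the `q` parameter name, and the base URLs used in examples are hypothetical and stand in for the handler's real URL construction; only the `unstableFilter` argument name comes from the plan.

```ts
// Hypothetical sketch: append the flag only when the caller supplied it.
interface SearchArgs {
  query: string;
  unstableFilter?: boolean;
}

function buildSearchUrl(baseUrl: string, args: SearchArgs): string {
  const params = new URLSearchParams({ q: args.query });
  // Leaving the parameter off entirely when undefined keeps the forwarded
  // URL identical to what existing callers already produce.
  if (args.unstableFilter !== undefined) {
    params.append("unstableFilter", String(args.unstableFilter));
  }
  return baseUrl + "?" + params.toString();
}
```

Checking against `undefined` rather than truthiness means an explicit `false` is still forwarded, so the API route (not the MCP layer) stays the single place that interprets the value.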
|
||||
|
||||
- [ ] **Step 5: Commit**
|
||||
|
||||
@@ -642,21 +684,23 @@ git commit -m "docs: expose unstable mode in mcp tools"
|
||||
### Task 5: Verify the full cross-package feature end to end
|
||||
|
||||
**Files:**
|
||||
|
||||
- No code changes expected.
|
||||
|
||||
- [ ] **Step 1: Run the focused package tests**
|
||||
|
||||
Run: `bun test packages/core/test/unstable-listing-mode.test.ts packages/core/test/facebook-core.test.ts packages/core/test/ebay-core.test.ts packages/core/test/kijiji-core.test.ts packages/api-server/test/routes.test.ts packages/mcp-server/test/protocol.test.ts`
|
||||
Run:
|
||||
`bun test packages/core/test/unstable-listing-mode.test.ts packages/core/test/facebook-core.test.ts packages/core/test/ebay-core.test.ts packages/core/test/kijiji-core.test.ts packages/api-server/test/routes.test.ts packages/mcp-server/test/protocol.test.ts`
|
||||
Expected: PASS with zero failing tests.
|
||||
|
||||
- [ ] **Step 2: Run the broader workspace verification**
|
||||
|
||||
Run: `bun run ci`
|
||||
Expected: PASS with clean workspace validation.
|
||||
Run: `bun run ci` Expected: PASS with clean workspace validation.
|
||||
|
||||
- [ ] **Step 3: Commit verification-only follow-ups if needed**
|
||||
|
||||
If verification forced any tiny fixes, commit them immediately after the fix with a focused message, for example:
|
||||
If verification forced any tiny fixes, commit them immediately after the fix with a
|
||||
focused message, for example:
|
||||
|
||||
```bash
|
||||
git add <exact files changed>
|
||||
@@ -667,6 +711,8 @@ If no files changed during verification, skip this commit step.
|
||||
|
||||
## Self-Review
|
||||
|
||||
- Spec coverage: shared classifier, all three scrapers, API exposure, MCP documentation, and tests are each mapped to a task.
|
||||
- Placeholder scan: no `TODO`, `TBD`, or "write tests later" placeholders remain.
|
||||
- Type consistency: the plan uses one shared flag name, `unstableFilter`, and one shared core option, `hideUnstableResults`, across all tasks.
|
||||
- Spec coverage: shared classifier, all three scrapers, API exposure, MCP documentation,
|
||||
and tests are each mapped to a task.
|
||||
- Placeholder scan: no `TODO`, `TBD`, or “write tests later” placeholders remain.
|
||||
- Type consistency: the plan uses one shared flag name, `unstableFilter`, and one shared
|
||||
core option, `hideUnstableResults`, across all tasks.
|
||||
|
||||
@@ -1,14 +1,22 @@
|
||||
# Code Smell Cleanup Implementation Plan
|
||||
|
||||
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
|
||||
> **For agentic workers:** REQUIRED SUB-SKILL: Use
|
||||
> superpowers:subagent-driven-development (recommended) or superpowers:executing-plans
|
||||
> to implement this plan task-by-task.
|
||||
> Steps use checkbox (`- [ ]`) syntax for tracking.
|
||||
|
||||
**Goal:** Fix concrete code smells found in repo review without changing marketplace behavior or relaxing lint/type rules.
|
||||
**Goal:** Fix concrete code smells found in repo review without changing marketplace
|
||||
behavior or relaxing lint/type rules.
|
||||
|
||||
**Architecture:** Start with correctness bugs at transport boundaries, then remove secret-leaking query/log paths, then reduce duplicate parsing and HTTP code. Keep marketplace behavior inside `packages/core`, API routes thin, and MCP as JSON-RPC transport only.
|
||||
**Architecture:** Start with correctness bugs at transport boundaries, then remove
|
||||
secret-leaking query/log paths, then reduce duplicate parsing and HTTP code.
|
||||
Keep marketplace behavior inside `packages/core`, API routes thin, and MCP as JSON-RPC
|
||||
transport only.
|
||||
|
||||
**Tech Stack:** Bun `1.3.13`, TypeScript strict mode, `bun:test`, Biome, framework-free `Bun.serve` adapters.
|
||||
**Tech Stack:** Bun `1.3.13`, TypeScript strict mode, `bun:test`, Biome, framework-free
|
||||
`Bun.serve` adapters.
|
||||
|
||||
---
|
||||
* * *
|
||||
|
||||
## File Structure
|
||||
|
||||
@@ -18,7 +26,8 @@
|
||||
- Extract shared API call/query-param helpers.
|
||||
- Stop logging full URLs with cookie-bearing params.
|
||||
- Modify: `packages/mcp-server/src/protocol/tools.ts`
|
||||
- Remove `cookies` from Kijiji MCP schema or mark it as unsupported after API route no longer accepts it.
|
||||
- Remove `cookies` from Kijiji MCP schema or mark it as unsupported after API route no
|
||||
longer accepts it.
|
||||
- Modify: `packages/mcp-server/test/protocol.test.ts`
|
||||
- Add coverage for `id: 0`.
|
||||
- Add coverage for zero-valued numeric args.
|
||||
@@ -53,12 +62,15 @@
|
||||
- Replace `console.error` with repo logger.
|
||||
- Modify: `packages/core/test/setup.ts`
|
||||
- Remove redundant comments and make fetch-mock policy explicit.
|
||||
- Test: existing package tests under `packages/core/test`, `packages/api-server/test`, `packages/mcp-server/test`.
|
||||
- Test: existing package tests under `packages/core/test`, `packages/api-server/test`,
|
||||
`packages/mcp-server/test`.
|
||||
|
||||
## Task 1: Fix MCP JSON-RPC `id: 0` Handling
|
||||
|
||||
**Files:**
|
||||
|
||||
- Modify: `packages/mcp-server/src/protocol/handler.ts:61-74`
|
||||
|
||||
- Test: `packages/mcp-server/test/protocol.test.ts`
|
||||
|
||||
- [ ] **Step 1: Write failing test for `id: 0`**
|
||||
@@ -137,7 +149,9 @@ git commit -m "fix: preserve zero json-rpc ids"
|
||||
## Task 2: Preserve Zero Numeric MCP Arguments
|
||||
|
||||
**Files:**
|
||||
|
||||
- Modify: `packages/mcp-server/src/protocol/handler.ts:107-216`
|
||||
|
||||
- Test: `packages/mcp-server/test/protocol.test.ts`
|
||||
|
||||
- [ ] **Step 1: Write failing tests for zero-valued params**
|
||||
@@ -288,10 +302,15 @@ git commit -m "fix: forward zero-valued mcp params"
|
||||
## Task 3: Remove Cookie Query Path From MCP and API
|
||||
|
||||
**Files:**
|
||||
|
||||
- Modify: `packages/mcp-server/src/protocol/tools.ts:55-59`
|
||||
|
||||
- Modify: `packages/mcp-server/src/protocol/handler.ts:119`
|
||||
|
||||
- Modify: `packages/api-server/src/routes/kijiji.ts:65`
|
||||
|
||||
- Test: `packages/mcp-server/test/protocol.test.ts`
|
||||
|
||||
- Test: `packages/api-server/test/routes.test.ts`
|
||||
|
||||
- [ ] **Step 1: Update MCP tests for no cookie exposure**
|
||||
@@ -341,7 +360,8 @@ test("search_kijiji should not forward cookies query parameters", async () => {
|
||||
|
||||
- [ ] **Step 2: Update API test expectation**
|
||||
|
||||
In `packages/api-server/test/routes.test.ts`, replace `kijijiRoute passes cookies query parameter` test with:
|
||||
In `packages/api-server/test/routes.test.ts`, replace
|
||||
`kijijiRoute passes cookies query parameter` test with:
|
||||
|
||||
```ts
|
||||
test("kijijiRoute ignores cookies query parameter", async () => {
|
||||
@@ -374,13 +394,15 @@ test("kijijiRoute ignores cookies query parameter", async () => {
|
||||
|
||||
- [ ] **Step 3: Run tests to verify failure**
|
||||
|
||||
Run: `bun test packages/mcp-server/test/protocol.test.ts packages/api-server/test/routes.test.ts`
|
||||
Run:
|
||||
`bun test packages/mcp-server/test/protocol.test.ts packages/api-server/test/routes.test.ts`
|
||||
|
||||
Expected: FAIL because Kijiji cookie query is still exposed/forwarded.
|
||||
|
||||
- [ ] **Step 4: Remove Kijiji cookie schema and forwarding**
|
||||
|
||||
Delete `cookies` property from `search_kijiji` in `packages/mcp-server/src/protocol/tools.ts`.
|
||||
Delete `cookies` property from `search_kijiji` in
|
||||
`packages/mcp-server/src/protocol/tools.ts`.
|
||||
|
||||
Delete this line from `packages/mcp-server/src/protocol/handler.ts`:
|
||||
|
||||
@@ -396,7 +418,8 @@ cookies: reqUrl.searchParams.get("cookies") || undefined,
|
||||
|
||||
- [ ] **Step 5: Run tests**
|
||||
|
||||
Run: `bun test packages/mcp-server/test/protocol.test.ts packages/api-server/test/routes.test.ts`
|
||||
Run:
|
||||
`bun test packages/mcp-server/test/protocol.test.ts packages/api-server/test/routes.test.ts`
|
||||
|
||||
Expected: PASS.
|
||||
|
||||
@@ -410,10 +433,15 @@ git commit -m "fix: remove cookie query forwarding"
|
||||
## Task 4: Add Strict API Integer Parsing
|
||||
|
||||
**Files:**
|
||||
|
||||
- Create: `packages/api-server/src/routes/helpers.ts`
|
||||
|
||||
- Modify: `packages/api-server/src/routes/facebook.ts`
|
||||
|
||||
- Modify: `packages/api-server/src/routes/ebay.ts`
|
||||
|
||||
- Modify: `packages/api-server/src/routes/kijiji.ts`
|
||||
|
||||
- Test: `packages/api-server/test/routes.test.ts`
|
||||
|
||||
- [ ] **Step 1: Write failing API validation tests**
|
||||
@@ -560,7 +588,9 @@ git commit -m "fix: strictly parse route integers"
|
||||
## Task 5: De-Duplicate MCP API Calls
|
||||
|
||||
**Files:**
|
||||
|
||||
- Modify: `packages/mcp-server/src/protocol/handler.ts`
|
||||
|
||||
- Test: `packages/mcp-server/test/protocol.test.ts`
|
||||
|
||||
- [ ] **Step 1: Add regression test for successful tool result after helper extraction**
|
||||
@@ -645,7 +675,8 @@ Use `"facebook"` and `"ebay"` in their branches.
|
||||
|
||||
- [ ] **Step 4: Run MCP tests and build**
|
||||
|
||||
Run: `bun test packages/mcp-server/test/protocol.test.ts && bun run --cwd packages/mcp-server build`
|
||||
Run:
|
||||
`bun test packages/mcp-server/test/protocol.test.ts && bun run --cwd packages/mcp-server build`
|
||||
|
||||
Expected: PASS.
|
||||
|
||||
@@ -659,11 +690,17 @@ git commit -m "refactor: share mcp api calls"
|
||||
## Task 6: Consolidate Core HTTP Fetching
|
||||
|
||||
**Files:**
|
||||
|
||||
- Modify: `packages/core/src/utils/http.ts`
|
||||
|
||||
- Modify: `packages/core/src/scrapers/facebook.ts`
|
||||
|
||||
- Modify: `packages/core/src/scrapers/ebay.ts`
|
||||
|
||||
- Test: `packages/core/test/http.test.ts`
|
||||
|
||||
- Test: `packages/core/test/facebook-core.test.ts`
|
||||
|
||||
- Test: `packages/core/test/ebay-core.test.ts`
|
||||
|
||||
- [ ] **Step 1: Add shared HTTP test for response URL and deterministic jitter**
|
||||
@@ -695,7 +732,8 @@ test("fetchHtml can return response URL", async () => {
|
||||
});
|
||||
```
|
||||
|
||||
If current `Response.url` cannot be set in Bun tests, use a mocked object cast to `Response` instead:
|
||||
If current `Response.url` cannot be set in Bun tests, use a mocked object cast to
|
||||
`Response` instead:
|
||||
|
||||
```ts
|
||||
global.fetch = mock(() =>
|
||||
@@ -827,7 +865,8 @@ Update error property reads from `err.status` to `err.statusCode`.
|
||||
|
||||
- [ ] **Step 5: Replace eBay direct fetch with shared helper**
|
||||
|
||||
In `packages/core/src/scrapers/ebay.ts`, import `fetchHtml` and `HttpError` from `../utils/http`.
|
||||
In `packages/core/src/scrapers/ebay.ts`, import `fetchHtml` and `HttpError` from
|
||||
`../utils/http`.
|
||||
|
||||
Replace direct `fetch` block with:
|
||||
|
||||
@@ -845,7 +884,8 @@ logger.error(`Failed to fetch eBay search (${err.statusCode}): ${err.message}`);
|
||||
|
||||
- [ ] **Step 6: Run core tests**
|
||||
|
||||
Run: `bun test packages/core/test/http.test.ts packages/core/test/facebook-core.test.ts packages/core/test/ebay-core.test.ts`
|
||||
Run:
|
||||
`bun test packages/core/test/http.test.ts packages/core/test/facebook-core.test.ts packages/core/test/ebay-core.test.ts`
|
||||
|
||||
Expected: PASS.
|
||||
|
||||
@@ -865,15 +905,20 @@ git commit -m "refactor: share scraper http fetching"
|
||||
## Task 7: Clean Kijiji Dead Code and Logging
|
||||
|
||||
**Files:**
|
||||
|
||||
- Modify: `packages/core/src/scrapers/kijiji.ts`
|
||||
|
||||
- Test: `packages/core/test/kijiji-core.test.ts`
|
||||
|
||||
- Test: `packages/core/test/kijiji-integration.test.ts`
|
||||
|
||||
- [ ] **Step 1: Verify `_parseListing` has no callers**
|
||||
|
||||
Run: `rg "_parseListing|parseListing" packages/core packages/api-server packages/mcp-server`
|
||||
Run:
|
||||
`rg "_parseListing|parseListing" packages/core packages/api-server packages/mcp-server`
|
||||
|
||||
Expected: only `_parseListing` definition appears. If any caller appears, stop and update this task to preserve behavior.
|
||||
Expected: only `_parseListing` definition appears.
|
||||
If any caller appears, stop and update this task to preserve behavior.
|
||||
|
||||
- [ ] **Step 2: Delete dead function**
|
||||
|
||||
@@ -911,7 +956,8 @@ Replace `console.error(...)` calls with `logger.error(...)` preserving message t
|
||||
|
||||
- [ ] **Step 4: Run Kijiji tests**
|
||||
|
||||
Run: `bun test packages/core/test/kijiji-core.test.ts packages/core/test/kijiji-integration.test.ts`
|
||||
Run:
|
||||
`bun test packages/core/test/kijiji-core.test.ts packages/core/test/kijiji-integration.test.ts`
|
||||
|
||||
Expected: PASS.
|
||||
|
||||
@@ -925,7 +971,9 @@ git commit -m "refactor: clean kijiji scraper internals"
|
||||
## Task 8: Clean Test Setup Comments and Enforce Fetch Mocking
|
||||
|
||||
**Files:**
|
||||
|
||||
- Modify: `packages/core/test/setup.ts`
|
||||
|
||||
- Test: core test suite
|
||||
|
||||
- [ ] **Step 1: Update setup file**
|
||||
@@ -942,7 +990,8 @@ global.fetch = (() => {
|
||||
|
||||
Run: `bun test packages/core/test`
|
||||
|
||||
Expected: PASS. If failures occur, fix individual tests by mocking `global.fetch` in `beforeEach` and restoring in `afterEach`.
|
||||
Expected: PASS. If failures occur, fix individual tests by mocking `global.fetch` in
|
||||
`beforeEach` and restoring in `afterEach`.
|
||||
|
||||
- [ ] **Step 3: Commit**
|
||||
|
||||
@@ -954,6 +1003,7 @@ git commit -m "test: require explicit fetch mocks"
|
||||
## Task 9: Final Verification
|
||||
|
||||
**Files:**
|
||||
|
||||
- Verify all touched packages.
|
||||
|
||||
- [ ] **Step 1: Run full deterministic tests**
|
||||
@@ -991,9 +1041,13 @@ git commit -m "chore: finish code smell cleanup"
|
||||
|
||||
## Self-Review
|
||||
|
||||
- Spec coverage: all reviewed smells are covered: JSON-RPC id bug, zero args, cookie query leak, strict integer parsing, duplicate route/MCP helper code, duplicate HTTP clients, dead Kijiji function, direct timers/logging, stale setup comments.
|
||||
- Placeholder scan: no TBD/TODO/fill-in placeholders remain. Each task has target files, code snippets, commands, and expected results.
|
||||
- Type consistency: route helper names, MCP helper names, and shared HTTP option names are used consistently across tasks.
|
||||
- Spec coverage: all reviewed smells are covered: JSON-RPC id bug, zero args, cookie
|
||||
query leak, strict integer parsing, duplicate route/MCP helper code, duplicate HTTP
|
||||
clients, dead Kijiji function, direct timers/logging, stale setup comments.
|
||||
- Placeholder scan: no TBD/TODO/fill-in placeholders remain.
|
||||
Each task has target files, code snippets, commands, and expected results.
|
||||
- Type consistency: route helper names, MCP helper names, and shared HTTP option names
|
||||
are used consistently across tasks.
|
||||
|
||||
## Execution Handoff
|
||||
|
||||
@@ -1001,5 +1055,7 @@ Plan complete and saved to `docs/superpowers/plans/2026-04-28-code-smell-cleanup
|
||||
|
||||
Two execution options:
|
||||
|
||||
1. Subagent-Driven (recommended) - dispatch fresh subagent per task, review between tasks, fast iteration.
|
||||
2. Inline Execution - execute tasks in this session using executing-plans, batch execution with checkpoints.
|
||||
1. Subagent-Driven (recommended) - dispatch fresh subagent per task, review between
|
||||
tasks, fast iteration.
|
||||
2. Inline Execution - execute tasks in this session using executing-plans, batch
|
||||
execution with checkpoints.
|
||||
|
||||
110
docs/superpowers/plans/2026-04-30-ebay-dollar-price-inputs.md
Normal file
@@ -0,0 +1,110 @@
|
||||
# Marketplace Dollar Price Inputs Implementation Plan
|
||||
|
||||
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:executing-plans to
|
||||
> implement this plan task-by-task.
|
||||
> Steps use checkbox (`- [ ]`) syntax for tracking.
|
||||
|
||||
**Goal:** Make public marketplace price inputs use dollars while preserving core scraper
|
||||
cent-based filtering.
|
||||
|
||||
**Architecture:** API server owns HTTP query parsing and converts dollar amounts to
|
||||
cents before calling core.
|
||||
MCP server keeps forwarding numeric dollar values as query params.
|
||||
Core scraper internals remain unchanged because parsed listing prices already use cents.
|
||||
This applies to eBay `minPrice`/`maxPrice` and Kijiji `priceMin`/`priceMax`; Facebook
|
||||
exposes no price filter inputs.
|
||||
|
||||
**Tech Stack:** Bun, TypeScript, `bun:test`, MCP JSON-RPC adapter, framework-free Bun
|
||||
HTTP routes.
|
||||
|
||||
* * *
|
||||
|
||||
### Task 1: API Dollar Parsing
|
||||
|
||||
**Files:**
|
||||
|
||||
- Modify: `packages/api-server/src/routes/helpers.ts`
|
||||
|
||||
- Modify: `packages/api-server/src/routes/ebay.ts`
|
||||
|
||||
- Modify: `packages/api-server/src/routes/kijiji.ts`
|
||||
|
||||
- Test: `packages/api-server/test/routes.test.ts`
|
||||
|
||||
- [ ] **Step 1: Add failing API route tests**
|
||||
|
||||
Add tests proving eBay `minPrice=999.99` / `maxPrice=1000` and Kijiji `priceMin=999.99`
|
||||
/ `priceMax=1000` are forwarded to core as `99999` and `100000` cents.
|
||||
Add validation tests for empty, whitespace, negative, hex, mixed text, and malformed
|
||||
decimal price values.
|
||||
|
||||
Run: `bun test packages/api-server/test/routes.test.ts`
|
||||
|
||||
Expected: new forwarding tests fail because route currently rejects decimals and
|
||||
forwards integer dollars unchanged.
|
||||
|
||||
- [ ] **Step 2: Implement dollar parser helper**
|
||||
|
||||
Add `parseDollarPriceParam(searchParams, name)` in
|
||||
`packages/api-server/src/routes/helpers.ts`. Accept `0`, `1000`, `999.99`, and `0.99`.
|
||||
Reject values that do not match `^\d+(?:\.\d{1,2})?$`. Convert to cents with
|
||||
`Math.round(Number(rawValue) * 100)`.
|
||||
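A minimal sketch of the helper described in Step 2, to make the accept/reject rules concrete. The function name, the regular expression, and the `Math.round` conversion come from the step text; the return type (`undefined` when the parameter is absent) and the thrown `Error` are assumptions, since the plan does not say how the existing route helpers report validation failures.

```ts
// Sketch only: error reporting and the "absent param" result are assumptions.
const DOLLAR_PATTERN = /^\d+(?:\.\d{1,2})?$/;

function parseDollarPriceParam(
  searchParams: URLSearchParams,
  name: string,
): number | undefined {
  const rawValue = searchParams.get(name);

  // Assumption: a missing parameter means "no price filter".
  if (rawValue === null) return undefined;

  // Empty, whitespace, negative, hex, and mixed-text values all fail the pattern.
  if (!DOLLAR_PATTERN.test(rawValue)) {
    throw new Error(`Invalid ${name}: expected a dollar amount such as 999.99`);
  }

  // Round after multiplying so binary floating-point error cannot drop a cent.
  return Math.round(Number(rawValue) * 100);
}
```

Rounding matters here because multiplying a decimal dollar amount by 100 in floating point is not guaranteed to land exactly on an integer, and the core scrapers compare against integer cents.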
|
||||
- [ ] **Step 3: Use dollar parser in eBay and Kijiji routes**
|
||||
|
||||
Replace `parseNonNegativeIntegerParam` calls for eBay `minPrice`/`maxPrice` and Kijiji
|
||||
`priceMin`/`priceMax` with `parseDollarPriceParam`. Keep pagination/count params on
|
||||
integer parsing.
|
||||
|
||||
- [ ] **Step 4: Verify API tests**
|
||||
|
||||
Run: `bun test packages/api-server/test/routes.test.ts`
|
||||
|
||||
Expected: all API route tests pass.
|
||||
|
||||
### Task 2: MCP Schema Contract
|
||||
|
||||
**Files:**
|
||||
|
||||
- Modify: `packages/mcp-server/src/protocol/tools.ts`
|
||||
|
||||
- Test: `packages/mcp-server/test/protocol.test.ts`
|
||||
|
||||
- [ ] **Step 1: Add MCP schema/forwarding tests**
|
||||
|
||||
Add tests that `search_ebay` describes `minPrice` and `maxPrice` as dollar filters and
|
||||
forwards numeric dollar values unchanged in API query params.
|
||||
|
||||
Run: `bun test packages/mcp-server/test/protocol.test.ts`
|
||||
|
||||
Expected: description test fails until schema text changes; forwarding behavior should
|
||||
already pass or reveal mapping gaps.
|
||||
|
||||
- [ ] **Step 2: Update tool descriptions**
|
||||
|
||||
Change eBay `minPrice` and Kijiji `priceMin` descriptions to `Minimum price in dollars`.
|
||||
Change eBay `maxPrice` and Kijiji `priceMax` descriptions to `Maximum price in dollars`.
|
||||
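The contract in this task could look roughly like the sketch below. The description strings are the ones named in Step 2; the property layout and the `appendPriceParams` helper are hypothetical and only illustrate that the MCP layer forwards dollar values untouched while the API route does the cent conversion.

```ts
// Hypothetical fragment: the real schema in tools.ts may be shaped differently.
const ebayPriceProperties = {
  minPrice: { type: "number", description: "Minimum price in dollars" },
  maxPrice: { type: "number", description: "Maximum price in dollars" },
} as const;

// Hypothetical helper: forwards numeric dollar values as query params unchanged.
function appendPriceParams(
  params: URLSearchParams,
  args: { minPrice?: number; maxPrice?: number },
): URLSearchParams {
  // Compare against undefined so a zero-dollar bound is still forwarded.
  if (args.minPrice !== undefined) params.set("minPrice", String(args.minPrice));
  if (args.maxPrice !== undefined) params.set("maxPrice", String(args.maxPrice));
  return params;
}
```

Checking `!== undefined` rather than truthiness keeps a `0` lower bound in the query string, which matches the zero-valued argument handling the earlier cleanup plan asks for.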
|
||||
- [ ] **Step 3: Verify MCP tests**
|
||||
|
||||
Run: `bun test packages/mcp-server/test/protocol.test.ts`
|
||||
|
||||
Expected: all MCP protocol tests pass.
|
||||
|
||||
### Task 3: Cross-Package Verification
|
||||
|
||||
**Files:**
|
||||
|
||||
- No additional edits expected.
|
||||
|
||||
- [ ] **Step 1: Run relevant package tests**
|
||||
|
||||
Run: `bun test packages/api-server/test packages/mcp-server/test`
|
||||
|
||||
Expected: all tests pass.
|
||||
|
||||
- [ ] **Step 2: Run CI**
|
||||
|
||||
Run: `bun run ci`
|
||||
|
||||
Expected: typecheck and Biome pass without changing lint config.
|
||||
@@ -1,25 +1,37 @@
|
||||
# Live Parser Tests Implementation Plan
|
||||
|
||||
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
|
||||
> **For agentic workers:** REQUIRED SUB-SKILL: Use
|
||||
> superpowers:subagent-driven-development (recommended) or superpowers:executing-plans
|
||||
> to implement this plan task-by-task.
|
||||
> Steps use checkbox (`- [ ]`) syntax for tracking.
|
||||
|
||||
**Goal:** Add explicit live endpoint test suites for each core marketplace scraper, excluded from default tests and runnable through one script.
|
||||
**Goal:** Add explicit live endpoint test suites for each core marketplace scraper,
|
||||
excluded from default tests and runnable through one script.
|
||||
|
||||
**Architecture:** Live tests live under `packages/core/test/live/` and import public scraper entry points directly. Normal package tests remain offline because the new files are outside current explicit test commands and run only through `bun run test:live`.
|
||||
**Architecture:** Live tests live under `packages/core/test/live/` and import public
|
||||
scraper entry points directly.
|
||||
Normal package tests remain offline because the new files are outside current explicit
|
||||
test commands and run only through `bun run test:live`.
|
||||
|
||||
**Tech Stack:** Bun `1.3.13`, `bun:test`, TypeScript, existing core scraper APIs.
|
||||
|
||||
---
|
||||
* * *
|
||||
|
||||
## File Structure
|
||||
|
||||
- Create `packages/core/test/live/ebay.live.test.ts`: live eBay search smoke test against `fetchEbayItems`.
|
||||
- Create `packages/core/test/live/kijiji.live.test.ts`: live Kijiji search smoke test against `fetchKijijiItems`.
|
||||
- Create `packages/core/test/live/facebook.live.test.ts`: strict live Facebook search smoke test against `fetchFacebookItems` and `FACEBOOK_COOKIE`.
|
||||
- Modify `package.json`: add root script `test:live` running all files under `packages/core/test/live`.
|
||||
- Create `packages/core/test/live/ebay.live.test.ts`: live eBay search smoke test
|
||||
against `fetchEbayItems`.
|
||||
- Create `packages/core/test/live/kijiji.live.test.ts`: live Kijiji search smoke test
|
||||
against `fetchKijijiItems`.
|
||||
- Create `packages/core/test/live/facebook.live.test.ts`: strict live Facebook search
|
||||
smoke test against `fetchFacebookItems` and `FACEBOOK_COOKIE`.
|
||||
- Modify `package.json`: add root script `test:live` running all files under
|
||||
`packages/core/test/live`.
|
||||
|
||||
### Task 1: Add eBay Live Suite
|
||||
|
||||
**Files:**
|
||||
|
||||
- Create: `packages/core/test/live/ebay.live.test.ts`
|
||||
|
||||
- [ ] **Step 1: Write the live test file**
|
||||
@@ -45,12 +57,13 @@ describe("eBay live parser", () => {
|
||||
|
||||
- [ ] **Step 2: Run eBay live test**
|
||||
|
||||
Run: `bun test packages/core/test/live/ebay.live.test.ts`
|
||||
Expected: PASS when eBay returns parseable search results; FAIL on endpoint/rate-limit/parser breakage.
|
||||
Run: `bun test packages/core/test/live/ebay.live.test.ts` Expected: PASS when eBay
|
||||
returns parseable search results; FAIL on endpoint/rate-limit/parser breakage.
|
||||
|
||||
### Task 2: Add Kijiji Live Suite
|
||||
|
||||
**Files:**
|
||||
|
||||
- Create: `packages/core/test/live/kijiji.live.test.ts`
|
||||
|
||||
- [ ] **Step 1: Write the live test file**
|
||||
@@ -82,12 +95,13 @@ describe("Kijiji live parser", () => {
|
||||
|
||||
- [ ] **Step 2: Run Kijiji live test**
|
||||
|
||||
Run: `bun test packages/core/test/live/kijiji.live.test.ts`
|
||||
Expected: PASS when Kijiji returns parseable search and detail pages; FAIL on endpoint/parser breakage.
|
||||
Run: `bun test packages/core/test/live/kijiji.live.test.ts` Expected: PASS when Kijiji
|
||||
returns parseable search and detail pages; FAIL on endpoint/parser breakage.
|
||||
|
||||
### Task 3: Add Facebook Live Suite
|
||||
|
||||
**Files:**
|
||||
|
||||
- Create: `packages/core/test/live/facebook.live.test.ts`
|
||||
|
||||
- [ ] **Step 1: Write the live test file**
|
||||
@@ -117,12 +131,14 @@ describe("Facebook live parser", () => {
|
||||
|
||||
- [ ] **Step 2: Run Facebook live test**
|
||||
|
||||
Run: `bun test packages/core/test/live/facebook.live.test.ts`
|
||||
Expected: PASS with valid `FACEBOOK_COOKIE`; FAIL when `FACEBOOK_COOKIE` is missing, expired, or parser output is empty.
|
||||
Run: `bun test packages/core/test/live/facebook.live.test.ts` Expected: PASS with valid
|
||||
`FACEBOOK_COOKIE`; FAIL when `FACEBOOK_COOKIE` is missing, expired, or parser output is
|
||||
empty.
|
||||
|
||||
### Task 4: Add Root Live Test Script
|
||||
|
||||
**Files:**
|
||||
|
||||
- Modify: `package.json`
|
||||
|
||||
- [ ] **Step 1: Add script**
|
||||
@@ -137,30 +153,35 @@ Change root `scripts` to include:
|
||||
|
||||
- [ ] **Step 2: Run all live tests through script**
|
||||
|
||||
Run: `bun run test:live`
|
||||
Expected: runs eBay, Kijiji, and Facebook live suites. Facebook fails if `FACEBOOK_COOKIE` is unset.
|
||||
Run: `bun run test:live` Expected: runs eBay, Kijiji, and Facebook live suites.
|
||||
Facebook fails if `FACEBOOK_COOKIE` is unset.
|
||||
|
||||
### Task 5: Verify Default Suite Exclusion
|
||||
|
||||
**Files:**
|
||||
|
||||
- No code files modified.
|
||||
|
||||
- [ ] **Step 1: Run existing core tests**
|
||||
|
||||
Run: `bun test packages/core/test`
|
||||
Expected: existing mocked tests run. If Bun discovers `packages/core/test/live`, change normal verification command to explicit glob `bun test packages/core/test/*.test.ts` and document that in final notes.
|
||||
Run: `bun test packages/core/test` Expected: existing mocked tests run.
|
||||
If Bun discovers `packages/core/test/live`, change normal verification command to
|
||||
explicit glob `bun test packages/core/test/*.test.ts` and document that in final notes.
|
||||
|
||||
- [ ] **Step 2: Run static checks**
|
||||
|
||||
Run: `bun run ci`
|
||||
Expected: typecheck and Biome pass. Fix code issues without changing lint or TypeScript rules.
|
||||
Run: `bun run ci` Expected: typecheck and Biome pass.
|
||||
Fix code issues without changing lint or TypeScript rules.
|
||||
|
||||
## Commit Note
|
||||
|
||||
Do not commit during execution unless user explicitly requests a commit. This repo session policy overrides generic plan commit steps.
|
||||
Do not commit during execution unless user explicitly requests a commit.
|
||||
This repo session policy overrides generic plan commit steps.
|
||||
|
||||
## Self-Review
|
||||
|
||||
- Spec coverage: eBay, Kijiji, Facebook live suites; explicit script; strict Facebook auth; excluded from default flow.
|
||||
- Spec coverage: eBay, Kijiji, Facebook live suites; explicit script; strict Facebook
|
||||
auth; excluded from default flow.
|
||||
- Placeholder scan: no `TBD`, `TODO`, or underspecified implementation steps.
|
||||
- Type consistency: tests use current exported scraper signatures and shared listing fields from `ListingDetails`.
|
||||
- Type consistency: tests use current exported scraper signatures and shared listing
|
||||
fields from `ListingDetails`.
|
||||
|
||||
@@ -1,12 +1,13 @@
|
||||
# Design: Adopt opencode Monorepo Config
|
||||
|
||||
**Date:** 2025-07-14
|
||||
**Status:** Approved
|
||||
**Date:** 2025-07-14\
|
||||
**Status:** Approved\
|
||||
**Approach:** Full adoption (A)
|
||||
|
||||
## Context
|
||||
|
||||
Current repo (`marketplace-scrapers-monorepo`) has basic bun workspaces with 3 packages (`core`, `api-server`, `mcp-server`). Reference: `anomalyco/opencode` monorepo patterns.
|
||||
Current repo (`marketplace-scrapers-monorepo`) has basic bun workspaces with 3 packages
|
||||
(`core`, `api-server`, `mcp-server`). Reference: `anomalyco/opencode` monorepo patterns.
|
||||
|
||||
**Gaps vs opencode:**
|
||||
- No Turbo (task orchestration, caching, dep graph)
|
||||
@@ -20,7 +21,8 @@ Current repo (`marketplace-scrapers-monorepo`) has basic bun workspaces with 3 p
|
||||
### 1. Root `package.json`
|
||||
|
||||
- Add `workspaces.catalog` block with shared deps:
|
||||
- `@typescript/native-preview`, `@types/bun`, `@types/unidecode`, `@types/cli-progress`
|
||||
- `@typescript/native-preview`, `@types/bun`, `@types/unidecode`,
|
||||
`@types/cli-progress`
|
||||
- Add `turbo` to `devDependencies`
|
||||
- Add `@tsconfig/bun` to `devDependencies` + catalog
|
||||
- Update root scripts: `typecheck` and `build` delegate to `turbo run`
|
||||
@@ -93,7 +95,8 @@ exact = true
|
||||
root = "./do-not-run-tests-from-root"
|
||||
```
|
||||
|
||||
Exact installs = reproducible. Root test guard prevents accidental root-level test runs.
|
||||
Exact installs = reproducible.
|
||||
Root test guard prevents accidental root-level test runs.
|
||||
|
||||
### 6. Package `exports` field
|
||||
|
||||
@@ -102,7 +105,8 @@ Replace `main`/`module` with `exports` in all 3 packages:
|
||||
"exports": { ".": "./src/index.ts" }
|
||||
```
|
||||
|
||||
Remove `main` and `module` fields. Bun resolves `.ts` directly.
|
||||
Remove `main` and `module` fields.
|
||||
Bun resolves `.ts` directly.
|
||||
|
||||
### 7. Catalog references in per-package `package.json`
|
||||
|
||||
|
||||
@@ -3,7 +3,9 @@
|
||||
## Summary
|
||||
|
||||
Remove all file-based and request-provided cookie inputs across the repo.
|
||||
The only supported authentication input becomes a raw `Cookie` header string supplied through scraper-specific environment variables such as `FACEBOOK_COOKIE` and `EBAY_COOKIE`.
|
||||
The only supported authentication input becomes a raw `Cookie` header string supplied
|
||||
through scraper-specific environment variables such as `FACEBOOK_COOKIE` and
|
||||
`EBAY_COOKIE`.
|
||||
|
||||
## Goals
|
||||
|
||||
@@ -17,7 +19,8 @@ The only supported authentication input becomes a raw `Cookie` header string sup
|
||||
|
||||
- Changing scraper behavior unrelated to authentication input.
|
||||
- Adding new cookie formats or migration helpers.
|
||||
- Preserving backward compatibility for cookie files, JSON cookie arrays, or request overrides.
|
||||
- Preserving backward compatibility for cookie files, JSON cookie arrays, or request
|
||||
overrides.
|
||||
|
||||
## Current State
|
||||
|
||||
@@ -27,27 +30,33 @@ The current shared cookie utilities support three sources in priority order:
|
||||
2. Environment variable
|
||||
3. Cookie file
|
||||
|
||||
`packages/core/src/utils/cookies.ts` includes file loading, JSON array parsing, and auto-detection between JSON and header-string formats.
|
||||
Facebook also exposes deprecated `cookiePath` arguments that still reach shared loading logic.
|
||||
Docs in `cookies/AGENTS.md` still describe file-based setup and request-level overrides.
|
||||
`packages/core/src/utils/cookies.ts` includes file loading, JSON array parsing, and
|
||||
auto-detection between JSON and header-string formats.
|
||||
Facebook also exposes deprecated `cookiePath` arguments that still reach shared loading
|
||||
logic. Docs in `cookies/AGENTS.md` still describe file-based setup and request-level
|
||||
overrides.
|
||||
|
||||
## Chosen Approach
|
||||
|
||||
Use the hard-reset approach.
|
||||
Delete the shared multi-source cookie-loading model and reduce the cookie surface to env-header parsing only.
|
||||
This is a larger diff than a surgical removal, but it avoids leaving behind abstractions that imply unsupported inputs still exist.
|
||||
Delete the shared multi-source cookie-loading model and reduce the cookie surface to
|
||||
env-header parsing only.
|
||||
This is a larger diff than a surgical removal, but it avoids leaving behind abstractions
|
||||
that imply unsupported inputs still exist.
|
||||
|
||||
## Design
|
||||
|
||||
### Shared Cookie Utilities
|
||||
|
||||
`packages/core/src/utils/cookies.ts` will keep only the pieces needed for env-header-based auth:
|
||||
`packages/core/src/utils/cookies.ts` will keep only the pieces needed for
|
||||
env-header-based auth:
|
||||
|
||||
- `Cookie` type
|
||||
- A reduced cookie config shape containing only `name`, `domain`, and `envVar`
|
||||
- `parseCookieString()` for raw `Cookie` header strings
|
||||
- `formatCookiesForHeader()` for domain filtering and request formatting
|
||||
- An env-only loader that reads `process.env[config.envVar]`, parses it, and throws a targeted error when missing or invalid
|
||||
- An env-only loader that reads `process.env[config.envVar]`, parses it, and throws a
|
||||
targeted error when missing or invalid
|
||||
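A minimal sketch of the env-only loader described above, under stated assumptions: the spec names the `Cookie` type, the `name`/`domain`/`envVar` config fields, and `parseCookieString()`, but not their exact shapes, so the field layout, the `parseCookieString(raw, domain)` signature, the loader name `loadCookiesFromEnv`, and the error wording are all hypothetical. The environment is passed in as a parameter so the sketch stays testable.

```typescript
// Hypothetical shapes: the spec names these but does not show definitions.
interface Cookie {
  name: string;
  value: string;
  domain: string;
}

interface CookieConfig {
  name: string; // marketplace name, used only in error messages
  domain: string; // domain assigned to every parsed cookie
  envVar: string; // environment variable holding the raw Cookie header
}

// Parse a raw `Cookie` header string ("a=1; b=2") into Cookie objects.
// The two-argument signature is an assumption for this sketch.
function parseCookieString(raw: string, domain: string): Cookie[] {
  const cookies: Cookie[] = [];
  for (const part of raw.split(";")) {
    const trimmed = part.trim();
    const eq = trimmed.indexOf("=");
    if (eq <= 0) continue; // skip empty segments and pairs without a name
    cookies.push({
      name: trimmed.slice(0, eq).trim(),
      value: trimmed.slice(eq + 1).trim(), // keeps any "=" inside the value
      domain,
    });
  }
  return cookies;
}

// Env-only loader (hypothetical name): no file or request fallback.
function loadCookiesFromEnv(
  config: CookieConfig,
  env: Record<string, string | undefined>,
): Cookie[] {
  const raw = env[config.envVar];
  if (!raw || raw.trim() === "") {
    throw new Error(`${config.name}: ${config.envVar} is not set`);
  }
  const cookies = parseCookieString(raw, config.domain);
  if (cookies.length === 0) {
    throw new Error(
      `${config.name}: ${config.envVar} is not a valid Cookie header string`,
    );
  }
  return cookies;
}
```

In real code the second argument would typically be `process.env`; both failure cases throw instead of falling back to another source, which is the point of the hard reset.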
|
||||
The following shared utilities will be removed:
|
||||
|
||||
@@ -68,15 +77,18 @@ For Facebook this means:
|
||||
|
||||
For eBay this means:
|
||||
|
||||
- Remove any remaining fallback/file-oriented behavior from shared calls and error strings
|
||||
- Remove any remaining fallback/file-oriented behavior from shared calls and error
|
||||
strings
|
||||
- Keep the existing env-var auth path, but make it the only path
|
||||
|
||||
### Public API Surface
|
||||
|
||||
Exports from `packages/core/src/index.ts` should reflect the new contract.
|
||||
If exported functions currently advertise cookie-source or cookie-path arguments, their signatures will be tightened so callers cannot pass unsupported inputs.
|
||||
If exported functions currently advertise cookie-source or cookie-path arguments, their
|
||||
signatures will be tightened so callers cannot pass unsupported inputs.
|
||||
|
||||
Downstream adapter packages should continue calling core through the simplified signatures without adding their own cookie-loading behavior.
|
||||
Downstream adapter packages should continue calling core through the simplified
|
||||
signatures without adding their own cookie-loading behavior.
|
||||
|
||||
### Error Handling
|
||||
|
||||
@@ -93,8 +105,8 @@ Errors should be blunt and specific:
|
||||
|
||||
### Testing Strategy
|
||||
|
||||
Follow TDD.
|
||||
Start by changing or adding core tests so the old file/request behavior is no longer accepted.
|
||||
Follow TDD. Start by changing or adding core tests so the old file/request behavior is
|
||||
no longer accepted.
|
||||
|
||||
Coverage targets:
|
||||
|
||||
@@ -102,7 +114,8 @@ Coverage targets:
|
||||
2. Missing env vars fail with the new env-only error.
|
||||
3. Invalid env strings fail without falling back to files or request data.
|
||||
4. Facebook APIs no longer expose or honor cookie-path/request-cookie behavior.
|
||||
5. Existing tests that depended on missing files or JSON cookie arrays are rewritten to the env-only contract.
|
||||
5. Existing tests that depended on missing files or JSON cookie arrays are rewritten to
|
||||
the env-only contract.
|
||||
|
||||
Verification target after implementation:
|
||||
|
||||
@@ -121,11 +134,15 @@ Update cookie-related docs to match the new contract:
|
||||
|
||||
## Risks
|
||||
|
||||
- External callers using request cookie overrides will break at compile time or runtime, depending on how they consume the package.
|
||||
- Recent work added support for custom Facebook cookie paths, so removing that path intentionally reverses a newly introduced behavior.
|
||||
- Tests that currently model missing-file behavior must be rewritten rather than preserved.
|
||||
- External callers using request cookie overrides will break at compile time or runtime,
|
||||
depending on how they consume the package.
|
||||
- Recent work added support for custom Facebook cookie paths, so removing that path
|
||||
intentionally reverses a newly introduced behavior.
|
||||
- Tests that currently model missing-file behavior must be rewritten rather than
|
||||
preserved.
|
||||
|
||||
## Rollout Notes
|
||||
|
||||
This is an intentional contract break.
|
||||
The code, tests, and docs should all land together so there is no mixed messaging about supported cookie sources.
|
||||
The code, tests, and docs should all land together so there is no mixed messaging about
|
||||
supported cookie sources.
|
||||
|
||||
@@ -2,35 +2,46 @@
|
||||
|
||||
## Summary
|
||||
|
||||
Replace the legacy Facebook Marketplace scraper with a route-aware implementation built around current Comet bootstrap markers and route-specific extraction.
|
||||
The new scraper will keep authenticated direct HTTP fetches as the primary transport, but it will stop treating legacy `require`, `__bbox`, and `marketplace_product_details_page` structures as the main parsing contract.
|
||||
Replace the legacy Facebook Marketplace scraper with a route-aware implementation built
|
||||
around current Comet bootstrap markers and route-specific extraction.
|
||||
The new scraper will keep authenticated direct HTTP fetches as the primary transport,
|
||||
but it will stop treating legacy `require`, `__bbox`, and
|
||||
`marketplace_product_details_page` structures as the main parsing contract.
|
||||
|
||||
## Goals
|
||||
|
||||
- Replace both Facebook search and item-detail extraction with a current-shape parser.
|
||||
- Keep authenticated direct HTTP requests as the primary fetch strategy.
|
||||
- Parse route-specific Comet bootstrap/state payloads before falling back to rendered-HTML extraction.
|
||||
- Parse route-specific Comet bootstrap/state payloads before falling back to
|
||||
rendered-HTML extraction.
|
||||
- Detect auth-gated, unavailable, and unknown responses explicitly.
|
||||
- Update tests so they model current route markers and failure modes instead of legacy page objects.
|
||||
- Update tests so they model current route markers and failure modes instead of legacy
|
||||
page objects.
|
||||
|
||||
## Non-Goals
|
||||
|
||||
- Reworking non-Facebook scrapers.
|
||||
- Converting the scraper to browser-only automation.
|
||||
- Preserving old parser behavior for `marketplace_product_details_page` or `__bbox`-driven item extraction.
|
||||
- Reverse-engineering every internal Facebook bootstrap payload shape exhaustively before implementation.
|
||||
- Preserving old parser behavior for `marketplace_product_details_page` or
|
||||
`__bbox`-driven item extraction.
|
||||
- Reverse-engineering every internal Facebook bootstrap payload shape exhaustively
|
||||
before implementation.
|
||||
|
||||
## Current State
|
||||
|
||||
The current implementation in `packages/core/src/scrapers/facebook.ts` still uses authenticated HTTP requests, which remains correct.
|
||||
The search path parses embedded script JSON and looks for `marketplace_search.feed_units.edges`.
|
||||
The item-detail path is centered on legacy extraction paths such as:
|
||||
The current implementation in `packages/core/src/scrapers/facebook.ts` still uses
|
||||
authenticated HTTP requests, which remains correct.
|
||||
The search path parses embedded script JSON and looks for
|
||||
`marketplace_search.feed_units.edges`. The item-detail path is centered on legacy
|
||||
extraction paths such as:
|
||||
|
||||
- `parsed.require[0][3].__bbox.result.data.viewer.marketplace_product_details_page.target`
|
||||
- nested `__bbox.require[...]` variations
|
||||
- recursive search through `parsed.require`
|
||||
|
||||
Live evidence gathered earlier in this session and by the isolated research subagent shows that current Facebook Marketplace pages are Comet route-driven and expose markers such as:
|
||||
Live evidence gathered earlier in this session and by the isolated research subagent
|
||||
shows that current Facebook Marketplace pages are Comet route-driven and expose markers
|
||||
such as:
|
||||
|
||||
- `XCometMarketplaceSearchController`
|
||||
- `XCometMarketplacePermalinkController`
|
||||
@@ -41,7 +52,9 @@ Live evidence gathered earlier in this session and by the isolated research suba
|
||||
- `data-sjs`
|
||||
- `data-btmanifest`
|
||||
|
||||
The same live investigation also showed that authenticated item pages no longer expose the old `marketplace_product_details_page` marker reliably, while live search still returns usable results.
|
||||
The same live investigation also showed that authenticated item pages no longer expose
|
||||
the old `marketplace_product_details_page` marker reliably, while live search still
|
||||
returns usable results.
|
||||
|
||||
## Chosen Approach
|
||||
|
||||
@@ -52,9 +65,11 @@ The scraper will:
|
||||
1. Fetch authenticated HTML directly.
|
||||
2. Classify the response using current route and auth markers.
|
||||
3. Parse inline bootstrap/state payloads using route-specific probes.
|
||||
4. Fall back to rendered-HTML extraction only when bootstrap markers are present but the payload cannot be decoded into the expected search or item shape.
|
||||
4. Fall back to rendered-HTML extraction only when bootstrap markers are present but the
|
||||
payload cannot be decoded into the expected search or item shape.
|
||||
|
||||
This keeps the cheaper direct-HTTP transport while shifting the parser contract from legacy page-object names to current Comet route structure.
|
||||
This keeps the cheaper direct-HTTP transport while shifting the parser contract from
|
||||
legacy page-object names to current Comet route structure.
|
||||
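The classification step can be sketched roughly as follows. The two controller names come from the live evidence listed above; the auth-gated and unavailable markers are deliberately left as placeholders because this document does not name them, and `classifyResponse`, `ResponseClass`, and `ClassifierMarkers` are hypothetical names, not existing exports.

```typescript
type ResponseClass = "search" | "item" | "auth-gated" | "unavailable" | "unknown";

interface ClassifierMarkers {
  search: string[];
  item: string[];
  authGated: string[];
  unavailable: string[];
}

const markers: ClassifierMarkers = {
  search: ["XCometMarketplaceSearchController"],
  item: ["XCometMarketplacePermalinkController"],
  // Placeholders only: the real markers must come from live reconnaissance.
  authGated: ["<auth-gate-marker>"],
  unavailable: ["<unavailable-marker>"],
};

// Classify raw HTML by substring markers. Auth-gated and unavailable pages
// are checked first so they are detected before any parsing is attempted.
function classifyResponse(html: string, m: ClassifierMarkers): ResponseClass {
  const has = (list: string[]): boolean => list.some((s) => html.includes(s));
  if (has(m.authGated)) return "auth-gated";
  if (has(m.unavailable)) return "unavailable";
  if (has(m.search)) return "search";
  if (has(m.item)) return "item";
  return "unknown";
}
```

Keeping the markers in a data structure rather than in branching code means a marker rename on Facebook's side should only require a fixture and table update.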
|
||||
## Design
|
||||
|
||||
@@ -88,7 +103,8 @@ Primary behavior:
|
||||
- fetch the Marketplace search HTML with auth cookies
|
||||
- confirm the response class is `search`
|
||||
- extract inline bootstrap/state blobs from script tags and page attributes
|
||||
- probe for route-specific search payloads associated with `XCometMarketplaceSearchController`
|
||||
- probe for route-specific search payloads associated with
|
||||
`XCometMarketplaceSearchController`
|
||||
- map decoded search results into summary listing records
|
||||
|
||||
Search summary fields should remain aligned with the current public output shape:
|
||||
@@ -102,7 +118,8 @@ Search summary fields should remain aligned with the current public output shape
|
||||
|
||||
Fallback behavior:
|
||||
|
||||
- if search route markers are present but structured payload decoding fails, extract listing summaries from rendered HTML anchors and text patterns
|
||||
- if search route markers are present but structured payload decoding fails, extract
|
||||
listing summaries from rendered HTML anchors and text patterns
|
||||
- use item links matching `/marketplace/item/<id>` as the anchor for fallback extraction
|
||||
- treat fallback results as summary-only data, not rich detail data
|
||||
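As an illustration of the anchor-based search fallback, the sketch below collects unique item IDs from `href` attributes matching `/marketplace/item/<id>`. It assumes IDs are numeric and that links appear in double-quoted `href` attributes; `extractItemLinks`, `FallbackSummary`, and the rebuilt URL form are hypothetical, and a real fallback would also pull nearby title and price text.

```typescript
interface FallbackSummary {
  id: string;
  url: string;
}

// Collect unique Marketplace item links from rendered HTML.
// Assumption: item IDs are numeric and hrefs are double-quoted.
function extractItemLinks(html: string): FallbackSummary[] {
  const pattern = /href="[^"]*\/marketplace\/item\/(\d+)/g;
  const seen = new Set<string>();
  const summaries: FallbackSummary[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(html)) !== null) {
    const id = match[1];
    if (id === undefined || seen.has(id)) continue; // de-duplicate repeated cards
    seen.add(id);
    summaries.push({
      id,
      url: `https://www.facebook.com/marketplace/item/${id}/`,
    });
  }
  return summaries;
}
```

The records intentionally carry only an ID and URL, which matches the rule that fallback results are summary-only data.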
|
||||
@@ -132,9 +149,12 @@ Priority item fields:
|
||||
|
||||
Fallback behavior:
|
||||
|
||||
- if permalink route markers are present but no stable payload object is decodable, extract data from rendered HTML text structure
|
||||
- prioritize title, price, condition, description, location text, and seller module content
|
||||
- return partial item data when core user-facing fields are present rather than failing solely because deeper commerce metadata is missing
|
||||
- if permalink route markers are present but no stable payload object is decodable,
|
||||
extract data from rendered HTML text structure
|
||||
- prioritize title, price, condition, description, location text, and seller module
|
||||
content
|
||||
- return partial item data when core user-facing fields are present rather than failing
|
||||
solely because deeper commerce metadata is missing
|
||||
|
||||
### Bootstrap Parsing Strategy
|
||||
|
||||
@@ -151,11 +171,14 @@ Candidate discovery inputs:
|
||||
- `ServerJS` / `Bootloader` inline blobs
|
||||
- route controller names
|
||||
|
||||
Candidate scoring for search should favor objects that contain repeated result-card semantics, item IDs, listing links, titles, prices, or location summaries.
|
||||
Candidate scoring for item pages should favor objects that contain singular listing semantics, title, price, condition, description, location, seller, or permalink context.
|
||||
Candidate scoring for search should favor objects that contain repeated result-card
|
||||
semantics, item IDs, listing links, titles, prices, or location summaries.
|
||||
Candidate scoring for item pages should favor objects that contain singular listing
|
||||
semantics, title, price, condition, description, location, seller, or permalink context.
|
||||
|
||||
The parser should not depend on one hard-coded object name surviving forever.
|
||||
Instead, it should look for route-specific semantic clusters and choose the strongest candidate.
|
||||
Instead, it should look for route-specific semantic clusters and choose the strongest
|
||||
candidate.
|
||||
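One possible way to express "choose the strongest candidate" is a simple key-overlap score over every object in a decoded payload. This is only a sketch: the signal list mirrors the item-page semantics named above, but the matching rule (case-insensitive substring on key names), the names `scoreCandidate` and `findBestCandidate`, and the tie-breaking behavior are assumptions.

```typescript
type Json = null | boolean | number | string | Json[] | JsonObject;
interface JsonObject {
  [key: string]: Json;
}

interface Candidate {
  node: JsonObject;
  score: number;
}

// Hypothetical signal list for item pages, taken from the semantics above.
const ITEM_SIGNALS = ["title", "price", "condition", "description", "location", "seller"];

// Count how many signals appear in the object's own key names.
function scoreCandidate(node: JsonObject, signals: string[]): number {
  const keys = Object.keys(node).map((k) => k.toLowerCase());
  return signals.filter((s) => keys.some((k) => k.includes(s))).length;
}

// Walk the whole payload iteratively and keep the highest-scoring object.
// On ties the first object visited wins; a real ranker may need more rules.
function findBestCandidate(root: Json, signals: string[]): Candidate | null {
  let best: Candidate | null = null;
  const stack: Json[] = [root];
  while (stack.length > 0) {
    const current = stack.pop();
    if (current === null || current === undefined || typeof current !== "object") continue;
    if (Array.isArray(current)) {
      stack.push(...current);
      continue;
    }
    const score = scoreCandidate(current, signals);
    if (score > 0 && (best === null || score > best.score)) {
      best = { node: current, score };
    }
    stack.push(...Object.values(current));
  }
  return best;
}
```

Because the walk does not depend on any fixed path, it keeps working if the wrapping object names change, at the cost of needing a separate signal list per route.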
|
||||
### Legacy Removal
|
||||
|
||||
@@ -166,7 +189,9 @@ Specifically:
|
||||
- delete legacy-first `require` / `__bbox` navigation tables
|
||||
- delete tests whose only purpose is to preserve those legacy paths
|
||||
|
||||
If a minimal legacy compatibility branch remains, it must be a last-resort fallback behind the new route-aware parser and should not shape test fixtures or design decisions.
|
||||
If a minimal legacy compatibility branch remains, it must be a last-resort fallback
|
||||
behind the new route-aware parser and should not shape test fixtures or design
|
||||
decisions.
|
||||
|
||||
### Error Handling
|
||||
|
||||
@@ -178,7 +203,8 @@ Facebook responses should now fail with explicit route-aware outcomes:
|
||||
4. Search or item route detected, but no decodable data found.
|
||||
5. Unknown response shape.
|
||||
|
||||
Error messages should name the actual class of failure instead of implying that every parse miss is caused by expired cookies.
|
||||
Error messages should name the actual class of failure instead of implying that every
|
||||
parse miss is caused by expired cookies.
|
||||
|
||||
### Testing Strategy
|
||||
|
||||
@@ -190,11 +216,15 @@ Coverage targets:
|
||||
1. Search responses classify correctly from current Comet controller markers.
|
||||
2. Item responses classify correctly from current Comet controller markers.
|
||||
3. Login-gated and unavailable responses are detected before parsing.
|
||||
4. Search bootstrap parsing produces summary listing results from current-shape fixtures.
|
||||
4. Search bootstrap parsing produces summary listing results from current-shape
|
||||
fixtures.
|
||||
5. Item bootstrap parsing produces rich listing details from current-shape fixtures.
|
||||
6. Search fallback extraction works when route markers exist but structured payload decoding fails.
|
||||
7. Item fallback extraction works when route markers exist but structured payload decoding fails.
|
||||
8. Old legacy-only item fixtures are removed or rewritten so they no longer define the contract.
|
||||
6. Search fallback extraction works when route markers exist but structured payload
|
||||
decoding fails.
|
||||
7. Item fallback extraction works when route markers exist but structured payload
|
||||
decoding fails.
|
||||
8. Old legacy-only item fixtures are removed or rewritten so they no longer define the
|
||||
contract.
|
||||
|
||||
Verification target after implementation:
|
||||
|
||||
@@ -204,23 +234,30 @@ Verification target after implementation:
|
||||
|
||||
## Public API Surface
|
||||
|
||||
Keep the current public function names unless the rewrite proves that a signature change is required:
|
||||
Keep the current public function names unless the rewrite proves that a signature change
|
||||
is required:
|
||||
|
||||
- `fetchFacebookItems(...)`
|
||||
- `fetchFacebookItem(...)`
|
||||
- `extractFacebookMarketplaceData(...)`
|
||||
- `extractFacebookItemData(...)`
|
||||
|
||||
The internals should change substantially, but callers should not need a new integration surface for this rewrite.
|
||||
The internals should change substantially, but callers should not need a new integration
|
||||
surface for this rewrite.
|
||||
|
||||
## Risks
|
||||
|
||||
- Facebook may change bootstrap payload naming again, so route/controller markers are more stable than exact nested object paths but still not guaranteed.
|
||||
- Search and item pages may each contain multiple partial payloads, making candidate ranking important.
|
||||
- Fallback rendered-HTML extraction may be noisier than bootstrap decoding and needs clear precedence rules.
|
||||
- Live fixtures can drift from production quickly, so tests must model route semantics rather than exact one-off payloads where possible.
|
||||
- Facebook may change bootstrap payload naming again, so route/controller markers are
|
||||
more stable than exact nested object paths but still not guaranteed.
|
||||
- Search and item pages may each contain multiple partial payloads, making candidate
|
||||
ranking important.
|
||||
- Fallback rendered-HTML extraction may be noisier than bootstrap decoding and needs
|
||||
clear precedence rules.
|
||||
- Live fixtures can drift from production quickly, so tests must model route semantics
|
||||
rather than exact one-off payloads where possible.
|
||||
|
||||
## Rollout Notes
|
||||
|
||||
The code, fixtures, and tests should change together.
|
||||
There should be no mixed state where the implementation is Comet-aware but the tests still encode `marketplace_product_details_page` as the primary contract.
|
||||
There should be no mixed state where the implementation is Comet-aware but the tests
|
||||
still encode `marketplace_product_details_page` as the primary contract.
|
||||
|
||||
@@ -2,15 +2,18 @@
|
||||
|
||||
## Summary
|
||||
|
||||
Add an optional shared result mode across Facebook, eBay, and Kijiji that moves suspiciously cheap listings out of the main results into a separate `unstableResults` bucket.
|
||||
Listings are considered unstable when their price is more than 20% below the median price of the scraper's priced search results.
|
||||
Add an optional shared result mode across Facebook, eBay, and Kijiji that moves
|
||||
suspiciously cheap listings out of the main results into a separate `unstableResults`
|
||||
bucket. Listings are considered unstable when their price is more than 20% below the
|
||||
median price of the scraper’s priced search results.
|
||||
|
||||
## Goals
|
||||
|
||||
- Support the same optional unstable-listing mode across all scrapers.
|
||||
- Keep current default scraper and route behavior unchanged unless the mode is enabled.
|
||||
- Hide unstable listings from the main results while still returning them separately.
|
||||
- Implement the rule once in shared core code instead of duplicating marketplace-specific logic.
|
||||
- Implement the rule once in shared core code instead of duplicating
|
||||
marketplace-specific logic.
|
||||
- Document the option in MCP tool descriptions so callers can discover it.
|
||||
|
||||
## Non-Goals
|
||||
@@ -24,7 +27,8 @@ Listings are considered unstable when their price is more than 20% below the med
|
||||
|
||||
`packages/core` currently returns plain arrays from scraper search functions.
|
||||
`packages/api-server` forwards those scraper results directly from marketplace routes.
|
||||
`packages/mcp-server` documents search tools per marketplace, but does not expose or describe any result-stability mode.
|
||||
`packages/mcp-server` documents search tools per marketplace, but does not expose or
|
||||
describe any result-stability mode.
|
||||
|
||||
There is no shared result-classification utility today.
|
||||
Price filtering exists in some scrapers, but not a cross-marketplace median-based split.
|
||||
@@ -33,11 +37,14 @@ Price filtering exists in some scrapers, but not a cross-marketplace median-base
|
||||
|
||||
Use a shared core utility plus per-route and per-tool opt-in.
|
||||
|
||||
The shared utility will accept parsed listings, compute the median from valid positive prices, and split the data into `results` and `unstableResults`.
|
||||
Each scraper will opt into that utility when the caller enables unstable-listing mode.
|
||||
API routes and MCP tools will expose the same optional mode so the feature is consistently available everywhere scraper search is surfaced.
|
||||
The shared utility will accept parsed listings, compute the median from valid positive
|
||||
prices, and split the data into `results` and `unstableResults`. Each scraper will opt
|
||||
into that utility when the caller enables unstable-listing mode.
|
||||
API routes and MCP tools will expose the same optional mode so the feature is
|
||||
consistently available everywhere scraper search is surfaced.
|
||||
|
||||
This keeps the heuristic centralized, minimizes duplicated logic, and preserves existing consumers by leaving the default path unchanged.
|
||||
This keeps the heuristic centralized, minimizes duplicated logic, and preserves existing
|
||||
consumers by leaving the default path unchanged.
|
||||
|
||||
## Design
|
||||
|
||||
@@ -48,14 +55,16 @@ Add a shared utility in `packages/core` for listing stability classification.
|
||||
Responsibilities:
|
||||
|
||||
- accept parsed listing arrays with `listingPrice.cents`
|
||||
- ignore listings whose price is missing, non-numeric, or non-positive when computing the median
|
||||
- ignore listings whose price is missing, non-numeric, or non-positive when computing
|
||||
the median
|
||||
- compute the median price from valid priced listings
|
||||
- classify listings as unstable when `listingPrice.cents < median * 0.8`
|
||||
- return an object with:
|
||||
- `results`: listings that remain in the main bucket
|
||||
- `unstableResults`: listings moved out of the main bucket
|
||||
|
||||
Listings excluded from median computation because their price is missing or non-positive remain in `results` unchanged.
|
||||
Listings excluded from median computation because their price is missing or non-positive
|
||||
remain in `results` unchanged.
|
||||
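The responsibilities above could be sketched as below. The `listingPrice.cents` field, the `results` / `unstableResults` keys, and the strict `< median * 0.8` rule come from this spec; the function name `splitUnstableListings` and the helper names are hypothetical.

```typescript
interface PricedListing {
  listingPrice?: { cents?: number | null } | null;
}

interface StabilitySplit<T> {
  results: T[];
  unstableResults: T[];
}

// A price counts toward the median only if it is a finite, positive number.
function isValidCents(value: unknown): value is number {
  return typeof value === "number" && Number.isFinite(value) && value > 0;
}

// Median of an already sorted, non-empty array.
function median(sorted: number[]): number {
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]!
    : (sorted[mid - 1]! + sorted[mid]!) / 2;
}

function splitUnstableListings<T extends PricedListing>(
  listings: T[],
): StabilitySplit<T> {
  const prices = listings
    .map((l) => l.listingPrice?.cents)
    .filter(isValidCents)
    .sort((a, b) => a - b);

  // Best-effort: with no valid prices, everything stays in `results`.
  if (prices.length === 0) {
    return { results: [...listings], unstableResults: [] };
  }

  const cutoff = median(prices) * 0.8;
  const results: T[] = [];
  const unstableResults: T[] = [];
  for (const listing of listings) {
    const cents = listing.listingPrice?.cents;
    // Strict comparison: a price exactly at the cutoff stays in `results`.
    if (isValidCents(cents) && cents < cutoff) {
      unstableResults.push(listing);
    } else {
      results.push(listing);
    }
  }
  return { results, unstableResults };
}
```

With a single valid price the median equals that price, so the listing can never fall below the cutoff and the split needs no special case.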
|
||||
### Scraper Integration
|
||||
|
||||
@@ -68,7 +77,8 @@ Default behavior:
|
||||
Opt-in behavior:
|
||||
|
||||
- run the shared classification utility after parsing search results
|
||||
- classify before final result limiting so unstable items do not consume main-result slots
|
||||
- classify before final result limiting so unstable items do not consume main-result
|
||||
slots
|
||||
- return an object shaped like:
|
||||
|
||||
```ts
|
||||
@@ -82,7 +92,8 @@ Each scraper will use its existing concrete listing subtype for these arrays.
|
||||
|
||||
### API Surface
|
||||
|
||||
Marketplace API routes will expose an optional query parameter for unstable-listing mode.
|
||||
Marketplace API routes will expose an optional query parameter for unstable-listing
|
||||
mode.
|
||||
|
||||
Requirements:
|
||||
|
||||
@@ -90,7 +101,8 @@ Requirements:
|
||||
- when enabled, return the object payload with `results` and `unstableResults`
|
||||
- use the same semantics across Facebook, eBay, and Kijiji routes
|
||||
|
||||
The exact parameter name should be consistent across routes and intentionally describe the behavior, for example `unstableFilter=true`.
|
||||
The exact parameter name should be consistent across routes and intentionally describe
|
||||
the behavior, for example `unstableFilter=true`.
|
||||
|
||||
### MCP Surface
|
||||
|
||||
@@ -100,34 +112,43 @@ Tool descriptions should explicitly document:
|
||||
|
||||
- that the option is optional
|
||||
- that it moves listings priced more than 20% below the median into `unstableResults`
|
||||
- that enabling it changes the response shape from a plain list to an object with `results` and `unstableResults`
|
||||
- that enabling it changes the response shape from a plain list to an object with
|
||||
`results` and `unstableResults`
|
||||
- that the behavior is available for Facebook, eBay, and Kijiji search tools
|
||||
|
||||
The wording should be aligned across all three tools so the feature reads as one shared capability.
|
||||
The wording should be aligned across all three tools so the feature reads as one shared
|
||||
capability.
|
||||
|
||||
### Error Handling
|
||||
|
||||
The unstable-listing mode should be best-effort and non-failing.
|
||||
|
||||
- If there are no valid positive prices, return all listings in `results` and an empty `unstableResults` array.
|
||||
- If there are no valid positive prices, return all listings in `results` and an empty
|
||||
`unstableResults` array.
|
||||
- If there is only one valid priced listing, do not classify it as unstable.
|
||||
- Parsing failures remain governed by existing scraper behavior; the classification layer should not introduce new scraper-specific errors.
|
||||
- Parsing failures remain governed by existing scraper behavior; the classification
|
||||
layer should not introduce new scraper-specific errors.
|
||||
|
||||
### Testing Strategy
|
||||
|
||||
Follow TDD.
|
||||
Start with shared utility tests, then wire the option through scraper and route tests.
|
||||
Follow TDD. Start with shared utility tests, then wire the option through scraper and
|
||||
route tests.
|
||||
|
||||
Coverage targets:
|
||||
|
||||
1. Median calculation for odd-sized valid price sets.
|
||||
2. Median calculation for even-sized valid price sets.
|
||||
3. Strict cutoff behavior where only listings with `price < median * 0.8` move to `unstableResults`.
|
||||
4. Missing, invalid, zero, or negative prices are excluded from median computation and remain in `results`.
|
||||
3. Strict cutoff behavior where only listings with `price < median * 0.8` move to
|
||||
`unstableResults`.
|
||||
4. Missing, invalid, zero, or negative prices are excluded from median computation and
|
||||
remain in `results`.
|
||||
5. Default scraper behavior still returns plain arrays when the option is disabled.
|
||||
6. Enabled scraper behavior returns `{ results, unstableResults }` for Facebook, eBay, and Kijiji.
|
||||
7. API routes preserve existing response shapes by default and switch to the object payload only when enabled.
|
||||
8. MCP tool metadata documents the new optional mode for all three marketplace search tools.
|
||||
6. Enabled scraper behavior returns `{ results, unstableResults }` for Facebook, eBay,
|
||||
and Kijiji.
|
||||
7. API routes preserve existing response shapes by default and switch to the object
|
||||
payload only when enabled.
|
||||
8. MCP tool metadata documents the new optional mode for all three marketplace search
|
||||
tools.
|
||||
|
||||
Verification target after implementation:
|
||||
|
||||
@@ -138,11 +159,15 @@ Verification target after implementation:
|
||||
|
||||
## Risks
|
||||
|
||||
- The optional mode introduces a union return shape for scraper callers, which can ripple into downstream TypeScript signatures.
|
||||
- Applying classification before final limiting changes which items appear in the main bucket compared with a naive post-limit split.
|
||||
- Kijiji and eBay may have different mixes of priced and unpriced results, so excluding non-positive prices from the median must remain explicit and tested.
|
||||
- The optional mode introduces a union return shape for scraper callers, which can
|
||||
ripple into downstream TypeScript signatures.
|
||||
- Applying classification before final limiting changes which items appear in the main
|
||||
bucket compared with a naive post-limit split.
|
||||
- Kijiji and eBay may have different mixes of priced and unpriced results, so excluding
|
||||
non-positive prices from the median must remain explicit and tested.
|
||||
|
||||
## Rollout Notes
|
||||
|
||||
Land the shared classifier, scraper wiring, route wiring, tests, and MCP description updates together.
|
||||
That avoids a partial rollout where the feature exists in one surface but is undocumented or inconsistent elsewhere.
|
||||
Land the shared classifier, scraper wiring, route wiring, tests, and MCP description
|
||||
updates together. That avoids a partial rollout where the feature exists in one surface
|
||||
but is undocumented or inconsistent elsewhere.
|
||||
|
||||
@@ -2,25 +2,32 @@
|
||||
|
||||
## Summary
|
||||
|
||||
Add explicit live endpoint tests for each core scraper parser path. These tests are excluded from normal deterministic test commands and run only through a dedicated package script.
|
||||
Add explicit live endpoint tests for each core scraper parser path.
|
||||
These tests are excluded from normal deterministic test commands and run only through a
|
||||
dedicated package script.
|
||||
|
||||
## Scope
|
||||
|
||||
- Add one live suite per parser: eBay, Kijiji, Facebook.
|
||||
- Place suites under `packages/core/test/live/` so normal `bun test packages/core/test/*.test.ts` patterns do not include them accidentally.
|
||||
- Place suites under `packages/core/test/live/` so normal
|
||||
`bun test packages/core/test/*.test.ts` patterns do not include them accidentally.
|
||||
- Add a root `test:live` script that runs all live suites together.
|
||||
- Keep existing mocked tests unchanged.
|
||||
|
||||
## Behavior
|
||||
|
||||
- Each suite calls the public scraper entry point for that marketplace with a narrow query and low max item count.
|
||||
- Assertions verify scrape output shape and parser viability, not exact listing identity.
|
||||
- Each suite calls the public scraper entry point for that marketplace with a narrow
|
||||
query and low max item count.
|
||||
- Assertions verify scrape output shape and parser viability, not exact listing
|
||||
identity.
|
||||
- eBay and Kijiji require live network access and fail on endpoint/parser breakage.
|
||||
- Facebook is strict: missing or expired `FACEBOOK_COOKIE` fails the live suite instead of skipping.
|
||||
- Facebook is strict: missing or expired `FACEBOOK_COOKIE` fails the live suite instead
|
||||
of skipping.
|
||||
|
||||
## Test Data
|
||||
|
||||
- Use stable broad Canadian queries such as `iphone` or `laptop` to reduce empty-result risk.
|
||||
- Use stable broad Canadian queries such as `iphone` or `laptop` to reduce empty-result
|
||||
risk.
|
||||
- Use low limits to avoid unnecessary load and rate-limit pressure.
|
||||
- Avoid exact prices, titles, listing IDs, or ordering assumptions.
|
||||
|
||||
|
||||
@@ -0,0 +1,173 @@
|
||||
# Facebook Marketplace Anti-Bot Challenge Solver Design
|
||||
|
||||
## Summary
|
||||
|
||||
Add a challenge-detection and challenge-solving layer to the Facebook Marketplace
|
||||
scraper so it can handle anti-bot gates (checkpoint pages, token rotation, cookie
|
||||
requirements) programmatically.
|
||||
Build the solver in pure Bun — no browser automation in production.
|
||||
Use `agent-browser` only for one-time debug reconnaissance.
|
||||
|
||||
## Goals
|
||||
|
||||
- Identify which anti-bot challenge(s) Facebook Marketplace triggers against
|
||||
programmatic HTTP requests.
|
||||
- Implement detection + solving for each discovered challenge type.
|
||||
- Wire the solver into `fetchFacebookItems` and `fetchFacebookItem` so challenges are
|
||||
handled transparently.
|
||||
- Follow the same pattern as the existing `ebay-challenge.ts` (detect → solve → retry
|
||||
with clearance).
|
||||
- Zero browser automation at runtime. Pure `fetch` + `Bun` APIs + npm packages only.
|
||||
|
||||
## Non-Goals
|
||||
|
||||
- Solving login/auth-wall challenges (those require fresh cookies — not solvable
|
||||
programmatically).
|
||||
- Full account login automation (cookies must be provided by the user).
|
||||
- Browser-based scraping or Puppeteer/Playwright integration.
|
||||
- Solving challenges for non-Marketplace Facebook endpoints.
|
||||
|
||||
## Current State
|
||||
|
||||
The Facebook scraper (`packages/core/src/scrapers/facebook.ts`) fetches Marketplace
search and item pages via authenticated `fetch`, using cookies from the
`FACEBOOK_COOKIE` environment variable. It:
|
||||
|
||||
- Sends a browser-like header set (`sec-ch-ua`, `user-agent`, etc.)
|
||||
- Parses SSR HTML for embedded JSON in script tags
|
||||
- Has no challenge detection: if Facebook returns a challenge page, the scraper fails
  silently (no listings are parsed and the response is classified as “unknown”)
|
||||
- Depends entirely on cookie freshness
|
||||
|
||||
The eBay scraper already follows the challenge-solver pattern in this codebase:
|
||||
`ebay.ts` uses `warmEbaySession()`, `isChallengeRedirect()`, `isChallengeHtml()`, and
|
||||
`solveEbayChallenge()` from `ebay-challenge.ts`.
|
||||
|
||||
## Chosen Approach

**Reconnaissance-first development:**

1. Use `agent-browser` (debug only) to capture a real Facebook Marketplace browsing
   session via HAR.
2. Probe programmatic `fetch` to see what Facebook returns without a browser.
3. Diff the two to identify the gap (missing headers? missing cookies? missing JS
   execution?).
4. Build a modular solver in `packages/core/src/utils/facebook-challenge.ts` that
   detects each challenge type and applies the appropriate fix.
5. Wire it into `facebook.ts` following the eBay pattern.

## Design

### File Plan

| File | Purpose |
| --- | --- |
| `packages/core/src/utils/facebook-challenge.ts` | Challenge detection, solving, and cookie/session utilities |
| `packages/core/src/scrapers/facebook.ts` | Modified: warmup, challenge detection before parsing, retry loop |
| `packages/core/test/facebook-challenge.test.ts` | Unit tests with mock challenge HTML fixtures |

### Flow

```
fetchFacebookItems(searchUrl)
├── warmFacebookSession() → GET facebook.com/ (collect datr + Akamai cookies)
├── fetchHtml(searchUrl) → receives response
├── detectFacebookChallenge(response)
│   ├── checkpoint/challenge HTML → solveCheckpointChallenge()
│   ├── redirect to /login → fail (cookies expired)
│   ├── missing required cookies → regenerate session
│   ├── 429 rate limit → backoff + retry (existing http.ts handles this)
│   └── no challenge → proceed to parsing
├── if solveCheckpointChallenge succeeds → retry fetchHtml with clearance cookie
└── parse results
```
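
The flow above can be sketched as a small retry loop. This is a minimal illustration under stated assumptions, not the module's actual implementation: `fetchWithChallengeRetry`, `FlowDeps`, `PageResponse`, and `SolveResult` are hypothetical names, and the fetcher, detector, and solver are injected so the loop can be exercised without touching the network.

```typescript
// Hypothetical types for illustration; the real module may differ.
type ChallengeKind = "none" | "login_wall" | "checkpoint";

interface PageResponse {
  status: number;
  url: string;
  html: string;
}

interface SolveResult {
  solved: boolean;
  cookies?: Record<string, string>; // clearance cookies to replay
  error?: string;
}

// Injected dependencies (assumed shapes, not the real signatures).
interface FlowDeps {
  fetchPage(url: string, cookies: Record<string, string>): Promise<PageResponse>;
  detect(res: PageResponse): ChallengeKind;
  solveCheckpoint(
    res: PageResponse,
    cookies: Record<string, string>,
  ): Promise<SolveResult>;
}

async function fetchWithChallengeRetry(
  url: string,
  cookies: Record<string, string>,
  deps: FlowDeps,
  maxRetries = 1,
): Promise<PageResponse | null> {
  let jar = { ...cookies };
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const res = await deps.fetchPage(url, jar);
    const kind = deps.detect(res);
    if (kind === "none") return res; // no challenge: proceed to parsing
    if (kind === "login_wall") return null; // cookies expired: not solvable here
    if (attempt === maxRetries) return null; // out of retries, give up quietly
    const result = await deps.solveCheckpoint(res, jar);
    if (!result.solved) return null;
    // Replay the clearance cookies on the next attempt.
    jar = { ...jar, ...(result.cookies ?? {}) };
  }
  return null;
}
```

Returning `null` instead of throwing matches the stated intent that the scraper degrades to empty results when a challenge cannot be cleared.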

### Challenge Types (to be confirmed by reconnaissance)

| Type | Expected Signal | Solving Strategy |
| --- | --- | --- |
| Login wall | Redirect to `/login` or HTML `"You must log in"` | Fail: user must provide fresh cookies |
| Checkpoint page | HTML contains `checkpoint` or `challenge` path | Parse hidden form fields, compute proof-of-work if present, submit answer endpoint |
| `datr` cookie missing | No `datr` in cookie jar → request fails | Fetch homepage first to obtain `datr` (session warmup) |
| DTSG token needed | Form submissions fail with CSRF error | Extract `fb_dtsg` from page HTML, include in request body |
| GraphQL header check | Request blocked without internal headers | Extract `x-fb-friendly-name` from browser HAR, replicate |
| Akamai/bot-manager | Redirect loops or blank pages without Akamai cookies | Homepage warmup to collect `bm_sv`, `bm_mi`, etc. |

### Key Modules

**`facebook-challenge.ts`:**

```
// Session warmup: fetch homepage to prime cookies
warmFacebookSession(): Promise<Record<string, string>>

// Challenge detection
detectFacebookChallenge(html, status, url, headers): ChallengeType | null

// Checkpoint solver
solveCheckpointChallenge(html, cookies): Promise<ChallengeResult>

// DTSG token extraction
extractDtsg(html): string | null

// Cookie jar management (shared with ebay.ts pattern)
mergeCookies(...): Record<string, string>
```
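
As one possible reading of the cookie-jar helper listed above, the sketch below merges several jars with later ones taking precedence. The parameter list of `mergeCookies` is elided in the design, so the name `mergeCookieJars`, its variadic signature, and the `toCookieHeader` helper are assumptions made purely for illustration.

```typescript
// Hypothetical cookie-jar helpers; names and signatures are assumptions.
type CookieJar = Record<string, string>;

// Merge jars left to right: later jars override earlier ones.
// Undefined jars and empty names/values are skipped.
function mergeCookieJars(...jars: Array<CookieJar | undefined>): CookieJar {
  const merged: CookieJar = {};
  for (const jar of jars) {
    if (!jar) continue;
    for (const [name, value] of Object.entries(jar)) {
      if (name && value) merged[name] = value;
    }
  }
  return merged;
}

// Serialize a jar into a Cookie request header value.
function toCookieHeader(jar: CookieJar): string {
  return Object.entries(jar)
    .map(([name, value]) => `${name}=${value}`)
    .join("; ");
}
```

With this ordering, passing warmup cookies first and user-provided cookies last would let the user's values win on any name collision.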

**`ChallengeResult` type:**

```ts
interface ChallengeResult {
  solved: boolean;
  cookies?: Record<string, string>; // clearance cookies to replay
  token?: string; // challenge response token
  error?: string; // why it failed
}
```

### Error Handling

- Solver failure → return `ChallengeResult { solved: false, error: "..." }`; the scraper
  logs a warning and returns empty results (never throws).
- Unrecognized challenge → log the response URL and an HTML snippet for future analysis.
- Rate limits → handled by existing `http.ts` exponential backoff (no change needed).
- Solver timeout → 30s cap on any challenge computation, fall back to `solved: false`.
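
The never-throw and 30s-cap rules above could be enforced by one wrapper around any solver. The following is a sketch under assumptions: `solveWithTimeout` is a hypothetical helper, and the interface is a local copy of the `ChallengeResult` shape from this design.

```typescript
// Local copy of the ChallengeResult shape described in this design.
interface ChallengeResult {
  solved: boolean;
  cookies?: Record<string, string>;
  token?: string;
  error?: string;
}

// Hypothetical wrapper: caps solver time and converts throws into
// `solved: false`, so callers never need a try/catch.
async function solveWithTimeout(
  solve: () => Promise<ChallengeResult>,
  timeoutMs = 30_000,
): Promise<ChallengeResult> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<ChallengeResult>((resolve) => {
    timer = setTimeout(
      () => resolve({ solved: false, error: `solver timed out after ${timeoutMs}ms` }),
      timeoutMs,
    );
  });
  try {
    // Whichever settles first wins; a timeout resolves to a failed result.
    return await Promise.race([solve(), timeout]);
  } catch (err) {
    return { solved: false, error: err instanceof Error ? err.message : String(err) };
  } finally {
    if (timer !== undefined) clearTimeout(timer);
  }
}
```

Note that `Promise.race` only stops waiting; it does not cancel the underlying work, so a CPU-bound proof-of-work solver would still need its own cancellation check.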

### Testing

| Test | What It Verifies |
| --- | --- |
| `detectFacebookChallenge` with sample checkpoint HTML | Correctly identifies checkpoint challenge |
| `detectFacebookChallenge` with normal search HTML | Returns null (no false positives) |
| `detectFacebookChallenge` with login redirect | Identifies auth-gated |
| `solveCheckpointChallenge` with known PoW params | Produces correct answer |
| `warmFacebookSession` with mocked fetch | Collects expected cookies |
| `extractDtsg` with sample page HTML | Extracts the DTSG token |
| Integration: fetch → challenge → solve → retry → results | End-to-end mock flow |
| Solver throws → scraper returns empty, no crash | Graceful fallback |
| Solver unknown challenge → logs warning, returns empty | No unhandled challenge crashes |

Test data will use anonymized HTML fixtures (no real user data).

## Reconnaissance Steps (debug-only, one-time)

1. **Probe programmatically:** `fetch` Marketplace search with/without cookies, record
   status code and HTML.
2. **Browser session:** `agent-browser` → log into Facebook → navigate Marketplace →
   record HAR.
3. **Diff analysis:** Compare browser request headers vs. our programmatic headers.
4. **Cookie inventory:** List all cookies from browser session, identify which are
   essential.
5. **Challenge trigger:** Identify what change in request signature triggers a
   challenge.
6. **Replay test:** Replay the browser’s exact request via `fetch` to confirm
headers/cookies are the differentiator.
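
The diff-analysis step could be scripted along these lines. This is a sketch only: `diffHeaders` and `HeaderDiff` are hypothetical names, and the browser header map is assumed to have already been pulled out of the HAR into a plain object.

```typescript
type HeaderMap = Record<string, string>;

interface HeaderDiff {
  missing: string[]; // sent by the browser, absent from our request
  different: string[]; // present in both, but values differ
  extra: string[]; // sent by us, absent from the browser request
}

// Header names are case-insensitive, so compare on lowercase keys.
function lowerCaseKeys(headers: HeaderMap): HeaderMap {
  const out: HeaderMap = {};
  for (const [name, value] of Object.entries(headers)) {
    out[name.toLowerCase()] = value;
  }
  return out;
}

function diffHeaders(browser: HeaderMap, programmatic: HeaderMap): HeaderDiff {
  const b = lowerCaseKeys(browser);
  const p = lowerCaseKeys(programmatic);
  const diff: HeaderDiff = { missing: [], different: [], extra: [] };
  for (const name of Object.keys(b).sort()) {
    if (!(name in p)) diff.missing.push(name);
    else if (b[name] !== p[name]) diff.different.push(name);
  }
  for (const name of Object.keys(p).sort()) {
    if (!(name in b)) diff.extra.push(name);
  }
  return diff;
}
```

The `missing` list is the natural starting point for the replay test: add those headers one at a time and see which one changes the response.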

All reconnaissance artifacts are saved under `docs/facebook-challenge/`.

## Decisions Deferred to Post-Reconnaissance

- Exact challenge types and solving strategies (depends on what Facebook actually uses).
- Whether a PoW solver, CAPTCHA solver, or token-extraction approach is needed.
- npm package dependencies (only add what the reconnaissance proves necessary).

@@ -3,6 +3,7 @@ import { logger } from "../logger";
|
||||
import {
|
||||
emptySearchResponse,
|
||||
getRequiredSearchQuery,
|
||||
parseDollarPriceParam,
|
||||
parseNonNegativeIntegerParam,
|
||||
} from "./helpers";
|
||||
|
||||
@@ -18,17 +19,11 @@ export async function ebayRoute(req: Request): Promise<Response> {
|
||||
return SEARCH_QUERY;
|
||||
}
|
||||
|
||||
const minPrice = parseNonNegativeIntegerParam(
|
||||
reqUrl.searchParams,
|
||||
"minPrice",
|
||||
);
|
||||
const minPrice = parseDollarPriceParam(reqUrl.searchParams, "minPrice");
|
||||
if (minPrice instanceof Response) {
|
||||
return minPrice;
|
||||
}
|
||||
const maxPrice = parseNonNegativeIntegerParam(
|
||||
reqUrl.searchParams,
|
||||
"maxPrice",
|
||||
);
|
||||
const maxPrice = parseDollarPriceParam(reqUrl.searchParams, "maxPrice");
|
||||
if (maxPrice instanceof Response) {
|
||||
return maxPrice;
|
||||
}
|
||||
|
||||
@@ -39,6 +39,23 @@ export function parseNonNegativeIntegerParam(
|
||||
return Number(rawValue);
|
||||
}
|
||||
|
||||
export function parseDollarPriceParam(
|
||||
searchParams: URLSearchParams,
|
||||
name: string,
|
||||
): number | undefined | Response {
|
||||
const rawValue = searchParams.get(name);
|
||||
if (rawValue === null) {
|
||||
return undefined;
|
||||
}
|
||||
if (!/^\d+(?:\.\d{1,2})?$/.test(rawValue)) {
|
||||
return Response.json(
|
||||
{ message: `Invalid ${name} parameter` },
|
||||
{ status: 400 },
|
||||
);
|
||||
}
|
||||
return Math.round(Number(rawValue) * 100);
|
||||
}
|
||||
|
||||
export function emptySearchResponse(hint?: string): Response {
|
||||
const message = hint
|
||||
? `Search didn't return any results! ${hint}`
|
||||
|
||||
@@ -3,6 +3,7 @@ import { logger } from "../logger";
|
||||
import {
|
||||
emptySearchResponse,
|
||||
getRequiredSearchQuery,
|
||||
parseDollarPriceParam,
|
||||
parseNonNegativeIntegerParam,
|
||||
} from "./helpers";
|
||||
|
||||
@@ -26,17 +27,11 @@ export async function kijijiRoute(req: Request): Promise<Response> {
|
||||
if (maxPages instanceof Response) {
|
||||
return maxPages;
|
||||
}
|
||||
const priceMin = parseNonNegativeIntegerParam(
|
||||
reqUrl.searchParams,
|
||||
"priceMin",
|
||||
);
|
||||
const priceMin = parseDollarPriceParam(reqUrl.searchParams, "priceMin");
|
||||
if (priceMin instanceof Response) {
|
||||
return priceMin;
|
||||
}
|
||||
const priceMax = parseNonNegativeIntegerParam(
|
||||
reqUrl.searchParams,
|
||||
"priceMax",
|
||||
);
|
||||
const priceMax = parseDollarPriceParam(reqUrl.searchParams, "priceMax");
|
||||
if (priceMax instanceof Response) {
|
||||
return priceMax;
|
||||
}
|
||||
|
||||
@@ -282,6 +282,24 @@ describe("API routes", () => {
|
||||
);
|
||||
});
|
||||
|
||||
test("kijijiRoute forwards dollar price filters to core as cents", async () => {
|
||||
const { kijijiRoute } = await import("../src/routes/kijiji");
|
||||
|
||||
await kijijiRoute(
|
||||
new Request(
|
||||
"http://localhost/api/kijiji?q=laptop&priceMin=999.99&priceMax=1000",
|
||||
),
|
||||
);
|
||||
|
||||
expect(fetchKijijiItems).toHaveBeenCalledWith(
|
||||
"laptop",
|
||||
4,
|
||||
"https://www.kijiji.ca",
|
||||
expect.objectContaining({ priceMin: 99_999, priceMax: 100_000 }),
|
||||
{},
|
||||
);
|
||||
});
|
||||
|
||||
test("kijijiRoute does not forward unstableFilter when false", async () => {
|
||||
const { kijijiRoute } = await import("../src/routes/kijiji");
|
||||
|
||||
@@ -414,6 +432,24 @@ describe("API routes", () => {
|
||||
);
|
||||
});
|
||||
|
||||
test("ebayRoute forwards dollar price filters to core as cents", async () => {
|
||||
const { ebayRoute } = await import("../src/routes/ebay");
|
||||
|
||||
fetchEbayItems.mockImplementation(() => Promise.resolve([{ title: "a" }]));
|
||||
|
||||
await ebayRoute(
|
||||
new Request(
|
||||
"http://localhost/api/ebay?q=macbook&minPrice=999.99&maxPrice=1000",
|
||||
),
|
||||
);
|
||||
|
||||
expect(fetchEbayItems).toHaveBeenCalledWith(
|
||||
"macbook",
|
||||
1,
|
||||
expect.objectContaining({ minPrice: 99_999, maxPrice: 100_000 }),
|
||||
);
|
||||
});
|
||||
|
||||
test("ebayRoute passes through scraper payload unchanged in unstable mode", async () => {
|
||||
const { ebayRoute } = await import("../src/routes/ebay");
|
||||
|
||||
@@ -730,16 +766,18 @@ describe("API routes", () => {
|
||||
expect(body.message).toBe("Invalid minPrice parameter");
|
||||
});
|
||||
|
||||
test("ebayRoute returns 400 for decimal minPrice", async () => {
|
||||
test("ebayRoute accepts decimal minPrice", async () => {
|
||||
const { ebayRoute } = await import("../src/routes/ebay");
|
||||
|
||||
const response = await ebayRoute(
|
||||
await ebayRoute(
|
||||
new Request("http://localhost/api/ebay?q=laptop&minPrice=1.5"),
|
||||
);
|
||||
|
||||
expect(response.status).toBe(400);
|
||||
const body = await response.json();
|
||||
expect(body.message).toBe("Invalid minPrice parameter");
|
||||
expect(fetchEbayItems).toHaveBeenCalledWith(
|
||||
"laptop",
|
||||
1,
|
||||
expect.objectContaining({ minPrice: 150 }),
|
||||
);
|
||||
});
|
||||
|
||||
test("ebayRoute returns 400 for non-integer maxPrice", async () => {
|
||||
@@ -766,16 +804,18 @@ describe("API routes", () => {
|
||||
expect(body.message).toBe("Invalid maxPrice parameter");
|
||||
});
|
||||
|
||||
test("ebayRoute returns 400 for decimal maxPrice", async () => {
|
||||
test("ebayRoute accepts decimal maxPrice", async () => {
|
||||
const { ebayRoute } = await import("../src/routes/ebay");
|
||||
|
||||
const response = await ebayRoute(
|
||||
await ebayRoute(
|
||||
new Request("http://localhost/api/ebay?q=laptop&maxPrice=1.5"),
|
||||
);
|
||||
|
||||
expect(response.status).toBe(400);
|
||||
const body = await response.json();
|
||||
expect(body.message).toBe("Invalid maxPrice parameter");
|
||||
expect(fetchEbayItems).toHaveBeenCalledWith(
|
||||
"laptop",
|
||||
1,
|
||||
expect.objectContaining({ maxPrice: 150 }),
|
||||
);
|
||||
});
|
||||
|
||||
test("kijijiRoute returns 400 for decimal maxPages", async () => {
|
||||
@@ -862,16 +902,20 @@ describe("API routes", () => {
|
||||
expect(body.message).toBe("Invalid priceMin parameter");
|
||||
});
|
||||
|
||||
test("kijijiRoute returns 400 for decimal priceMin", async () => {
|
||||
test("kijijiRoute accepts decimal priceMin", async () => {
|
||||
const { kijijiRoute } = await import("../src/routes/kijiji");
|
||||
|
||||
const response = await kijijiRoute(
|
||||
await kijijiRoute(
|
||||
new Request("http://localhost/api/kijiji?q=laptop&priceMin=1.5"),
|
||||
);
|
||||
|
||||
expect(response.status).toBe(400);
|
||||
const body = await response.json();
|
||||
expect(body.message).toBe("Invalid priceMin parameter");
|
||||
expect(fetchKijijiItems).toHaveBeenCalledWith(
|
||||
"laptop",
|
||||
4,
|
||||
"https://www.kijiji.ca",
|
||||
expect.objectContaining({ priceMin: 150 }),
|
||||
{},
|
||||
);
|
||||
});
|
||||
|
||||
test("kijijiRoute returns 400 for non-integer priceMin", async () => {
|
||||
@@ -934,16 +978,20 @@ describe("API routes", () => {
|
||||
expect(body.message).toBe("Invalid priceMax parameter");
|
||||
});
|
||||
|
||||
test("kijijiRoute returns 400 for decimal priceMax", async () => {
|
||||
test("kijijiRoute accepts decimal priceMax", async () => {
|
||||
const { kijijiRoute } = await import("../src/routes/kijiji");
|
||||
|
||||
const response = await kijijiRoute(
|
||||
await kijijiRoute(
|
||||
new Request("http://localhost/api/kijiji?q=laptop&priceMax=1.5"),
|
||||
);
|
||||
|
||||
expect(response.status).toBe(400);
|
||||
const body = await response.json();
|
||||
expect(body.message).toBe("Invalid priceMax parameter");
|
||||
expect(fetchKijijiItems).toHaveBeenCalledWith(
|
||||
"laptop",
|
||||
4,
|
||||
"https://www.kijiji.ca",
|
||||
expect.objectContaining({ priceMax: 150 }),
|
||||
{},
|
||||
);
|
||||
});
|
||||
|
||||
test("kijijiRoute returns 400 for non-integer priceMax", async () => {
|
||||
|
||||
@@ -10,8 +10,14 @@ import {
|
||||
type CookieConfig,
|
||||
ensureCookies,
|
||||
formatCookiesForHeader,
|
||||
loadCookiesOptional,
|
||||
parseCookieString,
|
||||
} from "../utils/cookies";
|
||||
import {
|
||||
buildFacebookHeaders,
|
||||
detectFacebookChallenge,
|
||||
warmFacebookSession,
|
||||
} from "../utils/facebook-challenge";
|
||||
import { formatCentsToCurrency } from "../utils/format";
|
||||
import { fetchHtml, HttpError, isRecord, RateLimitError } from "../utils/http";
|
||||
import { logger } from "../utils/logger";
|
||||
@@ -20,9 +26,10 @@ import { classifyUnstableListings } from "../utils/unstable";
|
||||
/**
|
||||
* Facebook Marketplace Scraper
|
||||
*
|
||||
* Note: Facebook Marketplace requires authentication cookies for full access.
|
||||
* This implementation will return limited or no results without proper authentication.
|
||||
* This is by design to respect Facebook's authentication requirements.
|
||||
* Facebook Marketplace returns search results without authentication when
|
||||
* proper browser headers are sent. Prices and seller details are hidden on
|
||||
* search results but are available on individual item pages even without
|
||||
* auth cookies. For full-price search results, provide FACEBOOK_COOKIE.
|
||||
*/
|
||||
|
||||
// Facebook cookie configuration
|
||||
@@ -263,20 +270,14 @@ function logExtractionMetrics(success: boolean, itemId?: string) {
|
||||
// ----------------------------- HTTP Client -----------------------------
|
||||
|
||||
function createFacebookHeaders(cookies: string): Record<string, string> {
|
||||
return {
|
||||
accept:
|
||||
"text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7",
|
||||
"accept-language": "en-GB,en-US;q=0.9,en;q=0.8",
|
||||
"cache-control": "no-cache",
|
||||
"upgrade-insecure-requests": "1",
|
||||
"sec-fetch-dest": "document",
|
||||
"sec-fetch-mode": "navigate",
|
||||
"sec-fetch-site": "none",
|
||||
"sec-fetch-user": "?1",
|
||||
"user-agent":
|
||||
"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
|
||||
cookie: cookies,
|
||||
};
|
||||
const jar: Record<string, string> = {};
|
||||
if (cookies) {
|
||||
for (const pair of cookies.split(";")) {
|
||||
const [name, ...rest] = pair.trim().split("=");
|
||||
if (name && rest.length > 0) jar[name.trim()] = rest.join("=").trim();
|
||||
}
|
||||
}
|
||||
return buildFacebookHeaders(jar);
|
||||
}
|
||||
|
||||
// ----------------------------- Parsing -----------------------------
|
||||
@@ -286,13 +287,29 @@ export type FacebookResponseKind =
|
||||
| "item"
|
||||
| "auth_gated"
|
||||
| "unavailable"
|
||||
| "checkpoint"
|
||||
| "unknown";
|
||||
|
||||
export function classifyFacebookResponse(
|
||||
htmlString: HTMLString,
|
||||
responseUrl: string,
|
||||
status = 200,
|
||||
) {
|
||||
const challengeType = detectFacebookChallenge(
|
||||
status,
|
||||
htmlString,
|
||||
responseUrl,
|
||||
);
|
||||
if (challengeType === "checkpoint") {
|
||||
return {
|
||||
kind: "checkpoint" as const,
|
||||
authGated: false,
|
||||
unavailable: false,
|
||||
};
|
||||
}
|
||||
|
||||
const authGated =
|
||||
challengeType === "login_wall" ||
|
||||
responseUrl.includes("/login/") ||
|
||||
htmlString.includes("You must log in") ||
|
||||
htmlString.includes("log in to continue");
|
||||
@@ -764,6 +781,22 @@ export function extractFacebookItemData(
|
||||
return bestMatch.item;
|
||||
}
|
||||
|
||||
// Try marketplace_product_details_page.target path (current item page structure)
|
||||
for (const candidate of candidates) {
|
||||
const detailsPage = findKeyInObject(
|
||||
candidate,
|
||||
"marketplace_product_details_page",
|
||||
) as Record<string, unknown> | undefined;
|
||||
const target = detailsPage?.target as Record<string, unknown> | undefined;
|
||||
if (
|
||||
target &&
|
||||
typeof target.id === "string" &&
|
||||
typeof target.marketplace_listing_title === "string"
|
||||
) {
|
||||
return target as unknown as FacebookMarketplaceItem;
|
||||
}
|
||||
}
|
||||
|
||||
if (htmlString.includes("XCometMarketplacePermalinkController")) {
|
||||
return extractFacebookItemHtmlFallback(htmlString);
|
||||
}
|
||||
@@ -771,6 +804,25 @@ export function extractFacebookItemData(
|
||||
return null;
|
||||
}
|
||||
|
||||
function findKeyInObject(obj: unknown, targetKey: string): unknown {
|
||||
if (obj == null) return undefined;
|
||||
if (Array.isArray(obj)) {
|
||||
for (const item of obj) {
|
||||
const found = findKeyInObject(item, targetKey);
|
||||
if (found !== undefined) return found;
|
||||
}
|
||||
return undefined;
|
||||
}
|
||||
if (typeof obj !== "object") return undefined;
|
||||
const record = obj as Record<string, unknown>;
|
||||
if (targetKey in record) return record[targetKey];
|
||||
for (const [, value] of Object.entries(record)) {
|
||||
const found = findKeyInObject(value, targetKey);
|
||||
if (found !== undefined) return found;
|
||||
}
|
||||
return undefined;
|
||||
}
|
||||
|
||||
/**
|
||||
Parse Facebook marketplace search results into ListingDetails[]
|
||||
*/
|
||||
@@ -1027,16 +1079,18 @@ export default async function fetchFacebookItems(
|
||||
};
|
||||
};
|
||||
|
||||
const cookies = await ensureFacebookCookies();
|
||||
const warmupCookies = await warmFacebookSession();
|
||||
const warmupHeader = Object.entries(warmupCookies)
|
||||
.map(([k, v]) => `${k}=${v}`)
|
||||
.join("; ");
|
||||
|
||||
const userCookies = await loadCookiesOptional(FACEBOOK_COOKIE_CONFIG);
|
||||
|
||||
// Format cookies for HTTP header
|
||||
const domain = "www.facebook.com";
|
||||
const cookiesHeader = formatCookiesForHeader(cookies, domain);
|
||||
if (!cookiesHeader) {
|
||||
throw new Error(
|
||||
"No valid Facebook cookies found. Please check that cookies are not expired and apply to facebook.com domain.",
|
||||
);
|
||||
}
|
||||
const userCookiesHeader = formatCookiesForHeader(userCookies, domain);
|
||||
const cookiesHeader = [warmupHeader, userCookiesHeader]
|
||||
.filter(Boolean)
|
||||
.join("; ");
|
||||
|
||||
const DELAY_MS = Math.max(1, Math.floor(1000 / requestsPerSecond));
|
||||
|
||||
@@ -1047,7 +1101,9 @@ export default async function fetchFacebookItems(
|
||||
const searchUrl = `https://www.facebook.com/marketplace/${LOCATION}/search?query=${encodedQuery}&sortBy=creation_time_descend&exact=false`;
|
||||
|
||||
logger.log(`Fetching Facebook marketplace: ${searchUrl}`);
|
||||
logger.log(`Using ${cookies.length} cookies for authentication`);
|
||||
if (userCookies.length > 0) {
|
||||
logger.log(`Using ${userCookies.length} cookies for authentication`);
|
||||
}
|
||||
|
||||
let searchHtml: string;
|
||||
let searchResponseUrl = searchUrl;
|
||||
@@ -1100,6 +1156,13 @@ export default async function fetchFacebookItems(
|
||||
return finalizeResults([]);
|
||||
}
|
||||
|
||||
if (classification.kind === "checkpoint") {
|
||||
logger.warn(
|
||||
"Facebook marketplace returned a checkpoint challenge. This may require manual verification.",
|
||||
);
|
||||
return finalizeResults([]);
|
||||
}
|
||||
|
||||
if (classification.unavailable) {
|
||||
logger.warn("Facebook marketplace search returned an unavailable route.");
|
||||
return finalizeResults([]);
|
||||
@@ -1149,15 +1212,8 @@ export default async function fetchFacebookItems(
|
||||
export async function fetchFacebookItem(
|
||||
itemId: string,
|
||||
): Promise<FacebookListingDetails | null> {
|
||||
const cookies = await ensureFacebookCookies();
|
||||
|
||||
// Format cookies for HTTP header
|
||||
const cookiesHeader = formatCookiesForHeader(cookies, "www.facebook.com");
|
||||
if (!cookiesHeader) {
|
||||
throw new Error(
|
||||
"No valid Facebook cookies found. Please check that cookies are not expired and apply to facebook.com domain.",
|
||||
);
|
||||
}
|
||||
const userCookies = await loadCookiesOptional(FACEBOOK_COOKIE_CONFIG);
|
||||
const cookiesHeader = formatCookiesForHeader(userCookies, "www.facebook.com");
|
||||
|
||||
const itemUrl = `https://www.facebook.com/marketplace/item/${itemId}/`;
|
||||
|
||||
@@ -1230,6 +1286,14 @@ export async function fetchFacebookItem(
|
||||
|
||||
const classification = classifyFacebookResponse(itemHtml, itemResponseUrl);
|
||||
|
||||
if (classification.kind === "checkpoint") {
|
||||
logExtractionMetrics(false, itemId);
|
||||
logger.warn(
|
||||
`Checkpoint challenge detected for item ${itemId}. Facebook may be limiting access.`,
|
||||
);
|
||||
return null;
|
||||
}
|
||||
|
||||
if (classification.authGated) {
|
||||
logExtractionMetrics(false, itemId);
|
||||
logger.warn(
|
||||
|
||||
128
packages/core/src/utils/facebook-challenge.ts
Normal file
@@ -0,0 +1,128 @@
|
||||
// Facebook Marketplace session & challenge utilities
|
||||
|
||||
// ------------------ Types ------------------
|
||||
|
||||
export type ChallengeType =
|
||||
| "login_wall"
|
||||
| "checkpoint"
|
||||
| "bad_headers"
|
||||
| "rate_limited"
|
||||
| "none";
|
||||
|
||||
// ------------------ Constants ------------------
|
||||
|
||||
const FACEBOOK_BROWSER_HEADERS: Record<string, string> = {
|
||||
accept:
|
||||
"text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7",
|
||||
"accept-language": "en-GB,en-US;q=0.9,en;q=0.8",
|
||||
"cache-control": "no-cache",
|
||||
"upgrade-insecure-requests": "1",
|
||||
"sec-fetch-dest": "document",
|
||||
"sec-fetch-mode": "navigate",
|
||||
"sec-fetch-site": "none",
|
||||
"sec-fetch-user": "?1",
|
||||
"sec-ch-ua":
|
||||
'"Google Chrome";v="131", "Chromium";v="131", "Not_A Brand";v="24"',
|
||||
"sec-ch-ua-mobile": "?0",
|
||||
"sec-ch-ua-platform": '"Linux"',
|
||||
"user-agent":
|
||||
"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36",
|
||||
};
|
||||
|
||||
// ------------------ Cookie Management ------------------
|
||||
|
||||
function parseSetCookies(setCookieHeaders: string[]): Record<string, string> {
|
||||
const cookies: Record<string, string> = {};
|
||||
for (const header of setCookieHeaders) {
|
||||
const parts = header.split(";");
|
||||
const firstPart = parts[0]?.trim();
|
||||
if (!firstPart) continue;
|
||||
const eqIdx = firstPart.indexOf("=");
|
||||
if (eqIdx === -1) continue;
|
||||
const name = firstPart.slice(0, eqIdx).trim();
|
||||
const value = firstPart.slice(eqIdx + 1).trim();
|
||||
if (name && value) {
|
||||
cookies[name] = value;
|
||||
}
|
||||
}
|
||||
return cookies;
|
||||
}
|
||||
|
||||
function cookiesToHeader(cookies: Record<string, string>): string {
|
||||
return Object.entries(cookies)
|
||||
.map(([name, value]) => `${name}=${value}`)
|
||||
.join("; ");
|
||||
}
|
||||
|
||||
// ------------------ Session Warmup ------------------
|
||||
|
||||
export async function warmFacebookSession(): Promise<Record<string, string>> {
|
||||
try {
|
||||
const res = await fetch("https://www.facebook.com/", {
|
||||
method: "GET",
|
||||
headers: FACEBOOK_BROWSER_HEADERS,
|
||||
redirect: "manual",
|
||||
signal: AbortSignal.timeout(10000),
|
||||
});
|
||||
|
||||
const setCookies = res.headers.getSetCookie?.() ?? [];
|
||||
return parseSetCookies(setCookies);
|
||||
} catch {
|
||||
return {};
|
||||
}
|
||||
}
|
||||
|
||||
// ------------------ Challenge Detection ------------------
|
||||
|
||||
export function detectFacebookChallenge(
|
||||
status: number,
|
||||
html: string,
|
||||
responseUrl: string,
|
||||
): ChallengeType {
|
||||
if (status === 400) {
|
||||
return "bad_headers";
|
||||
}
|
||||
|
||||
if (status === 429) {
|
||||
return "rate_limited";
|
||||
}
|
||||
|
||||
if (responseUrl.includes("/login/")) {
|
||||
return "login_wall";
|
||||
}
|
||||
|
||||
if (html.includes("You must log in") || html.includes("log in to continue")) {
|
||||
return "login_wall";
|
||||
}
|
||||
|
||||
if (
|
||||
responseUrl.includes("/checkpoint/") ||
|
||||
(html.includes("checkpoint") && html.includes("challenge"))
|
||||
) {
|
||||
return "checkpoint";
|
||||
}
|
||||
|
||||
return "none";
|
||||
}
|
||||
|
||||
// ------------------ Header Construction ------------------
|
||||
|
||||
export function buildFacebookHeaders(
|
||||
cookieJar: Record<string, string>,
|
||||
extraHeaders?: Record<string, string>,
|
||||
): Record<string, string> {
|
||||
const headers: Record<string, string> = {
|
||||
...FACEBOOK_BROWSER_HEADERS,
|
||||
};
|
||||
|
||||
const cookieString = cookiesToHeader(cookieJar);
|
||||
if (cookieString) {
|
||||
headers.cookie = cookieString;
|
||||
}
|
||||
|
||||
if (extraHeaders) {
|
||||
Object.assign(headers, extraHeaders);
|
||||
}
|
||||
|
||||
return headers;
|
||||
}
|
||||
@@ -50,11 +50,11 @@ export const tools = [
|
||||
},
|
||||
priceMin: {
|
||||
type: "number",
|
||||
description: "Minimum price in cents",
|
||||
description: "Minimum price in dollars",
|
||||
},
|
||||
priceMax: {
|
||||
type: "number",
|
||||
description: "Maximum price in cents",
|
||||
description: "Maximum price in dollars",
|
||||
},
|
||||
unstableFilter: {
|
||||
type: "boolean",
|
||||
@@ -107,11 +107,11 @@ export const tools = [
|
||||
},
|
||||
minPrice: {
|
||||
type: "number",
|
||||
description: "Minimum price filter",
|
||||
description: "Minimum price in dollars",
|
||||
},
|
||||
maxPrice: {
|
||||
type: "number",
|
||||
description: "Maximum price filter",
|
||||
description: "Maximum price in dollars",
|
||||
},
|
||||
strictMode: {
|
||||
type: "boolean",
|
||||
|
||||
@@ -128,6 +128,46 @@ describe("MCP protocol unstableFilter", () => {
|
||||
expect(String(calledUrl)).toContain("unstableFilter=true");
|
||||
});
|
||||
|
||||
test("search_kijiji should document price filters as dollars", () => {
|
||||
const tool = tools.find((candidate) => candidate.name === "search_kijiji");
|
||||
|
||||
const priceMin = tool?.inputSchema.properties.priceMin as {
|
||||
description: string;
|
||||
};
|
||||
const priceMax = tool?.inputSchema.properties.priceMax as {
|
||||
description: string;
|
||||
};
|
||||
|
||||
expect(priceMin.description).toContain("dollars");
|
||||
expect(priceMax.description).toContain("dollars");
|
||||
});
|
||||
|
||||
test("handler should forward Kijiji dollar price filters to API", async () => {
|
||||
await handleMcpRequest(
|
||||
new Request("http://localhost", {
|
||||
method: "POST",
|
||||
body: JSON.stringify({
|
||||
jsonrpc: "2.0",
|
||||
id: 1,
|
||||
method: "tools/call",
|
||||
params: {
|
||||
name: "search_kijiji",
|
||||
arguments: {
|
||||
query: "macbook",
|
||||
priceMin: 999.99,
|
||||
priceMax: 1000,
|
||||
},
|
||||
},
|
||||
}),
|
||||
}),
|
||||
);
|
||||
|
||||
const calledUrl = (global.fetch as unknown as ReturnType<typeof mock>).mock
|
||||
.calls[0]?.[0];
|
||||
expect(String(calledUrl)).toContain("priceMin=999.99");
|
||||
expect(String(calledUrl)).toContain("priceMax=1000");
|
||||
});
|
||||
|
||||
test("handler should forward unstableFilter=true for search_facebook", async () => {
|
||||
await handleMcpRequest(
|
||||
new Request("http://localhost", {
|
||||
@@ -204,4 +244,44 @@ describe("MCP protocol unstableFilter", () => {
|
||||
.calls[0]?.[0];
|
||||
expect(String(calledUrl)).toContain("unstableFilter=true");
|
||||
});
|
||||
|
||||
test("search_ebay should document price filters as dollars", () => {
|
||||
const tool = tools.find((candidate) => candidate.name === "search_ebay");
|
||||
|
||||
const minPrice = tool?.inputSchema.properties.minPrice as {
|
||||
description: string;
|
||||
};
|
||||
const maxPrice = tool?.inputSchema.properties.maxPrice as {
|
||||
description: string;
|
||||
};
|
||||
|
||||
expect(minPrice.description).toContain("dollars");
|
||||
expect(maxPrice.description).toContain("dollars");
|
||||
});
|
||||
|
||||
test("handler should forward eBay dollar price filters to API", async () => {
|
||||
await handleMcpRequest(
|
||||
new Request("http://localhost", {
|
||||
method: "POST",
|
||||
body: JSON.stringify({
|
||||
jsonrpc: "2.0",
|
||||
id: 1,
|
||||
method: "tools/call",
|
||||
params: {
|
||||
name: "search_ebay",
|
||||
arguments: {
|
||||
query: "macbook",
|
||||
minPrice: 999.99,
|
||||
maxPrice: 1000,
|
||||
},
|
||||
},
|
||||
}),
|
||||
}),
|
||||
);
|
||||
|
||||
const calledUrl = (global.fetch as unknown as ReturnType<typeof mock>).mock
|
||||
.calls[0]?.[0];
|
||||
expect(String(calledUrl)).toContain("minPrice=999.99");
|
||||
expect(String(calledUrl)).toContain("maxPrice=1000");
|
||||
});
|
||||
});