Dmytro Stanchiev 50d56201af feat: port upstream scraper improvements to monorepo
Kijiji improvements:
- Add error classes: NetworkError, ParseError, RateLimitError, ValidationError
- Add exponential backoff with jitter for retries
- Add request timeout (30s abort)
- Add pagination support (SearchOptions.maxPages)
- Add location/category mappings and resolution functions
- Add enhanced DetailedListing interface with images, seller info, attributes
- Add GraphQL client for seller details

Facebook improvements:
- Add parseFacebookCookieString() for parsing cookie strings
- Add ensureFacebookCookies() with env var fallback
- Add extractFacebookItemData() with multiple extraction paths
- Add fetchFacebookItem() for individual item fetching
- Add extraction metrics and API stability monitoring
- Add vehicle-specific field extraction
- Improve error handling with specific guidance for auth errors

Shared utilities:
- Update http.ts with new error classes and improved fetchHtml

Documentation:
- Port KIJIJI.md, FMARKETPLACE.md, AGENTS.md from upstream

Tests:
- Port kijiji-core, kijiji-integration, kijiji-utils tests
- Port facebook-core, facebook-integration tests
- Add test setup file

Scripts:
- Port parse-facebook-cookies.ts script

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-23 00:34:50 -05:00


# Facebook Marketplace API Reverse Engineering
## Overview
This document tracks findings from reverse-engineering Facebook Marketplace APIs for listing details.
## Current Implementation Status
- Search functionality: Implemented in `src/facebook.ts`
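- Individual listing details: Not implemented when this investigation began; now implemented via `fetchFacebookItem()` (see Implementation Status below)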
## Findings
### Step 1: Initial Setup
- Using Chrome DevTools to inspect Facebook Marketplace
- Authentication with a Facebook account is required to access marketplace data
- Cookies required for full access
- Current status: Successfully logged in and accessed marketplace data
### Step 2: Individual Listing Details Analysis - COMPLETED
- **Data Location**: Embedded in HTML script tags within `require` array structure
- **Path**: `require[0][3][0].__bbox.require[3][3][1].__bbox.result.data.viewer.marketplace_product_details_page.target`
- **Authentication**: Required for full data access
- **Current Status**: Successfully reverse-engineered the API structure and data extraction method
### API Endpoints Discovered
#### Search Endpoint
- URL: `https://www.facebook.com/marketplace/{location}/search`
- Parameters: `query`, `sortBy`, `exact`
- Data embedded in HTML script tags with `require` structure
- Authentication: Required (cookies)
#### Listing Details Endpoint
- **URL Structure**: `https://www.facebook.com/marketplace/item/{listing_id}/`
- **Data Source**: Server-side rendered HTML with embedded JSON data in script tags
- **Data Structure**: Relay/GraphQL-style data under `require[0][3][0].__bbox.require[...].__bbox.result.data.viewer.marketplace_product_details_page.target`
- **Extraction Method**: Parse JSON from script tags containing marketplace data, navigate to the target object
- **Authentication**: Required (cookies)
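
A minimal sketch of building both URLs, assuming the `{location}` segment is a city slug and that listing IDs are numeric strings (as in every example in this document). The helper names `buildSearchUrl` and `buildItemUrl` are hypothetical, and since the accepted `sortBy` values are not documented here, the sketch passes them through unchanged.

```typescript
// Hypothetical helpers for the two endpoints above.
// Parameter names (query, sortBy, exact) come from this document.

interface SearchParams {
  query: string;
  sortBy?: string; // accepted values not documented; passed through as-is
  exact?: boolean;
}

function buildSearchUrl(location: string, params: SearchParams): string {
  const url = new URL(
    `https://www.facebook.com/marketplace/${encodeURIComponent(location)}/search`,
  );
  url.searchParams.set("query", params.query);
  if (params.sortBy !== undefined) url.searchParams.set("sortBy", params.sortBy);
  if (params.exact !== undefined) url.searchParams.set("exact", String(params.exact));
  return url.toString();
}

function buildItemUrl(listingId: string): string {
  // Assumption: listing IDs are purely numeric; reject anything else early.
  if (!/^\d+$/.test(listingId)) {
    throw new Error(`Invalid listing id: ${listingId}`);
  }
  return `https://www.facebook.com/marketplace/item/${listingId}/`;
}
```

`URLSearchParams` handles encoding, so a query such as `mazda 3` is serialized as `query=mazda+3`.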
### Listing Data Structure Discovered (Current - 2026)
The current Facebook Marketplace API returns a comprehensive `GroupCommerceProductItem` object with the following key properties:
```typescript
interface FacebookMarketplaceItem {
  // Basic identification
  id: string;
  __typename: "GroupCommerceProductItem";
  // Listing content
  marketplace_listing_title: string;
  redacted_description: {
    text: string;
  };
  custom_title?: string;
  // Pricing
  formatted_price: {
    text: string;
  };
  listing_price: {
    amount: string;
    currency: string;
    amount_with_offset: string;
  };
  // Location
  location_text: {
    text: string;
  };
  location: {
    latitude: number;
    longitude: number;
    reverse_geocode_detailed: {
      country_alpha_two: string;
      postal_code_trimmed: string;
    };
  };
  // Status flags
  is_live: boolean;
  is_sold: boolean;
  is_pending: boolean;
  is_hidden: boolean;
  is_draft: boolean;
  // Timing
  creation_time: number;
  // Seller information
  marketplace_listing_seller: {
    __typename: "User";
    id: string;
    name: string;
    profile_picture?: {
      uri: string;
    };
    join_time?: number;
  };
  // Vehicle-specific fields (for automotive listings)
  vehicle_make_display_name?: string;
  vehicle_model_display_name?: string;
  vehicle_odometer_data?: {
    unit: "KILOMETERS" | "MILES";
    value: number;
  };
  vehicle_transmission_type?: "AUTOMATIC" | "MANUAL";
  vehicle_exterior_color?: string;
  vehicle_interior_color?: string;
  vehicle_condition?: "EXCELLENT" | "GOOD" | "FAIR" | "POOR";
  vehicle_fuel_type?: string;
  vehicle_trim_display_name?: string;
  // Category and commerce
  marketplace_listing_category_id: string;
  condition?: string;
  // Commerce features
  delivery_types?: string[];
  is_shipping_offered?: boolean;
  is_buy_now_enabled?: boolean;
  can_buyer_make_checkout_offer?: boolean;
  // Communication
  messaging_enabled?: boolean;
  first_message_suggested_value?: string;
  // Metadata
  logging_id: string;
  reportable_ent_id: string;
  origin_target?: {
    __typename: "Marketplace";
    id: string;
  };
  // Related listings (for part-out sellers)
  marketplace_listing_sets?: {
    edges: Array<{
      node: {
        canonical_listing: {
          id: string;
          marketplace_listing_title: string;
          is_live: boolean;
          is_sold: boolean;
          formatted_price: { text: string };
        };
      };
    }>;
  };
}
```
### Example Data Extracted (Current Structure)
```json
{
  "__typename": "GroupCommerceProductItem",
  "marketplace_listing_title": "2012 Mazda MAZDA 3 PART-OUT",
  "id": "1211645920845312",
  "redacted_description": {
    "text": "FOR PARTS ONLY!!!"
  },
  "custom_title": "2012 Mazda 3 part-out",
  "creation_time": 1760450080,
  "location_text": {
    "text": "Toronto, ON"
  },
  "is_live": true,
  "is_sold": false,
  "is_pending": false,
  "is_hidden": false,
  "formatted_price": {
    "text": "FREE"
  },
  "listing_price": {
    "amount_with_offset": "0",
    "currency": "CAD",
    "amount": "0.00"
  },
  "condition": "USED",
  "logging_id": "24676483845336407",
  "marketplace_listing_category_id": "807311116002614",
  "marketplace_listing_seller": {
    "__typename": "User",
    "id": "61570613529010",
    "name": "Jay Heshin",
    "profile_picture": {
      "uri": "https://scontent-yyz1-1.xx.fbcdn.net/v/t39.30808-1/480952111_122133462296687117_4145652046222010716_n.jpg?stp=cp6_dst-jpg_s50x50_tt6&_nc_cat=108&ccb=1-7&_nc_sid=e99d92&_nc_ohc=x_DTkeriVbgQ7kNvwEqT_x3&_nc_oc=Adnqnqf4YsZxgMIkR2mSFrdLb6-BDw4omCWqG_cqB-H0uXGgK1l4-T-fLSGB_CQJEKo&_nc_zt=24&_nc_ht=scontent-yyz1-1.xx&_nc_gid=7GnSwn4MSbllAgGWJy0RTQ&oh=00_AfpY66l8w-LvHvZ6tTgiD9Qh-Or_Udc-OaFiVL9pQ0YXsg&oe=697797CD"
    }
  },
  "vehicle_condition": "FAIR",
  "vehicle_exterior_color": "white",
  "vehicle_interior_color": "",
  "vehicle_make_display_name": "Mazda",
  "vehicle_model_display_name": "3 part-out",
  "vehicle_odometer_data": {
    "unit": "KILOMETERS",
    "value": 999999
  },
  "vehicle_transmission_type": "AUTOMATIC",
  "location": {
    "latitude": 43.651428222656,
    "longitude": -79.436645507812,
    "reverse_geocode_detailed": {
      "country_alpha_two": "CA",
      "postal_code_trimmed": "M6H 1C1"
    }
  },
  "delivery_types": ["IN_PERSON"],
  "messaging_enabled": true,
  "first_message_suggested_value": "Hi, is this available?",
  "marketplace_listing_sets": {
    "edges": [
      {
        "node": {
          "canonical_listing": {
            "id": "1435935788228627",
            "marketplace_listing_title": "2004 Land Rover LR2 PART-OUT",
            "is_live": true,
            "formatted_price": {"text": "FREE"}
          }
        }
      }
    ]
  }
}
```
## Data Extraction Method
### Current Method (2026)
Facebook Marketplace listing data is embedded in JSON within `<script>` tags in the HTML response. The extraction process:
1. **Find the Correct Script**: Look for script tags containing marketplace listing data by searching for key fields like `marketplace_listing_title`, `redacted_description`, and `formatted_price`.
2. **Parse JSON Structure**: The data is nested within a `require` array structure:
   ```
   require[0][3][0].__bbox.require[3][3][1].__bbox.result.data.viewer.marketplace_product_details_page.target
   ```
3. **Navigate to Target Object**: The actual listing data is a `GroupCommerceProductItem` object containing comprehensive information about the listing, seller, and vehicle details.
4. **Handle Dynamic Structure**: Facebook may change the exact path, so robust extraction should search for the target object recursively within the parsed JSON.
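
The four steps above can be sketched as follows. This is not the project's `extractFacebookItemData()`: it assumes the payload sits in `<script>` tags as plain JSON text, tries the primary path listed under Implementation Details first, and otherwise falls back to a depth-first search for an object with `__typename: "GroupCommerceProductItem"`. All function and constant names are hypothetical.

```typescript
// Hypothetical sketch of the extraction steps; not the project's implementation.
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };
type JsonObject = { [key: string]: Json };

// Step 1: fields that identify a script carrying listing data.
const LISTING_MARKERS = ["marketplace_listing_title", "redacted_description", "formatted_price"];

// Step 2: the primary path, split into object keys and array indices.
const PRIMARY_PATH: Array<string | number> = [
  "require", 0, 3, 0, "__bbox", "require", 3, 3, 1, "__bbox",
  "result", "data", "viewer", "marketplace_product_details_page", "target",
];

function findCandidateScripts(html: string): string[] {
  const bodies: string[] = [];
  const scriptRe = /<script[^>]*>([\s\S]*?)<\/script>/g;
  let match: RegExpExecArray | null;
  while ((match = scriptRe.exec(html)) !== null) {
    const body = match[1];
    if (LISTING_MARKERS.every((marker) => body.includes(marker))) bodies.push(body);
  }
  return bodies;
}

function tryParseJson(text: string): Json | undefined {
  try {
    return JSON.parse(text) as Json;
  } catch {
    return undefined; // inline JavaScript rather than JSON
  }
}

// Follows keys/indices; returns undefined as soon as a hop is missing.
function getPath(root: Json, path: Array<string | number>): Json | undefined {
  let node: Json | undefined = root;
  for (const key of path) {
    if (node === undefined || node === null || typeof node !== "object") return undefined;
    if (Array.isArray(node)) {
      if (typeof key !== "number") return undefined;
      node = node[key];
    } else {
      node = node[String(key)];
    }
  }
  return node;
}

// Step 3: accept only the full listing object. Requiring redacted_description skips
// canonical_listing entries, which lack it in the structure documented above.
function asListingTarget(node: Json | undefined): JsonObject | undefined {
  if (typeof node !== "object" || node === null || Array.isArray(node)) return undefined;
  const looksLikeListing =
    node["__typename"] === "GroupCommerceProductItem" &&
    typeof node["marketplace_listing_title"] === "string" &&
    "redacted_description" in node;
  return looksLikeListing ? node : undefined;
}

// Step 4: depth-first search, checking each node before its children.
function findListingRecursively(node: Json): JsonObject | undefined {
  const direct = asListingTarget(node);
  if (direct !== undefined) return direct;
  if (node === null || typeof node !== "object") return undefined;
  const children: Json[] = Array.isArray(node) ? node : Object.values(node);
  for (const child of children) {
    const found = findListingRecursively(child);
    if (found !== undefined) return found;
  }
  return undefined;
}

function extractListing(html: string): JsonObject | undefined {
  for (const body of findCandidateScripts(html)) {
    const parsed = tryParseJson(body);
    if (parsed === undefined) continue;
    const viaPath = asListingTarget(getPath(parsed, PRIMARY_PATH));
    if (viaPath !== undefined) return viaPath;
    const viaSearch = findListingRecursively(parsed);
    if (viaSearch !== undefined) return viaSearch;
  }
  return undefined;
}
```

Checking the node before its children means the outer listing wins over any related listings nested inside it.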
### Authentication Requirements
- Valid Facebook session cookies are required
- User must be logged in to Facebook
- Marketplace access may be location-restricted
## Tools Used
- Chrome DevTools Protocol
- Network monitoring
- HTML/script parsing
- JSON structure analysis
## Implementation Status
- ✅ Successfully reverse-engineered Facebook Marketplace API for listing details
- ✅ Identified current data structure and extraction method (2026)
- ✅ Documented comprehensive GroupCommerceProductItem interface
- ✅ Implemented `extractFacebookItemData()` function with script parsing logic
- ✅ Implemented `parseFacebookItem()` function to convert GroupCommerceProductItem to ListingDetails
- ✅ Implemented `fetchFacebookItem()` function with authentication and error handling
- ✅ Updated TypeScript interfaces to match current API structure
- ✅ Added robust extraction with fallback methods for changing API paths
## Implementation Details
### Core Functions Implemented
1. **`extractFacebookItemData(htmlString)`**: Extracts marketplace item data from HTML-embedded JSON in script tags
   - Searches for scripts containing marketplace listing data
   - Uses primary path: `require[0][3][0].__bbox.require[3][3][1].__bbox.result.data.viewer.marketplace_product_details_page.target`
   - Falls back to recursive search for GroupCommerceProductItem objects
2. **`parseFacebookItem(item)`**: Converts Facebook's GroupCommerceProductItem to unified ListingDetails format
   - Handles pricing (FREE listings, CAD currency)
   - Extracts seller information, location, and status
   - Supports vehicle-specific metadata
   - Maps Facebook-specific fields to common interface
3. **`fetchFacebookItem(itemId, cookiesSource?)`**: Fetches individual listing details
   - Loads Facebook authentication cookies
   - Makes authenticated HTTP requests
   - Handles rate limiting and retries
   - Returns parsed ListingDetails or null on failure
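
The retry behaviour described for `fetchFacebookItem()` can be sketched as a request wrapper with a per-attempt timeout and exponential backoff with full jitter. The function names, the defaults (3 attempts, 1 s base delay, 30 s timeout), and the choice to retry only on HTTP 429, 5xx, and network errors are illustrative assumptions, not the project's settings. The sketch relies on the global `fetch` and `AbortController` available in Node.js 18 and later.

```typescript
// Hypothetical retry wrapper; names and defaults are illustrative assumptions.
interface RetryOptions {
  maxAttempts: number;
  baseDelayMs: number;
  maxDelayMs: number;
  timeoutMs: number;
}

const DEFAULT_RETRY: RetryOptions = {
  maxAttempts: 3,
  baseDelayMs: 1_000,
  maxDelayMs: 30_000,
  timeoutMs: 30_000,
};

// Full jitter: pick uniformly in [0, min(maxDelay, baseDelay * 2^attempt)).
function backoffDelay(attempt: number, opts: RetryOptions, random: () => number = Math.random): number {
  const ceiling = Math.min(opts.maxDelayMs, opts.baseDelayMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}

// Rate limiting (429) and server errors are treated as transient.
function isRetryableStatus(status: number): boolean {
  return status === 429 || status >= 500;
}

const sleep = (ms: number): Promise<void> => new Promise((resolve) => setTimeout(resolve, ms));

// Returns the page HTML, or null once retries are exhausted or the status
// is not retryable (e.g. a 404), following the "null on failure" contract.
async function fetchHtmlWithRetry(
  url: string,
  cookieHeader: string,
  opts: RetryOptions = DEFAULT_RETRY,
): Promise<string | null> {
  for (let attempt = 0; attempt < opts.maxAttempts; attempt++) {
    const controller = new AbortController();
    const timer = setTimeout(() => controller.abort(), opts.timeoutMs);
    try {
      const res = await fetch(url, { headers: { cookie: cookieHeader }, signal: controller.signal });
      if (res.ok) return await res.text();
      if (!isRetryableStatus(res.status)) return null;
    } catch {
      // Network failure or timeout abort: retry if attempts remain.
    } finally {
      clearTimeout(timer);
    }
    if (attempt < opts.maxAttempts - 1) await sleep(backoffDelay(attempt, opts));
  }
  return null;
}
```

Full jitter spreads retries from concurrent scrapers across the whole backoff window instead of letting them fire in lockstep.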
### Authentication Requirements
- Facebook session cookies are required, either stored in `./cookies/facebook.json` or passed in as a parameter
- Cookies must include valid authentication tokens for marketplace access
- Cookie expiration and domain are validated before use
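
A minimal sketch of turning stored cookies into a `Cookie` request header with the expiration and domain checks described above. It assumes `./cookies/facebook.json` holds an array of `{ name, value, domain, expirationDate? }` objects (a shape many browser cookie-export extensions produce, with `expirationDate` in Unix seconds); the project's actual file format and function names may differ. Treating `c_user` and `xs` as the required session cookies is also an assumption.

```typescript
// Hypothetical cookie handling; the StoredCookie shape is an assumption.
interface StoredCookie {
  name: string;
  value: string;
  domain: string;
  expirationDate?: number; // Unix seconds; absent for session cookies
}

// Accepts facebook.com and its subdomains, with or without a leading dot.
function isFacebookDomain(domain: string): boolean {
  const host = domain.replace(/^\./, "").toLowerCase();
  return host === "facebook.com" || host.endsWith(".facebook.com");
}

function buildCookieHeader(cookies: StoredCookie[], nowSeconds: number = Date.now() / 1000): string {
  const usable = cookies.filter(
    (c) =>
      isFacebookDomain(c.domain) &&
      (c.expirationDate === undefined || c.expirationDate > nowSeconds),
  );
  // Assumed session cookies: fail fast with a clear auth error if either is missing or expired.
  const names = new Set(usable.map((c) => c.name));
  for (const required of ["c_user", "xs"]) {
    if (!names.has(required)) {
      throw new Error(`Missing or expired Facebook cookie: ${required}`);
    }
  }
  return usable.map((c) => `${c.name}=${c.value}`).join("; ");
}
```

Reading the file is then a matter of `JSON.parse` on its contents before calling `buildCookieHeader`.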
## Current Implementation Status - 2026 Verification
### Step 3: API Verification and Current Structure Analysis (January 2026)
- **Verification Date**: January 22, 2026
- **Status**: Successfully verified current Facebook Marketplace API structure
- **Data Source**: Embedded JSON in HTML script tags (server-side rendered)
- **Extraction Path**: `require[0][3][0].__bbox.require[3][3][1].__bbox.result.data.viewer.marketplace_product_details_page.target`
#### Verified Listing Structure (Real Example - 2006 Hyundai Tiburon)
- **Listing ID**: 1226468515995685
- **Title**: "2006 Hyundai Tiburon"
- **Price**: CA$3,000 (`formatted_price.text`)
- **Raw Price Data**: `{"amount_with_offset": "300000", "currency": "CAD", "amount": "3000.00"}`
- **Location**: Hamilton, ON (coordinates: 43.250427246094, -79.963989257812)
- **Description**: "As is" (`redacted_description.text`)
- **Vehicle Details**:
  - Make: Hyundai
  - Model: Tiburon
  - Odometer: 194,000 km
  - Transmission: AUTOMATIC
  - Exterior Color: blue
  - Interior Color: black
  - Fuel Type: GASOLINE
  - Number of Owners: TWO
- **Seller Information**:
  - Name: Ajitpal Kaler
  - ID: 100009257293466
  - Profile picture available
  - Join Time: 1426564800 (2015)
- **Listing Status**: Active (`is_live: true`, `is_sold: false`, `is_pending: false`)
- **Category**: 807311116002614 (Vehicles)
- **Delivery Types**: `["IN_PERSON"]`
- **Messaging**: Enabled
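
The raw fields in this example can be decoded with two small helpers. In both examples in this document, `amount_with_offset` equals `amount` multiplied by 100 (`"3000.00"` to `"300000"`, `"0.00"` to `"0"`), so the sketch reads it as an amount in cents; that reading is an inference from two data points, which is why `amount` is tried first. Timestamps such as `creation_time` and `join_time` are treated as Unix seconds, which is consistent with the 2015 join year above. The helper names are hypothetical.

```typescript
// Hypothetical helpers for decoding raw listing fields.
interface ListingPrice {
  amount: string;
  currency: string;
  amount_with_offset: string;
}

interface ParsedPrice {
  cents: number;
  currency: string;
  isFree: boolean;
}

function parseListingPrice(price: ListingPrice): ParsedPrice {
  // Prefer the decimal amount; fall back to amount_with_offset read as cents (an inference).
  const fromAmount = Math.round(Number.parseFloat(price.amount) * 100);
  const cents = Number.isFinite(fromAmount)
    ? fromAmount
    : Number.parseInt(price.amount_with_offset, 10);
  if (!Number.isFinite(cents)) {
    throw new Error(`Unparseable listing_price: ${JSON.stringify(price)}`);
  }
  return { cents, currency: price.currency, isFree: cents === 0 };
}

// creation_time and join_time appear to be Unix timestamps in seconds.
function unixSecondsToIso(seconds: number): string {
  return new Date(seconds * 1000).toISOString();
}
```

For example, the join time above decodes to `2015-03-17T04:00:00.000Z`, and the FREE Mazda listing parses to `{ cents: 0, currency: "CAD", isFree: true }`.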
#### Current API Characteristics
- **Authentication**: Still requires valid Facebook session cookies
- **Data Format**: Server-side rendered HTML with embedded GraphQL/Relay JSON
- **Structure Stability**: Primary extraction path remains functional
- **Additional Features**: Includes marketplace ratings, seller verification badges, cross-posting info
### API Changes Observed Since 2024 Documentation
- **Minimal Changes**: Core data structure largely unchanged
- **Enhanced Fields**: Added more detailed vehicle specifications and seller profile information
- **GraphQL Integration**: Deeper integration with Facebook's GraphQL infrastructure
- **Security Features**: Additional integrity checks and reporting mechanisms
### Multi-Category Testing Results (January 2026)
Successfully tested extraction across different listing categories:
#### 1. Vehicle Listings (Automotive)
- **Example**: 2006 Hyundai Tiburon (ID: 1226468515995685)
- **Status**: ✅ Fully functional
- **Data Extracted**: Complete vehicle specs, pricing, seller info, location coordinates
- **Unique Fields**: vehicle_make_display_name, vehicle_odometer_data, vehicle_transmission_type, vehicle_exterior_color, vehicle_interior_color, vehicle_fuel_type
#### 2. Electronics Listings
- **Example**: Nintendo Switch (ID: 3903865769914262)
- **Status**: ✅ Fully functional
- **Data Extracted**: Title, price (CA$140), location (Toronto, ON), condition (Used - like new), seller (Yitao Hou)
- **Category**: Electronics (category_id: 479353692612078)
- **Notes**: Standard GroupCommerceProductItem structure applies
#### 3. Home Goods/Furniture Listings
- **Example**: Tabletop Mirror (cat not included) (ID: 1082389057290709)
- **Status**: ✅ Fully functional
- **Data Extracted**: Title, price (CA$5), location (Mississauga, ON), condition (Used - like new), seller (Rohit Rehan)
- **Category**: Home Goods (category_id: 1569171756675761)
- **Notes**: Includes detailed description and delivery options
#### Testing Summary
- **Extraction Method**: Consistent across all categories
- **Data Structure**: GroupCommerceProductItem interface works for all listing types
- **Authentication**: Required for all categories
- **Rate Limiting**: Standard Facebook rate limits apply
- **Edge Cases**: Only active listings offering in-person pickup were tested; sold, pending, and shipped listings were not part of this test set
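
Because the same structure holds across categories, category handling can stay data-driven. A minimal sketch, using only the three category IDs observed in these tests (the map is not exhaustive, and the labels are ours): it resolves a label, flags vehicle listings either by category or by the presence of the optional vehicle fields, and normalizes `vehicle_odometer_data` to kilometers. The function names are hypothetical.

```typescript
// Category IDs observed in the January 2026 tests; not an exhaustive list.
const OBSERVED_CATEGORIES: Record<string, string> = {
  "807311116002614": "Vehicles",
  "479353692612078": "Electronics",
  "1569171756675761": "Home Goods",
};

const KM_PER_MILE = 1.609344;

interface CategoryFields {
  marketplace_listing_category_id: string;
  vehicle_make_display_name?: string;
  vehicle_odometer_data?: { unit: "KILOMETERS" | "MILES"; value: number };
}

function describeCategory(item: CategoryFields): { label: string; isVehicle: boolean } {
  const label = OBSERVED_CATEGORIES[item.marketplace_listing_category_id] ?? "Unknown";
  // Vehicle fields are optional, so their presence is a second signal besides the ID.
  const isVehicle = label === "Vehicles" || item.vehicle_make_display_name !== undefined;
  return { label, isVehicle };
}

// Normalizes the odometer reading to whole kilometers; undefined when absent.
function odometerKm(item: CategoryFields): number | undefined {
  const odo = item.vehicle_odometer_data;
  if (odo === undefined) return undefined;
  return Math.round(odo.unit === "MILES" ? odo.value * KM_PER_MILE : odo.value);
}
```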
## Implementation Status - COMPLETED (January 2026)
- ✅ Successfully reverse-engineered Facebook Marketplace API for listing details
- ✅ Verified current API structure and extraction method (January 2026)
- ✅ Tested extraction across multiple listing categories (vehicles, electronics, home goods)
- ✅ Implemented comprehensive error handling for sold/removed listings and authentication failures
- ✅ Enhanced rate limiting and retry logic (already robust)
- ✅ Added monitoring and metrics for API stability detection
- ✅ Updated all scraper functions to use verified extraction methods
- ✅ Documented comprehensive GroupCommerceProductItem interface with real examples
## Next Steps (Future Maintenance)
1. Monitor extraction success rates for API change detection
2. Update extraction paths if Facebook changes their API structure
3. Add support for additional marketplace features as they become available
4. Implement caching mechanisms for improved performance
5. Add support for marketplace messaging and negotiation features
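
For step 1, one lightweight approach is to count which extraction route succeeded and alert when the primary path starts losing ground to the recursive fallback or to outright failures. The class name, outcome labels, and default threshold (more than 20% over at least 20 attempts) below are hypothetical, not the metrics the scraper already records.

```typescript
// Hypothetical extraction metrics; names and thresholds are assumptions.
type ExtractionOutcome = "primary_path" | "recursive_fallback" | "failed";

class ExtractionMetrics {
  private counts: Record<ExtractionOutcome, number> = {
    primary_path: 0,
    recursive_fallback: 0,
    failed: 0,
  };

  record(outcome: ExtractionOutcome): void {
    this.counts[outcome] += 1;
  }

  get total(): number {
    return this.counts.primary_path + this.counts.recursive_fallback + this.counts.failed;
  }

  // Share of attempts that did not succeed via the primary path.
  driftRatio(): number {
    if (this.total === 0) return 0;
    return (this.counts.recursive_fallback + this.counts.failed) / this.total;
  }

  // Require a minimum sample size so a few bad requests do not trigger an alert.
  shouldAlert(threshold = 0.2, minSamples = 20): boolean {
    return this.total >= minSamples && this.driftRatio() > threshold;
  }
}
```

A rising fallback share with few outright failures suggests the path in step 2 of the extraction method has moved; rising failures suggest a larger structural change or an authentication problem.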