Compare commits
5 Commits
7da6408d7a...daa61c25d8
| SHA1 |
|---|
| daa61c25d8 |
| 87aa31cf1b |
| bdf504ba37 |
| 589af630fa |
| 8ae42d5630 |
135
CLAUDE.md
@@ -1,110 +1,33 @@
-# CLAUDE.md
+# AGENTS.md
 
-This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
+This file provides guidance to coding agents when working with code in this repository.
 
-## Common Commands
-
-- `bun start`: Run the server in production mode.
-- `bun dev`: Run the server with hot reloading for development.
-- `bun build`: Build the application into a single executable file.
-
-No linting or testing scripts are configured. For single tests or lint runs, add them to package.json scripts as needed.
-
-## Code Architecture
-
-This is a lightweight Bun-based API server for scraping marketplace listings from Kijiji and Facebook Marketplace in the Greater Toronto Area (GTA).
-
-- **Entry Point (`src/index.ts`)**: Implements a basic HTTP server using `Bun.serve`. Key routes:
-  - `GET /api/status`: Health check returning "OK".
-  - `GET /api/kijiji?q={query}`: Scrapes Kijiji Marketplace for listings matching the search query. Returns JSON array of listing objects.
-  - `GET /api/facebook?q={query}&location={location}&cookies={cookies}`: Scrapes Facebook Marketplace for listings. Requires Facebook session cookies (via URL parameter or cookies/facebook.json file). Optional `location` param (default "toronto"). Returns JSON array of listing objects.
-  - Fallback: 404 for unmatched routes.
-
-## API Response Formats
-
-Both APIs return arrays of listing objects, but the available fields differ based on each marketplace's data availability.
-
-### Kijiji API Response Object
-```json
-{
-  "url": "https://www.kijiji.ca/v-laptops/city-of-toronto/...",
-  "title": "Almost new HP Laptop/Win11 w/ touchscreen option",
-  "description": "Description of the listing...",
-  "listingPrice": {
-    "amountFormatted": "149.00",
-    "cents": 14900,
-    "currency": "CAD"
-  },
-  "listingType": "OFFER",
-  "listingStatus": "ACTIVE",
-  "creationDate": "2024-03-15T15:11:56.000Z",
-  "endDate": "3000-01-01T00:00:00.000Z",
-  "numberOfViews": 2005,
-  "address": "SPADINA AVENUE, Toronto, ON, M5T 2H7"
-}
-```
-
-### Facebook API Response Object
-```json
-{
-  "url": "https://www.facebook.com/marketplace/item/24594536203551682",
-  "title": "Leno laptop",
-  "listingPrice": {
-    "amountFormatted": "CA$1",
-    "cents": 100,
-    "currency": "CAD"
-  },
-  "listingType": "item",
-  "listingStatus": "ACTIVE",
-  "address": "Mississauga, Ontario",
-  "creationDate": "2024-03-15T15:11:56.000Z",
-  "categoryId": "1792291877663080",
-  "imageUrl": "https://scontent-yyz1-1.xx.fbcdn.net/...",
-  "videoUrl": "https://www.facebook.com/1300609777949414/",
-  "seller": {
-    "name": "Joyce Diaz",
-    "id": "100091799187797"
-  },
-  "deliveryTypes": ["IN_PERSON"]
-}
-```
-
-### Common Fields
-- `url`: Full URL to the listing
-- `title`: Listing title
-- `listingPrice`: Price object with `amountFormatted` (human-readable), `cents` (integer cents), `currency` (e.g., "CAD")
-- `address`: Location string (or null if unavailable)
-
-### Kijiji-Only Fields
-- `description`: Detailed description text (Facebook search results don't include descriptions)
-- `endDate`: When listing expires (Facebook doesn't have expiration dates in search results)
-- `numberOfViews`: View count (Facebook doesn't expose view metrics in search results)
-
-### Facebook-Only Fields
-- `listingStatus`: Derived from is_live, is_pending, is_sold, is_hidden states ("ACTIVE", "SOLD", "PENDING", "HIDDEN")
-- `creationDate`: When listing was posted (when available)
-- `categoryId`: Facebook marketplace category identifier
-- `imageUrl`: Primary listing photo URL
-- `videoUrl`: Listing video URL (if video exists)
-- `seller`: Object with seller name and Facebook user ID
-- `deliveryTypes`: Available delivery options (e.g., ["IN_PERSON", "SHIPPING"])
-
-- **Kijiji Scraping (`src/kijiji.ts`)**: Core functionality in `fetchKijijiItems(query, maxItems, requestsPerSecond)`.
-  - Slugifies the query using `unidecode` for URL-safe search terms.
-  - Fetches the search page HTML, parses Next.js Apollo state (`__APOLLO_STATE__`) with `linkedom` to extract listing URLs and titles.
-  - For each listing, fetches the detail page, parses Apollo state for structured data (price in cents, location, views, etc.).
-  - Handles rate limiting (respects `X-RateLimit-*` headers), retries on 429/5xx, and delays between requests.
-  - Uses `cli-progress` for console progress bar during batch fetches.
-  - Filters results to include only priced items.
-
-- **Facebook Scraping (`src/facebook.ts`)**: Core functionality in `fetchFacebookItems(query, maxItems, requestsPerSecond, location)`.
-  - Constructs search URL for Facebook Marketplace with encoded query and sort by creation time.
-  - Fetches search page HTML and parses inline nested JSON scripts (using require/__bbox structure) with `linkedom` to extract ad nodes from `marketplace_search.feed_units.edges`.
-  - Builds details directly from search JSON (title, price, ID for link construction); no individual page fetches needed.
-  - Handles delays and retries similar to Kijiji.
-  - Uses `cli-progress` for progress.
-  - Filters to priced items. Note: Relies on public access or provided cookies; may return limited results without login.
-
 The project uses TypeScript with path mapping (`@/*` to `src/*`). Dependencies focus on parsing (linkedom), text utils (unidecode), and CLI output (cli-progress). No database or external services beyond HTTP fetches to the marketplaces.
 
-Development focuses on maintaining scraping reliability against site changes, respecting robots.txt/terms of service, and handling anti-bot measures ethically. For Facebook, ensure compliance with authentication requirements.
+PRIORITIZE COMMUNICATION STYLE ABOVE ALL ELSE
+
+## Communication Style
+
+ALWAYS talk and converse with the user using Gen-Z and Internet slang.
+
+Absolute Mode
+- Eliminate emojis, filler, hype, transitions, appendixes.
+- Use blunt, directive phrasing; no mirroring, no softening.
+- Suppress sentiment-boosting, engagement, or satisfaction metrics.
+- No questions, offers, suggestions, or motivational content.
+- Deliver info only; end immediately after.
+
+**Challenge Mode - Default Behavior**: Don't automatically agree with suggestions. Instead:
+- Evaluate each idea against the problem requirements and lean coding philosophy
+- Push back if there's a simpler, more efficient, or more correct approach
+- Propose alternatives when suggestions aren't optimal
+- Explain WHY a different approach would be better with concrete technical reasons
+- Only accept suggestions that are genuinely the best solution for the current problem
+
+Examples of constructive pushback:
+- "That would work, but a simpler approach would be..."
+- "Actually, that might cause [specific issue]. Instead, we should..."
+- "The lean approach here would be to..."
+- "That adds unnecessary complexity. We can achieve the same with..."
+
+This ensures: Better solutions through technical merit, not agreement | Learning through understanding tradeoffs | Avoiding over-engineering | Maintaining code quality
448
KIJIJI.md
Normal file
@@ -0,0 +1,448 @@
# Kijiji API Findings

## Overview
Kijiji is a Canadian classifieds marketplace whose web application is built with Next.js and Apollo GraphQL. Search results come from GraphQL data that is rendered on the server and shipped to the browser as a normalized Apollo cache, rather than being fetched client-side.

## Initial Page Load (Homepage)
- **URL**: https://www.kijiji.ca/
- **Architecture**: Server-side rendered React application with Next.js
- **Data Sources**:
  - Static assets loaded from `webapp-static.ca-kijiji-production.classifiedscloud.io`
  - Image media served from `media.kijiji.ca/api/v1/`
  - No initial API calls for listings; the data appears to be embedded in the HTML

## Search Results Page
- **URL Pattern**: `https://www.kijiji.ca/b-{location}/{keywords}/k0l0`
- **Example**: `https://www.kijiji.ca/b-canada/iphone/k0l0`
- **Technology Stack**: Next.js with Apollo GraphQL client
- **Data Structure**: Uses the `__APOLLO_STATE__` global object, which contains the normalized GraphQL cache

### GraphQL Data Structure

#### Data Location
Search results data is embedded in the Next.js page props under `__NEXT_DATA__.props.pageProps.__APOLLO_STATE__`. The data is pre-rendered on the server and sent to the client. Each page (including pagination) has its own pre-rendered data.
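
As a rough sketch of reading this embedded state without a browser, the hypothetical `extractApolloStateSketch` below pulls the JSON out of the `<script id="__NEXT_DATA__">` tag that Next.js (pages router) renders and follows the documented path. It uses a regular expression where the actual scraper uses `linkedom`, so treat it as illustrative rather than as the code in `src/kijiji.ts`.

```typescript
// Hypothetical helper, not the scraper's extractApolloState().
// Assumes the page embeds its props as JSON in <script id="__NEXT_DATA__">.
type ApolloState = Record<string, unknown>;

const NEXT_DATA_RE = /<script[^>]*\bid="__NEXT_DATA__"[^>]*>([\s\S]*?)<\/script>/;

function extractApolloStateSketch(html: string): ApolloState | null {
  const json = NEXT_DATA_RE.exec(html)?.[1];
  if (json === undefined) return null;
  try {
    // Documented path: __NEXT_DATA__.props.pageProps.__APOLLO_STATE__
    const state: unknown = JSON.parse(json)?.props?.pageProps?.__APOLLO_STATE__;
    return typeof state === "object" && state !== null ? (state as ApolloState) : null;
  } catch {
    return null; // Malformed JSON is treated like a missing tag.
  }
}

// Usage with a tiny hand-written page:
const page =
  '<html><script id="__NEXT_DATA__" type="application/json">' +
  '{"props":{"pageProps":{"__APOLLO_STATE__":{"ROOT_QUERY":{}}}}}' +
  "</script></html>";
const state = extractApolloStateSketch(page);
```

A DOM parser is more robust than a regex if Kijiji ever adds attributes or nests the tag differently; the regex only keeps the example dependency-free.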

#### Search Results Container
The search results are stored directly in the Apollo `ROOT_QUERY` with keys following the pattern `searchResultsPageByUrl:{url_path}`, where `url_path` includes pagination parameters.

```json
{
  "searchResultsPageByUrl:/b-buy-sell/canada/iphone/k0c10l0": { ... },
  "searchResultsPageByUrl:/b-buy-sell/canada/iphone/k0c10l0?page=2": { ... }
}
```
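
Once the state is parsed, the search pages can be picked out by key prefix. A minimal sketch, assuming the `ROOT_QUERY` keys look exactly like the example above; the `searchResultPages` helper is hypothetical:

```typescript
type ApolloState = Record<string, unknown>;

const SEARCH_PAGE_PREFIX = "searchResultsPageByUrl:";

// Hypothetical helper: returns [url_path, cachedValue] for every search page in ROOT_QUERY.
function searchResultPages(state: ApolloState): Array<[string, unknown]> {
  const rootQuery = (state["ROOT_QUERY"] ?? {}) as Record<string, unknown>;
  return Object.entries(rootQuery)
    .filter(([key]) => key.startsWith(SEARCH_PAGE_PREFIX))
    .map(([key, value]): [string, unknown] => [key.slice(SEARCH_PAGE_PREFIX.length), value]);
}
```

Keeping the `url_path` next to each value preserves the `?page=N` suffix, which tells you which page a given entry belongs to.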

#### Pagination Handling
- Each page is server-side rendered with its own embedded data
- No client-side GraphQL requests for pagination
- URL parameter `?page=N` controls which page data is embedded
- Offset in `searchString` corresponds to `(page - 1) * limit`
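
The offset rule above is simple arithmetic. A sketch with hypothetical `pageToOffset` / `offsetToPage` helpers, assuming the observed page size of 40:

```typescript
// Observed page size on search pages; treat as an assumption, not a guarantee.
const RESULTS_PER_PAGE = 40;

// 1-based `?page=N` -> offset, following `(page - 1) * limit`.
function pageToOffset(page: number, limit = RESULTS_PER_PAGE): number {
  if (!Number.isInteger(page) || page < 1) {
    throw new RangeError(`page must be a positive integer, got ${page}`);
  }
  return (page - 1) * limit;
}

// Inverse mapping, e.g. for reading an offset back out of a searchString.
function offsetToPage(offset: number, limit = RESULTS_PER_PAGE): number {
  return Math.floor(offset / limit) + 1;
}
```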

#### Search Parameters in URL
- `k0c{CATEGORY}l{LOCATION}` - Category and location IDs
- `?page=N` - Page number (1-based)
- Data contains `offset` and `limit` for API-style pagination

#### Individual Listing Structure
```json
{
  "id": "1732061412",
  "title": "iPhone 13",
  "description": "iPhone 13, always had a screen protector on it...",
  "imageCount": 3,
  "imageUrls": ["https://media.kijiji.ca/api/v1/ca-prod-fsbo-ads/images/..."],
  "categoryId": 760,
  "url": "https://www.kijiji.ca/v-cell-phone/...",
  "activationDate": "2026-01-21T16:51:16.000Z",
  "sortingDate": "2026-01-21T16:51:16.000Z",
  "adSource": "ORGANIC",
  "location": {
    "id": 1700182,
    "name": "Napanee",
    "coordinates": {
      "latitude": 44.48774,
      "longitude": -76.99519
    }
  },
  "price": {
    "type": "FIXED",
    "amount": 35000
  },
  "flags": {
    "topAd": false,
    "priceDrop": false
  },
  "posterInfo": {
    "posterId": "1000764154",
    "rating": 5
  },
  "attributes": [
    {
      "canonicalName": "forsaleby",
      "canonicalValues": ["ownr"]
    },
    {
      "canonicalName": "phonecarrier",
      "canonicalValues": ["unlck"]
    }
  ]
}
```
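
To show how a few of these fields might be consumed, here is a sketch with a hypothetical `SearchListing` type that covers only the fields it uses. It assumes `price.amount` is in cents (so 35000 would be $350.00); verify that against the price shown on the listing page before relying on it.

```typescript
// Hypothetical subset of the listing shape above.
interface SearchListing {
  id: string;
  title: string;
  price?: { type?: string; amount?: number | null };
  attributes?: Array<{ canonicalName: string; canonicalValues: string[] }>;
}

// Assumption: `price.amount` is in cents (35000 -> "350.00").
function formatFixedPrice(listing: SearchListing): string | null {
  const amount = listing.price?.amount;
  if (listing.price?.type !== "FIXED" || typeof amount !== "number") return null;
  return (amount / 100).toFixed(2);
}

// Reads one attribute by canonical name, e.g. "phonecarrier" -> ["unlck"].
function attributeValues(listing: SearchListing, name: string): string[] {
  return listing.attributes?.find((a) => a.canonicalName === name)?.canonicalValues ?? [];
}

const sample: SearchListing = {
  id: "1732061412",
  title: "iPhone 13",
  price: { type: "FIXED", amount: 35000 },
  attributes: [
    { canonicalName: "forsaleby", canonicalValues: ["ownr"] },
    { canonicalName: "phonecarrier", canonicalValues: ["unlck"] },
  ],
};
```

Attribute values are short codes (`ownr`, `unlck`), so any human-readable labels would need a separate mapping.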

### URL Parameters
- `sort=MATCH` - Sort by relevance
- `order=DESC` - Descending order
- `type=OFFER` - Show offerings (not wanted ads)
- `offset=0` - Pagination offset
- `limit=40` - Results per page
- `topAdCount=6` - Number of promoted ads
- `keywords=iphone` - Search keywords
- `category=0` - Category ID (0 = All Categories)
- `location=0` - Location ID (0 = Canada)
- `eaTopAdPosition=1` - Purpose unknown
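
One way to reproduce these parameters for comparison or debugging is `URLSearchParams`. The hypothetical `buildSearchString` below assumes the list above is the set of key/value pairs in the embedded `searchString`, with the example values used as defaults. Whether Kijiji accepts them on a request URL is unverified, and `eaTopAdPosition` is omitted because its purpose is unknown.

```typescript
// Hypothetical builder; parameter names and defaults mirror the list above.
interface SearchStringOptions {
  keywords: string;
  category?: number; // 0 = All Categories
  location?: number; // 0 = Canada
  offset?: number;
  limit?: number;
}

function buildSearchString(o: SearchStringOptions): string {
  // URLSearchParams keeps insertion order and form-encodes values (spaces become "+").
  return new URLSearchParams({
    sort: "MATCH",
    order: "DESC",
    type: "OFFER",
    offset: String(o.offset ?? 0),
    limit: String(o.limit ?? 40),
    topAdCount: "6",
    keywords: o.keywords,
    category: String(o.category ?? 0),
    location: String(o.location ?? 0),
  }).toString();
}
```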

### Image API
- **Endpoint**: `https://media.kijiji.ca/api/v1/`
- **Pattern**: `/ca-prod-fsbo-ads/images/{uuid}?rule=kijijica-{size}-jpg`
- **Sizes**: 200, 300, 400, 500 pixels
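
A sketch of building image URLs from the pattern above. The `imageUrl` and `withImageSize` helpers are hypothetical, and the type only allows the four listed sizes because no others were observed.

```typescript
const IMAGE_SIZES = [200, 300, 400, 500] as const;
type ImageSize = (typeof IMAGE_SIZES)[number];

const IMAGE_BASE = "https://media.kijiji.ca/api/v1/ca-prod-fsbo-ads/images/";

// Builds an image URL from its UUID using the documented `rule` pattern.
function imageUrl(uuid: string, size: ImageSize = 400): string {
  const url = new URL(encodeURIComponent(uuid), IMAGE_BASE);
  url.searchParams.set("rule", `kijijica-${size}-jpg`);
  return url.toString();
}

// Re-sizes an existing image URL (e.g. one from `imageUrls`) by replacing its rule.
function withImageSize(existing: string, size: ImageSize): string {
  const url = new URL(existing);
  url.searchParams.set("rule", `kijijica-${size}-jpg`);
  return url.toString();
}
```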

### Categories and Locations

#### Category Structure
Categories are hierarchical with parent-child relationships. The main categories under "Buy & Sell" include:

| ID | Name | Total Results (iPhone search) |
|----|------|------------------------------|
| 10 | Buy & Sell | 19956 |
| 12 | Arts & Collectibles | 149 |
| 767 | Audio | 481 |
| 253 | Baby Items | 13 |
| 931 | Bags & Luggage | 8 |
| 644 | Bikes | 46 |
| 109 | Books | 21 |
| 103 | Cameras & Camcorders | 101 |
| 104 | CDs, DVDs & Blu-ray | 102 |
| 274 | Clothing | 83 |
| 16 | Computers | 285 |
| 128 | Computer Accessories | 363 |
| 29659001 | Electronics | 2006 |
| 17220001 | Free Stuff | 23 |
| 235 | Furniture | 29 |
| 638 | Garage Sales | 5 |
| 140 | Health & Special Needs | 30 |
| 139 | Hobbies & Crafts | 10 |
| 107 | Home Appliances | 23 |
| 717 | Home - Indoor | 27 |
| 727 | Home Renovation Materials | 14 |
| 133 | Jewellery & Watches | 83 |
| 17 | Musical Instruments | 34 |
| 132 | Phones | 15518 |
| 111 | Sporting Goods & Exercise | 30 |
| 110 | Tools | 25 |
| 108 | Toys & Games | 38 |
| 15093001 | TVs & Video | 15 |
| 141 | Video Games & Consoles | 96 |
| 26 | Other | 286 |

#### Location Structure
Locations are also hierarchical, with provinces and territories under the main "Canada" location:

| ID | Name | Total Results (iPhone search) |
|----|------|------------------------------|
| 0 | Canada | - |
| 9001 | Québec | 2516 |
| 9002 | Nova Scotia | 875 |
| 9003 | Alberta | 2317 |
| 9004 | Ontario | 12507 |
| 9005 | New Brunswick | 118 |
| 9006 | Manitoba | 919 |
| 9007 | British Columbia | 306 |
| 9008 | Newfoundland | 27 |
| 9009 | Saskatchewan | 336 |
| 9010 | Territories | 7 |
| 9011 | Prince Edward Island | 31 |

#### URL Patterns
- Categories: `/b-{category-slug}/canada/{keywords}/k0c{CATEGORY_ID}l0`
- Locations: `/b-buy-sell/{location-slug}/{keywords}/k0c10l{LOCATION_ID}`
- Combined: `/b-{category-slug}/{location-slug}/{keywords}/k0c{CATEGORY_ID}l{LOCATION_ID}`
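
A sketch of assembling a search URL from the combined pattern. The hypothetical `slugify` is a dependency-free substitute for the `unidecode`-based slugging the scraper uses: it strips accents from Latin letters via Unicode NFD normalization but, unlike `unidecode`, does not transliterate other scripts. Category and location slugs are passed in because no slug table is documented here.

```typescript
// Stand-in for unidecode-based slugging: decompose accented letters, drop the
// combining marks, lowercase, then collapse everything else into single hyphens.
function slugify(text: string): string {
  return text
    .normalize("NFD")
    .replace(/[\u0300-\u036f]/g, "")
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

interface SearchUrlOptions {
  keywords: string;
  categorySlug: string; // e.g. "buy-sell"
  locationSlug: string; // e.g. "canada"
  categoryId: number; // e.g. 10 = Buy & Sell
  locationId: number; // e.g. 0 = Canada, 9004 = Ontario
}

// /b-{category-slug}/{location-slug}/{keywords}/k0c{CATEGORY_ID}l{LOCATION_ID}
function buildSearchUrl(o: SearchUrlOptions): string {
  const path =
    `/b-${o.categorySlug}/${o.locationSlug}/${slugify(o.keywords)}` +
    `/k0c${o.categoryId}l${o.locationId}`;
  return new URL(path, "https://www.kijiji.ca").toString();
}
```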

### Pagination
- Uses offset-based pagination
- 40 results per page
- Total count provided in pagination metadata

## Authentication & User Management
- **Authentication System**: OAuth2-based using CIS (Customer Identity Service)
- **Identity Provider**: `id.kijiji.ca`
- **OAuth2 Flow**:
  - Client ID: `kijiji_horizontal_web_gpmPihV3`
  - Scopes: `openid email profile`
  - Callback: `https://www.kijiji.ca/api/auth/callback/cis`
- **Session Management**: Cookie-based, with encrypted session data
- **Anonymous Access**: Full search functionality available without login
- **User Features**: Saved searches, messaging, and flagging require authentication

## Posting API
- **Posting Flow**: Requires authentication; redirects to login if not authenticated
- **Posting URL**: `https://www.kijiji.ca/p-post-ad.html`
- **Authentication Required**: Yes; redirects to `/consumer/login` for unauthenticated users
- **Post-Creation**: Likely uses authenticated GraphQL mutations (not observed in anonymous browsing)

## GraphQL API Endpoint
- **URL**: `https://www.kijiji.ca/anvil/api`
- **Method**: POST
- **Content-Type**: application/json
- **Headers**:
  - `apollo-require-preflight: true`
  - Standard CORS headers
- **Authentication**: No authentication required for basic queries (uses cookies for session tracking)
- **Technology**: Apollo GraphQL server
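
A sketch of preparing a request for this endpoint, assuming a standard GraphQL-over-HTTP JSON body (`operationName`, `query`, `variables`). The `anvilRequestInit` helper is hypothetical, sets only the headers documented above, and leaves session cookies to the caller; the `fetch` call is shown as a comment because it needs network access.

```typescript
const ANVIL_ENDPOINT = "https://www.kijiji.ca/anvil/api";

interface AnvilRequestInit {
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Hypothetical request builder for the Anvil GraphQL endpoint.
function anvilRequestInit(
  operationName: string,
  query: string,
  variables: Record<string, unknown> = {},
): AnvilRequestInit {
  return {
    method: "POST",
    headers: {
      "content-type": "application/json",
      "apollo-require-preflight": "true",
    },
    body: JSON.stringify({ operationName, query, variables }),
  };
}

const GET_SEARCH_CATEGORIES = `query getSearchCategories($locale: String!) {
  searchCategories { id localizedName(locale: $locale) parentId __typename }
}`;

// Usage (network request, not run here):
// const res = await fetch(ANVIL_ENDPOINT, anvilRequestInit("getSearchCategories", GET_SEARCH_CATEGORIES, { locale: "en-CA" }));
```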

### Sample GraphQL Queries Discovered

#### Get Search Categories
```graphql
query getSearchCategories($locale: String!) {
  searchCategories {
    id
    localizedName(locale: $locale)
    parentId
    __typename
  }
}
```

Variables: `{"locale": "en-CA"}`

Response includes hierarchical category structure with IDs and localized names.
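
The categories arrive as a flat list linked by `parentId`. A sketch of rebuilding the hierarchy, assuming top-level entries have a `null` `parentId` (not confirmed above); `buildCategoryTree` is hypothetical:

```typescript
// Fields requested by getSearchCategories.
interface SearchCategory {
  id: number;
  localizedName: string;
  parentId: number | null;
}

interface CategoryNode extends SearchCategory {
  children: CategoryNode[];
}

// Builds the tree in O(n) with an id -> node map.
function buildCategoryTree(categories: SearchCategory[]): CategoryNode[] {
  const nodes = new Map<number, CategoryNode>();
  categories.forEach((c) => nodes.set(c.id, { ...c, children: [] }));

  const roots: CategoryNode[] = [];
  nodes.forEach((node) => {
    const parent = node.parentId === null ? undefined : nodes.get(node.parentId);
    // Entries whose parent is missing from the list are treated as roots.
    (parent ? parent.children : roots).push(node);
  });
  return roots;
}
```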

#### Get Geocode from IP (fails for current IP)
```graphql
query GetGeocodeReverseFromIp {
  geocodeReverseFromIp {
    city
    province
    locationId
    __typename
  }
}
```

This query fails for the current IP address, suggesting that geolocation-based features either do not work from this IP or only work for certain IP ranges.

#### Get Category Path
```graphql
query GetCategoryPath($categoryId: Int!, $locale: String, $locationId: Int) {
  category(id: $categoryId) {
    id
    localizedName(locale: $locale)
    parentId
    searchSeoUrl(locationId: $locationId)
    categoryPaths {
      id
      localizedName(locale: $locale)
      parentId
      searchSeoUrl(locationId: $locationId)
      __typename
    }
    __typename
  }
}
```

Variables: `{"categoryId": 10, "locationId": 0, "locale": "en-CA"}`

## Latest Findings (2026-01-21)

### Client-Side GraphQL Queries Observed
- **getSearchCategories**: Retrieves category hierarchy for search filters
- **GetGeocodeReverseFromIp**: Attempts to geolocate user (fails for current IP)

### GraphQL Schema Insights
Testing direct GraphQL queries revealed:
- Field "searchResults" does not exist on Query type
- Suggested alternatives: "searchResultsPage" or "searchUrl"
- Search data is therefore probably served by a differently named field; since the embedded state uses `searchResultsPageByUrl:` keys, that field (or `searchResultsPage`) is the likely candidate

The embedded Apollo state approach appears to be the primary method for accessing search data, with GraphQL used for auxiliary operations like categories and geolocation.

### Server-Side Rendering Architecture
Search results are fully server-side rendered with data embedded in HTML. Each page (including pagination) contains its own pre-rendered data. No client-side GraphQL requests are made for:

- Initial search results
- Pagination navigation
- Search result data

### Network Analysis Findings
- GraphQL endpoint: `https://www.kijiji.ca/anvil/api`
- Method: POST
- Content-Type: application/json
- Headers include: `apollo-require-preflight: true`
- Cookies required for session tracking

### Embedded Data Structure
Search results data is embedded in the HTML within the Next.js `__NEXT_DATA__.props.pageProps.__APOLLO_STATE__` object. The data includes:

- Individual ad listings with complete metadata
- Pagination information
- Filter options and counts
- Category/location hierarchies

### Current Scraper Implementation
The existing `src/kijiji.ts` implementation correctly parses the embedded Apollo state:

- Uses `extractApolloState()` to parse `__NEXT_DATA__` from HTML
- Filters Apollo keys containing "Listing" to find ad data
- Extracts `url`, `title`, and other metadata from each listing
- Successfully scrapes listings without needing API authentication

### Authentication Status
- **Search functionality**: No authentication required; all search and listing data is accessible anonymously
- **Posting functionality**: Requires authentication (redirects to login)
- **User features**: Saved searches and messaging require authentication
- **Rate limiting**: May apply, but not observed during anonymous browsing

### Pagination Implementation
- Each page is a separate server-rendered route
- URL pattern: `/b-{location}/{keywords}/page-{number}/k0c{category_id}l{location_id}` (the `c{category_id}` part is omitted for All Categories)
- No client-side pagination API calls
- 40 results per page (observed)
- Example: `/b-canada/iphone/page-2/k0l0` for page 2 of iPhone search
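
For the path-segment form of pagination, a hypothetical `withPage` helper can add, replace, or remove the `page-{N}` segment. It assumes page 1 has no segment and that the segment always sits directly before the trailing `k0...l...` segment, as in the example above.

```typescript
// Hypothetical helper for `/b-{location}/{keywords}/page-{N}/k0...l...` paths.
function withPage(searchPath: string, page: number): string {
  if (!Number.isInteger(page) || page < 1) {
    throw new RangeError(`page must be a positive integer, got ${page}`);
  }
  const segments = searchPath
    .split("/")
    .filter((s) => s !== "" && !/^page-\d+$/.test(s)); // drop any existing page-N
  // Insert the page segment just before the trailing ID segment.
  if (page > 1) segments.splice(segments.length - 1, 0, `page-${page}`);
  return "/" + segments.join("/");
}
```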

## URL Pattern Analysis

### Search URL Structure
`https://www.kijiji.ca/b-{category_slug}/{location_slug}/{keywords}/k0c{category_id}l{location_id}`

#### Examples Observed:
- All categories, Canada: `/b-canada/iphone/k0l0` (category omitted = All Categories, l0 = Canada)
- Cell phones category: `/b-cell-phones/canada/iphone/k0c132l0` (c132 = Cell Phones)
- With pagination: `/b-canada/iphone/page-2/k0l0`

#### URL Components:
- `c{CATEGORY_ID}`: Category ID (0 = All Categories, 132 = Cell Phones, etc.)
- `l{LOCATION_ID}`: Location ID (0 = Canada, 1700272 = GTA, etc.)
- `page-{N}`: Pagination (1-based, optional)
- Keywords are slugified in URL path
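
Going the other way, the IDs can be read back out of a search path or URL. A sketch with a hypothetical `parseSearchIds`, which treats a missing `c{CATEGORY_ID}` as category 0 (All Categories), as in `/b-canada/iphone/k0l0`:

```typescript
interface SearchIds {
  categoryId: number;
  locationId: number;
}

// Hypothetical parser for the trailing `k0c{CATEGORY_ID}l{LOCATION_ID}` segment.
function parseSearchIds(pathOrUrl: string): SearchIds | null {
  const path = pathOrUrl.split(/[?#]/, 1)[0] ?? ""; // ignore ?page=N and fragments
  const last = path.split("/").filter((s) => s !== "").pop() ?? "";
  const m = /^k0(?:c(\d+))?l(\d+)$/.exec(last);
  if (!m) return null;
  return { categoryId: m[1] ? Number(m[1]) : 0, locationId: Number(m[2]) };
}
```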

### Current Implementation Status
The existing scraper in `src/kijiji.ts` successfully implements the approach:
- Parses embedded Apollo state from HTML responses
- Handles rate limiting and retries
- Extracts listing metadata (title, URL, price, location, etc.)
- Works without authentication for search operations

## Listing Details Page

### Overview
Similar to search results, listing details pages use server-side rendering with embedded Apollo GraphQL state in the HTML. No dedicated API endpoint serves individual listing data; all information is pre-rendered on the server.

### Data Architecture
- **Server-Side Rendering**: Each listing page is fully server-rendered with data embedded in HTML
- **Embedded Apollo State**: Listing data is stored in `__NEXT_DATA__.props.pageProps.__APOLLO_STATE__`
- **Client-Side GraphQL**: Additional data (categories, campaigns, similar listings, user profiles) fetched via GraphQL API

### Listing Data Structure
The main listing data follows the same pattern as search results:

```json
{
  "id": "1705585530",
  "title": "We Pay top cash for iPhone 17 pro max, iPhone 17 pro, iPhone Air",
  "description": "Buying All Brand new Apple iPhones sealed/Unsealed...",
  "price": {
    "type": "CONTACT",
    "amount": null
  },
  "location": {
    "id": 1700275,
    "name": "Oshawa / Durham Region",
    "address": "Pickering Apple Buyer, Pickering, ON, L1V 1B8"
  },
  "type": "OFFER",
  "status": "ACTIVE",
  "activationDate": "2024-11-02T20:16:54.000Z",
  "endDate": "3000-01-01T00:00:00.000Z",
  "metrics": {
    "views": 1720
  },
  "posterInfo": {
    "posterId": "1044934581",
    "rating": null
  },
  "attributes": [
    {
      "canonicalName": "forsaleby",
      "canonicalValues": ["business"]
    },
    {
      "canonicalName": "phonecarrier",
      "canonicalValues": ["unlocked"]
    }
  ]
}
```
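
A sketch of flattening such a listing into a summary, including the contact-price case (`type: "CONTACT"`, `amount: null`). The `summarizeListing` function and its types are hypothetical, not the scraper's `parseListing()`. It assumes `amount` is in cents and reads the `3000-01-01` `endDate` as a "no expiry" placeholder, which is an interpretation rather than a documented rule.

```typescript
// Hypothetical subset of the listing-detail fields shown above.
interface ApolloListingDetail {
  id: string;
  title?: string;
  price?: { type?: string; amount?: number | null };
  location?: { name?: string; address?: string | null };
  endDate?: string;
  metrics?: { views?: number | string };
}

interface ListingSummary {
  id: string;
  title: string;
  priceCents: number | null; // null when no numeric amount is given
  priceLabel: string;
  address: string | null;
  views: number;
  expires: Date | null; // null when endDate is the year-3000 placeholder
}

function summarizeListing(l: ApolloListingDetail): ListingSummary {
  const amount = l.price?.amount;
  const cents = typeof amount === "number" ? amount : null;
  const end = l.endDate ? new Date(l.endDate) : null;
  return {
    id: l.id,
    title: l.title ?? "",
    priceCents: cents,
    priceLabel:
      cents !== null
        ? `$${(cents / 100).toFixed(2)}`
        : l.price?.type === "CONTACT"
          ? "Please Contact"
          : "Price unavailable",
    address: l.location?.address ?? l.location?.name ?? null,
    views: Number(l.metrics?.views ?? 0), // views may arrive as a string
    expires: end && end.getUTCFullYear() < 3000 ? end : null,
  };
}
```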

### Client-Side GraphQL Queries
When loading a listing details page, the following GraphQL queries are executed:

#### 1. getSearchCategories
- **Purpose**: Category hierarchy for navigation
- **Variables**: `{"locale": "en-CA"}`
- **Response**: Hierarchical category structure

#### 2. getCampaignsForVip
- **Purpose**: Advertisement targeting data
- **Variables**: `{"placement": "vip", "locationId": 1700275, "categoryId": 760, "platform": "desktop"}`
- **Response**: Campaign/ads data (usually null)

#### 3. GetReviewSummary
- **Purpose**: Seller review statistics
- **Variables**: `{"userId": "1044934581"}`
- **Response**: Review count and score (usually 0 for new sellers)

#### 4. GetProfileMetrics
- **Purpose**: Seller profile information
- **Variables**: `{"profileId": "1044934581"}`
- **Response**: Member since date, account type

#### 5. GetListingsSimilar
- **Purpose**: Similar listings for cross-selling
- **Variables**: `{"listingId": "1705585530", "limit": 10, "isExternalId": false}`
- **Response**: Array of similar listings with basic metadata

#### 6. GetGeocodeReverseFromIp
- **Purpose**: Geolocation-based features
- **Variables**: `{}`
- **Response**: Fails with 404 for most IPs

### Implementation Status
The existing `parseListing()` function in `src/kijiji.ts` successfully extracts listing details from embedded Apollo state:

- ✅ Extracts title, description, price, location
- ✅ Handles contact-based pricing ("Please Contact")
- ✅ Parses creation date, view count, listing status
- ✅ Extracts seller information and address
- ✅ Works without authentication or API keys

### Key Findings
1. **No Dedicated Listing API**: No separate GraphQL query was observed for individual listing data
2. **Complete Data Available**: All listing information is embedded in the initial HTML response
3. **Additional Context Fetched**: Secondary GraphQL queries provide complementary data (reviews, similar listings)
4. **Consistent Architecture**: Same Apollo state embedding pattern as search pages

### Current Scraper Implementation
The scraper successfully extracts listing details by:
1. Fetching the listing URL HTML
2. Parsing embedded `__NEXT_DATA__` Apollo state
3. Extracting the `Listing:{id}` object from Apollo cache
4. Mapping fields to typed `ListingDetails` interface
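
Step 3 can be sketched as a direct key lookup. The hypothetical `findListing` below reads `Listing:{id}` from an already-parsed Apollo state. Its fallback to the first `Listing:` key assumes the page's main listing comes first, which may not hold when related listings are also cached, so passing the id is safer.

```typescript
type ApolloState = Record<string, unknown>;
type ApolloObject = Record<string, unknown>;

const isApolloObject = (v: unknown): v is ApolloObject => typeof v === "object" && v !== null;

// Hypothetical helper: returns the normalized `Listing:{id}` entry, or null.
function findListing(state: ApolloState, id?: string): ApolloObject | null {
  if (id !== undefined) {
    const direct = state[`Listing:${id}`];
    return isApolloObject(direct) ? direct : null;
  }
  // No id given: fall back to the first `Listing:` key in the cache.
  const key = Object.keys(state).find((k) => k.startsWith("Listing:"));
  const found = key === undefined ? undefined : state[key];
  return isApolloObject(found) ? found : null;
}
```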

This approach works without authentication. No rate limiting has been observed on individual listing fetches so far, although the scraper still handles rate limiting and retries.

## Next Steps
- Explore posting/authentication APIs (requires user login)
- Investigate whether the GraphQL API can be used for programmatic access with proper authentication
- Test rate limiting patterns and optimal scraping strategies
- Document additional category and location ID mappings
3
bun.lock
@@ -1,10 +1,10 @@
 {
   "lockfileVersion": 1,
+  "configVersion": 0,
   "workspaces": {
     "": {
       "name": "sone4ka-tok",
       "dependencies": {
-        "@types/cli-progress": "^3.11.6",
         "cli-progress": "^3.12.0",
         "linkedom": "^0.18.12",
         "unidecode": "^1.1.0",
@@ -13,6 +13,7 @@
         "@anthropic-ai/claude-code": "^2.0.1",
         "@musistudio/claude-code-router": "^1.0.53",
         "@types/bun": "latest",
+        "@types/cli-progress": "^3.11.6",
         "@types/unidecode": "^1.1.0",
       },
       "peerDependencies": {
3
bunfig.toml
Normal file
@@ -0,0 +1,3 @@
[test]
# Test configuration
preload = ["./test/setup.ts"]
25
opencode.jsonc
Normal file
@@ -0,0 +1,25 @@
{
  "$schema": "https://opencode.ai/config.json",
  "mcp": {
    "chrome-devtools": {
      "type": "local",
      "command": [
        "bunx",
        "--bun",
        "chrome-devtools-mcp@latest",
        "--log-file",
        "./debug.log",
        "--headless=false",
        "--isolated=false",
        "-e",
        "/nix/store/lz8ajxhnkkw2llj752bdz41wqr645h9c-google-chrome-dev-146.0.7635.0/bin/google-chrome-unstable",
        "--ignore-default-chrome-arg='--disable-extensions'"
      ]
    },
    "bun-docs": {
      "type": "remote",
      "url": "https://bun.com/docs/mcp",
      "timeout": 3000
    }
  }
}
18
src/index.ts
@@ -26,8 +26,12 @@ const server = Bun.serve({
|
|||||||
{ status: 400 },
|
{ status: 400 },
|
||||||
);
|
);
|
||||||
|
|
||||||
const items = await fetchKijijiItems(SEARCH_QUERY, 5);
|
const items = await fetchKijijiItems(SEARCH_QUERY, 1, undefined, {}, {
|
||||||
if (!items)
|
includeImages: true,
|
||||||
|
sellerDataDepth: 'detailed',
|
||||||
|
includeClientSideData: false,
|
||||||
|
});
|
||||||
|
if (!items || items.length === 0)
|
||||||
return Response.json(
|
return Response.json(
|
||||||
{ message: "Search didn't return any results!" },
|
{ message: "Search didn't return any results!" },
|
||||||
{ status: 404 },
|
{ status: 404 },
|
||||||
@@ -85,11 +89,13 @@ const server = Bun.serve({
     );
 
     // Parse optional parameters with defaults
-    const minPrice = reqUrl.searchParams.get("minPrice")
-      ? parseInt(reqUrl.searchParams.get("minPrice")!)
+    const minPriceParam = reqUrl.searchParams.get("minPrice");
+    const minPrice = minPriceParam
+      ? Number.parseInt(minPriceParam, 10)
       : undefined;
-    const maxPrice = reqUrl.searchParams.get("maxPrice")
-      ? parseInt(reqUrl.searchParams.get("maxPrice")!)
+    const maxPriceParam = reqUrl.searchParams.get("maxPrice");
+    const maxPrice = maxPriceParam
+      ? Number.parseInt(maxPriceParam, 10)
       : undefined;
     const strictMode = reqUrl.searchParams.get("strictMode") === "true";
     const exclusionsParam = reqUrl.searchParams.get("exclusions");
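The rewritten handler passes an explicit radix and drops the non-null assertions. The same pattern can be factored into a helper; the sketch below uses hypothetical names (`parseOptionalInt`, `parseOptionalIntStrict`) that are not part of the codebase. It also shows one behaviour worth knowing: `Number.parseInt` returns `NaN` rather than `undefined` for a non-numeric value such as `maxPrice=abc`, so in the handler as written `maxPrice` ends up as `NaN`.

```typescript
// Hypothetical helper mirroring the minPrice/maxPrice parsing in src/index.ts.
function parseOptionalInt(params: URLSearchParams, name: string): number | undefined {
  const raw = params.get(name);
  // A missing or empty parameter yields undefined, as in the handler.
  return raw ? Number.parseInt(raw, 10) : undefined;
}

// Variant that also treats non-numeric input as absent (a design choice, not what the handler does).
function parseOptionalIntStrict(params: URLSearchParams, name: string): number | undefined {
  const value = parseOptionalInt(params, name);
  return value === undefined || Number.isNaN(value) ? undefined : value;
}

const params = new URL("http://localhost/api/kijiji?q=bike&minPrice=100&maxPrice=abc").searchParams;
console.log(parseOptionalInt(params, "minPrice")); // 100
console.log(parseOptionalInt(params, "maxPrice")); // NaN: parseInt does not validate
console.log(parseOptionalInt(params, "other")); // undefined
console.log(parseOptionalIntStrict(params, "maxPrice")); // undefined
```

Whether to reject `NaN` with a 400 response or silently ignore it is an API design decision; the strict variant shows the second option.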
src/kijiji.ts (748 changed lines)
@@ -26,16 +26,29 @@ interface ApolloListingRoot {
   url?: string;
   title?: string;
   description?: string;
-  price?: { amount?: number | string; currency?: string };
+  price?: { amount?: number | string; currency?: string; type?: string };
   type?: string;
   status?: string;
   activationDate?: string;
   endDate?: string;
   metrics?: { views?: number | string };
-  location?: { address?: string | null };
+  location?: {
+    address?: string | null;
+    id?: number;
+    name?: string;
+    coordinates?: { latitude: number; longitude: number };
+  };
+  imageUrls?: string[];
+  imageCount?: number;
+  categoryId?: number;
+  adSource?: string;
+  flags?: { topAd?: boolean; priceDrop?: boolean };
+  posterInfo?: { posterId?: string; rating?: number };
+  attributes?: Array<{ canonicalName?: string; canonicalValues?: string[] }>;
   [k: string]: unknown;
 }
 
+// Keep existing interface for backward compatibility
 type ListingDetails = {
   url: string;
   title: string;
@@ -53,10 +66,178 @@ type ListingDetails = {
   address?: string | null;
 };
 
+// New comprehensive interface for detailed listings
+interface DetailedListing extends ListingDetails {
+  images: string[];
+  categoryId: number;
+  adSource: string;
+  flags: {
+    topAd: boolean;
+    priceDrop: boolean;
+  };
+  attributes: Record<string, string[]>;
+  location: {
+    id: number;
+    name: string;
+    coordinates?: {
+      latitude: number;
+      longitude: number;
+    };
+  };
+  sellerInfo?: {
+    posterId: string;
+    rating?: number;
+    accountType?: string;
+    memberSince?: string;
+    reviewCount?: number;
+    reviewScore?: number;
+  };
+}
+
+// Configuration interfaces
+interface SearchOptions {
+  location?: number | string; // Location ID or name
+  category?: number | string; // Category ID or name
+  keywords?: string;
+  sortBy?: 'relevancy' | 'date' | 'price' | 'distance';
+  sortOrder?: 'desc' | 'asc';
+  maxPages?: number; // Default: 5
+  priceMin?: number;
+  priceMax?: number;
+}
+
+interface ListingFetchOptions {
+  includeImages?: boolean; // Default: true
+  sellerDataDepth?: 'basic' | 'detailed' | 'full'; // Default: 'detailed'
+  includeClientSideData?: boolean; // Default: false
+}
+
+// ----------------------------- Constants & Mappings -----------------------------
+
+// Location mappings from KIJIJI.md
+const LOCATION_MAPPINGS: Record<string, number> = {
+  'canada': 0,
+  'ontario': 9004,
+  'toronto': 1700273,
+  'gta': 1700272,
+  'oshawa': 1700275,
+  'quebec': 9001,
+  'nova scotia': 9002,
+  'alberta': 9003,
+  'new brunswick': 9005,
+  'manitoba': 9006,
+  'british columbia': 9007,
+  'newfoundland': 9008,
+  'saskatchewan': 9009,
+  'territories': 9010,
+  'pei': 9011,
+  'prince edward island': 9011,
+};
+
+// Category mappings from KIJIJI.md (Buy & Sell main categories)
+const CATEGORY_MAPPINGS: Record<string, number> = {
+  'all': 0,
+  'buy-sell': 10,
+  'arts-collectibles': 12,
+  'audio': 767,
+  'baby-items': 253,
+  'bags-luggage': 931,
+  'bikes': 644,
+  'books': 109,
+  'cameras': 103,
+  'cds': 104,
+  'clothing': 274,
+  'computers': 16,
+  'computer-accessories': 128,
+  'electronics': 29659001,
+  'free-stuff': 17220001,
+  'furniture': 235,
+  'garage-sales': 638,
+  'health-special-needs': 140,
+  'hobbies-crafts': 139,
+  'home-appliances': 107,
+  'home-indoor': 717,
+  'home-outdoor': 727,
+  'jewellery': 133,
+  'musical-instruments': 17,
+  'phones': 132,
+  'sporting-goods': 111,
+  'tools': 110,
+  'toys-games': 108,
+  'tvs-video': 15093001,
+  'video-games': 141,
+  'other': 26,
+};
+
+// Sort parameter mappings
+const SORT_MAPPINGS: Record<string, string> = {
+  'relevancy': 'MATCH',
+  'date': 'DATE',
+  'price': 'PRICE',
+  'distance': 'DISTANCE',
+};
+
+// ----------------------------- Exports for Testing -----------------------------
+// Note: These are exported for testing purposes only
+
+export { resolveLocationId, resolveCategoryId, buildSearchUrl };
+export { extractApolloState, parseSearch };
+export { parseDetailedListing };
+export { HttpError, NetworkError, ParseError, RateLimitError, ValidationError };
+
 // ----------------------------- Utilities -----------------------------
 
 const SEPS = new Set([" ", "–", "—", "/", ":", ";", ",", ".", "-"]);
 
+/**
+ * Resolve location ID from name or return numeric ID
+ */
+function resolveLocationId(location?: number | string): number {
+  if (typeof location === 'number') return location;
+  if (typeof location === 'string') {
+    const normalized = location.toLowerCase().replace(/\s+/g, '-');
+    return LOCATION_MAPPINGS[normalized] ?? 0; // Default to Canada (0)
+  }
+  return 0; // Default to Canada
+}
+
+/**
+ * Resolve category ID from name or return numeric ID
+ */
+function resolveCategoryId(category?: number | string): number {
+  if (typeof category === 'number') return category;
+  if (typeof category === 'string') {
+    const normalized = category.toLowerCase().replace(/\s+/g, '-');
+    return CATEGORY_MAPPINGS[normalized] ?? 0; // Default to all categories
+  }
+  return 0; // Default to all categories
+}
+
+/**
+ * Build search URL with enhanced parameters
+ */
+function buildSearchUrl(
+  keywords: string,
+  options: SearchOptions & { page?: number },
+  BASE_URL = "https://www.kijiji.ca"
+): string {
+  const locationId = resolveLocationId(options.location);
+  const categoryId = resolveCategoryId(options.category);
+
+  const categorySlug = categoryId === 0 ? 'buy-sell' : 'buy-sell'; // Could be enhanced
+  const locationSlug = locationId === 0 ? 'canada' : 'canada'; // Could be enhanced
+
+  let url = `${BASE_URL}/b-${categorySlug}/${locationSlug}/${slugify(keywords)}/k0c${categoryId}l${locationId}`;
+
+  const sortParam = options.sortBy ? `&sort=${SORT_MAPPINGS[options.sortBy]}` : '';
+  const sortOrder = options.sortOrder === 'asc' ? 'ASC' : 'DESC';
+  const pageParam = options.page && options.page > 1 ? `&page=${options.page}` : '';
+
+  url += `?sort=relevancyDesc&view=list${sortParam}&order=${sortOrder}${pageParam}`;
+
+  return url;
+}
+
 /**
  * Slugifies a string for search
  */
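To see the URL shape the new builder produces, the sketch below mirrors `resolveLocationId`, `resolveCategoryId` and `buildSearchUrl` from this hunk, under stated assumptions: the mapping tables are trimmed to a few entries, both resolvers are folded into one hypothetical `resolveId`, and a simplified hypothetical `slugifySketch` stands in for `slugify`, whose full body is not in this diff. Running it shows two properties of the code as written: the query string carries two `sort` parameters (the fixed `sort=relevancyDesc` plus the mapped value), and because names are normalized to hyphens while multi-word keys such as `'nova scotia'` contain spaces, those names resolve to `0` (Canada).

```typescript
// Trimmed copies of the lookup tables added in this diff (subset for illustration).
const LOCATION_MAPPINGS: Record<string, number> = {
  canada: 0,
  gta: 1700272,
  "nova scotia": 9002,
};
const CATEGORY_MAPPINGS: Record<string, number> = {
  all: 0,
  bikes: 644,
};
const SORT_MAPPINGS: Record<string, string> = {
  relevancy: "MATCH",
  date: "DATE",
  price: "PRICE",
  distance: "DISTANCE",
};

interface SearchOptions {
  location?: number | string;
  category?: number | string;
  sortBy?: "relevancy" | "date" | "price" | "distance";
  sortOrder?: "desc" | "asc";
  page?: number;
}

// Same rule as resolveLocationId/resolveCategoryId: numbers pass through,
// names are lowercased and whitespace runs become hyphens before lookup.
function resolveId(table: Record<string, number>, value?: number | string): number {
  if (typeof value === "number") return value;
  if (typeof value === "string") {
    return table[value.toLowerCase().replace(/\s+/g, "-")] ?? 0;
  }
  return 0;
}

// Hypothetical stand-in for slugify(): lowercase, non-alphanumeric runs become one hyphen.
function slugifySketch(input: string): string {
  return input.toLowerCase().replace(/[^a-z0-9]+/g, "-").replace(/^-+|-+$/g, "");
}

// Mirrors buildSearchUrl: fixed 'buy-sell'/'canada' slugs, IDs in the k0c{category}l{location} segment.
function buildSearchUrl(keywords: string, options: SearchOptions, BASE_URL = "https://www.kijiji.ca"): string {
  const locationId = resolveId(LOCATION_MAPPINGS, options.location);
  const categoryId = resolveId(CATEGORY_MAPPINGS, options.category);
  let url = `${BASE_URL}/b-buy-sell/canada/${slugifySketch(keywords)}/k0c${categoryId}l${locationId}`;
  const sortParam = options.sortBy ? `&sort=${SORT_MAPPINGS[options.sortBy]}` : "";
  const sortOrder = options.sortOrder === "asc" ? "ASC" : "DESC";
  const pageParam = options.page && options.page > 1 ? `&page=${options.page}` : "";
  url += `?sort=relevancyDesc&view=list${sortParam}&order=${sortOrder}${pageParam}`;
  return url;
}

console.log(buildSearchUrl("Road Bike", { location: "gta", category: "bikes", sortBy: "date", page: 2 }));
// https://www.kijiji.ca/b-buy-sell/canada/road-bike/k0c644l1700272?sort=relevancyDesc&view=list&sort=DATE&order=DESC&page=2
console.log(resolveId(LOCATION_MAPPINGS, "Nova Scotia")); // 0: looked up as "nova-scotia"
```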
@@ -67,13 +248,14 @@ export function slugify(input: string): string {
 
   for (let i = 0; i < s.length; i++) {
     const ch = s[i];
-    const code = ch!.charCodeAt(0);
+    if (!ch) continue;
+    const code = ch.charCodeAt(0);
 
     // a-z or 0-9
     if ((code >= 97 && code <= 122) || (code >= 48 && code <= 57)) {
-      out.push(ch!);
+      out.push(ch);
       lastHyphen = false;
-    } else if (SEPS.has(ch!)) {
+    } else if (SEPS.has(ch)) {
       if (!lastHyphen) {
         out.push("-");
         lastHyphen = true;
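The `slugify` change replaces the `ch!` non-null assertions with an explicit `if (!ch) continue;` guard, which narrows `ch` to `string` when the compiler types indexed access as possibly `undefined` (as under TypeScript's `noUncheckedIndexedAccess`). The sketch below reconstructs the visible loop as a standalone function. Assumptions, since those lines are outside this diff: the input is lowercased first, characters that are neither alphanumeric nor in `SEPS` are dropped, and a trailing hyphen is trimmed. The project also lists `@types/unidecode` as a dependency, so the real function may transliterate accented characters before this loop; the sketch does not, which is why `"Café"` loses its last letter.

```typescript
// Separator set copied from the SEPS constant in src/kijiji.ts.
const SEPS = new Set([" ", "–", "—", "/", ":", ";", ",", ".", "-"]);

// Sketch of slugify(); lowercasing, dropping other characters and trimming a
// trailing hyphen are assumptions, not lines shown in this diff.
function slugifySketch(input: string): string {
  const s = input.toLowerCase();
  const out: string[] = [];
  let lastHyphen = false;
  for (let i = 0; i < s.length; i++) {
    const ch = s[i];
    if (!ch) continue; // narrows string | undefined to string
    const code = ch.charCodeAt(0);
    // a-z or 0-9 are kept as-is
    if ((code >= 97 && code <= 122) || (code >= 48 && code <= 57)) {
      out.push(ch);
      lastHyphen = false;
    } else if (SEPS.has(ch)) {
      // Collapse a run of separators into a single hyphen
      if (!lastHyphen) {
        out.push("-");
        lastHyphen = true;
      }
    }
    // Any other character is skipped (assumption)
  }
  // Strip a trailing hyphen (assumption)
  if (out[out.length - 1] === "-") out.pop();
  return out.join("");
}

console.log(slugifySketch("PS5 Console - Like New!")); // "ps5-console-like-new"
console.log(slugifySketch("Café")); // "caf"
```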
@@ -87,30 +269,33 @@ export function slugify(input: string): string {
 /**
  * Turns cents to localized currency string.
  */
-function formatCentsToCurrency(
+export function formatCentsToCurrency(
   num: number | string | undefined,
   locale = "en-US",
 ): string {
   if (num == null) return "";
   const cents = typeof num === "string" ? Number.parseInt(num, 10) : num;
   if (Number.isNaN(cents)) return "";
   const dollars = cents / 100;
   const formatter = new Intl.NumberFormat(locale, {
+    style: 'currency',
+    currency: 'USD',
     minimumFractionDigits: 2,
     maximumFractionDigits: 2,
-    useGrouping: true,
   });
   return formatter.format(dollars);
 }
 
 function isRecord(value: unknown): value is Record<string, unknown> {
-  return typeof value === "object" && value !== null;
+  return typeof value === "object" && value !== null && !Array.isArray(value);
 }
 
 async function delay(ms: number): Promise<void> {
   await new Promise((resolve) => setTimeout(resolve, ms));
 }
 
+// ----------------------------- Error Classes -----------------------------
+
 class HttpError extends Error {
   constructor(
     message: string,
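A worked example of the updated `formatCentsToCurrency`: the value is treated as cents, divided by 100, and formatted with `style: 'currency'`. Removing `useGrouping: true` does not change the output, because grouping is on by default. Note that `currency` is fixed to `'USD'`, so the formatted string does not follow the listing's own `price.currency` field. The function body is copied from the diff; only the calls at the bottom are new.

```typescript
// Copy of formatCentsToCurrency as changed in this diff.
function formatCentsToCurrency(
  num: number | string | undefined,
  locale = "en-US",
): string {
  if (num == null) return "";
  // Strings are parsed as base-10 integers of cents
  const cents = typeof num === "string" ? Number.parseInt(num, 10) : num;
  if (Number.isNaN(cents)) return "";
  const dollars = cents / 100;
  const formatter = new Intl.NumberFormat(locale, {
    style: "currency",
    currency: "USD",
    minimumFractionDigits: 2,
    maximumFractionDigits: 2,
  });
  return formatter.format(dollars);
}

console.log(formatCentsToCurrency(123456)); // "$1,234.56"
console.log(formatCentsToCurrency("99900")); // "$999.00"
console.log(formatCentsToCurrency("n/a")); // ""
```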
@@ -122,12 +307,52 @@ class HttpError extends Error {
   }
 }
 
+class NetworkError extends Error {
+  constructor(
+    message: string,
+    public readonly url: string,
+    public readonly cause?: Error,
+  ) {
+    super(message);
+    this.name = "NetworkError";
+  }
+}
+
+class ParseError extends Error {
+  constructor(
+    message: string,
+    public readonly data?: unknown,
+  ) {
+    super(message);
+    this.name = "ParseError";
+  }
+}
+
+class RateLimitError extends Error {
+  constructor(
+    message: string,
+    public readonly url: string,
+    public readonly resetTime?: number,
+  ) {
+    super(message);
+    this.name = "RateLimitError";
+  }
+}
+
+class ValidationError extends Error {
+  constructor(message: string) {
+    super(message);
+    this.name = "ValidationError";
+  }
+}
+
 // ----------------------------- HTTP Client -----------------------------
 
 /**
-Fetch HTML with a basic retry strategy and simple rate-limit delay between calls.
-- Retries on 429 and 5xx
+Fetch HTML with enhanced retry strategy and exponential backoff.
+- Retries on 429, 5xx, and network errors
 - Respects X-RateLimit-Reset when present (seconds)
+- Exponential backoff with jitter
 */
 async function fetchHtml(
   url: string,
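With the new error classes, callers can tell failure modes apart via `instanceof`. The sketch below redeclares two of them with explicit fields (the diff uses constructor parameter properties) and adds a hypothetical `describeFailure` helper that is not part of the codebase. It checks `resetTime` with `Number.isFinite` because `fetchHtml` passes `resetSeconds`, which is `NaN` when the `X-RateLimit-Reset` header is missing or non-numeric.

```typescript
// Simplified stand-ins for two error classes from this diff, with explicit fields.
class NetworkError extends Error {
  readonly url: string;
  constructor(message: string, url: string) {
    super(message);
    this.name = "NetworkError";
    this.url = url;
  }
}

class RateLimitError extends Error {
  readonly url: string;
  readonly resetTime?: number;
  constructor(message: string, url: string, resetTime?: number) {
    super(message);
    this.name = "RateLimitError";
    this.url = url;
    this.resetTime = resetTime;
  }
}

// Hypothetical caller-side helper: map a thrown value to a user-facing message.
function describeFailure(err: unknown): string {
  if (err instanceof RateLimitError) {
    // resetTime may be undefined or NaN when the server sent no usable reset header
    return Number.isFinite(err.resetTime)
      ? `Rate limited; retry in ${err.resetTime}s`
      : "Rate limited; retry later";
  }
  if (err instanceof NetworkError) return `Could not reach ${err.url}`;
  if (err instanceof Error) return err.message;
  return String(err);
}

console.log(describeFailure(new RateLimitError("429", "https://www.kijiji.ca", 30)));
// "Rate limited; retry in 30s"
```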
@@ -139,11 +364,13 @@ async function fetchHtml(
   },
 ): Promise<HTMLString> {
   const maxRetries = opts?.maxRetries ?? 3;
-  const retryBaseMs = opts?.retryBaseMs ?? 500;
+  const retryBaseMs = opts?.retryBaseMs ?? 1000;
 
   for (let attempt = 0; attempt <= maxRetries; attempt++) {
     try {
-      // console.log(`Fetching: `, url);
+      const controller = new AbortController();
+      const timeoutId = setTimeout(() => controller.abort(), 30000); // 30s timeout
+
       const res = await fetch(url, {
         method: "GET",
         headers: {
@@ -155,27 +382,40 @@ async function fetchHtml(
           "user-agent":
             "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120 Safari/537.36",
         },
+        signal: controller.signal,
       });
+
+      clearTimeout(timeoutId);
+
       const rateLimitRemaining = res.headers.get("X-RateLimit-Remaining");
       const rateLimitReset = res.headers.get("X-RateLimit-Reset");
       opts?.onRateInfo?.(rateLimitRemaining, rateLimitReset);
 
       if (!res.ok) {
-        // Respect 429 reset if provided
+        // Handle rate limiting
         if (res.status === 429) {
-          const resetSeconds = rateLimitReset ? Number(rateLimitReset) : NaN;
+          const resetSeconds = rateLimitReset ? Number(rateLimitReset) : Number.NaN;
           const waitMs = Number.isFinite(resetSeconds)
             ? Math.max(0, resetSeconds * 1000)
-            : (attempt + 1) * retryBaseMs;
-          await delay(waitMs);
-          continue;
+            : calculateBackoffDelay(attempt, retryBaseMs);
+
+          if (attempt < maxRetries) {
+            await delay(waitMs);
+            continue;
+          }
+          throw new RateLimitError(
+            `Rate limit exceeded for ${url}`,
+            url,
+            resetSeconds,
+          );
         }
-        // Retry on 5xx
+
+        // Retry on server errors
         if (res.status >= 500 && res.status < 600 && attempt < maxRetries) {
-          await delay((attempt + 1) * retryBaseMs);
+          await delay(calculateBackoffDelay(attempt, retryBaseMs));
           continue;
         }
 
         throw new HttpError(
           `Request failed with status ${res.status}`,
           res.status,
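`fetchHtml` now bounds each request with an `AbortController` and a 30-second timer. As written, `clearTimeout` runs only after `fetch` resolves: if `fetch` rejects, the timer stays pending until it fires, and `res.text()` runs after the timer is cleared, so a stalled body read is not covered. One way to structure this, sketched below with a hypothetical `withTimeout` helper that is not part of the codebase, is to wrap the whole abortable task and clear the timer in `finally`. Modern runtimes also provide `AbortSignal.timeout(ms)`, which avoids the manual timer entirely.

```typescript
// Hypothetical helper: run an abortable task with a timeout and always release the timer.
async function withTimeout<T>(
  task: (signal: AbortSignal) => Promise<T>,
  ms: number,
): Promise<T> {
  const controller = new AbortController();
  const timeoutId = setTimeout(() => controller.abort(), ms);
  try {
    // The task must honour the signal (fetch does, for both headers and body).
    return await task(controller.signal);
  } finally {
    // Runs on success and on failure, so no timer is left pending.
    clearTimeout(timeoutId);
  }
}

// Usage with fetch, covering the body read as well (not executed here):
// const html = await withTimeout(
//   (signal) => fetch(url, { signal }).then((res) => res.text()),
//   30_000,
// );
```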
@@ -184,22 +424,177 @@ async function fetchHtml(
       }
 
       const html = await res.text();
-      // Respect per-request delay to keep at or under REQUESTS_PER_SECOND
+
+      // Respect per-request delay to maintain rate limiting
       await delay(DELAY_MS);
       return html;
+
     } catch (err) {
-      if (attempt >= maxRetries) throw err;
-      await delay((attempt + 1) * retryBaseMs);
+      // Handle different error types
+      if (err instanceof RateLimitError || err instanceof HttpError) {
+        throw err; // Re-throw known errors
+      }
+
+      if (err instanceof Error && err.name === 'AbortError') {
+        if (attempt < maxRetries) {
+          await delay(calculateBackoffDelay(attempt, retryBaseMs));
+          continue;
+        }
+        throw new NetworkError(`Request timeout for ${url}`, url, err);
+      }
+
+      // Network or other errors
+      if (attempt < maxRetries) {
+        await delay(calculateBackoffDelay(attempt, retryBaseMs));
+        continue;
+      }
+      throw new NetworkError(
+        `Network error fetching ${url}: ${err instanceof Error ? err.message : String(err)}`,
+        url,
+        err instanceof Error ? err : undefined
+      );
     }
   }
 
-  throw new Error("Exhausted retries without response");
+  throw new NetworkError(`Exhausted retries without response for ${url}`, url);
+}
+
+/**
+ * Calculate exponential backoff delay with jitter
+ */
+function calculateBackoffDelay(attempt: number, baseMs: number): number {
+  const exponentialDelay = baseMs * (2 ** attempt);
+  const jitter = Math.random() * 0.1 * exponentialDelay; // 10% jitter
+  return Math.min(exponentialDelay + jitter, 30000); // Cap at 30 seconds
+}
+
+// ----------------------------- GraphQL Client -----------------------------
+
+/**
+ * Fetch additional data via GraphQL API
+ */
+async function fetchGraphQLData(
+  query: string,
+  variables: Record<string, unknown>,
+  BASE_URL = "https://www.kijiji.ca"
+): Promise<unknown> {
+  const endpoint = `${BASE_URL}/anvil/api`;
+
+  try {
+    const response = await fetch(endpoint, {
+      method: 'POST',
+      headers: {
+        'Content-Type': 'application/json',
+        'apollo-require-preflight': 'true',
+      },
+      body: JSON.stringify({
+        query,
+        variables,
+      }),
+    });
+
+    if (!response.ok) {
+      throw new HttpError(
+        `GraphQL request failed with status ${response.status}`,
+        response.status,
+        endpoint
+      );
+    }
+
+    const result = await response.json();
+
+    if (result.errors) {
+      throw new ParseError(`GraphQL errors: ${JSON.stringify(result.errors)}`, result.errors);
+    }
+
+    return result.data;
+  } catch (err) {
+    if (err instanceof HttpError || err instanceof ParseError) {
+      throw err;
+    }
+    throw new NetworkError(
+      `Failed to fetch GraphQL data: ${err instanceof Error ? err.message : String(err)}`,
+      endpoint,
+      err instanceof Error ? err : undefined
+    );
+  }
+}
+
+// GraphQL response interfaces
+interface GraphQLReviewResponse {
+  user?: {
+    reviewSummary?: {
+      count?: number;
+      score?: number;
+    };
+  };
+}
+
+interface GraphQLProfileResponse {
+  user?: {
+    memberSince?: string;
+    accountType?: string;
+  };
+}
+
+// GraphQL queries from KIJIJI.md
+const GRAPHQL_QUERIES = {
+  getReviewSummary: `
+    query GetReviewSummary($userId: String!) {
+      user(id: $userId) {
+        reviewSummary {
+          count
+          score
+          __typename
+        }
+        __typename
+      }
+    }
+  `,
+  getProfileMetrics: `
+    query GetProfileMetrics($profileId: String!) {
+      user(id: $profileId) {
+        memberSince
+        accountType
+        __typename
+      }
+    }
+  `,
+} as const;
+
+/**
+ * Fetch additional seller data via GraphQL
+ */
+async function fetchSellerDetails(
+  posterId: string,
+  BASE_URL = "https://www.kijiji.ca"
+): Promise<{ reviewCount?: number; reviewScore?: number; memberSince?: string; accountType?: string }> {
+  try {
+    const [reviewData, profileData] = await Promise.all([
+      fetchGraphQLData(GRAPHQL_QUERIES.getReviewSummary, { userId: posterId }, BASE_URL),
+      fetchGraphQLData(GRAPHQL_QUERIES.getProfileMetrics, { profileId: posterId }, BASE_URL),
+    ]);
+
+    const reviewResponse = reviewData as GraphQLReviewResponse;
+    const profileResponse = profileData as GraphQLProfileResponse;
+
+    return {
+      reviewCount: reviewResponse?.user?.reviewSummary?.count,
+      reviewScore: reviewResponse?.user?.reviewSummary?.score,
+      memberSince: profileResponse?.user?.memberSince,
+      accountType: profileResponse?.user?.accountType,
+    };
+  } catch (err) {
+    // Silently fail for GraphQL errors - not critical for basic functionality
+    console.warn(`Failed to fetch seller details for ${posterId}:`, err instanceof Error ? err.message : String(err));
+    return {};
+  }
 }
 
 // ----------------------------- Parsing -----------------------------
 
 /**
 Extracts json.props.pageProps.__APOLLO_STATE__ safely from a Kijiji page HTML.
 */
 function extractApolloState(htmlString: HTMLString): ApolloRecord | null {
   const { document } = parseHTML(htmlString);
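`calculateBackoffDelay` doubles the base delay on each attempt and adds up to 10% random jitter, capped at 30 seconds. To make the schedule checkable, the copy below adds an optional `random` parameter (hypothetical, not in the diff) that defaults to `Math.random`. With the new defaults (`retryBaseMs` 1000, `maxRetries` 3), `fetchHtml` makes at most three backoff waits (attempts 0 to 2) of roughly 1 s, 2 s and 4 s; the previous linear schedule `(attempt + 1) * retryBaseMs` with a 500 ms base gave 500, 1000 and 1500 ms.

```typescript
// Copy of calculateBackoffDelay from this diff; the `random` parameter is a
// hypothetical addition so the jitter can be pinned in examples and tests.
function calculateBackoffDelay(
  attempt: number,
  baseMs: number,
  random: () => number = Math.random,
): number {
  const exponentialDelay = baseMs * (2 ** attempt);
  const jitter = random() * 0.1 * exponentialDelay; // up to 10% extra
  return Math.min(exponentialDelay + jitter, 30000); // Cap at 30 seconds
}

// Bounds of each wait with retryBaseMs = 1000 (jitter pinned to its lower and upper bound).
for (let attempt = 0; attempt < 3; attempt++) {
  const low = calculateBackoffDelay(attempt, 1000, () => 0);
  const high = calculateBackoffDelay(attempt, 1000, () => 1);
  console.log(`attempt ${attempt}: ${low}-${high} ms`);
}
// attempt 0: 1000-1100 ms
// attempt 1: 2000-2200 ms
// attempt 2: 4000-4400 ms
```

Because `Math.random()` returns values in `[0, 1)`, the upper bound is approached but never reached; from attempt 5 onward (32 s before jitter) the cap applies.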
@@ -299,7 +694,7 @@ function parseListing(
     listingPrice: amountFormatted
       ? {
           amountFormatted,
-          cents: Number.isFinite(cents!) ? cents : undefined,
+          cents: cents !== undefined && Number.isFinite(cents) ? cents : undefined,
           currency: price?.currency,
         }
       : undefined,
@@ -307,91 +702,252 @@ function parseListing(
     listingStatus: status,
     creationDate: activationDate,
     endDate,
-    numberOfViews: Number.isFinite(numberOfViews!) ? numberOfViews : undefined,
+    numberOfViews: numberOfViews !== undefined && Number.isFinite(numberOfViews) ? numberOfViews : undefined,
     address: location?.address ?? null,
   };
 }
 
+/**
+ * Parse a listing page into a detailed object with all available fields
+ */
+async function parseDetailedListing(
+  htmlString: HTMLString,
+  BASE_URL: string,
+  options: ListingFetchOptions = {}
+): Promise<DetailedListing | null> {
+  const apolloState = extractApolloState(htmlString);
+  if (!apolloState) return null;
+
+  // Find the listing root key
+  const listingKey = Object.keys(apolloState).find((k) =>
+    k.includes("Listing"),
+  );
+  if (!listingKey) return null;
+
+  const root = apolloState[listingKey];
+  if (!isRecord(root)) return null;
+
+  const {
+    url,
+    title,
+    description,
+    price,
+    type,
+    status,
+    activationDate,
+    endDate,
+    metrics,
+    location,
+    imageUrls,
+    imageCount,
+    categoryId,
+    adSource,
+    flags,
+    posterInfo,
+    attributes,
+  } = root as ApolloListingRoot;
+
+  const cents = price?.amount != null ? Number(price.amount) : undefined;
+  const amountFormatted = formatCentsToCurrency(cents);
+
+  const numberOfViews =
+    metrics?.views != null ? Number(metrics.views) : undefined;
+
+  const listingUrl =
+    typeof url === "string"
+      ? url.startsWith("http")
+        ? url
+        : `${BASE_URL}${url}`
+      : "";
+
+  if (!listingUrl || !title) return null;
+
+  // Only include fixed-price listings
+  if (!amountFormatted || cents === undefined) return null;
+
+  // Extract images if requested
+  const images = options.includeImages !== false && Array.isArray(imageUrls)
+    ? imageUrls.filter((url): url is string => typeof url === 'string')
+    : [];
+
+  // Extract attributes as key-value pairs
+  const attributeMap: Record<string, string[]> = {};
+  if (Array.isArray(attributes)) {
+    for (const attr of attributes) {
+      if (attr?.canonicalName && Array.isArray(attr.canonicalValues)) {
+        attributeMap[attr.canonicalName] = attr.canonicalValues;
+      }
+    }
+  }
+
+  // Extract seller info based on depth setting
+  let sellerInfo: DetailedListing['sellerInfo'];
+  const depth = options.sellerDataDepth ?? 'detailed';
+
+  if (posterInfo?.posterId) {
+    sellerInfo = {
+      posterId: posterInfo.posterId,
+      rating: typeof posterInfo.rating === 'number' ? posterInfo.rating : undefined,
+    };
+
+    // Add more detailed info if requested and client-side data is enabled
+    if ((depth === 'detailed' || depth === 'full') && options.includeClientSideData) {
+      try {
+        const additionalData = await fetchSellerDetails(posterInfo.posterId, BASE_URL);
+        sellerInfo = {
+          ...sellerInfo,
+          ...additionalData,
+        };
+      } catch (err) {
+        // Silently fail - GraphQL data is optional
+        console.warn(`Failed to fetch additional seller data for ${posterInfo.posterId}`);
+      }
+    }
+  }
+
+  return {
+    url: listingUrl,
+    title,
+    description,
+    listingPrice: {
+      amountFormatted,
+      cents,
+      currency: price?.currency,
+    },
+    listingType: type,
+    listingStatus: status,
+    creationDate: activationDate,
+    endDate,
+    numberOfViews: numberOfViews !== undefined && Number.isFinite(numberOfViews) ? numberOfViews : undefined,
+    address: location?.address ?? null,
+    images,
+    categoryId: typeof categoryId === 'number' ? categoryId : 0,
+    adSource: typeof adSource === 'string' ? adSource : 'UNKNOWN',
+    flags: {
+      topAd: flags?.topAd === true,
+      priceDrop: flags?.priceDrop === true,
+    },
+    attributes: attributeMap,
+    location: {
+      id: typeof location?.id === 'number' ? location.id : 0,
+      name: typeof location?.name === 'string' ? location.name : 'Unknown',
+      coordinates: location?.coordinates ? {
+        latitude: location.coordinates.latitude,
+        longitude: location.coordinates.longitude,
+      } : undefined,
+    },
+    sellerInfo,
+  };
+}
+
 // ----------------------------- Main -----------------------------
 
 export default async function fetchKijijiItems(
   SEARCH_QUERY: string,
   REQUESTS_PER_SECOND = 1,
   BASE_URL = "https://www.kijiji.ca",
+  searchOptions: SearchOptions = {},
+  listingOptions: ListingFetchOptions = {},
 ) {
   const DELAY_MS = Math.max(1, Math.floor(1000 / REQUESTS_PER_SECOND));
 
-  const searchUrl = `${BASE_URL}/b-gta-greater-toronto-area/${slugify(SEARCH_QUERY)}/k0l1700272?sort=relevancyDesc&view=list`;
+  // Set defaults for configuration
+  const finalSearchOptions: Required<SearchOptions> = {
+    location: searchOptions.location ?? 1700272, // Default to GTA
+    category: searchOptions.category ?? 0, // Default to all categories
+    keywords: searchOptions.keywords ?? SEARCH_QUERY,
+    sortBy: searchOptions.sortBy ?? 'relevancy',
+    sortOrder: searchOptions.sortOrder ?? 'desc',
+    maxPages: searchOptions.maxPages ?? 5, // Default to 5 pages
+    priceMin: searchOptions.priceMin,
+    priceMax: searchOptions.priceMax,
+  };
 
-  console.log(`Fetching search: ${searchUrl}`);
-  const searchHtml = await fetchHtml(searchUrl, DELAY_MS, {
-    onRateInfo: (remaining, reset) => {
-      if (remaining && reset) {
-        console.log(
-          "\n" +
-            `Search - Rate limit remaining: ${remaining}, reset in: ${reset}s`,
-        );
+  const finalListingOptions: Required<ListingFetchOptions> = {
+    includeImages: listingOptions.includeImages ?? true,
+    sellerDataDepth: listingOptions.sellerDataDepth ?? 'detailed',
+    includeClientSideData: listingOptions.includeClientSideData ?? false,
+  };
+
+  const allListings: DetailedListing[] = [];
+  const seenUrls = new Set<string>();
+
+  // Fetch multiple pages
+  for (let page = 1; page <= finalSearchOptions.maxPages; page++) {
+    const searchUrl = buildSearchUrl(finalSearchOptions.keywords, {
+      ...finalSearchOptions,
+      // Add page parameter for pagination
+      ...(page > 1 && { page }),
+    }, BASE_URL);
+
+    console.log(`Fetching search page ${page}: ${searchUrl}`);
+    const searchHtml = await fetchHtml(searchUrl, DELAY_MS, {
+      onRateInfo: (remaining, reset) => {
+        if (remaining && reset) {
+          console.log(`\nSearch - Rate limit remaining: ${remaining}, reset in: ${reset}s`);
+        }
+      },
+    });
+
+    const searchResults = parseSearch(searchHtml, BASE_URL);
+    if (searchResults.length === 0) {
+      console.log(`No more results found on page ${page}. Stopping pagination.`);
+      break;
+    }
+
+    // Deduplicate links across pages
+    const newListingLinks = searchResults
+      .map((r) => r.listingLink)
+      .filter((link) => !seenUrls.has(link));
+
+    for (const link of newListingLinks) {
+      seenUrls.add(link);
+    }
+
+    console.log(`\nFound ${newListingLinks.length} new listing links on page ${page}. Total unique: ${seenUrls.size}`);
+
+    // Fetch details for this page's listings
+    const progressBar = new cliProgress.SingleBar(
+      {},
+      cliProgress.Presets.shades_classic,
+    );
+    const totalProgress = newListingLinks.length;
+    let currentProgress = 0;
+    progressBar.start(totalProgress, currentProgress);
+
+    for (const link of newListingLinks) {
+      try {
+        const html = await fetchHtml(link, DELAY_MS, {
+          onRateInfo: (remaining, reset) => {
+            if (remaining && reset) {
+              console.log(`\nItem - Rate limit remaining: ${remaining}, reset in: ${reset}s`);
+            }
+          },
+        });
+        const parsed = await parseDetailedListing(html, BASE_URL, finalListingOptions);
|
||||||
|
if (parsed) {
|
||||||
|
allListings.push(parsed);
|
||||||
|
}
|
||||||
|
} catch (err) {
|
||||||
|
if (err instanceof HttpError) {
|
||||||
|
console.error(`\nFailed to fetch ${link}\n - ${err.status} ${err.message}`);
|
||||||
|
} else {
|
||||||
|
console.error(`\nFailed to fetch ${link}\n - ${String((err as Error)?.message || err)}`);
|
||||||
|
}
|
||||||
|
} finally {
|
||||||
|
currentProgress++;
|
||||||
|
progressBar.update(currentProgress);
|
||||||
}
|
}
|
||||||
},
|
}
|
||||||
});
|
|
||||||
|
|
||||||
const searchResults = parseSearch(searchHtml, BASE_URL);
|
progressBar.stop();
|
||||||
if (searchResults.length === 0) {
|
|
||||||
console.warn("No search results parsed from page.");
|
|
||||||
return;
|
|
||||||
}
|
|
||||||
|
|
||||||
// Deduplicate links
|
// If we got fewer results than expected (40 per page), we've reached the end
|
||||||
const listingLinks = Array.from(
|
if (searchResults.length < 40) {
|
||||||
new Set(searchResults.map((r) => r.listingLink)),
|
break;
|
||||||
);
|
|
||||||
|
|
||||||
console.log(
|
|
||||||
"\n" + `Found ${listingLinks.length} listing links. Fetching details...`,
|
|
||||||
);
|
|
||||||
|
|
||||||
const progressBar = new cliProgress.SingleBar(
|
|
||||||
{},
|
|
||||||
cliProgress.Presets.shades_classic,
|
|
||||||
);
|
|
||||||
const totalProgress = listingLinks.length;
|
|
||||||
let currentProgress = 0;
|
|
||||||
progressBar.start(totalProgress, currentProgress);
|
|
||||||
|
|
||||||
const items: ListingDetails[] = [];
|
|
||||||
for (const link of listingLinks) {
|
|
||||||
try {
|
|
||||||
const html = await fetchHtml(link, DELAY_MS, {
|
|
||||||
onRateInfo: (remaining, reset) => {
|
|
||||||
if (remaining && reset) {
|
|
||||||
console.log(
|
|
||||||
"\n" +
|
|
||||||
`Item - Rate limit remaining: ${remaining}, reset in: ${reset}s`,
|
|
||||||
);
|
|
||||||
}
|
|
||||||
},
|
|
||||||
});
|
|
||||||
const parsed = parseListing(html, BASE_URL);
|
|
||||||
if (parsed) {
|
|
||||||
if (parsed.listingPrice?.cents) items.push(parsed);
|
|
||||||
}
|
|
||||||
} catch (err) {
|
|
||||||
if (err instanceof HttpError) {
|
|
||||||
console.error(
|
|
||||||
"\n" + `Failed to fetch ${link}\n - ${err.status} ${err.message}`,
|
|
||||||
);
|
|
||||||
} else {
|
|
||||||
console.error(
|
|
||||||
"\n" +
|
|
||||||
`Failed to fetch ${link}\n - ${String((err as Error)?.message || err)}`,
|
|
||||||
);
|
|
||||||
}
|
|
||||||
} finally {
|
|
||||||
currentProgress++;
|
|
||||||
progressBar.update(currentProgress);
|
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
console.log("\n" + `Parsed ${items.length} listings.`);
|
console.log(`\nParsed ${allListings.length} detailed listings.`);
|
||||||
return items;
|
return allListings;
|
||||||
}
|
}
|
||||||
|
|||||||
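The rewritten function replaces the single search request with a page loop. The loop stops on an empty page, drops links already collected on earlier pages, and treats any page with fewer than 40 results as the last one. The same control flow can be sketched without network access, assuming each page has already been reduced to its list of listing URLs. `collectListingLinks` and `PAGE_SIZE` are hypothetical names and are not part of `src/kijiji.ts`.

```typescript
// Hypothetical helper (not in src/kijiji.ts) mirroring the control flow of the new
// page loop. Each element of `pages` stands in for the listing links parsed from
// one search results page, in page order.
const PAGE_SIZE = 40; // the diff treats a page with fewer than 40 results as the last one

function collectListingLinks(pages: string[][], maxPages = 5): string[] {
  const seenUrls = new Set<string>();
  const ordered: string[] = [];

  for (let page = 1; page <= maxPages && page <= pages.length; page++) {
    const links = pages[page - 1] ?? [];

    // An empty page means the search has run out of results.
    if (links.length === 0) break;

    // Keep only links not already collected, preserving first-seen order.
    for (const link of links) {
      if (!seenUrls.has(link)) {
        seenUrls.add(link);
        ordered.push(link);
      }
    }

    // A short page is treated as the final page.
    if (links.length < PAGE_SIZE) break;
  }

  return ordered;
}
```

Suppose the first page is full (40 links) and the second page has two links, one of which repeats a URL from page 1. This returns 41 links and stops before a third page. The sketch differs from the diff in one respect. The diff filters each page against `seenUrls` before adding any of that page's links, so a URL repeated within a single page would be kept twice. The sketch checks and records each link one at a time, so it also removes such in-page repeats.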
162
test/kijiji-core.test.ts
Normal file
@@ -0,0 +1,162 @@
+import { describe, test, expect } from "bun:test";
+import {
+  resolveLocationId,
+  resolveCategoryId,
+  buildSearchUrl,
+  HttpError,
+  NetworkError,
+  ParseError,
+  RateLimitError,
+  ValidationError
+} from "../src/kijiji";
+
+describe("Location and Category Resolution", () => {
+  describe("resolveLocationId", () => {
+    test("should return numeric IDs as-is", () => {
+      expect(resolveLocationId(1700272)).toBe(1700272);
+      expect(resolveLocationId(0)).toBe(0);
+    });
+
+    test("should resolve string location names", () => {
+      expect(resolveLocationId("canada")).toBe(0);
+      expect(resolveLocationId("ontario")).toBe(9004);
+      expect(resolveLocationId("toronto")).toBe(1700273);
+      expect(resolveLocationId("gta")).toBe(1700272);
+    });
+
+    test("should handle case insensitive matching", () => {
+      expect(resolveLocationId("Canada")).toBe(0);
+      expect(resolveLocationId("ONTARIO")).toBe(9004);
+    });
+
+    test("should default to Canada for unknown locations", () => {
+      expect(resolveLocationId("unknown")).toBe(0);
+      expect(resolveLocationId("")).toBe(0);
+    });
+
+    test("should handle undefined input", () => {
+      expect(resolveLocationId(undefined)).toBe(0);
+    });
+  });
+
+  describe("resolveCategoryId", () => {
+    test("should return numeric IDs as-is", () => {
+      expect(resolveCategoryId(132)).toBe(132);
+      expect(resolveCategoryId(0)).toBe(0);
+    });
+
+    test("should resolve string category names", () => {
+      expect(resolveCategoryId("all")).toBe(0);
+      expect(resolveCategoryId("phones")).toBe(132);
+      expect(resolveCategoryId("electronics")).toBe(29659001);
+      expect(resolveCategoryId("buy-sell")).toBe(10);
+    });
+
+    test("should handle case insensitive matching", () => {
+      expect(resolveCategoryId("All")).toBe(0);
+      expect(resolveCategoryId("PHONES")).toBe(132);
+    });
+
+    test("should default to all categories for unknown categories", () => {
+      expect(resolveCategoryId("unknown")).toBe(0);
+      expect(resolveCategoryId("")).toBe(0);
+    });
+
+    test("should handle undefined input", () => {
+      expect(resolveCategoryId(undefined)).toBe(0);
+    });
+  });
+});
+
+describe("URL Construction", () => {
+  describe("buildSearchUrl", () => {
+    test("should build basic search URL", () => {
+      const url = buildSearchUrl("iphone", {
+        location: 1700272,
+        category: 132,
+        sortBy: 'relevancy',
+        sortOrder: 'desc',
+      });
+
+      expect(url).toContain("b-buy-sell/canada/iphone/k0c132l1700272");
+      expect(url).toContain("sort=relevancyDesc");
+      expect(url).toContain("order=DESC");
+    });
+
+    test("should handle pagination", () => {
+      const url = buildSearchUrl("iphone", {
+        location: 1700272,
+        category: 132,
+        page: 2,
+      });
+
+      expect(url).toContain("&page=2");
+    });
+
+    test("should handle different sort options", () => {
+      const dateUrl = buildSearchUrl("iphone", {
+        sortBy: 'date',
+        sortOrder: 'asc',
+      });
+      expect(dateUrl).toContain("sort=DATE");
+      expect(dateUrl).toContain("order=ASC");
+
+      const priceUrl = buildSearchUrl("iphone", {
+        sortBy: 'price',
+        sortOrder: 'desc',
+      });
+      expect(priceUrl).toContain("sort=PRICE");
+      expect(priceUrl).toContain("order=DESC");
+    });
+
+    test("should handle string location/category inputs", () => {
+      const url = buildSearchUrl("iphone", {
+        location: "toronto",
+        category: "phones",
+      });
+
+      expect(url).toContain("k0c132l1700273"); // phones + toronto
+    });
+  });
+});
+
+describe("Error Classes", () => {
+  test("HttpError should store status and URL", () => {
+    const error = new HttpError("Not found", 404, "https://example.com");
+    expect(error.message).toBe("Not found");
+    expect(error.status).toBe(404);
+    expect(error.url).toBe("https://example.com");
+    expect(error.name).toBe("HttpError");
+  });
+
+  test("NetworkError should store URL and cause", () => {
+    const cause = new Error("Connection failed");
+    const error = new NetworkError("Network error", "https://example.com", cause);
+    expect(error.message).toBe("Network error");
+    expect(error.url).toBe("https://example.com");
+    expect(error.cause).toBe(cause);
+    expect(error.name).toBe("NetworkError");
+  });
+
+  test("ParseError should store data", () => {
+    const data = { invalid: "json" };
+    const error = new ParseError("Invalid JSON", data);
+    expect(error.message).toBe("Invalid JSON");
+    expect(error.data).toBe(data);
+    expect(error.name).toBe("ParseError");
+  });
+
+  test("RateLimitError should store URL and reset time", () => {
+    const error = new RateLimitError("Rate limited", "https://example.com", 60);
+    expect(error.message).toBe("Rate limited");
+    expect(error.url).toBe("https://example.com");
+    expect(error.resetTime).toBe(60);
+    expect(error.name).toBe("RateLimitError");
+  });
+
+  test("ValidationError should work without field", () => {
+    const error = new ValidationError("Invalid value");
+    expect(error.message).toBe("Invalid value");
+    expect(error.name).toBe("ValidationError");
+  });
+});
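The resolution tests pin down a small contract:

- Numeric IDs pass through unchanged.
- Names match case-insensitively.
- Anything unrecognized, including `""` and `undefined`, maps to `0`.

Below is a minimal sketch that satisfies this contract, assuming a plain name-to-ID table. The tables hold only the IDs that appear in the tests. `resolveLocationIdSketch` and `resolveCategoryIdSketch` are hypothetical stand-ins, not the module's own functions.

```typescript
// Hypothetical lookup tables holding only the names and IDs exercised by the tests;
// the real tables in src/kijiji.ts may contain more entries.
const LOCATION_IDS = new Map<string, number>([
  ["canada", 0],
  ["ontario", 9004],
  ["toronto", 1700273],
  ["gta", 1700272],
]);

const CATEGORY_IDS = new Map<string, number>([
  ["all", 0],
  ["buy-sell", 10],
  ["phones", 132],
  ["electronics", 29659001],
]);

// Numeric IDs pass through unchanged; names match case-insensitively;
// unknown names, "" and undefined all fall back to 0 (Canada / all categories).
function resolveId(table: Map<string, number>, input?: number | string): number {
  if (typeof input === "number") return input;
  if (!input) return 0;
  return table.get(input.toLowerCase()) ?? 0;
}

function resolveLocationIdSketch(input?: number | string): number {
  return resolveId(LOCATION_IDS, input);
}

function resolveCategoryIdSketch(input?: number | string): number {
  return resolveId(CATEGORY_IDS, input);
}
```

The sketch uses a `Map` rather than an object literal. With an object literal, a query such as `"constructor"` would pick up an inherited `Object.prototype` property.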
337
test/kijiji-integration.test.ts
Normal file
@@ -0,0 +1,337 @@
+import { describe, test, expect, beforeEach, afterEach, mock } from "bun:test";
+import { extractApolloState, parseSearch, parseDetailedListing } from "../src/kijiji";
+
+// Mock fetch globally
+const originalFetch = global.fetch;
+
+describe("HTML Parsing Integration", () => {
+  beforeEach(() => {
+    // Mock fetch for all tests
+    global.fetch = mock(() => {
+      throw new Error("fetch should be mocked in individual tests");
+    });
+  });
+
+  afterEach(() => {
+    global.fetch = originalFetch;
+  });
+
+  describe("extractApolloState", () => {
+    test("should extract Apollo state from valid HTML", () => {
+      const mockHtml = '<html><head><script id="__NEXT_DATA__" type="application/json">{"props":{"pageProps":{"__APOLLO_STATE__":{"ROOT_QUERY":{"test":"value"}}}}}</script></head></html>';
+
+      const result = extractApolloState(mockHtml);
+      expect(result).toEqual({
+        ROOT_QUERY: { test: "value" }
+      });
+    });
+
+    test("should return null for HTML without Apollo state", () => {
+      const mockHtml = '<html><body>No data here</body></html>';
+      const result = extractApolloState(mockHtml);
+      expect(result).toBeNull();
+    });
+
+    test("should return null for malformed JSON", () => {
+      const mockHtml = '<html><script id="__NEXT_DATA__" type="application/json">{"invalid": json}</script></html>';
+
+      const result = extractApolloState(mockHtml);
+      expect(result).toBeNull();
+    });
+
+    test("should handle missing __NEXT_DATA__ element", () => {
+      const mockHtml = '<html><body><div>Content</div></body></html>';
+      const result = extractApolloState(mockHtml);
+      expect(result).toBeNull();
+    });
+  });
+
+  describe("parseSearch", () => {
+    test("should parse search results from HTML", () => {
+      const mockHtml = `
+        <html>
+          <script id="__NEXT_DATA__" type="application/json">
+            ${JSON.stringify({
+              props: {
+                pageProps: {
+                  __APOLLO_STATE__: {
+                    "Listing:123": {
+                      url: "/v-iphone/k0l0",
+                      title: "iPhone 13 Pro",
+                    },
+                    "Listing:456": {
+                      url: "/v-samsung/k0l0",
+                      title: "Samsung Galaxy",
+                    },
+                    "ROOT_QUERY": { test: "value" }
+                  }
+                }
+              }
+            })}
+          </script>
+        </html>
+      `;
+
+      const results = parseSearch(mockHtml, "https://www.kijiji.ca");
+      expect(results).toHaveLength(2);
+      expect(results[0]).toEqual({
+        name: "iPhone 13 Pro",
+        listingLink: "https://www.kijiji.ca/v-iphone/k0l0"
+      });
+      expect(results[1]).toEqual({
+        name: "Samsung Galaxy",
+        listingLink: "https://www.kijiji.ca/v-samsung/k0l0"
+      });
+    });
+
+    test("should handle absolute URLs", () => {
+      const mockHtml = `
+        <html>
+          <script id="__NEXT_DATA__" type="application/json">
+            ${JSON.stringify({
+              props: {
+                pageProps: {
+                  __APOLLO_STATE__: {
+                    "Listing:123": {
+                      url: "https://www.kijiji.ca/v-iphone/k0l0",
+                      title: "iPhone 13 Pro",
+                    }
+                  }
+                }
+              }
+            })}
+          </script>
+        </html>
+      `;
+
+      const results = parseSearch(mockHtml, "https://www.kijiji.ca");
+      expect(results[0].listingLink).toBe("https://www.kijiji.ca/v-iphone/k0l0");
+    });
+
+    test("should filter out invalid listings", () => {
+      const mockHtml = `
+        <html>
+          <script id="__NEXT_DATA__" type="application/json">
+            ${JSON.stringify({
+              props: {
+                pageProps: {
+                  __APOLLO_STATE__: {
+                    "Listing:123": {
+                      url: "/v-iphone/k0l0",
+                      title: "iPhone 13 Pro",
+                    },
+                    "Listing:456": {
+                      url: "/v-samsung/k0l0",
+                      // Missing title
+                    },
+                    "Other:789": {
+                      url: "/v-other/k0l0",
+                      title: "Other Item",
+                    }
+                  }
+                }
+              }
+            })}
+          </script>
+        </html>
+      `;
+
+      const results = parseSearch(mockHtml, "https://www.kijiji.ca");
+      expect(results).toHaveLength(1);
+      expect(results[0].name).toBe("iPhone 13 Pro");
+    });
+
+    test("should return empty array for invalid HTML", () => {
+      const results = parseSearch("<html><body>Invalid</body></html>", "https://www.kijiji.ca");
+      expect(results).toEqual([]);
+    });
+  });
+
+  describe("parseDetailedListing", () => {
+    test("should parse detailed listing with all fields", async () => {
+      const mockHtml = `
+        <html>
+          <script id="__NEXT_DATA__" type="application/json">
+            ${JSON.stringify({
+              props: {
+                pageProps: {
+                  __APOLLO_STATE__: {
+                    "Listing:123": {
+                      url: "/v-iphone-13-pro/k0l0",
+                      title: "iPhone 13 Pro 256GB",
+                      description: "Excellent condition iPhone 13 Pro",
+                      price: {
+                        amount: 80000,
+                        currency: "CAD",
+                        type: "FIXED"
+                      },
+                      type: "OFFER",
+                      status: "ACTIVE",
+                      activationDate: "2024-01-15T10:00:00.000Z",
+                      endDate: "2025-01-15T10:00:00.000Z",
+                      metrics: { views: 150 },
+                      location: {
+                        address: "Toronto, ON",
+                        id: 1700273,
+                        name: "Toronto",
+                        coordinates: {
+                          latitude: 43.6532,
+                          longitude: -79.3832
+                        }
+                      },
+                      imageUrls: [
+                        "https://media.kijiji.ca/api/v1/image1.jpg",
+                        "https://media.kijiji.ca/api/v1/image2.jpg"
+                      ],
+                      imageCount: 2,
+                      categoryId: 132,
+                      adSource: "ORGANIC",
+                      flags: {
+                        topAd: false,
+                        priceDrop: true
+                      },
+                      posterInfo: {
+                        posterId: "user123",
+                        rating: 4.8
+                      },
+                      attributes: [
+                        { canonicalName: "forsaleby", canonicalValues: ["ownr"] },
+                        { canonicalName: "phonecarrier", canonicalValues: ["unlocked"] }
+                      ]
+                    }
+                  }
+                }
+              }
+            })}
+          </script>
+        </html>
+      `;
+
+      const result = await parseDetailedListing(mockHtml, "https://www.kijiji.ca");
+      expect(result).toEqual({
+        url: "https://www.kijiji.ca/v-iphone-13-pro/k0l0",
+        title: "iPhone 13 Pro 256GB",
+        description: "Excellent condition iPhone 13 Pro",
+        listingPrice: {
+          amountFormatted: "$800.00",
+          cents: 80000,
+          currency: "CAD"
+        },
+        listingType: "OFFER",
+        listingStatus: "ACTIVE",
+        creationDate: "2024-01-15T10:00:00.000Z",
+        endDate: "2025-01-15T10:00:00.000Z",
+        numberOfViews: 150,
+        address: "Toronto, ON",
+        images: [
+          "https://media.kijiji.ca/api/v1/image1.jpg",
+          "https://media.kijiji.ca/api/v1/image2.jpg"
+        ],
+        categoryId: 132,
+        adSource: "ORGANIC",
+        flags: {
+          topAd: false,
+          priceDrop: true
+        },
+        attributes: {
+          forsaleby: ["ownr"],
+          phonecarrier: ["unlocked"]
+        },
+        location: {
+          id: 1700273,
+          name: "Toronto",
+          coordinates: {
+            latitude: 43.6532,
+            longitude: -79.3832
+          }
+        },
+        sellerInfo: {
+          posterId: "user123",
+          rating: 4.8
+        }
+      });
+    });
+
+    test("should return null for contact-based pricing", async () => {
+      const mockHtml = `
+        <html>
+          <script id="__NEXT_DATA__" type="application/json">
+            ${JSON.stringify({
+              props: {
+                pageProps: {
+                  __APOLLO_STATE__: {
+                    "Listing:123": {
+                      url: "/v-iphone/k0l0",
+                      title: "iPhone for Sale",
+                      price: {
+                        type: "CONTACT",
+                        amount: null
+                      }
+                    }
+                  }
+                }
+              }
+            })}
+          </script>
+        </html>
+      `;
+
+      const result = await parseDetailedListing(mockHtml, "https://www.kijiji.ca");
+      expect(result).toBeNull();
+    });
+
+    test("should handle missing optional fields", async () => {
+      const mockHtml = `
+        <html>
+          <script id="__NEXT_DATA__" type="application/json">
+            ${JSON.stringify({
+              props: {
+                pageProps: {
+                  __APOLLO_STATE__: {
+                    "Listing:123": {
+                      url: "/v-iphone/k0l0",
+                      title: "iPhone 13",
+                      price: { amount: 50000 }
+                    }
+                  }
+                }
+              }
+            })}
+          </script>
+        </html>
+      `;
+
+      const result = await parseDetailedListing(mockHtml, "https://www.kijiji.ca");
+      expect(result).toEqual({
+        url: "https://www.kijiji.ca/v-iphone/k0l0",
+        title: "iPhone 13",
+        description: undefined,
+        listingPrice: {
+          amountFormatted: "$500.00",
+          cents: 50000,
+          currency: undefined
+        },
+        listingType: undefined,
+        listingStatus: undefined,
+        creationDate: undefined,
+        endDate: undefined,
+        numberOfViews: undefined,
+        address: null,
+        images: [],
+        categoryId: 0,
+        adSource: "UNKNOWN",
+        flags: {
+          topAd: false,
+          priceDrop: false
+        },
+        attributes: {},
+        location: {
+          id: 0,
+          name: "Unknown",
+          coordinates: undefined
+        },
+        sellerInfo: undefined
+      });
+    });
+  });
+});
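The integration tests describe how listings are located in a Kijiji page:

- The JSON inside `<script id="__NEXT_DATA__">` is parsed.
- `props.pageProps.__APOLLO_STATE__` is returned, or `null` when the script is missing or the JSON is malformed.
- Only `Listing:*` entries with both a `url` and a `title` become search results.
- Relative URLs are resolved against the base URL.

The sketch below reproduces that behavior under one stated assumption: it locates the script tag with a regular expression, whereas `src/kijiji.ts` may use an HTML parser. The `*Sketch` function names are hypothetical.

```typescript
// Hypothetical sketch: a regex stands in for whatever HTML parser src/kijiji.ts uses.
type ApolloState = Record<string, unknown>;

interface SearchResult {
  name: string;
  listingLink: string;
}

function extractApolloStateSketch(html: string): ApolloState | null {
  // Capture the JSON payload of <script id="__NEXT_DATA__" ...>...</script>.
  const match = html.match(/<script[^>]*\bid="__NEXT_DATA__"[^>]*>([\s\S]*?)<\/script>/);
  if (!match) return null;
  try {
    const data = JSON.parse(match[1]);
    const state = data?.props?.pageProps?.__APOLLO_STATE__;
    return state && typeof state === "object" ? (state as ApolloState) : null;
  } catch {
    return null; // malformed JSON
  }
}

function parseSearchSketch(html: string, baseUrl: string): SearchResult[] {
  const state = extractApolloStateSketch(html);
  if (!state) return [];

  const results: SearchResult[] = [];
  for (const [key, value] of Object.entries(state)) {
    // Only "Listing:<id>" entries are listings; ROOT_QUERY and other types are skipped.
    if (!key.startsWith("Listing:")) continue;
    const { url, title } = (value ?? {}) as { url?: unknown; title?: unknown };
    if (typeof url !== "string" || typeof title !== "string") continue;
    // new URL() resolves a relative path against baseUrl and leaves an absolute URL unchanged.
    results.push({ name: title, listingLink: new URL(url, baseUrl).toString() });
  }
  return results;
}
```

`new URL(url, baseUrl)` handles both URL test cases. A path such as `/v-iphone/k0l0` is resolved against the base, and an already-absolute URL is returned unchanged.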
54
test/kijiji-utils.test.ts
Normal file
@@ -0,0 +1,54 @@
+import { describe, test, expect, beforeEach, afterEach } from "bun:test";
+import { slugify, formatCentsToCurrency } from "../src/kijiji";
+
+describe("Utility Functions", () => {
+  describe("slugify", () => {
+    test("should convert basic strings to slugs", () => {
+      expect(slugify("Hello World")).toBe("hello-world");
+      expect(slugify("iPhone 13 Pro")).toBe("iphone-13-pro");
+    });
+
+    test("should handle special characters", () => {
+      expect(slugify("Café & Restaurant")).toBe("cafe-restaurant");
+      expect(slugify("100% New")).toBe("100-new");
+    });
+
+    test("should handle empty and edge cases", () => {
+      expect(slugify("")).toBe("");
+      expect(slugify(" ")).toBe("-");
+      expect(slugify("---")).toBe("-");
+    });
+
+    test("should preserve numbers and valid characters", () => {
+      expect(slugify("iPhone 13")).toBe("iphone-13");
+      expect(slugify("item123")).toBe("item123");
+    });
+  });
+
+  describe("formatCentsToCurrency", () => {
+    test("should format valid cent values", () => {
+      expect(formatCentsToCurrency(100)).toBe("$1.00");
+      expect(formatCentsToCurrency(1999)).toBe("$19.99");
+      expect(formatCentsToCurrency(0)).toBe("$0.00");
+    });
+
+    test("should handle string inputs", () => {
+      expect(formatCentsToCurrency("100")).toBe("$1.00");
+      expect(formatCentsToCurrency("1999")).toBe("$19.99");
+    });
+
+    test("should handle null/undefined inputs", () => {
+      expect(formatCentsToCurrency(null)).toBe("");
+      expect(formatCentsToCurrency(undefined)).toBe("");
+    });
+
+    test("should handle invalid inputs", () => {
+      expect(formatCentsToCurrency("invalid")).toBe("");
+      expect(formatCentsToCurrency(Number.NaN)).toBe("");
+    });
+
+    test("should use en-US locale formatting", () => {
+      expect(formatCentsToCurrency(123456)).toBe("$1,234.56");
+    });
+  });
+});
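The utility tests fully specify `slugify` and `formatCentsToCurrency` for the inputs they cover. One hypothetical pair of implementations that meets those expectations is sketched below; it is not taken from `src/kijiji.ts`. The slug keeps leading and trailing dashes, which matches `slugify(" ")` returning `"-"`. The formatter uses `USD` with the `en-US` locale because that combination yields a bare `$` prefix. Formatting `CAD` under `en-US` prints `CA$` instead.

```typescript
// Hypothetical implementations written to satisfy the expectations in test/kijiji-utils.test.ts.
function slugifySketch(input: string): string {
  return input
    .normalize("NFD")                // split "é" into "e" + a combining accent
    .replace(/[\u0300-\u036f]/g, "") // drop the combining accents
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-");    // collapse each run of other characters into one "-"
}

function formatCentsToCurrencySketch(value: number | string | null | undefined): string {
  if (value === null || value === undefined) return "";
  const cents = typeof value === "string" ? Number(value) : value;
  if (!Number.isFinite(cents)) return ""; // rejects NaN from "invalid" or Number.NaN
  return new Intl.NumberFormat("en-US", { style: "currency", currency: "USD" }).format(cents / 100);
}
```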
12
test/setup.ts
Normal file
@@ -0,0 +1,12 @@
+// Test setup for Bun test runner
+import { expect } from "bun:test";
+
+// Global test setup
+// This file is loaded before any tests run due to bunfig.toml preload
+
+// Mock fetch globally for tests
+global.fetch = global.fetch || (() => {
+  throw new Error('fetch is not available in test environment');
+});
+
+// Add any global test utilities here