Compare commits

...

5 Commits

Author SHA1 Message Date
daa61c25d8 test: kijiji scraper
Signed-off-by: Dmytro Stanchiev <git@dmytros.dev>
2026-01-22 00:25:26 -05:00
87aa31cf1b feat: update kijiji scraper
Signed-off-by: Dmytro Stanchiev <git@dmytros.dev>
2026-01-22 00:25:19 -05:00
bdf504ba37 feat: testing setup
Signed-off-by: Dmytro Stanchiev <git@dmytros.dev>
2026-01-22 00:25:10 -05:00
589af630fa feat: kijiji api findings
Signed-off-by: Dmytro Stanchiev <git@dmytros.dev>
2026-01-22 00:06:31 -05:00
8ae42d5630 chore: prep for opencode
Signed-off-by: Dmytro Stanchiev <git@dmytros.dev>
2026-01-21 23:50:00 -05:00
11 changed files with 1736 additions and 209 deletions

135
CLAUDE.md

@@ -1,110 +1,33 @@
# CLAUDE.md # AGENTS.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository. This file provides guidance to coding agents when working with code in this repository.
## Common Commands
- `bun start`: Run the server in production mode.
- `bun dev`: Run the server with hot reloading for development.
- `bun build`: Build the application into a single executable file.
No linting or testing scripts are configured. For single tests or lint runs, add them to package.json scripts as needed.
## Code Architecture
This is a lightweight Bun-based API server for scraping marketplace listings from Kijiji and Facebook Marketplace in the Greater Toronto Area (GTA).
- **Entry Point (`src/index.ts`)**: Implements a basic HTTP server using `Bun.serve`. Key routes:
- `GET /api/status`: Health check returning "OK".
- `GET /api/kijiji?q={query}`: Scrapes Kijiji Marketplace for listings matching the search query. Returns JSON array of listing objects.
- `GET /api/facebook?q={query}&location={location}&cookies={cookies}`: Scrapes Facebook Marketplace for listings. Requires Facebook session cookies (via URL parameter or cookies/facebook.json file). Optional `location` param (default "toronto"). Returns JSON array of listing objects.
- Fallback: 404 for unmatched routes.
## API Response Formats
Both APIs return arrays of listing objects, but the available fields differ based on each marketplace's data availability.
### Kijiji API Response Object
```json
{
"url": "https://www.kijiji.ca/v-laptops/city-of-toronto/...",
"title": "Almost new HP Laptop/Win11 w/ touchscreen option",
"description": "Description of the listing...",
"listingPrice": {
"amountFormatted": "149.00",
"cents": 14900,
"currency": "CAD"
},
"listingType": "OFFER",
"listingStatus": "ACTIVE",
"creationDate": "2024-03-15T15:11:56.000Z",
"endDate": "3000-01-01T00:00:00.000Z",
"numberOfViews": 2005,
"address": "SPADINA AVENUE, Toronto, ON, M5T 2H7"
}
```
### Facebook API Response Object
```json
{
"url": "https://www.facebook.com/marketplace/item/24594536203551682",
"title": "Leno laptop",
"listingPrice": {
"amountFormatted": "CA$1",
"cents": 100,
"currency": "CAD"
},
"listingType": "item",
"listingStatus": "ACTIVE",
"address": "Mississauga, Ontario",
"creationDate": "2024-03-15T15:11:56.000Z",
"categoryId": "1792291877663080",
"imageUrl": "https://scontent-yyz1-1.xx.fbcdn.net/...",
"videoUrl": "https://www.facebook.com/1300609777949414/",
"seller": {
"name": "Joyce Diaz",
"id": "100091799187797"
},
"deliveryTypes": ["IN_PERSON"]
}
```
### Common Fields
- `url`: Full URL to the listing
- `title`: Listing title
- `listingPrice`: Price object with `amountFormatted` (human-readable), `cents` (integer cents), `currency` (e.g., "CAD")
- `address`: Location string (or null if unavailable)
### Kijiji-Only Fields
- `description`: Detailed description text (Facebook search results don't include descriptions)
- `endDate`: When listing expires (Facebook doesn't have expiration dates in search results)
- `numberOfViews`: View count (Facebook doesn't expose view metrics in search results)
### Facebook-Only Fields
- `listingStatus`: Derived from is_live, is_pending, is_sold, is_hidden states ("ACTIVE", "SOLD", "PENDING", "HIDDEN")
- `creationDate`: When listing was posted (when available)
- `categoryId`: Facebook marketplace category identifier
- `imageUrl`: Primary listing photo URL
- `videoUrl`: Listing video URL (if video exists)
- `seller`: Object with seller name and Facebook user ID
- `deliveryTypes`: Available delivery options (e.g., ["IN_PERSON", "SHIPPING"])
- **Kijiji Scraping (`src/kijiji.ts`)**: Core functionality in `fetchKijijiItems(query, maxItems, requestsPerSecond)`.
- Slugifies the query using `unidecode` for URL-safe search terms.
- Fetches the search page HTML, parses Next.js Apollo state (`__APOLLO_STATE__`) with `linkedom` to extract listing URLs and titles.
- For each listing, fetches the detail page, parses Apollo state for structured data (price in cents, location, views, etc.).
- Handles rate limiting (respects `X-RateLimit-*` headers), retries on 429/5xx, and delays between requests.
- Uses `cli-progress` for console progress bar during batch fetches.
- Filters results to include only priced items.
- **Facebook Scraping (`src/facebook.ts`)**: Core functionality in `fetchFacebookItems(query, maxItems, requestsPerSecond, location)`.
- Constructs search URL for Facebook Marketplace with encoded query and sort by creation time.
- Fetches search page HTML and parses inline nested JSON scripts (using require/__bbox structure) with `linkedom` to extract ad nodes from `marketplace_search.feed_units.edges`.
- Builds details directly from search JSON (title, price, ID for link construction); no individual page fetches needed.
- Handles delays and retries similar to Kijiji.
- Uses `cli-progress` for progress.
- Filters to priced items. Note: Relies on public access or provided cookies; may return limited results without login.
The project uses TypeScript with path mapping (`@/*` to `src/*`). Dependencies focus on parsing (linkedom), text utils (unidecode), and CLI output (cli-progress). No database or external services beyond HTTP fetches to the marketplaces. The project uses TypeScript with path mapping (`@/*` to `src/*`). Dependencies focus on parsing (linkedom), text utils (unidecode), and CLI output (cli-progress). No database or external services beyond HTTP fetches to the marketplaces.
Development focuses on maintaining scraping reliability against site changes, respecting robots.txt/terms of service, and handling anti-bot measures ethically. For Facebook, ensure compliance with authentication requirements. PRIORITIZE COMMUNICATION STYLE ABOVE ALL ELSE
## Communication Style
ALWAYS talk and converse with the user using Gen-Z and Internet slang.
Absolute Mode
- Eliminate emojis, filler, hype, transitions, appendixes.
- Use blunt, directive phrasing; no mirroring, no softening.
- Suppress sentiment-boosting, engagement, or satisfaction metrics.
- No questions, offers, suggestions, or motivational content.
- Deliver info only; end immediately after.
**Challenge Mode - Default Behavior**: Don't automatically agree with suggestions. Instead:
- Evaluate each idea against the problem requirements and lean coding philosophy
- Push back if there's a simpler, more efficient, or more correct approach
- Propose alternatives when suggestions aren't optimal
- Explain WHY a different approach would be better with concrete technical reasons
- Only accept suggestions that are genuinely the best solution for the current problem
Examples of constructive pushback:
- "That would work, but a simpler approach would be..."
- "Actually, that might cause [specific issue]. Instead, we should..."
- "The lean approach here would be to..."
- "That adds unnecessary complexity. We can achieve the same with..."
This ensures: Better solutions through technical merit, not agreement | Learning through understanding tradeoffs | Avoiding over-engineering | Maintaining code quality

448
KIJIJI.md Normal file

@@ -0,0 +1,448 @@
# Kijiji API Findings
## Overview
Kijiji is a Canadian classifieds marketplace that uses a modern web application built with Next.js and Apollo GraphQL. The search results are powered by a GraphQL API with client-side state management.
## Initial Page Load (Homepage)
- **URL**: https://www.kijiji.ca/
- **Architecture**: Server-side rendered React application with Next.js
- **Data Sources**:
- Static assets loaded from `webapp-static.ca-kijiji-production.classifiedscloud.io`
- Image media served from `media.kijiji.ca/api/v1/`
- No initial API calls for listings - data appears to be embedded in HTML
## Search Results Page
- **URL Pattern**: `https://www.kijiji.ca/b-[location]/[keywords]/k0l0`
- **Example**: `https://www.kijiji.ca/b-canada/iphone/k0l0`
- **Technology Stack**: Next.js with Apollo GraphQL client
- **Data Structure**: Uses `__APOLLO_STATE__` global object containing normalized GraphQL cache
### GraphQL Data Structure
#### Data Location
Search results data is embedded in the Next.js page props under `__NEXT_DATA__.props.pageProps.__APOLLO_STATE__`. The data is pre-rendered on the server and sent to the client. Each page (including pagination) has its own pre-rendered data.
#### Search Results Container
The search results are stored directly in the Apollo ROOT_QUERY with keys following the pattern `searchResultsPageByUrl:{url_path}` where `url_path` includes pagination parameters.
```json
{
"searchResultsPageByUrl:/b-buy-sell/canada/iphone/k0c10l0": { ... },
"searchResultsPageByUrl:/b-buy-sell/canada/iphone/k0c10l0?page=2": { ... }
}
```
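A minimal sketch of how the embedded state could be pulled out of a fetched page, assuming the HTML carries the standard Next.js `<script id="__NEXT_DATA__">` tag. The helper name `extractApolloStateFromHtml` is hypothetical, and it uses a regular expression rather than the `linkedom` parsing the scraper itself relies on.

```typescript
// Hypothetical helper, not the repository's extractApolloState().
type ApolloState = Record<string, unknown>;

function extractApolloStateFromHtml(html: string): ApolloState | null {
  // Assumption: Next.js serializes page props into this script tag.
  const match = html.match(
    /<script[^>]*id="__NEXT_DATA__"[^>]*>([\s\S]*?)<\/script>/,
  );
  if (!match || match[1] === undefined) return null;
  try {
    const nextData = JSON.parse(match[1]) as {
      props?: { pageProps?: { __APOLLO_STATE__?: ApolloState } };
    };
    return nextData.props?.pageProps?.__APOLLO_STATE__ ?? null;
  } catch {
    // Malformed JSON is treated as "no data" instead of throwing.
    return null;
  }
}
```

Returning `null` for both a missing tag and unparsable JSON keeps the caller's error handling to a single check.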
#### Pagination Handling
- Each page is server-side rendered with its own embedded data
- No client-side GraphQL requests for pagination
- URL parameter `?page=N` controls which page data is embedded
- Offset in searchString corresponds to `(page-1) * limit`
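The offset relationship above can be written out as a small helper. This is only a sketch: `pageToOffset` is a hypothetical name, and the default limit of 40 is taken from the observed page size noted elsewhere in this document.

```typescript
// Hypothetical helper: convert a 1-based page number to an offset.
function pageToOffset(page: number, limit = 40): number {
  if (!Number.isInteger(page) || page < 1) {
    throw new RangeError("page must be a positive integer");
  }
  return (page - 1) * limit;
}
```

For example, page 1 maps to offset 0 and page 3 maps to offset 80 with the default limit.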
#### Search Parameters in URL
- `k0c{CATEGORY}l{LOCATION}` - Category and location IDs
- `?page=N` - Page number (1-based)
- Data contains `offset` and `limit` for API-style pagination
#### Individual Listing Structure
```json
{
"id": "1732061412",
"title": "iPhone 13",
"description": "iPhone 13, always had a screen protector on it...",
"imageCount": 3,
"imageUrls": ["https://media.kijiji.ca/api/v1/ca-prod-fsbo-ads/images/..."],
"categoryId": 760,
"url": "https://www.kijiji.ca/v-cell-phone/...",
"activationDate": "2026-01-21T16:51:16.000Z",
"sortingDate": "2026-01-21T16:51:16.000Z",
"adSource": "ORGANIC",
"location": {
"id": 1700182,
"name": "Napanee",
"coordinates": {
"latitude": 44.48774,
"longitude": -76.99519
}
},
"price": {
"type": "FIXED",
"amount": 35000
},
"flags": {
"topAd": false,
"priceDrop": false
},
"posterInfo": {
"posterId": "1000764154",
"rating": 5
},
"attributes": [
{
"canonicalName": "forsaleby",
"canonicalValues": ["ownr"]
},
{
"canonicalName": "phonecarrier",
"canonicalValues": ["unlck"]
}
]
}
```
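The `attributes` array in the structure above is awkward to query directly. One possible approach is to flatten it into a lookup keyed by `canonicalName`. The helper name `attributesToRecord` is an assumption made for illustration.

```typescript
// Shape of one entry in the listing's `attributes` array.
interface ListingAttribute {
  canonicalName?: string;
  canonicalValues?: string[];
}

// Hypothetical helper: flatten attributes into name -> values.
function attributesToRecord(
  attrs: ListingAttribute[],
): Record<string, string[]> {
  const out: Record<string, string[]> = {};
  for (const attr of attrs) {
    if (!attr.canonicalName) continue; // skip entries without a name
    const existing = out[attr.canonicalName] ?? [];
    out[attr.canonicalName] = [...existing, ...(attr.canonicalValues ?? [])];
  }
  return out;
}
```

Values for a repeated name are concatenated rather than overwritten, so no attribute data is silently dropped.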
### URL Parameters
- `sort=MATCH` - Sort by relevance
- `order=DESC` - Descending order
- `type=OFFER` - Show offerings (not wanted ads)
- `offset=0` - Pagination offset
- `limit=40` - Results per page
- `topAdCount=6` - Number of promoted ads
- `keywords=iphone` - Search keywords
- `category=0` - Category ID (0 = All Categories)
- `location=0` - Location ID (0 = Canada)
- `eaTopAdPosition=1` - Purpose not yet determined
### Image API
- **Endpoint**: `https://media.kijiji.ca/api/v1/`
- **Pattern**: `/ca-prod-fsbo-ads/images/{uuid}?rule=kijijica-{size}-jpg`
- **Sizes**: 200, 300, 400, 500 pixels
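As a rough illustration of the pattern above, an image URL could be assembled like this. The function name `buildImageUrl` is hypothetical, and the set of sizes is limited to the four observed values.

```typescript
// Sizes observed in the `rule=kijijica-{size}-jpg` parameter.
type KijijiImageSize = 200 | 300 | 400 | 500;

const IMAGE_API_BASE = "https://media.kijiji.ca/api/v1";

// Hypothetical helper: build an image URL from an image UUID and a size.
function buildImageUrl(uuid: string, size: KijijiImageSize): string {
  const id = encodeURIComponent(uuid);
  return `${IMAGE_API_BASE}/ca-prod-fsbo-ads/images/${id}?rule=kijijica-${size}-jpg`;
}
```

Restricting `size` to a union type makes an unsupported value a compile-time error instead of a broken image request.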
### Categories and Locations
#### Category Structure
Categories are hierarchical with parent-child relationships. The main categories under "Buy & Sell" include:
| ID | Name | Total Results (iPhone search) |
|----|------|------------------------------|
| 10 | Buy & Sell | 19956 |
| 12 | Arts & Collectibles | 149 |
| 767 | Audio | 481 |
| 253 | Baby Items | 13 |
| 931 | Bags & Luggage | 8 |
| 644 | Bikes | 46 |
| 109 | Books | 21 |
| 103 | Cameras & Camcorders | 101 |
| 104 | CDs, DVDs & Blu-ray | 102 |
| 274 | Clothing | 83 |
| 16 | Computers | 285 |
| 128 | Computer Accessories | 363 |
| 29659001 | Electronics | 2006 |
| 17220001 | Free Stuff | 23 |
| 235 | Furniture | 29 |
| 638 | Garage Sales | 5 |
| 140 | Health & Special Needs | 30 |
| 139 | Hobbies & Crafts | 10 |
| 107 | Home Appliances | 23 |
| 717 | Home - Indoor | 27 |
| 727 | Home Renovation Materials | 14 |
| 133 | Jewellery & Watches | 83 |
| 17 | Musical Instruments | 34 |
| 132 | Phones | 15518 |
| 111 | Sporting Goods & Exercise | 30 |
| 110 | Tools | 25 |
| 108 | Toys & Games | 38 |
| 15093001 | TVs & Video | 15 |
| 141 | Video Games & Consoles | 96 |
| 26 | Other | 286 |
#### Location Structure
Locations are also hierarchical, with provinces and territories under the main "Canada" location:
| ID | Name | Total Results (iPhone search) |
|----|------|------------------------------|
| 0 | Canada | - |
| 9001 | Québec | 2516 |
| 9002 | Nova Scotia | 875 |
| 9003 | Alberta | 2317 |
| 9004 | Ontario | 12507 |
| 9005 | New Brunswick | 118 |
| 9006 | Manitoba | 919 |
| 9007 | British Columbia | 306 |
| 9008 | Newfoundland | 27 |
| 9009 | Saskatchewan | 336 |
| 9010 | Territories | 7 |
| 9011 | Prince Edward Island | 31 |
#### URL Patterns
- Categories: `/b-{category-slug}/canada/{keywords}/k0c{CATEGORY_ID}l0`
- Locations: `/b-buy-sell/{location-slug}/iphone/k0c10l{LOCATION_ID}`
- Combined: `/b-{category-slug}/{location-slug}/{keywords}/k0c{CATEGORY_ID}l{LOCATION_ID}`
### Pagination
- Uses offset-based pagination
- 40 results per page
- Total count provided in pagination metadata
## Authentication & User Management
- **Authentication System**: OAuth2-based using CIS (Customer Identity Service)
- **Identity Provider**: `id.kijiji.ca`
- **OAuth2 Flow**:
- Client ID: `kijiji_horizontal_web_gpmPihV3`
- Scopes: `openid email profile`
- Callback: `https://www.kijiji.ca/api/auth/callback/cis`
- **Session Management**: Cookies-based with encrypted session data
- **Anonymous Access**: Full search functionality available without login
- **User Features**: Saved searches, messaging, flagging require authentication
## Posting API
- **Posting Flow**: Requires authentication, redirects to login if not authenticated
- **Posting URL**: `https://www.kijiji.ca/p-post-ad.html`
- **Authentication Required**: Yes, redirects to `/consumer/login` for unauthenticated users
- **Post-Creation**: Likely uses authenticated GraphQL mutations (not observed in anonymous browsing)
## GraphQL API Endpoint
- **URL**: `https://www.kijiji.ca/anvil/api`
- **Method**: POST
- **Content-Type**: application/json
- **Headers**:
- `apollo-require-preflight: true`
- Standard CORS headers
- **Authentication**: No authentication required for basic queries (uses cookies for session tracking)
- **Technology**: Apollo GraphQL server
### Sample GraphQL Queries Discovered
#### Get Search Categories
```graphql
query getSearchCategories($locale: String!) {
searchCategories {
id
localizedName(locale: $locale)
parentId
__typename
}
}
```
Variables: `{"locale": "en-CA"}`
Response includes hierarchical category structure with IDs and localized names.
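A sketch of how this query might be packaged for the endpoint described above, using only the method, content type, and `apollo-require-preflight` header noted in this document. `buildGraphqlRequest` is a hypothetical helper; it only builds the request and does not send it.

```typescript
const GRAPHQL_ENDPOINT = "https://www.kijiji.ca/anvil/api";

const GET_SEARCH_CATEGORIES = `
  query getSearchCategories($locale: String!) {
    searchCategories {
      id
      localizedName(locale: $locale)
      parentId
      __typename
    }
  }
`;

interface GraphqlRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// Hypothetical helper: build (but do not send) a GraphQL POST request.
function buildGraphqlRequest(
  query: string,
  variables: Record<string, unknown>,
): GraphqlRequest {
  return {
    url: GRAPHQL_ENDPOINT,
    init: {
      method: "POST",
      headers: {
        "content-type": "application/json",
        "apollo-require-preflight": "true",
      },
      body: JSON.stringify({ query, variables }),
    },
  };
}
```

The result can be passed to `fetch(request.url, request.init)`. Whether the server needs further headers or cookies beyond those listed here has not been verified.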
#### Get Geocode from IP (fails for current IP)
```graphql
query GetGeocodeReverseFromIp {
geocodeReverseFromIp {
city
province
locationId
__typename
}
}
```
This query fails for the current IP address, suggesting geolocation-based features may not work or require different IP ranges.
#### Get Category Path
```graphql
query GetCategoryPath($categoryId: Int!, $locale: String, $locationId: Int) {
category(id: $categoryId) {
id
localizedName(locale: $locale)
parentId
searchSeoUrl(locationId: $locationId)
categoryPaths {
id
localizedName(locale: $locale)
parentId
searchSeoUrl(locationId: $locationId)
__typename
}
__typename
}
}
```
Variables: `{"categoryId": 10, "locationId": 0, "locale": "en-CA"}`
## Latest Findings (2026-01-21)
### Client-Side GraphQL Queries Observed
- **getSearchCategories**: Retrieves category hierarchy for search filters
- **GetGeocodeReverseFromIp**: Attempts to geolocate user (fails for current IP)
### GraphQL Schema Insights
Testing direct GraphQL queries revealed:
- Field "searchResults" does not exist on Query type
- Suggested alternatives: "searchResultsPage" or "searchUrl"
- This suggests search is exposed through operations such as `searchResultsPage` or `searchUrl` rather than a `searchResults` field
The embedded Apollo state approach appears to be the primary method for accessing search data, with GraphQL used for auxiliary operations like categories and geolocation.
### Server-Side Rendering Architecture
Search results are fully server-side rendered with data embedded in HTML. Each page (including pagination) contains its own pre-rendered data. No client-side GraphQL requests are made for:
- Initial search results
- Pagination navigation
- Search result data
### Network Analysis Findings
- GraphQL endpoint: `https://www.kijiji.ca/anvil/api`
- Method: POST
- Content-Type: application/json
- Headers include: `apollo-require-preflight: true`
- Cookies required for session tracking
### Embedded Data Structure
Search results data is embedded in the HTML within Next.js `__NEXT_DATA__.props.pageProps.__APOLLO_STATE__` object. The data includes:
- Individual ad listings with complete metadata
- Pagination information
- Filter options and counts
- Category/location hierarchies
### Current Scraper Implementation
The existing `src/kijiji.ts` implementation correctly parses the embedded Apollo state:
- Uses `extractApolloState()` to parse `__NEXT_DATA__` from HTML
- Filters Apollo keys containing "Listing" to find ad data
- Extracts `url`, `title`, and other metadata from each listing
- Successfully scrapes listings without needing API authentication
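The key-filtering step described above might look roughly like the following. This is not the code in `src/kijiji.ts`; `listingsFromApolloState` and `ListingSummary` are hypothetical names, and the sketch assumes each listing entry carries string `url` and `title` fields.

```typescript
interface ListingSummary {
  url: string;
  title: string;
}

// Hypothetical helper: collect listing entries from a normalized Apollo cache.
function listingsFromApolloState(
  state: Record<string, unknown>,
): ListingSummary[] {
  const results: ListingSummary[] = [];
  for (const [key, value] of Object.entries(state)) {
    if (!key.includes("Listing")) continue; // only listing cache entries
    if (typeof value !== "object" || value === null) continue;
    const { url, title } = value as { url?: unknown; title?: unknown };
    if (typeof url === "string" && typeof title === "string") {
      results.push({ url, title });
    }
  }
  return results;
}
```

Entries that lack a `url` or `title` are skipped, so partially populated cache records do not produce incomplete results.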
### Authentication Status
- **Search functionality**: No authentication required - all search and listing data accessible anonymously
- **Posting functionality**: Requires authentication (redirects to login)
- **User features**: Saved searches, messaging require authentication
- **Rate limiting**: May apply but not observed in anonymous browsing
### Pagination Implementation
- Each page is a separate server-rendered route
- URL pattern: `/b-{location}/{keywords}/page-{number}/k0{category}l{location_id}`
- No client-side pagination API calls
- 40 results per page (observed)
- Example: `/b-canada/iphone/page-2/k0l0` for page 2 of iPhone search
## URL Pattern Analysis
### Search URL Structure
`https://www.kijiji.ca/b-{category_slug}/{location_slug}/{keywords}/k0c{category_id}l{location_id}`
#### Examples Observed:
- All categories, Canada: `/b-canada/iphone/k0l0` (c0 = All Categories, l0 = Canada)
- Cell phones category: `/b-cell-phones/canada/iphone/k0c132l0` (c132 = Cell Phones)
- With pagination: `/b-canada/iphone/page-2/k0l0`
#### URL Components:
- `c{CATEGORY_ID}`: Category ID (0 = All Categories, 132 = Cell Phones, etc.)
- `l{LOCATION_ID}`: Location ID (0 = Canada, 1700272 = GTA, etc.)
- `page-{N}`: Pagination (1-based, optional)
- Keywords are slugified in URL path
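To make the components above concrete, here is a small sketch that parses the trailing `k0c{CATEGORY_ID}l{LOCATION_ID}` segment. The name `parseSearchSegment` is hypothetical, and treating a missing `c` part as category 0 is an assumption based on the `/b-canada/iphone/k0l0` example.

```typescript
interface SearchSegment {
  categoryId: number;
  locationId: number;
}

// Hypothetical helper: parse "k0c132l0" or "k0l0" into numeric IDs.
function parseSearchSegment(segment: string): SearchSegment | null {
  const match = /^k0(?:c(\d+))?l(\d+)$/.exec(segment);
  if (!match || match[2] === undefined) return null;
  return {
    // Assumption: an omitted category part means "All Categories" (0).
    categoryId: match[1] !== undefined ? Number(match[1]) : 0,
    locationId: Number(match[2]),
  };
}
```

For instance, `k0c132l0` yields category 132 and location 0, while a string that does not match the pattern returns `null`.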
### Current Implementation Status
The existing scraper in `src/kijiji.ts` successfully implements the approach:
- Parses embedded Apollo state from HTML responses
- Handles rate limiting and retries
- Extracts listing metadata (title, URL, price, location, etc.)
- Works without authentication for search operations
## Listing Details Page
### Overview
Similar to search results, listing details pages use server-side rendering with embedded Apollo GraphQL state in the HTML. No dedicated API endpoint serves individual listing data - all information is pre-rendered on the server.
### Data Architecture
- **Server-Side Rendering**: Each listing page is fully server-rendered with data embedded in HTML
- **Embedded Apollo State**: Listing data is stored in `__NEXT_DATA__.props.pageProps.__APOLLO_STATE__`
- **Client-Side GraphQL**: Additional data (categories, campaigns, similar listings, user profiles) fetched via GraphQL API
### Listing Data Structure
The main listing data follows the same pattern as search results:
```json
{
"id": "1705585530",
"title": "We Pay top cash for iPhone 17 pro max, iPhone 17 pro, iPhone Air",
"description": "Buying All Brand new Apple iPhones sealed/Unsealed...",
"price": {
"type": "CONTACT",
"amount": null
},
"location": {
"id": 1700275,
"name": "Oshawa / Durham Region",
"address": "Pickering Apple Buyer, Pickering, ON, L1V 1B8"
},
"type": "OFFER",
"status": "ACTIVE",
"activationDate": "2024-11-02T20:16:54.000Z",
"endDate": "3000-01-01T00:00:00.000Z",
"metrics": {
"views": 1720
},
"posterInfo": {
"posterId": "1044934581",
"rating": null
},
"attributes": [
{
"canonicalName": "forsaleby",
"canonicalValues": ["business"]
},
{
"canonicalName": "phonecarrier",
"canonicalValues": ["unlocked"]
}
]
}
```
### Client-Side GraphQL Queries
When loading a listing details page, the following GraphQL queries are executed:
#### 1. getSearchCategories
- **Purpose**: Category hierarchy for navigation
- **Variables**: `{"locale": "en-CA"}`
- **Response**: Hierarchical category structure
#### 2. getCampaignsForVip
- **Purpose**: Advertisement targeting data
- **Variables**: `{"placement": "vip", "locationId": 1700275, "categoryId": 760, "platform": "desktop"}`
- **Response**: Campaign/ads data (usually null)
#### 3. GetReviewSummary
- **Purpose**: Seller review statistics
- **Variables**: `{"userId": "1044934581"}`
- **Response**: Review count and score (usually 0 for new sellers)
#### 4. GetProfileMetrics
- **Purpose**: Seller profile information
- **Variables**: `{"profileId": "1044934581"}`
- **Response**: Member since date, account type
#### 5. GetListingsSimilar
- **Purpose**: Similar listings for cross-selling
- **Variables**: `{"listingId": "1705585530", "limit": 10, "isExternalId": false}`
- **Response**: Array of similar listings with basic metadata
#### 6. GetGeocodeReverseFromIp
- **Purpose**: Geolocation-based features
- **Variables**: `{}`
- **Response**: Failed with 404 for the IP address tested
### Implementation Status
The existing `parseListing()` function in `src/kijiji.ts` successfully extracts listing details from embedded Apollo state:
- ✅ Extracts title, description, price, location
- ✅ Handles contact-based pricing ("Please Contact")
- ✅ Parses creation date, view count, listing status
- ✅ Extracts seller information and address
- ✅ Works without authentication or API keys
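The contact-based pricing case can be illustrated with a short sketch. It is not the repository's `parseListing()`; `describePrice` is a hypothetical name, and it assumes `price.amount` is expressed in cents, as the scraper's cents-based formatting suggests.

```typescript
interface ListingPrice {
  type?: string;
  amount?: number | null;
}

// Hypothetical helper: turn a price object into a display string.
function describePrice(price: ListingPrice | undefined): string {
  // "CONTACT" listings carry a null amount in the sample data above.
  if (!price || price.type === "CONTACT" || price.amount == null) {
    return "Please Contact";
  }
  // Assumption: amount is in cents.
  return (price.amount / 100).toFixed(2);
}
```

Checking both the `type` and a null `amount` guards against listings where only one of the two signals is present.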
### Key Findings
1. **No Dedicated Listing API**: As with search results, there's no separate GraphQL query for individual listing data
2. **Complete Data Available**: All listing information is embedded in the initial HTML response
3. **Additional Context Fetched**: Secondary GraphQL queries provide complementary data (reviews, similar listings)
4. **Consistent Architecture**: Same Apollo state embedding pattern as search pages
### Current Scraper Implementation
The scraper successfully extracts listing details by:
1. Fetching the listing URL HTML
2. Parsing embedded `__NEXT_DATA__` Apollo state
3. Extracting the `Listing:{id}` object from Apollo cache
4. Mapping fields to typed `ListingDetails` interface
This approach works reliably without authentication, and no rate limiting was observed on individual listing fetches during anonymous browsing.
## Next Steps
- Explore posting/authentication APIs (requires user login)
- Investigate if GraphQL API can be used for programmatic access with proper authentication
- Test rate limiting patterns and optimal scraping strategies
- Document additional category and location ID mappings


@@ -1,10 +1,10 @@
 {
   "lockfileVersion": 1,
+  "configVersion": 0,
   "workspaces": {
     "": {
       "name": "sone4ka-tok",
       "dependencies": {
-        "@types/cli-progress": "^3.11.6",
         "cli-progress": "^3.12.0",
         "linkedom": "^0.18.12",
         "unidecode": "^1.1.0",
@@ -13,6 +13,7 @@
         "@anthropic-ai/claude-code": "^2.0.1",
         "@musistudio/claude-code-router": "^1.0.53",
         "@types/bun": "latest",
+        "@types/cli-progress": "^3.11.6",
         "@types/unidecode": "^1.1.0",
       },
       "peerDependencies": {

3
bunfig.toml Normal file

@@ -0,0 +1,3 @@
[test]
# Test configuration
preload = ["./test/setup.ts"]

25
opencode.jsonc Normal file

@@ -0,0 +1,25 @@
{
"$schema": "https://opencode.ai/config.json",
"mcp": {
"chrome-devtools": {
"type": "local",
"command": [
"bunx",
"--bun",
"chrome-devtools-mcp@latest",
"--log-file",
"./debug.log",
"--headless=false",
"--isolated=false",
"-e",
"/nix/store/lz8ajxhnkkw2llj752bdz41wqr645h9c-google-chrome-dev-146.0.7635.0/bin/google-chrome-unstable",
"--ignore-default-chrome-arg='--disable-extensions'"
]
},
"bun-docs": {
"type": "remote",
"url": "https://bun.com/docs/mcp",
"timeout": 3000
}
}
}


@@ -26,8 +26,12 @@ const server = Bun.serve({
         { status: 400 },
       );
-    const items = await fetchKijijiItems(SEARCH_QUERY, 5);
-    if (!items)
+    const items = await fetchKijijiItems(SEARCH_QUERY, 1, undefined, {}, {
+      includeImages: true,
+      sellerDataDepth: 'detailed',
+      includeClientSideData: false,
+    });
+    if (!items || items.length === 0)
       return Response.json(
         { message: "Search didn't return any results!" },
         { status: 404 },
@@ -85,11 +89,13 @@ const server = Bun.serve({
       );
     // Parse optional parameters with defaults
-    const minPrice = reqUrl.searchParams.get("minPrice")
-      ? parseInt(reqUrl.searchParams.get("minPrice")!)
+    const minPriceParam = reqUrl.searchParams.get("minPrice");
+    const minPrice = minPriceParam
+      ? Number.parseInt(minPriceParam, 10)
       : undefined;
-    const maxPrice = reqUrl.searchParams.get("maxPrice")
-      ? parseInt(reqUrl.searchParams.get("maxPrice")!)
+    const maxPriceParam = reqUrl.searchParams.get("maxPrice");
+    const maxPrice = maxPriceParam
+      ? Number.parseInt(maxPriceParam, 10)
       : undefined;
     const strictMode = reqUrl.searchParams.get("strictMode") === "true";
     const exclusionsParam = reqUrl.searchParams.get("exclusions");


@@ -26,16 +26,29 @@ interface ApolloListingRoot {
   url?: string;
   title?: string;
   description?: string;
-  price?: { amount?: number | string; currency?: string };
+  price?: { amount?: number | string; currency?: string; type?: string };
   type?: string;
   status?: string;
   activationDate?: string;
   endDate?: string;
   metrics?: { views?: number | string };
-  location?: { address?: string | null };
+  location?: {
+    address?: string | null;
+    id?: number;
+    name?: string;
+    coordinates?: { latitude: number; longitude: number };
+  };
+  imageUrls?: string[];
+  imageCount?: number;
+  categoryId?: number;
+  adSource?: string;
+  flags?: { topAd?: boolean; priceDrop?: boolean };
+  posterInfo?: { posterId?: string; rating?: number };
+  attributes?: Array<{ canonicalName?: string; canonicalValues?: string[] }>;
   [k: string]: unknown;
 }
+// Keep existing interface for backward compatibility
 type ListingDetails = {
   url: string;
   title: string;
// Keep existing interface for backward compatibility
type ListingDetails = { type ListingDetails = {
url: string; url: string;
title: string; title: string;
@@ -53,10 +66,178 @@ type ListingDetails = {
address?: string | null; address?: string | null;
}; };
// New comprehensive interface for detailed listings
interface DetailedListing extends ListingDetails {
images: string[];
categoryId: number;
adSource: string;
flags: {
topAd: boolean;
priceDrop: boolean;
};
attributes: Record<string, string[]>;
location: {
id: number;
name: string;
coordinates?: {
latitude: number;
longitude: number;
};
};
sellerInfo?: {
posterId: string;
rating?: number;
accountType?: string;
memberSince?: string;
reviewCount?: number;
reviewScore?: number;
};
}
// Configuration interfaces
interface SearchOptions {
location?: number | string; // Location ID or name
category?: number | string; // Category ID or name
keywords?: string;
sortBy?: 'relevancy' | 'date' | 'price' | 'distance';
sortOrder?: 'desc' | 'asc';
maxPages?: number; // Default: 5
priceMin?: number;
priceMax?: number;
}
interface ListingFetchOptions {
includeImages?: boolean; // Default: true
sellerDataDepth?: 'basic' | 'detailed' | 'full'; // Default: 'detailed'
includeClientSideData?: boolean; // Default: false
}
// ----------------------------- Constants & Mappings -----------------------------
// Location mappings from KIJIJI.md
const LOCATION_MAPPINGS: Record<string, number> = {
'canada': 0,
'ontario': 9004,
'toronto': 1700273,
'gta': 1700272,
'oshawa': 1700275,
'quebec': 9001,
'nova scotia': 9002,
'alberta': 9003,
'new brunswick': 9005,
'manitoba': 9006,
'british columbia': 9007,
'newfoundland': 9008,
'saskatchewan': 9009,
'territories': 9010,
'pei': 9011,
'prince edward island': 9011,
};
// Category mappings from KIJIJI.md (Buy & Sell main categories)
const CATEGORY_MAPPINGS: Record<string, number> = {
'all': 0,
'buy-sell': 10,
'arts-collectibles': 12,
'audio': 767,
'baby-items': 253,
'bags-luggage': 931,
'bikes': 644,
'books': 109,
'cameras': 103,
'cds': 104,
'clothing': 274,
'computers': 16,
'computer-accessories': 128,
'electronics': 29659001,
'free-stuff': 17220001,
'furniture': 235,
'garage-sales': 638,
'health-special-needs': 140,
'hobbies-crafts': 139,
'home-appliances': 107,
'home-indoor': 717,
'home-renovation-materials': 727, // ID 727 is "Home Renovation Materials" in the KIJIJI.md category table
'jewellery': 133,
'musical-instruments': 17,
'phones': 132,
'sporting-goods': 111,
'tools': 110,
'toys-games': 108,
'tvs-video': 15093001,
'video-games': 141,
'other': 26,
};
// Sort parameter mappings
const SORT_MAPPINGS: Record<string, string> = {
'relevancy': 'MATCH',
'date': 'DATE',
'price': 'PRICE',
'distance': 'DISTANCE',
};
// ----------------------------- Exports for Testing -----------------------------
// Note: These are exported for testing purposes only
export { resolveLocationId, resolveCategoryId, buildSearchUrl };
export { extractApolloState, parseSearch };
export { parseDetailedListing };
export { HttpError, NetworkError, ParseError, RateLimitError, ValidationError };
// ----------------------------- Utilities ----------------------------- // ----------------------------- Utilities -----------------------------
const SEPS = new Set([" ", "", "—", "/", ":", ";", ",", ".", "-"]); const SEPS = new Set([" ", "", "—", "/", ":", ";", ",", ".", "-"]);
/**
* Resolve location ID from name or return numeric ID
*/
function resolveLocationId(location?: number | string): number {
if (typeof location === 'number') return location;
if (typeof location === 'string') {
const normalized = location.trim().toLowerCase().replace(/[\s-]+/g, ' '); // LOCATION_MAPPINGS keys use spaces (e.g. 'nova scotia')
return LOCATION_MAPPINGS[normalized] ?? 0; // Default to Canada (0)
}
return 0; // Default to Canada
}
/**
* Resolve category ID from name or return numeric ID
*/
function resolveCategoryId(category?: number | string): number {
if (typeof category === 'number') return category;
if (typeof category === 'string') {
const normalized = category.toLowerCase().replace(/\s+/g, '-');
return CATEGORY_MAPPINGS[normalized] ?? 0; // Default to all categories
}
return 0; // Default to all categories
}
/**
* Build search URL with enhanced parameters
*/
function buildSearchUrl(
keywords: string,
options: SearchOptions & { page?: number },
BASE_URL = "https://www.kijiji.ca"
): string {
const locationId = resolveLocationId(options.location);
const categoryId = resolveCategoryId(options.category);
const categorySlug = categoryId === 0 ? 'buy-sell' : 'buy-sell'; // Could be enhanced
const locationSlug = locationId === 0 ? 'canada' : 'canada'; // Could be enhanced
let url = `${BASE_URL}/b-${categorySlug}/${locationSlug}/${slugify(keywords)}/k0c${categoryId}l${locationId}`;
const sortParam = options.sortBy ? `&sort=${SORT_MAPPINGS[options.sortBy]}` : '';
const sortOrder = options.sortOrder === 'asc' ? 'ASC' : 'DESC';
const pageParam = options.page && options.page > 1 ? `&page=${options.page}` : '';
url += `?sort=relevancyDesc&view=list${sortParam}&order=${sortOrder}${pageParam}`;
return url;
}
/**
* Slugifies a string for search
*/
@@ -67,13 +248,14 @@ export function slugify(input: string): string {
for (let i = 0; i < s.length; i++) { for (let i = 0; i < s.length; i++) {
const ch = s[i]; const ch = s[i];
const code = ch!.charCodeAt(0); if (!ch) continue;
const code = ch.charCodeAt(0);
// a-z or 0-9 // a-z or 0-9
if ((code >= 97 && code <= 122) || (code >= 48 && code <= 57)) { if ((code >= 97 && code <= 122) || (code >= 48 && code <= 57)) {
out.push(ch!); out.push(ch);
lastHyphen = false; lastHyphen = false;
} else if (SEPS.has(ch!)) { } else if (SEPS.has(ch)) {
if (!lastHyphen) { if (!lastHyphen) {
out.push("-"); out.push("-");
lastHyphen = true; lastHyphen = true;
@@ -87,7 +269,7 @@ export function slugify(input: string): string {
/**
* Turns cents to localized currency string.
*/
function formatCentsToCurrency( export function formatCentsToCurrency(
num: number | string | undefined, num: number | string | undefined,
locale = "en-US", locale = "en-US",
): string { ): string {
@@ -96,21 +278,24 @@ function formatCentsToCurrency(
if (Number.isNaN(cents)) return ""; if (Number.isNaN(cents)) return "";
const dollars = cents / 100; const dollars = cents / 100;
const formatter = new Intl.NumberFormat(locale, { const formatter = new Intl.NumberFormat(locale, {
style: 'currency',
currency: 'USD', // Formatting only; the listing's own currency (e.g. CAD) is reported separately in `currency`
minimumFractionDigits: 2, minimumFractionDigits: 2,
maximumFractionDigits: 2, maximumFractionDigits: 2,
useGrouping: true,
}); });
return formatter.format(dollars); return formatter.format(dollars);
} }
function isRecord(value: unknown): value is Record<string, unknown> { function isRecord(value: unknown): value is Record<string, unknown> {
return typeof value === "object" && value !== null; return typeof value === "object" && value !== null && !Array.isArray(value);
} }
async function delay(ms: number): Promise<void> {
await new Promise((resolve) => setTimeout(resolve, ms));
}
// ----------------------------- Error Classes -----------------------------
class HttpError extends Error { class HttpError extends Error {
constructor( constructor(
message: string, message: string,
@@ -122,12 +307,52 @@ class HttpError extends Error {
} }
} }
class NetworkError extends Error {
constructor(
message: string,
public readonly url: string,
public readonly cause?: Error,
) {
super(message);
this.name = "NetworkError";
}
}
class ParseError extends Error {
constructor(
message: string,
public readonly data?: unknown,
) {
super(message);
this.name = "ParseError";
}
}
class RateLimitError extends Error {
constructor(
message: string,
public readonly url: string,
public readonly resetTime?: number,
) {
super(message);
this.name = "RateLimitError";
}
}
class ValidationError extends Error {
constructor(message: string) {
super(message);
this.name = "ValidationError";
}
}
// ----------------------------- HTTP Client ----------------------------- // ----------------------------- HTTP Client -----------------------------
/** /**
Fetch HTML with a basic retry strategy and simple rate-limit delay between calls. Fetch HTML with enhanced retry strategy and exponential backoff.
- Retries on 429 and 5xx - Retries on 429, 5xx, and network errors
- Respects X-RateLimit-Reset when present (seconds) - Respects X-RateLimit-Reset when present (seconds)
- Exponential backoff with jitter
*/ */
async function fetchHtml( async function fetchHtml(
url: string, url: string,
@@ -139,11 +364,13 @@ async function fetchHtml(
}, },
): Promise<HTMLString> { ): Promise<HTMLString> {
const maxRetries = opts?.maxRetries ?? 3; const maxRetries = opts?.maxRetries ?? 3;
const retryBaseMs = opts?.retryBaseMs ?? 500; const retryBaseMs = opts?.retryBaseMs ?? 1000;
for (let attempt = 0; attempt <= maxRetries; attempt++) { for (let attempt = 0; attempt <= maxRetries; attempt++) {
try { try {
// console.log(`Fetching: `, url); const controller = new AbortController();
const timeoutId = setTimeout(() => controller.abort(), 30000); // 30s timeout
const res = await fetch(url, { const res = await fetch(url, {
method: "GET", method: "GET",
headers: { headers: {
@@ -155,27 +382,40 @@ async function fetchHtml(
"user-agent": "user-agent":
"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120 Safari/537.36", "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120 Safari/537.36",
}, },
signal: controller.signal,
}); });
clearTimeout(timeoutId);
const rateLimitRemaining = res.headers.get("X-RateLimit-Remaining"); const rateLimitRemaining = res.headers.get("X-RateLimit-Remaining");
const rateLimitReset = res.headers.get("X-RateLimit-Reset"); const rateLimitReset = res.headers.get("X-RateLimit-Reset");
opts?.onRateInfo?.(rateLimitRemaining, rateLimitReset); opts?.onRateInfo?.(rateLimitRemaining, rateLimitReset);
if (!res.ok) { if (!res.ok) {
// Respect 429 reset if provided // Handle rate limiting
if (res.status === 429) { if (res.status === 429) {
const resetSeconds = rateLimitReset ? Number(rateLimitReset) : NaN; const resetSeconds = rateLimitReset ? Number(rateLimitReset) : Number.NaN;
const waitMs = Number.isFinite(resetSeconds) const waitMs = Number.isFinite(resetSeconds)
? Math.max(0, resetSeconds * 1000) ? Math.max(0, resetSeconds * 1000)
: (attempt + 1) * retryBaseMs; : calculateBackoffDelay(attempt, retryBaseMs);
if (attempt < maxRetries) {
await delay(waitMs); await delay(waitMs);
continue; continue;
} }
// Retry on 5xx throw new RateLimitError(
`Rate limit exceeded for ${url}`,
url,
Number.isFinite(resetSeconds) ? resetSeconds : undefined, // avoid storing NaN when the header is absent
);
}
// Retry on server errors
if (res.status >= 500 && res.status < 600 && attempt < maxRetries) { if (res.status >= 500 && res.status < 600 && attempt < maxRetries) {
await delay((attempt + 1) * retryBaseMs); await delay(calculateBackoffDelay(attempt, retryBaseMs));
continue; continue;
} }
throw new HttpError( throw new HttpError(
`Request failed with status ${res.status}`, `Request failed with status ${res.status}`,
res.status, res.status,
@@ -184,16 +424,171 @@ async function fetchHtml(
} }
const html = await res.text(); const html = await res.text();
// Respect per-request delay to keep at or under REQUESTS_PER_SECOND
// Respect per-request delay to maintain rate limiting
await delay(DELAY_MS); await delay(DELAY_MS);
return html; return html;
} catch (err) { } catch (err) {
if (attempt >= maxRetries) throw err; // Handle different error types
await delay((attempt + 1) * retryBaseMs); if (err instanceof RateLimitError || err instanceof HttpError) {
throw err; // Re-throw known errors
}
if (err instanceof Error && err.name === 'AbortError') {
if (attempt < maxRetries) {
await delay(calculateBackoffDelay(attempt, retryBaseMs));
continue;
}
throw new NetworkError(`Request timeout for ${url}`, url, err);
}
// Network or other errors
if (attempt < maxRetries) {
await delay(calculateBackoffDelay(attempt, retryBaseMs));
continue;
}
throw new NetworkError(
`Network error fetching ${url}: ${err instanceof Error ? err.message : String(err)}`,
url,
err instanceof Error ? err : undefined
);
} }
} }
throw new Error("Exhausted retries without response"); throw new NetworkError(`Exhausted retries without response for ${url}`, url);
}
/**
* Calculate exponential backoff delay with jitter
*/
function calculateBackoffDelay(attempt: number, baseMs: number): number {
const exponentialDelay = baseMs * (2 ** attempt);
const jitter = Math.random() * 0.1 * exponentialDelay; // 10% jitter
return Math.min(exponentialDelay + jitter, 30000); // Cap at 30 seconds
}
// ----------------------------- GraphQL Client -----------------------------
/**
* Fetch additional data via GraphQL API
*/
async function fetchGraphQLData(
query: string,
variables: Record<string, unknown>,
BASE_URL = "https://www.kijiji.ca"
): Promise<unknown> {
const endpoint = `${BASE_URL}/anvil/api`;
try {
const response = await fetch(endpoint, {
method: 'POST',
headers: {
'Content-Type': 'application/json',
'apollo-require-preflight': 'true',
},
body: JSON.stringify({
query,
variables,
}),
});
if (!response.ok) {
throw new HttpError(
`GraphQL request failed with status ${response.status}`,
response.status,
endpoint
);
}
const result = await response.json();
if (result.errors) {
throw new ParseError(`GraphQL errors: ${JSON.stringify(result.errors)}`, result.errors);
}
return result.data;
} catch (err) {
if (err instanceof HttpError || err instanceof ParseError) {
throw err;
}
throw new NetworkError(
`Failed to fetch GraphQL data: ${err instanceof Error ? err.message : String(err)}`,
endpoint,
err instanceof Error ? err : undefined
);
}
}
// GraphQL response interfaces
interface GraphQLReviewResponse {
user?: {
reviewSummary?: {
count?: number;
score?: number;
};
};
}
interface GraphQLProfileResponse {
user?: {
memberSince?: string;
accountType?: string;
};
}
// GraphQL queries from KIJIJI.md
const GRAPHQL_QUERIES = {
getReviewSummary: `
query GetReviewSummary($userId: String!) {
user(id: $userId) {
reviewSummary {
count
score
__typename
}
__typename
}
}
`,
getProfileMetrics: `
query GetProfileMetrics($profileId: String!) {
user(id: $profileId) {
memberSince
accountType
__typename
}
}
`,
} as const;
/**
* Fetch additional seller data via GraphQL
*/
async function fetchSellerDetails(
posterId: string,
BASE_URL = "https://www.kijiji.ca"
): Promise<{ reviewCount?: number; reviewScore?: number; memberSince?: string; accountType?: string }> {
try {
const [reviewData, profileData] = await Promise.all([
fetchGraphQLData(GRAPHQL_QUERIES.getReviewSummary, { userId: posterId }, BASE_URL),
fetchGraphQLData(GRAPHQL_QUERIES.getProfileMetrics, { profileId: posterId }, BASE_URL),
]);
const reviewResponse = reviewData as GraphQLReviewResponse;
const profileResponse = profileData as GraphQLProfileResponse;
return {
reviewCount: reviewResponse?.user?.reviewSummary?.count,
reviewScore: reviewResponse?.user?.reviewSummary?.score,
memberSince: profileResponse?.user?.memberSince,
accountType: profileResponse?.user?.accountType,
};
} catch (err) {
// Non-fatal: log a warning and return no seller details, since GraphQL data is optional
console.warn(`Failed to fetch seller details for ${posterId}:`, err instanceof Error ? err.message : String(err));
return {};
}
} }
// ----------------------------- Parsing -----------------------------
@@ -299,7 +694,7 @@ function parseListing(
listingPrice: amountFormatted listingPrice: amountFormatted
? { ? {
amountFormatted, amountFormatted,
cents: Number.isFinite(cents!) ? cents : undefined, cents: cents !== undefined && Number.isFinite(cents) ? cents : undefined,
currency: price?.currency, currency: price?.currency,
} }
: undefined, : undefined,
@@ -307,84 +702,237 @@ function parseListing(
listingStatus: status, listingStatus: status,
creationDate: activationDate, creationDate: activationDate,
endDate, endDate,
numberOfViews: Number.isFinite(numberOfViews!) ? numberOfViews : undefined, numberOfViews: numberOfViews !== undefined && Number.isFinite(numberOfViews) ? numberOfViews : undefined,
address: location?.address ?? null, address: location?.address ?? null,
}; };
} }
/**
* Parse a listing page into a detailed object with all available fields
*/
async function parseDetailedListing(
htmlString: HTMLString,
BASE_URL: string,
options: ListingFetchOptions = {}
): Promise<DetailedListing | null> {
const apolloState = extractApolloState(htmlString);
if (!apolloState) return null;
// Find the listing root key
const listingKey = Object.keys(apolloState).find((k) =>
k.includes("Listing"),
);
if (!listingKey) return null;
const root = apolloState[listingKey];
if (!isRecord(root)) return null;
const {
url,
title,
description,
price,
type,
status,
activationDate,
endDate,
metrics,
location,
imageUrls,
imageCount,
categoryId,
adSource,
flags,
posterInfo,
attributes,
} = root as ApolloListingRoot;
const cents = price?.amount != null ? Number(price.amount) : undefined;
const amountFormatted = formatCentsToCurrency(cents);
const numberOfViews =
metrics?.views != null ? Number(metrics.views) : undefined;
const listingUrl =
typeof url === "string"
? url.startsWith("http")
? url
: `${BASE_URL}${url}`
: "";
if (!listingUrl || !title) return null;
// Only include fixed-price listings
if (!amountFormatted || cents === undefined) return null;
// Extract images if requested
const images = options.includeImages !== false && Array.isArray(imageUrls)
? imageUrls.filter((url): url is string => typeof url === 'string')
: [];
// Extract attributes as key-value pairs
const attributeMap: Record<string, string[]> = {};
if (Array.isArray(attributes)) {
for (const attr of attributes) {
if (attr?.canonicalName && Array.isArray(attr.canonicalValues)) {
attributeMap[attr.canonicalName] = attr.canonicalValues;
}
}
}
// Extract seller info based on depth setting
let sellerInfo: DetailedListing['sellerInfo'];
const depth = options.sellerDataDepth ?? 'detailed';
if (posterInfo?.posterId) {
sellerInfo = {
posterId: posterInfo.posterId,
rating: typeof posterInfo.rating === 'number' ? posterInfo.rating : undefined,
};
// Add more detailed info if requested and client-side data is enabled
if ((depth === 'detailed' || depth === 'full') && options.includeClientSideData) {
try {
const additionalData = await fetchSellerDetails(posterInfo.posterId, BASE_URL);
sellerInfo = {
...sellerInfo,
...additionalData,
};
} catch (err) {
// Defensive only: fetchSellerDetails already catches and logs its own errors
console.warn(`Failed to fetch additional seller data for ${posterInfo.posterId}`);
}
}
}
return {
url: listingUrl,
title,
description,
listingPrice: {
amountFormatted,
cents,
currency: price?.currency,
},
listingType: type,
listingStatus: status,
creationDate: activationDate,
endDate,
numberOfViews: numberOfViews !== undefined && Number.isFinite(numberOfViews) ? numberOfViews : undefined,
address: location?.address ?? null,
images,
categoryId: typeof categoryId === 'number' ? categoryId : 0,
adSource: typeof adSource === 'string' ? adSource : 'UNKNOWN',
flags: {
topAd: flags?.topAd === true,
priceDrop: flags?.priceDrop === true,
},
attributes: attributeMap,
location: {
id: typeof location?.id === 'number' ? location.id : 0,
name: typeof location?.name === 'string' ? location.name : 'Unknown',
coordinates: location?.coordinates ? {
latitude: location.coordinates.latitude,
longitude: location.coordinates.longitude,
} : undefined,
},
sellerInfo,
};
}
// ----------------------------- Main -----------------------------
export default async function fetchKijijiItems( export default async function fetchKijijiItems(
SEARCH_QUERY: string, SEARCH_QUERY: string,
REQUESTS_PER_SECOND = 1, REQUESTS_PER_SECOND = 1,
BASE_URL = "https://www.kijiji.ca", BASE_URL = "https://www.kijiji.ca",
searchOptions: SearchOptions = {},
listingOptions: ListingFetchOptions = {},
) { ) {
const DELAY_MS = Math.max(1, Math.floor(1000 / REQUESTS_PER_SECOND)); const DELAY_MS = Math.max(1, Math.floor(1000 / REQUESTS_PER_SECOND));
const searchUrl = `${BASE_URL}/b-gta-greater-toronto-area/${slugify(SEARCH_QUERY)}/k0l1700272?sort=relevancyDesc&view=list`; // Set defaults for configuration
const finalSearchOptions: Required<SearchOptions> = {
location: searchOptions.location ?? 1700272, // Default to GTA
category: searchOptions.category ?? 0, // Default to all categories
keywords: searchOptions.keywords ?? SEARCH_QUERY,
sortBy: searchOptions.sortBy ?? 'relevancy',
sortOrder: searchOptions.sortOrder ?? 'desc',
maxPages: searchOptions.maxPages ?? 5, // Default to 5 pages
priceMin: searchOptions.priceMin,
priceMax: searchOptions.priceMax,
};
console.log(`Fetching search: ${searchUrl}`); const finalListingOptions: Required<ListingFetchOptions> = {
includeImages: listingOptions.includeImages ?? true,
sellerDataDepth: listingOptions.sellerDataDepth ?? 'detailed',
includeClientSideData: listingOptions.includeClientSideData ?? false,
};
const allListings: DetailedListing[] = [];
const seenUrls = new Set<string>();
// Fetch multiple pages
for (let page = 1; page <= finalSearchOptions.maxPages; page++) {
const searchUrl = buildSearchUrl(finalSearchOptions.keywords, {
...finalSearchOptions,
// Add page parameter for pagination
...(page > 1 && { page }),
}, BASE_URL);
console.log(`Fetching search page ${page}: ${searchUrl}`);
const searchHtml = await fetchHtml(searchUrl, DELAY_MS, { const searchHtml = await fetchHtml(searchUrl, DELAY_MS, {
onRateInfo: (remaining, reset) => { onRateInfo: (remaining, reset) => {
if (remaining && reset) { if (remaining && reset) {
console.log( console.log(`\nSearch - Rate limit remaining: ${remaining}, reset in: ${reset}s`);
"\n" +
`Search - Rate limit remaining: ${remaining}, reset in: ${reset}s`,
);
} }
}, },
}); });
const searchResults = parseSearch(searchHtml, BASE_URL); const searchResults = parseSearch(searchHtml, BASE_URL);
if (searchResults.length === 0) { if (searchResults.length === 0) {
console.warn("No search results parsed from page."); console.log(`No more results found on page ${page}. Stopping pagination.`);
return; break;
} }
// Deduplicate links // Deduplicate links across pages
const listingLinks = Array.from( const newListingLinks = searchResults
new Set(searchResults.map((r) => r.listingLink)), .map((r) => r.listingLink)
); .filter((link) => !seenUrls.has(link));
console.log( for (const link of newListingLinks) {
"\n" + `Found ${listingLinks.length} listing links. Fetching details...`, seenUrls.add(link);
); }
console.log(`\nFound ${newListingLinks.length} new listing links on page ${page}. Total unique: ${seenUrls.size}`);
// Fetch details for this page's listings
const progressBar = new cliProgress.SingleBar( const progressBar = new cliProgress.SingleBar(
{}, {},
cliProgress.Presets.shades_classic, cliProgress.Presets.shades_classic,
); );
const totalProgress = listingLinks.length; const totalProgress = newListingLinks.length;
let currentProgress = 0; let currentProgress = 0;
progressBar.start(totalProgress, currentProgress); progressBar.start(totalProgress, currentProgress);
const items: ListingDetails[] = []; for (const link of newListingLinks) {
for (const link of listingLinks) {
try { try {
const html = await fetchHtml(link, DELAY_MS, { const html = await fetchHtml(link, DELAY_MS, {
onRateInfo: (remaining, reset) => { onRateInfo: (remaining, reset) => {
if (remaining && reset) { if (remaining && reset) {
console.log( console.log(`\nItem - Rate limit remaining: ${remaining}, reset in: ${reset}s`);
"\n" +
`Item - Rate limit remaining: ${remaining}, reset in: ${reset}s`,
);
} }
}, },
}); });
const parsed = parseListing(html, BASE_URL); const parsed = await parseDetailedListing(html, BASE_URL, finalListingOptions);
if (parsed) { if (parsed) {
if (parsed.listingPrice?.cents) items.push(parsed); allListings.push(parsed);
} }
} catch (err) { } catch (err) {
if (err instanceof HttpError) { if (err instanceof HttpError) {
console.error( console.error(`\nFailed to fetch ${link}\n - ${err.status} ${err.message}`);
"\n" + `Failed to fetch ${link}\n - ${err.status} ${err.message}`,
);
} else { } else {
console.error( console.error(`\nFailed to fetch ${link}\n - ${String((err as Error)?.message || err)}`);
"\n" +
`Failed to fetch ${link}\n - ${String((err as Error)?.message || err)}`,
);
} }
} finally { } finally {
currentProgress++; currentProgress++;
@@ -392,6 +940,14 @@ export default async function fetchKijijiItems(
} }
} }
console.log("\n" + `Parsed ${items.length} listings.`); progressBar.stop();
return items;
// If we got fewer results than expected (40 per page), we've reached the end
if (searchResults.length < 40) {
break;
}
}
console.log(`\nParsed ${allListings.length} detailed listings.`);
return allListings;
} }
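
Review note on the retry logic in `fetchHtml`: the delay schedule produced by `calculateBackoffDelay` is easier to reason about with the jitter pinned. The following is a minimal sketch, not the code in this diff: `backoffDelay` is a hypothetical stand-in, and the injectable `random` parameter and the `capMs` parameter are assumptions added for illustration.

```typescript
// Hypothetical stand-in for calculateBackoffDelay. `random` is injectable
// so the jitter can be pinned in tests; `capMs` mirrors the 30 s cap.
function backoffDelay(
  attempt: number,
  baseMs: number,
  capMs = 30000,
  random: () => number = Math.random,
): number {
  const exponential = baseMs * 2 ** attempt; // 1x, 2x, 4x, 8x, ...
  const jitter = random() * 0.1 * exponential; // up to 10% added on top
  return Math.min(exponential + jitter, capMs);
}

// Delays for attempts 0..3 with a 1000 ms base and jitter pinned to zero.
const noJitterDelays = [0, 1, 2, 3].map((attempt) =>
  backoffDelay(attempt, 1000, 30000, () => 0),
);
console.log(noJitterDelays); // [1000, 2000, 4000, 8000]
```

With the defaults in this diff (`maxRetries = 3`, `retryBaseMs = 1000`), the cap is only reached from attempt 5 onward, so it would matter only if callers raise `maxRetries`.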

162
test/kijiji-core.test.ts Normal file

@@ -0,0 +1,162 @@
import { describe, test, expect } from "bun:test";
import {
resolveLocationId,
resolveCategoryId,
buildSearchUrl,
HttpError,
NetworkError,
ParseError,
RateLimitError,
ValidationError
} from "../src/kijiji";
describe("Location and Category Resolution", () => {
describe("resolveLocationId", () => {
test("should return numeric IDs as-is", () => {
expect(resolveLocationId(1700272)).toBe(1700272);
expect(resolveLocationId(0)).toBe(0);
});
test("should resolve string location names", () => {
expect(resolveLocationId("canada")).toBe(0);
expect(resolveLocationId("ontario")).toBe(9004);
expect(resolveLocationId("toronto")).toBe(1700273);
expect(resolveLocationId("gta")).toBe(1700272);
});
test("should handle case insensitive matching", () => {
expect(resolveLocationId("Canada")).toBe(0);
expect(resolveLocationId("ONTARIO")).toBe(9004);
});
test("should default to Canada for unknown locations", () => {
expect(resolveLocationId("unknown")).toBe(0);
expect(resolveLocationId("")).toBe(0);
});
test("should handle undefined input", () => {
expect(resolveLocationId(undefined)).toBe(0);
});
});
describe("resolveCategoryId", () => {
test("should return numeric IDs as-is", () => {
expect(resolveCategoryId(132)).toBe(132);
expect(resolveCategoryId(0)).toBe(0);
});
test("should resolve string category names", () => {
expect(resolveCategoryId("all")).toBe(0);
expect(resolveCategoryId("phones")).toBe(132);
expect(resolveCategoryId("electronics")).toBe(29659001);
expect(resolveCategoryId("buy-sell")).toBe(10);
});
test("should handle case insensitive matching", () => {
expect(resolveCategoryId("All")).toBe(0);
expect(resolveCategoryId("PHONES")).toBe(132);
});
test("should default to all categories for unknown categories", () => {
expect(resolveCategoryId("unknown")).toBe(0);
expect(resolveCategoryId("")).toBe(0);
});
test("should handle undefined input", () => {
expect(resolveCategoryId(undefined)).toBe(0);
});
});
});
describe("URL Construction", () => {
describe("buildSearchUrl", () => {
test("should build basic search URL", () => {
const url = buildSearchUrl("iphone", {
location: 1700272,
category: 132,
sortBy: 'relevancy',
sortOrder: 'desc',
});
expect(url).toContain("b-buy-sell/canada/iphone/k0c132l1700272");
expect(url).toContain("sort=relevancyDesc");
expect(url).toContain("order=DESC");
});
test("should handle pagination", () => {
const url = buildSearchUrl("iphone", {
location: 1700272,
category: 132,
page: 2,
});
expect(url).toContain("&page=2");
});
test("should handle different sort options", () => {
const dateUrl = buildSearchUrl("iphone", {
sortBy: 'date',
sortOrder: 'asc',
});
expect(dateUrl).toContain("sort=DATE");
expect(dateUrl).toContain("order=ASC");
const priceUrl = buildSearchUrl("iphone", {
sortBy: 'price',
sortOrder: 'desc',
});
expect(priceUrl).toContain("sort=PRICE");
expect(priceUrl).toContain("order=DESC");
});
test("should handle string location/category inputs", () => {
const url = buildSearchUrl("iphone", {
location: "toronto",
category: "phones",
});
expect(url).toContain("k0c132l1700273"); // phones + toronto
});
});
});
describe("Error Classes", () => {
test("HttpError should store status and URL", () => {
const error = new HttpError("Not found", 404, "https://example.com");
expect(error.message).toBe("Not found");
expect(error.status).toBe(404);
expect(error.url).toBe("https://example.com");
expect(error.name).toBe("HttpError");
});
test("NetworkError should store URL and cause", () => {
const cause = new Error("Connection failed");
const error = new NetworkError("Network error", "https://example.com", cause);
expect(error.message).toBe("Network error");
expect(error.url).toBe("https://example.com");
expect(error.cause).toBe(cause);
expect(error.name).toBe("NetworkError");
});
test("ParseError should store data", () => {
const data = { invalid: "json" };
const error = new ParseError("Invalid JSON", data);
expect(error.message).toBe("Invalid JSON");
expect(error.data).toBe(data);
expect(error.name).toBe("ParseError");
});
test("RateLimitError should store URL and reset time", () => {
const error = new RateLimitError("Rate limited", "https://example.com", 60);
expect(error.message).toBe("Rate limited");
expect(error.url).toBe("https://example.com");
expect(error.resetTime).toBe(60);
expect(error.name).toBe("RateLimitError");
});
test("ValidationError should store message and name", () => {
const error = new ValidationError("Invalid value");
expect(error.message).toBe("Invalid value");
expect(error.name).toBe("ValidationError");
});
});
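
Review note on `buildSearchUrl`: the tests above assert both `sort=relevancyDesc` and `sort=DATE` on the same URL, which means the query string carries two `sort` keys. If that is not intended, one option is to build the query with `URLSearchParams` so each key is set once. This is a sketch under assumptions: `SketchOptions`, `SORT_VALUES`, and `buildQuery` are hypothetical names, the `relevancy` mapping value is a guess, and which parameters Kijiji actually honors is not established by this diff.

```typescript
// Hypothetical option shape; the real SearchOptions type is not shown here.
type SortKey = "relevancy" | "date" | "price";
interface SketchOptions {
  sortBy?: SortKey;
  sortOrder?: "asc" | "desc";
  page?: number;
}

// Assumed mapping; the tests only confirm the DATE and PRICE values.
const SORT_VALUES: Record<SortKey, string> = {
  relevancy: "relevancyDesc",
  date: "DATE",
  price: "PRICE",
};

// Builds the query string with exactly one value per key.
function buildQuery(options: SketchOptions): string {
  const params = new URLSearchParams();
  params.set("sort", SORT_VALUES[options.sortBy ?? "relevancy"]);
  params.set("view", "list");
  params.set("order", options.sortOrder === "asc" ? "ASC" : "DESC");
  if (options.page !== undefined && options.page > 1) {
    params.set("page", String(options.page));
  }
  return `?${params.toString()}`;
}

console.log(buildQuery({ sortBy: "date", sortOrder: "asc", page: 2 }));
// ?sort=DATE&view=list&order=ASC&page=2
```

Adopting this would require dropping the `sort=relevancyDesc` assertion from the sort-option tests, since only one `sort` value would remain.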


@@ -0,0 +1,337 @@
import { describe, test, expect, beforeEach, afterEach, mock } from "bun:test";
import { extractApolloState, parseSearch, parseDetailedListing } from "../src/kijiji";
// Mock fetch globally
const originalFetch = global.fetch;
describe("HTML Parsing Integration", () => {
beforeEach(() => {
// Mock fetch for all tests
global.fetch = mock(() => {
throw new Error("fetch should be mocked in individual tests");
});
});
afterEach(() => {
global.fetch = originalFetch;
});
describe("extractApolloState", () => {
test("should extract Apollo state from valid HTML", () => {
const mockHtml = '<html><head><script id="__NEXT_DATA__" type="application/json">{"props":{"pageProps":{"__APOLLO_STATE__":{"ROOT_QUERY":{"test":"value"}}}}}</script></head></html>';
const result = extractApolloState(mockHtml);
expect(result).toEqual({
ROOT_QUERY: { test: "value" }
});
});
test("should return null for HTML without Apollo state", () => {
const mockHtml = '<html><body>No data here</body></html>';
const result = extractApolloState(mockHtml);
expect(result).toBeNull();
});
test("should return null for malformed JSON", () => {
const mockHtml = '<html><script id="__NEXT_DATA__" type="application/json">{"invalid": json}</script></html>';
const result = extractApolloState(mockHtml);
expect(result).toBeNull();
});
test("should handle missing __NEXT_DATA__ element", () => {
const mockHtml = '<html><body><div>Content</div></body></html>';
const result = extractApolloState(mockHtml);
expect(result).toBeNull();
});
});
describe("parseSearch", () => {
test("should parse search results from HTML", () => {
const mockHtml = `
<html>
<script id="__NEXT_DATA__" type="application/json">
${JSON.stringify({
props: {
pageProps: {
__APOLLO_STATE__: {
"Listing:123": {
url: "/v-iphone/k0l0",
title: "iPhone 13 Pro",
},
"Listing:456": {
url: "/v-samsung/k0l0",
title: "Samsung Galaxy",
},
"ROOT_QUERY": { test: "value" }
}
}
}
})}
</script>
</html>
`;
const results = parseSearch(mockHtml, "https://www.kijiji.ca");
expect(results).toHaveLength(2);
expect(results[0]).toEqual({
name: "iPhone 13 Pro",
listingLink: "https://www.kijiji.ca/v-iphone/k0l0"
});
expect(results[1]).toEqual({
name: "Samsung Galaxy",
listingLink: "https://www.kijiji.ca/v-samsung/k0l0"
});
});
test("should handle absolute URLs", () => {
const mockHtml = `
<html>
<script id="__NEXT_DATA__" type="application/json">
${JSON.stringify({
props: {
pageProps: {
__APOLLO_STATE__: {
"Listing:123": {
url: "https://www.kijiji.ca/v-iphone/k0l0",
title: "iPhone 13 Pro",
}
}
}
}
})}
</script>
</html>
`;
const results = parseSearch(mockHtml, "https://www.kijiji.ca");
expect(results[0].listingLink).toBe("https://www.kijiji.ca/v-iphone/k0l0");
});
test("should filter out invalid listings", () => {
const mockHtml = `
<html>
<script id="__NEXT_DATA__" type="application/json">
${JSON.stringify({
props: {
pageProps: {
__APOLLO_STATE__: {
"Listing:123": {
url: "/v-iphone/k0l0",
title: "iPhone 13 Pro",
},
"Listing:456": {
url: "/v-samsung/k0l0",
// Missing title
},
"Other:789": {
url: "/v-other/k0l0",
title: "Other Item",
}
}
}
}
})}
</script>
</html>
`;
const results = parseSearch(mockHtml, "https://www.kijiji.ca");
expect(results).toHaveLength(1);
expect(results[0].name).toBe("iPhone 13 Pro");
});
test("should return empty array for invalid HTML", () => {
const results = parseSearch("<html><body>Invalid</body></html>", "https://www.kijiji.ca");
expect(results).toEqual([]);
});
});
describe("parseDetailedListing", () => {
test("should parse detailed listing with all fields", async () => {
const mockHtml = `
<html>
<script id="__NEXT_DATA__" type="application/json">
${JSON.stringify({
props: {
pageProps: {
__APOLLO_STATE__: {
"Listing:123": {
url: "/v-iphone-13-pro/k0l0",
title: "iPhone 13 Pro 256GB",
description: "Excellent condition iPhone 13 Pro",
price: {
amount: 80000,
currency: "CAD",
type: "FIXED"
},
type: "OFFER",
status: "ACTIVE",
activationDate: "2024-01-15T10:00:00.000Z",
endDate: "2025-01-15T10:00:00.000Z",
metrics: { views: 150 },
location: {
address: "Toronto, ON",
id: 1700273,
name: "Toronto",
coordinates: {
latitude: 43.6532,
longitude: -79.3832
}
},
imageUrls: [
"https://media.kijiji.ca/api/v1/image1.jpg",
"https://media.kijiji.ca/api/v1/image2.jpg"
],
imageCount: 2,
categoryId: 132,
adSource: "ORGANIC",
flags: {
topAd: false,
priceDrop: true
},
posterInfo: {
posterId: "user123",
rating: 4.8
},
attributes: [
{ canonicalName: "forsaleby", canonicalValues: ["ownr"] },
{ canonicalName: "phonecarrier", canonicalValues: ["unlocked"] }
]
}
}
}
}
})}
</script>
</html>
`;
const result = await parseDetailedListing(mockHtml, "https://www.kijiji.ca");
expect(result).toEqual({
url: "https://www.kijiji.ca/v-iphone-13-pro/k0l0",
title: "iPhone 13 Pro 256GB",
description: "Excellent condition iPhone 13 Pro",
listingPrice: {
amountFormatted: "$800.00",
cents: 80000,
currency: "CAD"
},
listingType: "OFFER",
listingStatus: "ACTIVE",
creationDate: "2024-01-15T10:00:00.000Z",
endDate: "2025-01-15T10:00:00.000Z",
numberOfViews: 150,
address: "Toronto, ON",
images: [
"https://media.kijiji.ca/api/v1/image1.jpg",
"https://media.kijiji.ca/api/v1/image2.jpg"
],
categoryId: 132,
adSource: "ORGANIC",
flags: {
topAd: false,
priceDrop: true
},
attributes: {
forsaleby: ["ownr"],
phonecarrier: ["unlocked"]
},
location: {
id: 1700273,
name: "Toronto",
coordinates: {
latitude: 43.6532,
longitude: -79.3832
}
},
sellerInfo: {
posterId: "user123",
rating: 4.8
}
});
});
test("should return null for contact-based pricing", async () => {
const mockHtml = `
<html>
<script id="__NEXT_DATA__" type="application/json">
${JSON.stringify({
props: {
pageProps: {
__APOLLO_STATE__: {
"Listing:123": {
url: "/v-iphone/k0l0",
title: "iPhone for Sale",
price: {
type: "CONTACT",
amount: null
}
}
}
}
}
})}
</script>
</html>
`;
const result = await parseDetailedListing(mockHtml, "https://www.kijiji.ca");
expect(result).toBeNull();
});
test("should handle missing optional fields", async () => {
const mockHtml = `
<html>
<script id="__NEXT_DATA__" type="application/json">
${JSON.stringify({
props: {
pageProps: {
__APOLLO_STATE__: {
"Listing:123": {
url: "/v-iphone/k0l0",
title: "iPhone 13",
price: { amount: 50000 }
}
}
}
}
})}
</script>
</html>
`;
const result = await parseDetailedListing(mockHtml, "https://www.kijiji.ca");
expect(result).toEqual({
url: "https://www.kijiji.ca/v-iphone/k0l0",
title: "iPhone 13",
description: undefined,
listingPrice: {
amountFormatted: "$500.00",
cents: 50000,
currency: undefined
},
listingType: undefined,
listingStatus: undefined,
creationDate: undefined,
endDate: undefined,
numberOfViews: undefined,
address: null,
images: [],
categoryId: 0,
adSource: "UNKNOWN",
flags: {
topAd: false,
priceDrop: false
},
attributes: {},
location: {
id: 0,
name: "Unknown",
coordinates: undefined
},
sellerInfo: undefined
});
});
});
});
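
Review note on the fixtures above: every test repeats the same `__NEXT_DATA__` wrapper around a different Apollo state object. A small helper could remove that repetition. The sketch below is illustrative only: `makeNextDataHtml` is a hypothetical helper, and `extractStateSketch` is a regex-based stand-in written for this example, since the real `extractApolloState` is not shown in this diff and may work differently.

```typescript
// Hypothetical test helper: wraps an Apollo state object in the
// __NEXT_DATA__ script tag shape used by the fixtures.
function makeNextDataHtml(apolloState: Record<string, unknown>): string {
  const payload = JSON.stringify({
    props: { pageProps: { __APOLLO_STATE__: apolloState } },
  });
  return `<html><script id="__NEXT_DATA__" type="application/json">${payload}</script></html>`;
}

// Hypothetical stand-in for extractApolloState (assumption: regex lookup
// of the script tag, then JSON.parse; returns null on any failure).
function extractStateSketch(html: string): Record<string, unknown> | null {
  const match = /<script id="__NEXT_DATA__"[^>]*>([\s\S]*?)<\/script>/.exec(html);
  if (!match || match[1] === undefined) return null;
  try {
    const data: unknown = JSON.parse(match[1]);
    const state = (
      data as { props?: { pageProps?: { __APOLLO_STATE__?: unknown } } } | null
    )?.props?.pageProps?.__APOLLO_STATE__;
    return typeof state === "object" && state !== null && !Array.isArray(state)
      ? (state as Record<string, unknown>)
      : null;
  } catch {
    return null; // malformed JSON in the script tag
  }
}

const fixtureHtml = makeNextDataHtml({
  "Listing:123": { url: "/v-iphone/k0l0", title: "iPhone 13 Pro" },
});
console.log(Object.keys(extractStateSketch(fixtureHtml) ?? {})); // ["Listing:123"]
```

The helper assumes fixture values never contain the literal text `</script>`, which would end the tag early.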

54
test/kijiji-utils.test.ts Normal file

@@ -0,0 +1,54 @@
import { describe, test, expect } from "bun:test";
import { slugify, formatCentsToCurrency } from "../src/kijiji";
describe("Utility Functions", () => {
describe("slugify", () => {
test("should convert basic strings to slugs", () => {
expect(slugify("Hello World")).toBe("hello-world");
expect(slugify("iPhone 13 Pro")).toBe("iphone-13-pro");
});
test("should handle special characters", () => {
expect(slugify("Café & Restaurant")).toBe("cafe-restaurant");
expect(slugify("100% New")).toBe("100-new");
});
test("should handle empty and edge cases", () => {
expect(slugify("")).toBe("");
expect(slugify(" ")).toBe("-");
expect(slugify("---")).toBe("-");
});
test("should preserve numbers and valid characters", () => {
expect(slugify("iPhone 13")).toBe("iphone-13");
expect(slugify("item123")).toBe("item123");
});
});
describe("formatCentsToCurrency", () => {
test("should format valid cent values", () => {
expect(formatCentsToCurrency(100)).toBe("$1.00");
expect(formatCentsToCurrency(1999)).toBe("$19.99");
expect(formatCentsToCurrency(0)).toBe("$0.00");
});
test("should handle string inputs", () => {
expect(formatCentsToCurrency("100")).toBe("$1.00");
expect(formatCentsToCurrency("1999")).toBe("$19.99");
});
test("should handle null/undefined inputs", () => {
expect(formatCentsToCurrency(null)).toBe("");
expect(formatCentsToCurrency(undefined)).toBe("");
});
test("should handle invalid inputs", () => {
expect(formatCentsToCurrency("invalid")).toBe("");
expect(formatCentsToCurrency(Number.NaN)).toBe("");
});
test("should use en-US locale formatting", () => {
expect(formatCentsToCurrency(123456)).toBe("$1,234.56");
});
});
});
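
Review note on the `slugify` expectations: only part of `slugify` is visible in this diff, so it is not clear how `"Café & Restaurant"` becomes `"cafe-restaurant"`. One way to get that result is to normalize to NFD first, so the accent becomes a separate combining character that the loop then drops. The sketch below assumes that approach; `slugifySketch` and `SEPARATORS` are hypothetical names, and the separator set omits the empty-string entry seen in `SEPS`.

```typescript
// Assumed separator set, modeled on SEPS from the diff.
const SEPARATORS = new Set([" ", "—", "/", ":", ";", ",", ".", "-"]);

// Hypothetical slugify: keeps a-z and 0-9, collapses separator runs into a
// single hyphen, and drops every other character.
function slugifySketch(input: string): string {
  // NFD splits "é" into "e" plus a combining accent, which is dropped below.
  const s = input.normalize("NFD").toLowerCase();
  const out: string[] = [];
  let lastHyphen = false;
  for (let i = 0; i < s.length; i++) {
    const ch = s.charAt(i);
    if ((ch >= "a" && ch <= "z") || (ch >= "0" && ch <= "9")) {
      out.push(ch);
      lastHyphen = false;
    } else if (SEPARATORS.has(ch) && !lastHyphen) {
      out.push("-");
      lastHyphen = true;
    }
  }
  return out.join("");
}

console.log(slugifySketch("Café & Restaurant")); // cafe-restaurant
console.log(slugifySketch("100% New")); // 100-new
```

Leading and trailing hyphens are not trimmed here, which is consistent with the edge-case tests expecting `"-"` for separator-only input.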

12
test/setup.ts Normal file

@@ -0,0 +1,12 @@
// Test setup for Bun test runner
import { expect } from "bun:test";
// Global test setup
// This file is loaded before any tests run due to bunfig.toml preload
// Mock fetch globally for tests
global.fetch = global.fetch || (() => {
throw new Error('fetch is not available in test environment');
});
// Add any global test utilities here