# Kijiji API Findings

## Overview

Kijiji is a Canadian classifieds marketplace whose web application is built with Next.js
and Apollo GraphQL. Search results are powered by a GraphQL API with client-side state
management.

## Initial Page Load (Homepage)

- **URL**: https://www.kijiji.ca/
- **Architecture**: Server-side rendered React application with Next.js
- **Data Sources**:
  - Static assets loaded from `webapp-static.ca-kijiji-production.classifiedscloud.io`
  - Image media served from `media.kijiji.ca/api/v1/`
  - No initial API calls for listings - data appears to be embedded in HTML

## Search Results Page

- **URL Pattern**: `https://www.kijiji.ca/b-[location]/[keywords]/k0l0`
- **Example**: `https://www.kijiji.ca/b-canada/iphone/k0l0`
- **Technology Stack**: Next.js with Apollo GraphQL client
- **Data Structure**: Uses `__APOLLO_STATE__` global object containing normalized
  GraphQL cache

### GraphQL Data Structure

#### Data Location

Search results data is embedded in the Next.js page props under
`__NEXT_DATA__.props.pageProps.__APOLLO_STATE__`. The data is pre-rendered on the server
and sent to the client with the HTML.
Each page, including every pagination page, carries its own pre-rendered data.

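A minimal TypeScript sketch of reading that state out of a fetched page. It assumes the usual Next.js `<script id="__NEXT_DATA__">` JSON tag; the helper name `extractApolloStateSketch` is hypothetical and is not the project's own `extractApolloState()`.

```typescript
// Hypothetical helper for illustration; not the real extractApolloState() in src/kijiji.ts.
type ApolloState = Record<string, unknown>;

function extractApolloStateSketch(html: string): ApolloState | null {
  // Assumption: Next.js serializes page props as JSON inside <script id="__NEXT_DATA__">.
  const match = html.match(
    /<script[^>]*id="__NEXT_DATA__"[^>]*>([\s\S]*?)<\/script>/
  );
  const json = match?.[1];
  if (json === undefined) return null;

  let node: unknown;
  try {
    node = JSON.parse(json);
  } catch {
    return null; // malformed JSON payload
  }

  // Walk props.pageProps.__APOLLO_STATE__ without assuming any level exists.
  for (const key of ["props", "pageProps", "__APOLLO_STATE__"]) {
    if (typeof node !== "object" || node === null) return null;
    node = (node as Record<string, unknown>)[key];
  }
  return typeof node === "object" && node !== null
    ? (node as ApolloState)
    : null;
}
```

The function returns `null` rather than throwing, so a caller can treat a missing or reshaped payload as "no data on this page" and decide separately whether to retry.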
#### Search Results Container

The search results are stored directly in the Apollo `ROOT_QUERY` with keys following the
pattern `searchResultsPageByUrl:{url_path}` where `url_path` includes pagination
parameters.

```json
{
  "searchResultsPageByUrl:/b-buy-sell/canada/iphone/k0c10l0": { ... },
  "searchResultsPageByUrl:/b-buy-sell/canada/iphone/k0c10l0?page=2": { ... }
}
```

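The key pattern above can be used to pull the search-result containers out of the cache. The sketch below assumes the cache has already been parsed into a plain object; `findSearchResultPages` is a hypothetical name used only for illustration.

```typescript
type ApolloCache = Record<string, unknown>;

const SEARCH_KEY_PREFIX = "searchResultsPageByUrl:";

// Returns a map from URL path (the part after the prefix) to the cached container.
function findSearchResultPages(cache: ApolloCache): Map<string, unknown> {
  const pages = new Map<string, unknown>();
  const root = cache["ROOT_QUERY"];
  if (typeof root !== "object" || root === null) return pages;

  for (const [key, value] of Object.entries(root as Record<string, unknown>)) {
    if (key.startsWith(SEARCH_KEY_PREFIX)) {
      pages.set(key.slice(SEARCH_KEY_PREFIX.length), value);
    }
  }
  return pages;
}
```

Keying the result by URL path makes it easy to check that the embedded data belongs to the page that was actually requested.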
#### Pagination Handling

- Each page is server-side rendered with its own embedded data
- No client-side GraphQL requests for pagination
- URL parameter `?page=N` controls which page data is embedded
- Offset in searchString corresponds to `(page-1) * limit`

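The page-to-offset relationship is simple arithmetic. A small sketch, assuming the 40-results-per-page limit reported elsewhere in these notes (the function names are illustrative):

```typescript
const DEFAULT_LIMIT = 40; // results per page, as observed in these notes

// offset = (page - 1) * limit, with 1-based page numbers.
function pageToOffset(page: number, limit: number = DEFAULT_LIMIT): number {
  if (!Number.isInteger(page) || page < 1) {
    throw new RangeError("page must be a positive integer");
  }
  return (page - 1) * limit;
}

// Inverse mapping: which 1-based page contains a given offset.
function offsetToPage(offset: number, limit: number = DEFAULT_LIMIT): number {
  return Math.floor(offset / limit) + 1;
}
```

For example, page 3 with the default limit corresponds to offset 80, and offset 80 maps back to page 3.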
#### Search Parameters in URL

- `k0c{CATEGORY}l{LOCATION}` - Category and location IDs
- `?page=N` - Page number (1-based)
- Data contains `offset` and `limit` for API-style pagination

#### Individual Listing Structure

```json
{
  "id": "1732061412",
  "title": "iPhone 13",
  "description": "iPhone 13, always had a screen protector on it...",
  "imageCount": 3,
  "imageUrls": ["https://media.kijiji.ca/api/v1/ca-prod-fsbo-ads/images/..."],
  "categoryId": 760,
  "url": "https://www.kijiji.ca/v-cell-phone/...",
  "activationDate": "2026-01-21T16:51:16.000Z",
  "sortingDate": "2026-01-21T16:51:16.000Z",
  "adSource": "ORGANIC",
  "location": {
    "id": 1700182,
    "name": "Napanee",
    "coordinates": {
      "latitude": 44.48774,
      "longitude": -76.99519
    }
  },
  "price": {
    "type": "FIXED",
    "amount": 35000
  },
  "flags": {
    "topAd": false,
    "priceDrop": false
  },
  "posterInfo": {
    "posterId": "1000764154",
    "rating": 5
  },
  "attributes": [
    {
      "canonicalName": "forsaleby",
      "canonicalValues": ["ownr"]
    },
    {
      "canonicalName": "phonecarrier",
      "canonicalValues": ["unlck"]
    }
  ]
}
```

### URL Parameters

- `sort=MATCH` - Sort by relevance
- `order=DESC` - Descending order
- `type=OFFER` - Show offerings (not wanted ads)
- `offset=0` - Pagination offset
- `limit=40` - Results per page
- `topAdCount=6` - Number of promoted ads
- `keywords=iphone` - Search keywords
- `category=0` - Category ID (0 = All Categories)
- `location=0` - Location ID (0 = Canada)
- `eaTopAdPosition=1` - Purpose unknown

### Image API

- **Endpoint**: `https://media.kijiji.ca/api/v1/`
- **Pattern**: `/ca-prod-fsbo-ads/images/{uuid}?rule=kijijica-{size}-jpg`
- **Sizes**: 200, 300, 400, 500 pixels

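The pattern above can be turned into a small URL builder. This is a sketch under the assumption that the four listed sizes are the only valid values for the `rule` parameter; the function names are hypothetical.

```typescript
const MEDIA_BASE = "https://media.kijiji.ca/api/v1";

// Sizes listed in these notes; other values are untested.
type ImageSize = 200 | 300 | 400 | 500;

// Build an image URL from an image UUID and a size.
function buildImageUrl(imageId: string, size: ImageSize): string {
  return `${MEDIA_BASE}/ca-prod-fsbo-ads/images/${imageId}?rule=kijijica-${size}-jpg`;
}

// Swap the size rule on an existing image URL, dropping any previous query string.
function withImageSize(url: string, size: ImageSize): string {
  const base = url.split("?")[0];
  return `${base}?rule=kijijica-${size}-jpg`;
}
```

`withImageSize` is useful when a listing's `imageUrls` already carry a `rule` and a different thumbnail size is wanted.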
### Categories and Locations

#### Category Structure

Categories are hierarchical with parent-child relationships.
The main categories under “Buy & Sell” include:

| ID | Name | Total Results (iPhone search) |
| --- | --- | --- |
| 10 | Buy & Sell | 19956 |
| 12 | Arts & Collectibles | 149 |
| 767 | Audio | 481 |
| 253 | Baby Items | 13 |
| 931 | Bags & Luggage | 8 |
| 644 | Bikes | 46 |
| 109 | Books | 21 |
| 103 | Cameras & Camcorders | 101 |
| 104 | CDs, DVDs & Blu-ray | 102 |
| 274 | Clothing | 83 |
| 16 | Computers | 285 |
| 128 | Computer Accessories | 363 |
| 29659001 | Electronics | 2006 |
| 17220001 | Free Stuff | 23 |
| 235 | Furniture | 29 |
| 638 | Garage Sales | 5 |
| 140 | Health & Special Needs | 30 |
| 139 | Hobbies & Crafts | 10 |
| 107 | Home Appliances | 23 |
| 717 | Home - Indoor | 27 |
| 727 | Home Renovation Materials | 14 |
| 133 | Jewellery & Watches | 83 |
| 17 | Musical Instruments | 34 |
| 132 | Phones | 15518 |
| 111 | Sporting Goods & Exercise | 30 |
| 110 | Tools | 25 |
| 108 | Toys & Games | 38 |
| 15093001 | TVs & Video | 15 |
| 141 | Video Games & Consoles | 96 |
| 26 | Other | 286 |

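Because each category carries an `id` and a `parentId`, a flat category list can be folded into a tree. The sketch below assumes records shaped like the `searchCategories` fields documented in these notes, and assumes that a `null` or unknown `parentId` marks a root; both the types and the function name are illustrative.

```typescript
interface CategoryNode {
  id: number;
  localizedName: string;
  parentId: number | null; // assumption: null (or an unknown ID) means top level
}

interface CategoryTree extends CategoryNode {
  children: CategoryTree[];
}

function buildCategoryTree(flat: CategoryNode[]): CategoryTree[] {
  // First pass: index every category by ID with an empty child list.
  const byId = new Map<number, CategoryTree>();
  for (const category of flat) {
    byId.set(category.id, { ...category, children: [] });
  }

  // Second pass: attach each node to its parent, or treat it as a root.
  const roots: CategoryTree[] = [];
  for (const node of byId.values()) {
    const parent = node.parentId === null ? undefined : byId.get(node.parentId);
    if (parent) parent.children.push(node);
    else roots.push(node);
  }
  return roots;
}
```

Two passes keep the function independent of input order, so children may appear before their parents in the flat list.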
#### Location Structure

Locations are also hierarchical, with provinces and territories under the main “Canada”
location:

| ID | Name | Total Results (iPhone search) |
| --- | --- | --- |
| 0 | Canada | - |
| 9001 | Québec | 2516 |
| 9002 | Nova Scotia | 875 |
| 9003 | Alberta | 2317 |
| 9004 | Ontario | 12507 |
| 9005 | New Brunswick | 118 |
| 9006 | Manitoba | 919 |
| 9007 | British Columbia | 306 |
| 9008 | Newfoundland | 27 |
| 9009 | Saskatchewan | 336 |
| 9010 | Territories | 7 |
| 9011 | Prince Edward Island | 31 |

#### URL Patterns

- Categories: `/b-{category-slug}/canada/{keywords}/k0c{CATEGORY_ID}l0`
- Locations: `/b-buy-sell/{location-slug}/iphone/k0c10l{LOCATION_ID}`
- Combined:
  `/b-{category-slug}/{location-slug}/{keywords}/k0c{CATEGORY_ID}l{LOCATION_ID}`

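These patterns can be assembled programmatically. The sketch below is an assumption-laden illustration: the `slugify` rule (lowercase, non-alphanumerics collapsed to hyphens) is a guess at how keywords are slugified, and `buildSearchPath` and its option names are hypothetical. When no category is given it falls back to the all-categories form `/b-{location-slug}/{keywords}/k0l{LOCATION_ID}` seen in the examples.

```typescript
interface SearchPathOptions {
  keywords: string;
  categorySlug?: string; // e.g. "cell-phones"; omit for all categories
  categoryId?: number;   // e.g. 132; omit or 0 for all categories
  locationSlug?: string; // defaults to "canada"
  locationId?: number;   // defaults to 0 (Canada)
}

// Assumed slug rule: lowercase, runs of non-alphanumerics become a single "-".
function slugify(text: string): string {
  return text
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

function buildSearchPath(opts: SearchPathOptions): string {
  const locationSlug = opts.locationSlug ?? "canada";
  const locationId = opts.locationId ?? 0;
  const hasCategory =
    opts.categorySlug !== undefined &&
    opts.categoryId !== undefined &&
    opts.categoryId !== 0;

  // With a category: /b-{category-slug}/{location-slug}/.../k0c{id}l{id}
  // Without one:     /b-{location-slug}/.../k0l{id}
  const prefix = hasCategory
    ? `/b-${opts.categorySlug}/${locationSlug}`
    : `/b-${locationSlug}`;
  const code = hasCategory
    ? `k0c${opts.categoryId}l${locationId}`
    : `k0l${locationId}`;
  return `${prefix}/${slugify(opts.keywords)}/${code}`;
}
```

With these assumptions, `buildSearchPath({ keywords: "iphone", categorySlug: "buy-sell", categoryId: 10 })` yields `/b-buy-sell/canada/iphone/k0c10l0`, matching the cache key shown earlier.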
### Pagination

- Uses offset-based pagination
- 40 results per page
- Total count provided in pagination metadata

## Authentication & User Management

- **Authentication System**: OAuth2-based using CIS (Customer Identity Service)
- **Identity Provider**: `id.kijiji.ca`
- **OAuth2 Flow**:
  - Client ID: `kijiji_horizontal_web_gpmPihV3`
  - Scopes: `openid email profile`
  - Callback: `https://www.kijiji.ca/api/auth/callback/cis`
- **Session Management**: Cookie-based with encrypted session data
- **Anonymous Access**: Full search functionality available without login
- **User Features**: Saved searches, messaging, and flagging require authentication

## Posting API

- **Posting Flow**: Requires authentication, redirects to login if not authenticated
- **Posting URL**: `https://www.kijiji.ca/p-post-ad.html`
- **Authentication Required**: Yes, redirects to `/consumer/login` for unauthenticated
  users
- **Post-Creation**: Likely uses authenticated GraphQL mutations (not observed in
  anonymous browsing)

## GraphQL API Endpoint

- **URL**: `https://www.kijiji.ca/anvil/api`
- **Method**: POST
- **Content-Type**: application/json
- **Headers**:
  - `apollo-require-preflight: true`
  - Standard CORS headers
- **Authentication**: No authentication required for basic queries (uses cookies for
  session tracking)
- **Technology**: Apollo GraphQL server

### Sample GraphQL Queries Discovered

#### Get Search Categories

```graphql
query getSearchCategories($locale: String!) {
  searchCategories {
    id
    localizedName(locale: $locale)
    parentId
    __typename
  }
}
```

Variables: `{"locale": "en-CA"}`

Response includes hierarchical category structure with IDs and localized names.

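A sketch of how such a query could be packaged for the endpoint described above. It assumes the conventional GraphQL-over-HTTP JSON body (`operationName`, `query`, `variables`), which these notes do not confirm for this server; `buildGraphQLRequest` is a hypothetical helper, and whether the request succeeds without session cookies is untested here.

```typescript
const GRAPHQL_ENDPOINT = "https://www.kijiji.ca/anvil/api";

interface GraphQLRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Assumption: the server accepts the conventional { operationName, query, variables } body.
function buildGraphQLRequest(
  operationName: string,
  query: string,
  variables: Record<string, unknown> = {}
): GraphQLRequest {
  return {
    url: GRAPHQL_ENDPOINT,
    method: "POST",
    headers: {
      "content-type": "application/json",
      "apollo-require-preflight": "true", // header observed in these notes
    },
    body: JSON.stringify({ operationName, query, variables }),
  };
}

const GET_SEARCH_CATEGORIES = `
  query getSearchCategories($locale: String!) {
    searchCategories { id localizedName(locale: $locale) parentId __typename }
  }`;

const categoriesRequest = buildGraphQLRequest(
  "getSearchCategories",
  GET_SEARCH_CATEGORIES,
  { locale: "en-CA" }
);
```

Building the request as plain data keeps the sketch free of network calls; in an environment with a global `fetch`, it could be sent with `fetch(categoriesRequest.url, categoriesRequest)`.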
#### Get Geocode from IP (fails for current IP)

```graphql
query GetGeocodeReverseFromIp {
  geocodeReverseFromIp {
    city
    province
    locationId
    __typename
  }
}
```

This query failed for the IP address used during testing, which suggests that
geolocation-based features may be unavailable or limited to certain IP ranges.

#### Get Category Path

```graphql
query GetCategoryPath($categoryId: Int!, $locale: String, $locationId: Int) {
  category(id: $categoryId) {
    id
    localizedName(locale: $locale)
    parentId
    searchSeoUrl(locationId: $locationId)
    categoryPaths {
      id
      localizedName(locale: $locale)
      parentId
      searchSeoUrl(locationId: $locationId)
      __typename
    }
    __typename
  }
}
```

Variables: `{"categoryId": 10, "locationId": 0, "locale": "en-CA"}`

## Latest Findings (2026-01-21)

### Client-Side GraphQL Queries Observed

- **getSearchCategories**: Retrieves category hierarchy for search filters
- **GetGeocodeReverseFromIp**: Attempts to geolocate user (fails for current IP)

### GraphQL Schema Insights

Testing direct GraphQL queries revealed:
- Field “searchResults” does not exist on Query type
- Suggested alternatives: “searchResultsPage” or “searchUrl”
- This suggests that search is exposed through differently named operations (such as
  “searchResultsPage”) rather than a plain “searchResults” field

The embedded Apollo state approach appears to be the primary method for accessing search
data, with GraphQL used for auxiliary operations like categories and geolocation.

### Server-Side Rendering Architecture

Search results are fully server-side rendered with data embedded in HTML. Each page
(including pagination) contains its own pre-rendered data.
No client-side GraphQL requests are made for:

- Initial search results
- Pagination navigation
- Search result data

### Network Analysis Findings

- GraphQL endpoint: `https://www.kijiji.ca/anvil/api`
- Method: POST
- Content-Type: application/json
- Headers include: `apollo-require-preflight: true`
- Cookies required for session tracking

### Embedded Data Structure

Search results data is embedded in the HTML within the Next.js
`__NEXT_DATA__.props.pageProps.__APOLLO_STATE__` object.
The data includes:

- Individual ad listings with complete metadata
- Pagination information
- Filter options and counts
- Category/location hierarchies

### Current Scraper Implementation

The existing `src/kijiji.ts` implementation correctly parses the embedded Apollo state:

- Uses `extractApolloState()` to parse `__NEXT_DATA__` from HTML
- Filters Apollo keys containing “Listing” to find ad data
- Extracts `url`, `title`, and other metadata from each listing
- Successfully scrapes listings without needing API authentication

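The key-filtering step described above might look roughly like the following. This is an illustrative sketch, not the code in `src/kijiji.ts`; `collectListings` and `ListingSummary` are hypothetical names, and only the `id`, `title`, and `url` fields from the listing structure documented earlier are read.

```typescript
interface ListingSummary {
  id: string;
  title: string;
  url: string;
}

// Scan the normalized Apollo cache and keep entries whose key mentions "Listing".
function collectListings(cache: Record<string, unknown>): ListingSummary[] {
  const listings: ListingSummary[] = [];
  for (const [key, value] of Object.entries(cache)) {
    if (!key.includes("Listing")) continue;
    if (typeof value !== "object" || value === null) continue;

    const { id, title, url } = value as Record<string, unknown>;
    // Skip entries that lack the fields we need instead of emitting partial data.
    if (typeof id === "string" && typeof title === "string" && typeof url === "string") {
      listings.push({ id, title, url });
    }
  }
  return listings;
}
```

Validating field types before pushing keeps unrelated cache entries whose keys happen to contain "Listing" out of the result.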
### Authentication Status

- **Search functionality**: No authentication required - all search and listing data
  accessible anonymously
- **Posting functionality**: Requires authentication (redirects to login)
- **User features**: Saved searches and messaging require authentication
- **Rate limiting**: May apply but not observed in anonymous browsing

### Pagination Implementation

- Each page is a separate server-rendered route
- URL pattern: `/b-{location}/{keywords}/page-{number}/k0c{category_id}l{location_id}`
  (the `c{category_id}` segment is omitted for all categories)
- No client-side pagination API calls
- 40 results per page (observed)
- Example: `/b-canada/iphone/page-2/k0l0` for page 2 of iPhone search

## URL Pattern Analysis

### Search URL Structure

`https://www.kijiji.ca/b-{category_slug}/{location_slug}/{keywords}/k0c{category_id}l{location_id}`

#### Examples Observed:

- All categories, Canada: `/b-canada/iphone/k0l0` (category segment omitted = All
  Categories, `l0` = Canada)
- Cell phones category: `/b-cell-phones/canada/iphone/k0c132l0` (`c132` = Cell Phones)
- With pagination: `/b-canada/iphone/page-2/k0l0`

#### URL Components:

- `c{CATEGORY_ID}`: Category ID (0 = All Categories, 132 = Cell Phones, etc.)
- `l{LOCATION_ID}`: Location ID (0 = Canada, 1700272 = GTA, etc.)
- `page-{N}`: Pagination (1-based, optional)
- Keywords are slugified in URL path

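The components above can be recovered from a URL path with two regular expressions. This sketch treats a missing `c` segment as category 0 and a missing page marker as page 1, and it accepts both pagination forms that appear in these notes (`/page-N/` in the path and `?page=N` in the query); `parseSearchPath` is a hypothetical name.

```typescript
interface SearchPathParts {
  categoryId: number; // 0 when the c{ID} segment is absent
  locationId: number;
  page: number;       // 1 when no page marker is present
}

function parseSearchPath(path: string): SearchPathParts | null {
  // Matches "/k0l0", "/k0c132l0", optionally followed by "/", "?", "#" or end of string.
  const code = path.match(/\/k0(?:c(\d+))?l(\d+)(?:[\/?#]|$)/);
  if (!code) return null;

  // Accept either "/page-2/" in the path or "?page=2" in the query string.
  const pageMatch =
    path.match(/\/page-(\d+)\//) ?? path.match(/[?&]page=(\d+)/);

  return {
    categoryId: code[1] !== undefined ? Number(code[1]) : 0,
    locationId: Number(code[2]),
    page: pageMatch ? Number(pageMatch[1]) : 1,
  };
}
```

Returning `null` for paths without a `k0...l...` code lets callers distinguish search pages from listing pages or other routes.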
### Current Implementation Status

The existing scraper in `src/kijiji.ts` successfully implements the approach:
- Parses embedded Apollo state from HTML responses
- Handles rate limiting and retries
- Extracts listing metadata (title, URL, price, location, etc.)
- Works without authentication for search operations

## Listing Details Page

### Overview

Like search results, listing details pages use server-side rendering with Apollo GraphQL
state embedded in the HTML. No dedicated API endpoint serves individual listing
data - all information is pre-rendered on the server.

### Data Architecture

- **Server-Side Rendering**: Each listing page is fully server-rendered with data
  embedded in HTML
- **Embedded Apollo State**: Listing data is stored in
  `__NEXT_DATA__.props.pageProps.__APOLLO_STATE__`
- **Client-Side GraphQL**: Additional data (categories, campaigns, similar listings,
  user profiles) fetched via GraphQL API

### Listing Data Structure

The main listing data follows the same pattern as search results:

```json
{
  "id": "1705585530",
  "title": "We Pay top cash for iPhone 17 pro max, iPhone 17 pro, iPhone Air",
  "description": "Buying All Brand new Apple iPhones sealed/Unsealed...",
  "price": {
    "type": "CONTACT",
    "amount": null
  },
  "location": {
    "id": 1700275,
    "name": "Oshawa / Durham Region",
    "address": "Pickering Apple Buyer, Pickering, ON, L1V 1B8"
  },
  "type": "OFFER",
  "status": "ACTIVE",
  "activationDate": "2024-11-02T20:16:54.000Z",
  "endDate": "3000-01-01T00:00:00.000Z",
  "metrics": {
    "views": 1720
  },
  "posterInfo": {
    "posterId": "1044934581",
    "rating": null
  },
  "attributes": [
    {
      "canonicalName": "forsaleby",
      "canonicalValues": ["business"]
    },
    {
      "canonicalName": "phonecarrier",
      "canonicalValues": ["unlocked"]
    }
  ]
}
```

### Client-Side GraphQL Queries

When loading a listing details page, the following GraphQL queries are executed:

#### 1. getSearchCategories

- **Purpose**: Category hierarchy for navigation
- **Variables**: `{"locale": "en-CA"}`
- **Response**: Hierarchical category structure

#### 2. getCampaignsForVip

- **Purpose**: Advertisement targeting data
- **Variables**:
  `{"placement": "vip", "locationId": 1700275, "categoryId": 760, "platform": "desktop"}`
- **Response**: Campaign/ads data (usually null)

#### 3. GetReviewSummary

- **Purpose**: Seller review statistics
- **Variables**: `{"userId": "1044934581"}`
- **Response**: Review count and score (usually 0 for new sellers)

#### 4. GetProfileMetrics

- **Purpose**: Seller profile information
- **Variables**: `{"profileId": "1044934581"}`
- **Response**: Member since date, account type

#### 5. GetListingsSimilar

- **Purpose**: Similar listings for cross-selling
- **Variables**: `{"listingId": "1705585530", "limit": 10, "isExternalId": false}`
- **Response**: Array of similar listings with basic metadata

#### 6. GetGeocodeReverseFromIp

- **Purpose**: Geolocation-based features
- **Variables**: `{}`
- **Response**: Fails with 404 for most IPs

### Implementation Status

The existing `parseListing()` function in `src/kijiji.ts` successfully extracts listing
details from embedded Apollo state:

- ✅ Extracts title, description, price, location
- ✅ Handles contact-based pricing ("Please Contact")
- ✅ Parses creation date, view count, listing status
- ✅ Extracts seller information and address
- ✅ Works without authentication or API keys

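Contact-based pricing can be handled with a small formatter over the `price` object shown in the listing examples. This is a sketch, not the logic inside `parseListing()`. It rests on one unverified assumption: that `amount` is expressed in cents, so `35000` would mean $350.00. The notes do not state the unit, so check it against a live listing before relying on it.

```typescript
interface ListingPrice {
  type: string;          // "FIXED" and "CONTACT" are the values seen in these notes
  amount: number | null; // ASSUMPTION: integer cents; the unit is not confirmed
}

function formatPrice(price: ListingPrice): string {
  // Listings with type "CONTACT" carry a null amount in the observed data.
  if (price.type === "CONTACT" || price.amount === null) {
    return "Please Contact";
  }
  // Convert assumed cents to dollars with two decimals.
  return `$${(price.amount / 100).toFixed(2)}`;
}
```

Checking `amount === null` in addition to the type guards against price types not seen during this investigation.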
### Key Findings

1. **No Dedicated Listing API**: Unlike search results, there’s no separate GraphQL
   query for individual listing data
2. **Complete Data Available**: All listing information is embedded in the initial HTML
   response
3. **Additional Context Fetched**: Secondary GraphQL queries provide complementary data
   (reviews, similar listings)
4. **Consistent Architecture**: Same Apollo state embedding pattern as search pages

### Current Scraper Implementation

The scraper successfully extracts listing details by:
1. Fetching the listing URL HTML
2. Parsing embedded `__NEXT_DATA__` Apollo state
3. Extracting the `Listing:{id}` object from Apollo cache
4. Mapping fields to the typed `ListingDetails` interface

This approach works reliably without requiring authentication, and no rate limiting was
encountered on individual listing fetches.

## Next Steps

- Explore posting/authentication APIs (requires user login)
- Investigate whether the GraphQL API can be used for programmatic access with proper
  authentication
- Test rate limiting patterns and optimal scraping strategies
- Document additional category and location ID mappings