chore: format markdown
Signed-off-by: Dmytro Stanchiev <git@dmytros.dev>

KIJIJI.md

# Kijiji API Findings

## Overview

Kijiji is a Canadian classifieds marketplace built as a modern web application with Next.js and Apollo GraphQL. Its search results are powered by a GraphQL API, with client-side state managed by Apollo.

## Initial Page Load (Homepage)

- **URL**: https://www.kijiji.ca/
- **Architecture**: Server-side rendered React application with Next.js
- **Data Sources**:
  - …
- No initial API calls for listings; data appears to be embedded in the HTML

## Search Results Page

- **URL Pattern**: `https://www.kijiji.ca/b-[location]/[keywords]/k0l0`
- **Example**: `https://www.kijiji.ca/b-canada/iphone/k0l0`
- **Technology Stack**: Next.js with Apollo GraphQL client
- **Data Structure**: Uses the `__APOLLO_STATE__` global object, which contains the normalized GraphQL cache

### GraphQL Data Structure

#### Data Location

Search results data is embedded in the Next.js page props under `__NEXT_DATA__.props.pageProps.__APOLLO_STATE__`. The data is pre-rendered on the server and sent to the client. Each page (including pagination) has its own pre-rendered data.
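
As a minimal sketch of reading this data, the function below pulls the Apollo cache out of a page's HTML. It assumes Next.js embeds the page props in a `<script id="__NEXT_DATA__" type="application/json">` tag. The names `readApolloState` and `ApolloState` are hypothetical, and this is not the `extractApolloState()` helper in `src/kijiji.ts`.

```typescript
// Hypothetical shape of the normalized Apollo cache: cache key -> entity object.
type ApolloState = Record<string, Record<string, unknown>>;

// Captures the JSON body of the <script id="__NEXT_DATA__"> tag that Next.js
// emits. A regex is enough for a sketch; a real HTML parser is more robust.
const NEXT_DATA_RE = /<script[^>]*\bid="__NEXT_DATA__"[^>]*>([\s\S]*?)<\/script>/;

function readApolloState(html: string): ApolloState | null {
  const match = NEXT_DATA_RE.exec(html);
  if (!match) return null; // not a Next.js page, or the markup changed

  let nextData: unknown;
  try {
    nextData = JSON.parse(match[1]);
  } catch {
    return null; // embedded payload was not valid JSON
  }

  // Walk __NEXT_DATA__.props.pageProps.__APOLLO_STATE__ defensively.
  const props = (nextData as { props?: { pageProps?: { __APOLLO_STATE__?: unknown } } } | null)
    ?.props;
  const state = props?.pageProps?.__APOLLO_STATE__;
  return state !== null && typeof state === "object" ? (state as ApolloState) : null;
}
```

Returning `null` instead of throwing lets a caller treat "page layout changed" as a soft failure and decide whether to retry or log it.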

#### Search Results Container

The search results are stored directly in the Apollo `ROOT_QUERY`, under keys following the pattern `searchResultsPageByUrl:{url_path}`, where `url_path` includes the pagination parameters.

```json
{
  …
}
```
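
A sketch of locating these entries, assuming each key is the `searchResultsPageByUrl:` prefix followed directly by the URL path as described above. Apollo may serialize the argument part of a field key differently (for instance as JSON), so the exact key format should be checked against a captured page. `findSearchResultsPages` is a hypothetical helper.

```typescript
type ApolloState = Record<string, Record<string, unknown>>;

// Key prefix described above for search results entries in ROOT_QUERY.
const SEARCH_PAGE_PREFIX = "searchResultsPageByUrl:";

// Returns every search-results entry in ROOT_QUERY along with the URL path
// encoded in its key (which includes any pagination parameters).
function findSearchResultsPages(
  state: ApolloState,
): Array<{ urlPath: string; value: unknown }> {
  const root = state["ROOT_QUERY"] ?? {};
  return Object.entries(root)
    .filter(([key]) => key.startsWith(SEARCH_PAGE_PREFIX))
    .map(([key, value]) => ({ urlPath: key.slice(SEARCH_PAGE_PREFIX.length), value }));
}
```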

#### Pagination Handling

- Each page is server-side rendered with its own embedded data
- No client-side GraphQL requests are made for pagination
- The URL parameter `?page=N` controls which page's data is embedded
- The offset in `searchString` corresponds to `(page - 1) * limit`
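
The offset arithmetic can be expressed as two small helpers. This sketch assumes a page size of 40 (the value noted under Pagination) and 1-based page numbers; `pageToOffset` and `offsetToPage` are illustrative names.

```typescript
// Page size noted under Pagination (40 results per page).
const PAGE_SIZE = 40;

// 1-based page number -> offset, following offset = (page - 1) * limit.
function pageToOffset(page: number, limit = PAGE_SIZE): number {
  if (!Number.isInteger(page) || page < 1) {
    throw new RangeError(`page must be a positive integer, got ${page}`);
  }
  return (page - 1) * limit;
}

// Offset -> the 1-based page it falls on.
function offsetToPage(offset: number, limit = PAGE_SIZE): number {
  return Math.floor(offset / limit) + 1;
}
```

For example, page 3 maps to offset 80, and offset 80 maps back to page 3.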

#### Search Parameters in URL

- `k0c{CATEGORY}l{LOCATION}` - Category and location IDs
- `?page=N` - Page number (1-based)
- The embedded data contains `offset` and `limit` for API-style pagination
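
A sketch for recovering the IDs from a search URL. It assumes the `k0…` code is always the last path segment and that a missing `c{ID}` means All Categories (as in `/b-canada/iphone/k0l0`); `parseSearchIds` is a hypothetical helper.

```typescript
interface SearchIds {
  categoryId: number; // 0 when the URL has no c{ID} part (All Categories)
  locationId: number;
}

// Parses the trailing k0[c{CATEGORY}]l{LOCATION} segment of a search URL.
// Relative paths are resolved against the Kijiji origin.
function parseSearchIds(url: string): SearchIds | null {
  const { pathname } = new URL(url, "https://www.kijiji.ca");
  const last = pathname.split("/").filter(Boolean).pop() ?? "";
  const m = /^k0(?:c(\d+))?l(\d+)$/.exec(last);
  if (!m) return null;
  return {
    categoryId: m[1] !== undefined ? Number(m[1]) : 0,
    locationId: Number(m[2]),
  };
}
```

Query strings such as `?page=N` are ignored because only the path is inspected.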

#### Individual Listing Structure

```json
{
  "id": "1732061412",
  …
}
```

### URL Parameters

- `sort=MATCH` - Sort by relevance
- `order=DESC` - Descending order
- `type=OFFER` - Show offerings (not wanted ads)
- …
- `eaTopAdPosition=1` - Purpose unknown
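
To attach these query parameters programmatically, the standard `URL`/`URLSearchParams` API is enough. The sketch only models the three values documented above; `withSearchParams` is a hypothetical helper, and other values for these parameters are untested.

```typescript
// Values observed on search URLs; other values are untested.
type SearchQueryOptions = {
  sort?: "MATCH"; // sort by relevance
  order?: "DESC"; // descending order
  type?: "OFFER"; // offerings, not wanted ads
};

// Sets the observed parameters on a search URL, keeping any existing ones
// (such as ?page=N) intact.
function withSearchParams(url: string, opts: SearchQueryOptions): string {
  const u = new URL(url);
  for (const [key, value] of Object.entries(opts)) {
    if (value !== undefined) u.searchParams.set(key, value);
  }
  return u.toString();
}
```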

### Image API

- **Endpoint**: `https://media.kijiji.ca/api/v1/`
- **Pattern**: `/ca-prod-fsbo-ads/images/{uuid}?rule=kijijica-{size}-jpg`
- **Sizes**: 200, 300, 400, 500 pixels
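
A sketch that joins the endpoint and pattern into a full image URL, assuming the pattern path is appended directly to the `api/v1` endpoint and that only the observed sizes are valid. `imageUrl` is a hypothetical helper, and its default size of 400 is an arbitrary choice.

```typescript
// Endpoint and path pattern observed for listing images.
const MEDIA_BASE = "https://media.kijiji.ca/api/v1";
const OBSERVED_SIZES = [200, 300, 400, 500] as const;
type ImageSize = (typeof OBSERVED_SIZES)[number];

// Builds the URL for one image UUID at one of the observed sizes.
function imageUrl(uuid: string, size: ImageSize = 400): string {
  const path = `/ca-prod-fsbo-ads/images/${encodeURIComponent(uuid)}`;
  return `${MEDIA_BASE}${path}?rule=kijijica-${size}-jpg`;
}
```

Restricting `size` to a union type makes an unobserved size such as 250 a compile-time error rather than a broken link.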

### Categories and Locations

#### Category Structure

Categories are hierarchical, with parent-child relationships. The main categories under "Buy & Sell" include:

| ID | Name | Total Results (iPhone search) |
| --- | --- | --- |
| 10 | Buy & Sell | 19956 |
| 12 | Arts & Collectibles | 149 |
| 767 | Audio | 481 |
| … | … | … |
| 26 | Other | 286 |

#### Location Structure

Locations are also hierarchical, with provinces and territories nested under the top-level "Canada" location:

| ID | Name | Total Results (iPhone search) |
| --- | --- | --- |
| 0 | Canada | - |
| 9001 | Québec | 2516 |
| 9002 | Nova Scotia | 875 |
| … | … | … |
| 9011 | Prince Edward Island | 31 |

#### URL Patterns

- Categories: `/b-{category-slug}/canada/{keywords}/k0c{CATEGORY_ID}l0`
- Locations: `/b-buy-sell/{location-slug}/iphone/k0c10l{LOCATION_ID}` (observed with the Buy & Sell category and the keyword `iphone`)
- Combined: `/b-{category-slug}/{location-slug}/{keywords}/k0c{CATEGORY_ID}l{LOCATION_ID}`

### Pagination

- Uses offset-based pagination
- 40 results per page
- Total count provided in pagination metadata
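
Given the page size and the total count, the page count follows from a ceiling division. This is a minimal sketch that assumes 40 results per page and no server-side cap on the number of reachable pages (not verified); `totalPages` and `hasNextPage` are hypothetical names.

```typescript
const RESULTS_PER_PAGE = 40; // observed page size

// Number of result pages implied by the total count in the pagination metadata.
function totalPages(totalCount: number, perPage = RESULTS_PER_PAGE): number {
  return totalCount > 0 ? Math.ceil(totalCount / perPage) : 0;
}

// Whether a page exists after the given 1-based page.
function hasNextPage(page: number, totalCount: number, perPage = RESULTS_PER_PAGE): boolean {
  return page < totalPages(totalCount, perPage);
}
```

For example, the 19956 Buy & Sell results in the iPhone search above would span 499 pages under these assumptions.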

## Authentication & User Management

- **Authentication System**: OAuth2-based, using CIS (Customer Identity Service)
- **Identity Provider**: `id.kijiji.ca`
- **OAuth2 Flow**:
  - …
- **User Features**: Saved searches, messaging, and flagging require authentication

## Posting API

- **Posting Flow**: Requires authentication; redirects to login if not authenticated
- **Posting URL**: `https://www.kijiji.ca/p-post-ad.html`
- **Authentication Required**: Yes; unauthenticated users are redirected to `/consumer/login`
- **Post-Creation**: Likely uses authenticated GraphQL mutations (not observed in anonymous browsing)
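
Because unauthenticated requests are redirected rather than rejected, a scraper can detect this case by inspecting the final URL after redirects. A minimal sketch, assuming the login redirect always lands on a `kijiji.ca` host under `/consumer/login`; `isLoginRedirect` is a hypothetical helper.

```typescript
// Login path that unauthenticated users are redirected to.
const LOGIN_PATH = "/consumer/login";

// Given the final URL of a request (after redirects), reports whether the
// server sent the client to the login page instead of the requested resource.
function isLoginRedirect(finalUrl: string): boolean {
  const { hostname, pathname } = new URL(finalUrl);
  const onKijiji = hostname === "kijiji.ca" || hostname.endsWith(".kijiji.ca");
  return onKijiji && pathname.startsWith(LOGIN_PATH);
}
```

With `fetch`, the final URL is available as `response.url`, and `response.redirected` indicates that at least one redirect occurred.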

## GraphQL API Endpoint

- **URL**: `https://www.kijiji.ca/anvil/api`
- **Method**: POST
- **Content-Type**: `application/json`
- **Headers**:
  - `apollo-require-preflight: true`
  - Standard CORS headers
- **Authentication**: None required for basic queries (cookies are used for session tracking)
- **Technology**: Apollo GraphQL server
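
A sketch of assembling a request to this endpoint with the method and headers listed above. The query text is supplied by the caller, since the full queries are not reproduced in these notes, and cookie/session handling is left out. `buildGraphQLRequest` is a hypothetical helper.

```typescript
const GRAPHQL_ENDPOINT = "https://www.kijiji.ca/anvil/api";

interface GraphQLRequest {
  operationName: string;
  query: string; // full query text, supplied by the caller
  variables: Record<string, unknown>;
}

// Returns fetch() arguments for a POST to the GraphQL endpoint, with the
// JSON content type and the apollo-require-preflight header listed above.
// Cookies/session handling is intentionally left to the caller.
function buildGraphQLRequest(req: GraphQLRequest): [string, RequestInit] {
  return [
    GRAPHQL_ENDPOINT,
    {
      method: "POST",
      headers: {
        "content-type": "application/json",
        "apollo-require-preflight": "true",
      },
      body: JSON.stringify(req),
    },
  ];
}
```

The returned tuple can be spread into `fetch`, e.g. `fetch(...buildGraphQLRequest({ operationName: "getSearchCategories", query, variables: { locale: "en-CA" } }))`, where `query` holds the full query text.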

### Sample GraphQL Queries Discovered

#### Get Search Categories

```graphql
query getSearchCategories($locale: String!) {
  searchCategories {
    # …
  }
}
```

Variables: `{"locale": "en-CA"}`

The response includes the hierarchical category structure with IDs and localized names.

#### Get Geocode from IP (fails for current IP)

```graphql
query GetGeocodeReverseFromIp {
  geocodeReverseFromIp {
    # …
  }
}
```

This query fails for the IP address used during testing, suggesting that geolocation-based features may not work for that address or may be limited to certain IP ranges.

#### Get Category Path

```graphql
query GetCategoryPath($categoryId: Int!, $locale: String, $locationId: Int) {
  category(id: $categoryId) {
    # …
  }
}
```

Variables: `{"categoryId": 10, "locationId": 0, "locale": "en-CA"}`

## Latest Findings (2026-01-21)

### Client-Side GraphQL Queries Observed

- **getSearchCategories**: Retrieves the category hierarchy for search filters
- **GetGeocodeReverseFromIp**: Attempts to geolocate the user (fails for the current IP)

### GraphQL Schema Insights

Testing direct GraphQL queries revealed that:

- The field `searchResults` does not exist on the `Query` type
- The server suggests `searchResultsPage` or `searchUrl` as alternatives
- Search therefore appears to rely on different GraphQL operations than the ones tried directly

The embedded Apollo state appears to be the primary way to access search data, with the GraphQL API used for auxiliary operations such as categories and geolocation.
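
The suggested alternatives come from the server's validation error. A sketch that extracts them from a response's `errors` array, assuming the server uses the graphql-js wording `Did you mean "a" or "b"?` (Apollo Server is built on graphql-js, but the exact message text here is an assumption); `extractFieldSuggestions` is a hypothetical helper.

```typescript
interface GraphQLErrorLike {
  message: string;
}

// Collects suggested field names from validation errors such as:
//   Cannot query field "searchResults" on type "Query". Did you mean "searchResultsPage" or "searchUrl"?
// Assumes graphql-js style wording; other servers may phrase this differently.
function extractFieldSuggestions(errors: GraphQLErrorLike[]): string[] {
  const suggestions: string[] = [];
  for (const { message } of errors) {
    const start = message.indexOf("Did you mean");
    if (start === -1) continue;
    // Every double-quoted token after "Did you mean" is a suggested name.
    const quoted = /"([^"]+)"/g;
    const tail = message.slice(start);
    let m: RegExpExecArray | null;
    while ((m = quoted.exec(tail)) !== null) {
      suggestions.push(m[1]);
    }
  }
  return suggestions;
}
```

This makes it possible to probe the schema field by field even when introspection is unavailable.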

### Server-Side Rendering Architecture

Search results are fully server-side rendered, with data embedded in the HTML. Each page (including pagination) contains its own pre-rendered data. No client-side GraphQL requests are made for:

- Initial search results
- Pagination navigation
- Search result data

### Network Analysis Findings

- GraphQL endpoint: `https://www.kijiji.ca/anvil/api`
- Method: POST
- Content-Type: `application/json`
- …
- Cookies required for session tracking

### Embedded Data Structure

Search results data is embedded in the HTML within the Next.js `__NEXT_DATA__.props.pageProps.__APOLLO_STATE__` object. The data includes:

- Individual ad listings with complete metadata
- Pagination information
- …
- Category/location hierarchies

### Current Scraper Implementation

The existing `src/kijiji.ts` implementation correctly parses the embedded Apollo state:

- Uses `extractApolloState()` to parse `__NEXT_DATA__` from HTML
- Filters Apollo keys containing "Listing" to find ad data
- Extracts `url`, `title`, and other metadata from each listing
- Successfully scrapes listings without needing API authentication
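
A minimal sketch of the key-filtering step described above. The extra requirement that an entry carry string `title` and `url` fields is an assumption added to skip non-ad objects whose keys also contain "Listing"; `collectListings` and `ListingSummary` are hypothetical and not taken from `src/kijiji.ts`.

```typescript
type ApolloState = Record<string, Record<string, unknown>>;

interface ListingSummary {
  key: string; // Apollo cache key, e.g. "Listing:1732061412"
  title: string;
  url: string;
}

// Filters Apollo cache keys containing "Listing", then keeps only entries that
// expose string `title` and `url` fields (an extra guard against non-ad objects).
function collectListings(state: ApolloState): ListingSummary[] {
  const listings: ListingSummary[] = [];
  for (const [key, value] of Object.entries(state)) {
    if (!key.includes("Listing")) continue;
    const { title, url } = value;
    if (typeof title === "string" && typeof url === "string") {
      listings.push({ key, title, url });
    }
  }
  return listings;
}
```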

### Authentication Status

- **Search functionality**: No authentication required; all search and listing data is accessible anonymously
- **Posting functionality**: Requires authentication (redirects to login)
- **User features**: Saved searches and messaging require authentication
- **Rate limiting**: May apply, but was not observed during anonymous browsing

### Pagination Implementation

- Each page is a separate server-rendered route
- URL pattern: `/b-{location}/{keywords}/page-{number}/k0c{category_id}l{location_id}`
- No client-side pagination API calls

## URL Pattern Analysis

### Search URL Structure

`https://www.kijiji.ca/b-{category_slug}/{location_slug}/{keywords}/k0c{category_id}l{location_id}`

#### Examples Observed

- All categories, Canada: `/b-canada/iphone/k0l0` (no `c` segment = All Categories; `l0` = Canada)
- Cell phones category: `/b-cell-phones/canada/iphone/k0c132l0` (`c132` = Cell Phones)
- With pagination: `/b-canada/iphone/page-2/k0l0`

#### URL Components

- `c{CATEGORY_ID}`: Category ID (0 = All Categories, 132 = Cell Phones, etc.)
- `l{LOCATION_ID}`: Location ID (0 = Canada, 1700272 = GTA, etc.)
- `page-{N}`: Pagination (1-based, optional)
- Keywords are slugified in the URL path
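
The observed shapes can be reproduced with a small URL builder. This sketch assumes a simple slug rule (lowercase, runs of non-alphanumeric characters replaced by `-`, accented characters not transliterated), that the location slug takes the `b-` prefix when no category is given, and that page 1 omits the `page-N` segment. `buildSearchUrl` and `slugify` are hypothetical helpers.

```typescript
interface SearchUrlOptions {
  keywords: string;
  locationSlug: string;  // e.g. "canada"
  locationId: number;    // e.g. 0 for Canada
  categorySlug?: string; // e.g. "cell-phones"; omit for All Categories
  categoryId?: number;   // e.g. 132; omit (or 0) for All Categories
  page?: number;         // 1-based; page 1 gets no page-N segment
}

// Assumed slug rule: lowercase, runs of non-alphanumerics become "-".
function slugify(text: string): string {
  return text
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

// Reproduces the observed shapes:
//   /b-canada/iphone/k0l0
//   /b-cell-phones/canada/iphone/k0c132l0
//   /b-canada/iphone/page-2/k0l0
function buildSearchUrl(o: SearchUrlOptions): string {
  const hasCategory = Boolean(o.categorySlug) && Boolean(o.categoryId);
  const segments = hasCategory
    ? [`b-${o.categorySlug}`, o.locationSlug, slugify(o.keywords)]
    : [`b-${o.locationSlug}`, slugify(o.keywords)];
  if (o.page !== undefined && o.page > 1) segments.push(`page-${o.page}`);
  segments.push(hasCategory ? `k0c${o.categoryId}l${o.locationId}` : `k0l${o.locationId}`);
  return `https://www.kijiji.ca/${segments.join("/")}`;
}
```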

### Current Implementation Status

The existing scraper in `src/kijiji.ts` successfully implements this approach:

- Parses embedded Apollo state from HTML responses
- Handles rate limiting and retries
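
These notes do not describe how `src/kijiji.ts` retries. One common technique is exponential backoff with full jitter, sketched below with hypothetical names (`backoffDelay`, `withRetries`) and placeholder timings; deciding which responses count as rate-limited (for example HTTP 429) is left to the operation passed in.

```typescript
interface BackoffOptions {
  baseMs: number; // ceiling for the first retry's delay
  maxMs: number;  // upper bound on any single delay
}

// Exponential backoff with "full jitter": the delay for retry `attempt`
// (0-based) is a random integer in [0, min(maxMs, baseMs * 2^attempt)).
// `random` is injectable so the schedule can be tested deterministically.
function backoffDelay(
  attempt: number,
  { baseMs, maxMs }: BackoffOptions,
  random: () => number = Math.random,
): number {
  const ceiling = Math.min(maxMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}

// Retries an async operation up to maxAttempts times, sleeping
// backoffDelay() between attempts and rethrowing the last error.
async function withRetries<T>(
  op: () => Promise<T>,
  maxAttempts: number,
  opts: BackoffOptions,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await op();
    } catch (err) {
      lastError = err;
      if (attempt < maxAttempts - 1) {
        await new Promise((resolve) => setTimeout(resolve, backoffDelay(attempt, opts)));
      }
    }
  }
  throw lastError;
}
```

Full jitter spreads retries from concurrent scrapers over time instead of having them all retry at the same moment.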

## Listing Details Page

### Overview

Similar to search results, listing details pages use server-side rendering with embedded Apollo GraphQL state in the HTML. No dedicated API endpoint serves individual listing data; all information is pre-rendered on the server.

### Data Architecture

- **Server-Side Rendering**: Each listing page is fully server-rendered, with data embedded in the HTML
- **Embedded Apollo State**: Listing data is stored in `__NEXT_DATA__.props.pageProps.__APOLLO_STATE__`
- **Client-Side GraphQL**: Additional data (categories, campaigns, similar listings, user profiles) is fetched via the GraphQL API

### Listing Data Structure

The main listing data follows the same pattern as search results:

```json
{
  …
}
```

### Client-Side GraphQL Queries

When a listing details page loads, the following GraphQL queries are executed:

#### 1. getSearchCategories

- **Purpose**: Category hierarchy for navigation
- **Variables**: `{"locale": "en-CA"}`
- **Response**: Hierarchical category structure

#### 2. getCampaignsForVip

- **Purpose**: Advertisement targeting data
- **Variables**: `{"placement": "vip", "locationId": 1700275, "categoryId": 760, "platform": "desktop"}`
- **Response**: Campaign/ads data (usually null)

#### 3. GetReviewSummary

- **Purpose**: Seller review statistics
- **Variables**: `{"userId": "1044934581"}`
- **Response**: Review count and score (usually 0 for new sellers)

#### 4. GetProfileMetrics

- **Purpose**: Seller profile information
- **Variables**: `{"profileId": "1044934581"}`
- **Response**: Member-since date, account type

#### 5. GetListingsSimilar

- **Purpose**: Similar listings for cross-selling
- **Variables**: `{"listingId": "1705585530", "limit": 10, "isExternalId": false}`
- **Response**: Array of similar listings with basic metadata

#### 6. GetGeocodeReverseFromIp

- **Purpose**: Geolocation-based features
- **Variables**: `{}`
- **Response**: Fails with 404 for most IPs

### Implementation Status

The existing `parseListing()` function in `src/kijiji.ts` successfully extracts listing details from the embedded Apollo state:

- ✅ Extracts title, description, price, location
- ✅ Handles contact-based pricing ("Please Contact")
- …
- ✅ Works without authentication or API keys
|
||||
### Key Findings
|
||||
1. **No Dedicated Listing API**: Unlike search results, there's no separate GraphQL query for individual listing data
|
||||
2. **Complete Data Available**: All listing information is embedded in the initial HTML response
|
||||
3. **Additional Context Fetched**: Secondary GraphQL queries provide complementary data (reviews, similar listings)
|
||||
|
||||
1. **No Dedicated Listing API**: Unlike search results, there’s no separate GraphQL
|
||||
query for individual listing data
|
||||
2. **Complete Data Available**: All listing information is embedded in the initial HTML
|
||||
response
|
||||
3. **Additional Context Fetched**: Secondary GraphQL queries provide complementary data
|
||||
(reviews, similar listings)
|
||||
4. **Consistent Architecture**: Same Apollo state embedding pattern as search pages
|
||||
|
||||
### Current Scraper Implementation
|
||||
|
||||
The scraper successfully extracts listing details by:
|
||||
1. Fetching the listing URL HTML
|
||||
2. Parsing embedded `__NEXT_DATA__` Apollo state
|
||||
3. Extracting the `Listing:{id}` object from Apollo cache
|
||||
4. Mapping fields to typed `ListingDetails` interface
|
||||
|
||||
This approach works reliably without requiring authentication or dealing with rate limiting on individual listing fetches.
|
||||
This approach works reliably without requiring authentication or dealing with rate
|
||||
limiting on individual listing fetches.
|
||||
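
Steps 3 and 4 can be sketched as a cache lookup plus a narrow mapping. The `ListingDetailsSketch` type, the `getListingDetails` name, and the `title`/`description` field names on the cached entity are assumptions; the real `ListingDetails` interface may differ.

```typescript
type ApolloState = Record<string, Record<string, unknown>>;

// Hypothetical subset of listing details; the real ListingDetails interface
// in src/kijiji.ts may name or shape these fields differently.
interface ListingDetailsSketch {
  id: string;
  title: string;
  description: string | null;
}

// Looks up the Listing:{id} entry in the Apollo cache (step 3) and maps it onto
// a typed object (step 4). The `title` and `description` field names on the
// cached entity are assumptions.
function getListingDetails(state: ApolloState, id: string): ListingDetailsSketch | null {
  const entry = state[`Listing:${id}`];
  if (!entry) return null;
  const { title, description } = entry;
  if (typeof title !== "string") return null;
  return {
    id,
    title,
    description: typeof description === "string" ? description : null,
  };
}
```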

## Next Steps

- Explore posting/authentication APIs (requires user login)
- Investigate whether the GraphQL API can be used for programmatic access with proper authentication
- Test rate limiting patterns and optimal scraping strategies
- Document additional category and location ID mappings