# Kijiji API Findings

## Overview

Kijiji is a Canadian classifieds marketplace that uses a modern web application built with Next.js and Apollo GraphQL. The search results are powered by a GraphQL API with client-side state management.

## Initial Page Load (Homepage)

- **URL**: https://www.kijiji.ca/
- **Architecture**: Server-side rendered React application with Next.js
- **Data Sources**:
  - Static assets loaded from `webapp-static.ca-kijiji-production.classifiedscloud.io`
  - Image media served from `media.kijiji.ca/api/v1/`
  - No initial API calls for listings - data appears to be embedded in HTML

## Search Results Page

- **URL Pattern**: `https://www.kijiji.ca/b-[location]/[keywords]/k0l0`
- **Example**: `https://www.kijiji.ca/b-canada/iphone/k0l0`
- **Technology Stack**: Next.js with Apollo GraphQL client
- **Data Structure**: Uses `__APOLLO_STATE__` global object containing normalized GraphQL cache

### GraphQL Data Structure

#### Data Location

Search results data is embedded in the Next.js page props under `__NEXT_DATA__.props.pageProps.__APOLLO_STATE__`. The data is pre-rendered on the server and sent to the client. Each page (including pagination) has its own pre-rendered data.

#### Search Results Container

The search results are stored directly in the Apollo ROOT_QUERY with keys following the pattern `searchResultsPageByUrl:{url_path}` where `url_path` includes pagination parameters.

```json
{
  "searchResultsPageByUrl:/b-buy-sell/canada/iphone/k0c10l0": { ... },
  "searchResultsPageByUrl:/b-buy-sell/canada/iphone/k0c10l0?page=2": { ... }
}
```

#### Pagination Handling

- Each page is server-side rendered with its own embedded data
- No client-side GraphQL requests for pagination
- URL parameter `?page=N` controls which page data is embedded
- Offset in searchString corresponds to `(page-1) * limit`

#### Search Parameters in URL

- `k0c{CATEGORY}l{LOCATION}` - Category and location IDs
- `?page=N` - Page number (1-based)
- Data contains `offset` and `limit` for API-style pagination

#### Individual Listing Structure

```json
{
  "id": "1732061412",
  "title": "iPhone 13",
  "description": "iPhone 13, always had a screen protector on it...",
  "imageCount": 3,
  "imageUrls": ["https://media.kijiji.ca/api/v1/ca-prod-fsbo-ads/images/..."],
  "categoryId": 760,
  "url": "https://www.kijiji.ca/v-cell-phone/...",
  "activationDate": "2026-01-21T16:51:16.000Z",
  "sortingDate": "2026-01-21T16:51:16.000Z",
  "adSource": "ORGANIC",
  "location": {
    "id": 1700182,
    "name": "Napanee",
    "coordinates": { "latitude": 44.48774, "longitude": -76.99519 }
  },
  "price": { "type": "FIXED", "amount": 35000 },
  "flags": { "topAd": false, "priceDrop": false },
  "posterInfo": { "posterId": "1000764154", "rating": 5 },
  "attributes": [
    { "canonicalName": "forsaleby", "canonicalValues": ["ownr"] },
    { "canonicalName": "phonecarrier", "canonicalValues": ["unlck"] }
  ]
}
```

### URL Parameters

- `sort=MATCH` - Sort by relevance
- `order=DESC` - Descending order
- `type=OFFER` - Show offerings (not wanted ads)
- `offset=0` - Pagination offset
- `limit=40` - Results per page
- `topAdCount=6` - Number of promoted ads
- `keywords=iphone` - Search keywords
- `category=0` - Category ID (0 = All Categories)
- `location=0` - Location ID (0 = Canada)
- `eaTopAdPosition=1` - ?

### Image API

- **Endpoint**: `https://media.kijiji.ca/api/v1/`
- **Pattern**: `/ca-prod-fsbo-ads/images/{uuid}?rule=kijijica-{size}-jpg`
- **Sizes**: 200, 300, 400, 500 pixels

### Categories and Locations

#### Category Structure

Categories are hierarchical with parent-child relationships.
The main categories under "Buy & Sell" include:

| ID | Name | Total Results (iPhone search) |
|----|------|------------------------------|
| 10 | Buy & Sell | 19956 |
| 12 | Arts & Collectibles | 149 |
| 767 | Audio | 481 |
| 253 | Baby Items | 13 |
| 931 | Bags & Luggage | 8 |
| 644 | Bikes | 46 |
| 109 | Books | 21 |
| 103 | Cameras & Camcorders | 101 |
| 104 | CDs, DVDs & Blu-ray | 102 |
| 274 | Clothing | 83 |
| 16 | Computers | 285 |
| 128 | Computer Accessories | 363 |
| 29659001 | Electronics | 2006 |
| 17220001 | Free Stuff | 23 |
| 235 | Furniture | 29 |
| 638 | Garage Sales | 5 |
| 140 | Health & Special Needs | 30 |
| 139 | Hobbies & Crafts | 10 |
| 107 | Home Appliances | 23 |
| 717 | Home - Indoor | 27 |
| 727 | Home Renovation Materials | 14 |
| 133 | Jewellery & Watches | 83 |
| 17 | Musical Instruments | 34 |
| 132 | Phones | 15518 |
| 111 | Sporting Goods & Exercise | 30 |
| 110 | Tools | 25 |
| 108 | Toys & Games | 38 |
| 15093001 | TVs & Video | 15 |
| 141 | Video Games & Consoles | 96 |
| 26 | Other | 286 |

#### Location Structure

Locations are also hierarchical, with provinces and territories under the main "Canada" location:

| ID | Name | Total Results (iPhone search) |
|----|------|------------------------------|
| 0 | Canada | - |
| 9001 | Québec | 2516 |
| 9002 | Nova Scotia | 875 |
| 9003 | Alberta | 2317 |
| 9004 | Ontario | 12507 |
| 9005 | New Brunswick | 118 |
| 9006 | Manitoba | 919 |
| 9007 | British Columbia | 306 |
| 9008 | Newfoundland | 27 |
| 9009 | Saskatchewan | 336 |
| 9010 | Territories | 7 |
| 9011 | Prince Edward Island | 31 |

#### URL Patterns

- Categories: `/b-{category-slug}/canada/{keywords}/k0c{CATEGORY_ID}l0`
- Locations: `/b-buy-sell/{location-slug}/iphone/k0c10l{LOCATION_ID}`
- Combined: `/b-{category-slug}/{location-slug}/{keywords}/k0c{CATEGORY_ID}l{LOCATION_ID}`

### Pagination

- Uses offset-based pagination
- 40 results per page
- Total count provided in pagination metadata

## Authentication & User Management

- **Authentication System**: OAuth2-based using CIS (Customer Identity Service)
- **Identity Provider**: `id.kijiji.ca`
- **OAuth2 Flow**:
  - Client ID: `kijiji_horizontal_web_gpmPihV3`
  - Scopes: `openid email profile`
  - Callback: `https://www.kijiji.ca/api/auth/callback/cis`
- **Session Management**: Cookie-based with encrypted session data
- **Anonymous Access**: Full search functionality available without login
- **User Features**: Saved searches, messaging, and flagging require authentication

## Posting API

- **Posting Flow**: Requires authentication, redirects to login if not authenticated
- **Posting URL**: `https://www.kijiji.ca/p-post-ad.html`
- **Authentication Required**: Yes, redirects to `/consumer/login` for unauthenticated users
- **Post-Creation**: Likely uses authenticated GraphQL mutations (not observed in anonymous browsing)

## GraphQL API Endpoint

- **URL**: `https://www.kijiji.ca/anvil/api`
- **Method**: POST
- **Content-Type**: application/json
- **Headers**:
  - `apollo-require-preflight: true`
  - Standard CORS headers
- **Authentication**: No authentication required for basic queries (uses cookies for session tracking)
- **Technology**: Apollo GraphQL server

### Sample GraphQL Queries Discovered

#### Get Search Categories

```graphql
query getSearchCategories($locale: String!) {
  searchCategories {
    id
    localizedName(locale: $locale)
    parentId
    __typename
  }
}
```

Variables: `{"locale": "en-CA"}`

Response includes hierarchical category structure with IDs and localized names.

#### Get Geocode from IP (fails for current IP)

```graphql
query GetGeocodeReverseFromIp {
  geocodeReverseFromIp {
    city
    province
    locationId
    __typename
  }
}
```

This query fails for the IP address used during testing, suggesting that geolocation-based features may not work for all IPs or may require specific IP ranges.
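The endpoint details and sample queries above can be combined into a request roughly as follows. This is a minimal sketch, not code taken from the site or from `src/kijiji.ts`: the helper `buildGraphQLRequest` and the `GraphQLRequest` type are hypothetical names introduced for illustration, and the body shape (`operationName`, `query`, `variables`) is the conventional GraphQL-over-HTTP JSON payload, assumed rather than confirmed for this endpoint. The sketch only builds the request and does not send it.

```typescript
// Endpoint and header values are the ones observed in this document.
const KIJIJI_GRAPHQL_URL = "https://www.kijiji.ca/anvil/api";

// Query text copied from the "Get Search Categories" sample.
const GET_SEARCH_CATEGORIES = `
  query getSearchCategories($locale: String!) {
    searchCategories {
      id
      localizedName(locale: $locale)
      parentId
      __typename
    }
  }
`;

// Hypothetical type: a URL plus fetch-style options.
interface GraphQLRequest {
  url: string;
  init: {
    method: string;
    headers: Record<string, string>;
    body: string;
  };
}

// Hypothetical helper: builds the POST request without sending it.
// Assumption: the endpoint accepts the conventional JSON body
// { operationName, query, variables }.
function buildGraphQLRequest(
  operationName: string,
  query: string,
  variables: Record<string, unknown>,
): GraphQLRequest {
  return {
    url: KIJIJI_GRAPHQL_URL,
    init: {
      method: "POST",
      headers: {
        "content-type": "application/json",
        "apollo-require-preflight": "true",
      },
      body: JSON.stringify({ operationName, query, variables }),
    },
  };
}

const categoriesRequest = buildGraphQLRequest(
  "getSearchCategories",
  GET_SEARCH_CATEGORIES,
  { locale: "en-CA" },
);
```

To actually send it, `categoriesRequest.url` and `categoriesRequest.init` can be passed to `fetch` in a browser or Node 18+. Whether session cookies must accompany the request was not tested here.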
#### Get Category Path

```graphql
query GetCategoryPath($categoryId: Int!, $locale: String, $locationId: Int) {
  category(id: $categoryId) {
    id
    localizedName(locale: $locale)
    parentId
    searchSeoUrl(locationId: $locationId)
    categoryPaths {
      id
      localizedName(locale: $locale)
      parentId
      searchSeoUrl(locationId: $locationId)
      __typename
    }
    __typename
  }
}
```

Variables: `{"categoryId": 10, "locationId": 0, "locale": "en-CA"}`

## Latest Findings (2026-01-21)

### Client-Side GraphQL Queries Observed

- **getSearchCategories**: Retrieves category hierarchy for search filters
- **GetGeocodeReverseFromIp**: Attempts to geolocate user (fails for current IP)

### GraphQL Schema Insights

Testing direct GraphQL queries revealed:

- Field "searchResults" does not exist on Query type
- Suggested alternatives: "searchResultsPage" or "searchUrl"
- This suggests the search functionality may use different GraphQL operations than direct queries

The embedded Apollo state approach appears to be the primary method for accessing search data, with GraphQL used for auxiliary operations like categories and geolocation.

### Server-Side Rendering Architecture

Search results are fully server-side rendered with data embedded in HTML. Each page (including pagination) contains its own pre-rendered data. No client-side GraphQL requests are made for:

- Initial search results
- Pagination navigation
- Search result data

### Network Analysis Findings

- GraphQL endpoint: `https://www.kijiji.ca/anvil/api`
- Method: POST
- Content-Type: application/json
- Headers include: `apollo-require-preflight: true`
- Cookies required for session tracking

### Embedded Data Structure

Search results data is embedded in the HTML within Next.js `__NEXT_DATA__.props.pageProps.__APOLLO_STATE__` object. The data includes:

- Individual ad listings with complete metadata
- Pagination information
- Filter options and counts
- Category/location hierarchies

### Current Scraper Implementation

The existing `src/kijiji.ts` implementation correctly parses the embedded Apollo state:

- Uses `extractApolloState()` to parse `__NEXT_DATA__` from HTML
- Filters Apollo keys containing "Listing" to find ad data
- Extracts `url`, `title`, and other metadata from each listing
- Successfully scrapes listings without needing API authentication

### Authentication Status

- **Search functionality**: No authentication required - all search and listing data accessible anonymously
- **Posting functionality**: Requires authentication (redirects to login)
- **User features**: Saved searches, messaging require authentication
- **Rate limiting**: May apply but not observed in anonymous browsing

### Pagination Implementation

- Each page is a separate server-rendered route
- URL pattern: `/b-{location}/{keywords}/page-{number}/k0{category}l{location_id}`
- No client-side pagination API calls
- 40 results per page (observed)
- Example: `/b-canada/iphone/page-2/k0l0` for page 2 of iPhone search

## URL Pattern Analysis

### Search URL Structure

`https://www.kijiji.ca/b-{category_slug}/{location_slug}/{keywords}/k0c{category_id}l{location_id}`

#### Examples Observed:

- All categories, Canada: `/b-canada/iphone/k0l0` (c0 = All Categories, l0 = Canada)
- Cell phones category: `/b-cell-phones/canada/iphone/k0c132l0` (c132 = Cell Phones)
- With pagination: `/b-canada/iphone/page-2/k0l0`

#### URL Components:

- `c{CATEGORY_ID}`: Category ID (0 = All Categories, 132 = Cell Phones, etc.)
- `l{LOCATION_ID}`: Location ID (0 = Canada, 1700272 = GTA, etc.)
- `page-{N}`: Pagination (1-based, optional)
- Keywords are slugified in URL path

### Current Implementation Status

The existing scraper in `src/kijiji.ts` successfully implements the approach:

- Parses embedded Apollo state from HTML responses
- Handles rate limiting and retries
- Extracts listing metadata (title, URL, price, location, etc.)
- Works without authentication for search operations

## Listing Details Page

### Overview

Similar to search results, listing details pages use server-side rendering with embedded Apollo GraphQL state in the HTML. No dedicated API endpoint serves individual listing data - all information is pre-rendered on the server.

### Data Architecture

- **Server-Side Rendering**: Each listing page is fully server-rendered with data embedded in HTML
- **Embedded Apollo State**: Listing data is stored in `__NEXT_DATA__.props.pageProps.__APOLLO_STATE__`
- **Client-Side GraphQL**: Additional data (categories, campaigns, similar listings, user profiles) fetched via GraphQL API

### Listing Data Structure

The main listing data follows the same pattern as search results:

```json
{
  "id": "1705585530",
  "title": "We Pay top cash for iPhone 17 pro max, iPhone 17 pro, iPhone Air",
  "description": "Buying All Brand new Apple iPhones sealed/Unsealed...",
  "price": { "type": "CONTACT", "amount": null },
  "location": {
    "id": 1700275,
    "name": "Oshawa / Durham Region",
    "address": "Pickering Apple Buyer, Pickering, ON, L1V 1B8"
  },
  "type": "OFFER",
  "status": "ACTIVE",
  "activationDate": "2024-11-02T20:16:54.000Z",
  "endDate": "3000-01-01T00:00:00.000Z",
  "metrics": { "views": 1720 },
  "posterInfo": { "posterId": "1044934581", "rating": null },
  "attributes": [
    { "canonicalName": "forsaleby", "canonicalValues": ["business"] },
    { "canonicalName": "phonecarrier", "canonicalValues": ["unlocked"] }
  ]
}
```

### Client-Side GraphQL Queries

When loading a listing details page, the following GraphQL queries are executed:

#### 1. getSearchCategories

- **Purpose**: Category hierarchy for navigation
- **Variables**: `{"locale": "en-CA"}`
- **Response**: Hierarchical category structure

#### 2. getCampaignsForVip

- **Purpose**: Advertisement targeting data
- **Variables**: `{"placement": "vip", "locationId": 1700275, "categoryId": 760, "platform": "desktop"}`
- **Response**: Campaign/ads data (usually null)

#### 3. GetReviewSummary

- **Purpose**: Seller review statistics
- **Variables**: `{"userId": "1044934581"}`
- **Response**: Review count and score (usually 0 for new sellers)

#### 4. GetProfileMetrics

- **Purpose**: Seller profile information
- **Variables**: `{"profileId": "1044934581"}`
- **Response**: Member since date, account type

#### 5. GetListingsSimilar

- **Purpose**: Similar listings for cross-selling
- **Variables**: `{"listingId": "1705585530", "limit": 10, "isExternalId": false}`
- **Response**: Array of similar listings with basic metadata

#### 6. GetGeocodeReverseFromIp

- **Purpose**: Geolocation-based features
- **Variables**: `{}`
- **Response**: Fails with 404 for most IPs

### Implementation Status

The existing `parseListing()` function in `src/kijiji.ts` successfully extracts listing details from embedded Apollo state:

- ✅ Extracts title, description, price, location
- ✅ Handles contact-based pricing ("Please Contact")
- ✅ Parses creation date, view count, listing status
- ✅ Extracts seller information and address
- ✅ Works without authentication or API keys

### Key Findings

1. **No Dedicated Listing API**: Unlike search results, there's no separate GraphQL query for individual listing data
2. **Complete Data Available**: All listing information is embedded in the initial HTML response
3. **Additional Context Fetched**: Secondary GraphQL queries provide complementary data (reviews, similar listings)
4. **Consistent Architecture**: Same Apollo state embedding pattern as search pages

### Current Scraper Implementation

The scraper successfully extracts listing details by:

1. Fetching the listing URL HTML
2. Parsing embedded `__NEXT_DATA__` Apollo state
3. Extracting the `Listing:{id}` object from Apollo cache
4. Mapping fields to typed `ListingDetails` interface

This approach works reliably without requiring authentication or dealing with rate limiting on individual listing fetches.

## Next Steps

- Explore posting/authentication APIs (requires user login)
- Investigate if GraphQL API can be used for programmatic access with proper authentication
- Test rate limiting patterns and optimal scraping strategies
- Document additional category and location ID mappings
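
## Example: Extracting Embedded Apollo State

The embedded-state approach documented throughout these notes (parse `__NEXT_DATA__`, read `props.pageProps.__APOLLO_STATE__`, then pick out listing entries) can be sketched as below. This is an illustrative re-implementation under stated assumptions, not the actual code in `src/kijiji.ts`, which is not shown in this document. The names `extractApolloStateSketch`, `collectListings`, `isRecord`, and `ListingSummary` are hypothetical. The sketch also assumes the state lives in a `<script id="__NEXT_DATA__">` tag, which is the usual Next.js convention but was not confirmed here for every Kijiji page.

```typescript
type ApolloState = Record<string, unknown>;

// Hypothetical minimal shape; the real ListingDetails interface has more fields.
interface ListingSummary {
  id: string;
  title: string;
  url: string;
}

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Assumption: the page embeds its data in <script id="__NEXT_DATA__">…</script>.
function extractApolloStateSketch(html: string): ApolloState | null {
  const match = /<script[^>]*id="__NEXT_DATA__"[^>]*>([\s\S]*?)<\/script>/.exec(html);
  const json = match?.[1];
  if (json === undefined) return null;

  let nextData: unknown;
  try {
    nextData = JSON.parse(json);
  } catch {
    return null; // Malformed JSON: treat as "no state found".
  }

  // Walk props.pageProps.__APOLLO_STATE__ defensively.
  let node: unknown = nextData;
  for (const key of ["props", "pageProps", "__APOLLO_STATE__"]) {
    if (!isRecord(node)) return null;
    node = node[key];
  }
  return isRecord(node) ? node : null;
}

// Keeps cache entries whose key contains "Listing" and that carry
// string id, title, and url fields.
function collectListings(state: ApolloState): ListingSummary[] {
  const listings: ListingSummary[] = [];
  for (const [key, value] of Object.entries(state)) {
    if (!key.includes("Listing") || !isRecord(value)) continue;
    const id = value["id"];
    const title = value["title"];
    const url = value["url"];
    if (typeof id === "string" && typeof title === "string" && typeof url === "string") {
      listings.push({ id, title, url });
    }
  }
  return listings;
}

// Synthetic page built from the listing example in this document.
const sampleHtml = `<html><body><script id="__NEXT_DATA__" type="application/json">${JSON.stringify({
  props: {
    pageProps: {
      __APOLLO_STATE__: {
        "Listing:1732061412": {
          id: "1732061412",
          title: "iPhone 13",
          url: "https://www.kijiji.ca/v-cell-phone/...",
        },
        ROOT_QUERY: { __typename: "Query" },
      },
    },
  },
})}</script></body></html>`;

const sampleListings = collectListings(extractApolloStateSketch(sampleHtml) ?? {});
```

Returning `null` instead of throwing keeps the caller in control of how to treat pages without embedded state, such as error pages or changed markup. A regular expression is used only to keep the sketch dependency-free; an HTML parser would be more robust.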