ca-marketplace-scraper/KIJIJI.md
2026-01-22 00:06:31 -05:00

Kijiji API Findings

Overview

Kijiji is a Canadian classifieds marketplace whose web application is built with Next.js and Apollo GraphQL. Search results come from GraphQL data that the server embeds in each page as a normalized Apollo cache.

Initial Page Load (Homepage)

  • URL: https://www.kijiji.ca/
  • Architecture: Server-side rendered React application with Next.js
  • Data Sources:
    • Static assets loaded from webapp-static.ca-kijiji-production.classifiedscloud.io
    • Image media served from media.kijiji.ca/api/v1/
    • No initial API calls for listings - data appears to be embedded in HTML

Search Results Page

  • URL Pattern: https://www.kijiji.ca/b-[location]/[keywords]/k0l0
  • Example: https://www.kijiji.ca/b-canada/iphone/k0l0
  • Technology Stack: Next.js with Apollo GraphQL client
  • Data Structure: Normalized GraphQL cache exposed as __APOLLO_STATE__ inside the Next.js page props

GraphQL Data Structure

Data Location

Search results data is embedded in the Next.js page props under __NEXT_DATA__.props.pageProps.__APOLLO_STATE__. The data is pre-rendered on the server and sent to the client. Each page (including pagination) has its own pre-rendered data.
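
A minimal sketch of reading that embedded state from a fetched page, assuming the usual Next.js layout in which the page props are serialized as JSON inside a script tag with id __NEXT_DATA__. The helper name extractApolloStateFromHtml and the regular expression are illustrative and are not taken from src/kijiji.ts.

```typescript
// Hypothetical helper for illustration; not the code in src/kijiji.ts.
type ApolloState = Record<string, unknown>;

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

function extractApolloStateFromHtml(html: string): ApolloState | null {
  // Assumption: page props are serialized as JSON inside <script id="__NEXT_DATA__">.
  const match = html.match(
    /<script[^>]*id="__NEXT_DATA__"[^>]*>([\s\S]*?)<\/script>/
  );
  if (!match) return null;

  let nextData: unknown;
  try {
    nextData = JSON.parse(match[1] ?? "");
  } catch {
    return null; // The script body was not valid JSON.
  }

  // Walk props -> pageProps -> __APOLLO_STATE__, bailing out on any missing level.
  let node: unknown = nextData;
  for (const key of ["props", "pageProps", "__APOLLO_STATE__"]) {
    if (!isRecord(node)) return null;
    node = node[key];
  }
  return isRecord(node) ? node : null;
}
```

Returning null instead of throwing lets a caller treat a missing or reshaped payload as an ordinary "no data" case.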

Search Results Container

The search results are stored directly in the Apollo ROOT_QUERY with keys following the pattern searchResultsPageByUrl:{url_path} where url_path includes pagination parameters.

{
  "searchResultsPageByUrl:/b-buy-sell/canada/iphone/k0c10l0": { ... },
  "searchResultsPageByUrl:/b-buy-sell/canada/iphone/k0c10l0?page=2": { ... }
}
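
The key pattern above can be matched with a simple prefix filter. This is a sketch only: findSearchResultEntries is a hypothetical name, and the shape of each cached page is left as unknown because it is not documented here.

```typescript
const SEARCH_KEY_PREFIX = "searchResultsPageByUrl:";

interface SearchResultEntry {
  urlPath: string; // e.g. "/b-buy-sell/canada/iphone/k0c10l0?page=2"
  value: unknown; // cached search results page; shape not modeled here
}

// Hypothetical helper: picks the search result entries out of ROOT_QUERY.
function findSearchResultEntries(
  rootQuery: Record<string, unknown>
): SearchResultEntry[] {
  return Object.keys(rootQuery)
    .filter((key) => key.startsWith(SEARCH_KEY_PREFIX))
    .map((key) => ({
      urlPath: key.slice(SEARCH_KEY_PREFIX.length),
      value: rootQuery[key],
    }));
}
```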

Pagination Handling

  • Each page is server-side rendered with its own embedded data
  • No client-side GraphQL requests for pagination
  • URL parameter ?page=N controls which page data is embedded
  • Offset in searchString corresponds to (page-1) * limit
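
The offset relationship in the last bullet can be written out directly. The function names are illustrative, and the default of 40 results per page is the value observed elsewhere in these notes, not a documented constant.

```typescript
const DEFAULT_LIMIT = 40; // observed page size; an assumption, not a documented constant

// offset = (page - 1) * limit, with 1-based page numbers.
function pageToOffset(page: number, limit: number = DEFAULT_LIMIT): number {
  if (!Number.isInteger(page) || page < 1) {
    throw new RangeError(`page must be a positive integer, got ${page}`);
  }
  return (page - 1) * limit;
}

// Inverse mapping: which 1-based page contains a given offset.
function offsetToPage(offset: number, limit: number = DEFAULT_LIMIT): number {
  return Math.floor(offset / limit) + 1;
}
```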

Search Parameters in URL

  • k0c{CATEGORY}l{LOCATION} - Category and location IDs
  • ?page=N - Page number (1-based)
  • Data contains offset and limit for API-style pagination

Individual Listing Structure

{
  "id": "1732061412",
  "title": "iPhone 13",
  "description": "iPhone 13, always had a screen protector on it...",
  "imageCount": 3,
  "imageUrls": ["https://media.kijiji.ca/api/v1/ca-prod-fsbo-ads/images/..."],
  "categoryId": 760,
  "url": "https://www.kijiji.ca/v-cell-phone/...",
  "activationDate": "2026-01-21T16:51:16.000Z",
  "sortingDate": "2026-01-21T16:51:16.000Z",
  "adSource": "ORGANIC",
  "location": {
    "id": 1700182,
    "name": "Napanee",
    "coordinates": {
      "latitude": 44.48774,
      "longitude": -76.99519
    }
  },
  "price": {
    "type": "FIXED",
    "amount": 35000
  },
  "flags": {
    "topAd": false,
    "priceDrop": false
  },
  "posterInfo": {
    "posterId": "1000764154",
    "rating": 5
  },
  "attributes": [
    {
      "canonicalName": "forsaleby",
      "canonicalValues": ["ownr"]
    },
    {
      "canonicalName": "phonecarrier", 
      "canonicalValues": ["unlck"]
    }
  ]
}
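
For type-checked access, the sample above could be modeled roughly as follows. The interface is inferred from this single example, so optionality and nullability are assumptions (amount and rating are typed as nullable because they appear as null in the listing-details sample). attributesToRecord is a hypothetical convenience helper.

```typescript
interface ListingAttribute {
  canonicalName: string;
  canonicalValues: string[];
}

// Inferred from one sample object; real payloads may carry more or fewer fields.
interface SearchListing {
  id: string;
  title: string;
  description: string;
  imageCount: number;
  imageUrls: string[];
  categoryId: number;
  url: string;
  activationDate: string; // ISO 8601 timestamp
  sortingDate: string; // ISO 8601 timestamp
  adSource: string;
  location: {
    id: number;
    name: string;
    coordinates: { latitude: number; longitude: number };
  };
  price: { type: string; amount: number | null };
  flags: { topAd: boolean; priceDrop: boolean };
  posterInfo: { posterId: string; rating: number | null };
  attributes: ListingAttribute[];
}

// Flattens the attribute array into a lookup keyed by canonicalName.
function attributesToRecord(
  attributes: ListingAttribute[]
): Record<string, string[]> {
  const result: Record<string, string[]> = {};
  for (const attribute of attributes) {
    result[attribute.canonicalName] = attribute.canonicalValues;
  }
  return result;
}
```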

URL Parameters

  • sort=MATCH - Sort by relevance
  • order=DESC - Descending order
  • type=OFFER - Show offerings (not wanted ads)
  • offset=0 - Pagination offset
  • limit=40 - Results per page
  • topAdCount=6 - Number of promoted ads
  • keywords=iphone - Search keywords
  • category=0 - Category ID (0 = All Categories)
  • location=0 - Location ID (0 = Canada)
  • eaTopAdPosition=1 - Purpose not yet determined
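
A sketch of reading these parameters, assuming they arrive as an ordinary query string (for example the searchString value found in the embedded data). parseSearchString and its return shape are hypothetical, and the defaults mirror the values listed above.

```typescript
interface ParsedSearchString {
  keywords: string;
  offset: number;
  limit: number;
  categoryId: number;
  locationId: number;
  page: number; // derived: floor(offset / limit) + 1
}

// Hypothetical parser; assumes a standard "a=b&c=d" query string.
function parseSearchString(searchString: string): ParsedSearchString {
  const params = new URLSearchParams(searchString);

  // Reads a numeric parameter, falling back when it is missing or not a number.
  const readNumber = (name: string, fallback: number): number => {
    const raw = params.get(name);
    const parsed = raw === null ? NaN : Number(raw);
    return Number.isFinite(parsed) ? parsed : fallback;
  };

  const offset = readNumber("offset", 0);
  const limit = readNumber("limit", 40);
  return {
    keywords: params.get("keywords") ?? "",
    offset,
    limit,
    categoryId: readNumber("category", 0),
    locationId: readNumber("location", 0),
    page: limit > 0 ? Math.floor(offset / limit) + 1 : 1,
  };
}
```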

Image API

  • Endpoint: https://media.kijiji.ca/api/v1/
  • Pattern: /ca-prod-fsbo-ads/images/{uuid}?rule=kijijica-{size}-jpg
  • Sizes: 200, 300, 400, 500 pixels
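
The pattern above can be assembled with a small helper. buildImageUrl is a hypothetical name, and the size union only covers the four sizes listed; other rule values may exist.

```typescript
type ImageSize = 200 | 300 | 400 | 500;

const IMAGE_BASE = "https://media.kijiji.ca/api/v1/ca-prod-fsbo-ads/images";

// Builds an image URL following the observed {uuid}?rule=kijijica-{size}-jpg pattern.
function buildImageUrl(uuid: string, size: ImageSize): string {
  return `${IMAGE_BASE}/${encodeURIComponent(uuid)}?rule=kijijica-${size}-jpg`;
}
```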

Categories and Locations

Category Structure

Categories are hierarchical with parent-child relationships. The main categories under "Buy & Sell" include:

| ID | Name | Total Results (iPhone search) |
|---|---|---|
| 10 | Buy & Sell | 19956 |
| 12 | Arts & Collectibles | 149 |
| 767 | Audio | 481 |
| 253 | Baby Items | 13 |
| 931 | Bags & Luggage | 8 |
| 644 | Bikes | 46 |
| 109 | Books | 21 |
| 103 | Cameras & Camcorders | 101 |
| 104 | CDs, DVDs & Blu-ray | 102 |
| 274 | Clothing | 83 |
| 16 | Computers | 285 |
| 128 | Computer Accessories | 363 |
| 29659001 | Electronics | 2006 |
| 17220001 | Free Stuff | 23 |
| 235 | Furniture | 29 |
| 638 | Garage Sales | 5 |
| 140 | Health & Special Needs | 30 |
| 139 | Hobbies & Crafts | 10 |
| 107 | Home Appliances | 23 |
| 717 | Home - Indoor | 27 |
| 727 | Home Renovation Materials | 14 |
| 133 | Jewellery & Watches | 83 |
| 17 | Musical Instruments | 34 |
| 132 | Phones | 15518 |
| 111 | Sporting Goods & Exercise | 30 |
| 110 | Tools | 25 |
| 108 | Toys & Games | 38 |
| 15093001 | TVs & Video | 15 |
| 141 | Video Games & Consoles | 96 |
| 26 | Other | 286 |

Location Structure

Locations are also hierarchical, with provinces and territories under the top-level "Canada" location:

| ID | Name | Total Results (iPhone search) |
|---|---|---|
| 0 | Canada | - |
| 9001 | Québec | 2516 |
| 9002 | Nova Scotia | 875 |
| 9003 | Alberta | 2317 |
| 9004 | Ontario | 12507 |
| 9005 | New Brunswick | 118 |
| 9006 | Manitoba | 919 |
| 9007 | British Columbia | 306 |
| 9008 | Newfoundland | 27 |
| 9009 | Saskatchewan | 336 |
| 9010 | Territories | 7 |
| 9011 | Prince Edward Island | 31 |

URL Patterns

  • Categories: /b-{category-slug}/canada/{keywords}/k0c{CATEGORY_ID}l0
  • Locations: /b-buy-sell/{location-slug}/iphone/k0c10l{LOCATION_ID}
  • Combined: /b-{category-slug}/{location-slug}/{keywords}/k0c{CATEGORY_ID}l{LOCATION_ID}
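
The combined pattern can be sketched as a path builder. buildSearchPath and slugifyKeywords are hypothetical names, and the slug rule (lowercase, non-alphanumeric runs replaced by hyphens) is an assumption; Kijiji's actual slugging rules were not verified.

```typescript
interface SearchPathOptions {
  categorySlug: string; // e.g. "buy-sell"
  locationSlug: string; // e.g. "canada"
  keywords: string;
  categoryId: number;
  locationId: number;
}

// Assumed slug rule: lowercase, collapse non-alphanumeric runs into single hyphens.
function slugifyKeywords(keywords: string): string {
  return keywords
    .trim()
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

// Builds /b-{category-slug}/{location-slug}/{keywords}/k0c{CATEGORY_ID}l{LOCATION_ID}.
function buildSearchPath(options: SearchPathOptions): string {
  const { categorySlug, locationSlug, keywords, categoryId, locationId } = options;
  return `/b-${categorySlug}/${locationSlug}/${slugifyKeywords(keywords)}/k0c${categoryId}l${locationId}`;
}
```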

Pagination

  • Uses offset-based pagination
  • 40 results per page
  • Total count provided in pagination metadata

Authentication & User Management

  • Authentication System: OAuth2-based using CIS (Customer Identity Service)
  • Identity Provider: id.kijiji.ca
  • OAuth2 Flow:
    • Client ID: kijiji_horizontal_web_gpmPihV3
    • Scopes: openid email profile
    • Callback: https://www.kijiji.ca/api/auth/callback/cis
  • Session Management: Cookie-based with encrypted session data
  • Anonymous Access: Full search functionality available without login
  • User Features: Saved searches, messaging, flagging require authentication

Posting API

  • Posting Flow: Requires authentication, redirects to login if not authenticated
  • Posting URL: https://www.kijiji.ca/p-post-ad.html
  • Authentication Required: Yes, redirects to /consumer/login for unauthenticated users
  • Post-Creation: Likely uses authenticated GraphQL mutations (not observed in anonymous browsing)

GraphQL API Endpoint

  • URL: https://www.kijiji.ca/anvil/api
  • Method: POST
  • Content-Type: application/json
  • Headers:
    • apollo-require-preflight: true
    • Standard CORS headers
  • Authentication: No authentication required for basic queries (uses cookies for session tracking)
  • Technology: Apollo GraphQL server
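
A sketch of how a request to this endpoint might be assembled, using the conventional GraphQL-over-HTTP JSON body (operationName, query, variables). buildGraphQLRequest and the GraphQLRequest shape are hypothetical; only the URL, method, and headers come from the observations above. The example only builds the request and sends nothing.

```typescript
const GRAPHQL_ENDPOINT = "https://www.kijiji.ca/anvil/api";

interface GraphQLRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Hypothetical builder: packages an operation into the observed request shape.
function buildGraphQLRequest(
  operationName: string,
  query: string,
  variables: Record<string, unknown> = {}
): GraphQLRequest {
  return {
    url: GRAPHQL_ENDPOINT,
    method: "POST",
    headers: {
      "content-type": "application/json",
      "apollo-require-preflight": "true",
    },
    body: JSON.stringify({ operationName, query, variables }),
  };
}

const categoriesQuery = `query getSearchCategories($locale: String!) {
  searchCategories {
    id
    localizedName(locale: $locale)
    parentId
    __typename
  }
}`;

const categoriesRequest = buildGraphQLRequest(
  "getSearchCategories",
  categoriesQuery,
  { locale: "en-CA" }
);
```

The resulting object could be passed to fetch as fetch(categoriesRequest.url, { method, headers, body }); whether the server accepts it without session cookies was not verified.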

Sample GraphQL Queries Discovered

Get Search Categories

query getSearchCategories($locale: String!) {
  searchCategories {
    id
    localizedName(locale: $locale)
    parentId
    __typename
  }
}

Variables: {"locale": "en-CA"}

Response includes hierarchical category structure with IDs and localized names.

Get Geocode from IP (fails for current IP)

query GetGeocodeReverseFromIp {
  geocodeReverseFromIp {
    city
    province
    locationId
    __typename
  }
}

This query fails for the current IP address, suggesting geolocation-based features may not work or require different IP ranges.

Get Category Path

query GetCategoryPath($categoryId: Int!, $locale: String, $locationId: Int) {
  category(id: $categoryId) {
    id
    localizedName(locale: $locale)
    parentId
    searchSeoUrl(locationId: $locationId)
    categoryPaths {
      id
      localizedName(locale: $locale)
      parentId
      searchSeoUrl(locationId: $locationId)
      __typename
    }
    __typename
  }
}

Variables: {"categoryId": 10, "locationId": 0, "locale": "en-CA"}

Latest Findings (2026-01-21)

Client-Side GraphQL Queries Observed

  • getSearchCategories: Retrieves category hierarchy for search filters
  • GetGeocodeReverseFromIp: Attempts to geolocate user (fails for current IP)

GraphQL Schema Insights

Testing direct GraphQL queries revealed:

  • Field "searchResults" does not exist on Query type
  • Suggested alternatives: "searchResultsPage" or "searchUrl"
  • This suggests the search functionality may use different GraphQL operations than direct queries

The embedded Apollo state approach appears to be the primary method for accessing search data, with GraphQL used for auxiliary operations like categories and geolocation.

Server-Side Rendering Architecture

Search results are fully server-side rendered with data embedded in HTML. Each page (including pagination) contains its own pre-rendered data. No client-side GraphQL requests are made for:

  • Initial search results
  • Pagination navigation
  • Search result data

Network Analysis Findings

  • GraphQL endpoint: https://www.kijiji.ca/anvil/api
  • Method: POST
  • Content-Type: application/json
  • Headers include: apollo-require-preflight: true
  • Cookies required for session tracking

Embedded Data Structure

Search results data is embedded in the HTML within Next.js __NEXT_DATA__.props.pageProps.__APOLLO_STATE__ object. The data includes:

  • Individual ad listings with complete metadata
  • Pagination information
  • Filter options and counts
  • Category/location hierarchies

Current Scraper Implementation

The existing src/kijiji.ts implementation correctly parses the embedded Apollo state:

  • Uses extractApolloState() to parse __NEXT_DATA__ from HTML
  • Filters Apollo keys containing "Listing" to find ad data
  • Extracts url, title, and other metadata from each listing
  • Successfully scrapes listings without needing API authentication
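
The key-filtering step described above might look roughly like this. It is an illustrative sketch, not the code in src/kijiji.ts: collectListingSummaries is a hypothetical name, and the extra check for string url and title fields is an assumption added so that other cache entries whose keys happen to contain "Listing" are skipped.

```typescript
interface ListingSummary {
  key: string; // Apollo cache key
  url: string;
  title: string;
}

// Hypothetical sketch: scans the Apollo cache for entries that look like listings.
function collectListingSummaries(
  state: Record<string, unknown>
): ListingSummary[] {
  const summaries: ListingSummary[] = [];
  for (const key of Object.keys(state)) {
    if (!key.includes("Listing")) continue;
    const entry = state[key];
    if (typeof entry !== "object" || entry === null) continue;
    const { url, title } = entry as { url?: unknown; title?: unknown };
    // Keep only entries that carry the fields a listing is expected to have.
    if (typeof url === "string" && typeof title === "string") {
      summaries.push({ key, url, title });
    }
  }
  return summaries;
}
```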

Authentication Status

  • Search functionality: No authentication required - all search and listing data accessible anonymously
  • Posting functionality: Requires authentication (redirects to login)
  • User features: Saved searches, messaging require authentication
  • Rate limiting: May apply but not observed in anonymous browsing

Pagination Implementation

  • Each page is a separate server-rendered route
  • URL pattern: /b-{location}/{keywords}/page-{number}/k0c{category_id}l{location_id} (the c{category_id} segment is absent when no category is selected)
  • No client-side pagination API calls
  • 40 results per page (observed)
  • Example: /b-canada/iphone/page-2/k0l0 for page 2 of iPhone search

URL Pattern Analysis

Search URL Structure

https://www.kijiji.ca/b-{category_slug}/{location_slug}/{keywords}/k0c{category_id}l{location_id}

Examples Observed:

  • All categories, Canada: /b-canada/iphone/k0l0 (no c segment = All Categories, l0 = Canada)
  • Cell phones category: /b-cell-phones/canada/iphone/k0c132l0 (c132 = Cell Phones)
  • With pagination: /b-canada/iphone/page-2/k0l0

URL Components:

  • c{CATEGORY_ID}: Category ID (0 = All Categories, 132 = Cell Phones, etc.)
  • l{LOCATION_ID}: Location ID (0 = Canada, 1700272 = GTA, etc.)
  • page-{N}: Pagination (1-based, optional)
  • Keywords are slugified in URL path
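
Going the other way, the components listed above can be recovered from a path with two regular expressions. parseSearchPath is a hypothetical helper, and treating a missing c segment as category 0 is an assumption based on the /b-canada/iphone/k0l0 example.

```typescript
interface ParsedSearchPath {
  categoryId: number; // 0 when the c segment is absent (assumed to mean All Categories)
  locationId: number;
  page: number; // 1 when no page-{N} segment is present
}

// Hypothetical parser for paths such as /b-cell-phones/canada/iphone/k0c132l0.
function parseSearchPath(path: string): ParsedSearchPath | null {
  const pathname = path.split("?")[0] ?? path; // ignore any query string
  const ids = pathname.match(/\/k0(?:c(\d+))?l(\d+)\/?$/);
  if (!ids) return null;
  const pageMatch = pathname.match(/\/page-(\d+)\//);
  return {
    categoryId: ids[1] ? Number(ids[1]) : 0,
    locationId: Number(ids[2]),
    page: pageMatch ? Number(pageMatch[1]) : 1,
  };
}
```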

Current Implementation Status

The existing scraper in src/kijiji.ts successfully implements the approach:

  • Parses embedded Apollo state from HTML responses
  • Handles rate limiting and retries
  • Extracts listing metadata (title, URL, price, location, etc.)
  • Works without authentication for search operations

Listing Details Page

Overview

Similar to search results, listing details pages use server-side rendering with embedded Apollo GraphQL state in the HTML. No dedicated API endpoint serves individual listing data - all information is pre-rendered on the server.

Data Architecture

  • Server-Side Rendering: Each listing page is fully server-rendered with data embedded in HTML
  • Embedded Apollo State: Listing data is stored in __NEXT_DATA__.props.pageProps.__APOLLO_STATE__
  • Client-Side GraphQL: Additional data (categories, campaigns, similar listings, user profiles) fetched via GraphQL API

Listing Data Structure

The main listing data follows the same pattern as search results:

{
  "id": "1705585530",
  "title": "We Pay top cash for iPhone 17 pro max, iPhone 17 pro, iPhone Air",
  "description": "Buying All Brand new Apple iPhones sealed/Unsealed...",
  "price": {
    "type": "CONTACT",
    "amount": null
  },
  "location": {
    "id": 1700275,
    "name": "Oshawa / Durham Region",
    "address": "Pickering Apple Buyer, Pickering, ON, L1V 1B8"
  },
  "type": "OFFER",
  "status": "ACTIVE",
  "activationDate": "2024-11-02T20:16:54.000Z",
  "endDate": "3000-01-01T00:00:00.000Z",
  "metrics": {
    "views": 1720
  },
  "posterInfo": {
    "posterId": "1044934581",
    "rating": null
  },
  "attributes": [
    {
      "canonicalName": "forsaleby",
      "canonicalValues": ["business"]
    },
    {
      "canonicalName": "phonecarrier",
      "canonicalValues": ["unlocked"]
    }
  ]
}

Client-Side GraphQL Queries

When loading a listing details page, the following GraphQL queries are executed:

1. getSearchCategories

  • Purpose: Category hierarchy for navigation
  • Variables: {"locale": "en-CA"}
  • Response: Hierarchical category structure

2. getCampaignsForVip

  • Purpose: Advertisement targeting data
  • Variables: {"placement": "vip", "locationId": 1700275, "categoryId": 760, "platform": "desktop"}
  • Response: Campaign/ads data (usually null)

3. GetReviewSummary

  • Purpose: Seller review statistics
  • Variables: {"userId": "1044934581"}
  • Response: Review count and score (usually 0 for new sellers)

4. GetProfileMetrics

  • Purpose: Seller profile information
  • Variables: {"profileId": "1044934581"}
  • Response: Member since date, account type

5. GetListingsSimilar

  • Purpose: Similar listings for cross-selling
  • Variables: {"listingId": "1705585530", "limit": 10, "isExternalId": false}
  • Response: Array of similar listings with basic metadata

6. GetGeocodeReverseFromIp

  • Purpose: Geolocation-based features
  • Variables: {}
  • Response: Failed with a 404 for the IP address tested

Implementation Status

The existing parseListing() function in src/kijiji.ts successfully extracts listing details from embedded Apollo state:

  • Extracts title, description, price, location
  • Handles contact-based pricing ("Please Contact")
  • Parses creation date, view count, listing status
  • Extracts seller information and address
  • Works without authentication or API keys
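
As an illustration of the contact-based pricing case, a formatter might look like the sketch below. formatPrice is a hypothetical name and is not the logic in parseListing(). The division by 100 assumes amount is expressed in cents, which the samples suggest (35000 for a used iPhone 13) but do not confirm.

```typescript
interface ListingPrice {
  type: string; // "FIXED" and "CONTACT" were observed
  amount: number | null;
}

// Hypothetical formatter. ASSUMPTION: amount is in cents.
function formatPrice(price: ListingPrice): string {
  if (price.type === "CONTACT" || price.amount === null) {
    return "Please Contact";
  }
  return `$${(price.amount / 100).toFixed(2)}`;
}
```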

Key Findings

  1. No Dedicated Listing API: As with search results, no separate GraphQL query for individual listing data was observed
  2. Complete Data Available: All listing information is embedded in the initial HTML response
  3. Additional Context Fetched: Secondary GraphQL queries provide complementary data (reviews, similar listings)
  4. Consistent Architecture: Same Apollo state embedding pattern as search pages

Current Scraper Implementation

The scraper successfully extracts listing details by:

  1. Fetching the listing URL HTML
  2. Parsing embedded __NEXT_DATA__ Apollo state
  3. Extracting the Listing:{id} object from Apollo cache
  4. Mapping fields to typed ListingDetails interface
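
Steps 3 and 4 above can be sketched as follows. The ListingDetails fields shown here are a hypothetical subset chosen for illustration and may not match the interface in src/kijiji.ts. The sketch also assumes nested objects such as location and metrics are stored inline, as in the sample data, and not as Apollo cache references.

```typescript
// Hypothetical subset of fields; the real ListingDetails interface may differ.
interface ListingDetails {
  id: string;
  title: string;
  description: string;
  status: string | null;
  views: number | null;
  address: string | null;
}

// Returns the value as a plain object, or an empty object when it is not one.
function asObject(value: unknown): Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value)
    ? (value as Record<string, unknown>)
    : {};
}

function asString(value: unknown): string | null {
  return typeof value === "string" ? value : null;
}

// Looks up Listing:{id} in the Apollo cache and maps it to the typed shape.
function getListingDetails(
  state: Record<string, unknown>,
  listingId: string
): ListingDetails | null {
  const raw = state[`Listing:${listingId}`];
  if (typeof raw !== "object" || raw === null) return null;

  const listing = asObject(raw);
  const location = asObject(listing["location"]);
  const views = asObject(listing["metrics"])["views"];
  return {
    id: listingId,
    title: asString(listing["title"]) ?? "",
    description: asString(listing["description"]) ?? "",
    status: asString(listing["status"]),
    views: typeof views === "number" ? views : null,
    address: asString(location["address"]),
  };
}
```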

This approach works reliably without authentication, and no rate limiting was observed on individual listing fetches.

Next Steps

  • Explore posting/authentication APIs (requires user login)
  • Investigate if GraphQL API can be used for programmatic access with proper authentication
  • Test rate limiting patterns and optimal scraping strategies
  • Document additional category and location ID mappings