Compare commits


25 Commits

Author SHA1 Message Date
0873df7e82 chore: merge code-smell-cleanup 2026-04-30 21:08:34 -04:00
24e0a8266e Revert "test: preload core fetch guard"
This reverts commit 28b3267b7d.
2026-04-30 20:58:06 -04:00
db173aef1b Revert "chore: add sentinel file for bun test root"
This reverts commit d1cd028f34.
2026-04-30 20:58:06 -04:00
d1cd028f34 chore: add sentinel file for bun test root 2026-04-30 20:56:14 -04:00
28b3267b7d test: preload core fetch guard 2026-04-30 20:53:31 -04:00
c0dda57f64 test: require explicit fetch mocks 2026-04-30 20:51:13 -04:00
31866de787 refactor: clean kijiji scraper internals 2026-04-30 20:48:15 -04:00
9c4c347933 feat: ebay splashui challenge solver
argon2id pow → /challengesvc/answer → chlgref cookie
warm homepage for akamai cookies, detect 307 redirect,
solve + retry transparently in fetchEbayItems flow
2026-04-30 20:44:37 -04:00
53eafe6d4c chore: agent-browser skills path env
Signed-off-by: Dmytro Stanchiev <git@dmytros.dev>
2026-04-30 20:44:05 -04:00
84f17fbdfd chore: ebay parser fix
Signed-off-by: Dmytro Stanchiev <git@dmytros.dev>
2026-04-30 16:56:55 -04:00
3a722a2d11 chore: agent-browser vars
Signed-off-by: Dmytro Stanchiev <git@dmytros.dev>
2026-04-30 16:56:44 -04:00
f95b974c7e fix: harden shared http helper 2026-04-29 21:09:10 -04:00
f5339cadf1 style: format shared http refactor 2026-04-29 21:05:36 -04:00
5d86a4e54d fix: preserve ebay rate-limit fallback 2026-04-29 14:52:08 -04:00
82e7abc057 fix: keep shared http refactor in scope 2026-04-29 14:48:47 -04:00
6e50ebf901 refactor: share scraper http fetching 2026-04-29 13:14:20 -04:00
5ecb645ee3 docs: smell cleanup plan
Signed-off-by: Dmytro Stanchiev <git@dmytros.dev>
2026-04-29 13:09:38 -04:00
82e12283de docs: surface Kijiji AND-matching behavior in tool, API, and MCP responses
Kijiji zero-result queries (e.g. 'macbook air m1 apple silicon') are
confusing because the failure mode is non-obvious. Surface the root
cause everywhere the caller can see it:
- MCP tool description warns about AND-matching and gives a concrete
  before/after example
- API 404 body includes the actionable hint via emptySearchResponse(hint)
- Core scraper logs the built URL and tip on page-1 zero results
- MCP handler unwraps the API message field so the hint reaches the LLM
2026-04-29 13:06:31 -04:00
22eb65d4a2 refactor: share mcp api calls 2026-04-29 05:37:24 -04:00
abdd39d65c fix: complete ebay integer validation test coverage 2026-04-29 00:56:37 -04:00
3e4e35c9ae fix: tighten route integer parsing and test coverage 2026-04-29 00:32:23 -04:00
3ea6ee3938 fix: strictly parse route integers 2026-04-29 00:12:26 -04:00
d178f9c9cb fix: remove cookie query forwarding 2026-04-28 23:52:45 -04:00
9cbba9ba13 chore: ignore local worktrees 2026-04-28 23:08:04 -04:00
b6aaec0b65 chore: update ruler docs
Signed-off-by: Dmytro Stanchiev <git@dmytros.dev>
2026-04-28 22:29:12 -04:00
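Commit 9c4c347933 describes the eBay challenge handling only in outline: detect a 307 redirect, solve the challenge, then retry transparently inside the `fetchEbayItems` flow. Below is a minimal sketch of that detect-solve-retry step. It assumes that any 307 means "challenge required" and that the solver resolves to one `name=value` cookie pair (the commit names a `chlgref` cookie). The argon2id proof of work and the `/challengesvc/answer` POST are hidden behind a hypothetical `solve` callback, the homepage warm-up for Akamai cookies is not modeled, and `fetchWithChallengeRetry` is an illustrative name, not the repository's function.

```typescript
// Hypothetical types for this sketch; not taken from the repository.
type FetchLike = (input: string, init?: RequestInit) => Promise<Response>;
// Receives the absolute challenge URL and resolves to one cookie pair,
// e.g. "chlgref=..." (assumed format).
type ChallengeSolver = (challengeUrl: string) => Promise<string>;

/**
 * Fetch `url` without following redirects. A 307 response is treated as a
 * challenge: solve it once, then retry with the solver's cookie appended to
 * any Cookie header the caller already supplied.
 */
async function fetchWithChallengeRetry(
  url: string,
  init: RequestInit,
  solve: ChallengeSolver,
  fetchImpl: FetchLike = fetch,
): Promise<Response> {
  const first = await fetchImpl(url, { ...init, redirect: "manual" });
  if (first.status !== 307) {
    return first;
  }

  const location = first.headers.get("location");
  if (!location) {
    // No target to solve against; hand the redirect back unchanged.
    return first;
  }

  // Location may be relative, so resolve it against the original URL.
  const challengeUrl = new URL(location, url).toString();
  const solvedCookie = await solve(challengeUrl);

  const headers = new Headers(init.headers);
  const existing = headers.get("cookie");
  headers.set("cookie", existing ? `${existing}; ${solvedCookie}` : solvedCookie);

  // Retry exactly once; a second 307 is returned to the caller as-is.
  return fetchImpl(url, { ...init, headers, redirect: "manual" });
}
```

Injecting `fetchImpl` keeps the wrapper testable offline with a fake `fetch`, which matches the repository rule that tests must not hit live marketplace endpoints.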
30 changed files with 2854 additions and 617 deletions
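Commit 82e12283de says the MCP handler now unwraps the API's `message` field so the Kijiji hint reaches the LLM. The MCP handler itself is not part of the visible diff, so the following is only a minimal sketch of that unwrapping. It assumes error bodies look like `{ message: string }` (the shape `emptySearchResponse` returns), falls back to the raw body text, and finally to the HTTP status. `apiErrorText` is a hypothetical name.

```typescript
/**
 * Turn a non-OK API response into the text an MCP tool result should carry.
 * Prefers a JSON `message` field (where the API puts hints such as the
 * Kijiji AND-matching tip), then the raw body, then the HTTP status.
 */
async function apiErrorText(res: Response): Promise<string> {
  const raw = await res.text();
  try {
    const parsed: unknown = JSON.parse(raw);
    if (
      typeof parsed === "object" &&
      parsed !== null &&
      "message" in parsed &&
      typeof parsed.message === "string"
    ) {
      return parsed.message;
    }
  } catch {
    // Not JSON (for example an HTML error page or an empty body); fall through.
  }
  return raw || `API request failed with status ${res.status}`;
}
```

Returning the unwrapped string rather than the whole JSON body keeps the hint as plain prose, which is easier for a model to act on.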

4
.envrc

@@ -1,4 +1,8 @@
 export DIRENV_WARN_TIMEOUT=20s
+export AGENT_BROWSER_EXECUTABLE_PATH=/run/current-system/sw/bin/google-chrome-unstable
+export AGENT_BROWSER_ENGINE=chrome
+export AGENT_BROWSER_HEADED=0
+export AGENT_BROWSER_SKILLS_DIR=.claude/skills
 export OPENCODE_CONFIG_CONTENT="{\"plugin\":[\"superpowers@git+https://github.com/obra/superpowers.git\"]}"
 eval "$(devenv direnvrc)"

1
.gitignore vendored

@@ -34,6 +34,7 @@ report.[0-9]*.[0-9]*.[0-9]*.[0-9]*.json
 .cache
 *.tsbuildinfo
 .turbo
+.worktrees/
 # IntelliJ based IDEs
 .idea


@@ -1,52 +1,9 @@
-## Bun Guidelines
-**CRITICAL**: Do not assume you know full Bun APIs. For **ANY** Bun API you use, confirm them by using `bun-docs` MCP tools.
-Default to using Bun instead of Node.js.
-- Use `bun <file>` instead of `node <file>` or `ts-node <file>`
-- Use `bun test` instead of `jest` or `vitest`
-- Use `bun build <file.html|file.ts|file.css>` instead of `webpack` or `esbuild`
-- Use `bun install` instead of `npm install` or `yarn install` or `pnpm install`
-- Use `bun run <script>` instead of `npm run <script>` or `yarn run <script>` or `pnpm run <script>`
-- Use `bunx <package> <command>` instead of `npx <package> <command>`
-- Bun automatically loads .env, so don't use dotenv.
-### APIs
-- `Bun.serve()` supports WebSockets, HTTPS, and routes. Don't use `express`.
-- `bun:sqlite` for SQLite. Don't use `better-sqlite3`.
-- `Bun.redis` for Redis. Don't use `ioredis`.
-- `Bun.sql` for Postgres. Don't use `pg` or `postgres.js`.
-- `WebSocket` is built-in. Don't use `ws`.
-- Prefer `Bun.file` over `node:fs`'s readFile/writeFile
-- Bun.$`ls` instead of execa.
-### Testing
-#### Quick Start
-- Run tests: `bun test`
-- Write tests in `tests/` folder
-#### Test Structure
-- Use `describe` blocks to group related tests
-- Use `test` for individual test cases
-- Use `beforeEach`/`afterEach` for setup/teardown
-#### Assertions
-- Import: `import { test, expect, describe, beforeEach, afterEach, mock } from "bun:test";`
-- Common: `expect(value).toBe(expected)`, `expect(fn).rejects.toThrow()`
-- Async: `await expect(asyncFn()).resolves.toBe(expected)`
-#### Mocking
-- Mock functions: `mock(fn)`
-- Mock globals: `global.fetch = mock(...)`
-- Restore mocks in `afterEach` or `finally`
-#### Best Practices
-- Mock external APIs (fetch, file I/O)
-- Test error cases and edge conditions
-- Use descriptive test names
-- Clean up resources in `afterEach`
-For more information, read the Bun API docs in `node_modules/bun-types/docs/**.mdx`.
+## Bun Guide
+- Package manager/runtime/test runner is Bun `1.3.13`.
+- Use `bun install`, `bun run <script>`, `bun test`, and `bun build`; do not add npm/yarn/pnpm scripts.
+- Prefer Bun-native runtime APIs already used in repo: `Bun.serve`, built-in `fetch`, Web APIs, and `bun:test`.
+- Keep servers framework-free. Do not introduce Express/Koa/Fastify for the adapters.
+- Bun auto-loads `.env`; do not add `dotenv`.
+- For tests, import from `bun:test` and restore mocked globals/env in `afterEach` or `finally`.
+- Root `bun test` is misleading because `bunfig.toml` sets a dummy root. Run package test paths explicitly.
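The new guide asks tests to restore mocked globals in `afterEach` or `finally`. Here is a minimal sketch of the `finally` variant for `fetch`. `withMockedFetch` is a hypothetical helper, not something the repository is known to ship; in a `bun:test` file the same save-and-restore could live in `beforeEach`/`afterEach` instead.

```typescript
// Call shape a fake needs; the cast below bridges to the runtime's full
// `fetch` type, which may carry extra static members.
type FetchHandler = (input: string | URL | Request, init?: RequestInit) => Promise<Response>;

/**
 * Run `body` with `globalThis.fetch` swapped for `fake`. The original is put
 * back in `finally`, so a thrown assertion cannot leak the mock into the
 * next test.
 */
async function withMockedFetch<T>(fake: FetchHandler, body: () => Promise<T>): Promise<T> {
  const original = globalThis.fetch;
  globalThis.fetch = fake as typeof globalThis.fetch;
  try {
    return await body();
  } finally {
    globalThis.fetch = original;
  }
}
```

Because the fake never touches the network, a test written this way stays deterministic and offline even if the code under test builds its URLs dynamically.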


@@ -2,37 +2,46 @@
 ## Repo Shape
-- Bun workspace monorepo.
-- `packages/core`: scraper logic, parsing, shared cookie/http/format helpers, and the only checked-in tests.
-- `packages/api-server`: Bun HTTP adapter exposing `/api/*` routes.
-- `packages/mcp-server`: MCP JSON-RPC adapter that proxies to the API server.
-- `dist/`: build output. Do not edit generated files here.
-- `cookies/`: local cookie examples and docs. Never commit real session cookies.
+- Bun workspace monorepo with packages under `packages/*`.
+- `packages/core`: scraper behavior, parsing, result types, cookie handling, HTTP helpers.
+- `packages/api-server`: Bun HTTP adapter exposing `/api/*` routes over core.
+- `packages/mcp-server`: MCP/JSON-RPC adapter that proxies to the API server.
+- `cookies/`: local cookie docs/examples only. Treat real cookie files as secrets.
+- `dist/`, `node_modules/`, `.turbo/`, `.direnv/`, `.devenv/`: generated/vendor/cache. Do not edit.
 ## Commands
 - Install: `bun install`
-- Lint/format check: `bun run ci`
-- Build everything: `bun run build`
-- Run tests: `bun test`
+- Lint/format/typecheck: `bun run ci`
+- Build all packages: `bun run build`
+- Build bundled runtime output: `bun run build:all`
+- Run tests: `bun test packages/core/test packages/api-server/test packages/mcp-server/test`
 - API dev server: `bun run --cwd packages/api-server dev`
 - MCP dev server: `bun run --cwd packages/mcp-server dev`
-## Repo Conventions
-- Keep marketplace scraping behavior in `packages/core`. `api-server` and `mcp-server` stay thin adapters.
-- Preserve cookie precedence everywhere: request parameter > environment variable > cookie file.
-- Shared public surface for scraper code is `packages/core/src/index.ts`. Update exports deliberately.
-- Tests should stay deterministic and offline. Mock `fetch`; do not hit live marketplace endpoints.
-- Use Bun and Bun-native APIs in this repo. Do not introduce Node-specific tooling unless already required.
-- Biome and strict TypeScript are part of the contract. Fix code to satisfy them; do not relax config.
+## Boundaries
+- Marketplace behavior belongs in `packages/core`, not adapter packages.
+- HTTP route code should parse request input, call core, and map status/errors.
+- MCP code should define tools, validate JSON-RPC flow, and map tool args to API URLs.
+- Keep API query params and MCP tool args in sync.
+- Shared public surface for scraper code is `packages/core/src/index.ts`; update exports deliberately.
+## Invariants
+- Cookie precedence in core helpers: explicit/request cookie string before environment variable.
+- Tests must be deterministic and offline. Mock `fetch`; do not hit live marketplace endpoints.
+- Use Bun and Bun-native APIs. Do not add Node-specific tooling unless already required.
+- Biome and strict TypeScript are contract. Fix code; do not relax config.
 ## Verification
-- Core changes: `bun test && bun run ci`
-- Cross-package contract changes: `bun test && bun run ci && bun run build`
-- Adapter-only changes: run the relevant package build plus `bun run ci`
+- Core changes: `bun test packages/core/test && bun run ci`
+- Adapter-only changes: relevant package build plus `bun run ci`
+- Cross-package contract changes: `bun test packages/core/test packages/api-server/test packages/mcp-server/test && bun run ci && bun run build`
 ## Gotchas
-- The root `build` script emits separate bundles to `dist/api` and `dist/mcp`, then `scripts/start.sh` launches both.
+- `bunfig.toml` points test root at `./do-not-run-tests-from-root`; pass package test paths explicitly.
+- Root `build` cleans `dist`, then Turbo emits bundles for API and MCP.
+- `scripts/start.sh` launches `dist/api/index.js` and `dist/mcp/index.js`.
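The Invariants section fixes cookie precedence in core helpers: an explicit/request cookie string first, the environment variable second. A minimal sketch of that rule follows. `resolveCookieString` is a hypothetical name; the environment is passed in as a plain record so the sketch stays pure (real code would likely read `process.env` or `Bun.env`), and treating a blank string as "not provided" is an assumption.

```typescript
/**
 * Pick the cookie string for a scraper call. Explicit input (from request
 * options) wins; otherwise fall back to an environment variable such as
 * FACEBOOK_COOKIE or EBAY_COOKIE. Blank strings count as "not provided".
 */
function resolveCookieString(
  explicit: string | undefined,
  envName: string,
  env: Record<string, string | undefined>,
): string | undefined {
  const fromInput = explicit?.trim();
  if (fromInput) {
    return fromInput;
  }
  const fromEnv = env[envName]?.trim();
  return fromEnv || undefined;
}
```

Passing `env` explicitly also lets tests cover both branches without mutating real environment variables.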


@@ -32,6 +32,7 @@
 "version": "1.0.0",
 "dependencies": {
 "@typescript/native-preview": "catalog:",
+"argon2-wasm-pro": "1.1.0",
 "cli-progress": "^3.12.0",
 "linkedom": "^0.18.12",
 "unidecode": "^1.1.0",
@@ -120,6 +121,8 @@
 "ansi-regex": ["ansi-regex@5.0.1", "", {}, "sha512-quJQXlTSUGL2LH9SUXo8VwsY4soanhgo6LNSm84E1LBcE8s3O0wpdiRzyR9z/ZZJMlMWv37qOOb9pdJlMUEKFQ=="],
+"argon2-wasm-pro": ["argon2-wasm-pro@1.1.0", "", {}, "sha512-ApZAKEgbWQILckY+IdjrETB0oTC8L9YHT3JVQhdun77tilExkXNyM/T/qbkvX+Uv68+IQmVwewQwg6yJnSwVxQ=="],
 "boolbase": ["boolbase@1.0.0", "", {}, "sha512-JZOSA7Mo9sNGB8+UjSgzdLtokWAky1zbztM3WRLCbZ70/3cTANmQmOdR7y2g+J0e2WXywy1yS468tY+IruqEww=="],
 "bun-types": ["bun-types@1.3.13", "", { "dependencies": { "@types/node": "*" } }, "sha512-QXKeHLlOLqQX9LgYaHJfzdBaV21T63HhFJnvuRCcjZiaUDpbs5ED1MgxbMra71CsryN/1dAoXuJJJwIv/2drVA=="],


@@ -1,55 +1,18 @@
-# Marketplace Cookies Setup
-Both Facebook Marketplace and eBay require valid session cookies to bypass bot detection and access listings.
-## Cookie Configuration
-Authenticated scrapers now read cookies only from environment variables:
-1. `FACEBOOK_COOKIE`
-2. `EBAY_COOKIE`
----
-## Facebook Marketplace
-### Required Cookies
-- `c_user`: Your Facebook user ID
-- `xs`: Facebook session token
-- `fr`: Facebook request token
-- `datr`: Data attribution token
-- `sb`: Session browser token
-### Setup
-```bash
-export FACEBOOK_COOKIE='c_user=123; xs=token; fr=request'
-```
-Use the raw `Cookie` header string copied from an authenticated browser session.
----
-## eBay
-eBay has aggressive bot detection that blocks requests without valid session cookies.
-### Setup
-```bash
-export EBAY_COOKIE='s=VALUE; ds2=VALUE; ebay=VALUE'
-```
-Use the raw `Cookie` header string copied from an authenticated browser session.
----
-## Important Notes
-- Cookies must be from active browser sessions
-- Cookies expire and need periodic refresh
-- **NEVER** commit real cookies to version control
-- Platforms may still block automated scraping despite valid cookies
-## Security
-Do not commit real cookie values or store them in tracked files.
+# cookies
+## Scope
+- This directory is for cookie setup docs and local examples only.
+- Treat any real browser cookie export as a secret, even if already present locally.
+## Runtime Sources
+- Authenticated scrapers read raw `Cookie` header strings from environment variables such as `FACEBOOK_COOKIE` and `EBAY_COOKIE`.
+- Some core entrypoints also accept explicit cookie strings from request/options; explicit input takes precedence over environment values.
+## Safety Rules
+- Never commit real cookie values, browser exports, or session files.
+- Use placeholder values in docs: `c_user=123; xs=token; fr=request`.
+- Do not paste cookie values into logs, tests, fixtures, or generated agent docs.
+- If editing this directory, verify diffs do not contain real `c_user`, `xs`, `fr`, `datr`, `sb`, `s`, `ds2`, or `ebay` values.
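The new safety rules forbid pasting cookie values into logs. When a log line needs to show which cookies were sent, one option is to keep the cookie names and mask every value. A minimal sketch; `redactCookieHeader` is a hypothetical helper, not part of the repository.

```typescript
/**
 * Mask every value in a raw Cookie header so the string can be logged
 * without leaking session tokens. Cookie names are kept to aid debugging.
 */
function redactCookieHeader(header: string): string {
  return header
    .split(";")
    .map((pair) => pair.trim())
    .filter((pair) => pair.length > 0)
    .map((pair) => {
      const eq = pair.indexOf("=");
      // A pair without "=" may be a bare value; hide it entirely.
      if (eq === -1) {
        return "<redacted>";
      }
      return `${pair.slice(0, eq).trim()}=<redacted>`;
    })
    .join("; ");
}
```

For example, `redactCookieHeader("c_user=123; xs=token; fr=request")` yields `c_user=<redacted>; xs=<redacted>; fr=<redacted>`.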

File diff suppressed because it is too large.


@@ -19,5 +19,6 @@
 ## Verify
+- `bun test packages/api-server/test`
 - `bun run --cwd packages/api-server build`
 - `bun run ci`


@@ -1,82 +1,76 @@
 import { fetchEbayItems } from "@marketplace-scrapers/core";
 import { logger } from "../logger";
+import {
+emptySearchResponse,
+getRequiredSearchQuery,
+parseNonNegativeIntegerParam,
+} from "./helpers";
 /**
 * GET /api/ebay?q={query}&minPrice={minPrice}&maxPrice={maxPrice}&strictMode={strictMode}&exclusions={exclusions}&keywords={keywords}&buyItNowOnly={buyItNowOnly}&canadaOnly={canadaOnly}
 * Search eBay for listings (default: Buy It Now only, Canada only)
 */
 export async function ebayRoute(req: Request): Promise<Response> {
+const reqUrl = new URL(req.url);
+const SEARCH_QUERY = getRequiredSearchQuery(req);
+if (SEARCH_QUERY instanceof Response) {
+return SEARCH_QUERY;
+}
+const minPrice = parseNonNegativeIntegerParam(
+reqUrl.searchParams,
+"minPrice",
+);
+if (minPrice instanceof Response) {
+return minPrice;
+}
+const maxPrice = parseNonNegativeIntegerParam(
+reqUrl.searchParams,
+"maxPrice",
+);
+if (maxPrice instanceof Response) {
+return maxPrice;
+}
+const strictMode = reqUrl.searchParams.get("strictMode") === "true";
+const buyItNowOnly = reqUrl.searchParams.get("buyItNowOnly") !== "false";
+const canadaOnly = reqUrl.searchParams.get("canadaOnly") !== "false";
+const exclusionsParam = reqUrl.searchParams.get("exclusions");
+const exclusions = exclusionsParam
+? exclusionsParam.split(",").map((s) => s.trim())
+: [];
+const keywordsParam = reqUrl.searchParams.get("keywords");
+const keywords = keywordsParam
+? keywordsParam.split(",").map((s) => s.trim())
+: [SEARCH_QUERY];
+const maxItems = parseNonNegativeIntegerParam(
+reqUrl.searchParams,
+"maxItems",
+);
+if (maxItems instanceof Response) {
+return maxItems;
+}
+const hideUnstableResults =
+reqUrl.searchParams.get("unstableFilter") === "true";
+const opts = {
+minPrice,
+maxPrice,
+strictMode,
+exclusions,
+keywords,
+buyItNowOnly,
+canadaOnly,
+maxItems,
+};
 try {
-const reqUrl = new URL(req.url);
-const SEARCH_QUERY =
-req.headers.get("query") || reqUrl.searchParams.get("q") || null;
-if (!SEARCH_QUERY)
-return Response.json(
-{
-message:
-"Request didn't have 'query' header or 'q' search parameter!",
-},
-{ status: 400 },
-);
-const minPriceParam = reqUrl.searchParams.get("minPrice");
-const minPrice = minPriceParam ? parseInt(minPriceParam, 10) : undefined;
-if (minPriceParam && (Number.isNaN(minPrice) || (minPrice ?? 0) < 0)) {
-return Response.json(
-{ message: "Invalid minPrice parameter" },
-{ status: 400 },
-);
-}
-const maxPriceParam = reqUrl.searchParams.get("maxPrice");
-const maxPrice = maxPriceParam ? parseInt(maxPriceParam, 10) : undefined;
-if (maxPriceParam && (Number.isNaN(maxPrice) || (maxPrice ?? 0) < 0)) {
-return Response.json(
-{ message: "Invalid maxPrice parameter" },
-{ status: 400 },
-);
-}
-const strictMode = reqUrl.searchParams.get("strictMode") === "true";
-const buyItNowOnly = reqUrl.searchParams.get("buyItNowOnly") !== "false";
-const canadaOnly = reqUrl.searchParams.get("canadaOnly") !== "false";
-const exclusionsParam = reqUrl.searchParams.get("exclusions");
-const exclusions = exclusionsParam
-? exclusionsParam.split(",").map((s) => s.trim())
-: [];
-const keywordsParam = reqUrl.searchParams.get("keywords");
-const keywords = keywordsParam
-? keywordsParam.split(",").map((s) => s.trim())
-: [SEARCH_QUERY];
-const maxItemsParam = reqUrl.searchParams.get("maxItems");
-const maxItems = maxItemsParam ? parseInt(maxItemsParam, 10) : undefined;
-if (maxItemsParam && (Number.isNaN(maxItems) || (maxItems ?? 0) < 0)) {
-return Response.json(
-{ message: "Invalid maxItems parameter" },
-{ status: 400 },
-);
-}
-const hideUnstableResults =
-reqUrl.searchParams.get("unstableFilter") === "true";
-const opts = {
-minPrice,
-maxPrice,
-strictMode,
-exclusions,
-keywords,
-buyItNowOnly,
-canadaOnly,
-maxItems,
-};
 if (hideUnstableResults) {
 const items = await fetchEbayItems(SEARCH_QUERY, 1, opts, {
 hideUnstableResults: true,
 });
 if (items.results.length === 0 && items.unstableResults.length === 0) {
-return Response.json(
-{ message: "Search didn't return any results!" },
-{ status: 404 },
-);
+return emptySearchResponse();
 }
 return Response.json(items, { status: 200 });
 }
@@ -84,11 +78,9 @@ export async function ebayRoute(req: Request): Promise<Response> {
 const items = await fetchEbayItems(SEARCH_QUERY, 1, opts);
 const isEmpty = !items || items.length === 0;
-if (isEmpty)
-return Response.json(
-{ message: "Search didn't return any results!" },
-{ status: 404 },
-);
+if (isEmpty) {
+return emptySearchResponse();
+}
 return Response.json(items, { status: 200 });
 } catch (error) {
 logger.error("eBay scraping error:", error);
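The refactored route reads every option from the query string: integers must be plain non-negative digits, `buyItNowOnly` and `canadaOnly` stay on unless the value is exactly `"false"`, and `exclusions`/`keywords` are comma-separated. Below is a caller-side sketch that produces URLs the route should accept. `buildEbaySearchUrl`, the `EbaySearchParams` shape, and the base URL in the test are assumptions for illustration; the MCP server's own mapping from tool args to API URLs is not shown in this diff.

```typescript
// Hypothetical caller-side option shape mirroring the route's query params.
interface EbaySearchParams {
  q: string;
  minPrice?: number;
  maxPrice?: number;
  maxItems?: number;
  strictMode?: boolean;
  buyItNowOnly?: boolean; // route default: true
  canadaOnly?: boolean; // route default: true
  exclusions?: string[];
  keywords?: string[];
  unstableFilter?: boolean;
}

function buildEbaySearchUrl(baseUrl: string, p: EbaySearchParams): string {
  const url = new URL("/api/ebay", baseUrl);
  const sp = url.searchParams;
  sp.set("q", p.q);
  // The route rejects anything but ASCII digits, so validate up front.
  // isSafeInteger also keeps String() from producing "1e+21"-style output.
  for (const key of ["minPrice", "maxPrice", "maxItems"] as const) {
    const value = p[key];
    if (value === undefined) continue;
    if (!Number.isSafeInteger(value) || value < 0) {
      throw new RangeError(`${key} must be a non-negative integer`);
    }
    sp.set(key, String(value));
  }
  if (p.strictMode) sp.set("strictMode", "true");
  // Only the literal "false" disables these; omitting them keeps the defaults.
  if (p.buyItNowOnly === false) sp.set("buyItNowOnly", "false");
  if (p.canadaOnly === false) sp.set("canadaOnly", "false");
  // The route splits on "," and trims, so items must not contain commas.
  if (p.exclusions?.length) sp.set("exclusions", p.exclusions.join(","));
  if (p.keywords?.length) sp.set("keywords", p.keywords.join(","));
  if (p.unstableFilter) sp.set("unstableFilter", "true");
  return url.toString();
}
```

Validating on the caller side turns what would be a 400 from the route into an immediate, local error.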


@@ -1,5 +1,10 @@
 import { fetchFacebookItems } from "@marketplace-scrapers/core";
 import { logger } from "../logger";
+import {
+emptySearchResponse,
+getRequiredSearchQuery,
+parseNonNegativeIntegerParam,
+} from "./helpers";
 /**
 * GET /api/facebook?q={query}&location={location}
@@ -8,24 +13,19 @@ import { logger } from "../logger";
 export async function facebookRoute(req: Request): Promise<Response> {
 const reqUrl = new URL(req.url);
-const SEARCH_QUERY =
-req.headers.get("query") || reqUrl.searchParams.get("q") || null;
-if (!SEARCH_QUERY)
-return Response.json(
-{
-message: "Request didn't have 'query' header or 'q' search parameter!",
-},
-{ status: 400 },
-);
+const SEARCH_QUERY = getRequiredSearchQuery(req);
+if (SEARCH_QUERY instanceof Response) {
+return SEARCH_QUERY;
+}
 const LOCATION = reqUrl.searchParams.get("location") || "toronto";
-const maxItemsParam = reqUrl.searchParams.get("maxItems");
-const maxItems = maxItemsParam ? parseInt(maxItemsParam, 10) : 25;
-if (maxItemsParam && (Number.isNaN(maxItems) || maxItems < 0)) {
-return Response.json(
-{ message: "Invalid maxItems parameter" },
-{ status: 400 },
-);
+const maxItems = parseNonNegativeIntegerParam(
+reqUrl.searchParams,
+"maxItems",
+25,
+);
+if (maxItems instanceof Response) {
+return maxItems;
 }
 const hideUnstableResults =
 reqUrl.searchParams.get("unstableFilter") === "true";
@@ -42,20 +42,15 @@ export async function facebookRoute(req: Request): Promise<Response> {
 },
 );
 if (items.results.length === 0 && items.unstableResults.length === 0) {
-return Response.json(
-{ message: "Search didn't return any results!" },
-{ status: 404 },
-);
+return emptySearchResponse();
 }
 return Response.json(items, { status: 200 });
 }
 const items = await fetchFacebookItems(SEARCH_QUERY, 1, LOCATION, maxItems);
-if (!items || items.length === 0)
-return Response.json(
-{ message: "Search didn't return any results!" },
-{ status: 404 },
-);
+if (!items || items.length === 0) {
+return emptySearchResponse();
+}
 return Response.json(items, { status: 200 });
 } catch (error) {
 logger.error("Facebook scraping error:", error);


@@ -0,0 +1,47 @@
export function getRequiredSearchQuery(req: Request): string | Response {
const reqUrl = new URL(req.url);
const query = req.headers.get("query") || reqUrl.searchParams.get("q");
if (!query) {
return Response.json(
{
message: "Request didn't have 'query' header or 'q' search parameter!",
},
{ status: 400 },
);
}
return query;
}
export function parseNonNegativeIntegerParam(
searchParams: URLSearchParams,
name: string,
defaultValue: number,
): number | Response;
export function parseNonNegativeIntegerParam(
searchParams: URLSearchParams,
name: string,
): number | undefined | Response;
export function parseNonNegativeIntegerParam(
searchParams: URLSearchParams,
name: string,
defaultValue?: number,
): number | undefined | Response {
const rawValue = searchParams.get(name);
if (rawValue === null) {
return defaultValue;
}
if (!/^\d+$/.test(rawValue)) {
return Response.json(
{ message: `Invalid ${name} parameter` },
{ status: 400 },
);
}
return Number(rawValue);
}
export function emptySearchResponse(hint?: string): Response {
const message = hint
? `Search didn't return any results! ${hint}`
: "Search didn't return any results!";
return Response.json({ message }, { status: 404 });
}
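The test hunks below add 400 cases for `maxItems=10abc`, `1.5`, an empty value, two spaces, and `0x10`. The reason is that the removed inline checks relied on `parseInt`, which stops at the first non-digit. The side-by-side reproduction below is not repository code: `legacyParse` mirrors the removed `parseInt` checks, and `strictParse` mirrors the `/^\d+$/` test in `parseNonNegativeIntegerParam`, except that it returns the string `"invalid"` instead of a 400 `Response` so the two can be compared directly.

```typescript
type Parsed = number | undefined | "invalid";

// Removed route logic: parseInt reads a numeric prefix, so "10abc" -> 10,
// "1.5" -> 1 and "0x10" -> 0. An empty string is falsy, so the guard skips
// validation and "maxItems=" silently falls back to the default.
function legacyParse(raw: string | null, fallback?: number): Parsed {
  const value = raw ? parseInt(raw, 10) : fallback;
  if (raw && (Number.isNaN(value) || (value ?? 0) < 0)) {
    return "invalid";
  }
  return value;
}

// New helper logic: absent -> default; otherwise the whole string must be
// ASCII digits (JS `\d` only matches 0-9), then Number() converts it.
function strictParse(raw: string | null, fallback?: number): Parsed {
  if (raw === null) {
    return fallback;
  }
  if (!/^\d+$/.test(raw)) {
    return "invalid";
  }
  return Number(raw);
}
```

Note that `searchParams.get` returns `""` for `maxItems=` and `null` only when the key is absent, which is why the helper checks `rawValue === null` rather than truthiness.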


@@ -1,5 +1,10 @@
 import { fetchKijijiItems } from "@marketplace-scrapers/core";
 import { logger } from "../logger";
+import {
+emptySearchResponse,
+getRequiredSearchQuery,
+parseNonNegativeIntegerParam,
+} from "./helpers";
 /**
 * GET /api/kijiji?q={query}
@@ -8,39 +13,32 @@ import { logger } from "../logger";
 export async function kijijiRoute(req: Request): Promise<Response> {
 const reqUrl = new URL(req.url);
-const SEARCH_QUERY =
-req.headers.get("query") || reqUrl.searchParams.get("q") || null;
-if (!SEARCH_QUERY)
-return Response.json(
-{
-message: "Request didn't have 'query' header or 'q' search parameter!",
-},
-{ status: 400 },
-);
+const SEARCH_QUERY = getRequiredSearchQuery(req);
+if (SEARCH_QUERY instanceof Response) {
+return SEARCH_QUERY;
+}
-const maxPagesParam = reqUrl.searchParams.get("maxPages");
-const maxPages = maxPagesParam ? parseInt(maxPagesParam, 10) : 5;
-if (maxPagesParam && (Number.isNaN(maxPages) || maxPages < 0)) {
-return Response.json(
-{ message: "Invalid maxPages parameter" },
-{ status: 400 },
-);
+const maxPages = parseNonNegativeIntegerParam(
+reqUrl.searchParams,
+"maxPages",
+5,
+);
+if (maxPages instanceof Response) {
+return maxPages;
 }
-const priceMinParam = reqUrl.searchParams.get("priceMin");
-const priceMin = priceMinParam ? parseInt(priceMinParam, 10) : undefined;
-if (priceMinParam && (Number.isNaN(priceMin) || (priceMin ?? 0) < 0)) {
-return Response.json(
-{ message: "Invalid priceMin parameter" },
-{ status: 400 },
-);
+const priceMin = parseNonNegativeIntegerParam(
+reqUrl.searchParams,
+"priceMin",
+);
+if (priceMin instanceof Response) {
+return priceMin;
 }
-const priceMaxParam = reqUrl.searchParams.get("priceMax");
-const priceMax = priceMaxParam ? parseInt(priceMaxParam, 10) : undefined;
-if (priceMaxParam && (Number.isNaN(priceMax) || (priceMax ?? 0) < 0)) {
-return Response.json(
-{ message: "Invalid priceMax parameter" },
-{ status: 400 },
-);
+const priceMax = parseNonNegativeIntegerParam(
+reqUrl.searchParams,
+"priceMax",
+);
+if (priceMax instanceof Response) {
+return priceMax;
 }
 const hideUnstableResults =
 reqUrl.searchParams.get("unstableFilter") === "true";
@@ -62,7 +60,6 @@ export async function kijijiRoute(req: Request): Promise<Response> {
 maxPages,
 priceMin,
 priceMax,
-cookies: reqUrl.searchParams.get("cookies") || undefined,
 };
 try {
@@ -76,9 +73,9 @@ export async function kijijiRoute(req: Request): Promise<Response> {
 { hideUnstableResults: true },
 );
 if (items.results.length === 0 && items.unstableResults.length === 0) {
-return Response.json(
-{ message: "Search didn't return any results!" },
-{ status: 404 },
+return emptySearchResponse(
+`Kijiji matches ALL words in the query against listing titles. ` +
+`Try a shorter or more common query (e.g. "macbook air m1" instead of "macbook air m1 apple silicon").`,
 );
 }
 return Response.json(items, { status: 200 });
@@ -91,11 +88,12 @@ export async function kijijiRoute(req: Request): Promise<Response> {
 searchOptions,
 {},
 );
-if (!items || items.length === 0)
-return Response.json(
-{ message: "Search didn't return any results!" },
-{ status: 404 },
+if (!items || items.length === 0) {
+return emptySearchResponse(
+`Kijiji matches ALL words in the query against listing titles. ` +
+`Try a shorter or more common query (e.g. "macbook air m1" instead of "macbook air m1 apple silicon").`,
 );
+}
 return Response.json(items, { status: 200 });
 } catch (error) {
 logger.error("Kijiji scraping error:", error);
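The Kijiji hint says every word in the query is matched against listing titles. A rough model of that AND-matching makes the failure mode concrete. This is an assumption-laden sketch: Kijiji's real matcher runs server-side and is not shown anywhere in this diff, and `matchesAllWords` (whole-word, case-insensitive) is a hypothetical stand-in.

```typescript
/**
 * Rough model of AND-matching: a title matches only when every query word
 * appears in it as a whole word, ignoring case.
 */
function matchesAllWords(title: string, query: string): boolean {
  const tokenize = (text: string): string[] =>
    text.toLowerCase().split(/\W+/).filter((word) => word.length > 0);
  const titleWords = new Set(tokenize(title));
  return tokenize(query).every((word) => titleWords.has(word));
}
```

Under this model, every extra query word can only shrink the result set. A listing titled "MacBook Air M1 2020 256GB" matches `macbook air m1` but not `macbook air m1 apple silicon`, which is why the hint suggests dropping words rather than adding them.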


@@ -76,7 +76,7 @@ describe("API routes", () => {
 });
 });
-test("kijijiRoute passes cookies query parameter", async () => {
+test("kijijiRoute ignores cookies query parameter", async () => {
 const { kijijiRoute } = await import("../src/routes/kijiji");
 await kijijiRoute(
@@ -98,7 +98,6 @@ describe("API routes", () => {
 maxPages: 3,
 priceMin: undefined,
 priceMax: undefined,
-cookies: "s=1",
 },
 {},
 );
@@ -188,7 +187,6 @@ describe("API routes", () => {
 maxPages: 5,
 priceMin: undefined,
 priceMax: undefined,
-cookies: undefined,
 },
 {},
 {
@@ -279,7 +277,6 @@ describe("API routes", () => {
 maxPages: 5,
 priceMin: undefined,
 priceMax: undefined,
-cookies: undefined,
 },
 {},
 );
@@ -307,7 +304,6 @@ describe("API routes", () => {
 maxPages: 5,
 priceMin: undefined,
 priceMax: undefined,
-cookies: undefined,
 },
 {},
 );
@@ -398,7 +394,8 @@ describe("API routes", () => {
 expect(response.status).toBe(404);
 const body = await response.json();
-expect(body.message).toBe("Search didn't return any results!");
+expect(body.message).toStartWith("Search didn't return any results!");
+expect(body.message).toContain("Kijiji matches ALL words");
 });
 test("ebayRoute forwards maxItems to core in default mode", async () => {
@@ -505,6 +502,66 @@ describe("API routes", () => {
 expect(body.message).toBe("Invalid maxItems parameter");
 });
+test("ebayRoute returns 400 for non-integer maxItems", async () => {
+const { ebayRoute } = await import("../src/routes/ebay");
+const response = await ebayRoute(
+new Request("http://localhost/api/ebay?q=laptop&maxItems=10abc"),
+);
+expect(response.status).toBe(400);
+const body = await response.json();
+expect(body.message).toBe("Invalid maxItems parameter");
+});
+test("ebayRoute returns 400 for decimal maxItems", async () => {
+const { ebayRoute } = await import("../src/routes/ebay");
+const response = await ebayRoute(
+new Request("http://localhost/api/ebay?q=laptop&maxItems=1.5"),
+);
+expect(response.status).toBe(400);
+const body = await response.json();
+expect(body.message).toBe("Invalid maxItems parameter");
+});
+test("ebayRoute returns 400 for empty maxItems", async () => {
+const { ebayRoute } = await import("../src/routes/ebay");
+const response = await ebayRoute(
+new Request("http://localhost/api/ebay?q=laptop&maxItems="),
+);
+expect(response.status).toBe(400);
+const body = await response.json();
+expect(body.message).toBe("Invalid maxItems parameter");
+});
+test("ebayRoute returns 400 for whitespace maxItems", async () => {
+const { ebayRoute } = await import("../src/routes/ebay");
+const response = await ebayRoute(
+new Request("http://localhost/api/ebay?q=laptop&maxItems=%20%20"),
+);
+expect(response.status).toBe(400);
+const body = await response.json();
+expect(body.message).toBe("Invalid maxItems parameter");
+});
+test("ebayRoute returns 400 for hex maxItems", async () => {
+const { ebayRoute } = await import("../src/routes/ebay");
+const response = await ebayRoute(
+new Request("http://localhost/api/ebay?q=laptop&maxItems=0x10"),
+);
+expect(response.status).toBe(400);
+const body = await response.json();
+expect(body.message).toBe("Invalid maxItems parameter");
+});
 test("facebookRoute returns 400 for invalid maxItems", async () => {
 const { facebookRoute } = await import("../src/routes/facebook");
@@ -517,6 +574,150 @@ describe("API routes", () => {
 expect(body.message).toBe("Invalid maxItems parameter");
 });
+test("facebookRoute returns 400 for non-integer maxItems", async () => {
+const { facebookRoute } = await import("../src/routes/facebook");
+const response = await facebookRoute(
+new Request("http://localhost/api/facebook?q=laptop&maxItems=10abc"),
+);
+expect(response.status).toBe(400);
+const body = await response.json();
+expect(body.message).toBe("Invalid maxItems parameter");
+});
+test("facebookRoute returns 400 for decimal maxItems", async () => {
+const { facebookRoute } = await import("../src/routes/facebook");
+const response = await facebookRoute(
+new Request("http://localhost/api/facebook?q=laptop&maxItems=1.5"),
+);
+expect(response.status).toBe(400);
+const body = await response.json();
+expect(body.message).toBe("Invalid maxItems parameter");
+});
+test("facebookRoute returns 400 for empty maxItems", async () => {
+const { facebookRoute } = await import("../src/routes/facebook");
+const response = await facebookRoute(
+new Request("http://localhost/api/facebook?q=laptop&maxItems="),
+);
+expect(response.status).toBe(400);
+const body = await response.json();
+expect(body.message).toBe("Invalid maxItems parameter");
+});
+test("facebookRoute returns 400 for whitespace maxItems", async () => {
+const { facebookRoute } = await import("../src/routes/facebook");
+const response = await facebookRoute(
+new Request("http://localhost/api/facebook?q=laptop&maxItems=%20%20"),
+);
+expect(response.status).toBe(400);
+const body = await response.json();
+expect(body.message).toBe("Invalid maxItems parameter");
+});
+test("facebookRoute returns 400 for hex maxItems", async () => {
+const { facebookRoute } = await import("../src/routes/facebook");
+const response = await facebookRoute(
+new Request("http://localhost/api/facebook?q=laptop&maxItems=0x10"),
+);
+expect(response.status).toBe(400);
+const body = await response.json();
+expect(body.message).toBe("Invalid maxItems parameter");
+});
+test("ebayRoute returns 400 for empty minPrice", async () => {
+const { ebayRoute } = await import("../src/routes/ebay");
+const response = await ebayRoute(
+new Request("http://localhost/api/ebay?q=laptop&minPrice="),
+);
+expect(response.status).toBe(400);
+const body = await response.json();
+expect(body.message).toBe("Invalid minPrice parameter");
+});
+test("ebayRoute returns 400 for whitespace minPrice", async () => {
+const { ebayRoute } = await import("../src/routes/ebay");
+const response = await ebayRoute(
+new Request("http://localhost/api/ebay?q=laptop&minPrice=%20%20"),
);
expect(response.status).toBe(400);
const body = await response.json();
expect(body.message).toBe("Invalid minPrice parameter");
});
test("ebayRoute returns 400 for hex minPrice", async () => {
const { ebayRoute } = await import("../src/routes/ebay");
const response = await ebayRoute(
new Request("http://localhost/api/ebay?q=laptop&minPrice=0x10"),
);
expect(response.status).toBe(400);
const body = await response.json();
expect(body.message).toBe("Invalid minPrice parameter");
});
test("ebayRoute returns 400 for empty maxPrice", async () => {
const { ebayRoute } = await import("../src/routes/ebay");
const response = await ebayRoute(
new Request("http://localhost/api/ebay?q=laptop&maxPrice="),
);
expect(response.status).toBe(400);
const body = await response.json();
expect(body.message).toBe("Invalid maxPrice parameter");
});
test("ebayRoute returns 400 for whitespace maxPrice", async () => {
const { ebayRoute } = await import("../src/routes/ebay");
const response = await ebayRoute(
new Request("http://localhost/api/ebay?q=laptop&maxPrice=%20%20"),
);
expect(response.status).toBe(400);
const body = await response.json();
expect(body.message).toBe("Invalid maxPrice parameter");
});
test("ebayRoute returns 400 for hex maxPrice", async () => {
const { ebayRoute } = await import("../src/routes/ebay");
const response = await ebayRoute(
new Request("http://localhost/api/ebay?q=laptop&maxPrice=0x10"),
);
expect(response.status).toBe(400);
const body = await response.json();
expect(body.message).toBe("Invalid maxPrice parameter");
});
test("ebayRoute returns 400 for non-integer minPrice", async () => {
const { ebayRoute } = await import("../src/routes/ebay");
const response = await ebayRoute(
new Request("http://localhost/api/ebay?q=laptop&minPrice=10abc"),
);
expect(response.status).toBe(400);
const body = await response.json();
expect(body.message).toBe("Invalid minPrice parameter");
});
test("ebayRoute returns 400 for invalid minPrice", async () => {
const { ebayRoute } = await import("../src/routes/ebay");
@@ -529,6 +730,30 @@ describe("API routes", () => {
expect(body.message).toBe("Invalid minPrice parameter");
});
test("ebayRoute returns 400 for decimal minPrice", async () => {
const { ebayRoute } = await import("../src/routes/ebay");
const response = await ebayRoute(
new Request("http://localhost/api/ebay?q=laptop&minPrice=1.5"),
);
expect(response.status).toBe(400);
const body = await response.json();
expect(body.message).toBe("Invalid minPrice parameter");
});
test("ebayRoute returns 400 for non-integer maxPrice", async () => {
const { ebayRoute } = await import("../src/routes/ebay");
const response = await ebayRoute(
new Request("http://localhost/api/ebay?q=laptop&maxPrice=10abc"),
);
expect(response.status).toBe(400);
const body = await response.json();
expect(body.message).toBe("Invalid maxPrice parameter");
});
test("ebayRoute returns 400 for invalid maxPrice", async () => {
const { ebayRoute } = await import("../src/routes/ebay");
@@ -541,6 +766,30 @@ describe("API routes", () => {
expect(body.message).toBe("Invalid maxPrice parameter");
});
test("ebayRoute returns 400 for decimal maxPrice", async () => {
const { ebayRoute } = await import("../src/routes/ebay");
const response = await ebayRoute(
new Request("http://localhost/api/ebay?q=laptop&maxPrice=1.5"),
);
expect(response.status).toBe(400);
const body = await response.json();
expect(body.message).toBe("Invalid maxPrice parameter");
});
test("kijijiRoute returns 400 for decimal maxPages", async () => {
const { kijijiRoute } = await import("../src/routes/kijiji");
const response = await kijijiRoute(
new Request("http://localhost/api/kijiji?q=laptop&maxPages=1.5"),
);
expect(response.status).toBe(400);
const body = await response.json();
expect(body.message).toBe("Invalid maxPages parameter");
});
test("kijijiRoute returns 400 for invalid maxPages", async () => {
const { kijijiRoute } = await import("../src/routes/kijiji");
@@ -553,6 +802,54 @@ describe("API routes", () => {
expect(body.message).toBe("Invalid maxPages parameter");
});
test("kijijiRoute returns 400 for non-integer maxPages", async () => {
const { kijijiRoute } = await import("../src/routes/kijiji");
const response = await kijijiRoute(
new Request("http://localhost/api/kijiji?q=laptop&maxPages=10abc"),
);
expect(response.status).toBe(400);
const body = await response.json();
expect(body.message).toBe("Invalid maxPages parameter");
});
test("kijijiRoute returns 400 for empty maxPages", async () => {
const { kijijiRoute } = await import("../src/routes/kijiji");
const response = await kijijiRoute(
new Request("http://localhost/api/kijiji?q=laptop&maxPages="),
);
expect(response.status).toBe(400);
const body = await response.json();
expect(body.message).toBe("Invalid maxPages parameter");
});
test("kijijiRoute returns 400 for whitespace maxPages", async () => {
const { kijijiRoute } = await import("../src/routes/kijiji");
const response = await kijijiRoute(
new Request("http://localhost/api/kijiji?q=laptop&maxPages=%20%20"),
);
expect(response.status).toBe(400);
const body = await response.json();
expect(body.message).toBe("Invalid maxPages parameter");
});
test("kijijiRoute returns 400 for hex maxPages", async () => {
const { kijijiRoute } = await import("../src/routes/kijiji");
const response = await kijijiRoute(
new Request("http://localhost/api/kijiji?q=laptop&maxPages=0x10"),
);
expect(response.status).toBe(400);
const body = await response.json();
expect(body.message).toBe("Invalid maxPages parameter");
});
test("kijijiRoute returns 400 for invalid priceMin", async () => {
const { kijijiRoute } = await import("../src/routes/kijiji");
@@ -565,6 +862,66 @@ describe("API routes", () => {
expect(body.message).toBe("Invalid priceMin parameter");
});
test("kijijiRoute returns 400 for decimal priceMin", async () => {
const { kijijiRoute } = await import("../src/routes/kijiji");
const response = await kijijiRoute(
new Request("http://localhost/api/kijiji?q=laptop&priceMin=1.5"),
);
expect(response.status).toBe(400);
const body = await response.json();
expect(body.message).toBe("Invalid priceMin parameter");
});
test("kijijiRoute returns 400 for non-integer priceMin", async () => {
const { kijijiRoute } = await import("../src/routes/kijiji");
const response = await kijijiRoute(
new Request("http://localhost/api/kijiji?q=laptop&priceMin=10abc"),
);
expect(response.status).toBe(400);
const body = await response.json();
expect(body.message).toBe("Invalid priceMin parameter");
});
test("kijijiRoute returns 400 for empty priceMin", async () => {
const { kijijiRoute } = await import("../src/routes/kijiji");
const response = await kijijiRoute(
new Request("http://localhost/api/kijiji?q=laptop&priceMin="),
);
expect(response.status).toBe(400);
const body = await response.json();
expect(body.message).toBe("Invalid priceMin parameter");
});
test("kijijiRoute returns 400 for whitespace priceMin", async () => {
const { kijijiRoute } = await import("../src/routes/kijiji");
const response = await kijijiRoute(
new Request("http://localhost/api/kijiji?q=laptop&priceMin=%20%20"),
);
expect(response.status).toBe(400);
const body = await response.json();
expect(body.message).toBe("Invalid priceMin parameter");
});
test("kijijiRoute returns 400 for hex priceMin", async () => {
const { kijijiRoute } = await import("../src/routes/kijiji");
const response = await kijijiRoute(
new Request("http://localhost/api/kijiji?q=laptop&priceMin=0x10"),
);
expect(response.status).toBe(400);
const body = await response.json();
expect(body.message).toBe("Invalid priceMin parameter");
});
test("kijijiRoute returns 400 for invalid priceMax", async () => {
const { kijijiRoute } = await import("../src/routes/kijiji");
@@ -577,6 +934,66 @@ describe("API routes", () => {
expect(body.message).toBe("Invalid priceMax parameter");
});
test("kijijiRoute returns 400 for decimal priceMax", async () => {
const { kijijiRoute } = await import("../src/routes/kijiji");
const response = await kijijiRoute(
new Request("http://localhost/api/kijiji?q=laptop&priceMax=1.5"),
);
expect(response.status).toBe(400);
const body = await response.json();
expect(body.message).toBe("Invalid priceMax parameter");
});
test("kijijiRoute returns 400 for non-integer priceMax", async () => {
const { kijijiRoute } = await import("../src/routes/kijiji");
const response = await kijijiRoute(
new Request("http://localhost/api/kijiji?q=laptop&priceMax=10abc"),
);
expect(response.status).toBe(400);
const body = await response.json();
expect(body.message).toBe("Invalid priceMax parameter");
});
test("kijijiRoute returns 400 for empty priceMax", async () => {
const { kijijiRoute } = await import("../src/routes/kijiji");
const response = await kijijiRoute(
new Request("http://localhost/api/kijiji?q=laptop&priceMax="),
);
expect(response.status).toBe(400);
const body = await response.json();
expect(body.message).toBe("Invalid priceMax parameter");
});
test("kijijiRoute returns 400 for whitespace priceMax", async () => {
const { kijijiRoute } = await import("../src/routes/kijiji");
const response = await kijijiRoute(
new Request("http://localhost/api/kijiji?q=laptop&priceMax=%20%20"),
);
expect(response.status).toBe(400);
const body = await response.json();
expect(body.message).toBe("Invalid priceMax parameter");
});
test("kijijiRoute returns 400 for hex priceMax", async () => {
const { kijijiRoute } = await import("../src/routes/kijiji");
const response = await kijijiRoute(
new Request("http://localhost/api/kijiji?q=laptop&priceMax=0x10"),
);
expect(response.status).toBe(400);
const body = await response.json();
expect(body.message).toBe("Invalid priceMax parameter");
});
test("facebookRoute returns 400 for negative maxItems", async () => {
const { facebookRoute } = await import("../src/routes/facebook");
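The route tests in this file repeat one shape for each (route, parameter, input) triple. A table can generate that matrix, which keeps it in sync as inputs are added. The sketch below assumes the same query shape (`q=laptop&<param>=<input>`) and message format (`Invalid <param> parameter`) as the hand-written tests. `buildInvalidParamCases` is a hypothetical name.

```typescript
// Hypothetical test-table builder for the invalid-parameter matrix.
export interface InvalidParamCase {
  route: string;
  param: string;
  input: string; // already URL-encoded, as in the hand-written tests
  url: string;
  expectedMessage: string;
}

// The malformed inputs exercised by the hand-written tests.
export const BAD_INPUTS = ["", "%20%20", "0x10", "1.5", "10abc"];

export function buildInvalidParamCases(
  route: string,
  params: string[],
  inputs: string[] = BAD_INPUTS,
): InvalidParamCase[] {
  const cases: InvalidParamCase[] = [];
  for (const param of params) {
    for (const input of inputs) {
      cases.push({
        route,
        param,
        input,
        url: `http://localhost/api/${route}?q=laptop&${param}=${input}`,
        expectedMessage: `Invalid ${param} parameter`,
      });
    }
  }
  return cases;
}
```

Inside the `describe` block, you could loop over `buildInvalidParamCases("kijiji", ["maxPages", "priceMin", "priceMax"])` and call `test()` once per case. That registers the same 15 decimal, non-integer, empty, whitespace and hex checks that the kijiji tests above spell out by hand.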

@@ -11,6 +11,7 @@
},
"dependencies": {
"@typescript/native-preview": "catalog:",
"argon2-wasm-pro": "1.1.0",
"cli-progress": "^3.12.0",
"linkedom": "^0.18.12",
"unidecode": "^1.1.0"

@@ -39,6 +39,7 @@ export * from "./types/common";
// Export shared utilities
export * from "./utils/cookies";
export * from "./utils/delay";
export * from "./utils/ebay-challenge";
export * from "./utils/format";
export * from "./utils/http";
export * from "./utils/unstable";

@@ -10,6 +10,8 @@ import {
formatCookiesForHeader,
} from "../utils/cookies";
import { delay } from "../utils/delay";
import { solveEbayChallenge } from "../utils/ebay-challenge";
import { fetchHtml, HttpError, RateLimitError } from "../utils/http";
import { logger } from "../utils/logger";
import { classifyUnstableListings } from "../utils/unstable";
@@ -40,6 +42,229 @@ export interface EbayListingDetails {
}
const EBAY_PRICE_TEXT_RE = /^(?:\s*(?:CA|C|US)\s*\$|\s*[$£¥])/u;
const EBAY_ITEM_URL_RE = /^https?:\/\/(?:www\.)?ebay\.(?:ca|com)\/itm\//u;
function decodeHtmlEntities(value: string): string {
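// Decode "&amp;" last so an escaped entity such as "&amp;lt;" stays "&lt;"
// instead of being decoded twice into "<".
return value
.replace(/&quot;/g, '"')
.replace(/&#39;/g, "'")
.replace(/&lt;/g, "<")
.replace(/&gt;/g, ">")
.replace(/&amp;/g, "&")
.trim();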
}
function stripHtml(value: string): string {
return decodeHtmlEntities(
value.replace(/<[^>]*>/g, " ").replace(/\s+/g, " "),
);
}
function getHtmlAttr(tag: string, attrName: string): string | null {
const attrMatch = tag.match(
new RegExp(`\\s${attrName}=(?:"([^"]*)"|'([^']*)'|([^\\s>]+))`, "iu"),
);
return attrMatch?.[1] ?? attrMatch?.[2] ?? attrMatch?.[3] ?? null;
}
function normalizeEbayUrl(url: string): string | null {
const decodedUrl = decodeHtmlEntities(url);
try {
const parsed = new URL(decodedUrl, "https://www.ebay.ca");
return EBAY_ITEM_URL_RE.test(parsed.href) ? parsed.href : null;
} catch {
return null;
}
}
function toEbayListing(
url: string,
title: string,
priceText: string,
): EbayListingDetails | null {
const normalizedUrl = normalizeEbayUrl(url);
const cleanedTitle = stripHtml(title);
const cleanedPrice = stripHtml(priceText);
const priceInfo = parseEbayPrice(cleanedPrice);
if (!normalizedUrl || !cleanedTitle || cleanedTitle === "Shop on eBay") {
return null;
}
if (!priceInfo) return null;
return {
url: normalizedUrl,
title: cleanedTitle,
listingPrice: {
amountFormatted: cleanedPrice,
cents: priceInfo.cents,
currency: priceInfo.currency,
},
listingType: "OFFER",
listingStatus: "ACTIVE",
address: null,
};
}
function readObjectString(
value: Record<string, unknown>,
keys: string[],
): string | null {
for (const key of keys) {
const candidate = value[key];
if (typeof candidate === "string" && candidate.trim()) {
return candidate.trim();
}
}
return null;
}
function readPayloadPrice(value: Record<string, unknown>): string | null {
const directPrice = readObjectString(value, [
"price",
"currentPrice",
"displayPrice",
]);
if (directPrice) return directPrice;
for (const key of ["price", "currentPrice", "displayPrice", "priceInfo"]) {
const candidate = value[key];
if (
!candidate ||
typeof candidate !== "object" ||
Array.isArray(candidate)
) {
continue;
}
const priceObject = candidate as Record<string, unknown>;
const formatted = readObjectString(priceObject, [
"amount",
"formatted",
"text",
]);
if (formatted) return formatted;
const numericValue = priceObject.value;
const currency = readObjectString(priceObject, [
"currency",
"currencyCode",
]);
if (typeof numericValue === "string" && numericValue.trim()) {
return currency ? `${currency} ${numericValue}` : numericValue;
}
if (typeof numericValue === "number") {
return currency ? `${currency} ${numericValue}` : String(numericValue);
}
}
return null;
}
function collectPayloadListings(
value: unknown,
results: EbayListingDetails[],
): void {
if (!value || typeof value !== "object") return;
if (Array.isArray(value)) {
for (const item of value) {
collectPayloadListings(item, results);
}
return;
}
const objectValue = value as Record<string, unknown>;
const url = readObjectString(objectValue, [
"itemWebUrl",
"itemUrl",
"url",
"webUrl",
]);
const title = readObjectString(objectValue, ["title", "itemTitle", "name"]);
const priceText = readPayloadPrice(objectValue);
if (url && title && priceText) {
const listing = toEbayListing(url, title, priceText);
if (listing) {
results.push(listing);
return;
}
}
for (const child of Object.values(objectValue)) {
collectPayloadListings(child, results);
}
}
function parseEmbeddedEbayListings(
htmlString: HTMLString,
): EbayListingDetails[] {
const results: EbayListingDetails[] = [];
const payloadMatches = htmlString.matchAll(
/data-inlinepayload=(?:"([^"]*)"|'([^']*)'|([^\s>]+))/giu,
);
for (const match of payloadMatches) {
const rawPayload = match[1] ?? match[2] ?? match[3];
if (!rawPayload) continue;
try {
const decodedPayload = decodeURIComponent(decodeHtmlEntities(rawPayload));
collectPayloadListings(JSON.parse(decodedPayload), results);
} catch {
// eBay inline payloads vary by module; non-JSON payloads are ignored.
}
}
return results;
}
function parseSCardHtmlListings(htmlString: HTMLString): EbayListingDetails[] {
const results: EbayListingDetails[] = [];
const cardMatches = htmlString.matchAll(
/<div\b[^>]*class=(?:"[^"]*\bs-card\b[^"]*"|'[^']*\bs-card\b[^']*'|[^\s>]*\bs-card\b[^\s>]*)[\s\S]*?(?=<div\b[^>]*class=(?:"[^"]*\bs-card\b[^"]*"|'[^']*\bs-card\b[^']*'|[^\s>]*\bs-card\b[^\s>]*)|<\/body>|<\/html>)/giu,
);
for (const cardMatch of cardMatches) {
const cardHtml = cardMatch[0];
const linkTag = cardHtml.match(
/<a\b[^>]*\bhref=(?:"[^"]*\/itm\/[^"]*"|'[^']*\/itm\/[^']*'|[^\s>]*\/itm\/[^\s>]*)[^>]*>/iu,
)?.[0];
const titleMatch = cardHtml.match(
/<[^>]*\bclass=(?:"[^"]*\bs-card__title\b[^"]*"|'[^']*\bs-card__title\b[^']*'|[^\s>]*\bs-card__title\b[^\s>]*)[^>]*>([\s\S]*?)<\/[^>]+>/iu,
);
const priceMatch = cardHtml.match(
/<[^>]*\bclass=(?:"[^"]*\bs-card__price\b[^"]*"|'[^']*\bs-card__price\b[^']*'|[^\s>]*\bs-card__price\b[^\s>]*)[^>]*>([\s\S]*?)<\/[^>]+>/iu,
);
if (!linkTag || !titleMatch?.[1] || !priceMatch?.[1]) continue;
const href = getHtmlAttr(linkTag, "href");
if (!href) continue;
const listing = toEbayListing(href, titleMatch[1], priceMatch[1]);
if (listing) results.push(listing);
}
return results;
}
function dedupeEbayListings(
listings: EbayListingDetails[],
): EbayListingDetails[] {
const results: EbayListingDetails[] = [];
const seenUrls = new Set<string>();
for (const listing of listings) {
const canonicalUrl = canonicalizeEbayItemUrl(listing.url);
if (seenUrls.has(canonicalUrl)) continue;
seenUrls.add(canonicalUrl);
results.push(listing);
}
return results;
}
function canonicalizeEbayItemUrl(url: string): string {
try {
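`collectPayloadListings` walks depth-first over the JSON that an inline payload decodes to. When an object yields a complete listing, the walk collects it and does not descend into that object's children. Below is a reduced sketch of the same traversal. It assumes string-only prices; the diff's `readPayloadPrice` also unpacks nested price objects. The names are hypothetical. The input comes from `JSON.parse`, so it cannot contain cycles and the recursion needs no visited set.

```typescript
// Hypothetical reduced listing shape for illustration.
export interface FoundListing {
  url: string;
  title: string;
  price: string;
}

// First non-empty string among the candidate keys, trimmed.
function firstString(obj: Record<string, unknown>, keys: string[]): string | null {
  for (const key of keys) {
    const candidate = obj[key];
    if (typeof candidate === "string" && candidate.trim()) return candidate.trim();
  }
  return null;
}

// Depth-first walk: an object that looks like a listing is collected and its
// children are not searched further; anything else is descended into.
export function collectListings(value: unknown, out: FoundListing[] = []): FoundListing[] {
  if (!value || typeof value !== "object") return out;
  if (Array.isArray(value)) {
    for (const item of value) collectListings(item, out);
    return out;
  }
  const obj = value as Record<string, unknown>;
  const url = firstString(obj, ["itemWebUrl", "itemUrl", "url", "webUrl"]);
  const title = firstString(obj, ["title", "itemTitle", "name"]);
  const price = firstString(obj, ["price", "currentPrice", "displayPrice"]);
  if (url && title && price) {
    out.push({ url, title, price });
    return out;
  }
  for (const child of Object.values(obj)) collectListings(child, out);
  return out;
}
```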
@@ -102,17 +327,6 @@ function parseEbayPrice(
return { cents, currency };
}
class HttpError extends Error {
constructor(
message: string,
public readonly status: number,
public readonly url: string,
) {
super(message);
this.name = "HttpError";
}
}
// ----------------------------- Parsing -----------------------------
/**
@@ -124,6 +338,11 @@ function parseEbayListings(
exclusions: string[],
strictMode: boolean,
): EbayListingDetails[] {
const embeddedListings = parseEmbeddedEbayListings(htmlString);
if (embeddedListings.length > 0) {
return dedupeEbayListings(embeddedListings);
}
const { document } = parseHTML(htmlString);
const results: EbayListingDetails[] = [];
const seenUrls = new Set<string>();
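`parseEbayListings` now tries three extractors in order and returns the first non-empty result: the `data-inlinepayload` JSON, then the existing DOM walk, then the regex-based `s-card` fallback. In the diff, the `s-card` fallback applies the exclusion and `strictMode` keyword filter. The embedded-payload branch returns early without applying it, which may or may not be intended. The sketch below shows the ordering with one shared predicate that can be applied to whichever extractor wins. `firstNonEmpty` and `matchesFilters` are hypothetical names.

```typescript
export interface TitledListing {
  url: string;
  title: string;
}

export type Extractor = (html: string) => TitledListing[];

// Run extractors in priority order; the first non-empty result wins and
// later extractors are never called.
export function firstNonEmpty(html: string, extractors: Extractor[]): TitledListing[] {
  for (const extract of extractors) {
    const found = extract(html);
    if (found.length > 0) return found;
  }
  return [];
}

// Same predicate as the s-card fallback in the diff: drop titles containing
// any exclusion, and in strict mode require at least one keyword
// (both comparisons case-insensitive).
export function matchesFilters(
  title: string,
  keywords: string[],
  exclusions: string[],
  strictMode: boolean,
): boolean {
  const lower = title.toLowerCase();
  if (exclusions.some((e) => lower.includes(e.toLowerCase()))) return false;
  return !strictMode || keywords.some((k) => lower.includes(k.toLowerCase()));
}
```

Used together, `firstNonEmpty(html, [embedded, dom, sCard]).filter((l) => matchesFilters(l.title, keywords, exclusions, strictMode))` would filter consistently regardless of which extractor succeeded.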
@@ -359,13 +578,34 @@ function parseEbayListings(
}
}
return results; if (results.length > 0) {
return results;
}
return dedupeEbayListings(
parseSCardHtmlListings(htmlString).filter((listing) => {
if (
exclusions.some((exclusion) =>
listing.title.toLowerCase().includes(exclusion.toLowerCase()),
)
) {
return false;
}
return (
!strictMode ||
keywords.some((keyword) =>
listing.title.toLowerCase().includes(keyword.toLowerCase()),
)
);
}),
);
}
// ----------------------------- Cookie Loading ----------------------------- // ----------------------------- Session & Challenge -----------------------------
/**
* Load eBay cookies from EBAY_COOKIE * Load eBay cookies from EBAY_COOKIE env var
*/
async function loadEbayCookies(): Promise<string | undefined> {
try {
@@ -379,6 +619,92 @@ async function loadEbayCookies(): Promise<string | undefined> {
}
}
const EBAY_UA =
"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36";
/**
* Visit eBay homepage to collect Akamai fingerprinting cookies.
* These are required to pass the edge layer before any search request.
*/
async function warmEbaySession(): Promise<string | undefined> {
try {
const res = await fetch("https://www.ebay.ca", {
headers: {
"User-Agent": EBAY_UA,
Accept:
"text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8",
"Accept-Language": "en-CA,en-US;q=0.9,en;q=0.8",
},
redirect: "manual",
});
if (!res.ok) return undefined;
const setCookies = res.headers.getSetCookie?.() ?? [];
const jar: Record<string, string> = {};
for (const header of setCookies) {
const match = header.match(/^([^=]+)=([^;]+)/);
if (match?.[1] && match[2]) jar[match[1]] = match[2];
}
const cookieKeys = Object.keys(jar);
if (cookieKeys.length === 0) return undefined;
return cookieKeys.map((k) => `${k}=${jar[k] ?? ""}`).join("; ");
} catch {
return undefined;
}
}
function mergeCookies(
base: string,
...additions: (string | undefined)[]
): string {
const jar: Record<string, string> = {};
const all = [base, ...additions.filter(Boolean)] as string[];
for (const str of all) {
for (const pair of str.split(";")) {
const eq = pair.indexOf("=");
if (eq > 0) {
jar[pair.substring(0, eq).trim()] = pair.substring(eq + 1).trim();
}
}
}
return Object.entries(jar)
.map(([k, v]) => `${k}=${v}`)
.join("; ");
}
function collectResponseCookies(res: Response, jar: Record<string, string>) {
for (const header of res.headers.getSetCookie?.() ?? []) {
const match = header.match(/^([^=]+)=([^;]+)/);
if (match?.[1] && match[2]) jar[match[1]] = match[2];
}
}
function cookiesToString(jar: Record<string, string>): string {
return Object.entries(jar)
.map(([k, v]) => `${k}=${v}`)
.join("; ");
}
const CHALLENGE_REDIRECT = 307;
const CHALLENGE_MARKER = "splashui/challenge";
function isChallengeRedirect(res: Response): boolean {
return (
res.status === CHALLENGE_REDIRECT &&
(res.headers.get("location") ?? "").includes(CHALLENGE_MARKER)
);
}
function isChallengeHtml(html: string): boolean {
return (
html.length < 50000 &&
(html.includes("_crefId") || html.includes("_cdetail"))
);
}
// ----------------------------- Main -----------------------------
export default async function fetchEbayItems(
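`mergeCookies` flattens several `name=value; ...` strings into one header, and later values win. A cookie refreshed by the warm-up request or by the challenge answer therefore replaces the stale `EBAY_COOKIE` value. The sketch below does the same merge with a `Map` instead of a plain object. A `Map` also keeps insertion order, and it sidesteps special keys such as `__proto__`. `mergeCookieStrings` is a hypothetical name.

```typescript
// Hypothetical helper: merge "k=v; k2=v2" cookie strings, later values win.
export function mergeCookieStrings(...parts: (string | undefined)[]): string {
  const jar = new Map<string, string>();
  for (const part of parts) {
    if (!part) continue; // skip undefined and empty strings
    for (const pair of part.split(";")) {
      const eq = pair.indexOf("=");
      if (eq < 0) continue; // not a name=value pair
      const name = pair.slice(0, eq).trim();
      if (!name) continue; // ignore "=value" fragments
      // Only the first "=" separates name from value; values may contain "=".
      jar.set(name, pair.slice(eq + 1).trim());
    }
  }
  // A Map keeps first-insertion order even when a later part overwrites a value.
  return Array.from(jar, ([name, value]) => `${name}=${value}`).join("; ");
}
```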
@@ -454,7 +780,10 @@ export default async function fetchEbayItems(
return classifyUnstableListings(limitedListings);
};
const cookies = await loadEbayCookies(); // Collect cookies from env var + warm-up session
const envCookies = await loadEbayCookies();
const warmCookies = await warmEbaySession();
const baseCookies = mergeCookies(envCookies ?? "", warmCookies);
// Build eBay search URL - use Canadian site, Buy It Now filter, and Canada-only preference
const urlParams = new URLSearchParams({
@@ -478,33 +807,113 @@ export default async function fetchEbayItems(
logger.log(`Fetching eBay search: ${searchUrl}`);
try {
// Use custom headers modeled after real browser requests to bypass bot detection const searchHeaders: Record<string, string> = {
const headers: Record<string, string> = { "User-Agent": EBAY_UA,
"User-Agent": Accept:
"Mozilla/5.0 (X11; Linux x86_64; rv:141.0) Gecko/20100101 Firefox/141.0", "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8",
Accept: "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8", "Accept-Language": "en-CA,en-US;q=0.9,en;q=0.8",
"Accept-Language": "en-US,en;q=0.5",
"Accept-Encoding": "gzip, deflate, br, zstd",
Referer: "https://www.ebay.ca/",
Connection: "keep-alive",
"Upgrade-Insecure-Requests": "1",
"Sec-Fetch-Dest": "document",
"Sec-Fetch-Mode": "navigate",
"Sec-Fetch-Site": "same-origin",
"Sec-Fetch-User": "?1",
Priority: "u=0, i",
};
// Add cookies if available (helps bypass bot detection) if (baseCookies) {
if (cookies) { searchHeaders.Cookie = baseCookies;
headers.Cookie = cookies;
}
const res = await fetch(searchUrl, { // Step 1: Make the search request with redirect: "manual" so a challenge 307 can be detected
let res = await fetch(searchUrl, {
method: "GET",
headers, headers: searchHeaders,
redirect: "manual",
});
const cookieJar: Record<string, string> = {};
// Seed the jar with the merged env-var and warm-up cookies
if (baseCookies) {
for (const pair of baseCookies.split(";")) {
const eq = pair.indexOf("=");
if (eq > 0) {
cookieJar[pair.substring(0, eq).trim()] = pair
.substring(eq + 1)
.trim();
}
}
}
// Step 2: Follow challenge redirect if present
if (isChallengeRedirect(res)) {
const chalUrl = res.headers.get("location") ?? "";
collectResponseCookies(res, cookieJar);
logger.log("Challenge detected, fetching challenge page...");
res = await fetch(chalUrl, {
headers: { ...searchHeaders, Cookie: cookiesToString(cookieJar) },
redirect: "manual",
});
collectResponseCookies(res, cookieJar);
}
// Step 3: If response is challenge HTML, solve and submit
const responseHtml = await res.text();
if (isChallengeHtml(responseHtml)) {
logger.log("Solving challenge...");
const result = await solveEbayChallenge(
responseHtml,
cookiesToString(cookieJar),
);
if (result) {
// Send the challenge-answer cookies with the retried search
if (baseCookies) {
searchHeaders.Cookie = mergeCookies(baseCookies, result.cookies);
} else {
searchHeaders.Cookie = result.cookies;
}
logger.log("Challenge solved, retrying search...");
// Delay briefly before retry
await delay(DELAY_MS);
res = await fetch(searchUrl, {
method: "GET",
headers: searchHeaders,
});
if (!res.ok) {
logger.warn(`Retry after challenge returned ${res.status}`);
return finalizeResults([]);
}
const retryHtml = await res.text();
await delay(DELAY_MS);
const listings = parseEbayListings(
retryHtml,
keywords,
exclusions,
strictMode,
);
const filteredListings = listings.filter((listing) => {
const cents = listing.listingPrice?.cents;
return (
typeof cents === "number" && cents >= minPrice && cents <= maxPrice
);
});
logger.log(
`Parsed ${filteredListings.length} eBay listings (after challenge).`,
);
return finalizeResults(filteredListings);
}
logger.warn("Challenge solve failed, returning empty results.");
return finalizeResults([]);
}
// Step 4: Normal flow — no challenge
if (!res.ok) {
throw new HttpError(
`Request failed with status ${res.status}`,
@@ -513,20 +922,17 @@ export default async function fetchEbayItems(
);
}
const searchHtml = await res.text();
// Respect per-request delay to keep at or under REQUESTS_PER_SECOND
await delay(DELAY_MS);
logger.log(`\nParsing eBay listings...`);
const listings = parseEbayListings(
searchHtml, responseHtml,
keywords,
exclusions,
strictMode,
);
// Filter by price range (additional safety check)
const filteredListings = listings.filter((listing) => {
const cents = listing.listingPrice?.cents;
return (
@@ -537,9 +943,9 @@ export default async function fetchEbayItems(
logger.log(`Parsed ${filteredListings.length} eBay listings.`);
return finalizeResults(filteredListings);
} catch (err) {
if (err instanceof HttpError) { if (err instanceof HttpError || err instanceof RateLimitError) {
console.error( logger.warn(
`Failed to fetch eBay search (${err.status}): ${err.message}`, `Failed to fetch eBay search (${err instanceof HttpError ? err.statusCode : 429}): ${err.message}`,
);
return finalizeResults([]);
}
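The challenge handling in `fetchEbayItems` runs a fixed sequence of checks. It first looks for a `307` whose `Location` contains `splashui/challenge`. It then looks for a short body containing `_crefId` or `_cdetail`. Only after both does it apply the ordinary `res.ok` test. The commit message describes the solver behind the second step as an argon2id proof of work. Pulling this decision into a pure function makes the ordering testable without network access. The sketch below reuses the diff's thresholds and markers. `classifySearchResponse` and the outcome labels are hypothetical.

```typescript
export type SearchOutcome =
  | "challenge-redirect" // follow Location to fetch the challenge page
  | "challenge-page" // solve the proof of work, then retry the search
  | "results" // parse listings
  | "http-error"; // give up and report the status

// Same values as CHALLENGE_REDIRECT / CHALLENGE_MARKER in the diff.
const CHALLENGE_REDIRECT = 307;
const CHALLENGE_MARKER = "splashui/challenge";

// Mirrors isChallengeRedirect, then isChallengeHtml, then the res.ok check.
export function classifySearchResponse(
  status: number,
  location: string | null,
  body: string,
): SearchOutcome {
  if (status === CHALLENGE_REDIRECT && (location ?? "").includes(CHALLENGE_MARKER)) {
    return "challenge-redirect";
  }
  if (body.length < 50000 && (body.includes("_crefId") || body.includes("_cdetail"))) {
    return "challenge-page";
  }
  return status >= 200 && status < 300 ? "results" : "http-error";
}
```

The body check runs before the status check. A challenge page served with a non-2xx status is therefore still routed to the solver, as it is in the diff.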

@@ -12,9 +12,8 @@ import {
formatCookiesForHeader,
parseCookieString,
} from "../utils/cookies";
import { delay } from "../utils/delay";
import { formatCentsToCurrency } from "../utils/format";
import { isRecord } from "../utils/http"; import { fetchHtml, HttpError, isRecord, RateLimitError } from "../utils/http";
import { logger } from "../utils/logger";
import { classifyUnstableListings } from "../utils/unstable";
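This file drops its local `fetchHtml` in favor of the shared helper in `../utils/http`. The local version documented its retry policy: retry on 429 and 5xx, and honor `X-RateLimit-Reset` (in seconds) when present. Its exact backoff formula is cut off in this view. The sketch below therefore assumes exponential backoff from `retryBaseMs`, which defaulted to `500` in the removed code. It also assumes the header carries a delay in seconds rather than an epoch timestamp. `retryDelayMs` is a hypothetical name.

```typescript
// Hypothetical: milliseconds to wait before retry number `attempt` (0-based),
// or null when the status should not be retried at all.
export function retryDelayMs(
  attempt: number,
  status: number,
  resetHeader: string | null,
  retryBaseMs = 500,
): number | null {
  const retryable = status === 429 || (status >= 500 && status <= 599);
  if (!retryable) return null;
  // Assumption: X-RateLimit-Reset holds a non-negative delay in seconds.
  if (status === 429 && resetHeader && /^\d+(?:\.\d+)?$/.test(resetHeader.trim())) {
    return Number(resetHeader) * 1000;
  }
  // Assumed exponential backoff: base, 2x base, 4x base, ...
  return retryBaseMs * 2 ** attempt;
}
```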
@@ -219,17 +218,6 @@ export async function ensureFacebookCookies(): Promise<Cookie[]> {
return ensureCookies(FACEBOOK_COOKIE_CONFIG);
}
class HttpError extends Error {
constructor(
message: string,
public readonly status: number,
public readonly url: string,
) {
super(message);
this.name = "HttpError";
}
}
// ----------------------------- Extraction Metrics -----------------------------
/**
@@ -274,112 +262,21 @@ function logExtractionMetrics(success: boolean, itemId?: string) {
// ----------------------------- HTTP Client -----------------------------
/** function createFacebookHeaders(cookies: string): Record<string, string> {
Fetch HTML with a basic retry strategy and simple rate-limit delay between calls. return {
- Retries on 429 and 5xx accept:
- Respects X-RateLimit-Reset when present (seconds) "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7",
- Supports custom cookies for Facebook authentication "accept-language": "en-GB,en-US;q=0.9,en;q=0.8",
*/ "cache-control": "no-cache",
async function fetchHtml( "upgrade-insecure-requests": "1",
url: string, "sec-fetch-dest": "document",
DELAY_MS: number, "sec-fetch-mode": "navigate",
opts?: { "sec-fetch-site": "none",
maxRetries?: number; "sec-fetch-user": "?1",
retryBaseMs?: number; "user-agent":
onRateInfo?: (remaining: string | null, reset: string | null) => void; "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
cookies?: string; cookie: cookies,
}, };
): Promise<{ html: HTMLString; responseUrl: string }> {
const maxRetries = opts?.maxRetries ?? 3;
const retryBaseMs = opts?.retryBaseMs ?? 500;
let lastRateLimitError: HttpError | null = null;
for (let attempt = 0; attempt <= maxRetries; attempt++) {
try {
const headers: Record<string, string> = {
accept:
"text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7",
"accept-language": "en-GB,en-US;q=0.9,en;q=0.8",
"accept-encoding": "gzip, deflate, br",
"cache-control": "no-cache",
"upgrade-insecure-requests": "1",
"sec-fetch-dest": "document",
"sec-fetch-mode": "navigate",
"sec-fetch-site": "none",
"sec-fetch-user": "?1",
"user-agent":
"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
};
// Add cookies if provided
if (opts?.cookies) {
headers.cookie = opts.cookies;
}
const res = await fetch(url, {
method: "GET",
headers,
});
const rateLimitRemaining = res.headers.get("X-RateLimit-Remaining");
const rateLimitReset = res.headers.get("X-RateLimit-Reset");
opts?.onRateInfo?.(rateLimitRemaining, rateLimitReset);
if (!res.ok) {
// Respect 429 reset if provided
if (res.status === 429) {
lastRateLimitError = new HttpError(
`Request failed with status ${res.status}`,
res.status,
url,
);
const resetSeconds = rateLimitReset
? Number(rateLimitReset)
: Number.NaN;
const waitMs = Number.isFinite(resetSeconds)
? Math.max(0, resetSeconds * 1000)
: (attempt + 1) * retryBaseMs;
if (attempt >= maxRetries) {
throw lastRateLimitError;
}
await delay(waitMs);
continue;
}
// For Facebook, 400 often means authentication required
// Don't retry 4xx client errors except 429
if (res.status >= 400 && res.status < 500 && res.status !== 429) {
throw new HttpError(
`Request failed with status ${res.status} (Facebook may require authentication cookies for access)`,
res.status,
url,
);
}
// Retry on 5xx
if (res.status >= 500 && res.status < 600 && attempt < maxRetries) {
await delay((attempt + 1) * retryBaseMs);
continue;
}
throw new HttpError(
`Request failed with status ${res.status}`,
res.status,
url,
);
}
const html = await res.text();
// Respect per-request delay to keep at or under REQUESTS_PER_SECOND
await delay(DELAY_MS);
return { html, responseUrl: res.url || url };
} catch (err) {
if (err instanceof HttpError) {
throw err;
}
if (attempt >= maxRetries) throw err;
await delay((attempt + 1) * retryBaseMs);
}
}
throw lastRateLimitError ?? new Error("Exhausted retries without response");
} }
// ----------------------------- Parsing ----------------------------- // ----------------------------- Parsing -----------------------------
@@ -1157,6 +1054,8 @@ export default async function fetchFacebookItems(
try { try {
const response = await fetchHtml(searchUrl, DELAY_MS, { const response = await fetchHtml(searchUrl, DELAY_MS, {
maxRetries: 3, maxRetries: 3,
includeResponseUrl: true,
headers: createFacebookHeaders(cookiesHeader),
onRateInfo: (remaining, reset) => { onRateInfo: (remaining, reset) => {
if (remaining && reset) { if (remaining && reset) {
logger.log( logger.log(
@@ -1164,22 +1063,29 @@ export default async function fetchFacebookItems(
); );
} }
}, },
cookies: cookiesHeader,
}); });
searchHtml = response.html; searchHtml = response.html;
searchResponseUrl = response.responseUrl; searchResponseUrl = response.responseUrl;
} catch (err) { } catch (err) {
if (err instanceof HttpError) { if (err instanceof HttpError) {
logger.warn( logger.warn(
`\nFacebook marketplace access failed (${err.status}): ${err.message}`, `\nFacebook marketplace access failed (${err.statusCode}): ${err.message}`,
); );
if (err.status === 400 || err.status === 401 || err.status === 403) { if (
err.statusCode === 400 ||
err.statusCode === 401 ||
err.statusCode === 403
) {
logger.warn( logger.warn(
"This might indicate invalid or expired cookies. Update FACEBOOK_COOKIE with a fresh raw Cookie header string.", "This might indicate invalid or expired cookies. Update FACEBOOK_COOKIE with a fresh raw Cookie header string.",
); );
} }
return finalizeResults([]); return finalizeResults([]);
} }
if (err instanceof RateLimitError) {
logger.warn(`\nFacebook marketplace access rate limited: ${err.message}`);
return finalizeResults([]);
}
throw err; throw err;
} }
@@ -1261,6 +1167,8 @@ export async function fetchFacebookItem(
let itemResponseUrl = itemUrl; let itemResponseUrl = itemUrl;
try { try {
const response = await fetchHtml(itemUrl, 1000, { const response = await fetchHtml(itemUrl, 1000, {
includeResponseUrl: true,
headers: createFacebookHeaders(cookiesHeader),
onRateInfo: (remaining, reset) => { onRateInfo: (remaining, reset) => {
if (remaining && reset) { if (remaining && reset) {
logger.log( logger.log(
@@ -1268,18 +1176,17 @@ export async function fetchFacebookItem(
); );
} }
}, },
cookies: cookiesHeader,
}); });
itemHtml = response.html; itemHtml = response.html;
itemResponseUrl = response.responseUrl; itemResponseUrl = response.responseUrl;
} catch (err) { } catch (err) {
if (err instanceof HttpError) { if (err instanceof HttpError) {
logger.warn( logger.warn(
`\nFacebook marketplace item access failed (${err.status}): ${err.message}`, `\nFacebook marketplace item access failed (${err.statusCode}): ${err.message}`,
); );
// Enhanced error handling based on status codes // Enhanced error handling based on status codes
switch (err.status) { switch (err.statusCode) {
case 400: case 400:
case 401: case 401:
case 403: case 403:
@@ -1305,10 +1212,19 @@ export async function fetchFacebookItem(
); );
break; break;
default: default:
logger.warn(`Unexpected error status: ${err.status}`); logger.warn(`Unexpected error status: ${err.statusCode}`);
} }
return null; return null;
} }
if (err instanceof RateLimitError) {
logger.warn(
`\nFacebook marketplace item rate limited for item ${itemId}: ${err.message}`,
);
logger.warn(
"Rate limited: Too many requests. Facebook is blocking access temporarily.",
);
return null;
}
throw err; throw err;
} }
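After this refactor, the Facebook paths receive two distinct failure types from the shared helper. `HttpError` covers non-retryable statuses, and its status is now read from `statusCode` rather than `status`. `RateLimitError` is raised once the 429 retries are exhausted. The sketch below models that branching in isolation. `HttpErrorLike`, `RateLimitErrorLike` and `classifyFacebookFailure` are hypothetical stand-ins, not the project's exports. The grouping of 400/401/403 as a likely cookie problem mirrors the warning text in the hunk above.

```typescript
// Hypothetical stand-ins mirroring the shape of the shared helper's errors.
class HttpErrorLike extends Error {
  readonly statusCode: number;
  constructor(message: string, statusCode: number) {
    super(message);
    this.name = "HttpError";
    this.statusCode = statusCode;
  }
}

class RateLimitErrorLike extends Error {
  constructor(message: string) {
    super(message);
    this.name = "RateLimitError";
  }
}

type FailureKind = "auth" | "http" | "rate-limited" | "unknown";

// Check the specific classes first; the real code rethrows anything else.
function classifyFacebookFailure(err: unknown): FailureKind {
  if (err instanceof HttpErrorLike) {
    // 400/401/403 usually point at a missing or stale FACEBOOK_COOKIE.
    return [400, 401, 403].includes(err.statusCode) ? "auth" : "http";
  }
  if (err instanceof RateLimitErrorLike) return "rate-limited";
  return "unknown";
}

console.log(classifyFacebookFailure(new HttpErrorLike("Request failed with status 403", 403))); // auth
console.log(classifyFacebookFailure(new RateLimitErrorLike("too many requests"))); // rate-limited
```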


@@ -11,6 +11,7 @@ import {
formatCookiesForHeader, formatCookiesForHeader,
loadCookiesOptional, loadCookiesOptional,
} from "../utils/cookies"; } from "../utils/cookies";
import { delay } from "../utils/delay";
import { formatCentsToCurrency } from "../utils/format"; import { formatCentsToCurrency } from "../utils/format";
import { import {
fetchHtml, fetchHtml,
@@ -568,78 +569,6 @@ export function parseSearch(
return results; return results;
} }
/**
Parse a listing page into a typed object (backward compatible).
*/
function _parseListing(
htmlString: HTMLString,
BASE_URL: string,
): KijijiListingDetails | null {
const apolloState = extractApolloState(htmlString);
if (!apolloState) return null;
const listingKey = findApolloListingKey(
apolloState,
(value) => typeof value.url === "string" && typeof value.title === "string",
);
if (!listingKey) return null;
const root = apolloState[listingKey];
if (!isRecord(root)) return null;
const {
url,
title,
description,
price,
type,
status,
activationDate,
endDate,
metrics,
location,
} = root as ApolloListingRoot;
const cents = price?.amount != null ? Number(price.amount) : undefined;
const amountFormatted =
cents != null ? formatCentsToCurrency(cents, "en-CA") : undefined;
const numberOfViews =
metrics?.views != null ? Number(metrics.views) : undefined;
const listingUrl =
typeof url === "string"
? url.startsWith("http")
? url
: `${BASE_URL}${url}`
: "";
if (!listingUrl || !title) return null;
return {
url: listingUrl,
title,
description,
listingPrice: amountFormatted
? {
amountFormatted,
cents:
cents !== undefined && Number.isFinite(cents) ? cents : undefined,
currency: price?.currency,
}
: undefined,
listingType: type,
listingStatus: status,
creationDate: activationDate,
endDate,
numberOfViews:
numberOfViews !== undefined && Number.isFinite(numberOfViews)
? numberOfViews
: undefined,
address: location?.address ?? null,
};
}
/** /**
* Parse a listing page into a detailed object with all available fields * Parse a listing page into a detailed object with all available fields
*/ */
@@ -893,7 +822,17 @@ export default async function fetchKijijiItems(
const searchResults = parseSearch(searchHtml, BASE_URL); const searchResults = parseSearch(searchHtml, BASE_URL);
if (searchResults.length === 0) { if (searchResults.length === 0) {
logger.log(`No more results found on page ${page}. Stopping pagination.`); if (page === 1) {
logger.log(
`No results found on page 1. The search URL was: ${searchUrl}\n` +
`Tip: Kijiji matches ALL words in the query against listing titles. ` +
`Try a shorter or more common query (e.g. "macbook air m1" instead of "macbook air m1 apple silicon").`,
);
} else {
logger.log(
`No more results found on page ${page}. Stopping pagination.`,
);
}
break; break;
} }
@@ -928,9 +867,7 @@ export default async function fetchKijijiItems(
const batchPromises = batch.map(async (link, batchIndex) => { const batchPromises = batch.map(async (link, batchIndex) => {
try { try {
if (batchIndex > 0) { if (batchIndex > 0) {
await new Promise((resolve) => await delay(DELAY_MS * batchIndex);
setTimeout(resolve, DELAY_MS * batchIndex),
);
} }
const html = await fetchHtml(link, 0, { const html = await fetchHtml(link, 0, {
@@ -952,11 +889,11 @@ export default async function fetchKijijiItems(
return parsed; return parsed;
} catch (err) { } catch (err) {
if (err instanceof HttpError) { if (err instanceof HttpError) {
console.error( logger.warn(
`\nFailed to fetch ${link}\n - ${err.statusCode} ${err.message}`, `\nFailed to fetch ${link}\n - ${err.statusCode} ${err.message}`,
); );
} else { } else {
console.error( logger.warn(
`\nFailed to fetch ${link}\n - ${String((err as Error)?.message || err)}`, `\nFailed to fetch ${link}\n - ${String((err as Error)?.message || err)}`,
); );
} }
@@ -974,7 +911,7 @@ export default async function fetchKijijiItems(
results.push(...batchResults); results.push(...batchResults);
if (i + CONCURRENT_REQUESTS < newListingLinks.length) { if (i + CONCURRENT_REQUESTS < newListingLinks.length) {
await new Promise((resolve) => setTimeout(resolve, DELAY_MS)); await delay(DELAY_MS);
} }
} }
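The page-1 tip above rests on Kijiji requiring every query word to appear in the listing title. The following is only a rough mental model: Kijiji's actual matcher is not described here and may apply stemming, synonyms or category filters. Still, a literal all-words check shows why adding qualifiers such as "apple silicon" can empty the result set. `matchesAllWords` is a hypothetical helper.

```typescript
// Rough model of AND-matching: every whitespace-separated query token
// must occur somewhere in the title (case-insensitive substring check).
function matchesAllWords(query: string, title: string): boolean {
  const haystack = title.toLowerCase();
  return query
    .toLowerCase()
    .split(/\s+/)
    .filter((word) => word.length > 0)
    .every((word) => haystack.includes(word));
}

const title = "MacBook Air M1 2020 8GB 256GB";
console.log(matchesAllWords("macbook air m1", title)); // true
console.log(matchesAllWords("macbook air m1 apple silicon", title)); // false
```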


@@ -0,0 +1,25 @@
declare module "argon2-wasm-pro" {
interface Argon2Options {
pass: string | Uint8Array;
salt: Uint8Array;
time: number;
mem: number;
hashLen: number;
parallelism: number;
type: number;
}
interface Argon2Result {
hash: Uint8Array;
hashHex: string;
encoded: string;
}
function hash(options: Argon2Options): Promise<Argon2Result>;
const argon2: {
hash: typeof hash;
};
export default argon2;
}


@@ -7,6 +7,7 @@ import { logger } from "./logger";
export interface Cookie { export interface Cookie {
name: string; name: string;
value: string; value: string;
rawValue?: string;
domain: string; domain: string;
path: string; path: string;
secure?: boolean; secure?: boolean;
@@ -55,6 +56,7 @@ export function parseCookieString(
return { return {
name: trimmedName, name: trimmedName,
value: decodeURIComponent(trimmedValue), value: decodeURIComponent(trimmedValue),
rawValue: trimmedValue,
domain, domain,
path: "/", path: "/",
secure: true, secure: true,
@@ -95,7 +97,7 @@ export function formatCookiesForHeader(
}); });
return validCookies return validCookies
.map((cookie) => `${cookie.name}=${cookie.value}`) .map((cookie) => `${cookie.name}=${cookie.rawValue ?? cookie.value}`)
.join("; "); .join("; ");
} }
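Keeping `rawValue` matters because `parseCookieString` stores a `decodeURIComponent`-decoded `value`. Re-emitting that decoded form can change what the server receives; for example, `%3D` turns back into a literal `=`. The sketch below reproduces only that round trip. It uses a trimmed-down, hypothetical `MiniCookie` type and `parsePair` parser rather than the module's full `Cookie` interface.

```typescript
interface MiniCookie {
  name: string;
  value: string; // decoded, convenient for inspection
  rawValue?: string; // exactly as it appeared in the Cookie header
}

// Assumes every pair contains "="; the real parser does more validation.
function parsePair(pair: string): MiniCookie {
  const idx = pair.indexOf("=");
  const name = pair.slice(0, idx).trim();
  const raw = pair.slice(idx + 1).trim();
  return { name, value: decodeURIComponent(raw), rawValue: raw };
}

// Prefer the original encoding when serialising back into a header.
function toHeader(cookies: MiniCookie[]): string {
  return cookies.map((c) => `${c.name}=${c.rawValue ?? c.value}`).join("; ");
}

const cookies = "token=a%3Db%2Bc; lang=en".split(";").map(parsePair);
console.log(cookies[0]?.value); // a=b+c
console.log(toHeader(cookies)); // token=a%3Db%2Bc; lang=en

// Without rawValue the decoded characters leak back into the header.
const decodedOnly = cookies.map(({ name, value }) => ({ name, value }));
console.log(toHeader(decodedOnly)); // token=a=b+c; lang=en
```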


@@ -0,0 +1,239 @@
import argon2 from "argon2-wasm-pro";
// ------------------ Types ------------------
interface ChallengeDetails {
p2: number;
p6: number;
p7: number;
p9: string;
p11: string;
p12: number;
p13: number;
p15: number;
}
interface ChallengeParams {
crefId: string;
cdetail: ChallengeDetails;
iid: string;
chlghost: string;
appName: string;
p: string;
destUrl: string;
}
interface ChallengeResult {
cookies: string;
}
// ------------------ Helpers ------------------
function memcmp(a: Uint8Array, b: number[], len: number): number {
for (let i = 0; i < len; i++) {
const va = a[i] ?? 0;
const vb = b[i] ?? 0;
if (va !== vb) return (va & 0xff) - (vb & 0xff);
}
return 0;
}
function intToBytes(val: number, arr: Uint8Array, offset: number) {
arr[offset] = val >>> 24;
arr[offset + 1] = val >>> 16;
arr[offset + 2] = val >>> 8;
arr[offset + 3] = val;
}
function string2Bin(str: string): number[] {
const result: number[] = [];
for (let i = 0; i < str.length; i++) {
result.push(str.charCodeAt(i));
}
return result;
}
function bufferToBase64(buf: Uint8Array): string {
return btoa(String.fromCharCode(...buf));
}
function parseCookiesFromSetCookie(cookies: string[]): Record<string, string> {
const result: Record<string, string> = {};
for (const header of cookies) {
const match = header.match(/^([^=]+)=([^;]+)/);
if (match?.[1] && match[2]) {
result[match[1]] = match[2];
}
}
return result;
}
// ------------------ Default headers ------------------
const BROWSER_UA =
"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36";
const _EBAY_HEADERS: Record<string, string> = {
"User-Agent": BROWSER_UA,
Accept:
"text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8",
"Accept-Language": "en-CA,en-US;q=0.9,en;q=0.8",
};
// ------------------ Parser ------------------
export function parseChallengePage(html: string): ChallengeParams | null {
const getHidden = (id: string): string => {
const re = new RegExp(
`id=${id}\\s+value='([^']*)'` +
`|id=${id}\\s+value="([^"]*)"` +
`|id=${id}\\s+value=([^\\s>]+)`,
"i",
);
const m = html.match(re);
if (!m) return "";
return m[1] ?? m[2] ?? m[3] ?? "";
};
const crefId = getHidden("_crefId");
const cdetailRaw = getHidden("_cdetail");
const iid = getHidden("_iid");
const chlghost = getHidden("_chlghost");
const appName = getHidden("_appName");
const p = getHidden("_p");
const formActionMatch = html.match(
/<form\s+id=destForm\s+[^>]*action=([^\s>]+)/i,
);
const destUrl = formActionMatch?.[1]?.trim() ?? "";
if (!crefId || !cdetailRaw) return null;
let cdetail: ChallengeDetails;
try {
const parsed = JSON.parse(cdetailRaw);
const d = parsed.details;
cdetail = {
p2: Number(d.p2),
p6: Number(d.p6),
p7: Number(d.p7),
p9: d.p9,
p11: d.p11,
p12: Number(d.p12),
p13: Number(d.p13),
p15: Number(d.p15),
};
} catch {
return null;
}
return {
crefId,
cdetail,
iid,
chlghost: chlghost || "https://www.ebay.ca",
appName: appName || "orch",
p,
destUrl,
};
}
// ------------------ Solver ------------------
async function solveArgon2Challenge(
cdetail: ChallengeDetails,
): Promise<string[]> {
const targetBytes = string2Bin(atob(cdetail.p11));
const targetLen = targetBytes.length;
const nonceLen = cdetail.p6;
const answerCount = cdetail.p15;
const salt = new Uint8Array(
Uint8Array.from(atob(cdetail.p9), (c) => c.charCodeAt(0)),
);
const answers: string[] = [];
let nonce = new Uint8Array(nonceLen);
crypto.getRandomValues(nonce);
intToBytes(0, nonce, nonce.length - 4);
let counter = 0;
while (answers.length < answerCount) {
const result = await argon2.hash({
pass: nonce,
salt,
time: cdetail.p2,
mem: cdetail.p13,
hashLen: cdetail.p7,
parallelism: cdetail.p12,
type: 2,
});
const hashBytes = result.hash as Uint8Array;
if (memcmp(hashBytes, targetBytes, targetLen) <= 0) {
answers.push(bufferToBase64(nonce));
nonce = new Uint8Array(nonceLen);
crypto.getRandomValues(nonce);
intToBytes(0, nonce, nonce.length - 4);
counter = 0;
} else {
counter++;
intToBytes(counter, nonce, nonce.length - 4);
}
}
return answers;
}
// ------------------ Public API ------------------
export async function solveEbayChallenge(
html: string,
cookieHeader?: string,
): Promise<ChallengeResult | null> {
const params = parseChallengePage(html);
if (!params) return null;
const answers = await solveArgon2Challenge(params.cdetail);
const encodedAnswers = encodeURIComponent(answers.join(","));
const body = JSON.stringify({
iid: params.iid,
appName: params.appName,
referenceId: params.crefId,
pvt: Date.now().toString(),
crt: Date.now().toString(),
encodedAnswers,
p: params.p,
ru: params.destUrl,
});
const headers: Record<string, string> = {
"content-type": "application/json",
accept: "application/json, text/plain, */*",
"user-agent": BROWSER_UA,
};
if (cookieHeader) {
headers.cookie = cookieHeader;
}
const res = await fetch(`${params.chlghost}/splashui/challengesvc/answer`, {
method: "POST",
headers,
body,
});
if (!res.ok) return null;
// Collect cookies from answer response
const setCookies = res.headers.getSetCookie?.() ?? [];
const answerCookies = parseCookiesFromSetCookie(setCookies);
const cookieEntries = Object.entries(answerCookies);
if (cookieEntries.length === 0) return null;
const cookies = cookieEntries.map(([k, v]) => `${k}=${v}`).join("; ");
return { cookies };
}
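The solver's inner loop is a hash-below-target proof of work. It starts from a random nonce whose last four bytes hold a big-endian counter, then hashes repeatedly, bumping the counter, until the digest compares lexicographically at or below the decoded `p11` target. The sketch below swaps in Node's synchronous SHA-256 (`createHash` from `node:crypto`) for argon2id purely so it runs instantly. The salt and one-byte target are made up. It is not a substitute for eBay's challenge, which the code above solves with argon2id (`type: 2`) using the server-supplied parameters.

```typescript
import { createHash, randomBytes } from "node:crypto";

// Write `val` big-endian into arr[offset..offset+3]; Uint8Array stores each
// assignment modulo 256, so the shifts alone select the right byte.
function intToBytes(val: number, arr: Uint8Array, offset: number): void {
  arr[offset] = val >>> 24;
  arr[offset + 1] = val >>> 16;
  arr[offset + 2] = val >>> 8;
  arr[offset + 3] = val;
}

// Lexicographic compare of the first `len` bytes (negative, zero, positive).
function memcmp(a: Uint8Array, b: Uint8Array, len: number): number {
  for (let i = 0; i < len; i++) {
    const diff = (a[i] ?? 0) - (b[i] ?? 0);
    if (diff !== 0) return diff;
  }
  return 0;
}

// Find one nonce whose SHA-256(salt || nonce) prefix is <= target.
function solveOne(salt: Uint8Array, target: Uint8Array, nonceLen = 16): Uint8Array {
  const nonce = new Uint8Array(randomBytes(nonceLen));
  for (let counter = 0; ; counter++) {
    intToBytes(counter, nonce, nonceLen - 4);
    const digest = createHash("sha256").update(salt).update(nonce).digest();
    if (memcmp(digest, target, target.length) <= 0) return nonce;
  }
}

const salt = new TextEncoder().encode("example-salt");
const target = Uint8Array.of(0x0f); // about 1 in 16 digests qualify
const nonce = solveOne(salt, target);
console.log(Buffer.from(nonce).toString("base64"));
```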


@@ -1,3 +1,4 @@
import type { HTMLString } from "../types/common";
import { delay } from "./delay"; import { delay } from "./delay";
/** Custom error class for HTTP-related failures */ /** Custom error class for HTTP-related failures */
@@ -60,10 +61,57 @@ export function isRecord(value: unknown): value is Record<string, unknown> {
/** /**
* Calculate exponential backoff delay with jitter * Calculate exponential backoff delay with jitter
*/ */
function calculateBackoffDelay(attempt: number, baseMs: number): number { function calculateBackoffDelay(
attempt: number,
baseMs: number,
jitter: () => number = Math.random,
): number {
const exponentialDelay = baseMs * 2 ** attempt; const exponentialDelay = baseMs * 2 ** attempt;
const jitter = Math.random() * 0.1 * exponentialDelay; // 10% jitter const jitterDelay = jitter() * 0.1 * exponentialDelay; // 10% jitter
return Math.min(exponentialDelay + jitter, 30000); // Cap at 30 seconds return Math.min(exponentialDelay + jitterDelay, 30000); // Cap at 30 seconds
}
const MAX_RATE_LIMIT_WAIT_MS = 30_000;
const MAX_DELTA_RESET_SECONDS = 86_400;
function mergeHeaders(
defaultHeaders: Record<string, string>,
customHeaders?: Record<string, string>,
): Record<string, string> {
const merged: Record<string, string> = {};
for (const [key, value] of Object.entries(defaultHeaders)) {
merged[key.toLowerCase()] = value;
}
for (const [key, value] of Object.entries(customHeaders ?? {})) {
merged[key.toLowerCase()] = value;
}
return merged;
}
function calculateRateLimitWaitMs(
resetHeader: string | null,
fallbackWaitMs: number,
): number {
if (!resetHeader) return fallbackWaitMs;
const resetValue = Number(resetHeader);
if (!Number.isFinite(resetValue)) return fallbackWaitMs;
const waitMs =
resetValue <= MAX_DELTA_RESET_SECONDS
? resetValue * 1000
: resetValue * 1000 - Date.now();
return Math.min(Math.max(0, waitMs), MAX_RATE_LIMIT_WAIT_MS);
}
/** Result type when includeResponseUrl is true */
export interface FetchHtmlResult {
html: HTMLString;
responseUrl: string;
} }
/** Options for fetchHtml */ /** Options for fetchHtml */
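The new helpers interpret `X-RateLimit-Reset` in two ways. Values up to `MAX_DELTA_RESET_SECONDS` (86 400) are read as seconds from now; larger values are read as a Unix epoch timestamp in seconds. Either result is clamped to the range 0 to 30 000 ms. When no usable header is present, the jittered exponential backoff serves as the fallback. The standalone copy below reproduces that arithmetic so the edge cases can be checked directly. The injectable clock `now` is an addition for testing and is not in the diff. Under this logic a reset header of `"0"` yields a zero wait, which is presumably what lets the 429 test mock (it answers `"0"` for every header) finish without sleeping.

```typescript
const MAX_RATE_LIMIT_WAIT_MS = 30_000;
const MAX_DELTA_RESET_SECONDS = 86_400;

// Exponential backoff (base * 2^attempt) plus up to 10% jitter, capped at 30s.
// Injecting `jitter` makes the result deterministic in tests.
function calculateBackoffDelay(
  attempt: number,
  baseMs: number,
  jitter: () => number = Math.random,
): number {
  const exponentialDelay = baseMs * 2 ** attempt;
  return Math.min(exponentialDelay + jitter() * 0.1 * exponentialDelay, 30_000);
}

// Small reset values are deltas in seconds; large ones are epoch seconds.
function calculateRateLimitWaitMs(
  resetHeader: string | null,
  fallbackWaitMs: number,
  now: () => number = Date.now, // added here for testability
): number {
  if (!resetHeader) return fallbackWaitMs;
  const resetValue = Number(resetHeader);
  if (!Number.isFinite(resetValue)) return fallbackWaitMs;
  const waitMs =
    resetValue <= MAX_DELTA_RESET_SECONDS
      ? resetValue * 1000
      : resetValue * 1000 - now();
  return Math.min(Math.max(0, waitMs), MAX_RATE_LIMIT_WAIT_MS);
}

const fixedNow = () => 1_700_000_000_000; // a fixed epoch in milliseconds
console.log(calculateBackoffDelay(2, 1000, () => 0)); // 4000
console.log(calculateBackoffDelay(2, 1000, () => 1)); // 4400
console.log(calculateRateLimitWaitMs("5", 250)); // 5000
console.log(calculateRateLimitWaitMs("120", 250)); // 30000 (clamped)
console.log(calculateRateLimitWaitMs("soon", 250)); // 250
console.log(calculateRateLimitWaitMs("1700000012", 250, fixedNow)); // 12000
```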
@@ -73,6 +121,8 @@ export interface FetchHtmlOptions {
timeoutMs?: number; timeoutMs?: number;
onRateInfo?: (remaining: string | null, reset: string | null) => void; onRateInfo?: (remaining: string | null, reset: string | null) => void;
headers?: Record<string, string>; headers?: Record<string, string>;
includeResponseUrl?: boolean;
jitter?: () => number;
} }
/** /**
@@ -80,14 +130,24 @@ export interface FetchHtmlOptions {
* @param url - The URL to fetch * @param url - The URL to fetch
* @param delayMs - Delay in milliseconds between requests (rate limiting) * @param delayMs - Delay in milliseconds between requests (rate limiting)
* @param opts - Optional fetch options * @param opts - Optional fetch options
* @returns The HTML content as a string * @returns The HTML content as a string, or an object with html and responseUrl
* @throws HttpError, NetworkError, or RateLimitError on failure * @throws HttpError, NetworkError, or RateLimitError on failure
*/ */
export async function fetchHtml(
url: string,
delayMs: number,
opts: FetchHtmlOptions & { includeResponseUrl: true },
): Promise<FetchHtmlResult>;
export async function fetchHtml( export async function fetchHtml(
url: string, url: string,
delayMs: number, delayMs: number,
opts?: FetchHtmlOptions, opts?: FetchHtmlOptions,
): Promise<string> { ): Promise<HTMLString>;
export async function fetchHtml(
url: string,
delayMs: number,
opts?: FetchHtmlOptions,
): Promise<HTMLString | FetchHtmlResult> {
const maxRetries = opts?.maxRetries ?? 3; const maxRetries = opts?.maxRetries ?? 3;
const retryBaseMs = opts?.retryBaseMs ?? 1000; const retryBaseMs = opts?.retryBaseMs ?? 1000;
const timeoutMs = opts?.timeoutMs ?? 30000; const timeoutMs = opts?.timeoutMs ?? 30000;
@@ -118,13 +178,17 @@ export async function fetchHtml(
const controller = new AbortController(); const controller = new AbortController();
const timeoutId = setTimeout(() => controller.abort(), timeoutMs); const timeoutId = setTimeout(() => controller.abort(), timeoutMs);
const res = await fetch(url, { const res = await (async () => {
method: "GET", try {
headers: { ...defaultHeaders, ...opts?.headers }, return await fetch(url, {
signal: controller.signal, method: "GET",
}); headers: mergeHeaders(defaultHeaders, opts?.headers),
signal: controller.signal,
clearTimeout(timeoutId); });
} finally {
clearTimeout(timeoutId);
}
})();
const rateLimitRemaining = res.headers.get("X-RateLimit-Remaining"); const rateLimitRemaining = res.headers.get("X-RateLimit-Remaining");
const rateLimitReset = res.headers.get("X-RateLimit-Reset"); const rateLimitReset = res.headers.get("X-RateLimit-Reset");
@@ -136,12 +200,17 @@ export async function fetchHtml(
const resetSeconds = rateLimitReset const resetSeconds = rateLimitReset
? Number(rateLimitReset) ? Number(rateLimitReset)
: Number.NaN; : Number.NaN;
const waitMs = Number.isFinite(resetSeconds) const waitMs = calculateRateLimitWaitMs(
? Math.max(0, resetSeconds * 1000) rateLimitReset,
: calculateBackoffDelay(attempt, retryBaseMs); calculateBackoffDelay(
attempt,
retryBaseMs,
opts?.jitter ?? Math.random,
),
);
if (attempt < maxRetries) { if (attempt < maxRetries) {
await new Promise((resolve) => setTimeout(resolve, waitMs)); await delay(waitMs);
continue; continue;
} }
throw new RateLimitError( throw new RateLimitError(
@@ -153,8 +222,12 @@ export async function fetchHtml(
// Retry on server errors // Retry on server errors
if (res.status >= 500 && res.status < 600 && attempt < maxRetries) { if (res.status >= 500 && res.status < 600 && attempt < maxRetries) {
await new Promise((resolve) => await delay(
setTimeout(resolve, calculateBackoffDelay(attempt, retryBaseMs)), calculateBackoffDelay(
attempt,
retryBaseMs,
opts?.jitter ?? Math.random,
),
); );
continue; continue;
} }
@@ -170,7 +243,9 @@ export async function fetchHtml(
// Respect per-request delay to maintain rate limiting // Respect per-request delay to maintain rate limiting
await delay(delayMs); await delay(delayMs);
return html; return opts?.includeResponseUrl
? { html, responseUrl: res.url || url }
: html;
} catch (err) { } catch (err) {
// Re-throw known errors // Re-throw known errors
if ( if (
@@ -183,8 +258,12 @@ export async function fetchHtml(
if (err instanceof Error && err.name === "AbortError") { if (err instanceof Error && err.name === "AbortError") {
if (attempt < maxRetries) { if (attempt < maxRetries) {
await new Promise((resolve) => await delay(
setTimeout(resolve, calculateBackoffDelay(attempt, retryBaseMs)), calculateBackoffDelay(
attempt,
retryBaseMs,
opts?.jitter ?? Math.random,
),
); );
continue; continue;
} }
@@ -193,8 +272,12 @@ export async function fetchHtml(
// Network or other errors // Network or other errors
if (attempt < maxRetries) { if (attempt < maxRetries) {
await new Promise((resolve) => await delay(
setTimeout(resolve, calculateBackoffDelay(attempt, retryBaseMs)), calculateBackoffDelay(
attempt,
retryBaseMs,
opts?.jitter ?? Math.random,
),
); );
continue; continue;
} }
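`mergeHeaders` lowercases keys from both maps before merging. As a result, a caller-supplied `Cookie` or `User-Agent` replaces the default regardless of capitalization, whereas the previous object spread left both spellings side by side. The standalone copy below shows the difference; the header values are illustrative. One side effect is worth checking if the eBay search goes through this helper: an assertion that reads a capitalized key, such as `headers.Cookie` in the eBay test, sees `undefined` whether or not a cookie was sent.

```typescript
function mergeHeaders(
  defaultHeaders: Record<string, string>,
  customHeaders?: Record<string, string>,
): Record<string, string> {
  const merged: Record<string, string> = {};
  // Later writes win, so custom headers override defaults key-for-key.
  for (const [key, value] of Object.entries(defaultHeaders)) {
    merged[key.toLowerCase()] = value;
  }
  for (const [key, value] of Object.entries(customHeaders ?? {})) {
    merged[key.toLowerCase()] = value;
  }
  return merged;
}

const defaults = { "User-Agent": "default-agent", Accept: "text/html" };
const custom = { "user-agent": "custom-agent", Cookie: "a=1" };

// A plain spread keeps both spellings of the user-agent key.
console.log(Object.keys({ ...defaults, ...custom }));
// keys: User-Agent, Accept, user-agent, Cookie
console.log(mergeHeaders(defaults, custom));
// user-agent: custom-agent, accept: text/html, cookie: a=1
```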


@@ -29,9 +29,11 @@ const originalWarn = console.warn;
describe("eBay Scraper Cookie Handling", () => { describe("eBay Scraper Cookie Handling", () => {
beforeEach(() => { beforeEach(() => {
delete process.env.EBAY_COOKIE;
global.fetch = mock(() => global.fetch = mock(() =>
Promise.resolve({ Promise.resolve({
ok: true, ok: true,
headers: { get: () => null },
text: () => Promise.resolve("<html><body></body></html>"), text: () => Promise.resolve("<html><body></body></html>"),
}), }),
) as unknown as typeof fetch; ) as unknown as typeof fetch;
@@ -46,17 +48,22 @@ describe("eBay Scraper Cookie Handling", () => {
test("should ignore request cookie overrides and rely on EBAY_COOKIE", async () => { test("should ignore request cookie overrides and rely on EBAY_COOKIE", async () => {
await fetchEbayItems("laptop", 1000); await fetchEbayItems("laptop", 1000);
expect(global.fetch).toHaveBeenCalledTimes(1); // First call is homepage warm-up, second is search
expect(global.fetch).toHaveBeenCalledTimes(2);
const firstFetchCall = (global.fetch as unknown as ReturnType<typeof mock>) // The search request is the second call
.mock.calls[0]; const secondFetchCall = (global.fetch as unknown as ReturnType<typeof mock>)
if (!firstFetchCall) { .mock.calls[1];
throw new Error("Expected fetch to be called"); if (!secondFetchCall) {
throw new Error("Expected search fetch to be called");
} }
const [, init] = firstFetchCall; const [searchUrl, init] = secondFetchCall;
const headers = (init as RequestInit).headers as Record<string, string>; const headers = (init as RequestInit).headers as Record<string, string>;
expect(searchUrl).toBe(
"https://www.ebay.ca/sch/i.html?_nkw=laptop&_sacat=0&_from=R40&LH_BIN=1&LH_PrefLoc=1",
);
expect(headers.Cookie).toBeUndefined(); expect(headers.Cookie).toBeUndefined();
}); });
@@ -64,6 +71,7 @@ describe("eBay Scraper Cookie Handling", () => {
global.fetch = mock(() => global.fetch = mock(() =>
Promise.resolve({ Promise.resolve({
ok: true, ok: true,
headers: { get: () => null },
text: () => text: () =>
Promise.resolve(` Promise.resolve(`
<html><body> <html><body>
@@ -84,10 +92,26 @@ describe("eBay Scraper Cookie Handling", () => {
]); ]);
}); });
test("returns empty results when eBay rate-limits the request", async () => {
global.fetch = mock(() =>
Promise.resolve({
ok: false,
status: 429,
headers: { get: () => "0" },
text: () => Promise.resolve(""),
}),
) as unknown as typeof fetch;
const results = await fetchEbayItems("laptop", 1000);
expect(results).toEqual([]);
});
test("deduplicates repeated item links from the same card", async () => { test("deduplicates repeated item links from the same card", async () => {
global.fetch = mock(() => global.fetch = mock(() =>
Promise.resolve({ Promise.resolve({
ok: true, ok: true,
headers: { get: () => null },
text: () => text: () =>
Promise.resolve(` Promise.resolve(`
<html><body> <html><body>
@@ -114,6 +138,7 @@ describe("eBay Scraper Cookie Handling", () => {
global.fetch = mock(() => global.fetch = mock(() =>
Promise.resolve({ Promise.resolve({
ok: true, ok: true,
headers: { get: () => null },
text: () => text: () =>
Promise.resolve(` Promise.resolve(`
<html><body> <html><body>
@@ -146,6 +171,7 @@ describe("eBay Scraper Cookie Handling", () => {
global.fetch = mock(() => global.fetch = mock(() =>
Promise.resolve({ Promise.resolve({
ok: true, ok: true,
headers: { get: () => null },
text: () => text: () =>
Promise.resolve(` Promise.resolve(`
<html><body> <html><body>
@@ -188,6 +214,7 @@ describe("eBay Scraper Cookie Handling", () => {
global.fetch = mock(() => global.fetch = mock(() =>
Promise.resolve({ Promise.resolve({
ok: true, ok: true,
headers: { get: () => null },
text: () => text: () =>
Promise.resolve(` Promise.resolve(`
<html><body> <html><body>
@@ -210,10 +237,86 @@ describe("eBay Scraper Cookie Handling", () => {
]); ]);
}); });
test("parses current eBay s-card markup with unquoted item links", async () => {
global.fetch = mock(() =>
Promise.resolve({
ok: true,
headers: { get: () => null },
text: () =>
Promise.resolve(`
<html><body>
<div class="s-card s-card--horizontal">
<div class=su-card-container__header>
<a class=s-card__link href=https://ebay.com/itm/1234567890?itmmeta=abc>
<div role=heading aria-level=3 class=s-card__title>
<span class="su-styled-text primary default">Apple MacBook Air M1 2020 8GB 256GB</span>
</div>
</a>
</div>
<div class=su-card-container__attributes>
<span class="su-styled-text primary bold large-1 s-card__price">CA $599.00</span>
</div>
</div>
</body></html>
`),
}),
) as unknown as typeof fetch;
const results = await fetchEbayItems("macbook", 1000);
expect(results).toEqual([
expect.objectContaining({
title: "Apple MacBook Air M1 2020 8GB 256GB",
url: "https://ebay.com/itm/1234567890?itmmeta=abc",
listingPrice: expect.objectContaining({ cents: 59_900 }),
}),
]);
});
test("parses embedded eBay payload listings before HTML fallback", async () => {
const payload = encodeURIComponent(
JSON.stringify({
searchResults: [
{
title: "Apple MacBook Air M1 API Result",
itemWebUrl: "https://www.ebay.ca/itm/9876543210?hash=item987",
price: { value: "550.00", currency: "CAD" },
},
],
}),
);
global.fetch = mock(() =>
Promise.resolve({
ok: true,
headers: { get: () => null },
text: () =>
Promise.resolve(`
<html><body>
<script data-inlinepayload="${payload}"></script>
</body></html>
`),
}),
) as unknown as typeof fetch;
const results = await fetchEbayItems("macbook", 1000);
expect(results).toEqual([
expect.objectContaining({
title: "Apple MacBook Air M1 API Result",
url: "https://www.ebay.ca/itm/9876543210?hash=item987",
listingPrice: expect.objectContaining({
amountFormatted: "CAD 550.00",
cents: 55_000,
currency: "CAD",
}),
}),
]);
});
test("treats US dollar prices as USD", async () => { test("treats US dollar prices as USD", async () => {
global.fetch = mock(() => global.fetch = mock(() =>
Promise.resolve({ Promise.resolve({
ok: true, ok: true,
headers: { get: () => null },
text: () => text: () =>
Promise.resolve(` Promise.resolve(`
<html><body> <html><body>
@@ -243,6 +346,7 @@ describe("eBay Scraper Cookie Handling", () => {
global.fetch = mock(() => global.fetch = mock(() =>
Promise.resolve({ Promise.resolve({
ok: true, ok: true,
headers: { get: () => null },
text: () => text: () =>
Promise.resolve(` Promise.resolve(`
<html><body> <html><body>
@@ -272,6 +376,7 @@ describe("eBay Scraper Cookie Handling", () => {
global.fetch = mock(() => global.fetch = mock(() =>
Promise.resolve({ Promise.resolve({
ok: true, ok: true,
headers: { get: () => null },
text: () => text: () =>
Promise.resolve(` Promise.resolve(`
<html><body> <html><body>
@@ -301,6 +406,7 @@ describe("eBay Scraper Cookie Handling", () => {
global.fetch = mock(() => global.fetch = mock(() =>
Promise.resolve({ Promise.resolve({
ok: true, ok: true,
headers: { get: () => null },
text: () => text: () =>
Promise.resolve(` Promise.resolve(`
<html><body> <html><body>
@@ -343,6 +449,7 @@ describe("eBay Scraper Cookie Handling", () => {
global.fetch = mock(() => global.fetch = mock(() =>
Promise.resolve({ Promise.resolve({
ok: true, ok: true,
headers: { get: () => null },
text: () => text: () =>
Promise.resolve(` Promise.resolve(`
<html><body> <html><body>
@@ -375,6 +482,7 @@ describe("eBay Scraper Cookie Handling", () => {
global.fetch = mock(() => global.fetch = mock(() =>
Promise.resolve({ Promise.resolve({
ok: true, ok: true,
headers: { get: () => null },
text: () => text: () =>
Promise.resolve(` Promise.resolve(`
<html><body> <html><body>
@@ -407,6 +515,7 @@ describe("eBay Scraper Cookie Handling", () => {
 global.fetch = mock(() =>
 Promise.resolve({
 ok: true,
+headers: { get: () => null },
 text: () =>
 Promise.resolve(`
 <html><body>
@@ -440,6 +549,7 @@ describe("eBay Scraper Cookie Handling", () => {
 global.fetch = mock(() =>
 Promise.resolve({
 ok: true,
+headers: { get: () => null },
 text: () =>
 Promise.resolve(`
 <html><body>
@@ -467,6 +577,7 @@ describe("eBay Scraper Cookie Handling", () => {
 global.fetch = mock(() =>
 Promise.resolve({
 ok: true,
+headers: { get: () => null },
 text: () =>
 Promise.resolve(`
 <html><body>
@@ -499,6 +610,7 @@ describe("eBay Scraper Cookie Handling", () => {
 global.fetch = mock(() =>
 Promise.resolve({
 ok: true,
+headers: { get: () => null },
 text: () =>
 Promise.resolve(`
 <html><body>
@@ -529,6 +641,7 @@ describe("eBay Scraper Cookie Handling", () => {
 global.fetch = mock(() =>
 Promise.resolve({
 ok: true,
+headers: { get: () => null },
 text: () =>
 Promise.resolve(`
 <html><body>
@@ -574,6 +687,7 @@ describe("eBay Scraper Cookie Handling", () => {
 global.fetch = mock(() =>
 Promise.resolve({
 ok: true,
+headers: { get: () => null },
 text: () =>
 Promise.resolve(`
 <html><body>
@@ -612,6 +726,7 @@ describe("eBay Scraper Cookie Handling", () => {
 global.fetch = mock(() =>
 Promise.resolve({
 ok: true,
+headers: { get: () => null },
 text: () =>
 Promise.resolve(`
 <html><body>
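Each eBay hunk above makes the same one-line change: the `fetch` stub gains `headers: { get: () => null }`, presumably because the shared HTTP helper now reads response headers (the `fetchHtml` tests check `X-RateLimit-Reset`) and a stub without `headers` would throw. A small factory could keep that shape in one place. This is a hypothetical sketch: `makeHtmlResponse` and its options do not exist in the repository, and the default `url` is a placeholder.

```typescript
// Hypothetical test helper; not part of the repository.
interface HtmlResponseStub {
  ok: boolean;
  status: number;
  url: string;
  headers: { get: (name: string) => string | null };
  text: () => Promise<string>;
}

interface StubOptions {
  status?: number;
  url?: string;
  headers?: Record<string, string>;
}

function makeHtmlResponse(html: string, options: StubOptions = {}): HtmlResponseStub {
  const status = options.status ?? 200;
  // Lower-case names so lookups behave like the case-insensitive Headers.get().
  const headers = new Map<string, string>(
    Object.entries(options.headers ?? {}).map(
      ([name, value]): [string, string] => [name.toLowerCase(), value],
    ),
  );
  return {
    ok: status >= 200 && status < 300,
    status,
    url: options.url ?? "https://example.test", // placeholder URL
    headers: { get: (name: string) => headers.get(name.toLowerCase()) ?? null },
    text: () => Promise.resolve(html),
  };
}
```

A test would then read `global.fetch = mock(() => Promise.resolve(makeHtmlResponse(html))) as unknown as typeof fetch;`. Constructing a real `new Response(html)`, as the MCP tests already do, is an alternative that gets `headers` for free.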

@@ -70,6 +70,7 @@ describe("Facebook Marketplace Scraper Core Tests", () => {
 expect(result[0]).toEqual({
 name: "c_user",
 value: "123456789",
+rawValue: "123456789",
 domain: ".facebook.com",
 path: "/",
 secure: true,
@@ -80,6 +81,7 @@ describe("Facebook Marketplace Scraper Core Tests", () => {
 expect(result[1]).toEqual({
 name: "xs",
 value: "abcdef123456",
+rawValue: "abcdef123456",
 domain: ".facebook.com",
 path: "/",
 secure: true,
@@ -97,6 +99,16 @@ describe("Facebook Marketplace Scraper Core Tests", () => {
 expect(result[1]?.value).toBe("abc=def");
 });
+test("should preserve raw encoded values when formatting cookie headers", () => {
+const cookieString = "c_user=123%2B456; xs=abc%3Ddef";
+const result = formatCookiesForHeader(
+parseFacebookCookieString(cookieString),
+"www.facebook.com",
+);
+expect(result).toBe(cookieString);
+});
 test("should filter out malformed cookies", () => {
 const cookieString = "c_user=123; invalid; xs=abc; =empty";
 const result = parseFacebookCookieString(cookieString);
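The new Facebook test pins down a round-trip property: a percent-encoded value such as `123%2B456` must be sent back exactly as received, even though the parsed `value` is decoded. Keeping both forms on each cookie, as the added `rawValue` field suggests, is one way to get that. The sketch below is simplified and hypothetical. It is not the project's `parseFacebookCookieString`/`formatCookiesForHeader`, and it omits the domain handling that the real formatter's second argument (`"www.facebook.com"`) presumably controls.

```typescript
// Hypothetical shape mirroring the fields the tests assert on.
interface ParsedCookie {
  name: string;
  value: string; // percent-decoded, for inspection
  rawValue: string; // exactly as it appeared in the input string
}

function parseCookieString(input: string): ParsedCookie[] {
  const cookies: ParsedCookie[] = [];
  for (const part of input.split(";")) {
    const trimmed = part.trim();
    const eq = trimmed.indexOf("=");
    // Skip malformed entries: no "=" at all, or an empty name ("=empty").
    if (eq <= 0) continue;
    const name = trimmed.slice(0, eq);
    const rawValue = trimmed.slice(eq + 1);
    let value = rawValue;
    try {
      value = decodeURIComponent(rawValue);
    } catch {
      // Malformed percent-escapes are kept undecoded.
    }
    cookies.push({ name, value, rawValue });
  }
  return cookies;
}

function formatCookieHeader(cookies: ParsedCookie[]): string {
  // Emit rawValue so "%2B" stays "%2B" instead of becoming "+".
  return cookies.map((cookie) => `${cookie.name}=${cookie.rawValue}`).join("; ");
}
```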

@@ -38,4 +38,87 @@ describe("fetchHtml", () => {
 expect(scheduledDelays).not.toContain(1000);
 });
+test("fetchHtml returns responseUrl when includeResponseUrl is true", async () => {
+process.env.NODE_ENV = "test";
+global.fetch = mock(() =>
+Promise.resolve({
+ok: true,
+status: 200,
+url: "https://example.test/final",
+headers: { get: () => null },
+text: () => Promise.resolve("<html></html>"),
+}),
+) as unknown as typeof fetch;
+const result = await fetchHtml("https://example.test", 0, {
+includeResponseUrl: true,
+});
+expect(result.html).toBe("<html></html>");
+expect(result.responseUrl).toBe("https://example.test/final");
+});
+test("rate limit epoch reset uses bounded wait", async () => {
+process.env.NODE_ENV = "production";
+const scheduledDelays: number[] = [];
+const farFutureEpochSeconds = Math.floor(Date.now() / 1000) + 315_360_000;
+let calls = 0;
+global.fetch = mock(() => {
+calls += 1;
+return Promise.resolve({
+ok: calls > 1,
+status: calls > 1 ? 200 : 429,
+url: "https://example.test",
+headers: {
+get: (name: string) =>
+name === "X-RateLimit-Reset" ? String(farFutureEpochSeconds) : null,
+},
+text: () => Promise.resolve("<html></html>"),
+});
+}) as unknown as typeof fetch;
+globalThis.setTimeout = mock((handler: TimerHandler, timeout?: number) => {
+scheduledDelays.push(Number(timeout));
+if (timeout !== 1_234_567 && typeof handler === "function") {
+handler();
+}
+return 0 as unknown as ReturnType<typeof setTimeout>;
+}) as unknown as typeof setTimeout;
+globalThis.clearTimeout = mock(() => {}) as unknown as typeof clearTimeout;
+await fetchHtml("https://example.test", 0, {
+maxRetries: 1,
+timeoutMs: 1_234_567,
+});
+expect(scheduledDelays).toContain(30_000);
+expect(scheduledDelays).not.toContain(farFutureEpochSeconds * 1000);
+});
+test("custom Accept header overrides default accept without duplicate casing", async () => {
+process.env.NODE_ENV = "test";
+const customAccept = "text/plain";
+let requestHeaders: HeadersInit | undefined;
+global.fetch = mock((_url: string | URL | Request, init?: RequestInit) => {
+requestHeaders = init?.headers;
+return Promise.resolve({
+ok: true,
+status: 200,
+url: "https://example.test",
+headers: { get: () => null },
+text: () => Promise.resolve("<html></html>"),
+});
+}) as unknown as typeof fetch;
+await fetchHtml("https://example.test", 0, {
+headers: { Accept: customAccept },
+});
+expect(requestHeaders).toBeDefined();
+expect((requestHeaders as Record<string, string>).accept).toBe(
+customAccept,
+);
+expect((requestHeaders as Record<string, string>).Accept).toBeUndefined();
+});
 });
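These `fetchHtml` tests fix two behaviors. A far-future `X-RateLimit-Reset` epoch must not turn into a multi-year sleep; the test expects a `30_000` ms wait instead. A caller's `Accept` header must also replace the default rather than sit beside it under different casing. The sketch below covers both under stated assumptions: the 30 s cap is read off the test, the rule for telling epoch seconds from relative seconds is a guess, and `rateLimitDelayMs`/`mergeHeaders` are hypothetical names, not the helper's actual API.

```typescript
// Hypothetical helpers sketching the behavior the fetchHtml tests describe.
const MAX_RATE_LIMIT_WAIT_MS = 30_000; // cap taken from the test's expectation
// Assumption: values at or above this are absolute Unix timestamps (seconds).
const EPOCH_SECONDS_THRESHOLD = 1_000_000_000;

function rateLimitDelayMs(
  resetHeader: string | null,
  nowMs: number,
  fallbackMs: number,
  maxMs: number = MAX_RATE_LIMIT_WAIT_MS,
): number {
  const reset = Number(resetHeader);
  if (!resetHeader || !Number.isFinite(reset) || reset < 0) return fallbackMs;
  const delayMs =
    reset >= EPOCH_SECONDS_THRESHOLD
      ? reset * 1000 - nowMs // absolute reset time
      : reset * 1000; // relative number of seconds
  // Never wait a negative amount, and never longer than the cap.
  return Math.min(Math.max(delayMs, 0), maxMs);
}

function mergeHeaders(
  defaults: Record<string, string>,
  overrides: Record<string, string> = {},
): Record<string, string> {
  const merged: Record<string, string> = {};
  // Lower-casing every name makes "Accept" and "accept" the same entry,
  // so an override replaces the default instead of duplicating it.
  for (const [name, value] of Object.entries(defaults)) merged[name.toLowerCase()] = value;
  for (const [name, value] of Object.entries(overrides)) merged[name.toLowerCase()] = value;
  return merged;
}
```

In a retry loop the computed delay would be passed to `setTimeout` between attempts; `fallbackMs` stands in for whatever backoff the helper already uses when the header is absent.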

@@ -1,11 +1,6 @@
-// Test setup for Bun test runner
-// This file is loaded before any tests run due to bunfig.toml preload
-// Mock fetch globally for tests
-global.fetch =
-global.fetch ||
-(() => {
-throw new Error("fetch is not available in test environment");
-});
-// Add any global test utilities here
+global.fetch = Object.assign(
+() => {
+throw new Error("Tests must mock fetch explicitly");
+},
+{ preconnect: fetch.preconnect },
+) as typeof fetch;
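With this setup, any test that reaches the network without installing its own stub fails immediately with `Tests must mock fetch explicitly`. Tests that replace `global.fetch` must also put it back. The MCP tests appear to restore `originalFetch` in a teardown hook, and a scoped helper is another option. The sketch below is hypothetical (`withMockedFetch` is not in the repository). It uses a double cast for the same reason the setup wraps its guard with `Object.assign`: Bun's `typeof fetch` carries an extra `preconnect` property.

```typescript
// Hypothetical scoped alternative to swapping global.fetch by hand.
type FetchFn = typeof globalThis.fetch;

async function withMockedFetch<T>(
  impl: (...args: Parameters<FetchFn>) => Promise<Response>,
  run: () => Promise<T>,
): Promise<T> {
  const previous = globalThis.fetch;
  // Double cast: in Bun, `typeof fetch` also requires a `preconnect` property.
  globalThis.fetch = impl as unknown as FetchFn;
  try {
    return await run();
  } finally {
    // Restore whatever was installed before, including the setup's guard.
    globalThis.fetch = previous;
  }
}
```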

@@ -21,5 +21,6 @@
 ## Verify
+- `bun test packages/mcp-server/test`
 - `bun run --cwd packages/mcp-server build`
 - `bun run ci`

@@ -2,7 +2,32 @@ import { logger } from "../logger";
 import { tools } from "./tools";
 const API_BASE_URL = process.env.API_BASE_URL || "http://localhost:4005/api";
-const API_TIMEOUT = Number(process.env.API_TIMEOUT) || 180000; // 3 minutes default
+const API_TIMEOUT = Number(process.env.API_TIMEOUT) || 180000;
+async function callMarketplaceApi(
+marketplace: string,
+params: URLSearchParams,
+): Promise<unknown> {
+const url = `${API_BASE_URL}/${marketplace}?${params.toString()}`;
+logger.log(`[MCP] Calling ${marketplace} API`);
+const response = await Promise.race([
+fetch(url),
+new Promise<Response>((_, reject) =>
+setTimeout(
+() => reject(new Error(`Request timed out after ${API_TIMEOUT}ms`)),
+API_TIMEOUT,
+),
+),
+]);
+if (!response.ok) {
+const errorText = await response.text();
+logger.error(
+`[MCP] ${marketplace} API error ${response.status}: ${errorText}`,
+);
+throw new Error(`API returned ${response.status}: ${errorText}`);
+}
+return response.json();
+}
 /**
 * Handle MCP JSON-RPC 2.0 protocol requests
@@ -116,7 +141,6 @@ export async function handleMcpRequest(req: Request): Promise<Response> {
 params.append("priceMin", args.priceMin.toString());
 if (args.priceMax)
 params.append("priceMax", args.priceMax.toString());
-if (args.cookies) params.append("cookies", args.cookies);
 if (args.unstableFilter !== undefined)
 params.append("unstableFilter", args.unstableFilter.toString());
@@ -139,7 +163,14 @@ export async function handleMcpRequest(req: Request): Promise<Response> {
 logger.error(
 `[MCP] Kijiji API error ${response.status}: ${errorText}`,
 );
-throw new Error(`API returned ${response.status}: ${errorText}`);
+let errorMessage = `API returned ${response.status}: ${errorText}`;
+try {
+const errorJson = JSON.parse(errorText) as { message?: string };
+if (errorJson.message) errorMessage = errorJson.message;
+} catch {
+// not JSON — use raw text
+}
+throw new Error(errorMessage);
 }
 result = await response.json();
 logger.log(
@@ -161,31 +192,7 @@ export async function handleMcpRequest(req: Request): Promise<Response> {
 if (args.unstableFilter !== undefined)
 params.append("unstableFilter", args.unstableFilter.toString());
-logger.log(
-`[MCP] Calling Facebook API: ${API_BASE_URL}/facebook?${params.toString()}`,
-);
-const response = await Promise.race([
-fetch(`${API_BASE_URL}/facebook?${params.toString()}`),
-new Promise<Response>((_, reject) =>
-setTimeout(
-() =>
-reject(new Error(`Request timed out after ${API_TIMEOUT}ms`)),
-API_TIMEOUT,
-),
-),
-]);
-if (!response.ok) {
-const errorText = await response.text();
-logger.error(
-`[MCP] Facebook API error ${response.status}: ${errorText}`,
-);
-throw new Error(`API returned ${response.status}: ${errorText}`);
-}
-result = await response.json();
-logger.log(
-`[MCP] Facebook returned ${Array.isArray(result) ? result.length : 0} items`,
-);
+result = await callMarketplaceApi("facebook", params);
 } else if (name === "search_ebay") {
 const query = args.query;
 if (!query) {
@@ -215,31 +222,7 @@ export async function handleMcpRequest(req: Request): Promise<Response> {
 if (args.unstableFilter !== undefined)
 params.append("unstableFilter", args.unstableFilter.toString());
-logger.log(
-`[MCP] Calling eBay API: ${API_BASE_URL}/ebay?${params.toString()}`,
-);
-const response = await Promise.race([
-fetch(`${API_BASE_URL}/ebay?${params.toString()}`),
-new Promise<Response>((_, reject) =>
-setTimeout(
-() =>
-reject(new Error(`Request timed out after ${API_TIMEOUT}ms`)),
-API_TIMEOUT,
-),
-),
-]);
-if (!response.ok) {
-const errorText = await response.text();
-logger.error(
-`[MCP] eBay API error ${response.status}: ${errorText}`,
-);
-throw new Error(`API returned ${response.status}: ${errorText}`);
-}
-result = await response.json();
-logger.log(
-`[MCP] eBay returned ${Array.isArray(result) ? result.length : 0} items`,
-);
+result = await callMarketplaceApi("ebay", params);
 } else {
 return Response.json({
 jsonrpc: "2.0",
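After this refactor the Facebook and eBay branches share `callMarketplaceApi`. The Kijiji branch still inlines its own request so it can surface the JSON `message` from error bodies, and folding that parsing into the shared helper would remove the last duplicate. The sketch below also replaces `Promise.race` with `AbortSignal.timeout()`, which aborts the underlying request once the deadline passes. With `Promise.race`, the losing `fetch` keeps running and the timer stays armed until it fires. The names are hypothetical, and the injectable `fetchImpl` parameter exists only to make the sketch testable.

```typescript
// Hypothetical consolidation of the Kijiji error parsing into the shared helper.
function extractErrorMessage(status: number, errorText: string): string {
  try {
    const parsed = JSON.parse(errorText) as { message?: unknown } | null;
    if (typeof parsed?.message === "string" && parsed.message !== "") {
      return parsed.message;
    }
  } catch {
    // Body is not JSON; fall back to the raw text below.
  }
  return `API returned ${status}: ${errorText}`;
}

async function callMarketplaceApiSketch(
  baseUrl: string,
  marketplace: string,
  params: URLSearchParams,
  timeoutMs: number,
  fetchImpl: typeof fetch = fetch,
): Promise<unknown> {
  const url = `${baseUrl}/${marketplace}?${params.toString()}`;
  // The signal aborts the request itself once timeoutMs has elapsed.
  const response = await fetchImpl(url, { signal: AbortSignal.timeout(timeoutMs) });
  if (!response.ok) {
    throw new Error(extractErrorMessage(response.status, await response.text()));
  }
  return response.json();
}
```

One behavioral difference: on timeout, `fetch` rejects with a `DOMException` named `TimeoutError` rather than the current `Request timed out after ${API_TIMEOUT}ms` error, so any caller matching that text would need adjusting.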

@@ -11,7 +11,11 @@ export const tools = [
 properties: {
 query: {
 type: "string",
-description: "Search query for Kijiji listings",
+description:
+"Search query for Kijiji listings. " +
+"Kijiji requires ALL words to appear in the listing title — keep queries short and use terms sellers actually write. " +
+"Avoid marketing/brand phrases sellers don't use (e.g. use 'macbook air m1' not 'macbook air m1 apple silicon'). " +
+"If the search returns no results, try a shorter or more common query.",
 },
 location: {
 type: "string",
@@ -52,11 +56,6 @@ export const tools = [
 type: "number",
 description: "Maximum price in cents",
 },
-cookies: {
-type: "string",
-description:
-"Optional: Kijiji session cookies to bypass bot detection (JSON array or 'name1=value1; name2=value2')",
-},
 unstableFilter: {
 type: "boolean",
 description:
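The new `query` description tells the model that Kijiji requires every query word to appear in the listing title. A toy model of that rule shows why the description's example fails: "apple silicon" adds words that sellers rarely put in titles. This is an illustration under an assumption, since Kijiji's actual tokenizer and matching rules are not documented here, and `matchesAllTerms` is hypothetical.

```typescript
// Toy model of title AND-matching; an assumption, not Kijiji's implementation.
function tokenize(text: string): string[] {
  // Split on anything that is not a lowercase letter or digit.
  return text.toLowerCase().split(/[^a-z0-9]+/).filter((word) => word.length > 0);
}

function matchesAllTerms(title: string, query: string): boolean {
  const titleWords = new Set(tokenize(title));
  return tokenize(query).every((term) => titleWords.has(term));
}
```

Under AND semantics each extra query word can only shrink the result set, which is why the description steers toward short queries built from words sellers actually use.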

@@ -15,18 +15,13 @@ describe("MCP protocol cookie inputs", () => {
 global.fetch = originalFetch;
 });
-test("search tools should not expose Facebook or eBay cookie inputs", () => {
-const searchFacebookTool = tools.find(
-(tool) => tool.name === "search_facebook",
-);
-const searchEbayTool = tools.find((tool) => tool.name === "search_ebay");
-expect(searchFacebookTool?.inputSchema.properties).not.toHaveProperty(
-"cookiesSource",
-);
-expect(searchEbayTool?.inputSchema.properties).not.toHaveProperty(
-"cookies",
-);
+test("search tools should not expose cookie inputs", () => {
+const toolNames = ["search_kijiji", "search_facebook", "search_ebay"];
+for (const toolName of toolNames) {
+const tool = tools.find((candidate) => candidate.name === toolName);
+expect(tool?.inputSchema.properties).not.toHaveProperty("cookies");
+expect(tool?.inputSchema.properties).not.toHaveProperty("cookiesSource");
+}
 });
 test("search_facebook should not forward cookies query parameters", async () => {
@@ -53,6 +48,31 @@ describe("MCP protocol cookie inputs", () => {
 expect(String(calledUrl)).toContain("/facebook?q=laptop");
 expect(String(calledUrl)).not.toContain("cookies=");
 });
+test("search_kijiji should not forward cookies query parameters", async () => {
+await handleMcpRequest(
+new Request("http://localhost", {
+method: "POST",
+body: JSON.stringify({
+jsonrpc: "2.0",
+id: 1,
+method: "tools/call",
+params: {
+name: "search_kijiji",
+arguments: {
+query: "laptop",
+cookies: "s=1",
+},
+},
+}),
+}),
+);
+const calledUrl = (global.fetch as unknown as ReturnType<typeof mock>).mock
+.calls[0]?.[0];
+expect(String(calledUrl)).toContain("/kijiji?q=laptop");
+expect(String(calledUrl)).not.toContain("cookies=");
+});
 });
 describe("MCP protocol unstableFilter", () => {
@@ -132,6 +152,33 @@ describe("MCP protocol unstableFilter", () => {
 expect(String(calledUrl)).toContain("unstableFilter=true");
 });
+test("tools/call returns API JSON as text content", async () => {
+global.fetch = mock(() =>
+Promise.resolve(
+new Response(JSON.stringify([{ title: "item" }]), { status: 200 }),
+),
+) as unknown as typeof fetch;
+const response = await handleMcpRequest(
+new Request("http://localhost", {
+method: "POST",
+body: JSON.stringify({
+jsonrpc: "2.0",
+id: 1,
+method: "tools/call",
+params: {
+name: "search_facebook",
+arguments: { query: "laptop" },
+},
+}),
+}),
+);
+const body = await response.json();
+expect(body.result.content[0].type).toBe("text");
+expect(JSON.parse(body.result.content[0].text)).toEqual([{ title: "item" }]);
+});
 test("handler should forward unstableFilter=true for search_ebay", async () => {
 await handleMcpRequest(
 new Request("http://localhost", {