chore: format markdown
Signed-off-by: Dmytro Stanchiev <git@dmytros.dev>
@@ -1,12 +1,13 @@
# Design: Adopt opencode Monorepo Config

**Date:** 2025-07-14\
**Status:** Approved\
**Approach:** Full adoption (A)

## Context

Current repo (`marketplace-scrapers-monorepo`) has basic bun workspaces with 3 packages
(`core`, `api-server`, `mcp-server`). Reference: `anomalyco/opencode` monorepo patterns.

**Gaps vs opencode:**

- No Turbo (task orchestration, caching, dep graph)
@@ -20,7 +21,8 @@ Current repo (`marketplace-scrapers-monorepo`) has basic bun workspaces with 3 p
### 1. Root `package.json`

- Add `workspaces.catalog` block with shared deps:
  - `@typescript/native-preview`, `@types/bun`, `@types/unidecode`,
    `@types/cli-progress`
- Add `turbo` to `devDependencies`
- Add `@tsconfig/bun` to `devDependencies` + catalog
- Update root scripts: `typecheck` and `build` delegate to `turbo run`
@@ -93,7 +95,8 @@ exact = true
root = "./do-not-run-tests-from-root"
```

Exact installs = reproducible.
Root test guard prevents accidental root-level test runs.

### 6. Package `exports` field
@@ -102,7 +105,8 @@ Replace `main`/`module` with `exports` in all 3 packages:
"exports": { ".": "./src/index.ts" }
```

Remove `main` and `module` fields.
Bun resolves `.ts` directly.

### 7. Catalog references in per-package `package.json`
@@ -115,7 +119,7 @@ Replace pinned versions with `"catalog:"` for shared deps:
## Files Changed

| File | Action |
| --- | --- |
| `package.json` | Update (catalog, turbo dep, scripts) |
| `turbo.json` | Create |
| `tsconfig.json` | Create |
@@ -3,7 +3,9 @@
## Summary

Remove all file-based and request-provided cookie inputs across the repo.
The only supported authentication input becomes a raw `Cookie` header string supplied
through scraper-specific environment variables such as `FACEBOOK_COOKIE` and
`EBAY_COOKIE`.

## Goals
@@ -17,7 +19,8 @@ The only supported authentication input becomes a raw `Cookie` header string sup
- Changing scraper behavior unrelated to authentication input.
- Adding new cookie formats or migration helpers.
- Preserving backward compatibility for cookie files, JSON cookie arrays, or request
  overrides.

## Current State
@@ -27,27 +30,33 @@ The current shared cookie utilities support three sources in priority order:
2. Environment variable
3. Cookie file

`packages/core/src/utils/cookies.ts` includes file loading, JSON array parsing, and
auto-detection between JSON and header-string formats.
Facebook also exposes deprecated `cookiePath` arguments that still reach shared loading
logic. Docs in `cookies/AGENTS.md` still describe file-based setup and request-level
overrides.

## Chosen Approach

Use the hard-reset approach.
Delete the shared multi-source cookie-loading model and reduce the cookie surface to
env-header parsing only.
This is a larger diff than a surgical removal, but it avoids leaving behind abstractions
that imply unsupported inputs still exist.

## Design

### Shared Cookie Utilities

`packages/core/src/utils/cookies.ts` will keep only the pieces needed for
env-header-based auth:

- `Cookie` type
- A reduced cookie config shape containing only `name`, `domain`, and `envVar`
- `parseCookieString()` for raw `Cookie` header strings
- `formatCookiesForHeader()` for domain filtering and request formatting
- An env-only loader that reads `process.env[config.envVar]`, parses it, and throws a
targeted error when missing or invalid
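
A minimal TypeScript sketch of that reduced surface follows. Only `Cookie`, `parseCookieString()`, `formatCookiesForHeader()`, the `name`/`domain`/`envVar` config fields, and reading `config.envVar` from the environment come from this design. Several details are assumptions: the `loadCookiesFromEnv` name, the extra `domain` and `host` parameters, the error wording, and passing the env record in as an argument (so tests do not have to mutate `process.env`).

```typescript
export interface Cookie {
  name: string;
  value: string;
  domain: string;
}

// Reduced config: only name, domain, and envVar remain.
export interface CookieConfig {
  name: string; // scraper label used in error messages
  domain: string; // e.g. ".facebook.com"
  envVar: string; // e.g. "FACEBOOK_COOKIE"
}

// Parse a raw `Cookie` header string ("a=1; b=2") into cookie objects.
// The `domain` parameter is an assumption: header strings carry no domain,
// so the config's domain is attached to every parsed cookie.
export function parseCookieString(header: string, domain: string): Cookie[] {
  const cookies: Cookie[] = [];
  for (const part of header.split(";")) {
    const pair = part.trim();
    const eq = pair.indexOf("=");
    if (eq <= 0) continue; // skip empty fragments and fragments without a name
    // Split on the first "=" only, so values that contain "=" survive intact.
    cookies.push({ name: pair.slice(0, eq).trim(), value: pair.slice(eq + 1).trim(), domain });
  }
  return cookies;
}

// Keep cookies whose domain matches the request host, then join them for a Cookie header.
export function formatCookiesForHeader(cookies: Cookie[], host: string): string {
  return cookies
    .filter((c) => {
      const d = c.domain.replace(/^\./, "");
      return host === d || host.endsWith(`.${d}`);
    })
    .map((c) => `${c.name}=${c.value}`)
    .join("; ");
}

// Assumed name. Env-only loader: there is no file, JSON-array, or request fallback.
export function loadCookiesFromEnv(
  config: CookieConfig,
  env: Record<string, string | undefined>, // pass process.env in production code
): Cookie[] {
  const raw = env[config.envVar]?.trim();
  if (!raw) {
    throw new Error(`${config.name}: ${config.envVar} is not set; provide a raw Cookie header string`);
  }
  const cookies = parseCookieString(raw, config.domain);
  if (cookies.length === 0) {
    throw new Error(`${config.name}: ${config.envVar} is not a valid "name=value; ..." Cookie header`);
  }
  return cookies;
}
```

In application code the call would look like `loadCookiesFromEnv(config, process.env)`. Because the environment is injected, missing and invalid variables can be tested without touching global state.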

The following shared utilities will be removed:
@@ -68,15 +77,18 @@ For Facebook this means:
For eBay this means:

- Remove any remaining fallback/file-oriented behavior from shared calls and error
  strings
- Keep the existing env-var auth path, but make it the only path

### Public API Surface

Exports from `packages/core/src/index.ts` should reflect the new contract.
If exported functions currently advertise cookie-source or cookie-path arguments, their
signatures will be tightened so callers cannot pass unsupported inputs.

Downstream adapter packages should continue calling core through the simplified
signatures without adding their own cookie-loading behavior.

### Error Handling
@@ -93,8 +105,8 @@ Errors should be blunt and specific:
### Testing Strategy

Follow TDD. Start by changing or adding core tests so the old file/request behavior is
no longer accepted.

Coverage targets:
@@ -102,7 +114,8 @@ Coverage targets:
2. Missing env vars fail with the new env-only error.
3. Invalid env strings fail without falling back to files or request data.
4. Facebook APIs no longer expose or honor cookie-path/request-cookie behavior.
5. Existing tests that depended on missing files or JSON cookie arrays are rewritten to
   the env-only contract.

Verification target after implementation:
@@ -121,11 +134,15 @@ Update cookie-related docs to match the new contract:
## Risks

- External callers using request cookie overrides will break at compile time or runtime,
  depending on how they consume the package.
- Recent work added support for custom Facebook cookie paths, so removing that path
  intentionally reverses a newly introduced behavior.
- Tests that currently model missing-file behavior must be rewritten rather than
  preserved.

## Rollout Notes

This is an intentional contract break.
The code, tests, and docs should all land together so there is no mixed messaging about
supported cookie sources.
@@ -2,35 +2,46 @@
## Summary

Replace the legacy Facebook Marketplace scraper with a route-aware implementation built
around current Comet bootstrap markers and route-specific extraction.
The new scraper will keep authenticated direct HTTP fetches as the primary transport,
but it will stop treating legacy `require`, `__bbox`, and
`marketplace_product_details_page` structures as the main parsing contract.

## Goals

- Replace both Facebook search and item-detail extraction with a current-shape parser.
- Keep authenticated direct HTTP requests as the primary fetch strategy.
- Parse route-specific Comet bootstrap/state payloads before falling back to
  rendered-HTML extraction.
- Detect auth-gated, unavailable, and unknown responses explicitly.
- Update tests so they model current route markers and failure modes instead of legacy
  page objects.

## Non-Goals

- Reworking non-Facebook scrapers.
- Converting the scraper to browser-only automation.
- Preserving old parser behavior for `marketplace_product_details_page` or
  `__bbox`-driven item extraction.
- Reverse-engineering every internal Facebook bootstrap payload shape exhaustively
  before implementation.

## Current State

The current implementation in `packages/core/src/scrapers/facebook.ts` still uses
authenticated HTTP requests, which remains correct.
The search path parses embedded script JSON and looks for
`marketplace_search.feed_units.edges`. The item-detail path is centered on legacy
extraction paths such as:

- `parsed.require[0][3].__bbox.result.data.viewer.marketplace_product_details_page.target`
- nested `__bbox.require[...]` variations
- recursive search through `parsed.require`

Live evidence gathered earlier in this session and by the isolated research subagent
shows that current Facebook Marketplace pages are Comet route-driven and expose markers
such as:

- `XCometMarketplaceSearchController`
- `XCometMarketplacePermalinkController`
@@ -41,7 +52,9 @@ Live evidence gathered earlier in this session and by the isolated research suba
- `data-sjs`
- `data-btmanifest`

The same live investigation also showed that authenticated item pages no longer expose
the old `marketplace_product_details_page` marker reliably, while live search still
returns usable results.

## Chosen Approach
@@ -52,9 +65,11 @@ The scraper will:
1. Fetch authenticated HTML directly.
2. Classify the response using current route and auth markers.
3. Parse inline bootstrap/state payloads using route-specific probes.
4. Fall back to rendered-HTML extraction only when bootstrap markers are present but the
payload cannot be decoded into the expected search or item shape.
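
The four steps can be sketched as a classifier plus a decode-then-fallback driver. The two controller names come from the live markers listed above. The login and unavailable regexes, the `ResponseClass` labels, and the `classifyResponse`/`extractRoute` names are placeholders. They are not confirmed Facebook markers or the module's real API.

```typescript
// Response classes the scraper distinguishes before any parsing happens.
export type ResponseClass = "search" | "item" | "login" | "unavailable" | "unknown";

const SEARCH_MARKER = "XCometMarketplaceSearchController";
const ITEM_MARKER = "XCometMarketplacePermalinkController";
// Placeholder probes (assumed); replace with markers observed in live responses.
const LOGIN_PROBES: RegExp[] = [/<form[^>]+id="login_form"/i];
const UNAVAILABLE_PROBES: RegExp[] = [/this listing is no longer available/i];

// Step 2: classify using route and auth markers. The gates run first so
// login-gated or unavailable pages never reach the parsers.
export function classifyResponse(html: string): ResponseClass {
  if (LOGIN_PROBES.some((re) => re.test(html))) return "login";
  if (UNAVAILABLE_PROBES.some((re) => re.test(html))) return "unavailable";
  if (html.includes(ITEM_MARKER)) return "item";
  if (html.includes(SEARCH_MARKER)) return "search";
  return "unknown";
}

// Steps 3 and 4: structured bootstrap decoding first, then the rendered-HTML
// fallback, which is only reached after the route markers have been confirmed.
export function extractRoute<T>(
  html: string,
  expected: "search" | "item",
  decodeBootstrap: (html: string) => T | null,
  fallbackFromHtml: (html: string) => T | null,
): T {
  const cls = classifyResponse(html);
  if (cls !== expected) {
    throw new Error(`Facebook response classified as "${cls}", expected "${expected}"`);
  }
  const decoded = decodeBootstrap(html);
  if (decoded !== null) return decoded;
  const fallback = fallbackFromHtml(html);
  if (fallback !== null) return fallback;
  throw new Error(`Facebook ${expected} route detected, but no decodable data found`);
}
```

Keeping classification separate from decoding lets each failure class produce its own error message, so a parse miss is not reported as an expired cookie.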

This keeps the cheaper direct-HTTP transport while shifting the parser contract from
legacy page-object names to current Comet route structure.

## Design
@@ -88,7 +103,8 @@ Primary behavior:
- fetch the Marketplace search HTML with auth cookies
- confirm the response class is `search`
- extract inline bootstrap/state blobs from script tags and page attributes
- probe for route-specific search payloads associated with
  `XCometMarketplaceSearchController`
- map decoded search results into summary listing records
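
The blob-extraction step could look like the minimal sketch below. It assumes the bootstrap blobs sit in `<script type="application/json" ...>` tags, which Facebook often marks with `data-sjs` (one of the markers listed above). It does not cover payloads carried in page attributes, and the `extractJsonBlobs` name is hypothetical.

```typescript
// Returns every inline JSON blob that parses. Malformed blobs are skipped so
// one bad script cannot block candidate probing or the HTML fallback.
export function extractJsonBlobs(html: string): unknown[] {
  // Regex literal inside the function: a fresh object per call, so no shared lastIndex.
  // Matches <script ... type="application/json" ...>BODY</script>; other attributes are free.
  const re = /<script\b[^>]*type="application\/json"[^>]*>([\s\S]*?)<\/script>/gi;
  const blobs: unknown[] = [];
  let match: RegExpExecArray | null;
  while ((match = re.exec(html)) !== null) {
    const body = (match[1] ?? "").trim();
    if (body.length === 0) continue;
    try {
      blobs.push(JSON.parse(body));
    } catch (e) {
      // Not valid JSON: ignore this script and keep scanning.
    }
  }
  return blobs;
}
```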

Search summary fields should remain aligned with the current public output shape:
@@ -102,7 +118,8 @@ Search summary fields should remain aligned with the current public output shape
Fallback behavior:

- if search route markers are present but structured payload decoding fails, extract
  listing summaries from rendered HTML anchors and text patterns
- use item links matching `/marketplace/item/<id>` as the anchor for fallback extraction
- treat fallback results as summary-only data, not rich detail data
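
Below is a sketch of the link-anchored search fallback. It assumes item links appear as `href` attributes with either a relative `/marketplace/item/<id>` path or an absolute `https://www.facebook.com` prefix. The `extractItemLinks` function and the `SearchLinkSummary` shape are hypothetical. Extracting title and price text around each anchor is left out.

```typescript
// Summary-only record built from a rendered-HTML anchor (hypothetical shape).
export interface SearchLinkSummary {
  id: string;
  url: string;
}

// Collect unique Marketplace item IDs from anchors, preserving page order.
export function extractItemLinks(html: string): SearchLinkSummary[] {
  const re = /href="(?:https:\/\/www\.facebook\.com)?\/marketplace\/item\/(\d+)[^"]*"/g;
  const seen = new Set<string>();
  const out: SearchLinkSummary[] = [];
  let m: RegExpExecArray | null;
  while ((m = re.exec(html)) !== null) {
    const id = m[1] ?? "";
    if (seen.has(id)) continue; // result cards often link the same item more than once
    seen.add(id);
    out.push({ id, url: `https://www.facebook.com/marketplace/item/${id}/` });
  }
  return out;
}
```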
@@ -132,9 +149,12 @@ Priority item fields:
Fallback behavior:

- if permalink route markers are present but no stable payload object is decodable,
  extract data from rendered HTML text structure
- prioritize title, price, condition, description, location text, and seller module
  content
- return partial item data when core user-facing fields are present rather than failing
solely because deeper commerce metadata is missing
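
A small gate can make the partial-data rule concrete. This sketch assumes title and price text are the core user-facing fields, since the design leaves the exact set open. `ItemFallbackFields` and `finalizeFallbackItem` are hypothetical names.

```typescript
// Fields the rendered-HTML fallback tries to recover (hypothetical shape).
export interface ItemFallbackFields {
  title?: string;
  priceText?: string;
  condition?: string;
  description?: string;
  location?: string;
  seller?: string;
}

const FIELD_KEYS: (keyof ItemFallbackFields)[] = [
  "title", "priceText", "condition", "description", "location", "seller",
];
// Assumption: these two fields decide whether partial data is worth returning.
const CORE_FIELDS: (keyof ItemFallbackFields)[] = ["title", "priceText"];

// Normalize whitespace and drop empty fields. Return partial data when the core
// fields exist; missing deeper metadata (seller, condition, ...) is not fatal.
export function finalizeFallbackItem(raw: ItemFallbackFields): ItemFallbackFields | null {
  const cleaned: ItemFallbackFields = {};
  for (const key of FIELD_KEYS) {
    const value = raw[key]?.replace(/\s+/g, " ").trim();
    if (value) cleaned[key] = value;
  }
  return CORE_FIELDS.every((key) => cleaned[key] !== undefined) ? cleaned : null;
}
```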

### Bootstrap Parsing Strategy
@@ -151,11 +171,14 @@ Candidate discovery inputs:
- `ServerJS` / `Bootloader` inline blobs
- route controller names

Candidate scoring for search should favor objects that contain repeated result-card
semantics, item IDs, listing links, titles, prices, or location summaries.
Candidate scoring for item pages should favor objects that contain singular listing
semantics, title, price, condition, description, location, seller, or permalink context.

The parser should not depend on one hard-coded object name surviving forever.
Instead, it should look for route-specific semantic clusters and choose the strongest
candidate.
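
One way to express that ranking is a tree walk that scores every object or array in the decoded blobs and keeps the best. The following are assumptions, not part of the design:

- the key-name hints, which stand in for the semantics listed above;
- the scoring rules (search: at least two card-like array elements; item: distinct hint keys on one object);
- the `bestCandidate` name.

Real weights should come from current-shape fixtures.

```typescript
// Assumed key-name hints, matched as lowercase suffixes (e.g. "listing_price" ends with "price").
const SEARCH_HINTS = ["id", "title", "price", "location"];
const ITEM_HINTS = ["title", "price", "condition", "description", "location", "seller"];

function isRecord(v: unknown): v is Record<string, unknown> {
  return v !== null && typeof v === "object" && !Array.isArray(v);
}

// Number of distinct hints matched by at least one key of `obj`.
function keyHits(obj: Record<string, unknown>, hints: string[]): number {
  const keys = Object.keys(obj).map((k) => k.toLowerCase());
  return hints.filter((h) => keys.some((k) => k.endsWith(h))).length;
}

// Search: reward arrays of repeated result-card-like objects.
function scoreSearch(node: object): number {
  if (!Array.isArray(node)) return 0;
  const cards = node.filter((el) => isRecord(el) && keyHits(el, SEARCH_HINTS) >= 2);
  return cards.length >= 2 ? cards.length : 0; // one card is not "repeated"
}

// Item: reward a single object carrying many listing fields.
function scoreItem(node: object): number {
  return isRecord(node) ? keyHits(node, ITEM_HINTS) : 0;
}

export interface Candidate {
  node: unknown;
  score: number;
}

// Depth-first walk over all parsed blobs; returns the highest-scoring node, or null.
export function bestCandidate(blobs: unknown[], mode: "search" | "item"): Candidate | null {
  let best: Candidate | null = null;
  const stack: unknown[] = blobs.slice();
  while (stack.length > 0) {
    const node = stack.pop();
    if (node === null || typeof node !== "object") continue;
    const score = mode === "search" ? scoreSearch(node) : scoreItem(node);
    if (score > 0 && (best === null || score > best.score)) best = { node, score };
    const children = Array.isArray(node)
      ? node
      : Object.keys(node).map((k) => (node as Record<string, unknown>)[k]);
    for (const child of children) stack.push(child);
  }
  return best;
}
```

Scoring by semantic clusters rather than fixed paths means a renamed wrapper object does not break extraction, as long as the listing fields themselves keep recognizable names.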

### Legacy Removal
@@ -166,7 +189,9 @@ Specifically:
- delete legacy-first `require` / `__bbox` navigation tables
- delete tests whose only purpose is to preserve those legacy paths

If a minimal legacy compatibility branch remains, it must be a last-resort fallback
behind the new route-aware parser and should not shape test fixtures or design
decisions.

### Error Handling
@@ -178,7 +203,8 @@ Facebook responses should now fail with explicit route-aware outcomes:
4. Search or item route detected, but no decodable data found.
5. Unknown response shape.

Error messages should name the actual class of failure instead of implying that every
parse miss is caused by expired cookies.

### Testing Strategy
@@ -190,11 +216,15 @@ Coverage targets:
1. Search responses classify correctly from current Comet controller markers.
2. Item responses classify correctly from current Comet controller markers.
3. Login-gated and unavailable responses are detected before parsing.
4. Search bootstrap parsing produces summary listing results from current-shape
   fixtures.
5. Item bootstrap parsing produces rich listing details from current-shape fixtures.
6. Search fallback extraction works when route markers exist but structured payload
   decoding fails.
7. Item fallback extraction works when route markers exist but structured payload
   decoding fails.
8. Old legacy-only item fixtures are removed or rewritten so they no longer define the
   contract.

Verification target after implementation:
@@ -204,23 +234,30 @@ Verification target after implementation:
## Public API Surface

Keep the current public function names unless the rewrite proves that a signature change
is required:

- `fetchFacebookItems(...)`
- `fetchFacebookItem(...)`
- `extractFacebookMarketplaceData(...)`
- `extractFacebookItemData(...)`

The internals should change substantially, but callers should not need a new integration
surface for this rewrite.

## Risks

- Facebook may change bootstrap payload naming again, so route/controller markers are
  more stable than exact nested object paths but still not guaranteed.
- Search and item pages may each contain multiple partial payloads, making candidate
  ranking important.
- Fallback rendered-HTML extraction may be noisier than bootstrap decoding and needs
  clear precedence rules.
- Live fixtures can drift from production quickly, so tests must model route semantics
  rather than exact one-off payloads where possible.

## Rollout Notes

The code, fixtures, and tests should change together.
There should be no mixed state where the implementation is Comet-aware but the tests
still encode `marketplace_product_details_page` as the primary contract.
@@ -2,15 +2,18 @@
## Summary

Add an optional shared result mode across Facebook, eBay, and Kijiji that moves
suspiciously cheap listings out of the main results into a separate `unstableResults`
bucket. Listings are considered unstable when their price is more than 20% below the
median price of the scraper's priced search results.

## Goals

- Support the same optional unstable-listing mode across all scrapers.
- Keep current default scraper and route behavior unchanged unless the mode is enabled.
- Hide unstable listings from the main results while still returning them separately.
- Implement the rule once in shared core code instead of duplicating
  marketplace-specific logic.
- Document the option in MCP tool descriptions so callers can discover it.

## Non-Goals
@@ -24,7 +27,8 @@ Listings are considered unstable when their price is more than 20% below the med
`packages/core` currently returns plain arrays from scraper search functions.
`packages/api-server` forwards those scraper results directly from marketplace routes.
`packages/mcp-server` documents search tools per marketplace, but does not expose or
describe any result-stability mode.

There is no shared result-classification utility today.
Price filtering exists in some scrapers, but not a cross-marketplace median-based split.
@@ -33,11 +37,14 @@ Price filtering exists in some scrapers, but not a cross-marketplace median-base
Use a shared core utility plus per-route and per-tool opt-in.

The shared utility will accept parsed listings, compute the median from valid positive
prices, and split the data into `results` and `unstableResults`. Each scraper will opt
into that utility when the caller enables unstable-listing mode.
API routes and MCP tools will expose the same optional mode so the feature is
consistently available everywhere scraper search is surfaced.

This keeps the heuristic centralized, minimizes duplicated logic, and preserves existing
consumers by leaving the default path unchanged.

## Design
@@ -48,14 +55,16 @@ Add a shared utility in `packages/core` for listing stability classification.
Responsibilities:

- accept parsed listing arrays with `listingPrice.cents`
- ignore listings whose price is missing, non-numeric, or non-positive when computing
  the median
- compute the median price from valid priced listings
- classify listings as unstable when `listingPrice.cents < median * 0.8`
- return an object with:
  - `results`: listings that remain in the main bucket
  - `unstableResults`: listings moved out of the main bucket

Listings excluded from median computation because their price is missing or non-positive
remain in `results` unchanged.
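
Below is a minimal TypeScript sketch of that utility, assuming listings expose an optional `listingPrice.cents`. The names `classifyListingStability`, `median`, `PricedListing`, and `StabilitySplit` are hypothetical. The valid-price rule, the strict `< median * 0.8` cutoff, and the `results`/`unstableResults` keys come from this design.

```typescript
// Minimal structural requirement; each scraper's concrete listing type satisfies it.
export interface PricedListing {
  listingPrice?: { cents?: number | null } | null;
}

export interface StabilitySplit<T> {
  results: T[];
  unstableResults: T[];
}

// A price counts toward the median only if it is a finite, positive number.
function validCents(listing: PricedListing): number | null {
  const cents = listing.listingPrice?.cents;
  return typeof cents === "number" && Number.isFinite(cents) && cents > 0 ? cents : null;
}

// Median of a non-empty list: middle value (odd) or mean of the two middle values (even).
export function median(values: number[]): number {
  const sorted = values.slice().sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

export function classifyListingStability<T extends PricedListing>(listings: T[]): StabilitySplit<T> {
  const prices: number[] = [];
  for (const listing of listings) {
    const cents = validCents(listing);
    if (cents !== null) prices.push(cents);
  }
  // No valid prices: nothing to compare against, so every listing stays in `results`.
  if (prices.length === 0) return { results: listings.slice(), unstableResults: [] };

  const cutoff = median(prices) * 0.8;
  const results: T[] = [];
  const unstableResults: T[] = [];
  for (const listing of listings) {
    const cents = validCents(listing);
    // Strict `<`: a price exactly at the cutoff stays stable; unpriced listings stay put.
    if (cents !== null && cents < cutoff) unstableResults.push(listing);
    else results.push(listing);
  }
  return { results, unstableResults };
}
```

For example, valid prices of 100, 160, 200, and 300 cents give a median of 180 and a cutoff of 144, so only the 100-cent listing moves. A lone priced listing is its own median, so it can never fall below 80% of it, which matches the single-listing rule under Error Handling.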

### Scraper Integration
@@ -68,7 +77,8 @@ Default behavior:
Opt-in behavior:

- run the shared classification utility after parsing search results
- classify before final result limiting so unstable items do not consume main-result
  slots
- return an object shaped like:

```ts
@@ -82,7 +92,8 @@ Each scraper will use its existing concrete listing subtype for these arrays.
### API Surface

Marketplace API routes will expose an optional query parameter for unstable-listing
mode.

Requirements:
@@ -90,7 +101,8 @@ Requirements:
- when enabled, return the object payload with `results` and `unstableResults`
- use the same semantics across Facebook, eBay, and Kijiji routes

The exact parameter name should be consistent across routes and intentionally describe
the behavior, for example `unstableFilter=true`.
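
Below is a sketch of the route-side wiring. It assumes the query arrives as a plain string record, and it uses the example name `unstableFilter`. It also shows the ordering the design requires on the scraper side: classify first, then apply the result limit. Three details are assumptions: the accepted truthy spellings, the `buildSearchResponse` helper, and leaving `unstableResults` unlimited.

```typescript
// Plain array by default; object payload only when the mode is enabled.
export type SearchResponse<T> = T[] | { results: T[]; unstableResults: T[] };

// Assumed parsing rule: only "true" or "1" (case-insensitive) enable the mode.
export function isUnstableFilterEnabled(query: Record<string, string | undefined>): boolean {
  const raw = query["unstableFilter"]?.toLowerCase();
  return raw === "true" || raw === "1";
}

export function buildSearchResponse<T>(
  listings: T[],
  maxItems: number,
  unstableFilter: boolean,
  classify: (items: T[]) => { results: T[]; unstableResults: T[] },
): SearchResponse<T> {
  // Default path stays a limited plain array (assumed to match current behavior).
  if (!unstableFilter) return listings.slice(0, maxItems);
  // Opt-in path: classify before limiting so unstable items cannot take main-result slots.
  const { results, unstableResults } = classify(listings);
  return { results: results.slice(0, maxItems), unstableResults };
}
```

Callers can tell the two shapes apart with `Array.isArray(response)`. That check narrows the union type, which softens the downstream TypeScript ripple noted under Risks.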

### MCP Surface
@@ -100,34 +112,43 @@ Tool descriptions should explicitly document:
- that the mode is opt-in and disabled by default
- that it moves listings priced more than 20% below the median into `unstableResults`
- that enabling it changes the response shape from a plain list to an object with
  `results` and `unstableResults`
- that the behavior is available for Facebook, eBay, and Kijiji search tools

The wording should be aligned across all three tools so the feature reads as one shared
capability.
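
One way to keep the three tool descriptions aligned is to write the option text once and append it everywhere. The constant, the `searchToolDescription` helper, the placeholder base descriptions, and reusing `unstableFilter` as the MCP parameter name are all assumptions. The option sentence itself mirrors the four points listed above.

```typescript
export type Marketplace = "Facebook" | "eBay" | "Kijiji";

// Shared wording (assumed text) covering every point the design requires.
export const UNSTABLE_FILTER_DESCRIPTION =
  "Optional, disabled by default. When true, listings priced more than 20% below the " +
  "median of the priced search results are moved into `unstableResults`, and the " +
  "response changes from a plain list to an object with `results` and " +
  "`unstableResults`. Available for the Facebook, eBay, and Kijiji search tools.";

// Placeholder base descriptions; the real tool texts live in packages/mcp-server.
const BASE_DESCRIPTIONS: Record<Marketplace, string> = {
  Facebook: "Search Facebook Marketplace listings.",
  eBay: "Search eBay listings.",
  Kijiji: "Search Kijiji listings.",
};

// Hypothetical helper: every search tool appends the identical option text.
export function searchToolDescription(marketplace: Marketplace): string {
  return `${BASE_DESCRIPTIONS[marketplace]} Option \`unstableFilter\`: ${UNSTABLE_FILTER_DESCRIPTION}`;
}
```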

### Error Handling

The unstable-listing mode should be best-effort and non-failing.

- If there are no valid positive prices, return all listings in `results` and an empty
  `unstableResults` array.
- If there is only one valid priced listing, do not classify it as unstable.
- Parsing failures remain governed by existing scraper behavior; the classification
  layer should not introduce new scraper-specific errors.

### Testing Strategy

Follow TDD. Start with shared utility tests, then wire the option through scraper and
route tests.

Coverage targets:

1. Median calculation for odd-sized valid price sets.
2. Median calculation for even-sized valid price sets.
3. Strict cutoff behavior where only listings with `price < median * 0.8` move to
   `unstableResults`.
4. Missing, invalid, zero, or negative prices are excluded from median computation and
   remain in `results`.
5. Default scraper behavior still returns plain arrays when the option is disabled.
6. Enabled scraper behavior returns `{ results, unstableResults }` for Facebook, eBay,
   and Kijiji.
7. API routes preserve existing response shapes by default and switch to the object
   payload only when enabled.
8. MCP tool metadata documents the new optional mode for all three marketplace search
   tools.

Verification target after implementation:
@@ -138,11 +159,15 @@ Verification target after implementation:
## Risks

- The optional mode introduces a union return shape for scraper callers, which can
  ripple into downstream TypeScript signatures.
- Applying classification before final limiting changes which items appear in the main
  bucket compared with a naive post-limit split.
- Kijiji and eBay may have different mixes of priced and unpriced results, so excluding
  non-positive prices from the median must remain explicit and tested.

## Rollout Notes

Land the shared classifier, scraper wiring, route wiring, tests, and MCP description
updates together. That avoids a partial rollout where the feature exists in one surface
but is undocumented or inconsistent elsewhere.
@@ -2,25 +2,32 @@
## Summary

Add explicit live endpoint tests for each core scraper parser path.
These tests are excluded from normal deterministic test commands and run only through a
dedicated package script.

## Scope

- Add one live suite per parser: eBay, Kijiji, Facebook.
- Place suites under `packages/core/test/live/` so normal
  `bun test packages/core/test/*.test.ts` patterns do not include them accidentally.
- Add a root `test:live` script that runs all live suites together.
- Keep existing mocked tests unchanged.

## Behavior

- Each suite calls the public scraper entry point for that marketplace with a narrow
  query and low max item count.
- Assertions verify scrape output shape and parser viability, not exact listing
  identity.
- eBay and Kijiji require live network access and fail on endpoint/parser breakage.
- Facebook is strict: missing or expired `FACEBOOK_COOKIE` fails the live suite instead
of skipping.
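
All three suites could share their shape-only assertions through a helper like the one below. The helper name and the checked fields (a non-empty `url` string and an optional numeric `listingPrice.cents`) are assumptions to adapt to each scraper's listing type. Each `bun test` suite would call it with the result of that marketplace's public entry point, such as `fetchFacebookItems(...)`.

```typescript
// Minimal fields the live suites inspect (assumed; adapt per scraper).
interface LiveListing {
  url?: unknown;
  listingPrice?: { cents?: unknown } | null;
}

// Shape and viability checks only: no titles, IDs, exact prices, or ordering.
export function assertLiveShape(marketplace: string, items: unknown, maxItems: number): void {
  if (!Array.isArray(items)) throw new Error(`${marketplace}: expected an array of listings`);
  // Queries are broad, so an empty result points at endpoint or parser breakage.
  if (items.length === 0) throw new Error(`${marketplace}: live scrape returned no listings`);
  if (items.length > maxItems) throw new Error(`${marketplace}: more than ${maxItems} listings returned`);
  for (const raw of items) {
    if (raw === null || typeof raw !== "object") throw new Error(`${marketplace}: listing is not an object`);
    const item = raw as LiveListing;
    if (typeof item.url !== "string" || item.url.length === 0) {
      throw new Error(`${marketplace}: listing without a URL`);
    }
    const cents = item.listingPrice?.cents;
    if (cents !== undefined && cents !== null && typeof cents !== "number") {
      throw new Error(`${marketplace}: non-numeric listingPrice.cents`);
    }
  }
}
```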

## Test Data

- Use stable broad Canadian queries such as `iphone` or `laptop` to reduce empty-result
  risk.
- Use low limits to avoid unnecessary load and rate-limit pressure.
- Avoid exact prices, titles, listing IDs, or ordering assumptions.