chore: format markdown
Signed-off-by: Dmytro Stanchiev <git@dmytros.dev>
@@ -1,34 +1,49 @@
 # Facebook Comet Rewrite Implementation Plan

-> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
+> **For agentic workers:** REQUIRED SUB-SKILL: Use
+> superpowers:subagent-driven-development (recommended) or superpowers:executing-plans
+> to implement this plan task-by-task.
+> Steps use checkbox (`- [ ]`) syntax for tracking.

-**Goal:** Replace the legacy Facebook Marketplace scraper with a route-aware hybrid Comet-bootstrap parser for both search and item routes.
+**Goal:** Replace the legacy Facebook Marketplace scraper with a route-aware hybrid
+Comet-bootstrap parser for both search and item routes.

-**Architecture:** Keep authenticated direct HTTP fetches as the transport. Classify each Facebook response first, then parse route-specific Comet bootstrap/state candidates, and fall back to rendered-HTML extraction only when bootstrap decoding cannot produce the expected search or item shape.
+**Architecture:** Keep authenticated direct HTTP fetches as the transport.
+Classify each Facebook response first, then parse route-specific Comet bootstrap/state
+candidates, and fall back to rendered-HTML extraction only when bootstrap decoding
+cannot produce the expected search or item shape.

-**Tech Stack:** Bun, TypeScript, `bun:test`, `linkedom`, existing shared cookie/http helpers
+**Tech Stack:** Bun, TypeScript, `bun:test`, `linkedom`, existing shared cookie/http
+helpers

----
+* * *

 ## File Structure

 - Modify: `packages/core/src/scrapers/facebook.ts`
-- Owns Facebook fetch flow, response classification, bootstrap candidate extraction, search parsing, item parsing, and HTML fallbacks.
+- Owns Facebook fetch flow, response classification, bootstrap candidate extraction,
+search parsing, item parsing, and HTML fallbacks.
 - Modify: `packages/core/test/facebook-core.test.ts`
-- Owns unit coverage for response classification, bootstrap parsing, fallback parsing, and route-aware item/search extraction behavior.
+- Owns unit coverage for response classification, bootstrap parsing, fallback parsing,
+and route-aware item/search extraction behavior.
 - Modify: `packages/core/test/facebook-integration.test.ts`
-- Owns higher-level fetch flow tests, auth/degradation behavior, and result shaping for search/item entrypoints.
+- Owns higher-level fetch flow tests, auth/degradation behavior, and result shaping
+for search/item entrypoints.

 ### Task 1: Add Route Classification Coverage

 **Files:**
+
 - Modify: `packages/core/test/facebook-core.test.ts`
+
 - Modify: `packages/core/src/scrapers/facebook.ts`
+
 - Test: `packages/core/test/facebook-core.test.ts`

 - [ ] **Step 1: Write the failing tests**

-Add these tests near the Facebook parser tests in `packages/core/test/facebook-core.test.ts`:
+Add these tests near the Facebook parser tests in
+`packages/core/test/facebook-core.test.ts`:

 ```ts
 test("classifies Comet search responses", () => {
@@ -89,12 +104,14 @@ test("classifies unavailable item responses", () => {

 - [ ] **Step 2: Run test to verify it fails**

-Run: `bun test packages/core/test/facebook-core.test.ts --test-name-pattern "classifies"`
+Run:
+`bun test packages/core/test/facebook-core.test.ts --test-name-pattern "classifies"`
 Expected: FAIL because `classifyFacebookResponse` does not exist yet.

 - [ ] **Step 3: Write minimal implementation**

-Add this type and function near the parsing section in `packages/core/src/scrapers/facebook.ts`:
+Add this type and function near the parsing section in
+`packages/core/src/scrapers/facebook.ts`:

 ```ts
 type FacebookResponseKind = "search" | "item" | "auth_gated" | "unavailable" | "unknown";
@@ -128,7 +145,8 @@ export function classifyFacebookResponse(htmlString: HTMLString, responseUrl: st

 - [ ] **Step 4: Run test to verify it passes**

-Run: `bun test packages/core/test/facebook-core.test.ts --test-name-pattern "classifies"`
+Run:
+`bun test packages/core/test/facebook-core.test.ts --test-name-pattern "classifies"`
 Expected: PASS

 - [ ] **Step 5: Commit**
@@ -141,8 +159,11 @@ git commit -m "refactor: add facebook response classification"
 ### Task 2: Add Bootstrap Candidate Extraction

 **Files:**
+
 - Modify: `packages/core/test/facebook-core.test.ts`
+
 - Modify: `packages/core/src/scrapers/facebook.ts`
+
 - Test: `packages/core/test/facebook-core.test.ts`

 - [ ] **Step 1: Write the failing tests**
@@ -185,7 +206,8 @@ test("keeps candidate order stable for later scoring", () => {

 - [ ] **Step 2: Run test to verify it fails**

-Run: `bun test packages/core/test/facebook-core.test.ts --test-name-pattern "bootstrap candidates"`
+Run:
+`bun test packages/core/test/facebook-core.test.ts --test-name-pattern "bootstrap candidates"`
 Expected: FAIL because `extractFacebookBootstrapCandidates` does not exist.

 - [ ] **Step 3: Write minimal implementation**
@@ -218,7 +240,8 @@ export function extractFacebookBootstrapCandidates(htmlString: HTMLString): Reco

 - [ ] **Step 4: Run test to verify it passes**

-Run: `bun test packages/core/test/facebook-core.test.ts --test-name-pattern "bootstrap candidates"`
+Run:
+`bun test packages/core/test/facebook-core.test.ts --test-name-pattern "bootstrap candidates"`
 Expected: PASS

 - [ ] **Step 5: Commit**
@@ -231,10 +254,15 @@ git commit -m "refactor: add facebook bootstrap candidate extraction"
 ### Task 3: Replace Search Parsing With Candidate Scoring

 **Files:**
+
 - Modify: `packages/core/test/facebook-core.test.ts`
+
 - Modify: `packages/core/test/facebook-integration.test.ts`
+
 - Modify: `packages/core/src/scrapers/facebook.ts`
+
 - Test: `packages/core/test/facebook-core.test.ts`
+
 - Test: `packages/core/test/facebook-integration.test.ts`

 - [ ] **Step 1: Write the failing tests**
@@ -323,12 +351,15 @@ const mockSearchHtml = `

 - [ ] **Step 2: Run test to verify it fails**

-Run: `bun test packages/core/test/facebook-core.test.ts --test-name-pattern "Comet bootstrap candidates"`
-Expected: FAIL because the current search extractor only understands legacy `marketplace_search` shapes.
+Run:
+`bun test packages/core/test/facebook-core.test.ts --test-name-pattern "Comet bootstrap candidates"`
+Expected: FAIL because the current search extractor only understands legacy
+`marketplace_search` shapes.

 - [ ] **Step 3: Write minimal implementation**

-Replace the search extraction internals in `extractFacebookMarketplaceData()` with candidate scoring like this:
+Replace the search extraction internals in `extractFacebookMarketplaceData()` with
+candidate scoring like this:

 ```ts
 function findSearchEdges(candidate: unknown): FacebookEdge[] | null {
@@ -383,7 +414,8 @@ export function extractFacebookMarketplaceData(htmlString: HTMLString): Facebook

 - [ ] **Step 4: Run test to verify it passes**

-Run: `bun test packages/core/test/facebook-core.test.ts packages/core/test/facebook-integration.test.ts`
+Run:
+`bun test packages/core/test/facebook-core.test.ts packages/core/test/facebook-integration.test.ts`
 Expected: PASS for the rewritten search fixtures and existing unaffected tests.

 - [ ] **Step 5: Commit**
@@ -396,8 +428,11 @@ git commit -m "refactor: rewrite facebook search parser for comet bootstrap"
 ### Task 4: Replace Item Parsing With Candidate Scoring

 **Files:**
+
 - Modify: `packages/core/test/facebook-core.test.ts`
+
 - Modify: `packages/core/src/scrapers/facebook.ts`
+
 - Test: `packages/core/test/facebook-core.test.ts`

 - [ ] **Step 1: Write the failing tests**
@@ -438,7 +473,8 @@ test("extracts item details from Comet permalink bootstrap candidates", () => {

 - [ ] **Step 2: Run test to verify it fails**

-Run: `bun test packages/core/test/facebook-core.test.ts --test-name-pattern "Comet permalink bootstrap"`
+Run:
+`bun test packages/core/test/facebook-core.test.ts --test-name-pattern "Comet permalink bootstrap"`
 Expected: FAIL because the current item extractor depends on legacy permalink markers.

 - [ ] **Step 3: Write minimal implementation**
@@ -491,8 +527,8 @@ export function extractFacebookItemData(htmlString: HTMLString): FacebookMarketp

 - [ ] **Step 4: Run test to verify it passes**

-Run: `bun test packages/core/test/facebook-core.test.ts`
-Expected: PASS for current-shape item tests and remaining parser tests.
+Run: `bun test packages/core/test/facebook-core.test.ts` Expected: PASS for
+current-shape item tests and remaining parser tests.

 - [ ] **Step 5: Commit**

@@ -504,8 +540,11 @@ git commit -m "refactor: rewrite facebook item parser for comet bootstrap"
 ### Task 5: Add HTML Fallback Extraction

 **Files:**
+
 - Modify: `packages/core/test/facebook-core.test.ts`
+
 - Modify: `packages/core/src/scrapers/facebook.ts`
+
 - Test: `packages/core/test/facebook-core.test.ts`

 - [ ] **Step 1: Write the failing tests**
@@ -549,8 +588,10 @@ test("falls back to rendered item HTML when bootstrap payloads are undecodable",

 - [ ] **Step 2: Run test to verify it fails**

-Run: `bun test packages/core/test/facebook-core.test.ts --test-name-pattern "falls back"`
-Expected: FAIL because the extractor currently returns `null` without a structured candidate.
+Run:
+`bun test packages/core/test/facebook-core.test.ts --test-name-pattern "falls back"`
+Expected: FAIL because the extractor currently returns `null` without a structured
+candidate.

 - [ ] **Step 3: Write minimal implementation**

@@ -607,11 +648,13 @@ function extractItemFallback(htmlString: HTMLString): FacebookMarketplaceItem |
 }
 ```

-Then call these helpers as the last fallback inside `extractFacebookMarketplaceData()` and `extractFacebookItemData()`.
+Then call these helpers as the last fallback inside `extractFacebookMarketplaceData()`
+and `extractFacebookItemData()`.

 - [ ] **Step 4: Run test to verify it passes**

-Run: `bun test packages/core/test/facebook-core.test.ts --test-name-pattern "falls back"`
+Run:
+`bun test packages/core/test/facebook-core.test.ts --test-name-pattern "falls back"`
 Expected: PASS

 - [ ] **Step 5: Commit**
@@ -624,8 +667,11 @@ git commit -m "refactor: add facebook html fallbacks"
 ### Task 6: Wire Route-Aware Failures Into Entry Points

 **Files:**
+
 - Modify: `packages/core/test/facebook-integration.test.ts`
+
 - Modify: `packages/core/src/scrapers/facebook.ts`
+
 - Test: `packages/core/test/facebook-integration.test.ts`

 - [ ] **Step 1: Write the failing tests**
@@ -664,8 +710,10 @@ test("returns null for unavailable item responses", async () => {

 - [ ] **Step 2: Run test to verify it fails**

-Run: `bun test packages/core/test/facebook-integration.test.ts --test-name-pattern "auth-gated|unavailable"`
-Expected: FAIL because the entrypoints do not yet classify successful HTML responses by route/auth state.
+Run:
+`bun test packages/core/test/facebook-integration.test.ts --test-name-pattern "auth-gated|unavailable"`
+Expected: FAIL because the entrypoints do not yet classify successful HTML responses by
+route/auth state.

 - [ ] **Step 3: Write minimal implementation**

@@ -690,12 +738,13 @@ if (itemResponseClass.kind === "unavailable") {
 }
 ```

-Use the actual response URL from `fetchHtml` plumbing if that helper is extended to return both HTML and final URL; otherwise start by threading final URL support through the fetch helper in the same task.
+Use the actual response URL from `fetchHtml` plumbing if that helper is extended to
+return both HTML and final URL; otherwise start by threading final URL support through
+the fetch helper in the same task.

 - [ ] **Step 4: Run test to verify it passes**

-Run: `bun test packages/core/test/facebook-integration.test.ts`
-Expected: PASS
+Run: `bun test packages/core/test/facebook-integration.test.ts` Expected: PASS

 - [ ] **Step 5: Commit**

@@ -707,19 +756,22 @@ git commit -m "refactor: handle facebook route-aware failure states"
 ### Task 7: Run Full Verification And Live Probe

 **Files:**
+
 - Modify: `packages/core/src/scrapers/facebook.ts` if small cleanup is required
+
 - Modify: `packages/core/test/facebook-core.test.ts` if small cleanup is required
+
 - Modify: `packages/core/test/facebook-integration.test.ts` if small cleanup is required

 - [ ] **Step 1: Run focused Facebook tests**

-Run: `bun test packages/core/test/facebook-core.test.ts packages/core/test/facebook-integration.test.ts`
+Run:
+`bun test packages/core/test/facebook-core.test.ts packages/core/test/facebook-integration.test.ts`
 Expected: PASS

 - [ ] **Step 2: Run broader core tests**

-Run: `bun test packages/core/test`
-Expected: PASS
+Run: `bun test packages/core/test` Expected: PASS

 - [ ] **Step 3: Run live authenticated Facebook probe**

@@ -742,11 +794,14 @@ if (results[0]?.url) {
 Expected:

 - search returns at least one result
-- item fetch returns non-null for the first live result when the route is not stale/unavailable
+
+- item fetch returns non-null for the first live result when the route is not
+stale/unavailable

 - [ ] **Step 4: Make any minimal cleanup needed to keep tests and live probe green**

-If cleanup is needed, keep it limited to naming, dead-code removal caused by the rewrite, or small parser corrections directly exposed by the verification commands.
+If cleanup is needed, keep it limited to naming, dead-code removal caused by the
+rewrite, or small parser corrections directly exposed by the verification commands.

 - [ ] **Step 5: Re-run verification**

@@ -767,6 +822,11 @@ git commit -m "refactor: complete facebook comet scraper rewrite"

 ## Self-Review

-- Spec coverage: the plan covers classification, route-aware search parsing, route-aware item parsing, HTML fallbacks, explicit failure-state handling, test replacement, and live verification.
-- Placeholder scan: no `TODO`, `TBD`, or unspecified “handle appropriately” steps remain.
-- Type consistency: all planned functions and types use the same names across tasks: `classifyFacebookResponse`, `extractFacebookBootstrapCandidates`, `extractFacebookMarketplaceData`, and `extractFacebookItemData`.
+- Spec coverage: the plan covers classification, route-aware search parsing, route-aware
+item parsing, HTML fallbacks, explicit failure-state handling, test replacement, and
+live verification.
+- Placeholder scan: no `TODO`, `TBD`, or unspecified “handle appropriately” steps
+remain.
+- Type consistency: all planned functions and types use the same names across tasks:
+`classifyFacebookResponse`, `extractFacebookBootstrapCandidates`,
+`extractFacebookMarketplaceData`, and `extractFacebookItemData`.
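
For readers of this diff: the plan's Architecture paragraph describes classifying each response before parsing bootstrap candidates, but the function bodies fall outside the hunks shown. The following is only a rough TypeScript sketch of that ordering. The names `classifyResponseSketch` and `extractJsonScriptCandidates`, the URL patterns, the "no longer available" marker text, and the choice of `application/json` script tags are all assumptions made for illustration, not the heuristics the plan's `classifyFacebookResponse` or `extractFacebookBootstrapCandidates` actually use.

```typescript
// Hypothetical sketch: marker strings and URL patterns are assumptions,
// not taken from the plan's real implementation.
type ResponseKind = "search" | "item" | "auth_gated" | "unavailable" | "unknown";

interface ResponseClass {
  kind: ResponseKind;
}

function classifyResponseSketch(html: string, responseUrl: string): ResponseClass {
  let pathname: string;
  try {
    pathname = new URL(responseUrl).pathname;
  } catch {
    return { kind: "unknown" }; // unparsable URL: nothing to route on
  }
  // Assumed: landing on a login path means the session cookies were rejected.
  if (pathname.startsWith("/login")) return { kind: "auth_gated" };
  // Assumed placeholder text for a removed or expired listing.
  if (html.includes("This listing is no longer available")) return { kind: "unavailable" };
  if (/^\/marketplace\/item\/\d+/.test(pathname)) return { kind: "item" };
  if (/^\/marketplace\/(?:[^/]+\/)?search/.test(pathname)) return { kind: "search" };
  return { kind: "unknown" };
}

// Collects parsed JSON payloads from <script type="application/json"> tags,
// preserving document order so a later scoring pass sees a stable sequence.
function extractJsonScriptCandidates(html: string): unknown[] {
  const candidates: unknown[] = [];
  const scriptPattern =
    /<script[^>]*type="application\/json"[^>]*>([\s\S]*?)<\/script>/g;
  for (const match of html.matchAll(scriptPattern)) {
    try {
      candidates.push(JSON.parse(match[1] ?? ""));
    } catch {
      // Skip payloads that are not valid JSON; an HTML fallback can take over.
    }
  }
  return candidates;
}
```

Keeping classification separate from candidate extraction means an auth-gated or unavailable page can be rejected before any JSON is parsed, which matches the ordering the plan asks for.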