chore: format markdown

Signed-off-by: Dmytro Stanchiev <git@dmytros.dev>
2026-05-01 11:42:54 -04:00
parent d2c3c07e7d
commit 7ab33d0b02
15 changed files with 925 additions and 417 deletions


@@ -1,34 +1,49 @@
# Facebook Comet Rewrite Implementation Plan
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
> **For agentic workers:** REQUIRED SUB-SKILL: Use
> superpowers:subagent-driven-development (recommended) or superpowers:executing-plans
> to implement this plan task-by-task.
> Steps use checkbox (`- [ ]`) syntax for tracking.
**Goal:** Replace the legacy Facebook Marketplace scraper with a route-aware hybrid Comet-bootstrap parser for both search and item routes.
**Goal:** Replace the legacy Facebook Marketplace scraper with a route-aware hybrid
Comet-bootstrap parser for both search and item routes.
**Architecture:** Keep authenticated direct HTTP fetches as the transport. Classify each Facebook response first, then parse route-specific Comet bootstrap/state candidates, and fall back to rendered-HTML extraction only when bootstrap decoding cannot produce the expected search or item shape.
**Architecture:** Keep authenticated direct HTTP fetches as the transport.
Classify each Facebook response first, then parse route-specific Comet bootstrap/state
candidates, and fall back to rendered-HTML extraction only when bootstrap decoding
cannot produce the expected search or item shape.
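As a rough illustration of the classify, decode, then fall back order described in the architecture note, here is a minimal sketch. Everything in it is an assumption made for illustration: the helper names, the URL patterns, the `listings` payload key, and the regex-based fallback are hypothetical and are not the plan's actual implementation (the plan names `linkedom` for HTML parsing, and its real classifier also takes the HTML string).

```ts
// Illustrative sketch only: names, URL patterns, and payload shapes are assumptions.
type ResponseKind = "search" | "item" | "auth_gated" | "unknown";

interface Listing {
  id: string;
  title: string;
}

// Step 1: classify the response from its final URL (assumed URL patterns).
function classifyResponse(responseUrl: string): ResponseKind {
  const path = new URL(responseUrl).pathname;
  if (path.startsWith("/login")) return "auth_gated";
  if (path.includes("/marketplace/item/")) return "item";
  if (path.includes("/marketplace/") && path.includes("/search")) return "search";
  return "unknown";
}

// Step 2: collect every JSON script payload that decodes, in document order.
function extractCandidates(html: string): unknown[] {
  const pattern = /<script[^>]*type="application\/json"[^>]*>([\s\S]*?)<\/script>/g;
  const candidates: unknown[] = [];
  for (const match of html.matchAll(pattern)) {
    try {
      candidates.push(JSON.parse(match[1] ?? ""));
    } catch {
      // Undecodable payloads are skipped so later steps can still fall back.
    }
  }
  return candidates;
}

// Step 3: accept a candidate only if it has the expected (hypothetical) shape.
function listingsFromCandidate(candidate: unknown): Listing[] | null {
  if (typeof candidate !== "object" || candidate === null) return null;
  const listings = (candidate as { listings?: unknown }).listings;
  if (!Array.isArray(listings)) return null;
  const valid = listings.filter(
    (entry): entry is Listing =>
      typeof entry === "object" &&
      entry !== null &&
      typeof (entry as Listing).id === "string" &&
      typeof (entry as Listing).title === "string",
  );
  return valid.length > 0 ? valid : null;
}

// Step 4: last-resort extraction from rendered anchors (regex stands in for a DOM parser).
function listingsFromHtml(html: string): Listing[] {
  const pattern = /<a[^>]*href="\/marketplace\/item\/(\d+)\/?[^"]*"[^>]*>([^<]+)<\/a>/g;
  return Array.from(html.matchAll(pattern), (m) => ({
    id: m[1] ?? "",
    title: (m[2] ?? "").trim(),
  }));
}

// Orchestration: classify first, prefer bootstrap data, fall back to HTML only when needed.
function parseSearch(html: string, responseUrl: string): Listing[] | null {
  if (classifyResponse(responseUrl) !== "search") return null;
  for (const candidate of extractCandidates(html)) {
    const listings = listingsFromCandidate(candidate);
    if (listings) return listings;
  }
  const fallback = listingsFromHtml(html);
  return fallback.length > 0 ? fallback : null;
}
```

Classifying before parsing means an auth-gated or off-route page is rejected early instead of being mistaken for an empty result set, and the HTML fallback runs only when no decoded candidate has the expected shape.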
**Tech Stack:** Bun, TypeScript, `bun:test`, `linkedom`, existing shared cookie/http helpers
**Tech Stack:** Bun, TypeScript, `bun:test`, `linkedom`, existing shared cookie/http
helpers
---
* * *
## File Structure
- Modify: `packages/core/src/scrapers/facebook.ts`
- Owns Facebook fetch flow, response classification, bootstrap candidate extraction, search parsing, item parsing, and HTML fallbacks.
- Owns Facebook fetch flow, response classification, bootstrap candidate extraction,
search parsing, item parsing, and HTML fallbacks.
- Modify: `packages/core/test/facebook-core.test.ts`
- Owns unit coverage for response classification, bootstrap parsing, fallback parsing, and route-aware item/search extraction behavior.
- Owns unit coverage for response classification, bootstrap parsing, fallback parsing,
and route-aware item/search extraction behavior.
- Modify: `packages/core/test/facebook-integration.test.ts`
- Owns higher-level fetch flow tests, auth/degradation behavior, and result shaping for search/item entrypoints.
- Owns higher-level fetch flow tests, auth/degradation behavior, and result shaping
for search/item entrypoints.
### Task 1: Add Route Classification Coverage
**Files:**
- Modify: `packages/core/test/facebook-core.test.ts`
- Modify: `packages/core/src/scrapers/facebook.ts`
- Test: `packages/core/test/facebook-core.test.ts`
- [ ] **Step 1: Write the failing tests**
Add these tests near the Facebook parser tests in `packages/core/test/facebook-core.test.ts`:
Add these tests near the Facebook parser tests in
`packages/core/test/facebook-core.test.ts`:
```ts
test("classifies Comet search responses", () => {
@@ -89,12 +104,14 @@ test("classifies unavailable item responses", () => {
- [ ] **Step 2: Run test to verify it fails**
Run: `bun test packages/core/test/facebook-core.test.ts --test-name-pattern "classifies"`
Run:
`bun test packages/core/test/facebook-core.test.ts --test-name-pattern "classifies"`
Expected: FAIL because `classifyFacebookResponse` does not exist yet.
- [ ] **Step 3: Write minimal implementation**
Add this type and function near the parsing section in `packages/core/src/scrapers/facebook.ts`:
Add this type and function near the parsing section in
`packages/core/src/scrapers/facebook.ts`:
```ts
type FacebookResponseKind = "search" | "item" | "auth_gated" | "unavailable" | "unknown";
@@ -128,7 +145,8 @@ export function classifyFacebookResponse(htmlString: HTMLString, responseUrl: st
- [ ] **Step 4: Run test to verify it passes**
Run: `bun test packages/core/test/facebook-core.test.ts --test-name-pattern "classifies"`
Run:
`bun test packages/core/test/facebook-core.test.ts --test-name-pattern "classifies"`
Expected: PASS
- [ ] **Step 5: Commit**
@@ -141,8 +159,11 @@ git commit -m "refactor: add facebook response classification"
### Task 2: Add Bootstrap Candidate Extraction
**Files:**
- Modify: `packages/core/test/facebook-core.test.ts`
- Modify: `packages/core/src/scrapers/facebook.ts`
- Test: `packages/core/test/facebook-core.test.ts`
- [ ] **Step 1: Write the failing tests**
@@ -185,7 +206,8 @@ test("keeps candidate order stable for later scoring", () => {
- [ ] **Step 2: Run test to verify it fails**
Run: `bun test packages/core/test/facebook-core.test.ts --test-name-pattern "bootstrap candidates"`
Run:
`bun test packages/core/test/facebook-core.test.ts --test-name-pattern "bootstrap candidates"`
Expected: FAIL because `extractFacebookBootstrapCandidates` does not exist.
- [ ] **Step 3: Write minimal implementation**
@@ -218,7 +240,8 @@ export function extractFacebookBootstrapCandidates(htmlString: HTMLString): Reco
- [ ] **Step 4: Run test to verify it passes**
Run: `bun test packages/core/test/facebook-core.test.ts --test-name-pattern "bootstrap candidates"`
Run:
`bun test packages/core/test/facebook-core.test.ts --test-name-pattern "bootstrap candidates"`
Expected: PASS
- [ ] **Step 5: Commit**
@@ -231,10 +254,15 @@ git commit -m "refactor: add facebook bootstrap candidate extraction"
### Task 3: Replace Search Parsing With Candidate Scoring
**Files:**
- Modify: `packages/core/test/facebook-core.test.ts`
- Modify: `packages/core/test/facebook-integration.test.ts`
- Modify: `packages/core/src/scrapers/facebook.ts`
- Test: `packages/core/test/facebook-core.test.ts`
- Test: `packages/core/test/facebook-integration.test.ts`
- [ ] **Step 1: Write the failing tests**
@@ -323,12 +351,15 @@ const mockSearchHtml = `
- [ ] **Step 2: Run test to verify it fails**
Run: `bun test packages/core/test/facebook-core.test.ts --test-name-pattern "Comet bootstrap candidates"`
Expected: FAIL because the current search extractor only understands legacy `marketplace_search` shapes.
Run:
`bun test packages/core/test/facebook-core.test.ts --test-name-pattern "Comet bootstrap candidates"`
Expected: FAIL because the current search extractor only understands legacy
`marketplace_search` shapes.
- [ ] **Step 3: Write minimal implementation**
Replace the search extraction internals in `extractFacebookMarketplaceData()` with candidate scoring like this:
Replace the search extraction internals in `extractFacebookMarketplaceData()` with
candidate scoring like this:
```ts
function findSearchEdges(candidate: unknown): FacebookEdge[] | null {
@@ -383,7 +414,8 @@ export function extractFacebookMarketplaceData(htmlString: HTMLString): Facebook
- [ ] **Step 4: Run test to verify it passes**
Run: `bun test packages/core/test/facebook-core.test.ts packages/core/test/facebook-integration.test.ts`
Run:
`bun test packages/core/test/facebook-core.test.ts packages/core/test/facebook-integration.test.ts`
Expected: PASS for the rewritten search fixtures and existing unaffected tests.
- [ ] **Step 5: Commit**
@@ -396,8 +428,11 @@ git commit -m "refactor: rewrite facebook search parser for comet bootstrap"
### Task 4: Replace Item Parsing With Candidate Scoring
**Files:**
- Modify: `packages/core/test/facebook-core.test.ts`
- Modify: `packages/core/src/scrapers/facebook.ts`
- Test: `packages/core/test/facebook-core.test.ts`
- [ ] **Step 1: Write the failing tests**
@@ -438,7 +473,8 @@ test("extracts item details from Comet permalink bootstrap candidates", () => {
- [ ] **Step 2: Run test to verify it fails**
Run: `bun test packages/core/test/facebook-core.test.ts --test-name-pattern "Comet permalink bootstrap"`
Run:
`bun test packages/core/test/facebook-core.test.ts --test-name-pattern "Comet permalink bootstrap"`
Expected: FAIL because the current item extractor depends on legacy permalink markers.
- [ ] **Step 3: Write minimal implementation**
@@ -491,8 +527,8 @@ export function extractFacebookItemData(htmlString: HTMLString): FacebookMarketp
- [ ] **Step 4: Run test to verify it passes**
Run: `bun test packages/core/test/facebook-core.test.ts`
Expected: PASS for current-shape item tests and remaining parser tests.
Run: `bun test packages/core/test/facebook-core.test.ts` Expected: PASS for
current-shape item tests and remaining parser tests.
- [ ] **Step 5: Commit**
@@ -504,8 +540,11 @@ git commit -m "refactor: rewrite facebook item parser for comet bootstrap"
### Task 5: Add HTML Fallback Extraction
**Files:**
- Modify: `packages/core/test/facebook-core.test.ts`
- Modify: `packages/core/src/scrapers/facebook.ts`
- Test: `packages/core/test/facebook-core.test.ts`
- [ ] **Step 1: Write the failing tests**
@@ -549,8 +588,10 @@ test("falls back to rendered item HTML when bootstrap payloads are undecodable",
- [ ] **Step 2: Run test to verify it fails**
Run: `bun test packages/core/test/facebook-core.test.ts --test-name-pattern "falls back"`
Expected: FAIL because the extractor currently returns `null` without a structured candidate.
Run:
`bun test packages/core/test/facebook-core.test.ts --test-name-pattern "falls back"`
Expected: FAIL because the extractor currently returns `null` without a structured
candidate.
- [ ] **Step 3: Write minimal implementation**
@@ -607,11 +648,13 @@ function extractItemFallback(htmlString: HTMLString): FacebookMarketplaceItem |
}
```
Then call these helpers as the last fallback inside `extractFacebookMarketplaceData()` and `extractFacebookItemData()`.
Then call these helpers as the last fallback inside `extractFacebookMarketplaceData()`
and `extractFacebookItemData()`.
- [ ] **Step 4: Run test to verify it passes**
Run: `bun test packages/core/test/facebook-core.test.ts --test-name-pattern "falls back"`
Run:
`bun test packages/core/test/facebook-core.test.ts --test-name-pattern "falls back"`
Expected: PASS
- [ ] **Step 5: Commit**
@@ -624,8 +667,11 @@ git commit -m "refactor: add facebook html fallbacks"
### Task 6: Wire Route-Aware Failures Into Entry Points
**Files:**
- Modify: `packages/core/test/facebook-integration.test.ts`
- Modify: `packages/core/src/scrapers/facebook.ts`
- Test: `packages/core/test/facebook-integration.test.ts`
- [ ] **Step 1: Write the failing tests**
@@ -664,8 +710,10 @@ test("returns null for unavailable item responses", async () => {
- [ ] **Step 2: Run test to verify it fails**
Run: `bun test packages/core/test/facebook-integration.test.ts --test-name-pattern "auth-gated|unavailable"`
Expected: FAIL because the entrypoints do not yet classify successful HTML responses by route/auth state.
Run:
`bun test packages/core/test/facebook-integration.test.ts --test-name-pattern "auth-gated|unavailable"`
Expected: FAIL because the entrypoints do not yet classify successful HTML responses by
route/auth state.
- [ ] **Step 3: Write minimal implementation**
@@ -690,12 +738,13 @@ if (itemResponseClass.kind === "unavailable") {
}
```
Use the actual response URL from `fetchHtml` plumbing if that helper is extended to return both HTML and final URL; otherwise start by threading final URL support through the fetch helper in the same task.
Use the actual response URL from `fetchHtml` plumbing if that helper is extended to
return both HTML and final URL; otherwise start by threading final URL support through
the fetch helper in the same task.
- [ ] **Step 4: Run test to verify it passes**
Run: `bun test packages/core/test/facebook-integration.test.ts`
Expected: PASS
Run: `bun test packages/core/test/facebook-integration.test.ts` Expected: PASS
- [ ] **Step 5: Commit**
@@ -707,19 +756,22 @@ git commit -m "refactor: handle facebook route-aware failure states"
### Task 7: Run Full Verification And Live Probe
**Files:**
- Modify: `packages/core/src/scrapers/facebook.ts` if small cleanup is required
- Modify: `packages/core/test/facebook-core.test.ts` if small cleanup is required
- Modify: `packages/core/test/facebook-integration.test.ts` if small cleanup is required
- [ ] **Step 1: Run focused Facebook tests**
Run: `bun test packages/core/test/facebook-core.test.ts packages/core/test/facebook-integration.test.ts`
Run:
`bun test packages/core/test/facebook-core.test.ts packages/core/test/facebook-integration.test.ts`
Expected: PASS
- [ ] **Step 2: Run broader core tests**
Run: `bun test packages/core/test`
Expected: PASS
Run: `bun test packages/core/test` Expected: PASS
- [ ] **Step 3: Run live authenticated Facebook probe**
@@ -742,11 +794,14 @@ if (results[0]?.url) {
Expected:
- search returns at least one result
- item fetch returns non-null for the first live result when the route is not stale/unavailable
- item fetch returns non-null for the first live result when the route is not
stale/unavailable
- [ ] **Step 4: Make any minimal cleanup needed to keep tests and live probe green**
If cleanup is needed, keep it limited to naming, dead-code removal caused by the rewrite, or small parser corrections directly exposed by the verification commands.
If cleanup is needed, keep it limited to naming, dead-code removal caused by the
rewrite, or small parser corrections directly exposed by the verification commands.
- [ ] **Step 5: Re-run verification**
@@ -767,6 +822,11 @@ git commit -m "refactor: complete facebook comet scraper rewrite"
## Self-Review
- Spec coverage: the plan covers classification, route-aware search parsing, route-aware item parsing, HTML fallbacks, explicit failure-state handling, test replacement, and live verification.
- Placeholder scan: no `TODO`, `TBD`, or unspecified “handle appropriately” steps remain.
- Type consistency: all planned functions and types use the same names across tasks: `classifyFacebookResponse`, `extractFacebookBootstrapCandidates`, `extractFacebookMarketplaceData`, and `extractFacebookItemData`.
- Spec coverage: the plan covers classification, route-aware search parsing, route-aware
item parsing, HTML fallbacks, explicit failure-state handling, test replacement, and
live verification.
- Placeholder scan: no `TODO`, `TBD`, or unspecified “handle appropriately” steps
remain.
- Type consistency: all planned functions and types use the same names across tasks:
`classifyFacebookResponse`, `extractFacebookBootstrapCandidates`,
`extractFacebookMarketplaceData`, and `extractFacebookItemData`.