24 KiB
Facebook Comet Rewrite Implementation Plan
For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (
- [ ]) syntax for tracking.
Goal: Replace the legacy Facebook Marketplace scraper with a route-aware hybrid Comet-bootstrap parser for both search and item routes.
Architecture: Keep authenticated direct HTTP fetches as the transport. Classify each Facebook response first, then parse route-specific Comet bootstrap/state candidates, and fall back to rendered-HTML extraction only when bootstrap decoding cannot produce the expected search or item shape.
Tech Stack: Bun, TypeScript, bun:test, linkedom, existing shared cookie/http
helpers
File Structure
- Modify:
packages/core/src/scrapers/facebook.ts- Owns Facebook fetch flow, response classification, bootstrap candidate extraction, search parsing, item parsing, and HTML fallbacks.
- Modify:
packages/core/test/facebook-core.test.ts- Owns unit coverage for response classification, bootstrap parsing, fallback parsing, and route-aware item/search extraction behavior.
- Modify:
packages/core/test/facebook-integration.test.ts- Owns higher-level fetch flow tests, auth/degradation behavior, and result shaping for search/item entrypoints.
Task 1: Add Route Classification Coverage
Files:
-
Modify:
packages/core/test/facebook-core.test.ts -
Modify:
packages/core/src/scrapers/facebook.ts -
Test:
packages/core/test/facebook-core.test.ts -
Step 1: Write the failing tests
Add these tests near the Facebook parser tests in
packages/core/test/facebook-core.test.ts:
test("classifies Comet search responses", () => {
const html = `
<html>
<head><title>Marketplace</title></head>
<body>
<script>"XCometMarketplaceSearchController"</script>
<script>{"routing_namespace":"fb_comet","use_ssr_state_manager":true}</script>
</body>
</html>
`;
expect(classifyFacebookResponse(html, "https://www.facebook.com/marketplace/toronto/search?query=bike")).toEqual({
kind: "search",
authGated: false,
unavailable: false,
});
});
test("classifies Comet item responses", () => {
const html = `
<html>
<body>
<script>"XCometMarketplacePermalinkController"</script>
<script>{"routing_namespace":"fb_comet"}</script>
</body>
</html>
`;
expect(classifyFacebookResponse(html, "https://www.facebook.com/marketplace/item/123/")).toEqual({
kind: "item",
authGated: false,
unavailable: false,
});
});
test("classifies login-gated responses before parsing", () => {
const html = `<html><body>You must log in to Facebook</body></html>`;
expect(classifyFacebookResponse(html, "https://www.facebook.com/login/?next=%2Fmarketplace%2Fitem%2F123%2F")).toEqual({
kind: "auth_gated",
authGated: true,
unavailable: false,
});
});
test("classifies unavailable item responses", () => {
const html = `<html><body>Marketplace</body></html>`;
expect(classifyFacebookResponse(html, "https://www.facebook.com/marketplace/toronto/?unavailable_product=1")).toEqual({
kind: "unavailable",
authGated: false,
unavailable: true,
});
});
- Step 2: Run test to verify it fails
Run:
bun test packages/core/test/facebook-core.test.ts --test-name-pattern "classifies"
Expected: FAIL because classifyFacebookResponse does not exist yet.
- Step 3: Write minimal implementation
Add this type and function near the parsing section in
packages/core/src/scrapers/facebook.ts:
type FacebookResponseKind = "search" | "item" | "auth_gated" | "unavailable" | "unknown";
export function classifyFacebookResponse(htmlString: HTMLString, responseUrl: string) {
const authGated =
responseUrl.includes("/login/") ||
htmlString.includes("You must log in to Facebook") ||
htmlString.includes("log in to Facebook");
if (authGated) {
return { kind: "auth_gated" as const, authGated: true, unavailable: false };
}
const unavailable = responseUrl.includes("unavailable_product=1");
if (unavailable) {
return { kind: "unavailable" as const, authGated: false, unavailable: true };
}
if (htmlString.includes("XCometMarketplaceSearchController")) {
return { kind: "search" as const, authGated: false, unavailable: false };
}
if (htmlString.includes("XCometMarketplacePermalinkController")) {
return { kind: "item" as const, authGated: false, unavailable: false };
}
return { kind: "unknown" as const, authGated: false, unavailable: false };
}
- Step 4: Run test to verify it passes
Run:
bun test packages/core/test/facebook-core.test.ts --test-name-pattern "classifies"
Expected: PASS
- Step 5: Commit
git add packages/core/src/scrapers/facebook.ts packages/core/test/facebook-core.test.ts
git commit -m "refactor: add facebook response classification"
Task 2: Add Bootstrap Candidate Extraction
Files:
-
Modify:
packages/core/test/facebook-core.test.ts -
Modify:
packages/core/src/scrapers/facebook.ts -
Test:
packages/core/test/facebook-core.test.ts -
Step 1: Write the failing tests
Add these tests:
test("extracts Comet bootstrap candidates from script tags", () => {
const html = `
<html><body>
<script>{"routing_namespace":"fb_comet"}</script>
<script>{"data":{"marketplace_search_bootstrap":{"edges":[{"node":{"listing":{"id":"1"}}}]}}}</script>
<script>not json</script>
</body></html>
`;
const candidates = extractFacebookBootstrapCandidates(html);
expect(candidates).toHaveLength(2);
expect(candidates[1]).toEqual({
data: {
marketplace_search_bootstrap: {
edges: [{ node: { listing: { id: "1" } } }],
},
},
});
});
test("keeps candidate order stable for later scoring", () => {
const html = `
<html><body>
<script>{"marker":"first"}</script>
<script>{"marker":"second"}</script>
</body></html>
`;
const candidates = extractFacebookBootstrapCandidates(html);
expect(candidates.map((candidate) => candidate.marker)).toEqual(["first", "second"]);
});
- Step 2: Run test to verify it fails
Run:
bun test packages/core/test/facebook-core.test.ts --test-name-pattern "bootstrap candidates"
Expected: FAIL because extractFacebookBootstrapCandidates does not exist.
- Step 3: Write minimal implementation
Add this helper near the parser utilities in packages/core/src/scrapers/facebook.ts:
export function extractFacebookBootstrapCandidates(htmlString: HTMLString): Record<string, unknown>[] {
const { document } = parseHTML(htmlString);
const scripts = document.querySelectorAll("script");
const candidates: Record<string, unknown>[] = [];
for (const script of Array.from(scripts) as HTMLScriptElement[]) {
const scriptText = script.textContent?.trim();
if (!scriptText) continue;
try {
const parsed = JSON.parse(scriptText);
if (isRecord(parsed)) {
candidates.push(parsed);
}
} catch {
// Ignore non-JSON script bodies.
}
}
return candidates;
}
- Step 4: Run test to verify it passes
Run:
bun test packages/core/test/facebook-core.test.ts --test-name-pattern "bootstrap candidates"
Expected: PASS
- Step 5: Commit
git add packages/core/src/scrapers/facebook.ts packages/core/test/facebook-core.test.ts
git commit -m "refactor: add facebook bootstrap candidate extraction"
Task 3: Replace Search Parsing With Candidate Scoring
Files:
-
Modify:
packages/core/test/facebook-core.test.ts -
Modify:
packages/core/test/facebook-integration.test.ts -
Modify:
packages/core/src/scrapers/facebook.ts -
Test:
packages/core/test/facebook-core.test.ts -
Test:
packages/core/test/facebook-integration.test.ts -
Step 1: Write the failing tests
Add a core test for route-aware search extraction:
test("extracts search results from Comet bootstrap candidates", () => {
const html = `
<html><body>
<script>"XCometMarketplaceSearchController"</script>
<script>
${JSON.stringify({
payload: {
resultGroups: [
{
edges: [
{
node: {
listing: {
id: "1",
marketplace_listing_title: "Bike",
listing_price: {
amount: "120.00",
formatted_amount: "CA$120",
currency: "CAD",
},
location: {
reverse_geocode: {
city_page: { display_name: "Toronto" },
},
},
is_live: true,
},
},
},
],
},
],
},
})}
</script>
</body></html>
`;
const ads = extractFacebookMarketplaceData(html);
expect(ads).toHaveLength(1);
expect(ads?.[0].node.listing.marketplace_listing_title).toBe("Bike");
});
Replace one integration fixture with a current-shape search fixture:
const mockSearchHtml = `
<html><body>
<script>"XCometMarketplaceSearchController"</script>
<script>${JSON.stringify({
payload: {
resultGroups: [
{
edges: [
{
node: {
listing: {
id: "1",
marketplace_listing_title: "iPhone 13",
listing_price: {
amount: "500.00",
formatted_amount: "CA$500",
currency: "CAD",
},
location: { reverse_geocode: { city_page: { display_name: "Toronto" } } },
is_live: true,
},
},
},
],
},
],
},
})}</script>
</body></html>
`;
- Step 2: Run test to verify it fails
Run:
bun test packages/core/test/facebook-core.test.ts --test-name-pattern "Comet bootstrap candidates"
Expected: FAIL because the current search extractor only understands legacy
marketplace_search shapes.
- Step 3: Write minimal implementation
Replace the search extraction internals in extractFacebookMarketplaceData() with
candidate scoring like this:
function findSearchEdges(candidate: unknown): FacebookEdge[] | null {
if (Array.isArray(candidate)) {
for (const item of candidate) {
const result = findSearchEdges(item);
if (result) return result;
}
return null;
}
if (!isRecord(candidate)) {
return null;
}
const directEdges = candidate.feed_units?.edges;
if (Array.isArray(directEdges)) {
return directEdges as FacebookEdge[];
}
const resultGroups = candidate.resultGroups;
if (Array.isArray(resultGroups)) {
for (const group of resultGroups) {
if (isRecord(group) && Array.isArray(group.edges)) {
return group.edges as FacebookEdge[];
}
}
}
for (const value of Object.values(candidate)) {
const result = findSearchEdges(value);
if (result) return result;
}
return null;
}
export function extractFacebookMarketplaceData(htmlString: HTMLString): FacebookAdNode[] | null {
const candidates = extractFacebookBootstrapCandidates(htmlString);
for (const candidate of candidates) {
const edges = findSearchEdges(candidate);
if (edges?.length) {
return edges.map((edge) => ({ node: edge.node }));
}
}
console.warn("No marketplace data found in HTML response");
return null;
}
- Step 4: Run test to verify it passes
Run:
bun test packages/core/test/facebook-core.test.ts packages/core/test/facebook-integration.test.ts
Expected: PASS for the rewritten search fixtures and existing unaffected tests.
- Step 5: Commit
git add packages/core/src/scrapers/facebook.ts packages/core/test/facebook-core.test.ts packages/core/test/facebook-integration.test.ts
git commit -m "refactor: rewrite facebook search parser for comet bootstrap"
Task 4: Replace Item Parsing With Candidate Scoring
Files:
-
Modify:
packages/core/test/facebook-core.test.ts -
Modify:
packages/core/src/scrapers/facebook.ts -
Test:
packages/core/test/facebook-core.test.ts -
Step 1: Write the failing tests
Replace one old item fixture with a current-shape item fixture:
test("extracts item details from Comet permalink bootstrap candidates", () => {
const html = `
<html><body>
<script>"XCometMarketplacePermalinkController"</script>
<script>
${JSON.stringify({
payload: {
listing: {
id: "123",
__typename: "GroupCommerceProductItem",
marketplace_listing_title: "Vintage Chair",
formatted_price: { text: "CA$80" },
listing_price: { amount: "80.00", currency: "CAD", amount_with_offset: "80.00" },
redacted_description: { text: "Solid wood chair" },
location_text: { text: "Toronto, ON" },
marketplace_listing_seller: { id: "seller-1", name: "Alex" },
condition: "USED",
is_live: true,
},
},
})}
</script>
</body></html>
`;
const item = extractFacebookItemData(html);
expect(item?.id).toBe("123");
expect(item?.marketplace_listing_title).toBe("Vintage Chair");
});
- Step 2: Run test to verify it fails
Run:
bun test packages/core/test/facebook-core.test.ts --test-name-pattern "Comet permalink bootstrap"
Expected: FAIL because the current item extractor depends on legacy permalink markers.
- Step 3: Write minimal implementation
Replace the item extraction internals with a semantic candidate finder like this:
function findMarketplaceItemCandidate(candidate: unknown): FacebookMarketplaceItem | null {
if (Array.isArray(candidate)) {
for (const item of candidate) {
const result = findMarketplaceItemCandidate(item);
if (result) return result;
}
return null;
}
if (!isRecord(candidate)) {
return null;
}
if (
candidate.id &&
candidate.__typename === "GroupCommerceProductItem" &&
candidate.marketplace_listing_title
) {
return candidate as FacebookMarketplaceItem;
}
for (const value of Object.values(candidate)) {
const result = findMarketplaceItemCandidate(value);
if (result) return result;
}
return null;
}
export function extractFacebookItemData(htmlString: HTMLString): FacebookMarketplaceItem | null {
const candidates = extractFacebookBootstrapCandidates(htmlString);
for (const candidate of candidates) {
const item = findMarketplaceItemCandidate(candidate);
if (item) {
return item;
}
}
return null;
}
- Step 4: Run test to verify it passes
Run: bun test packages/core/test/facebook-core.test.ts Expected: PASS for
current-shape item tests and remaining parser tests.
- Step 5: Commit
git add packages/core/src/scrapers/facebook.ts packages/core/test/facebook-core.test.ts
git commit -m "refactor: rewrite facebook item parser for comet bootstrap"
Task 5: Add HTML Fallback Extraction
Files:
-
Modify:
packages/core/test/facebook-core.test.ts -
Modify:
packages/core/src/scrapers/facebook.ts -
Test:
packages/core/test/facebook-core.test.ts -
Step 1: Write the failing tests
Add these fallback tests:
test("falls back to rendered search HTML when bootstrap payloads are undecodable", () => {
const html = `
<html><body>
<script>"XCometMarketplaceSearchController"</script>
<a href="https://www.facebook.com/marketplace/item/123/?ref=search">Vintage Lamp</a>
<span>CA$45</span>
<span>Toronto, ON</span>
</body></html>
`;
const ads = extractFacebookMarketplaceData(html);
const parsed = ads ? parseFacebookAds(ads) : [];
expect(parsed[0].title).toBe("Vintage Lamp");
expect(parsed[0].listingPrice?.amountFormatted).toBe("CA$45");
});
test("falls back to rendered item HTML when bootstrap payloads are undecodable", () => {
const html = `
<html><body>
<script>"XCometMarketplacePermalinkController"</script>
<h1>Vintage Desk</h1>
<span>CA$120</span>
<span>Condition Used - Good</span>
<div>Description Solid oak desk.</div>
<div>Seller information Jordan</div>
</body></html>
`;
const item = extractFacebookItemData(html);
expect(item?.marketplace_listing_title).toBe("Vintage Desk");
expect(item?.formatted_price?.text).toBe("CA$120");
});
- Step 2: Run test to verify it fails
Run:
bun test packages/core/test/facebook-core.test.ts --test-name-pattern "falls back"
Expected: FAIL because the extractor currently returns null without a structured
candidate.
- Step 3: Write minimal implementation
Add route-specific HTML fallback helpers in packages/core/src/scrapers/facebook.ts:
function extractSearchFallback(htmlString: HTMLString): FacebookAdNode[] | null {
const idMatch = htmlString.match(/marketplace\/item\/(\d+)/);
const titleMatch = htmlString.match(/marketplace\/item\/\d+\/[^>]*>([^<]+)</);
const priceMatch = htmlString.match(/CA\$\d+(?:,\d{3})*(?:\.\d{2})?/);
const cityMatch = htmlString.match(/([A-Z][a-z]+,\s*[A-Z]{2})/);
if (!idMatch || !titleMatch || !priceMatch) return null;
return [
{
node: {
listing: {
id: idMatch[1],
marketplace_listing_title: titleMatch[1].trim(),
listing_price: {
amount: priceMatch[0].replace("CA$", "").replace(/,/g, ""),
formatted_amount: priceMatch[0],
currency: "CAD",
},
location: cityMatch
? { reverse_geocode: { city_page: { display_name: cityMatch[1].split(",")[0] } } }
: undefined,
is_live: true,
},
},
},
];
}
function extractItemFallback(htmlString: HTMLString): FacebookMarketplaceItem | null {
const titleMatch = htmlString.match(/<h1[^>]*>([^<]+)<\/h1>/i);
const priceMatch = htmlString.match(/CA\$\d+(?:,\d{3})*(?:\.\d{2})?/);
if (!titleMatch || !priceMatch) return null;
return {
id: "fallback-item",
__typename: "GroupCommerceProductItem",
marketplace_listing_title: titleMatch[1].trim(),
formatted_price: { text: priceMatch[0] },
listing_price: {
amount: priceMatch[0].replace("CA$", "").replace(/,/g, ""),
currency: "CAD",
amount_with_offset: priceMatch[0].replace("CA$", "").replace(/,/g, ""),
},
redacted_description: { text: htmlString.includes("Description") ? htmlString.split("Description")[1].split("<")[0].trim() : "" },
is_live: true,
};
}
Then call these helpers as the last fallback inside extractFacebookMarketplaceData()
and extractFacebookItemData().
- Step 4: Run test to verify it passes
Run:
bun test packages/core/test/facebook-core.test.ts --test-name-pattern "falls back"
Expected: PASS
- Step 5: Commit
git add packages/core/src/scrapers/facebook.ts packages/core/test/facebook-core.test.ts
git commit -m "refactor: add facebook html fallbacks"
Task 6: Wire Route-Aware Failures Into Entry Points
Files:
-
Modify:
packages/core/test/facebook-integration.test.ts -
Modify:
packages/core/src/scrapers/facebook.ts -
Test:
packages/core/test/facebook-integration.test.ts -
Step 1: Write the failing tests
Add these integration tests:
test("returns empty search results for auth-gated search HTML", async () => {
global.fetch = mock(() =>
Promise.resolve({
ok: true,
url: "https://www.facebook.com/login/?next=%2Fmarketplace%2Ftoronto%2Fsearch",
text: () => Promise.resolve("<html><body>You must log in to Facebook</body></html>"),
headers: { get: () => null },
}),
);
const results = await fetchFacebookItems("bike", 1, "toronto", 25);
expect(results).toEqual([]);
});
test("returns null for unavailable item responses", async () => {
global.fetch = mock(() =>
Promise.resolve({
ok: true,
url: "https://www.facebook.com/marketplace/toronto/?unavailable_product=1",
text: () => Promise.resolve("<html><body>Marketplace</body></html>"),
headers: { get: () => null },
}),
);
const item = await fetchFacebookItem("123");
expect(item).toBeNull();
});
- Step 2: Run test to verify it fails
Run:
bun test packages/core/test/facebook-integration.test.ts --test-name-pattern "auth-gated|unavailable"
Expected: FAIL because the entrypoints do not yet classify successful HTML responses by
route/auth state.
- Step 3: Write minimal implementation
Update both entrypoints to classify successful HTML before parsing:
const responseClass = classifyFacebookResponse(searchHtml, searchUrl);
if (responseClass.kind === "auth_gated") {
console.warn("Facebook marketplace search is auth-gated. Update FACEBOOK_COOKIE with a fresh raw Cookie header string.");
return [];
}
const itemResponseClass = classifyFacebookResponse(itemHtml, itemUrl);
if (itemResponseClass.kind === "auth_gated") {
console.warn(`Authentication failed for item ${itemId}. Cookies may be expired.`);
return null;
}
if (itemResponseClass.kind === "unavailable") {
console.warn(`Item ${itemId} appears to be unavailable in the marketplace.`);
return null;
}
Use the actual response URL from fetchHtml plumbing if that helper is extended to
return both HTML and final URL; otherwise start by threading final URL support through
the fetch helper in the same task.
- Step 4: Run test to verify it passes
Run: bun test packages/core/test/facebook-integration.test.ts Expected: PASS
- Step 5: Commit
git add packages/core/src/scrapers/facebook.ts packages/core/test/facebook-integration.test.ts
git commit -m "refactor: handle facebook route-aware failure states"
Task 7: Run Full Verification And Live Probe
Files:
-
Modify:
packages/core/src/scrapers/facebook.tsif small cleanup is required -
Modify:
packages/core/test/facebook-core.test.tsif small cleanup is required -
Modify:
packages/core/test/facebook-integration.test.tsif small cleanup is required -
Step 1: Run focused Facebook tests
Run:
bun test packages/core/test/facebook-core.test.ts packages/core/test/facebook-integration.test.ts
Expected: PASS
- Step 2: Run broader core tests
Run: bun test packages/core/test Expected: PASS
- Step 3: Run live authenticated Facebook probe
Run:
set -a && source .env && set +a && bun --eval 'import { fetchFacebookItems, fetchFacebookItem } from "./packages/core/src/index.ts";
const results = await fetchFacebookItems("iphone", 1, "toronto", 3);
console.log("SEARCH_COUNT=" + results.length);
console.log(JSON.stringify(results[0] ?? null));
if (results[0]?.url) {
const match = results[0].url.match(/\/item\/(\d+)/);
if (match) {
const item = await fetchFacebookItem(match[1]);
console.log(JSON.stringify(item));
}
}'
Expected:
-
search returns at least one result
-
item fetch returns non-null for the first live result when the route is not stale/unavailable
-
Step 4: Make any minimal cleanup needed to keep tests and live probe green
If cleanup is needed, keep it limited to naming, dead-code removal caused by the rewrite, or small parser corrections directly exposed by the verification commands.
- Step 5: Re-run verification
Run:
bun test packages/core/test/facebook-core.test.ts packages/core/test/facebook-integration.test.ts && bun test packages/core/test
Expected: PASS
- Step 6: Commit
git add packages/core/src/scrapers/facebook.ts packages/core/test/facebook-core.test.ts packages/core/test/facebook-integration.test.ts
git commit -m "refactor: complete facebook comet scraper rewrite"
Self-Review
- Spec coverage: the plan covers classification, route-aware search parsing, route-aware item parsing, HTML fallbacks, explicit failure-state handling, test replacement, and live verification.
- Placeholder scan: no
TODO,TBD, or unspecified “handle appropriately” steps remain. - Type consistency: all planned functions and types use the same names across tasks:
classifyFacebookResponse,extractFacebookBootstrapCandidates,extractFacebookMarketplaceData, andextractFacebookItemData.