# Facebook Marketplace Anti-Bot Challenge Solver Design

## Summary

Add a challenge-detection and challenge-solving layer to the Facebook Marketplace scraper so it can handle anti-bot gates (checkpoint pages, token rotation, cookie requirements) programmatically. Build the solver in pure Bun, with no browser automation in production. Use `agent-browser` only for one-time debug reconnaissance.

## Goals

- Identify which anti-bot challenge(s) Facebook Marketplace triggers against programmatic HTTP requests.
- Implement detection + solving for each discovered challenge type.
- Wire the solver into `fetchFacebookItems` and `fetchFacebookItem` so challenges are handled transparently.
- Follow the same pattern as the existing `ebay-challenge.ts` (detect → solve → retry with clearance).
- Zero browser automation at runtime. Pure `fetch` + `Bun` APIs + npm packages only.

## Non-Goals

- Solving login/auth-wall challenges (those require fresh cookies and are not solvable programmatically).
- Full account login automation (cookies must be provided by the user).
- Browser-based scraping or Puppeteer/Playwright integration.
- Solving challenges for non-Marketplace Facebook endpoints.

## Current State

The Facebook scraper (`packages/core/src/scrapers/facebook.ts`) fetches Marketplace search and item pages via authenticated `fetch` with cookies from the `FACEBOOK_COOKIE` env var. It:

- Sends a browser-like header set (`sec-ch-ua`, `user-agent`, etc.)
- Parses SSR HTML for embedded JSON in script tags
- Has no challenge detection: if Facebook returns a challenge page, the scraper silently fails (no listings parsed, result classified as “unknown”)
- Depends entirely on cookie freshness

The eBay scraper already follows the challenge-solver pattern in this codebase: `ebay.ts` uses `warmEbaySession()`, `isChallengeRedirect()`, `isChallengeHtml()`, and `solveEbayChallenge()` from `ebay-challenge.ts`.

## Chosen Approach

**Reconnaissance-first development:**

1. Use `agent-browser` (debug only) to capture a real Facebook Marketplace browsing session via HAR.
2. Probe programmatic `fetch` to see what Facebook returns without a browser.
3. Diff the two to identify the gap (missing headers? missing cookies? missing JS execution?).
4. Build a modular solver in `packages/core/src/utils/facebook-challenge.ts` that detects each challenge type and applies the appropriate fix.
5. Wire it into `facebook.ts` following the eBay pattern.

## Design

### File Plan

| File | Purpose |
| --- | --- |
| `packages/core/src/utils/facebook-challenge.ts` | Challenge detection, solving, and cookie/session utilities |
| `packages/core/src/scrapers/facebook.ts` | Modified: warmup, challenge detection before parsing, retry loop |
| `packages/core/test/facebook-challenge.test.ts` | Unit tests with mock challenge HTML fixtures |

### Flow

```
fetchFacebookItems(searchUrl)
├── warmFacebookSession() → GET facebook.com/ (collect datr + Akamai cookies)
├── fetchHtml(searchUrl) → receives response
├── detectFacebookChallenge(response)
│   ├── checkpoint/challenge HTML → solveCheckpointChallenge()
│   ├── redirect to /login → fail (cookies expired)
│   ├── missing required cookies → regenerate session
│   ├── 429 rate limit → backoff + retry (existing http.ts handles this)
│   └── no challenge → proceed to parsing
├── if solveCheckpointChallenge succeeds → retry fetchHtml with clearance cookie
└── parse results
```

### Challenge Types (to be confirmed by reconnaissance)

| Type | Expected Signal | Solving Strategy |
| --- | --- | --- |
| Login wall | Redirect to `/login` or HTML `"You must log in"` | Fail: user must provide fresh cookies |
| Checkpoint page | HTML contains `checkpoint` or `challenge` path | Parse hidden form fields, compute proof-of-work if present, submit answer endpoint |
| `datr` cookie missing | No `datr` in cookie jar → request fails | Fetch homepage first to obtain `datr` (session warmup) |
| DTSG token needed | Form submissions fail with CSRF error | Extract `fb_dtsg` from page HTML, include in request body |
| GraphQL header check | Request blocked without internal headers | Extract `x-fb-friendly-name` from browser HAR, replicate |
| Akamai/bot-manager | Redirect loops or blank pages without Akamai cookies | Homepage warmup to collect `bm_sv`, `bm_mi`, etc. |

### Key Modules

**`facebook-challenge.ts`:**

```
// Session warmup: fetch homepage to prime cookies
warmFacebookSession(): Promise<Record<string, string>>

// Challenge detection
detectFacebookChallenge(html, status, url, headers): ChallengeType | null

// Checkpoint solver
solveCheckpointChallenge(html, cookies): Promise<ChallengeResult>

// DTSG token extraction
extractDtsg(html): string | null

// Cookie jar management (shared with ebay.ts pattern)
mergeCookies(...): Record<string, string>
```

**`ChallengeResult` type:**

```ts
interface ChallengeResult {
  solved: boolean;
  cookies?: Record<string, string>; // clearance cookies to replay
  token?: string; // challenge response token
  error?: string; // why it failed
}
```

### Error Handling

- Solver failure → return `ChallengeResult { solved: false, error: "..." }`, scraper logs warning and returns empty results (never throws).
- Unrecognized challenge → log the response URL and HTML snippet for future analysis.
- Rate limits → handled by existing `http.ts` exponential backoff (no change needed).
- Solver timeout → 30s cap on any challenge computation, fall back to `solved: false`.
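
The "never throws" and "30s cap" rules above could be enforced in one place by wrapping every solver call. The following is a minimal sketch under stated assumptions: `runSolverSafely` and `SOLVER_TIMEOUT_MS` are hypothetical names that do not exist in the codebase, and `ChallengeResult` is repeated only so the block is self-contained.

```ts
interface ChallengeResult {
  solved: boolean;
  cookies?: Record<string, string>;
  token?: string;
  error?: string;
}

// Assumed default, taken from the 30s cap stated in the error-handling rules.
const SOLVER_TIMEOUT_MS = 30_000;

// Hypothetical helper: runs a solver under a time cap and converts both
// thrown errors and timeouts into `solved: false` results.
async function runSolverSafely(
  solver: () => Promise<ChallengeResult>,
  timeoutMs: number = SOLVER_TIMEOUT_MS,
): Promise<ChallengeResult> {
  let timer: ReturnType<typeof setTimeout> | undefined;

  // Resolves (never rejects) with a failure result once the cap is reached.
  const timeout = new Promise<ChallengeResult>((resolve) => {
    timer = setTimeout(
      () => resolve({ solved: false, error: `solver timed out after ${timeoutMs}ms` }),
      timeoutMs,
    );
  });

  try {
    // Whichever settles first wins; a solver rejection lands in `catch`.
    return await Promise.race([solver(), timeout]);
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    return { solved: false, error: message };
  } finally {
    // Clear the timer so a fast solver does not keep the process alive.
    if (timer !== undefined) clearTimeout(timer);
  }
}
```

Note that `Promise.race` only stops waiting for the solver; it does not cancel it. If a solver does long-running work, it would need its own cancellation mechanism (for example an `AbortSignal`), which this sketch leaves out.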
### Testing

| Test | What It Verifies |
| --- | --- |
| `detectFacebookChallenge` with sample checkpoint HTML | Correctly identifies checkpoint challenge |
| `detectFacebookChallenge` with normal search HTML | Returns null (no false positives) |
| `detectFacebookChallenge` with login redirect | Identifies auth-gated |
| `solveCheckpointChallenge` with known PoW params | Produces correct answer |
| `warmFacebookSession` with mocked fetch | Collects expected cookies |
| `extractDtsg` with sample page HTML | Extracts the DTSG token |
| Integration: fetch → challenge → solve → retry → results | End-to-end mock flow |
| Solver throws → scraper returns empty, no crash | Graceful fallback |
| Solver unknown challenge → logs warning, returns empty | No unhandled challenge crashes |

Test data will use anonymized HTML fixtures (no real user data).

## Reconnaissance Steps (debug-only, one-time)

1. **Probe programmatically:** `fetch` Marketplace search with/without cookies, record status code and HTML.
2. **Browser session:** `agent-browser` → log into Facebook → navigate Marketplace → record HAR.
3. **Diff analysis:** Compare browser request headers vs. our programmatic headers.
4. **Cookie inventory:** List all cookies from browser session, identify which are essential.
5. **Challenge trigger:** Identify what change in request signature triggers a challenge.
6. **Replay test:** Replay browser’s exact request via `fetch` to confirm headers/cookies are the differentiator.

All reconnaissance artifacts saved under `docs/facebook-challenge/`.

## Decisions Deferred to Post-Reconnaissance

- Exact challenge types and solving strategies (depends on what Facebook actually uses).
- Whether a PoW solver, CAPTCHA solver, or token-extraction approach is needed.
- npm package dependencies (only add what the reconnaissance proves necessary).
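
The diff-analysis step in the reconnaissance list could start from a small header comparison helper. This is a sketch only: `normalizeHeaders`, `diffHeaders`, and `HeaderDiff` are hypothetical names, and the input shape assumes the HAR convention of storing request headers as an array of `{ name, value }` pairs.

```ts
type HeaderMap = Record<string, string>;

interface HeaderDiff {
  missing: string[]; // sent by the browser, absent from our request
  extra: string[];   // sent by us, absent from the browser request
  changed: string[]; // present in both, with different values
}

// Converts HAR-style header pairs into a lowercase-keyed map.
// HTTP/2 pseudo-headers (names starting with ":") cannot be set via
// `fetch`, so they are skipped.
function normalizeHeaders(headers: Array<{ name: string; value: string }>): HeaderMap {
  const out: HeaderMap = {};
  for (const { name, value } of headers) {
    if (name.startsWith(":")) continue;
    out[name.toLowerCase()] = value;
  }
  return out;
}

// Compares two header maps case-insensitively and reports the differences.
function diffHeaders(browser: HeaderMap, ours: HeaderMap): HeaderDiff {
  const lower = (m: HeaderMap): HeaderMap =>
    Object.fromEntries(Object.entries(m).map(([k, v]) => [k.toLowerCase(), v]));
  const b = lower(browser);
  const o = lower(ours);

  const missing = Object.keys(b).filter((k) => !(k in o)).sort();
  const extra = Object.keys(o).filter((k) => !(k in b)).sort();
  const changed = Object.keys(b).filter((k) => k in o && b[k] !== o[k]).sort();
  return { missing, extra, changed };
}
```

Sorting the three lists keeps the output stable between runs, which makes it easier to save the diff alongside the other artifacts and compare it across sessions.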