facebook-challenge.ts: session warmup, header construction, and challenge type detection. Spec document for the anti-bot challenge solver design.
7.4 KiB
Facebook Marketplace Anti-Bot Challenge Solver Design
Summary
Add a challenge-detection and challenge-solving layer to the Facebook Marketplace
scraper so it can handle anti-bot gates (checkpoint pages, token rotation, cookie
requirements) programmatically.
Build the solver in pure Bun — no browser automation in production.
Use agent-browser only for one-time debug reconnaissance.
Goals
- Identify which anti-bot challenge(s) Facebook Marketplace triggers against programmatic HTTP requests.
- Implement detection + solving for each discovered challenge type.
- Wire the solver into
fetchFacebookItemsandfetchFacebookItemso challenges are handled transparently. - Follow the same pattern as the existing
ebay-challenge.ts(detect → solve → retry with clearance). - Zero browser automation at runtime.
Pure
fetch+BunAPIs + npm packages only.
Non-Goals
- Solving login/auth-wall challenges (those require fresh cookies — not solvable programmatically).
- Full account login automation (cookies must be provided by the user).
- Browser-based scraping or Puppeteer/Playwright integration.
- Solving challenges for non-Marketplace Facebook endpoints.
Current State
The Facebook scraper (packages/core/src/scrapers/facebook.ts) fetches Marketplace
search and item pages via authenticated fetch with cookies from FACEBOOK_COOKIE env
var. It:
- Sends a browser-like header set (
sec-ch-ua,user-agent, etc.) - Parses SSR HTML for embedded JSON in script tags
- Has no challenge detection — if Facebook returns a challenge page, the scraper silently fails (no listings parsed, classifies as “unknown”)
- Depends entirely on cookie freshness
The eBay scraper already follows the challenge-solver pattern in this codebase:
ebay.ts uses warmEbaySession(), isChallengeRedirect(), isChallengeHtml(), and
solveEbayChallenge() from ebay-challenge.ts.
Chosen Approach
Reconnaissance-first development:
- Use
agent-browser(debug only) to capture a real Facebook Marketplace browsing session via HAR. - Probe programmatic
fetchto see what Facebook returns without a browser. - Diff the two to identify the gap (missing headers? missing cookies? missing JS execution?).
- Build a modular solver in
packages/core/src/utils/facebook-challenge.tsthat detects each challenge type and applies the appropriate fix. - Wire it into
facebook.tsfollowing the eBay pattern.
Design
File Plan
| File | Purpose |
|---|---|
packages/core/src/utils/facebook-challenge.ts |
Challenge detection, solving, and cookie/session utilities |
packages/core/src/scrapers/facebook.ts |
Modified: warmup, challenge detection before parsing, retry loop |
packages/core/test/facebook-challenge.test.ts |
Unit tests with mock challenge HTML fixtures |
Flow
fetchFacebookItems(searchUrl)
├── warmFacebookSession() → GET facebook.com/ (collect datr + Akamai cookies)
├── fetchHtml(searchUrl) → receives response
├── detectFacebookChallenge(response)
│ ├── checkpoint/challenge HTML → solveCheckpointChallenge()
│ ├── redirect to /login → fail (cookies expired)
│ ├── missing required cookies → regenerate session
│ ├── 429 rate limit → backoff + retry (existing http.ts handles this)
│ └── no challenge → proceed to parsing
├── if solveCheckpointChallenge succeeds → retry fetchHtml with clearance cookie
└── parse results
Challenge Types (to be confirmed by reconnaissance)
| Type | Expected Signal | Solving Strategy |
|---|---|---|
| Login wall | Redirect to /login or HTML "You must log in" |
Fail — user must provide fresh cookies |
| Checkpoint page | HTML contains checkpoint or challenge path |
Parse hidden form fields, compute proof-of-work if present, submit answer endpoint |
datr cookie missing |
No datr in cookie jar → request fails |
Fetch homepage first to obtain datr (session warmup) |
| DTSG token needed | Form submissions fail with CSRF error | Extract fb_dtsg from page HTML, include in request body |
| GraphQL header check | Request blocked without internal headers | Extract x-fb-friendly-name from browser HAR, replicate |
| Akamai/bot-manager | Redirect loops or blank pages without Akamai cookies | Homepage warmup to collect bm_sv, bm_mi, etc. |
Key Modules
facebook-challenge.ts:
// Session warmup — fetch homepage to prime cookies
warmFacebookSession(): Promise<Record<string, string>>
// Challenge detection
detectFacebookChallenge(html, status, url, headers): ChallengeType | null
// Checkpoint solver
solveCheckpointChallenge(html, cookies): Promise<ChallengeResult>
// DTSG token extraction
extractDtsg(html): string | null
// Cookie jar management (shared with ebay.ts pattern)
mergeCookies(...): Record<string, string>
ChallengeResult type:
interface ChallengeResult {
solved: boolean;
cookies?: Record<string, string>; // clearance cookies to replay
token?: string; // challenge response token
error?: string; // why it failed
}
Error Handling
- Solver failure → return
ChallengeResult { solved: false, error: "..." }, scraper logs warning and returns empty results (never throws). - Unrecognized challenge → log the response URL and HTML snippet for future analysis.
- Rate limits → handled by existing
http.tsexponential backoff (no change needed). - Solver timeout → 30s cap on any challenge computation, fall back to
solved: false.
Testing
| Test | What It Verifies |
|---|---|
detectFacebookChallenge with sample checkpoint HTML |
Correctly identifies checkpoint challenge |
detectFacebookChallenge with normal search HTML |
Returns null (no false positives) |
detectFacebookChallenge with login redirect |
Identifies auth-gated |
solveCheckpointChallenge with known PoW params |
Produces correct answer |
warmFacebookSession with mocked fetch |
Collects expected cookies |
extractDtsg with sample page HTML |
Extracts the DTSG token |
| Integration: fetch → challenge → solve → retry → results | End-to-end mock flow |
| Solver throws → scraper returns empty, no crash | Graceful fallback |
| Solver unknown challenge → logs warning, returns empty | No unhandled challenge crashes |
Test data will use anonymized HTML fixtures (no real user data).
Reconnaissance Steps (debug-only, one-time)
- Probe programmatically:
fetchMarketplace search with/without cookies, record status code and HTML. - Browser session:
agent-browser→ log into Facebook → navigate Marketplace → record HAR. - Diff analysis: Compare browser request headers vs. our programmatic headers.
- Cookie inventory: List all cookies from browser session, identify which are essential.
- Challenge trigger: Identify what change in request signature triggers a challenge.
- Replay test: Replay browser’s exact request via
fetchto confirm headers/cookies are the differentiator.
All reconnaissance artifacts saved under docs/facebook-challenge/.
Decisions Deferred to Post-Reconnaissance
- Exact challenge types and solving strategies (depends on what Facebook actually uses).
- Whether a PoW solver, CAPTCHA solver, or token-extraction approach is needed.
- npm package dependencies (only add what the reconnaissance proves necessary).