# Cookie Env-Only Design

## Summary

Remove all file-based and request-provided cookie inputs across the repo.
The only supported authentication input becomes a raw `Cookie` header string supplied
through scraper-specific environment variables such as `FACEBOOK_COOKIE` and
`EBAY_COOKIE`.

## Goals

- Remove cookie file fallback from shared and marketplace-specific code.
- Remove request-level cookie overrides from public scraper entrypoints.
- Remove deprecated cookie-path parameters from Facebook APIs.
- Keep cookie parsing deterministic and limited to raw header-string input.
- Update tests and docs so the public contract matches the implementation.

## Non-Goals

- Changing scraper behavior unrelated to authentication input.
- Adding new cookie formats or migration helpers.
- Preserving backward compatibility for cookie files, JSON cookie arrays, or request
  overrides.

## Current State

The current shared cookie utilities support three sources in priority order:

1. Request parameter
2. Environment variable
3. Cookie file

`packages/core/src/utils/cookies.ts` includes file loading, JSON array parsing, and
auto-detection between JSON and header-string formats.
Facebook also exposes deprecated `cookiePath` arguments that still reach shared loading
logic. Docs in `cookies/AGENTS.md` still describe file-based setup and request-level
overrides.

## Chosen Approach

Use the hard-reset approach.
Delete the shared multi-source cookie-loading model and reduce the cookie surface to
env-header parsing only.
This is a larger diff than a surgical removal, but it avoids leaving behind abstractions
that imply unsupported inputs still exist.

## Design

### Shared Cookie Utilities

`packages/core/src/utils/cookies.ts` will keep only the pieces needed for
env-header-based auth:

- `Cookie` type
- A reduced cookie config shape containing only `name`, `domain`, and `envVar`
- `parseCookieString()` for raw `Cookie` header strings
- `formatCookiesForHeader()` for domain filtering and request formatting
- An env-only loader that reads `process.env[config.envVar]`, parses it, and throws a
  targeted error when missing or invalid

The following shared utilities will be removed:

- JSON cookie-array parsing
- Auto-detection between JSON and header-string formats
- File loading helpers
- Optional loaders whose behavior depends on file fallback or request input

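The retained parsing and formatting pieces could look roughly like the following. This is a minimal sketch, not the actual contents of `cookies.ts`: the field types on `Cookie`, the `domain` argument to `parseCookieString()`, and the host-suffix matching in `formatCookiesForHeader()` are assumptions made for illustration.

```typescript
// Assumed shapes: the design names these types and fields but not their exact types.
interface Cookie {
  name: string;
  value: string;
  domain: string;
}

interface CookieConfig {
  name: string; // marketplace label, e.g. "facebook"
  domain: string; // domain assigned to every parsed cookie (assumption)
  envVar: string; // e.g. "FACEBOOK_COOKIE"
}

// Parse a raw `Cookie` header string ("a=1; b=2") into cookies.
// Segments without "=" or with an empty name are skipped, never guessed at.
function parseCookieString(raw: string, domain: string): Cookie[] {
  const cookies: Cookie[] = [];
  for (const segment of raw.split(";")) {
    const eq = segment.indexOf("=");
    if (eq <= 0) continue; // no "=" at all, or "=" is the first character
    const name = segment.slice(0, eq).trim();
    const value = segment.slice(eq + 1).trim(); // values may themselves contain "="
    if (name === "") continue;
    cookies.push({ name, value, domain });
  }
  return cookies;
}

// Keep only cookies whose domain matches the request host (exact or suffix
// match, which is an assumed rule), then join them into a header value.
function formatCookiesForHeader(cookies: Cookie[], host: string): string {
  return cookies
    .filter((cookie) => {
      const bare = cookie.domain.replace(/^\./, "");
      return host === bare || host.endsWith("." + bare);
    })
    .map((cookie) => `${cookie.name}=${cookie.value}`)
    .join("; ");
}
```

With only these two pure functions left, there is no format auto-detection to reason about: a string either yields `name=value` pairs or it yields nothing.
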
### Marketplace Scrapers

Marketplace scrapers that require auth will read cookies only from their env vars.

For Facebook this means:

- Remove `_cookiePath` / `cookiePath` parameters from helper and public functions
- Remove any docs/comments that mention parameter > env > file precedence
- Update auth failure messaging to name only `FACEBOOK_COOKIE`

For eBay this means:

- Remove any remaining fallback/file-oriented behavior from shared calls and error
  strings
- Keep the existing env-var auth path, but make it the only path

### Public API Surface

Exports from `packages/core/src/index.ts` should reflect the new contract.
If exported functions currently advertise cookie-source or cookie-path arguments, their
signatures will be tightened so callers cannot pass unsupported inputs.

Downstream adapter packages should continue calling core through the simplified
signatures without adding their own cookie-loading behavior.

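As an illustration of what "tightened" could mean for a TypeScript caller, here is a small sketch. The names `FacebookSearchOptions` and `searchFacebook` are hypothetical and do not come from the actual exports. The point is only that once the options type carries no cookie fields, the compiler rejects them.

```typescript
// Hypothetical options shape after the change: no cookie-related fields remain.
interface FacebookSearchOptions {
  query: string;
}

// Stand-in for a public entrypoint. A real one would load FACEBOOK_COOKIE
// internally instead of accepting cookie input from the caller.
function searchFacebook(options: FacebookSearchOptions): string {
  return `facebook search: ${options.query}`;
}

// Accepted: only supported inputs are passed.
searchFacebook({ query: "bike" });

// Rejected at compile time: the excess property is a type error.
searchFacebook({
  query: "bike",
  // @ts-expect-error cookiePath is no longer part of the options type
  cookiePath: "./cookies/facebook.json",
});
```

Plain JavaScript callers bypass this check, so in this sketch the extra property is silently ignored at runtime rather than rejected.
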
### Error Handling

There are now only two auth failure modes:

1. The required env var is missing or empty.
2. The env var does not contain any valid `name=value` cookie pairs.

Errors should be blunt and specific:

- identify the missing env var by name
- state that the value must be a raw `Cookie` header string
- stop mentioning request parameters, cookie paths, JSON arrays, or `./cookies/*.json`

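The two failure modes map onto an env-only loader fairly directly. The sketch below is an assumption about its shape: the name `loadCookiesFromEnv`, the injectable `env` parameter (added here to make it testable), the exact message wording, and the compact `parsePairs` helper standing in for `parseCookieString()` are all illustrative.

```typescript
interface Cookie {
  name: string;
  value: string;
  domain: string;
}

interface CookieConfig {
  name: string;
  domain: string;
  envVar: string;
}

type Env = Record<string, string | undefined>;

// Compact stand-in for parseCookieString(): keeps only "name=value" segments
// whose name is non-empty.
function parsePairs(raw: string, domain: string): Cookie[] {
  return raw
    .split(";")
    .map((segment) => segment.trim())
    .filter((segment) => segment.indexOf("=") > 0)
    .map((segment) => {
      const eq = segment.indexOf("=");
      return {
        name: segment.slice(0, eq).trim(),
        value: segment.slice(eq + 1).trim(),
        domain,
      };
    });
}

// Env-only loader: no request parameter, no file fallback.
function loadCookiesFromEnv(config: CookieConfig, env: Env = process.env): Cookie[] {
  const raw = env[config.envVar]?.trim();

  // Failure mode 1: the env var is missing or empty.
  if (!raw) {
    throw new Error(
      `${config.envVar} is not set or is empty. ` +
        `Set it to a raw Cookie header string such as "name=value; other=value".`,
    );
  }

  // Failure mode 2: nothing in the value parses as a name=value pair.
  const cookies = parsePairs(raw, config.domain);
  if (cookies.length === 0) {
    throw new Error(
      `${config.envVar} contains no valid name=value pairs. ` +
        `The value must be a raw Cookie header string.`,
    );
  }

  return cookies;
}
```

A JSON cookie array pasted into the env var contains no `=` pairs, so in this sketch it falls into the second failure mode instead of being auto-detected.
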
### Testing Strategy

Follow TDD. Start by changing or adding core tests so the old file/request behavior is
no longer accepted.

Coverage targets:

1. Valid env header strings still parse into cookies correctly.
2. Missing env vars fail with the new env-only error.
3. Invalid env strings fail without falling back to files or request data.
4. Facebook APIs no longer expose or honor cookie-path/request-cookie behavior.
5. Existing tests that depended on missing files or JSON cookie arrays are rewritten to
   the env-only contract.

Verification target after implementation:

- `bun test packages/core/test`
- `bun run ci`
- `bun run build` if any cross-package signature changes require downstream verification

## Documentation Changes

Update cookie-related docs to match the new contract:

- remove file-based setup instructions
- remove request-parameter cookie examples
- document env vars as the only supported auth input
- show raw `Cookie` header-string examples only

## Risks

- External callers using request cookie overrides will break at compile time or runtime,
  depending on how they consume the package.
- Recent work added support for custom Facebook cookie paths, so removing that path
  intentionally reverses a newly introduced behavior.
- Tests that currently model missing-file behavior must be rewritten rather than
  preserved.

## Rollout Notes

This is an intentional contract break.
The code, tests, and docs should all land together so there is no mixed messaging about
supported cookie sources.