UKVisaJobs Extractor
UKVisaJobs is the most complex extractor because authenticated sessions are required.
Big picture
Two layers:
extractors/ukvisajobs/src/main.tshandles login/API calls and dataset output.orchestrator/src/server/services/ukvisajobs.tsexecutes extractor and ingests/de-dupes output.
1) Authentication and session cache
Session cache file:
extractors/ukvisajobs/storage/ukvisajobs-auth.json
Flow:
- Reuse cached token/cookies when valid
- Re-login with Playwright + Camoufox when needed
- Refresh and retry on token-expired responses
Force refresh:
UKVISAJOBS_REFRESH_ONLY=1
2) API requests
Endpoint:
https://my.ukvisajobs.com/ukvisa-api/api/fetch-jobs-data
Each request includes auth token + session cookies and paginates (15 jobs/page).
3) Mapping
- Normalizes salary from min/max/interval
- Builds fallback visa description when content missing
- Maps
job_linkto bothjobUrlandapplicationLink
4) Output dataset
Written to:
extractors/ukvisajobs/storage/datasets/default/
Includes per-job JSON files and combined jobs.json.
5) Orchestrator flow
- Spawns extractor (
npx tsx src/main.ts) - Runs terms sequentially with delay
- De-dupes by
sourceJobId(fallbackjobUrl) - Fetches detail pages when descriptions are too short
Controls
UKVISAJOBS_EMAIL,UKVISAJOBS_PASSWORDUKVISAJOBS_HEADLESSUKVISAJOBS_MAX_JOBS(default 50, max 200)UKVISAJOBS_SEARCH_KEYWORD
Practical notes
- Deleting auth cache forces next run to re-login.
- Low-concurrency/polite scraping by design.
- If extractor breaks, check session refresh path first.