Skip to main content
Version: Next

JobSpy Extractor

A walkthrough of the JobSpy extractor for Indeed, LinkedIn, and Glassdoor.

Original websites:

Big picture

JobSpy runs as a Python script per search term, writes JSON, then orchestrator ingests and normalizes into internal job shape.

1) Inputs and defaults

Key environment variables:

  • JOBSPY_SITES (default: indeed,linkedin)
  • JOBSPY_SEARCH_TERM (default: web developer)
  • JOBSPY_LOCATION (default: UK)
  • JOBSPY_RESULTS_WANTED (default: 200)
  • JOBSPY_HOURS_OLD (default: 72)
  • JOBSPY_COUNTRY_INDEED (default: UK)
  • JOBSPY_LINKEDIN_FETCH_DESCRIPTION (default: true)
  • JOBSPY_IS_REMOTE (unset by default)

2) Orchestrator flow

The service in orchestrator/src/server/services/jobspy.ts:

  • Builds search-term list from UI or env
  • Runs Python once per term with unique output file
  • Reads JSON and maps to CreateJobInput
  • De-dupes by jobUrl
  • Deletes temp output files best-effort

3) Mapping and cleanup

  • Normalizes salary ranges
  • Converts empty values to null
  • Keeps metadata like skills, ratings, remote flags when available
  • Skips rows with invalid site or missing URL

Notes

  • JOBSPY_SEARCH_TERMS can be JSON array or |, comma, newline-delimited text.
  • Set JOBSPY_LINKEDIN_FETCH_DESCRIPTION=0 to speed runs.
  • Temp output files are stored under data/imports/.
  • If workplace type is only Remote, JobSpy runs with JOBSPY_IS_REMOTE=true.
  • If workplace type includes Hybrid or Onsite, JobSpy cannot enforce those filters precisely, so the JobSpy-backed sources run without a workplace-type filter and may return broader results.

Common Problems

  • Hybrid or Onsite was selected, but Indeed, LinkedIn, or Glassdoor still returned remote jobs. JobSpy only supports a strict remote toggle. Any workplace-type selection that includes Hybrid or Onsite broadens those source results.
  • A run returned fewer LinkedIn descriptions than expected. JOBSPY_LINKEDIN_FETCH_DESCRIPTION=0 disables description fetching to speed up runs.
  • Different cities need different workplace-type filters. This is not supported in the current automatic-run flow. JobSpy receives one global workplace-type selection per run/query invocation.