Gradcracker Extractor
A plain-English walkthrough of the Gradcracker extractor in extractors/gradcracker.
Original website: gradcracker.com
What it is
The Gradcracker extractor finds UK graduate roles from gradcracker.com.
It now uses a fast HTTP-first scraper for normal runs. The scraper fetches Gradcracker list and detail HTML with a browser-like HTTP fingerprint, parses job cards locally, and decodes Gradcracker apply links without opening a browser. The older Playwright/Crawlee flow remains as a fallback when the HTTP path is blocked.
Why it exists
Gradcracker is useful for UK graduate and early-career STEM roles that broad aggregators often miss.
The HTTP-first implementation keeps the same normalized job output while avoiding the startup cost of launching a browser for every successful run.
How to use it
- Open Run jobs and choose Automatic.
- Select United Kingdom as the country.
- Leave Gradcracker enabled in Sources, or toggle it on if it is disabled.
- Set your usual search terms and run budget.
- Start the run and monitor progress in the pipeline progress card.
Defaults and controls:
- Search terms are converted to Gradcracker role slugs, such as `software systems` to `software-systems`.
- Defaults include `web-development` and `software-systems`.
- `GRADCRACKER_MAX_JOBS_PER_TERM` controls the per-term cap.
- `GRADCRACKER_HTTP_DETAIL_CONCURRENCY` controls concurrent detail-page fetches. The default is `2`.
- `GRADCRACKER_HTTP_REQUEST_DELAY_MS` controls the minimum delay between HTTP request starts. The default is `1000`.
- `JOBOPS_SKIP_APPLY_FOR_EXISTING=1` and `JOBOPS_EXISTING_JOB_URLS_FILE` are still honored by the browser fallback.
- `GRADCRACKER_FORCE_BROWSER=1` forces the legacy Playwright/Crawlee path.
- `GRADCRACKER_DISABLE_BROWSER_FALLBACK=1` returns the HTTP scraper result directly if the fast path is blocked.
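The slug conversion and the two HTTP settings above can be illustrated with a minimal Python sketch. It is not the extractor's actual code: the helper names `term_to_slug`, `read_int_env`, and `load_http_settings` are hypothetical, and the rule that invalid or out-of-range values fall back to the documented defaults is an assumption.

```python
import os
import re


def term_to_slug(term: str) -> str:
    """Lowercase a search term and join its words with hyphens."""
    words = re.findall(r"[a-z0-9]+", term.lower())
    return "-".join(words)


def read_int_env(name: str, default: int, minimum: int, env=None) -> int:
    """Read an integer setting; fall back to the default if missing or invalid.

    Assumption: values below `minimum` are treated as invalid.
    """
    env = os.environ if env is None else env
    try:
        value = int(env.get(name, ""))
    except ValueError:
        return default
    return value if value >= minimum else default


def load_http_settings(env=None) -> dict:
    """Collect the documented HTTP settings with their documented defaults."""
    return {
        "detail_concurrency": read_int_env(
            "GRADCRACKER_HTTP_DETAIL_CONCURRENCY", 2, 1, env
        ),
        "request_delay_ms": read_int_env(
            "GRADCRACKER_HTTP_REQUEST_DELAY_MS", 1000, 0, env
        ),
    }
```

For example, `term_to_slug("software systems")` yields `software-systems`, and `load_http_settings({})` yields the defaults of `2` and `1000`.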
Implementation flow:
- Build search URLs from UK regions and role terms.
- Fetch list pages and parse `article[wire:key]` job cards.
- Fetch detail pages for new jobs only.
- Extract `.body-content` description text.
- Decode Gradcracker `/out/...` apply URLs from the `u` query parameter.
- Reuse saved Cloudflare clearance cookies from the headed solve flow on the HTTP retry.
- Fall back to Playwright/Crawlee only when the HTTP path cannot proceed.
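The apply-link step can be sketched roughly as follows. This is an illustration only: it assumes the `u` parameter carries a percent-encoded target URL (the real encoding may differ), and the function name `decode_apply_url`, the `BASE_URL` constant, and the example URLs are hypothetical.

```python
from urllib.parse import parse_qs, urljoin, urlsplit

# Assumed base for resolving relative links; not taken from the extractor source.
BASE_URL = "https://www.gradcracker.com"


def decode_apply_url(href: str, job_url: str) -> str:
    """Return the external apply URL from an /out/ link, or the job URL."""
    parts = urlsplit(urljoin(BASE_URL, href))
    if not parts.path.startswith("/out/"):
        return job_url

    # parse_qs percent-decodes values and drops blank ones by default.
    targets = parse_qs(parts.query).get("u", [])
    if not targets:
        return job_url

    target = targets[0].strip()
    # Only accept absolute http(s) targets; otherwise keep the Gradcracker URL.
    if urlsplit(target).scheme not in ("http", "https"):
        return job_url
    return target
```

When no decodable target is present, the sketch returns the Gradcracker job URL, so the posting still has a usable link.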
Common problems
Gradcracker does not return jobs
- Confirm the selected country is United Kingdom.
- Try Gradcracker-specific terms such as `software systems`, `web development`, or `data science`.
- Lower the run budget if a term is too broad and you only need the newest listings.
The HTTP scraper is blocked
- Leave browser fallback enabled so the extractor can use the existing Playwright/Crawlee challenge handling.
- When the app opens a challenge browser, complete the challenge and wait for the solver to save a `cf_clearance` cookie. The next HTTP retry uses that saved cookie and the same browser user agent.
- Set `GRADCRACKER_FORCE_BROWSER=1` when you specifically need to debug the legacy browser flow.
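How the two browser-related flags might interact can be sketched as a small decision function. This is an assumption-laden illustration, not the extractor's logic: the names `is_enabled` and `choose_path` and the returned labels are hypothetical, and letting `GRADCRACKER_FORCE_BROWSER` take precedence when both flags are set is a guess.

```python
import os


def is_enabled(name: str, env=None) -> bool:
    """Treat a flag as on only when it is set to exactly "1"."""
    env = os.environ if env is None else env
    return env.get(name, "").strip() == "1"


def choose_path(http_blocked: bool, env=None) -> str:
    """Pick which flow handles the run, given whether the HTTP path was blocked."""
    # Assumption: forcing the browser wins over every other setting.
    if is_enabled("GRADCRACKER_FORCE_BROWSER", env):
        return "browser"
    if not http_blocked:
        return "http"
    # Blocked, with fallback disabled: hand back the HTTP result as-is.
    if is_enabled("GRADCRACKER_DISABLE_BROWSER_FALLBACK", env):
        return "http_result_as_is"
    return "browser"
```

With no flags set, a blocked HTTP run resolves to `browser`, which matches the default fallback behaviour described above.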
Apply links stay on Gradcracker
- Some listings may not expose a decodable `/out/...` target.
- The extractor still stores the Gradcracker job URL, so those postings remain usable even when the final application URL is unavailable.