Add an Extractor
What it is
This guide explains how to add a new extractor that is auto-registered at orchestrator startup.
The extractor runtime is discovered from a local manifest.ts file, and source ids stay type-safe across API and client code through the shared catalog in shared/src/extractors/index.ts.
Extractor manifests must live in extractor packages under extractors/<name>/ only. Do not add manifest files inside orchestrator/.
Extractor run logic should also live in the extractor package so orchestrator stays extractor-agnostic.
Why it exists
Without a manifest contract, adding extractors required touching multiple orchestrator files.
With the manifest system, contributors only need to:
- Add a manifest in their extractor package.
- Add the new source id to the shared typed catalog.
That keeps runtime wiring dynamic while preserving compile-time safety in API and client code.
How to use it
- Create your extractor package under extractors/<name>/.
- Add a manifest.ts in the extractor package root (or src/manifest.ts).
  - Valid locations are only extractors/<name>/manifest.ts or extractors/<name>/src/manifest.ts.
  - orchestrator/**/manifest.ts is not used for extractor discovery.
- Export a manifest with:
  - id
  - displayName
  - providesSources
  - requiredEnvVars (optional)
  - run(context) that returns { success, jobs, error? }
- Add the new source id to shared/src/extractors/index.ts:
  - append to EXTRACTOR_SOURCE_IDS
  - add an entry in EXTRACTOR_SOURCE_METADATA
- Ensure your extractor maps output to CreateJobInput[].
- Run the full CI checks.
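The catalog step can be pictured with the sketch below. It assumes EXTRACTOR_SOURCE_IDS is a readonly tuple and EXTRACTOR_SOURCE_METADATA is a record keyed by the id type derived from it. The real shapes in shared/src/extractors/index.ts may differ, and the "existingsource" id, the label field, and the isExtractorSourceId helper are hypothetical.

```typescript
// Sketch only: the real catalog in shared/src/extractors/index.ts may differ.
// "existingsource" and the `label` field are hypothetical placeholders.
const EXTRACTOR_SOURCE_IDS = ["existingsource", "myextractor"] as const;

// Union type derived from the tuple: "existingsource" | "myextractor".
type ExtractorSourceId = (typeof EXTRACTOR_SOURCE_IDS)[number];

interface SourceMetadataSketch {
  label: string;
}

// Record<ExtractorSourceId, ...> makes the compiler reject a missing entry,
// so appending an id without adding metadata fails at compile time.
const EXTRACTOR_SOURCE_METADATA: Record<ExtractorSourceId, SourceMetadataSketch> = {
  existingsource: { label: "Existing Source" },
  myextractor: { label: "My Extractor" },
};

// Runtime guard for ids that arrive as plain strings (for example from a request).
function isExtractorSourceId(value: string): value is ExtractorSourceId {
  return (EXTRACTOR_SOURCE_IDS as readonly string[]).includes(value);
}
```

With this shape, forgetting the metadata entry surfaces as a type error in the shared package, not as a runtime failure in the API or client.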
Example manifest:
import type { ExtractorManifest } from "@shared/types/extractors";

export const manifest: ExtractorManifest = {
  id: "myextractor",
  displayName: "My Extractor",
  providesSources: ["myextractor"],
  requiredEnvVars: ["MYEXTRACTOR_API_KEY"],
  async run(context) {
    // context.searchTerms, context.settings, context.onProgress, context.shouldCancel
    // Populate jobs with CreateJobInput items mapped from the upstream results.
    const jobs = [];
    return { success: true, jobs };
  },
};

export default manifest;
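As an illustration of the mapping step inside run(context), the sketch below converts raw upstream records into job inputs and wraps failures in the { success, jobs, error? } result. All shapes here are assumptions: the real CreateJobInput type lives in the shared package and likely has more fields, the error field is assumed to be a string, and RawPosting, toJobs, and runSketch are hypothetical names.

```typescript
// Hypothetical shapes: the real CreateJobInput and run result types come from
// the shared package and likely carry more fields than shown here.
interface CreateJobInputSketch {
  source: string;
  title: string;
  url: string;
}

interface RunResultSketch {
  success: boolean;
  jobs: CreateJobInputSketch[];
  error?: string; // assumed to be a message string
}

// Hypothetical raw record as returned by some upstream API.
interface RawPosting {
  headline?: string;
  link?: string;
}

// Map raw records to job inputs, skipping records without a headline or link.
function toJobs(source: string, raw: RawPosting[]): CreateJobInputSketch[] {
  const jobs: CreateJobInputSketch[] = [];
  for (const posting of raw) {
    if (!posting.headline || !posting.link) continue;
    jobs.push({ source, title: posting.headline.trim(), url: posting.link });
  }
  return jobs;
}

// Wrap the fetch-and-map step so failures surface as { success: false, error }.
async function runSketch(fetchRaw: () => Promise<RawPosting[]>): Promise<RunResultSketch> {
  try {
    return { success: true, jobs: toJobs("myextractor", await fetchRaw()) };
  } catch (err) {
    const error = err instanceof Error ? err.message : String(err);
    return { success: false, jobs: [], error };
  }
}
```

Keeping the mapping in a pure function such as toJobs makes it easy to unit test without calling the upstream service.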
Subprocess extractors are supported. Keep subprocess spawning inside run(context) so orchestrator only depends on the manifest contract.
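A subprocess-backed run could look roughly like the sketch below, which uses Node's child_process.execFile through util.promisify. It assumes the child prints a JSON array of jobs on stdout; that protocol, the JobSketch shape, and the parseJobs and runSubprocess helpers are hypothetical and not part of the manifest contract.

```typescript
import { execFile } from "node:child_process";
import { promisify } from "node:util";

const execFileAsync = promisify(execFile);

// Hypothetical job shape; the real CreateJobInput lives in the shared package.
interface JobSketch {
  title: string;
  url: string;
}

// Parse the child's stdout, assumed here to be a JSON array of jobs.
// Items that lack a string title or url are dropped.
function parseJobs(stdout: string): JobSketch[] {
  const parsed: unknown = JSON.parse(stdout);
  if (!Array.isArray(parsed)) {
    throw new Error("expected a JSON array on stdout");
  }
  return parsed.filter(
    (item): item is JobSketch =>
      typeof item === "object" &&
      item !== null &&
      typeof (item as JobSketch).title === "string" &&
      typeof (item as JobSketch).url === "string",
  );
}

// Spawn the child from within run() so the caller only sees the result object.
// `command` and `args` are placeholders for whatever the extractor ships.
async function runSubprocess(command: string, args: string[]) {
  try {
    const { stdout } = await execFileAsync(command, args, { encoding: "utf8" });
    return { success: true, jobs: parseJobs(stdout) };
  } catch (err) {
    // A non-zero exit code, a spawn failure, or bad JSON all end up here.
    const error = err instanceof Error ? err.message : String(err);
    return { success: false, jobs: [] as JobSketch[], error };
  }
}
```

A manifest's run(context) would then call something like runSubprocess with the extractor's own command and return its result, so the orchestrator never needs to know a child process was involved.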
Common problems
Extractor not discovered at startup
- Check file path: extractors/<name>/manifest.ts or extractors/<name>/src/manifest.ts.
- Ensure the file exports default or named manifest.
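To make the export requirement concrete, here is a minimal sketch of how a loader might accept either export style. It is not the orchestrator's actual loader: ManifestSketch, isManifest, and resolveManifest are hypothetical, and checking default before manifest is an assumed precedence.

```typescript
// Assumed minimal manifest shape, based on the fields this guide lists.
interface ManifestSketch {
  id: string;
  displayName: string;
  providesSources: string[];
  run: (context: unknown) => Promise<unknown>;
}

// Structural check so a stray export with the right name is not accepted blindly.
function isManifest(value: unknown): value is ManifestSketch {
  if (typeof value !== "object" || value === null) return false;
  const m = value as Record<string, unknown>;
  return (
    typeof m["id"] === "string" &&
    typeof m["displayName"] === "string" &&
    Array.isArray(m["providesSources"]) &&
    typeof m["run"] === "function"
  );
}

// Accept either export style. Trying `default` first is an assumption,
// not the documented precedence of the real loader.
function resolveManifest(mod: Record<string, unknown>): ManifestSketch | undefined {
  for (const candidate of [mod["default"], mod["manifest"]]) {
    if (isManifest(candidate)) return candidate;
  }
  return undefined;
}
```

Under these assumptions, a module that exports the object under any other name resolves to undefined, which matches the "not discovered" symptom.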
Source compiles in extractor but fails in API/client
- Add the new source id to shared/src/extractors/index.ts.
- Confirm metadata exists for that source id.
Source appears in shared catalog but is unavailable at runtime
- The manifest was not loaded successfully.
- Check startup logs for registry warnings.
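When debugging this case, it can help to compare the catalog against what was actually registered. The sketch below is a hypothetical diagnostic, not an existing registry API: findUnavailableSources and both input lists are made up for illustration.

```typescript
// Hypothetical diagnostic: list catalog ids that no loaded manifest provides.
function findUnavailableSources(
  catalogIds: readonly string[],
  registeredIds: Iterable<string>,
): string[] {
  const registered = new Set(registeredIds);
  return catalogIds.filter((id) => !registered.has(id));
}

// Example inputs: two ids in the catalog, only one registered at startup.
const unavailable = findUnavailableSources(["myextractor", "othersource"], ["othersource"]);

for (const id of unavailable) {
  console.warn(`source "${id}" is in the catalog but no manifest registered it`);
}
```

Any id reported this way points back at a manifest that failed to load or that omits the id from providesSources.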
Source requires credentials but never returns jobs
- Add and validate requiredEnvVars.
- Verify your manifest run(context) reads settings/env values correctly.
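One way to validate credentials early is sketched below. The helpers missingEnvVars and credentialError are hypothetical, and whether the orchestrator already performs a similar check from requiredEnvVars is not stated in this guide.

```typescript
// Return the names of required variables that are unset or blank.
function missingEnvVars(
  required: readonly string[] | undefined,
  env: Record<string, string | undefined>,
): string[] {
  return (required ?? []).filter((name) => (env[name] ?? "").trim() === "");
}

// Build an error message for the run result, or undefined when all is set.
function credentialError(
  required: readonly string[] | undefined,
  env: Record<string, string | undefined>,
): string | undefined {
  const missing = missingEnvVars(required, env);
  return missing.length === 0
    ? undefined
    : `missing environment variables: ${missing.join(", ")}`;
}
```

Calling a check like this at the top of run(context) with process.env, and returning { success: false, jobs: [], error } when it reports a message, makes a missing credential visible instead of silently yielding no jobs.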