Ingest

Plan or queue an ingest from a public URL or an uploaded file (PDF, Markdown, or plain text).

Operation             Method  Path
URL ingest            POST    /v1/ingest/url
File ingest           POST    /v1/ingest/file
List ingest jobs      GET     /v1/ingest/jobs?limit=20
Inspect a single job  GET     /v1/ingest/jobs/{jobId}

All four endpoints require an API key with the ingest scope.

Both POST endpoints default to a dry run (dryRun: true). A live write requires dryRun: false, distill: true, and a datedAt value in YYYY-MM-DD form. A live write returns 202 Accepted with a queued job; the ingest worker then drains the queue, fetches the artefact, distills and embeds it, and writes the validated graph rows.
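The dry-run and live rules above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the API: the helper name build_url_ingest_body and its live and dated_at parameters are hypothetical, and the server remains the authority on validation.

```python
import re
from datetime import date

# YYYY-MM-DD, as required for live ingest.
DATE_RE = re.compile(r"^\d{4}-\d{2}-\d{2}$")


def build_url_ingest_body(url, *, live=False, dated_at=None, **extra):
    """Return a JSON-serialisable body for POST /v1/ingest/url (hypothetical helper)."""
    body = {"url": url, **extra}
    if not live:
        # Explicit for readability; the documented default is already dryRun: true.
        body["dryRun"] = True
        return body
    if dated_at is None or not DATE_RE.match(dated_at):
        raise ValueError("live ingest needs datedAt in YYYY-MM-DD form")
    # Rejects well-shaped but impossible dates such as 2026-02-30.
    date.fromisoformat(dated_at)
    body.update({"dryRun": False, "distill": True, "datedAt": dated_at})
    return body
```

Calling build_url_ingest_body("https://example.com/my-bio") yields a plan-only body, while passing live=True and dated_at="2026-05-12" yields the three fields a live write needs.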

URL ingest

Request

POST /v1/ingest/url
{
  "url": "https://example.com/my-bio",
  "dryRun": false,
  "distill": true,
  "datedAt": "2026-05-12",
  "analyzeStyle": false,
  "distillerProvider": "anthropic",
  "distillerModel": "claude-haiku-4-5",
  "focus": "cross-functional delivery",
  "focusContext": 3,
  "idempotencyKey": "bio-2026-05-12"
}

Field              Type     Default  Notes
url                string            http or https. Localhost and private addresses are blocked
dryRun             boolean  true     When true, returns a plan and does not persist anything
distill            boolean  false    Required true for live ingest
datedAt            string            YYYY-MM-DD. Required for live ingest
analyzeStyle       boolean  false    Sample broader source spans for style evidence
distillLimit       integer           Cap the number of notes distilled in this run
distillerProvider  enum     env      "openai" | "anthropic"
distillerModel     string   env      Provider-specific model id
focus              string            Clip the fetched page around this term before distillation
focusContext       integer  3        Lines of context to keep around each focus hit
idempotencyKey     string            8 ≤ length ≤ 160. Identical key returns the existing queued job instead of duplicating it
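
Because an identical idempotencyKey returns the existing queued job, a key derived deterministically from the request makes retries safe. The sketch below shows one possible scheme; the helper name, the prefix, and the choice of SHA-256 are assumptions for illustration, and the API only constrains the length (8 to 160 characters).

```python
import hashlib

# Length bounds documented for idempotencyKey.
MIN_LEN, MAX_LEN = 8, 160


def make_idempotency_key(url, dated_at, prefix="ingest"):
    """Derive a stable key from the URL and datedAt (hypothetical scheme)."""
    digest = hashlib.sha256(f"{url}\n{dated_at}".encode("utf-8")).hexdigest()[:32]
    key = f"{prefix}-{dated_at}-{digest}"
    if not MIN_LEN <= len(key) <= MAX_LEN:
        raise ValueError(f"idempotencyKey length {len(key)} is outside {MIN_LEN}..{MAX_LEN}")
    return key
```

Retrying the same URL with the same datedAt then reuses the same key, so a network retry should not enqueue a second job.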

Response: 200 OK (dry-run)

{
  "schemaVersion": "marrow-url-ingest-plan-v1",
  "mode": "dry-run",
  "plan": { /* notes, claims, projects, entities, facets, edges, warnings */ }
}

Response: 202 Accepted (live)

{
  "schemaVersion": "marrow-ingest-job-v1",
  "mode": "queued",
  "created": true,
  "job": { "id": "5d2a…", "status": "queued", "kind": "url" }
}

created: false indicates the idempotencyKey already had a job; the existing job is returned.
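
A caller usually needs to tell three outcomes apart: a dry-run plan, a newly queued job, and an existing job returned for a reused idempotencyKey. The function below is a small sketch of that branching over the status code and parsed JSON body; the function name is hypothetical and the job id used in any call is a placeholder.

```python
def summarize_ingest_response(status_code, body):
    """Describe the outcome of POST /v1/ingest/url or /v1/ingest/file."""
    if status_code == 200 and body.get("mode") == "dry-run":
        return "plan only; nothing was persisted"
    if status_code == 202 and body.get("mode") == "queued":
        job = body["job"]
        # created: false means the idempotencyKey already had a job.
        origin = "new job" if body.get("created") else "existing job"
        return f"{origin} {job['id']} ({job['status']})"
    raise ValueError(f"unexpected ingest response: {status_code}")
```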

File ingest

Request

POST /v1/ingest/file
{
  "filename": "profile.md",
  "contentType": "text/markdown",
  "contentBase64": "IyBQcm9maWxlCg==",
  "dryRun": false,
  "distill": true,
  "datedAt": "2026-05-12"
}

Accepted content types include application/pdf, text/markdown, and text/plain. The file size cap is enforced by the API server; oversize uploads return 413 Payload Too Large.

Other fields mirror URL ingest: analyzeStyle, distillLimit, distillerProvider, distillerModel, idempotencyKey.
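
The contentBase64 field carries the file bytes as standard Base64 text. A minimal sketch of assembling the request body follows; the helper name is hypothetical, and the allow-list holds only the three content types named above, so it may be stricter than the server.

```python
import base64

# Only the types this page names; the server may accept others.
DOCUMENTED_TYPES = {"application/pdf", "text/markdown", "text/plain"}


def build_file_ingest_body(filename, content_type, data, **options):
    """Return a JSON-serialisable body for POST /v1/ingest/file (hypothetical helper)."""
    if content_type not in DOCUMENTED_TYPES:
        raise ValueError(f"undocumented content type: {content_type}")
    return {
        "filename": filename,
        "contentType": content_type,
        "contentBase64": base64.b64encode(data).decode("ascii"),
        **options,  # e.g. dryRun, distill, datedAt, idempotencyKey
    }
```

Encoding the bytes b"# Profile\n" this way produces IyBQcm9maWxlCg==, the value shown in the request example.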

Response

  • 200 OK (dry-run) → marrow-file-ingest-plan-v1
  • 202 Accepted (live) → marrow-ingest-job-v1 with mode: "queued"

Live file ingest persists the raw artefact first (Supabase Storage, R2, or local in dev), then queues the job.

List ingest jobs

GET /v1/ingest/jobs?limit=20
{
  "schemaVersion": "marrow-ingest-job-list-v1",
  "jobs": [
    { "id": "5d2a…", "kind": "url", "status": "succeeded", "createdAt": "…", "finishedAt": "…" }
  ],
  "pagination": { "limit": 20, "returned": 1, "hasMore": false, "nextCursor": null }
}
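
The pagination object suggests cursor-based paging. This page does not name the query parameter that carries the cursor, so the sketch below leaves the HTTP call to a caller-supplied fetch_page function (hypothetical) and only shows the loop over hasMore and nextCursor.

```python
def iter_jobs(fetch_page):
    """Yield every job, following nextCursor until hasMore is false.

    fetch_page(cursor) is a caller-supplied function (an assumption here)
    that returns one parsed marrow-ingest-job-list-v1 response; cursor is
    None for the first page.
    """
    cursor = None
    while True:
        page = fetch_page(cursor)
        yield from page["jobs"]
        pagination = page["pagination"]
        if not pagination["hasMore"] or pagination["nextCursor"] is None:
            return
        cursor = pagination["nextCursor"]
```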

Get a single job

GET /v1/ingest/jobs/{jobId}
{
  "schemaVersion": "marrow-ingest-job-v1",
  "job": {
    "id": "5d2a…",
    "kind": "url",
    "status": "succeeded",
    "attempts": 1,
    "sourceRunId": "f1a3…",
    "error": null,
    "createdAt": "…",
    "finishedAt": "…"
  }
}

Terminal statuses are succeeded, failed, and quarantined. quarantined means the run produced material that did not connect to any known anchor (profile owner, project, publication, organization, repository, or tool); those rows are kept under source_kind=quarantine and are not surfaced by read endpoints by default.
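
Since live ingest is asynchronous, a client typically polls the job until it reaches one of the terminal statuses. The following is a minimal polling sketch; get_job is a caller-supplied function (an assumption) that returns the parsed response of GET /v1/ingest/jobs/{jobId}, and the interval and poll limit are illustrative values, not documented recommendations.

```python
import time

TERMINAL_STATUSES = {"succeeded", "failed", "quarantined"}


def wait_for_job(get_job, job_id, *, interval=2.0, max_polls=30, sleep=time.sleep):
    """Poll until the job is terminal, then return the job object."""
    for attempt in range(max_polls):
        job = get_job(job_id)["job"]
        if job["status"] in TERMINAL_STATUSES:
            return job
        if attempt < max_polls - 1:
            sleep(interval)  # wait before the next poll
    raise TimeoutError(f"job {job_id} not terminal after {max_polls} polls")
```

A caller should still inspect the returned status, because failed and quarantined end the wait just as succeeded does.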

Errors

HTTP status  error.code              Reason
400          invalid_ingest_request  Live ingest missing distill: true or datedAt
401          unauthorized            Missing or invalid API key
403          forbidden               API key lacks scope ingest
404          not_found               Job id does not exist for this account
413          payload_too_large       File upload exceeded the configured size cap
422          validation_error        Body or query failed validation
429          rate_limit_exceeded     Rate limit hit
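
On the client side, the table above can be folded into a single exception type. The sketch below assumes the error body nests the code as {"error": {"code": ...}}, which is inferred from the error.code column heading rather than stated on this page; the class name and the decision to treat only 429 as retryable are also assumptions.

```python
class IngestApiError(Exception):
    """Error raised for a non-2xx ingest response (hypothetical client type)."""

    def __init__(self, status, code, retryable):
        super().__init__(f"{status} {code}")
        self.status = status
        self.code = code
        self.retryable = retryable


# Assumption: only rate limiting is worth retrying without changing the request.
RETRYABLE_STATUSES = {429}


def to_error(status_code, body):
    """Build an IngestApiError from a status code and parsed JSON body."""
    code = (body.get("error") or {}).get("code", "unknown")
    return IngestApiError(status_code, code, status_code in RETRYABLE_STATUSES)
```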

CLI mapping

npm run dev -- api ingest-url --dated-at 2026-05-12 https://example.com/profile
npm run dev -- api ingest-url --yes --dated-at 2026-05-12 https://example.com/profile
npm run dev -- api ingest-file --dated-at 2026-05-12 ./profile.md
npm run dev -- api ingest-file --yes --dated-at 2026-05-12 ./profile.md
npm run dev -- api ingest-jobs
npm run dev -- api ingest-job <job_id>