🔄 Job Lifecycle

Every job follows a predictable lifecycle, from submission through execution to completion or recovery.

Status Flow

The happy path:

pending → running → done ✓

Error paths:

running → failed ✗
running → cancelled 🛑

Rate limit recovery:

running → rate_limited → (auto-retry) → running → done ✓

Scheduled execution:

scheduled ⏰ → (at scheduled time) → running → done ✓

Status Reference

| Status | Meaning | Terminal? |
| --- | --- | --- |
| pending | Job created in D1, waiting for Bifrost to pick it up | No |
| scheduled | Job queued for future execution (has a scheduledAt timestamp) | No |
| running | CLI process is executing; output streams over SSE in real time | No |
| done | CLI exited with code 0; output saved to D1 | Yes ✓ |
| failed | CLI exited with a non-zero code, or was killed by a timeout or error | Yes ✗ |
| rate_limited | Hit the API quota; auto-retry is queued if a session ID was captured | No (recovery) |
| cancelled | Manually stopped via the UI Stop button or the DELETE API | Yes 🛑 |
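
The flows and table above can be read as a small state machine. The following is a minimal sketch of that reading, not Bifrost's actual code: the names TRANSITIONS, isTerminal, and canTransition are hypothetical, and only the transitions drawn in the status flows are included (for example, cancelling a rate-limited job is not modelled because the flows do not show it).

```javascript
// Allowed transitions, transcribed from the status flows above.
// Illustrative only: this map is an assumption, not Bifrost's source.
const TRANSITIONS = {
  pending: ["running"],
  scheduled: ["running"],
  running: ["done", "failed", "cancelled", "rate_limited"],
  rate_limited: ["running"],
  done: [],
  failed: [],
  cancelled: [],
};

// A known status is terminal when no transition leaves it.
function isTerminal(status) {
  const next = TRANSITIONS[status];
  return Array.isArray(next) && next.length === 0;
}

// True when the flows above allow moving from one status to another.
function canTransition(from, to) {
  return (TRANSITIONS[from] || []).includes(to);
}

console.log(isTerminal("done")); // true
console.log(canTransition("rate_limited", "running")); // true
```

Keeping the transitions in one map makes it possible to reject an invalid status update (such as done back to running) before it is written.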

Timeouts

Per-Type Timeout

Each job type has a maximum total execution time. If a job exceeds it, Bifrost kills the CLI process and the job ends in the failed status.

| Job Type | Timeout |
| --- | --- |
| session | 10 minutes |
| research | 10 minutes |
| workflow | 10 minutes |
| clara | 10 minutes |
| autopilot | 60 minutes |
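
As a rough illustration, the table above could be expressed as a lookup in milliseconds. This is a sketch under assumptions: TOTAL_TIMEOUT_MS, totalTimeoutMs, and hasExceededTotalTimeout are hypothetical names, and only the durations come from the table.

```javascript
const MINUTE_MS = 60 * 1000;

// Per-type limits from the table above, in milliseconds.
const TOTAL_TIMEOUT_MS = {
  session: 10 * MINUTE_MS,
  research: 10 * MINUTE_MS,
  workflow: 10 * MINUTE_MS,
  clara: 10 * MINUTE_MS,
  autopilot: 60 * MINUTE_MS,
};

// Look up the limit for a job type; unknown types are rejected
// rather than silently given a default.
function totalTimeoutMs(jobType) {
  if (!Object.hasOwn(TOTAL_TIMEOUT_MS, jobType)) {
    throw new Error(`Unknown job type: ${jobType}`);
  }
  return TOTAL_TIMEOUT_MS[jobType];
}

// True once the job has been running for at least its limit.
function hasExceededTotalTimeout(jobType, startedAtMs, nowMs) {
  return nowMs - startedAtMs >= totalTimeoutMs(jobType);
}

console.log(totalTimeoutMs("autopilot")); // 3600000
```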

Inactivity Timeout

If a job produces no output (stdout or stderr) for too long, it's killed as stuck.

| Backend | Inactivity Limit | Why |
| --- | --- | --- |
| Gemini | 5 minutes | Gemini can silently hang on internal retries |
| Claude | 10 minutes | Claude sessions can have long thinking phases |

Inactivity vs total timeout: A job can be killed by either limit, whichever is reached first. A session that produces output continuously still hits the 10-minute total timeout. A Gemini session that goes silent for 5 minutes is killed by the inactivity limit before the total timeout applies.
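
The interaction between the two limits can be sketched as a single check that a watchdog might run periodically. This is an illustration only: timeoutReason and its parameter names are hypothetical, and the order in which Bifrost evaluates the two limits is an assumption.

```javascript
const MINUTE_MS = 60 * 1000;

// Inactivity limits from the table above, keyed by backend.
const INACTIVITY_LIMIT_MS = {
  gemini: 5 * MINUTE_MS,
  claude: 10 * MINUTE_MS,
};

// Returns "total", "inactivity", or null when neither limit is reached.
// Assumption: the total limit is checked first when both have expired.
function timeoutReason({ backend, totalLimitMs, startedAtMs, lastOutputAtMs, nowMs }) {
  if (!Object.hasOwn(INACTIVITY_LIMIT_MS, backend)) {
    throw new Error(`Unknown backend: ${backend}`);
  }
  if (nowMs - startedAtMs >= totalLimitMs) return "total";
  if (nowMs - lastOutputAtMs >= INACTIVITY_LIMIT_MS[backend]) return "inactivity";
  return null;
}

// A Gemini session that has printed nothing for 5 minutes:
console.log(timeoutReason({
  backend: "gemini",
  totalLimitMs: 10 * MINUTE_MS,
  startedAtMs: 0,
  lastOutputAtMs: 0,
  nowMs: 5 * MINUTE_MS,
})); // inactivity
```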

Rate Limit & Auto-Retry

When a job hits an API rate limit, Bifrost handles recovery automatically:

Rate limit detected → CLI killed → Parse reset time → Queue auto-retry

How It Works

| # | Step |
| --- | --- |
| 1 | Rate limit event detected, either from an NDJSON rate_limit event (Claude) or from stderr pattern matching (Gemini) |
| 2 | CLI process killed (it is stuck at the limit anyway) |
| 3 | Reset time parsed from the error message (e.g. "resets 9pm (Asia/Kuala_Lumpur)") |
| 4 | Job status set to rate_limited with a retry_at timestamp |
| 5 | Retry queued in retry-queue.js (in-memory, checked every 60s) |
| 6 | Portal shows a countdown timer and Cancel Retry / Resume buttons |
| 7 | When retry_at is reached, Bifrost auto-resumes using resumeSessionId |

Quota reset tracking: Bifrost auto-updates the quotaResetTime in portal preferences, so the Settings panel always shows the latest reset time.
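
Step 3 can be illustrated with a small parser for the one message shape quoted above. This is a hedged sketch: parseResetHint is a hypothetical name, the regular expression is inferred from that single example rather than from Bifrost's source, and other message formats may exist.

```javascript
// Extracts the clock time and IANA time zone from a message such as
// "resets 9pm (Asia/Kuala_Lumpur)". Returns null when nothing matches.
// Assumption: the pattern is guessed from the single documented example.
function parseResetHint(message) {
  const match = /resets\s+(\d{1,2})(?::(\d{2}))?\s*(am|pm)\s*\(([^)]+)\)/i.exec(message);
  if (!match) return null;

  const hour12 = Number(match[1]);
  if (hour12 < 1 || hour12 > 12) return null;

  const minute = match[2] ? Number(match[2]) : 0;
  const isPm = match[3].toLowerCase() === "pm";
  // 12am maps to 0 and 12pm maps to 12; every other hour shifts by 12 for pm.
  const hour24 = (hour12 % 12) + (isPm ? 12 : 0);

  return { hour24, minute, timeZone: match[4] };
}

console.log(parseResetHint("resets 9pm (Asia/Kuala_Lumpur)"));
// { hour24: 21, minute: 0, timeZone: 'Asia/Kuala_Lumpur' }
```

Turning this result into an absolute retry_at timestamp also requires resolving the time zone offset for the relevant date, which this sketch leaves out.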

Session Resume

Both the Claude and Gemini CLIs support session resume: continuing from where a previous session left off.

| Trigger | How It Works |
| --- | --- |
| Auto-retry (rate limit) | Bifrost captures session_id from the NDJSON init event, then resumes with the --resume flag |
| Manual resume | Click "Resume" on a rate-limited job in the portal; uses the stored session_id |
| Explicit payload | Set resumeSessionId in the job payload to resume any previous session |
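
A minimal sketch of the capture-and-resume idea follows. Several details are assumptions: the init event is taken to look like {"type":"system","subtype":"init","session_id":"..."}, the session ID is taken to follow --resume as a separate argument, and extractSessionId, withResume, and the placeholder base arguments are hypothetical.

```javascript
// Scan NDJSON output (one JSON object per line) for the init event
// and return its session_id, or null if none is found.
// Assumption: the event shape below is inferred, not confirmed.
function extractSessionId(ndjson) {
  for (const line of ndjson.split("\n")) {
    if (line.trim() === "") continue;
    let event;
    try {
      event = JSON.parse(line);
    } catch {
      continue; // skip lines that are not valid JSON
    }
    if (
      event !== null &&
      typeof event === "object" &&
      event.type === "system" &&
      event.subtype === "init" &&
      typeof event.session_id === "string"
    ) {
      return event.session_id;
    }
  }
  return null;
}

// Append the resume flag only when a session ID is available.
function withResume(baseArgs, sessionId) {
  return sessionId ? [...baseArgs, "--resume", sessionId] : [...baseArgs];
}

const output = '{"type":"system","subtype":"init","session_id":"abc-123"}\n{"type":"assistant"}';
console.log(withResume(["<existing args>"], extractSessionId(output)));
// [ '<existing args>', '--resume', 'abc-123' ]
```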

Scheduled Jobs

Jobs can be scheduled for future execution by setting scheduledAt in the payload.

| Field | Detail |
| --- | --- |
| Format | ISO 8601 timestamp (e.g. 2026-03-16T02:00:00+08:00) |
| Initial status | scheduled |
| Mechanism | Added to the retry queue with retryAt set to scheduledAt; the queue is checked every 60 seconds |
| Channel message | ⏰ announcement posted to #dev with the scheduled time |

Use case: Schedule a Clara sweep for 2 AM when the API quota resets, or queue an autopilot program for overnight execution.
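
The mechanism described above (an in-memory queue keyed by retryAt and polled on an interval) might look roughly like this. It is a sketch, not the contents of retry-queue.js: RetryQueue, takeDue, and scheduleJob are hypothetical names.

```javascript
// Minimal in-memory queue: entries become due once retryAt has passed.
class RetryQueue {
  constructor() {
    this.entries = [];
  }

  add(jobId, retryAtMs) {
    this.entries.push({ jobId, retryAtMs });
  }

  // Remove and return the IDs of every job due at or before nowMs.
  takeDue(nowMs) {
    const due = this.entries.filter((e) => e.retryAtMs <= nowMs);
    this.entries = this.entries.filter((e) => e.retryAtMs > nowMs);
    return due.map((e) => e.jobId);
  }
}

// Queue a job for its scheduledAt time and report its initial status.
function scheduleJob(queue, jobId, payload) {
  const retryAtMs = Date.parse(payload.scheduledAt);
  if (Number.isNaN(retryAtMs)) {
    throw new Error(`Invalid scheduledAt: ${payload.scheduledAt}`);
  }
  queue.add(jobId, retryAtMs);
  return "scheduled";
}

const queue = new RetryQueue();
scheduleJob(queue, "job-1", { scheduledAt: "2026-03-16T02:00:00+08:00" });

// A poller would then run once per minute, for example:
// setInterval(() => queue.takeDue(Date.now()).forEach(startJob), 60 * 1000);
// where startJob is whatever launches the CLI (hypothetical here).
```

Because the queue lives in memory, entries in a design like this would not survive a process restart unless they are rebuilt from stored job records.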

Completion Metadata

When a job completes, Bifrost captures and reports:

| Metric | Source | Shown In |
| --- | --- | --- |
| Cost (USD) | Claude result event | Job card, #dev channel |
| Turn count | Claude/Gemini result event | Job card, #dev channel |
| Duration | CLI duration_ms | Job card, #dev channel |
| Output text | Accumulated from assistant messages | SSE viewer, D1 (truncated to 10KB) |
| Session ID | NDJSON system/init event | D1 (for resume) |
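
The 10KB cap on stored output can be sketched as a byte-aware truncation. This is an illustration under assumptions: truncateToBytes is a hypothetical name, 10KB is taken to mean 10 × 1024 bytes of UTF-8, and whether Bifrost counts bytes or characters is not stated above.

```javascript
// Truncate text so its UTF-8 encoding fits within maxBytes,
// without cutting a multi-byte character in half.
// Assumption: 10KB means 10 * 1024 bytes.
function truncateToBytes(text, maxBytes = 10 * 1024) {
  let used = 0;
  let out = "";
  for (const ch of text) { // iterates by Unicode code point
    const size = Buffer.byteLength(ch, "utf8");
    if (used + size > maxBytes) break;
    used += size;
    out += ch;
  }
  return out;
}

console.log(truncateToBytes("a".repeat(20000)).length); // 10240
```

Truncating on code-point boundaries avoids storing a trailing partial character, which a plain byte slice could produce.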