Job Lifecycle
Every job follows a predictable lifecycle, from submission through execution to completion or recovery.
Status Flow
The happy path:
pending → running → done
Error paths:
running → failed
running → cancelled
Rate limit recovery:
running → rate_limited → running (auto-retry) → done
Scheduled execution:
scheduled ⏰ → running (at scheduled time) → done
Status Reference
| Status | Meaning | Terminal? |
|---|---|---|
| pending | Job created in D1, waiting for Bifrost to pick up | No |
| scheduled | Job queued for future execution (has scheduledAt timestamp) | No |
| running | CLI process is executing. SSE output streams in real-time | No |
| done | CLI exited with code 0. Output saved to D1 | Yes |
| failed | CLI exited with non-zero code, or was killed by timeout/error | Yes |
| rate_limited | Hit API quota. Auto-retry queued if session ID captured | No (recovery) |
| cancelled | Manually stopped via UI Stop button or DELETE API | Yes |
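The status table and flows can be read as a small state machine. The following is a minimal sketch, assuming only the transitions drawn in the Status Flow section; the names `TRANSITIONS`, `isTerminal`, and `canTransition` are hypothetical and not taken from Bifrost's code.

```javascript
// Allowed next statuses, limited to the transitions shown in the flows above.
// Terminal statuses have no outgoing transitions.
const TRANSITIONS = new Map([
  ["pending", ["running"]],
  ["scheduled", ["running"]],
  ["running", ["done", "failed", "cancelled", "rate_limited"]],
  ["rate_limited", ["running"]],
  ["done", []],
  ["failed", []],
  ["cancelled", []],
]);

// Look up the outgoing transitions, rejecting statuses not in the table.
function nextStatuses(status) {
  const next = TRANSITIONS.get(status);
  if (next === undefined) {
    throw new Error(`Unknown status: ${status}`);
  }
  return next;
}

// A status is terminal when nothing can follow it.
function isTerminal(status) {
  return nextStatuses(status).length === 0;
}

// True when the flows above show a direct step from `from` to `to`.
function canTransition(from, to) {
  return nextStatuses(from).includes(to);
}
```

Under these assumptions, `isTerminal("rate_limited")` is false because the job can still return to running, which matches the "No (recovery)" entry in the table.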
Timeouts
Per-Type Timeout
Each job type has a maximum execution time. If exceeded, the job is automatically cancelled.
| Job Type | Timeout |
|---|---|
| session | 10 minutes |
| research | 10 minutes |
| workflow | 10 minutes |
| clara | 10 minutes |
| autopilot | 60 minutes |
Inactivity Timeout
If a job produces no output (stdout or stderr) for too long, it's killed as stuck.
| Backend | Inactivity Limit | Why |
|---|---|---|
| Gemini | 5 minutes | Gemini can silently hang on internal retries |
| Claude | 10 minutes | Claude sessions can have long thinking phases |
Inactivity vs total timeout: A job can be killed by either limit. A 10-minute session that produces output continuously will hit the total timeout. A Gemini session that produces no output for 5 minutes will hit the inactivity limit first.
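The interaction between the two limits can be sketched as a single check. This is an illustration only: the limit values come from the tables above, while the function name `checkTimeout`, the lowercase backend keys, the field names `startedAt` and `lastOutputAt`, and the order in which the two limits are tested are assumptions.

```javascript
const MINUTE_MS = 60000;

// Per-type total limits, from the Per-Type Timeout table.
const TYPE_TIMEOUT_MS = new Map([
  ["session", 10 * MINUTE_MS],
  ["research", 10 * MINUTE_MS],
  ["workflow", 10 * MINUTE_MS],
  ["clara", 10 * MINUTE_MS],
  ["autopilot", 60 * MINUTE_MS],
]);

// Per-backend inactivity limits, from the Inactivity Timeout table.
const INACTIVITY_LIMIT_MS = new Map([
  ["gemini", 5 * MINUTE_MS],
  ["claude", 10 * MINUTE_MS],
]);

// Returns "total", "inactivity", or null if the job may keep running.
// All times are epoch milliseconds.
function checkTimeout(job, now) {
  const totalLimit = TYPE_TIMEOUT_MS.get(job.type);
  const idleLimit = INACTIVITY_LIMIT_MS.get(job.backend);
  if (totalLimit === undefined || idleLimit === undefined) {
    throw new Error(`Unknown type or backend: ${job.type}/${job.backend}`);
  }
  if (now - job.startedAt >= totalLimit) return "total";
  if (now - job.lastOutputAt >= idleLimit) return "inactivity";
  return null;
}
```

For example, a Gemini session that starts at time 0 and never writes output would return "inactivity" once `now` reaches 5 minutes, well before the 10-minute total limit.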
Rate Limit & Auto-Retry
When a job hits an API rate limit, Bifrost handles recovery automatically:
Rate limit detected → CLI killed → Parse reset time → Queue auto-retry
How It Works
| # | Step |
|---|---|
| 1 | Rate limit event detected, either from an NDJSON rate_limit event (Claude) or from stderr pattern matching (Gemini) |
| 2 | CLI process killed (it's stuck at the limit anyway) |
| 3 | Reset time parsed from the error message (e.g. "resets 9pm (Asia/Kuala_Lumpur)") |
| 4 | Job status set to rate_limited with retry_at timestamp |
| 5 | Retry queued in retry-queue.js (in-memory, checked every 60s) |
| 6 | Portal shows countdown timer and Cancel Retry / Resume buttons |
| 7 | When retry_at is reached, Bifrost auto-resumes using resumeSessionId |
Quota reset tracking: Bifrost auto-updates the quotaResetTime in portal preferences, so the Settings panel always shows the latest reset time.
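Steps 4 to 7 describe an in-memory queue that is polled once a minute. The sketch below shows one way such a queue could look; it is not the contents of retry-queue.js, and the class name `RetryQueue` and its methods `enqueue`, `cancel`, and `takeDue` are hypothetical.

```javascript
// In-memory queue keyed by job ID. Entries are lost on restart,
// which is the usual trade-off of an in-memory design.
class RetryQueue {
  constructor() {
    this.entries = new Map();
  }

  // Register a job to resume at retryAt (epoch milliseconds).
  enqueue(jobId, retryAt, resumeSessionId) {
    this.entries.set(jobId, { retryAt, resumeSessionId });
  }

  // Drop a pending retry, e.g. when the user clicks Cancel Retry.
  // Returns true if an entry was removed.
  cancel(jobId) {
    return this.entries.delete(jobId);
  }

  // Remove and return every entry whose retryAt has been reached.
  takeDue(now) {
    const due = [];
    for (const [jobId, entry] of this.entries) {
      if (entry.retryAt <= now) {
        due.push({ jobId, ...entry });
        this.entries.delete(jobId);
      }
    }
    return due;
  }
}

// Polling would then look roughly like this (resumeJob is a placeholder):
// setInterval(() => queue.takeDue(Date.now()).forEach(resumeJob), 60000);
```

Because the queue is only checked every 60 seconds, a job may resume up to a minute after its retry_at timestamp.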
Session Resume
Both Claude and Gemini CLIs support session resume: continuing from where a previous session left off.
| Trigger | How It Works |
|---|---|
| Auto-retry (rate limit) | Bifrost captures session_id from NDJSON init event, then resumes with --resume flag |
| Manual resume | Click "Resume" on a rate-limited job in the portal. Uses the stored session_id |
| Explicit payload | Set resumeSessionId in the job payload to resume any previous session |
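Capturing the session ID from the NDJSON stream might look like the sketch below. It assumes the init event is a JSON object with `type` "system", `subtype` "init", and a string `session_id` field; that event shape and the function name `extractSessionId` are assumptions for illustration.

```javascript
// Scan NDJSON output line by line and return the first session_id
// found on a system/init event, or null if none is present.
function extractSessionId(ndjson) {
  for (const line of ndjson.split("\n")) {
    const trimmed = line.trim();
    if (trimmed === "") continue;

    let event;
    try {
      event = JSON.parse(trimmed);
    } catch {
      continue; // Skip lines that are not valid JSON (e.g. stray log output).
    }

    if (
      event !== null &&
      typeof event === "object" &&
      event.type === "system" &&
      event.subtype === "init" &&
      typeof event.session_id === "string"
    ) {
      return event.session_id;
    }
  }
  return null;
}
```

The returned value is what would be stored for later use as resumeSessionId. If no init event was seen, the function returns null, which corresponds to the case where no auto-retry can be queued.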
Scheduled Jobs
Jobs can be scheduled for future execution by setting scheduledAt in the payload.
| Field | Detail |
|---|---|
| Format | ISO 8601 timestamp (e.g. 2026-03-16T02:00:00+08:00) |
| Initial status | scheduled |
| Mechanism | Added to retry queue with retryAt set to scheduledAt. Checked every 60 seconds. |
| Channel message | ⏰ announcement posted to #dev with scheduled time |
Use case: Schedule a Clara sweep for 2 AM when API quota resets, or queue an autopilot
program for overnight execution.
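As a rough illustration of how scheduledAt determines the initial status, the sketch below validates the timestamp and derives the retryAt value. Only the scheduledAt field and the two status names come from this page; the function name `prepareJob` and the shape of its return value are hypothetical.

```javascript
// Decide the initial status for a new job from its payload.
// Jobs without scheduledAt start as pending; jobs with a valid
// ISO 8601 scheduledAt start as scheduled with retryAt set to it.
function prepareJob(payload) {
  if (payload.scheduledAt === undefined) {
    return { status: "pending", retryAt: null };
  }
  const retryAt = Date.parse(payload.scheduledAt); // epoch milliseconds
  if (Number.isNaN(retryAt)) {
    throw new Error(`Invalid scheduledAt: ${payload.scheduledAt}`);
  }
  return { status: "scheduled", retryAt };
}

// 02:00 at UTC+8 is 18:00 UTC on the previous day.
const job = prepareJob({ scheduledAt: "2026-03-16T02:00:00+08:00" });
console.log(job.status, new Date(job.retryAt).toISOString());
```

Including the UTC offset in the timestamp, as in the example format above, avoids ambiguity about which time zone "2 AM" refers to.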
Completion Metadata
When a job completes, Bifrost captures and reports:
| Metric | Source | Shown In |
|---|---|---|
| Cost (USD) | Claude result event | Job card, #dev channel |
| Turn count | Claude/Gemini result event | Job card, #dev channel |
| Duration | CLI duration_ms | Job card, #dev channel |
| Output text | Accumulated from assistant messages | SSE viewer, D1 (truncated to 10KB) |
| Session ID | NDJSON system/init event | D1 (for resume) |
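The 10KB truncation of output text could be done along the following lines. This sketch assumes "10KB" means 10 × 1024 bytes of UTF-8 and that a multi-byte character should never be split; neither detail is stated on this page, and the function name `truncateOutput` is hypothetical.

```javascript
const OUTPUT_LIMIT_BYTES = 10 * 1024; // assumed meaning of "10KB"

// Truncate text to at most limitBytes of UTF-8 without cutting
// through the middle of a multi-byte character.
function truncateOutput(text, limitBytes = OUTPUT_LIMIT_BYTES) {
  const bytes = new TextEncoder().encode(text);
  if (bytes.length <= limitBytes) return text;

  // If the first excluded byte is a continuation byte (10xxxxxx),
  // the cut falls inside a character, so back up to its start.
  let end = limitBytes;
  while (end > 0 && (bytes[end] & 0xc0) === 0x80) {
    end--;
  }
  return new TextDecoder().decode(bytes.subarray(0, end));
}
```

Measuring in bytes rather than characters keeps the stored size predictable when the output contains non-ASCII text.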