🔧 Troubleshooting
Common errors, error detection patterns, and debugging strategies for Bifrost jobs.
Error Types
| Error | Status Set | Recovery |
| --- | --- | --- |
| Rate Limit | rate_limited | Auto-retry when quota resets. Can cancel or manually resume. |
| Model Error | failed | Check model name. Resubmit with correct model. |
| Auth Error | failed | Check API key / credentials. Restart Bifrost after fixing .env. |
| Timeout | failed (cancelled) | Reduce scope or use autopilot type (60 min). Check for stuck loops. |
| Inactivity | failed (cancelled) | Agent is stuck. Check prompt clarity. Gemini: 5 min, Claude: 10 min. |
| Spawn Error | failed | CLI binary not found. Check CLAUDE_PATH / GEMINI_PATH in .env. |
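The inactivity rule in the table can be pictured as a simple elapsed-time check. This is a minimal sketch, not Bifrost's actual watchdog: the names `INACTIVITY_LIMITS` and `is_inactive` are hypothetical, and treating the limit as strictly "greater than" is an assumption.

```python
import time
from typing import Optional

# Inactivity limits from the table above, in seconds.
INACTIVITY_LIMITS = {"gemini": 5 * 60, "claude": 10 * 60}


def is_inactive(agent: str, last_output_at: float, now: Optional[float] = None) -> bool:
    """Return True if the agent has been silent for longer than its limit.

    `last_output_at` and `now` are timestamps in seconds on the same clock
    (hypothetical helper; the strict comparison is an assumption).
    """
    if now is None:
        now = time.monotonic()
    return (now - last_output_at) > INACTIVITY_LIMITS[agent]
```

A job that last produced output 301 seconds ago would count as inactive for Gemini but not for Claude under these limits.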
Gemini Error Detection
Bifrost actively monitors Gemini CLI stderr for known error patterns and handles them proactively.
| Error Type | Detection Patterns |
| --- | --- |
| Rate Limit | 429, RESOURCE_EXHAUSTED, quota exceeded, too many requests, retryWithBackoff |
| Model Error | ModelNotFoundError, model not found, invalid model |
| Auth Error | PERMISSION_DENIED, API key not valid, UNAUTHENTICATED, authentication failed |
Why stderr? Gemini CLI writes error messages to stderr, not NDJSON stdout. Bifrost parses
both streams separately: NDJSON from stdout, error detection from stderr.
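To reproduce the classification by hand while reading logs, the pattern table can be applied to a single stderr line roughly as follows. This is a sketch under assumptions, not Bifrost's implementation: the function name, the returned labels, the case-insensitive substring matching, and the precedence order (rate limit first) are all hypothetical.

```python
from typing import Optional

# Patterns copied from the detection table above.
# Labels and ordering are illustrative assumptions.
ERROR_PATTERNS = [
    ("rate_limit", ["429", "RESOURCE_EXHAUSTED", "quota exceeded",
                    "too many requests", "retryWithBackoff"]),
    ("model_error", ["ModelNotFoundError", "model not found", "invalid model"]),
    ("auth_error", ["PERMISSION_DENIED", "API key not valid",
                    "UNAUTHENTICATED", "authentication failed"]),
]


def classify_stderr(line: str) -> Optional[str]:
    """Return the first matching error label for a stderr line, or None."""
    lowered = line.lower()
    for label, patterns in ERROR_PATTERNS:
        # Case-insensitive substring match (an assumption about matching rules).
        if any(pattern.lower() in lowered for pattern in patterns):
            return label
    return None
```

Plain substring matching is deliberately naive: a bare `429` would also match unrelated numbers, so a stricter matcher may be needed in practice.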
Common Issues
1. Bifrost shows offline (red indicator)
| Check | Fix |
| --- | --- |
| PM2 status | pm2 status (is bifrost-api "online"?) |
| Heartbeat | Heartbeat POSTs to /api/remote-jobs?heartbeat=1 every 30s. Check D1 for recent heartbeats. |
| Startup guard | PM2 sets PM2_HOME env var. The server checks isDirectRun; if it doesn't detect PM2, it won't start. Check server.js. |
| Port conflict | Default port 4003. Check if another process is using it: netstat -an \| findstr 4003 |
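On machines without `findstr`, the port check can be approximated with a short script. This is a sketch only: `port_in_use` is a hypothetical helper, and it reports whether something accepts TCP connections on the port, not which process owns it.

```python
import socket


def port_in_use(port: int = 4003, host: str = "127.0.0.1", timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an error code otherwise.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    print("port 4003 in use:", port_in_use())
```

If this reports the port as in use while `pm2 status` shows bifrost-api stopped, another process is probably holding the port.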
2. Job stays "pending" forever
| Check | Fix |
| --- | --- |
| Bifrost running? | Check Bifrost health indicator. If offline, start it. |
| Polling working? | Bifrost polls for pending jobs. Check Bifrost logs for poll activity. |
| Job in D1? | GET /api/remote-jobs?status=pending to verify the job exists. |
3. Job fails immediately
| Check | Fix |
| --- | --- |
| CLI binary exists? | Verify CLAUDE_PATH or GEMINI_PATH points to a valid executable. |
| API key set? | Gemini needs GEMINI_API_KEY. Claude needs its own auth setup. |
| Invalid model? | Check the model name matches exactly (e.g. gemini-3.1-pro-preview, not gemini-pro). |
| Job type valid? | Must be one of: session, research, workflow, clara, autopilot. |
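The job type and model checks can be run before submitting a job. The sketch below is illustrative: `validate_job` and `known_models` are hypothetical, and the list of acceptable model names must be supplied by the caller because this page does not enumerate them.

```python
from typing import List, Set

# Job types listed in the table above.
VALID_JOB_TYPES = {"session", "research", "workflow", "clara", "autopilot"}


def validate_job(job_type: str, model: str, known_models: Set[str]) -> List[str]:
    """Return a list of problems; an empty list means both checks pass.

    `known_models` is a caller-supplied set of exact model names
    (hypothetical; Bifrost's real model list is not shown here).
    """
    problems = []
    if job_type not in VALID_JOB_TYPES:
        problems.append(f"invalid job type: {job_type!r}")
    if model not in known_models:
        # Exact match only: "gemini-pro" does not match "gemini-3.1-pro-preview".
        problems.append(f"unknown model: {model!r}")
    return problems
```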
4. Job output is empty
| Check | Fix |
| --- | --- |
| NDJSON parsing | Output comes from assistant NDJSON events. If CLI isn't producing NDJSON, check the --output-format stream-json flag. |
| SSE connection | SSE viewer connects to /jobs/:id/stream. Check browser console for connection errors. |
| Prompt issue | The agent may have no actionable work. Check the prompt in the job's channel messages. |
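To check whether captured CLI output contains any assistant events at all, a small filter like the one below can help. It is a sketch under assumptions: it presumes each event is a JSON object with a `type` field equal to `assistant`, and it makes no claim about the other fields. `assistant_events` is a hypothetical name.

```python
import json
from typing import Iterable, List


def assistant_events(lines: Iterable[str]) -> List[dict]:
    """Collect NDJSON objects whose "type" is "assistant" (assumed field name)."""
    events = []
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        try:
            event = json.loads(raw)
        except json.JSONDecodeError:
            # Not NDJSON, e.g. plain-text output when the flag is missing.
            continue
        if isinstance(event, dict) and event.get("type") == "assistant":
            events.append(event)
    return events
```

An empty result on non-empty output suggests the CLI is emitting plain text rather than NDJSON, which points back to the output-format flag.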
5. Rate limit but no auto-retry
| Check | Fix |
| --- | --- |
| Session ID captured? | Auto-retry needs a session_id from the CLI's init event. If it fails before init, no session ID is available. |
| Reset time parsed? | The reset time parser looks for patterns like "resets 9pm". If the error message format changes, parsing may fail. |
| Retry queue running? | The scheduler checks every 60 seconds. Verify in Bifrost logs: [retry] Fired N retry(s). |
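The fragility of reset-time parsing is easier to see with a concrete parser. The following is a minimal sketch, not Bifrost's parser: the regular expression, the function name, and the choice to interpret the time in the local timezone of `now` are assumptions based only on the "resets 9pm" example.

```python
import re
from datetime import datetime, timedelta
from typing import Optional

# Matches "resets 9pm" or "resets 9:30 PM" (assumed formats).
RESET_RE = re.compile(r"resets\s+(\d{1,2})(?::(\d{2}))?\s*(am|pm)", re.IGNORECASE)


def parse_reset_time(message: str, now: datetime) -> Optional[datetime]:
    """Return the next occurrence of the reset time after `now`, or None."""
    match = RESET_RE.search(message)
    if match is None:
        return None  # message format changed: nothing to schedule
    hour = int(match.group(1))
    minute = int(match.group(2) or 0)
    if not 1 <= hour <= 12 or minute > 59:
        return None
    # Convert 12-hour clock to 24-hour clock (12am -> 0, 12pm -> 12).
    hour = hour % 12 + (12 if match.group(3).lower() == "pm" else 0)
    reset = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if reset <= now:
        reset += timedelta(days=1)  # time already passed today
    return reset
```

Any message that does not contain the expected phrase returns `None`, which corresponds to the "no auto-retry" symptom described above.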
Debugging Tips
| Tool | How to Use |
| --- | --- |
| Bifrost logs | pm2 logs bifrost-api shows all console output including spawn commands, NDJSON events, errors. |
| Health endpoint | GET bifrost.mipos.io:4003/health returns uptime, active job count, version. |
| Job channels | Check #job-{id} channel in Channels tab; it has start, progress, and completion messages. |
| D1 records | GET /api/remote-jobs?id={id} returns the full job record including status, error, output, timestamps. |
| SSE direct | curl bifrost.mipos.io:4003/jobs/{id}/stream gives the raw SSE stream for debugging output issues. |
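When inspecting a raw stream saved from the curl command, it can help to extract just the `data:` payloads. The sketch below follows the standard server-sent events framing (events end at a blank line, lines starting with `:` are comments); `sse_data` is a hypothetical helper, and the payload content of Bifrost's stream is not assumed.

```python
from typing import Iterable, Iterator


def sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each complete SSE event in `lines`."""
    buffer = []
    for line in lines:
        line = line.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if buffer:
                yield "\n".join(buffer)
                buffer = []
        elif line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        elif line.startswith("data:"):
            value = line[5:]
            # A single leading space after the colon is not part of the payload.
            buffer.append(value[1:] if value.startswith(" ") else value)
        # Other fields (event:, id:, retry:) are ignored in this sketch.
    # An event not terminated by a blank line is dropped, as in the SSE spec.
```

For example, `list(sse_data(open("stream.txt")))` on a saved capture returns one string per complete event (the file name is illustrative).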
Recovery Playbook
| Scenario | Action |
| --- | --- |
| Job stuck running | Click Stop button in Remote tab, or DELETE /jobs/{id} |
| Rate limited, want to retry now | Cancel the queued retry, then manually resume with the session ID |
| Bifrost crashed mid-job | Restart Bifrost. In-memory jobs are lost. PATCH the stuck D1 record to failed |
| Wrong model selected | Cancel the job. Resubmit with the correct model in the payload |
| Job produced partial output | Check the #job channel for progress messages. Resume from checkpoint (autopilot) or session ID (session) |