feat: fall back to alternate model when max retries are exceeded with 5xx errors

When all accounts fail with HTTP 500/503 errors (e.g., Google API returning
'Unknown Error' for Claude models on large conversations), the proxy now
attempts to use a fallback model if --fallback is enabled.

This enables graceful degradation when:
- All accounts are exhausted due to 5xx errors (not just rate limits)
- Claude models fail on very large conversations
- The API has temporary issues with specific models

The fallback uses the existing MODEL_FALLBACK_MAP configuration:
- claude-opus-4-5-thinking -> gemini-3-pro-high
- claude-sonnet-4-5-thinking -> gemini-3-flash
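
The lookup behind this mapping might look roughly like the sketch below. The object shape of MODEL_FALLBACK_MAP and the null return for unmapped models are assumptions made for illustration; only the two model pairs and the getFallbackModel name come from this commit.

```javascript
// Assumed shape: the real MODEL_FALLBACK_MAP may be defined differently.
const MODEL_FALLBACK_MAP = {
    'claude-opus-4-5-thinking': 'gemini-3-pro-high',
    'claude-sonnet-4-5-thinking': 'gemini-3-flash',
};

// Returns the configured fallback model, or null when none is mapped
// (the null convention is an assumption of this sketch).
function getFallbackModel(model) {
    return Object.hasOwn(MODEL_FALLBACK_MAP, model)
        ? MODEL_FALLBACK_MAP[model]
        : null;
}
```

Because the fallback targets have no entries of their own, a lookup on a fallback model yields nothing, so a chain of fallbacks cannot form from this map alone.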

Relates to #88
Tiago Rodrigues
2026-01-10 12:08:53 +00:00
parent ce2cb72563
commit 0b477c2552
2 changed files with 21 additions and 0 deletions


@@ -285,6 +285,17 @@ export async function* sendMessageStream(anthropicRequest, accountManager, fallb
        }
    }
    // All retries exhausted - try fallback model if enabled
    if (fallbackEnabled) {
        const fallbackModel = getFallbackModel(model);
        if (fallbackModel) {
            logger.warn(`[CloudCode] All retries exhausted for ${model}. Attempting fallback to ${fallbackModel} (streaming)`);
            const fallbackRequest = { ...anthropicRequest, model: fallbackModel };
            yield* sendMessageStream(fallbackRequest, accountManager, false); // Disable fallback for recursive call
            return;
        }
    }
    throw new Error('Max retries exceeded');
}
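
The control flow of this hunk can be tried in isolation with a small stand-in. The sketch below is not the proxy's code: callUpstream, streamWithFallback, the FALLBACKS map, and the model names are all hypothetical, and the simulated upstream always fails before emitting any chunk, which sidesteps the question of partially streamed output.

```javascript
// Hypothetical one-entry fallback map for the demo.
const FALLBACKS = { 'primary-model': 'backup-model' };

// Stand-in for the upstream call: the primary model always fails with a
// 5xx-style error before yielding; any other model yields a single chunk.
async function* callUpstream(request) {
    if (request.model === 'primary-model') {
        throw new Error('HTTP 503');
    }
    yield `chunk from ${request.model}`;
}

async function* streamWithFallback(request, fallbackEnabled, maxRetries = 2) {
    for (let attempt = 0; attempt < maxRetries; attempt++) {
        try {
            yield* callUpstream(request);
            return; // Stream completed successfully.
        } catch (err) {
            // Swallow the error and move on to the next attempt.
        }
    }
    // All retries exhausted: delegate once to the fallback model.
    if (fallbackEnabled) {
        const fallbackModel = FALLBACKS[request.model];
        if (fallbackModel) {
            const fallbackRequest = { ...request, model: fallbackModel };
            // Passing false keeps the recursion at most one level deep.
            yield* streamWithFallback(fallbackRequest, false, maxRetries);
            return;
        }
    }
    throw new Error('Max retries exceeded');
}
```

Consuming `streamWithFallback({ model: 'primary-model' }, true)` with `for await` produces the backup model's chunk, while passing `false` surfaces the 'Max retries exceeded' error. Delegating with `yield*` forwards the fallback's chunks to the original consumer unchanged, so callers need no awareness that a different model answered.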