feat: comprehensive rate limit handling overhaul (inspired by opencode-antigravity-auth)
This commit addresses "Max retries exceeded" errors seen during stress testing, where all accounts became exhausted at the same time because short per-second rate limits triggered cascading failures.

## Rate Limit Parser (`rate-limit-parser.js`)

- Remove the 2s buffer enforcement that caused cascading failures when the API returned short reset times (200-600ms). A 200ms buffer is now added for resets under 500ms
- Add `parseRateLimitReason()` for smart backoff based on error type: QUOTA_EXHAUSTED, RATE_LIMIT_EXCEEDED, MODEL_CAPACITY_EXHAUSTED, SERVER_ERROR

## Message/Streaming Handlers

- Add per-account+model rate limit state tracking with exponential backoff
- For short rate limits (< 1 second), wait and retry on the same account instead of switching; this prevents a thundering herd when all accounts hit per-second limits
- Add throttle wait support for fallback modes (emergency/lastResort)
- Add `calculateSmartBackoff()` with progressive tiers by error type

## HybridStrategy (`hybrid-strategy.js`)

- Refactor `#getCandidates()` to return 4 fallback levels:
  - `normal`: All filters pass (health, tokens, quota)
  - `quota`: Bypass critical quota check
  - `emergency`: Bypass health check when ALL accounts are unhealthy
  - `lastResort`: Bypass BOTH health AND token bucket checks
- Add throttle wait times: 500ms for lastResort, 250ms for emergency
- Fix LRU calculation to use seconds (matches opencode-antigravity-auth)

## Health Tracker

- Increase `recoveryPerHour` from 2 to 10 for faster recovery (1 hour vs 5 hours)

## Account Manager

- Add consecutive failure tracking: `getConsecutiveFailures()`, `incrementConsecutiveFailures()`, `resetConsecutiveFailures()`
- Add a cooldown mechanism, separate from rate limits, with `CooldownReason`
- Reset consecutive failures on a successful request

## Base Strategy

- Add `isAccountCoolingDown()` check in `isAccountUsable()`

## Constants

- Replace fixed `CAPACITY_RETRY_DELAY_MS` with progressive `CAPACITY_BACKOFF_TIERS_MS`
- Add `BACKOFF_BY_ERROR_TYPE` for smart backoff
- Add `QUOTA_EXHAUSTED_BACKOFF_TIERS_MS` for progressive quota backoff
- Add `MIN_BACKOFF_MS` floor to prevent "Available in 0s" loops
- Increase `MAX_CAPACITY_RETRIES` from 3 to 5
- Reduce `RATE_LIMIT_DEDUP_WINDOW_MS` from 5s to 2s

## Frontend

- Remove `capacityRetryDelayMs` config (replaced by progressive tiers)
- Update default `maxCapacityRetries` display from 3 to 5

## Testing

- Add `tests/stress-test.cjs` for concurrent request stress testing

Co-Authored-By: Claude <noreply@anthropic.com>
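The commit message describes two account-selection behaviors without showing their code. The first is retrying on the same account for sub-second limits. The second is the four-level candidate cascade in `#getCandidates()`, which applies throttle waits. The block below is a minimal, hypothetical sketch of both and is not the project's implementation.

These parts come from the commit message:

- the level names,
- the 250ms and 500ms throttle waits,
- the 1-second same-account threshold,
- the 200ms buffer for resets under 500ms.

These parts are assumptions:

- the account fields (`isHealthy`, `hasTokens`, `isQuotaCritical`),
- the function names `getCandidates()` and `planRateLimitRetry()`,
- the rule that later levels keep the quota relaxation.

```javascript
// Hypothetical sketch: function names and account fields are illustrative assumptions.

// Throttle waits per fallback level (values from the commit message).
const THROTTLE_WAIT_MS = { normal: 0, quota: 0, emergency: 250, lastResort: 500 };

// Each level drops one more filter. Assumption: the quota relaxation carries over.
const LEVELS = [
  { name: "normal", health: true, tokens: true, quota: true },
  { name: "quota", health: true, tokens: true, quota: false },
  { name: "emergency", health: false, tokens: true, quota: false, onlyIfAllUnhealthy: true },
  { name: "lastResort", health: false, tokens: false, quota: false },
];

// Return the first non-empty candidate set plus the throttle wait for its level.
function getCandidates(accounts) {
  const allUnhealthy = accounts.every((a) => !a.isHealthy);
  for (const level of LEVELS) {
    // emergency only applies when every account is unhealthy
    if (level.onlyIfAllUnhealthy && !allUnhealthy) continue;
    const candidates = accounts.filter(
      (a) =>
        (!level.health || a.isHealthy) &&
        (!level.tokens || a.hasTokens) &&
        (!level.quota || !a.isQuotaCritical)
    );
    if (candidates.length > 0) {
      return { level: level.name, candidates, throttleWaitMs: THROTTLE_WAIT_MS[level.name] };
    }
  }
  return { level: null, candidates: [], throttleWaitMs: 0 };
}

// Short per-second limits: wait on the same account instead of switching.
const SAME_ACCOUNT_MAX_RESET_MS = 1000; // "< 1 second"
const SHORT_RESET_THRESHOLD_MS = 500;
const SHORT_RESET_BUFFER_MS = 200; // buffer added to sub-500ms resets

function planRateLimitRetry(resetMs) {
  const resetWaitMs =
    resetMs < SHORT_RESET_THRESHOLD_MS ? resetMs + SHORT_RESET_BUFFER_MS : resetMs;
  // sameAccount === false means the caller switches accounts; any switch delay
  // it applies is not modeled here.
  return { sameAccount: resetMs < SAME_ACCOUNT_MAX_RESET_MS, resetWaitMs };
}
```

Gating `emergency` on every account being unhealthy follows the commit's wording. When that gate fails and the stricter levels are empty, the sketch falls straight through to `lastResort`.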
```diff
@@ -103,16 +103,33 @@ export const MAX_ACCOUNTS = config?.maxAccounts || 10; // From config or 10
 // Rate limit wait thresholds
 export const MAX_WAIT_BEFORE_ERROR_MS = config?.maxWaitBeforeErrorMs || 120000; // From config or 2 minutes

-// Gap 1: Retry deduplication - prevents thundering herd on concurrent rate limits
-export const RATE_LIMIT_DEDUP_WINDOW_MS = config?.rateLimitDedupWindowMs || 5000; // 5 seconds
+// Retry deduplication - prevents thundering herd on concurrent rate limits
+export const RATE_LIMIT_DEDUP_WINDOW_MS = config?.rateLimitDedupWindowMs || 2000; // 2 seconds
 export const RATE_LIMIT_STATE_RESET_MS = config?.rateLimitStateResetMs || 120000; // 2 minutes - reset consecutive429 after inactivity
 export const FIRST_RETRY_DELAY_MS = config?.firstRetryDelayMs || 1000; // Quick 1s retry on first 429
 export const SWITCH_ACCOUNT_DELAY_MS = config?.switchAccountDelayMs || 5000; // Delay before switching accounts

-// Gap 2: Consecutive failure tracking - extended cooldown after repeated failures
+// Consecutive failure tracking - extended cooldown after repeated failures
 export const MAX_CONSECUTIVE_FAILURES = config?.maxConsecutiveFailures || 3;
 export const EXTENDED_COOLDOWN_MS = config?.extendedCooldownMs || 60000; // 1 minute

-// Gap 4: Capacity exhaustion - shorter retry for model capacity issues (not quota)
-export const CAPACITY_RETRY_DELAY_MS = config?.capacityRetryDelayMs || 2000; // 2 seconds
-export const MAX_CAPACITY_RETRIES = config?.maxCapacityRetries || 3;
+// Capacity exhaustion - progressive backoff tiers for model capacity issues
+export const CAPACITY_BACKOFF_TIERS_MS = config?.capacityBackoffTiersMs || [5000, 10000, 20000, 30000, 60000];
+export const MAX_CAPACITY_RETRIES = config?.maxCapacityRetries || 5;

+// Smart backoff by error type
+export const BACKOFF_BY_ERROR_TYPE = {
+    RATE_LIMIT_EXCEEDED: 30000, // 30 seconds
+    MODEL_CAPACITY_EXHAUSTED: 15000, // 15 seconds
+    SERVER_ERROR: 20000, // 20 seconds
+    UNKNOWN: 60000 // 1 minute
+};

+// Progressive backoff tiers for QUOTA_EXHAUSTED (60s, 5m, 30m, 2h)
+export const QUOTA_EXHAUSTED_BACKOFF_TIERS_MS = [60000, 300000, 1800000, 7200000];

+// Minimum backoff floor to prevent "Available in 0s" loops (matches opencode-antigravity-auth)
+export const MIN_BACKOFF_MS = 2000;

 // Thinking model constants
 export const MIN_SIGNATURE_LENGTH = 50; // Minimum valid thinking signature length
```
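The constants in the hunk above define the numbers but not how `calculateSmartBackoff()` combines them. The block below is a minimal sketch built on these assumptions:

- quota errors walk `QUOTA_EXHAUSTED_BACKOFF_TIERS_MS` by consecutive hits,
- the other error types take the flat values from `BACKOFF_BY_ERROR_TYPE`,
- capacity retries walk `CAPACITY_BACKOFF_TIERS_MS` up to `MAX_CAPACITY_RETRIES`,
- every result is floored at `MIN_BACKOFF_MS`.

The helpers `tierAt()` and `capacityRetryDelay()` and both function signatures are hypothetical.

```javascript
// Values copied from the constants hunk above (defaults only; some are
// config-overridable in the real module).
const BACKOFF_BY_ERROR_TYPE = {
  RATE_LIMIT_EXCEEDED: 30000,
  MODEL_CAPACITY_EXHAUSTED: 15000,
  SERVER_ERROR: 20000,
  UNKNOWN: 60000,
};
const QUOTA_EXHAUSTED_BACKOFF_TIERS_MS = [60000, 300000, 1800000, 7200000];
const CAPACITY_BACKOFF_TIERS_MS = [5000, 10000, 20000, 30000, 60000];
const MAX_CAPACITY_RETRIES = 5;
const MIN_BACKOFF_MS = 2000;

// Pick tier `index`, staying on the last tier once the list is exhausted.
function tierAt(tiers, index) {
  return tiers[Math.min(Math.max(index, 0), tiers.length - 1)];
}

// Hypothetical signature: `consecutiveHits` is the 0-based count of prior hits
// with the same reason on this account+model.
function calculateSmartBackoff(reason, consecutiveHits) {
  let backoffMs;
  if (reason === "QUOTA_EXHAUSTED") {
    backoffMs = tierAt(QUOTA_EXHAUSTED_BACKOFF_TIERS_MS, consecutiveHits);
  } else if (Object.prototype.hasOwnProperty.call(BACKOFF_BY_ERROR_TYPE, reason)) {
    backoffMs = BACKOFF_BY_ERROR_TYPE[reason];
  } else {
    backoffMs = BACKOFF_BY_ERROR_TYPE.UNKNOWN; // unrecognized reasons
  }
  return Math.max(backoffMs, MIN_BACKOFF_MS);
}

// Hypothetical helper: delay before capacity retry `retry` (0-based), or null
// once MAX_CAPACITY_RETRIES is used up. `tiers` mimics a config override.
function capacityRetryDelay(retry, tiers = CAPACITY_BACKOFF_TIERS_MS) {
  if (retry >= MAX_CAPACITY_RETRIES) return null;
  return Math.max(tierAt(tiers, retry), MIN_BACKOFF_MS);
}
```

With the default values the `MIN_BACKOFF_MS` floor never binds, because the smallest default (5s) already exceeds 2s. It only matters when `capacityBackoffTiersMs` is overridden with smaller values. For example, `capacityRetryDelay(0, [0, 1000])` returns 2000.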
```diff
@@ -258,10 +275,16 @@ export default {
     MAX_ACCOUNTS,
     MAX_WAIT_BEFORE_ERROR_MS,
     RATE_LIMIT_DEDUP_WINDOW_MS,
     RATE_LIMIT_STATE_RESET_MS,
     FIRST_RETRY_DELAY_MS,
     SWITCH_ACCOUNT_DELAY_MS,
     MAX_CONSECUTIVE_FAILURES,
     EXTENDED_COOLDOWN_MS,
-    CAPACITY_RETRY_DELAY_MS,
+    CAPACITY_BACKOFF_TIERS_MS,
     MAX_CAPACITY_RETRIES,
+    BACKOFF_BY_ERROR_TYPE,
+    QUOTA_EXHAUSTED_BACKOFF_TIERS_MS,
+    MIN_BACKOFF_MS,
     MIN_SIGNATURE_LENGTH,
     GEMINI_MAX_OUTPUT_TOKENS,
     GEMINI_SKIP_SIGNATURE,
```
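The export list above exposes `MAX_CONSECUTIVE_FAILURES` and `EXTENDED_COOLDOWN_MS`, which the constants file describes as an "extended cooldown after repeated failures". The commit message names the Account Manager methods but not their bodies, so the block below is a hypothetical sketch. These parts are assumptions:

- the `FailureTracker` class,
- the injectable clock,
- the `CooldownReason` value,
- the rule that the cooldown starts once the count reaches the threshold.

```javascript
// Defaults from the constants diff.
const MAX_CONSECUTIVE_FAILURES = 3;
const EXTENDED_COOLDOWN_MS = 60000;

// Enum value is an assumption; the commit only names `CooldownReason`.
const CooldownReason = Object.freeze({ CONSECUTIVE_FAILURES: "consecutive_failures" });

// Hypothetical per-account tracker; `now` is injectable for testing.
class FailureTracker {
  constructor(now = () => Date.now()) {
    this.now = now;
    this.failures = new Map(); // accountId -> consecutive failure count
    this.cooldowns = new Map(); // accountId -> { until, reason }
  }

  getConsecutiveFailures(accountId) {
    return this.failures.get(accountId) ?? 0;
  }

  // Returns the new count; starts an extended cooldown at the threshold.
  incrementConsecutiveFailures(accountId) {
    const count = this.getConsecutiveFailures(accountId) + 1;
    this.failures.set(accountId, count);
    if (count >= MAX_CONSECUTIVE_FAILURES) {
      this.cooldowns.set(accountId, {
        until: this.now() + EXTENDED_COOLDOWN_MS,
        reason: CooldownReason.CONSECUTIVE_FAILURES,
      });
    }
    return count;
  }

  // Called after a successful request.
  resetConsecutiveFailures(accountId) {
    this.failures.delete(accountId);
  }

  // Expired cooldowns are cleared lazily on lookup.
  isAccountCoolingDown(accountId) {
    const cooldown = this.cooldowns.get(accountId);
    if (!cooldown) return false;
    if (this.now() >= cooldown.until) {
      this.cooldowns.delete(accountId);
      return false;
    }
    return true;
  }
}

// Hypothetical stand-in for the strategy's isAccountUsable(); other checks omitted.
function isAccountUsable(tracker, accountId) {
  return !tracker.isAccountCoolingDown(accountId);
}
```

Keeping the cooldown in its own map, rather than reusing rate-limit state, matches the commit's note that the cooldown is "separate from rate limits". In this sketch, a successful request resets the failure count but leaves an active cooldown in place.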