feat: comprehensive rate limit handling overhaul (inspired by opencode-antigravity-auth)

This commit addresses "Max retries exceeded" errors during stress testing where
all accounts would become exhausted simultaneously due to short per-second rate
limits triggering cascading failures.

## Rate Limit Parser (`rate-limit-parser.js`)
- Remove 2s buffer enforcement that caused cascading failures when API returned
  short reset times (200-600ms). Now adds 200ms buffer for sub-500ms resets
- Add `parseRateLimitReason()` for smart backoff based on error type:
  QUOTA_EXHAUSTED, RATE_LIMIT_EXCEEDED, MODEL_CAPACITY_EXHAUSTED, SERVER_ERROR

## Message/Streaming Handlers
- Add per-account+model rate limit state tracking with exponential backoff
- For short rate limits (< 1 second), wait and retry on same account instead
  of switching - prevents thundering herd when all accounts hit per-second limits
- Add throttle wait support for fallback modes (emergency/lastResort)
- Add `calculateSmartBackoff()` with progressive tiers by error type

## HybridStrategy (`hybrid-strategy.js`)
- Refactor `#getCandidates()` to return 4 fallback levels:
  - `normal`: All filters pass (health, tokens, quota)
  - `quota`: Bypass critical quota check
  - `emergency`: Bypass health check when ALL accounts unhealthy
  - `lastResort`: Bypass BOTH health AND token bucket checks
- Add throttle wait times: 500ms for lastResort, 250ms for emergency
- Fix LRU calculation to use seconds (matches opencode-antigravity-auth)

## Health Tracker
- Increase `recoveryPerHour` from 2 to 10 for faster recovery (1 hour vs 5 hours)

## Account Manager
- Add consecutive failure tracking: `getConsecutiveFailures()`,
  `incrementConsecutiveFailures()`, `resetConsecutiveFailures()`
- Add cooldown mechanism separate from rate limits with `CooldownReason`
- Reset consecutive failures on successful request

## Base Strategy
- Add `isAccountCoolingDown()` check in `isAccountUsable()`

## Constants
- Replace fixed `CAPACITY_RETRY_DELAY_MS` with progressive `CAPACITY_BACKOFF_TIERS_MS`
- Add `BACKOFF_BY_ERROR_TYPE` for smart backoff
- Add `QUOTA_EXHAUSTED_BACKOFF_TIERS_MS` for progressive quota backoff
- Add `MIN_BACKOFF_MS` floor to prevent "Available in 0s" loops
- Increase `MAX_CAPACITY_RETRIES` from 3 to 5
- Reduce `RATE_LIMIT_DEDUP_WINDOW_MS` from 5s to 2s

## Frontend
- Remove `capacityRetryDelayMs` config (replaced by progressive tiers)
- Update default `maxCapacityRetries` display from 3 to 5

## Testing
- Add `tests/stress-test.cjs` for concurrent request stress testing

Co-Authored-By: Claude <noreply@anthropic.com>
This commit is contained in:
Badri Narayanan S
2026-01-24 22:43:53 +05:30
parent 71b9b001fd
commit 5a85f0cfcc
20 changed files with 869 additions and 244 deletions

View File

@@ -103,16 +103,33 @@ export const MAX_ACCOUNTS = config?.maxAccounts || 10; // From config or 10
// Rate limit wait thresholds
export const MAX_WAIT_BEFORE_ERROR_MS = config?.maxWaitBeforeErrorMs || 120000; // From config or 2 minutes
// Gap 1: Retry deduplication - prevents thundering herd on concurrent rate limits
export const RATE_LIMIT_DEDUP_WINDOW_MS = config?.rateLimitDedupWindowMs || 5000; // 5 seconds
// Retry deduplication - prevents thundering herd on concurrent rate limits
export const RATE_LIMIT_DEDUP_WINDOW_MS = config?.rateLimitDedupWindowMs || 2000; // 2 seconds
export const RATE_LIMIT_STATE_RESET_MS = config?.rateLimitStateResetMs || 120000; // 2 minutes - reset consecutive429 after inactivity
export const FIRST_RETRY_DELAY_MS = config?.firstRetryDelayMs || 1000; // Quick 1s retry on first 429
export const SWITCH_ACCOUNT_DELAY_MS = config?.switchAccountDelayMs || 5000; // Delay before switching accounts
// Gap 2: Consecutive failure tracking - extended cooldown after repeated failures
// Consecutive failure tracking - extended cooldown after repeated failures
export const MAX_CONSECUTIVE_FAILURES = config?.maxConsecutiveFailures || 3;
export const EXTENDED_COOLDOWN_MS = config?.extendedCooldownMs || 60000; // 1 minute
// Gap 4: Capacity exhaustion - shorter retry for model capacity issues (not quota)
export const CAPACITY_RETRY_DELAY_MS = config?.capacityRetryDelayMs || 2000; // 2 seconds
export const MAX_CAPACITY_RETRIES = config?.maxCapacityRetries || 3;
// Capacity exhaustion - progressive backoff tiers for model capacity issues
export const CAPACITY_BACKOFF_TIERS_MS = config?.capacityBackoffTiersMs || [5000, 10000, 20000, 30000, 60000];
export const MAX_CAPACITY_RETRIES = config?.maxCapacityRetries || 5;
// Smart backoff by error type
export const BACKOFF_BY_ERROR_TYPE = {
RATE_LIMIT_EXCEEDED: 30000, // 30 seconds
MODEL_CAPACITY_EXHAUSTED: 15000, // 15 seconds
SERVER_ERROR: 20000, // 20 seconds
UNKNOWN: 60000 // 1 minute
};
// Progressive backoff tiers for QUOTA_EXHAUSTED (60s, 5m, 30m, 2h)
export const QUOTA_EXHAUSTED_BACKOFF_TIERS_MS = [60000, 300000, 1800000, 7200000];
// Minimum backoff floor to prevent "Available in 0s" loops (matches opencode-antigravity-auth)
export const MIN_BACKOFF_MS = 2000;
// Thinking model constants
export const MIN_SIGNATURE_LENGTH = 50; // Minimum valid thinking signature length
@@ -258,10 +275,16 @@ export default {
MAX_ACCOUNTS,
MAX_WAIT_BEFORE_ERROR_MS,
RATE_LIMIT_DEDUP_WINDOW_MS,
RATE_LIMIT_STATE_RESET_MS,
FIRST_RETRY_DELAY_MS,
SWITCH_ACCOUNT_DELAY_MS,
MAX_CONSECUTIVE_FAILURES,
EXTENDED_COOLDOWN_MS,
CAPACITY_RETRY_DELAY_MS,
CAPACITY_BACKOFF_TIERS_MS,
MAX_CAPACITY_RETRIES,
BACKOFF_BY_ERROR_TYPE,
QUOTA_EXHAUSTED_BACKOFF_TIERS_MS,
MIN_BACKOFF_MS,
MIN_SIGNATURE_LENGTH,
GEMINI_MAX_OUTPUT_TOKENS,
GEMINI_SKIP_SIGNATURE,