feat: comprehensive rate limit handling overhaul (inspired by opencode-antigravity-auth)

This commit addresses "Max retries exceeded" errors during stress testing where
all accounts would become exhausted simultaneously due to short per-second rate
limits triggering cascading failures.

## Rate Limit Parser (`rate-limit-parser.js`)
- Remove 2s buffer enforcement that caused cascading failures when API returned
  short reset times (200-600ms). Now adds 200ms buffer for sub-500ms resets
- Add `parseRateLimitReason()` for smart backoff based on error type:
  QUOTA_EXHAUSTED, RATE_LIMIT_EXCEEDED, MODEL_CAPACITY_EXHAUSTED, SERVER_ERROR

## Message/Streaming Handlers
- Add per-account+model rate limit state tracking with exponential backoff
- For short rate limits (< 1 second), wait and retry on same account instead
  of switching - prevents thundering herd when all accounts hit per-second limits
- Add throttle wait support for fallback modes (emergency/lastResort)
- Add `calculateSmartBackoff()` with progressive tiers by error type

## HybridStrategy (`hybrid-strategy.js`)
- Refactor `#getCandidates()` to return 4 fallback levels:
  - `normal`: All filters pass (health, tokens, quota)
  - `quota`: Bypass critical quota check
  - `emergency`: Bypass health check when ALL accounts unhealthy
  - `lastResort`: Bypass BOTH health AND token bucket checks
- Add throttle wait times: 500ms for lastResort, 250ms for emergency
- Fix LRU calculation to use seconds (matches opencode-antigravity-auth)

## Health Tracker
- Increase `recoveryPerHour` from 2 to 10 for faster recovery (1 hour vs 5 hours)

## Account Manager
- Add consecutive failure tracking: `getConsecutiveFailures()`,
  `incrementConsecutiveFailures()`, `resetConsecutiveFailures()`
- Add cooldown mechanism separate from rate limits with `CooldownReason`
- Reset consecutive failures on successful request

## Base Strategy
- Add `isAccountCoolingDown()` check in `isAccountUsable()`

## Constants
- Replace fixed `CAPACITY_RETRY_DELAY_MS` with progressive `CAPACITY_BACKOFF_TIERS_MS`
- Add `BACKOFF_BY_ERROR_TYPE` for smart backoff
- Add `QUOTA_EXHAUSTED_BACKOFF_TIERS_MS` for progressive quota backoff
- Add `MIN_BACKOFF_MS` floor to prevent "Available in 0s" loops
- Increase `MAX_CAPACITY_RETRIES` from 3 to 5
- Reduce `RATE_LIMIT_DEDUP_WINDOW_MS` from 5s to 2s

## Frontend
- Remove `capacityRetryDelayMs` config (replaced by progressive tiers)
- Update default `maxCapacityRetries` display from 3 to 5

## Testing
- Add `tests/stress-test.cjs` for concurrent request stress testing

Co-Authored-By: Claude <noreply@anthropic.com>
This commit is contained in:
Badri Narayanan S
2026-01-24 22:43:53 +05:30
parent 71b9b001fd
commit 5a85f0cfcc
20 changed files with 869 additions and 244 deletions

View File

@@ -1226,47 +1226,23 @@
x-text="$store.global.t('extendedCooldownDesc')">Applied after max consecutive failures.</p>
</div>
<div class="form-control">
<label class="label pt-0">
<span class="label-text text-gray-400 text-xs"
x-text="$store.global.t('capacityRetryDelay')">Capacity Retry Delay</span>
<span class="label-text-alt font-mono text-neon-cyan text-xs font-semibold"
x-text="Math.round((serverConfig.capacityRetryDelayMs || 2000) / 1000) + 's'"></span>
</label>
<div class="flex gap-3 items-center">
<input type="range" min="500" max="10000" step="500"
class="custom-range custom-range-cyan flex-1"
:value="serverConfig.capacityRetryDelayMs || 2000"
:style="`background-size: ${((serverConfig.capacityRetryDelayMs || 2000) - 500) / 95}% 100%`"
@input="toggleCapacityRetryDelayMs($event.target.value)"
aria-label="Capacity retry delay slider">
<input type="number" min="500" max="10000" step="500"
class="input input-xs input-bordered w-20 bg-space-800 border-space-border text-white font-mono text-center"
:value="serverConfig.capacityRetryDelayMs || 2000"
@change="toggleCapacityRetryDelayMs($event.target.value)"
aria-label="Capacity retry delay value">
</div>
<p class="text-[9px] text-gray-600 mt-1 leading-tight"
x-text="$store.global.t('capacityRetryDelayDesc')">Delay for capacity (not quota) issues.</p>
</div>
<div class="form-control">
<label class="label pt-0">
<span class="label-text text-gray-400 text-xs"
x-text="$store.global.t('maxCapacityRetries')">Max Capacity Retries</span>
<span class="label-text-alt font-mono text-neon-cyan text-xs font-semibold"
x-text="serverConfig.maxCapacityRetries || 3"></span>
x-text="serverConfig.maxCapacityRetries || 5"></span>
</label>
<div class="flex gap-3 items-center">
<input type="range" min="1" max="10" step="1"
class="custom-range custom-range-cyan flex-1"
:value="serverConfig.maxCapacityRetries || 3"
:style="`background-size: ${((serverConfig.maxCapacityRetries || 3) - 1) / 0.09}% 100%`"
:value="serverConfig.maxCapacityRetries || 5"
:style="`background-size: ${((serverConfig.maxCapacityRetries || 5) - 1) / 0.09}% 100%`"
@input="toggleMaxCapacityRetries($event.target.value)"
aria-label="Max capacity retries slider">
<input type="number" min="1" max="10" step="1"
class="input input-xs input-bordered w-16 bg-space-800 border-space-border text-white font-mono text-center"
:value="serverConfig.maxCapacityRetries || 3"
:value="serverConfig.maxCapacityRetries || 5"
@change="toggleMaxCapacityRetries($event.target.value)"
aria-label="Max capacity retries value">
</div>