Add model validation cache with 5-minute TTL to reject invalid model IDs
upfront instead of sending them to the API. This provides better error
messages and avoids unnecessary API calls.
- Add MODEL_VALIDATION_CACHE_TTL_MS constant (5 min)
- Add isValidModel() with lazy cache population
- Warm cache when listModels() is called
- Validate model ID in /v1/messages before processing
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
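
A minimal sketch of the lazy, TTL-based validation cache described above. Only `MODEL_VALIDATION_CACHE_TTL_MS`, `isValidModel()` and `listModels()` are named in the commit; `createModelValidator`, the injected `fetchModelIds` function and the injectable clock are assumptions made so the example is self-contained.

```javascript
// Sketch only: createModelValidator, fetchModelIds and the `now` clock are
// hypothetical helpers, not the project's actual API.
const MODEL_VALIDATION_CACHE_TTL_MS = 5 * 60 * 1000; // 5 minutes

function createModelValidator(fetchModelIds, now = () => Date.now()) {
  let cachedIds = null; // Set of known model IDs, or null before the first load
  let cachedAt = 0;     // Timestamp of the last refresh

  // Listing models also warms the validation cache.
  async function listModels() {
    const ids = await fetchModelIds();
    cachedIds = new Set(ids);
    cachedAt = now();
    return ids;
  }

  // Lazily populate (or refresh) the cache, then check membership.
  async function isValidModel(modelId) {
    const expired = now() - cachedAt >= MODEL_VALIDATION_CACHE_TTL_MS;
    if (cachedIds === null || expired) {
      await listModels();
    }
    return cachedIds.has(modelId);
  }

  return { listModels, isValidModel };
}
```

A request handler could then reject an unknown model ID with a 400 response before any upstream call is made.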
400 errors (INVALID_ARGUMENT) are client errors that won't be fixed by
switching accounts. Previously the proxy would cycle through all accounts
before returning the error. Now it fails immediately.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Use startsWith() for count_tokens URL check to match requests with
query parameters (e.g., ?beta=true).
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
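
The difference between an exact match and a prefix match can be sketched as follows. The path constant and the function name are assumptions for illustration; only the use of `startsWith()` comes from the commit.

```javascript
// Hypothetical constant and helper name; the proxy's real route string may differ.
const COUNT_TOKENS_PATH = '/v1/messages/count_tokens';

function isCountTokensRequest(url) {
  // An exact comparison (url === COUNT_TOKENS_PATH) misses URLs that carry a
  // query string; a prefix check accepts them.
  return url.startsWith(COUNT_TOKENS_PATH);
}
```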
- Make mode detection more robust (handle ::1, 0.0.0.0)
- Add getProxyPort() to parse port from ANTHROPIC_BASE_URL dynamically
- Add i18n translation keys for mode toggle in all 5 languages
- Update settings.html to use translation keys and dynamic port
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
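
One possible way to parse the port and recognise local hosts, using the standard `URL` class. The fallback port, the `isLocalProxyUrl` helper and the host list are assumptions; the commit names only `getProxyPort()` and the `::1` / `0.0.0.0` cases.

```javascript
// Sketch only: DEFAULT_PROXY_PORT and isLocalProxyUrl are hypothetical.
const DEFAULT_PROXY_PORT = 8080;
const LOCAL_HOSTS = new Set(['localhost', '127.0.0.1', '::1', '0.0.0.0']);

// Parse the port from a base URL such as the value of ANTHROPIC_BASE_URL.
function getProxyPort(baseUrl, fallback = DEFAULT_PROXY_PORT) {
  try {
    const { port, protocol } = new URL(baseUrl);
    if (port) return Number(port);
    // URL.port is empty when the scheme's default port is used.
    return protocol === 'https:' ? 443 : 80;
  } catch {
    return fallback; // Missing or malformed URL
  }
}

// Treat loopback and wildcard addresses as "pointing at the local proxy".
function isLocalProxyUrl(baseUrl) {
  try {
    // URL.hostname keeps the brackets around IPv6 literals, e.g. "[::1]".
    const host = new URL(baseUrl).hostname.replace(/^\[|\]$/g, '');
    return LOCAL_HOSTS.has(host);
  } catch {
    return false;
  }
}
```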
Resolved merge conflicts in public/views/settings.html:
- Fixed HTML entity escaping for quote characters in presetHint text
- Fixed HTML entity escaping for pendingPresetName text
- Update src/index.js to use HOST environment variable for main server
- Update src/auth/oauth.js to use HOST environment variable for OAuth callback server
- Add diagnostic logging to show actual bound address on startup
- Update startup banner to reflect correct host URL
Co-Authored-By: Claude (gemini-3-flash[1m]) <noreply@anthropic.com>
- Add getPackageVersion() utility function in src/utils/helpers.js
- Display version in CLI startup banner (e.g., "v2.4.2")
- Display version in WebUI navbar next to "CLAUDE PROXY SYSTEM"
- Refactor WebUI to use shared getPackageVersion() utility
- Update footer with GitHub icon and link
- Changed logging middleware to use req.originalUrl instead of req.path,
which was mangled by Express wildcard catch-all path stripping
- Suppress Chrome DevTools /.well-known/ requests from logs (debug-only)
Claude Code CLI sends cache_control on text, thinking, tool_use, and
tool_result blocks for prompt caching. Cloud Code API rejects these
with "Extra inputs are not permitted".
- Add cleanCacheControl() to proactively strip cache_control at pipeline entry
- Add sanitizeTextBlock() and sanitizeToolUseBlock() for defense-in-depth
- Update reorderAssistantContent() to use block sanitizers
- Add test-cache-control.cjs with multi-model test coverage
- Update frontend dashboard tests to match current UI design
- Update strategy tests to match v2.4.0 fallback behavior
- Update CLAUDE.md and README.md with recent features
Inspired by Antigravity-Manager's clean_cache_control_from_messages() pattern.
Co-Authored-By: Claude <noreply@anthropic.com>
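
A rough sketch of stripping `cache_control` at pipeline entry. It assumes messages shaped as `{ role, content }` with `content` either a string or an array of blocks, and it only handles top-level blocks; `stripCacheControl` is a hypothetical helper, and the project's actual `cleanCacheControl()` may differ.

```javascript
// Sketch only: handles top-level content blocks, not nested tool_result content.
function stripCacheControl(block) {
  if (block === null || typeof block !== 'object' || !('cache_control' in block)) {
    return block;
  }
  // Copy every field except cache_control; the input block is not mutated.
  const { cache_control, ...rest } = block;
  return rest;
}

function cleanCacheControl(messages) {
  return messages.map((message) => {
    // String content has no blocks to clean.
    if (!Array.isArray(message.content)) return message;
    return { ...message, content: message.content.map(stripCacheControl) };
  });
}
```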
This commit addresses "Max retries exceeded" errors during stress testing where
all accounts would become exhausted simultaneously due to short per-second rate
limits triggering cascading failures.
## Rate Limit Parser (`rate-limit-parser.js`)
- Remove 2s buffer enforcement that caused cascading failures when API returned
  short reset times (200-600ms). Now adds a 200ms buffer for sub-500ms resets.
- Add `parseRateLimitReason()` for smart backoff based on error type:
  QUOTA_EXHAUSTED, RATE_LIMIT_EXCEEDED, MODEL_CAPACITY_EXHAUSTED, SERVER_ERROR
## Message/Streaming Handlers
- Add per-account+model rate limit state tracking with exponential backoff
- For short rate limits (< 1 second), wait and retry on the same account instead
  of switching, which prevents a thundering herd when all accounts hit per-second limits
- Add throttle wait support for fallback modes (emergency/lastResort)
- Add `calculateSmartBackoff()` with progressive tiers by error type
## HybridStrategy (`hybrid-strategy.js`)
- Refactor `#getCandidates()` to return 4 fallback levels:
  - `normal`: All filters pass (health, tokens, quota)
  - `quota`: Bypass critical quota check
  - `emergency`: Bypass health check when ALL accounts unhealthy
  - `lastResort`: Bypass BOTH health AND token bucket checks
- Add throttle wait times: 500ms for lastResort, 250ms for emergency
- Fix LRU calculation to use seconds (matches opencode-antigravity-auth)
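
The four fallback levels can be sketched as an ordered list of progressively looser filters. The account fields (`healthy`, `hasTokens`, `quotaCritical`) are hypothetical, and treating `emergency` as also bypassing the quota check is an assumption; the throttle waits are the ones listed above.

```javascript
// Sketch only: account shape and the exact filter composition are assumptions.
const THROTTLE_WAIT_MS = { emergency: 250, lastResort: 500 };

const FALLBACK_LEVELS = [
  { level: 'normal', accept: (a) => a.healthy && a.hasTokens && !a.quotaCritical },
  { level: 'quota', accept: (a) => a.healthy && a.hasTokens },
  { level: 'emergency', accept: (a) => a.hasTokens },
  { level: 'lastResort', accept: () => true },
];

// Return the candidates from the strictest level that yields any account.
function getCandidates(accounts) {
  for (const { level, accept } of FALLBACK_LEVELS) {
    const candidates = accounts.filter(accept);
    if (candidates.length > 0) {
      return { level, candidates, throttleMs: THROTTLE_WAIT_MS[level] ?? 0 };
    }
  }
  return { level: null, candidates: [], throttleMs: 0 };
}
```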
## Health Tracker
- Increase `recoveryPerHour` from 2 to 10 for faster recovery (1 hour vs 5 hours)
## Account Manager
- Add consecutive failure tracking: `getConsecutiveFailures()`,
`incrementConsecutiveFailures()`, `resetConsecutiveFailures()`
- Add cooldown mechanism separate from rate limits with `CooldownReason`
- Reset consecutive failures on successful request
## Base Strategy
- Add `isAccountCoolingDown()` check in `isAccountUsable()`
## Constants
- Replace fixed `CAPACITY_RETRY_DELAY_MS` with progressive `CAPACITY_BACKOFF_TIERS_MS`
- Add `BACKOFF_BY_ERROR_TYPE` for smart backoff
- Add `QUOTA_EXHAUSTED_BACKOFF_TIERS_MS` for progressive quota backoff
- Add `MIN_BACKOFF_MS` floor to prevent "Available in 0s" loops
- Increase `MAX_CAPACITY_RETRIES` from 3 to 5
- Reduce `RATE_LIMIT_DEDUP_WINDOW_MS` from 5s to 2s
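
As an illustration of how progressive tiers and a floor might combine, here is a small sketch. The numeric values and the `getTieredBackoffMs` helper are placeholders chosen for the example, not the project's actual constants or API.

```javascript
// Sketch only: tier values, floor value and helper name are illustrative.
const CAPACITY_BACKOFF_TIERS_MS = [5000, 10000, 20000, 30000, 60000];
const MIN_BACKOFF_MS = 2000;

// Pick the tier for the Nth consecutive failure (1-based), clamp to the last
// tier, and never return less than the floor.
function getTieredBackoffMs(consecutiveFailures, tiers = CAPACITY_BACKOFF_TIERS_MS) {
  const index = Math.min(Math.max(consecutiveFailures, 1), tiers.length) - 1;
  return Math.max(tiers[index], MIN_BACKOFF_MS);
}
```

With a floor in place, a computed wait can never round down to zero, which is the situation the "Available in 0s" loops arose from.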
## Frontend
- Remove `capacityRetryDelayMs` config (replaced by progressive tiers)
- Update default `maxCapacityRetries` display from 3 to 5
## Testing
- Add `tests/stress-test.cjs` for concurrent request stress testing
Co-Authored-By: Claude <noreply@anthropic.com>
* feat: add manual OAuth flow support in WebUI
* fix: reset add account modal state on close
* feat: display custom API key in startup banner
* fix: move translations to separate files and optimize import API
* fix: remove orphaned model-manager.js and cleanup callback server on manual auth
---------
Co-authored-by: Badri Narayanan S <59133612+badrisnarayanan@users.noreply.github.com>
When all accounts are rate-limited or token-exhausted, the retry loop
was incorrectly counting the wait time as a failed attempt. This caused
premature "Max retries exceeded" errors when we were just patiently
waiting for accounts to become available.
- Add attempt-- after sleeping for rate limits or strategy waits
- Add #diagnoseNoCandidates() to hybrid strategy for better logging
- Add getTimeUntilNextToken() and getMinTimeUntilToken() to token tracker
- Return waitMs from hybrid strategy when all accounts are token-blocked
Co-Authored-By: Claude <noreply@anthropic.com>
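
The effect of `attempt--` can be sketched with a simplified loop. The function and its injected callbacks (`selectAccount`, `send`, `sleep`) are hypothetical stand-ins for the real handlers, and a production loop would also need an upper bound on total wait time.

```javascript
// Sketch only: the callbacks are injected so the example is self-contained.
async function sendWithRetry({ maxAttempts, selectAccount, send, sleep }) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const { account, waitMs } = selectAccount();
    if (!account) {
      // No account is usable yet: wait, then hand the attempt back so that
      // waiting is not counted as a failure.
      await sleep(waitMs);
      attempt--;
      continue;
    }
    try {
      return await send(account);
    } catch {
      // A real failure consumes the attempt; fall through to the next one.
    }
  }
  throw new Error('Max retries exceeded');
}
```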
The POST /api/config endpoint was missing validation for several
Advanced Server Settings fields, causing the sliders to fail silently.
Added support for: rateLimitDedupWindowMs, maxConsecutiveFailures,
extendedCooldownMs, capacityRetryDelayMs, maxCapacityRetries.
Fixes #181
Co-Authored-By: Claude <noreply@anthropic.com>
The hybrid strategy now considers account quota levels when selecting
accounts, preventing any single account from being drained to 0%.
- Add QuotaTracker class to track per-account quota levels
- Exclude accounts with critical quota (<5%) from selection
- Add quota component to scoring formula (weight: 3)
- Fall back to critical accounts when no alternatives exist
- Add 18 new tests for quota-aware selection
Scoring formula: Health×2 + Tokens×5 + Quota×3 + LRU×0.1
An attempt at resolving badrisnarayanan/antigravity-claude-proxy#171
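
The scoring formula and the critical-quota exclusion can be sketched as below. The weights and the 5% threshold come from the commit; the account shape and the assumption that every component is a value on a 0-100 scale are illustrative only.

```javascript
// Sketch only: component scales and the account shape are assumptions.
const WEIGHTS = { health: 2, tokens: 5, quota: 3, lru: 0.1 };
const CRITICAL_QUOTA_PERCENT = 5;

// Health×2 + Tokens×5 + Quota×3 + LRU×0.1
function scoreAccount(a) {
  return (
    a.health * WEIGHTS.health +
    a.tokens * WEIGHTS.tokens +
    a.quota * WEIGHTS.quota +
    a.lru * WEIGHTS.lru
  );
}

// Prefer accounts above the critical threshold; fall back to critical
// accounts only when nothing else is available.
function selectAccount(accounts) {
  const safe = accounts.filter((a) => a.quota >= CRITICAL_QUOTA_PERCENT);
  const pool = safe.length > 0 ? safe : accounts;
  return pool.reduce(
    (best, a) => (best === null || scoreAccount(a) > scoreAccount(best) ? a : best),
    null,
  );
}
```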
Adds `maxAccounts` configuration parameter to control the maximum number of Google accounts.
**Changes:**
- New config field `maxAccounts` (default: 10, range: 1-100)
- Settings page: slider control for adjusting limit
- Accounts page: counter badge (e.g., "8/10") with visual feedback
- Add button disabled when limit reached
- Server-side validation on account creation
**Breaking Changes:** None
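
A small sketch of the server-side checks this implies. The default and range are taken from the list above; the helper names are hypothetical.

```javascript
// Sketch only: helper and constant names are assumptions.
const MAX_ACCOUNTS_DEFAULT = 10;
const MAX_ACCOUNTS_RANGE = { min: 1, max: 100 };

// Validate a submitted maxAccounts config value.
function isValidMaxAccounts(value) {
  return (
    Number.isInteger(value) &&
    value >= MAX_ACCOUNTS_RANGE.min &&
    value <= MAX_ACCOUNTS_RANGE.max
  );
}

// Decide whether one more account may be created.
function canAddAccount(currentCount, maxAccounts = MAX_ACCOUNTS_DEFAULT) {
  return currentCount < maxAccounts;
}
```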
Adds logging when 403/404 errors trigger endpoint fallback (daily → prod).
The retry behavior was already working but silently - now it's visible.
Matches opencode-antigravity-auth error handling behavior.
Co-Authored-By: Claude <noreply@anthropic.com>
Fixes regression where /v1/models returned 500 error because
pickNext() method was removed in v2.2.x refactor.
Fixes #164
Co-Authored-By: Claude <noreply@anthropic.com>
paidTier values like g1-pro-tier and g1-ultra-tier are rejected by
the onboardUser API with 400 INVALID_ARGUMENT. This matches the
opencode-antigravity-auth reference which only uses allowedTiers.
Co-Authored-By: Claude <noreply@anthropic.com>
- Store project IDs in composite refresh token format (refreshToken|projectId|managedProjectId)
- Add parseRefreshParts() and formatRefreshParts() for token handling
- Extract and persist subscription tier during project discovery
- Fetch subscription in blocking mode when missing from cached accounts
- Fix conditional duetProject setting to match reference implementation
- Export parseTierId() for reuse across modules
Co-Authored-By: Claude <noreply@anthropic.com>
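
The composite format `refreshToken|projectId|managedProjectId` could be handled roughly as follows. The function names come from the commit; the handling of missing parts (returning `undefined`, and emitting a bare token when no IDs are set) is an assumption.

```javascript
// Sketch only: behavior for missing parts is assumed, not taken from the project.
function parseRefreshParts(composite) {
  const [refreshToken = '', projectId = '', managedProjectId = ''] =
    String(composite ?? '').split('|');
  return {
    refreshToken,
    projectId: projectId || undefined,
    managedProjectId: managedProjectId || undefined,
  };
}

function formatRefreshParts({ refreshToken, projectId = '', managedProjectId = '' }) {
  // Keep plain tokens unchanged when there are no project IDs to store.
  if (!projectId && !managedProjectId) return refreshToken;
  return `${refreshToken}|${projectId}|${managedProjectId}`;
}
```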
Consolidate strategy configuration by moving STRATEGY_LABELS from
strategies/index.js to constants.js. Update startup banner to
dynamically display strategy options from SELECTION_STRATEGIES.
Co-Authored-By: Claude <noreply@anthropic.com>
Refactor account selection into a strategy pattern with three options:
- Sticky: cache-optimized, stays on same account until rate-limited
- Round-robin: load-balanced, rotates every request
- Hybrid (default): smart distribution using health scores, token buckets, and LRU
The hybrid strategy uses multiple signals for optimal account selection:
health tracking for reliability, client-side token buckets for rate limiting,
and LRU freshness to prefer rested accounts.
Includes WebUI settings for strategy selection and unit tests.
Co-Authored-By: Claude <noreply@anthropic.com>
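
To illustrate the strategy pattern, here is a minimal sketch of the two simpler strategies behind a shared `select(accounts)` method. The class shapes and the `rateLimited` account field are hypothetical; the hybrid strategy is omitted because it depends on health, token-bucket and LRU state.

```javascript
// Sketch only: class and field names are assumptions for illustration.
class StickyStrategy {
  #current = 0;

  // Stay on the current account until it is rate-limited, then move on.
  select(accounts) {
    for (let offset = 0; offset < accounts.length; offset++) {
      const index = (this.#current + offset) % accounts.length;
      if (!accounts[index].rateLimited) {
        this.#current = index;
        return accounts[index];
      }
    }
    return null;
  }
}

class RoundRobinStrategy {
  #next = 0;

  // Rotate to the next usable account on every request.
  select(accounts) {
    for (let offset = 0; offset < accounts.length; offset++) {
      const index = (this.#next + offset) % accounts.length;
      if (!accounts[index].rateLimited) {
        this.#next = (index + 1) % accounts.length;
        return accounts[index];
      }
    }
    return null;
  }
}
```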
Set functionCallingConfig.mode = 'VALIDATED' when using Claude models
to ensure strict parameter validation, matching opencode-antigravity-auth.
Co-Authored-By: Claude <noreply@anthropic.com>
* fix: preserve whitespace-only chunks in SSE stream
Fixes issue #138 where Claude models would swallow spaces between words because whitespace-only chunks (e.g., " ") were being filtered out as empty.
Changes:
- Modified sse-streamer.js to only skip truly empty strings (""), preserving strings that contain only whitespace.
- Added regression test case in tests/test-streaming-whitespace.cjs to verify whitespace preservation.
* test: add streaming whitespace regression test to main suite
---------
Co-authored-by: walczak <walczak@ial.ruhr>
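
The distinction between "empty" and "whitespace-only" can be shown with a small sketch. The helper names are hypothetical and do not reflect the actual structure of sse-streamer.js.

```javascript
// Sketch only: helper names are assumptions.
// A trim()-based emptiness check would drop " " and glue words together.
function shouldEmitTextChunk(text) {
  // Skip only truly empty strings; whitespace-only chunks are real content.
  return typeof text === 'string' && text.length > 0;
}

function joinChunks(chunks) {
  return chunks.filter(shouldEmitTextChunk).join('');
}
```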
Some accounts have paidTier.id = "free-tier" but allowedTiers includes
"standard-tier", meaning they can use Pro-level quotas even though their
Google One subscription doesn't grant Antigravity benefits.
New priority:
1. paidTier - if Pro/Ultra, use immediately
2. allowedTiers - if paidTier is free, check for higher tiers
3. currentTier - fallback if still unknown
4. allowedTiers default - final fallback
This fixes accounts that show as "free" but actually have Pro access
through programs other than Google One AI.
Co-Authored-By: Claude <noreply@anthropic.com>
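
The priority order above might look roughly like this. The response shape (`paidTier.id`, `currentTier.id`, `allowedTiers[].id`, `allowedTiers[].isDefault`), the tier-name mapping in `parseTierId` and the function name `detectTier` are assumptions for the sake of a runnable example.

```javascript
// Sketch only: response shape and the ID-to-tier mapping are assumptions.
function parseTierId(tierId) {
  const id = String(tierId ?? '').toLowerCase();
  if (id.includes('ultra')) return 'ultra';
  if (id.includes('pro') || id.includes('standard')) return 'pro';
  if (id.includes('free')) return 'free';
  return 'unknown';
}

const isPaid = (tier) => tier === 'pro' || tier === 'ultra';

function detectTier({ paidTier, currentTier, allowedTiers = [] }) {
  // 1. paidTier: if Pro/Ultra, use immediately.
  let tier = parseTierId(paidTier?.id);
  let tierSource = tier === 'unknown' ? null : 'paidTier';
  if (isPaid(tier)) return { tier, tierSource };

  // 2. allowedTiers: if paidTier is free, check for higher tiers.
  if (tier === 'free') {
    const higher = allowedTiers.map((t) => parseTierId(t.id)).find(isPaid);
    if (higher) return { tier: higher, tierSource: 'allowedTiers' };
  }

  // 3. currentTier: fallback if still unknown.
  if (tier === 'unknown') {
    tier = parseTierId(currentTier?.id);
    if (tier !== 'unknown') tierSource = 'currentTier';
  }

  // 4. allowedTiers default: final fallback.
  if (tier === 'unknown') {
    tier = parseTierId(allowedTiers.find((t) => t.isDefault)?.id);
    if (tier !== 'unknown') tierSource = 'allowedTiers';
  }
  return { tier, tierSource };
}
```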
- Use raw API tier IDs (e.g., 'free-tier', 'g1-pro-tier') consistently
- Pass DEFAULT_PROJECT_ID for non-free tiers during onboarding
- Fix free tier detection to use .includes('free') instead of exact match
- Upgrade onboarding error logs from debug to warn for visibility
- Add debug logging for onboarding request parameters
Fixes onboarding failures for Pro/Ultra accounts that were previously
failing because the API requires a project ID for paid tier onboarding.
Co-Authored-By: Claude <noreply@anthropic.com>
- Use parseTierId() consistently for all tier sources (paidTier, currentTier, allowedTiers)
- Add tierSource tracking to identify which field tier was detected from
- Add verbose debug logging to show all tier-related fields from loadCodeAssist API
- Fix onboarding to prioritize paidTier > currentTier > allowedTiers (consistent with model-api.js)
This helps diagnose issues where Pro/Ultra accounts are incorrectly detected as free.
Fixes #128
Co-Authored-By: Claude <noreply@anthropic.com>
When all accounts are rate-limited for more than 2 minutes, the proxy now
attempts model fallback (if enabled) before throwing the error. This allows
requests on quota-exhausted accounts to gracefully fall back to alternate models.
Co-Authored-By: Claude <noreply@anthropic.com>
- Pass project ID to fetchAvailableModels for accurate per-project quota
- Treat missing remainingFraction with resetTime as 0% (exhausted)
- Fix double-escaped regex in rate-limit-parser.js (\\d -> \d)
- Use ANTIGRAVITY_HEADERS for loadCodeAssist consistency
- Store actual reset time from API instead of capping at default
- Add getRateLimitInfo() for detailed rate limit state
- Handle disabled accounts in rate limit checks
Fixes issue where free tier accounts showed 100% quota but were actually exhausted.
Co-Authored-By: Claude <noreply@anthropic.com>
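
Two of the fixes above lend themselves to a short illustration: the double-escaped regex and the handling of a missing `remainingFraction`. The message format, the helper names and the `null` result for "no data at all" are assumptions; only the `\\d` to `\d` change and the "missing fraction with a reset time means 0%" rule come from the commit.

```javascript
// Sketch only: message format and helper names are hypothetical.
// In a regex literal, \\d matches a backslash followed by "d", not a digit.
const DOUBLE_ESCAPED = /reset after (\\d+)s/;
const FIXED = /reset after (\d+)s/;

function parseResetSeconds(message) {
  const match = FIXED.exec(message);
  return match ? Number(match[1]) : null;
}

// A quota entry with a reset time but no remainingFraction is exhausted.
function getRemainingFraction(quotaInfo) {
  if (typeof quotaInfo.remainingFraction === 'number') {
    return quotaInfo.remainingFraction;
  }
  return quotaInfo.resetTime ? 0 : null; // null: nothing known (assumed)
}
```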