Commit Graph

38 Commits

Author SHA1 Message Date
Badri Narayanan S
90b38bbb56 feat: validate model IDs before processing requests
Add model validation cache with 5-minute TTL to reject invalid model IDs
upfront instead of sending them to the API. This provides better error
messages and avoids unnecessary API calls.

- Add MODEL_VALIDATION_CACHE_TTL_MS constant (5 min)
- Add isValidModel() with lazy cache population
- Warm cache when listModels() is called
- Validate model ID in /v1/messages before processing

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-01 19:07:23 +05:30
Badri Narayanan S
2ab0c7943d fix: fail immediately on 400 errors instead of cycling accounts
400 errors (INVALID_ARGUMENT) are client errors that won't be fixed by
switching accounts. Previously the proxy would cycle through all accounts
before returning the error. Now it fails immediately.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-01 18:36:07 +05:30
Badri Narayanan S
5a85f0cfcc feat: comprehensive rate limit handling overhaul (inspired by opencode-antigravity-auth)
This commit addresses "Max retries exceeded" errors during stress testing where
all accounts would become exhausted simultaneously due to short per-second rate
limits triggering cascading failures.

## Rate Limit Parser (`rate-limit-parser.js`)
- Remove 2s buffer enforcement that caused cascading failures when API returned
  short reset times (200-600ms). Now adds 200ms buffer for sub-500ms resets
- Add `parseRateLimitReason()` for smart backoff based on error type:
  QUOTA_EXHAUSTED, RATE_LIMIT_EXCEEDED, MODEL_CAPACITY_EXHAUSTED, SERVER_ERROR

## Message/Streaming Handlers
- Add per-account+model rate limit state tracking with exponential backoff
- For short rate limits (< 1 second), wait and retry on same account instead
  of switching - prevents thundering herd when all accounts hit per-second limits
- Add throttle wait support for fallback modes (emergency/lastResort)
- Add `calculateSmartBackoff()` with progressive tiers by error type

## HybridStrategy (`hybrid-strategy.js`)
- Refactor `#getCandidates()` to return 4 fallback levels:
  - `normal`: All filters pass (health, tokens, quota)
  - `quota`: Bypass critical quota check
  - `emergency`: Bypass health check when ALL accounts unhealthy
  - `lastResort`: Bypass BOTH health AND token bucket checks
- Add throttle wait times: 500ms for lastResort, 250ms for emergency
- Fix LRU calculation to use seconds (matches opencode-antigravity-auth)

## Health Tracker
- Increase `recoveryPerHour` from 2 to 10 for faster recovery (1 hour vs 5 hours)

## Account Manager
- Add consecutive failure tracking: `getConsecutiveFailures()`,
  `incrementConsecutiveFailures()`, `resetConsecutiveFailures()`
- Add cooldown mechanism separate from rate limits with `CooldownReason`
- Reset consecutive failures on successful request

## Base Strategy
- Add `isAccountCoolingDown()` check in `isAccountUsable()`

## Constants
- Replace fixed `CAPACITY_RETRY_DELAY_MS` with progressive `CAPACITY_BACKOFF_TIERS_MS`
- Add `BACKOFF_BY_ERROR_TYPE` for smart backoff
- Add `QUOTA_EXHAUSTED_BACKOFF_TIERS_MS` for progressive quota backoff
- Add `MIN_BACKOFF_MS` floor to prevent "Available in 0s" loops
- Increase `MAX_CAPACITY_RETRIES` from 3 to 5
- Reduce `RATE_LIMIT_DEDUP_WINDOW_MS` from 5s to 2s

## Frontend
- Remove `capacityRetryDelayMs` config (replaced by progressive tiers)
- Update default `maxCapacityRetries` display from 3 to 5

## Testing
- Add `tests/stress-test.cjs` for concurrent request stress testing

Co-Authored-By: Claude <noreply@anthropic.com>
2026-01-24 22:43:53 +05:30
Badri Narayanan S
0fa945b069 fix: don't count rate limit waits as failed retry attempts
When all accounts are rate-limited or token-exhausted, the retry loop
was incorrectly counting the wait time as a failed attempt. This caused
premature "Max retries exceeded" errors when we were just patiently
waiting for accounts to become available.

- Add attempt-- after sleeping for rate limits or strategy waits
- Add #diagnoseNoCandidates() to hybrid strategy for better logging
- Add getTimeUntilNextToken() and getMinTimeUntilToken() to token tracker
- Return waitMs from hybrid strategy when all accounts are token-blocked

Co-Authored-By: Claude <noreply@anthropic.com>
2026-01-23 14:29:24 +05:30
Badri Narayanan S
11f135ef32 fix: simplify 403/404 endpoint fallback log messages
Co-Authored-By: Claude <noreply@anthropic.com>
2026-01-20 23:41:17 +05:30
Badri Narayanan S
8d8f170f18 fix: add explicit logging for 403/404 endpoint fallback
Adds logging when 403/404 errors trigger endpoint fallback (daily → prod).
The retry behavior was already working but silently - now it's visible.
Matches opencode-antigravity-auth error handling behavior.

Co-Authored-By: Claude <noreply@anthropic.com>
2026-01-20 23:33:44 +05:30
Badri Narayanan S
2175118f9f feat: align project discovery with opencode-antigravity-auth reference
- Store project IDs in composite refresh token format (refreshToken|projectId|managedProjectId)
- Add parseRefreshParts() and formatRefreshParts() for token handling
- Extract and persist subscription tier during project discovery
- Fetch subscription in blocking mode when missing from cached accounts
- Fix conditional duetProject setting to match reference implementation
- Export parseTierId() for reuse across modules

Co-Authored-By: Claude <noreply@anthropic.com>
2026-01-19 14:21:30 +05:30
Badri Narayanan S
5ae19a5b72 feat: add configurable account selection strategies
Refactor account selection into a strategy pattern with three options:
- Sticky: cache-optimized, stays on same account until rate-limited
- Round-robin: load-balanced, rotates every request
- Hybrid (default): smart distribution using health scores, token buckets, and LRU

The hybrid strategy uses multiple signals for optimal account selection:
health tracking for reliability, client-side token buckets for rate limiting,
and LRU freshness to prefer rested accounts.

Includes WebUI settings for strategy selection and unit tests.

Co-Authored-By: Claude <noreply@anthropic.com>
2026-01-18 03:48:43 +05:30
Marvin
480b4a0bc1 fix: preserve whitespace-only chunks in SSE stream (#139)
* fix: preserve whitespace-only chunks in SSE stream

Fixes issue #138 where Claude models would swallow spaces between words because whitespace-only chunks (e.g., " ") were being filtered out as empty.

Changes:
- Modified sse-streamer.js to only skip truly empty strings (""), preserving strings that contain only whitespace.
- Added regression test case in tests/test-streaming-whitespace.cjs to verify whitespace preservation.

* test: add streaming whitespace regression test to main suite

---------

Co-authored-by: walczak <walczak@ial.ruhr>
2026-01-17 12:16:28 +05:30
Badri Narayanan S
2516e90d06 Revert "fix: check allowedTiers when paidTier is free for tier detection"
This reverts commit 11a1879fc7.
2026-01-15 21:26:08 +05:30
Badri Narayanan S
11a1879fc7 fix: check allowedTiers when paidTier is free for tier detection
Some accounts have paidTier.id = "free-tier" but allowedTiers includes
"standard-tier", meaning they can use Pro-level quotas even though their
Google One subscription doesn't grant Antigravity benefits.

New priority:
1. paidTier - if Pro/Ultra, use immediately
2. allowedTiers - if paidTier is free, check for higher tiers
3. currentTier - fallback if still unknown
4. allowedTiers default - final fallback

This fixes accounts that show as "free" but actually have Pro access
through programs other than Google One AI.

Co-Authored-By: Claude <noreply@anthropic.com>
2026-01-15 21:23:19 +05:30
Badri Narayanan S
9809337190 fix: improve subscription tier detection with paidTier priority and verbose logging
- Use parseTierId() consistently for all tier sources (paidTier, currentTier, allowedTiers)
- Add tierSource tracking to identify which field tier was detected from
- Add verbose debug logging to show all tier-related fields from loadCodeAssist API
- Fix onboarding to prioritize paidTier > currentTier > allowedTiers (consistent with model-api.js)

This helps diagnose issues where Pro/Ultra accounts are incorrectly detected as free.
Fixes #128

Co-Authored-By: Claude <noreply@anthropic.com>
2026-01-15 20:08:30 +05:30
Badri Narayanan S
2a0c110f9b fix: try model fallback before throwing RESOURCE_EXHAUSTED error
When all accounts are rate-limited for > 2 minutes, now attempts
model fallback (if enabled) before throwing the error. This allows
quota-exhausted accounts to gracefully fall back to alternate models.

Co-Authored-By: Claude <noreply@anthropic.com>
2026-01-15 16:41:02 +05:30
Badri Narayanan S
77363c679e fix: accurate quota reporting with project ID and improved rate limit handling
- Pass project ID to fetchAvailableModels for accurate per-project quota
- Treat missing remainingFraction with resetTime as 0% (exhausted)
- Fix double-escaped regex in rate-limit-parser.js (\\d -> \d)
- Use ANTIGRAVITY_HEADERS for loadCodeAssist consistency
- Store actual reset time from API instead of capping at default
- Add getRateLimitInfo() for detailed rate limit state
- Handle disabled accounts in rate limit checks

Fixes issue where free tier accounts showed 100% quota but were actually exhausted.

Co-Authored-By: Claude <noreply@anthropic.com>
2026-01-15 16:18:13 +05:30
Badri Narayanan S
896bf81a36 revert: remove count_tokens endpoint (caused regression)
Co-Authored-By: Claude <noreply@anthropic.com>
2026-01-14 23:43:16 +05:30
Badri Narayanan S
dee7512bd8 fix: improve subscription tier detection for Pro accounts
The tier detection was incorrectly showing Pro accounts as "free" due to:
1. paidTier field being flaky/missing from some API responses
2. standard-tier not being recognized as a Pro tier
3. No fallback to allowedTiers when currentTier is missing

Changes:
- Add parseTierId() helper to centralize tier ID parsing
- Recognize "standard-tier" as Pro (Gemini Code Assist paid tier)
- Add fallback chain: paidTier > currentTier > allowedTiers
- Return "unknown" instead of incorrectly defaulting to "free"

Fixes #121

Co-Authored-By: Claude <noreply@anthropic.com>
2026-01-14 21:27:24 +05:30
behemoth-phucnm
d33de409d4 docs: fix misleading tokenizer comments 2026-01-14 19:31:43 +07:00
minhphuc429
7da7e887bf feat: use official tokenizers for 99.99% accuracy
Replace gpt-tokenizer with model-specific official tokenizers:
- Claude models: @anthropic-ai/tokenizer (official Anthropic tokenizer)
- Gemini models: @lenml/tokenizer-gemini (GemmaTokenizer)

Changes:
- Add @anthropic-ai/tokenizer and @lenml/tokenizer-gemini dependencies
- Remove gpt-tokenizer dependency
- Update count-tokens.js with model-aware tokenization
- Use getModelFamily() to select appropriate tokenizer
- Lazy-load Gemini tokenizer (138MB) on first use
- Default to local estimation for all content types (no API calls)

Tested with all supported models:
- claude-sonnet-4-5, claude-opus-4-5-thinking, claude-sonnet-4-5-thinking
- gemini-3-flash, gemini-3-pro-low, gemini-3-pro-high
2026-01-14 16:04:13 +07:00
minhphuc429
2bdecf6e96 fix: ensure account manager initialized for count_tokens
- Add ensureInitialized() call before count_tokens handler
- Use hybrid approach: local estimation for text, API for images/docs
- This prevents "No accounts available" error on first request
2026-01-14 15:43:25 +07:00
minhphuc429
df81ba5632 feat: use API-based token counting for 100% accuracy
Switch from local estimation (gpt-tokenizer) to API-based counting
via Google Cloud Code API for accurate token counts. Falls back to
local estimation if API call fails.
2026-01-14 15:36:47 +07:00
minhphuc429
acc228b920 feat: implement /v1/messages/count_tokens endpoint
Add Anthropic-compatible token counting endpoint using hybrid approach:
- Local estimation with gpt-tokenizer for text content (~95% accuracy)
- API-based counting for complex content (images, documents)
- Automatic fallback to local estimation on API errors

This resolves warnings in LiteLLM and other clients that rely on
pre-request token counting.
2026-01-14 15:32:27 +07:00
Badri Narayanan S
70fd1baaa8 fix: improve loadCodeAssist for Google One AI Pro accounts
- Add separate LOAD_CODE_ASSIST_ENDPOINTS (prod first) and
  LOAD_CODE_ASSIST_HEADERS (google-api-nodejs-client User-Agent)
- Add duetProject to metadata for project discovery
- Silent fallback when API returns success but no project
  (matches opencode-antigravity-auth behavior)
- Only warn when all endpoints fail with actual errors

Fixes #114, addresses discussion #113

Co-Authored-By: Claude <noreply@anthropic.com>
2026-01-13 18:11:45 +05:30
liuwuyu118
860c0d6c2d fix: add image response support
Convert Google's inlineData format to Anthropic's image format:
- response-converter.js: Handle inlineData in non-streaming responses
- sse-parser.js: Parse inlineData for thinking models
- sse-streamer.js: Stream inlineData as image content blocks

Co-Authored-By: Claude <noreply@anthropic.com>
2026-01-13 11:17:17 +08:00
Badri Narayanan S
325acdba8c fix: preserve tool_use stop reason from being overwritten by finishReason
When a tool call is made, stopReason is set to 'tool_use'. However, when
finishReason: STOP arrives later, it was overwriting stopReason back to
'end_turn', breaking multi-turn tool conversations in clients like OpenCode.

Fix: Initialize stopReason to null and only set it from finishReason if
not already set. This ensures tool_use is preserved once detected.

Fixes #96

Co-Authored-By: Claude <noreply@anthropic.com>
2026-01-11 11:52:35 +05:30
Badri Narayanan S
6868bf217c Merge pull request #47 from Wha1eChai/feature/webui
feat: Add Web UI for account and quota management
2026-01-10 22:21:57 +05:30
Tiago Rodrigues
0b477c2552 feat: fallback to alternate model when max retries exceeded with 5xx errors
When all accounts fail with HTTP 500/503 errors (e.g., Google API returning
'Unknown Error' for Claude models on large conversations), the proxy now
attempts to use a fallback model if --fallback is enabled.

This enables graceful degradation when:
- All accounts are exhausted due to 5xx errors (not just rate limits)
- Claude models fail on very large conversations
- The API has temporary issues with specific models

The fallback uses the existing MODEL_FALLBACK_MAP configuration:
- claude-opus-4-5-thinking -> gemini-3-pro-high
- claude-sonnet-4-5-thinking -> gemini-3-flash

Relates to #88
2026-01-10 12:08:53 +00:00
Wha1eChai
ee6d222e4d feat(webui): add subscription tier and quota visualization
Backend:
2026-01-10 06:05:17 +08:00
Badri Narayanan S
4c5236d4b3 fix: filter Antigravity system prompt from model responses
- Add [ignore] tags around system instruction to prevent model from
  identifying as "Antigravity" when asked "Who are you?"
- Replace full system instruction with minimal version used by
  CLIProxyAPI/gcli2api to reduce token usage and improve response quality

Fixes #76

Co-Authored-By: Claude <noreply@anthropic.com>
2026-01-09 14:10:41 +05:30
SvDp
45755bfa18 fix: add optimistic reset for transient 429 rate limit errors
Fixes issue #71 - 'No accounts available' error when API returns 429

The Google Cloud Code API can return 429 RESOURCE_EXHAUSTED errors even
when accounts have quota available due to:
- Temporary API load/throttling
- Per-minute request limits (not per-day quota)
- Transient backend issues

This fix adds:
1. A 500ms buffer after waiting for rate limits to expire
2. Optimistic rate limit reset when all accounts appear stuck
3. Retry logic that clears rate limits and tries again

The fix works in conjunction with the server-level optimistic reset
that already exists, providing multiple layers of protection against
false 'No accounts available' errors.

Co-Authored-By: Claude <noreply@anthropic.com>
2026-01-08 21:31:50 +05:30
Badri Narayanan S
a696ed0872 Merge pull request #64 from BrunoMarc/fix/empty-response-retry
fix: add retry mechanism for empty API responses
2026-01-08 17:38:18 +05:30
Badri Narayanan S
f34aa50ba4 Antigravity compatibility to fix antigravity usage 2026-01-08 10:24:54 +05:30
BrunoMarc
1c80c8ba52 fix: address second round code review feedback
Issues found by Claude Opus 4.5 + Gemini 3 Pro:

HIGH PRIORITY FIXES:
- Mark account rate-limited when 429 occurs during retry (was losing resetMs)
- Add exponential backoff between retries (500ms, 1000ms, 2000ms)
- Fix 5xx handling: don't pass error response to streamer, refetch instead
- Use recognizable error messages (429/401) for isRateLimitError/isAuthError

MEDIUM PRIORITY FIXES:
- Refactor while loop to for loop for clearer retry semantics
- Simplify logic flow with early returns

Code review by: Claude Opus 4.5 + Gemini 3 Pro Preview

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-07 18:21:25 -03:00
BrunoMarc
05cd80ebb5 fix: address code review feedback
- Move MAX_EMPTY_RESPONSE_RETRIES to constants.js for consistency
- Handle 429/401/5xx errors properly during retry fetch
- Use proper message ID format (crypto.randomBytes) instead of Date.now()
- Add crypto import for UUID generation

Code review by: Gemini 3 Pro Preview + Claude Opus 4.5

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-07 18:16:09 -03:00
BrunoMarc
49480847b6 fix: add retry mechanism for empty API responses
When Claude Code sends requests with large thinking_budget values,
the model may spend all tokens on "thinking" and return empty responses,
causing Claude Code to stop mid-conversation.

This commit adds a retry mechanism that:
- Throws EmptyResponseError instead of emitting fake message on empty response
- Retries up to 2 times before giving up
- Emits fallback message only after all retries are exhausted

Changes:
- src/errors.js: Added EmptyResponseError class and isEmptyResponseError()
- src/cloudcode/sse-streamer.js: Throw error instead of yielding fake message
- src/cloudcode/streaming-handler.js: Added retry loop with fallback

Tested for 6+ hours with 1,884 API requests and 88% recovery rate
on empty responses.

Fixes #61

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-07 18:11:03 -03:00
Badri Narayanan S
ac9ec6b358 Signature handling for fallback 2026-01-03 22:01:57 +05:30
Badri Narayanan S
df6625b531 fallback changes from PR #35 2026-01-03 18:01:21 +05:30
Badri Narayanan S
9c4a712a9a Selective fixes from PR #35: Model-specific rate limits & robustness improvements (#37)
* feat: apply local user changes and fixes

* ;D

* Implement OpenAI support, model-specific rate limiting, and robustness fixes

* docs: update pr title

* feat: ensure unique openai models endpoint

* fix: startup banner alignment and removed duplicates

* feat: add model fallback system with --fallback flag

* fix: accounts cli hanging after completion

* feat: add exit option to accounts cli menu

* fix: remove circular dependency warning for fallback flag

* feat: show active modes in banner and hide their flags

* Remove OpenAI compatibility and fallback features from PR #35

Cherry-picked selective fixes from PR #35 while removing:
- OpenAI-compatible API endpoints (/openai/v1/*)
- Model fallback system (fallback-config.js)
- Thinking block skip for Gemini models
- Unnecessary files (pullrequest.md, test-fix.js, test-openai.js)

Retained improvements:
- Network error handling with retry logic
- Model-specific rate limiting
- Enhanced health check with quota info
- CLI fixes (exit option, process.exit)
- Startup banner alignment (debug mode only)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* banner alignment fix

* Refactor: Model-specific rate limits and cleanup deprecated code

- Remove global rate limit fields (isRateLimited, rateLimitResetTime) in favor of model-specific limits (modelRateLimits[modelId])
- Remove deprecated wrapper functions (is429Error, isAuthInvalidError) from handlers
- Filter fetchAvailableModels to only return Claude and Gemini models
- Fix getCurrentStickyAccount() to pass model param after waiting
- Update /account-limits endpoint to show model-specific limits
- Remove multi-account OAuth flow to avoid state mismatch errors

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* fix: show (x/y) limited status in account-limits table

- Status is now "ok" only when all models are available
- Shows "(x/y) limited" when x out of y models are exhausted
- Provides better visibility into partial rate limiting

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* docs: update CLAUDE.md with model-specific rate limiting

- Document modelRateLimits[modelId] for per-model rate tracking
- Add isNetworkError() helper to utilities section

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: M1noa <minoa@minoa.cat>
Co-authored-by: Minoa <altgithub@minoa.cat>
Co-authored-by: Claude <noreply@anthropic.com>
2026-01-03 15:33:49 +05:30
Badri Narayanan S
f02364d4ef refactor: Reorganize src/ into modular folder structure
Split large monolithic files into focused modules:
- cloudcode-client.js (1,107 lines) → src/cloudcode/ (9 files)
- account-manager.js (639 lines) → src/account-manager/ (5 files)
- Move auth files to src/auth/ (oauth, token-extractor, database)
- Move CLI to src/cli/accounts.js

Update all import paths and documentation.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2026-01-01 15:13:43 +05:30