Badri Narayanan S 5ae19a5b72 feat: add configurable account selection strategies

Refactor account selection into a strategy pattern with three options:
- Sticky: cache-optimized, stays on same account until rate-limited
- Round-robin: load-balanced, rotates every request
- Hybrid (default): smart distribution using health scores, token buckets, and LRU

The hybrid strategy uses multiple signals for optimal account selection:
health tracking for reliability, client-side token buckets for rate limiting,
and LRU freshness to prefer rested accounts.

Includes WebUI settings for strategy selection and unit tests.

Co-Authored-By: Claude <noreply@anthropic.com>

2026-01-18 03:48:43 +05:30


CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

Antigravity Claude Proxy is a Node.js proxy server that exposes an Anthropic-compatible API backed by Antigravity's Cloud Code service. It lets the Claude Code CLI use Claude models (claude-sonnet-4-5-thinking, claude-opus-4-5-thinking) and Gemini models (gemini-3-flash, gemini-3-pro-low, gemini-3-pro-high).

The proxy translates requests from Anthropic Messages API format → Google Generative AI format → Antigravity Cloud Code API, then converts responses back to Anthropic format with full thinking/streaming support.
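
As a rough illustration of the first translation step, the sketch below maps a minimal Anthropic Messages request onto the Google Generative AI contents shape. It is not the code in src/format/request-converter.js: the function name toGoogleRequest is hypothetical, and only plain-text content and a string system prompt are handled.

```javascript
// Hypothetical sketch, not the proxy's actual converter.
function toGoogleRequest(anthropicRequest) {
  const contents = anthropicRequest.messages.map((message) => {
    // Anthropic allows either a plain string or an array of content blocks.
    const blocks = typeof message.content === 'string'
      ? [{ type: 'text', text: message.content }]
      : message.content;
    return {
      // Google uses "model" where Anthropic uses "assistant".
      role: message.role === 'assistant' ? 'model' : 'user',
      // Only text blocks are mapped; tools, images and thinking are omitted here.
      parts: blocks
        .filter((block) => block.type === 'text')
        .map((block) => ({ text: block.text })),
    };
  });

  const googleRequest = { contents };
  if (typeof anthropicRequest.system === 'string') {
    googleRequest.systemInstruction = { parts: [{ text: anthropicRequest.system }] };
  }
  return googleRequest;
}

const translated = toGoogleRequest({
  system: 'Be brief.',
  messages: [
    { role: 'user', content: 'Hello' },
    { role: 'assistant', content: [{ type: 'text', text: 'Hi!' }] },
  ],
});
console.log(JSON.stringify(translated, null, 2));
```

The real converters also deal with tool calls, images, thinking blocks and schema sanitization, which this sketch leaves out.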

Commands

# Install dependencies (automatically builds CSS via prepare hook)
npm install

# Start server (runs on port 8080)
npm start

# Start with specific account selection strategy
npm start -- --strategy=sticky      # Cache-optimized (stays on same account)
npm start -- --strategy=round-robin # Load-balanced (rotates every request)
npm start -- --strategy=hybrid      # Smart distribution (default)

# Start with model fallback enabled (falls back to alternate model when quota exhausted)
npm start -- --fallback

# Start with debug logging
npm start -- --debug

# Development mode (file watching)
npm run dev              # Watch server files only
npm run dev:full         # Watch both CSS and server files (recommended for frontend dev)

# CSS build commands
npm run build:css        # Build CSS once (minified)
npm run watch:css        # Watch CSS files for changes

# Account management
npm run accounts         # Interactive account management
npm run accounts:add     # Add a new Google account via OAuth
npm run accounts:add -- --no-browser  # Add account on headless server (manual code input)
npm run accounts:list    # List configured accounts
npm run accounts:verify  # Verify account tokens are valid

# Run all tests (server must be running on port 8080)
npm test

# Run individual tests
npm run test:signatures    # Thinking signatures
npm run test:multiturn     # Multi-turn with tools
npm run test:streaming     # Streaming SSE events
npm run test:interleaved   # Interleaved thinking
npm run test:images        # Image processing
npm run test:caching       # Prompt caching
npm run test:crossmodel    # Cross-model thinking signatures
npm run test:oauth         # OAuth no-browser mode

# Run strategy unit tests (no server required)
node tests/test-strategies.cjs

Architecture

Request Flow:

Claude Code CLI → Express Server (server.js) → CloudCode Client → Antigravity Cloud Code API

Directory Structure:

src/
├── index.js                    # Entry point
├── server.js                   # Express server
├── constants.js                # Configuration values
├── errors.js                   # Custom error classes
├── fallback-config.js          # Model fallback mappings and helpers
│
├── cloudcode/                  # Cloud Code API client
│   ├── index.js                # Public API exports
│   ├── session-manager.js      # Session ID derivation for caching
│   ├── rate-limit-parser.js    # Parse reset times from headers/errors
│   ├── request-builder.js      # Build API request payloads
│   ├── sse-parser.js           # Parse SSE for non-streaming
│   ├── sse-streamer.js         # Stream SSE events in real-time
│   ├── message-handler.js      # Non-streaming message handling
│   ├── streaming-handler.js    # Streaming message handling
│   └── model-api.js            # Model listing and quota APIs
│
├── account-manager/            # Multi-account pool management
│   ├── index.js                # AccountManager class facade
│   ├── storage.js              # Config file I/O and persistence
│   ├── rate-limits.js          # Rate limit tracking and state
│   ├── credentials.js          # OAuth token and project handling
│   └── strategies/             # Account selection strategies
│       ├── index.js            # Strategy factory (createStrategy)
│       ├── base-strategy.js    # Abstract base class
│       ├── sticky-strategy.js  # Cache-optimized sticky selection
│       ├── round-robin-strategy.js  # Load-balanced rotation
│       ├── hybrid-strategy.js  # Smart multi-signal distribution
│       └── trackers/           # State trackers for hybrid strategy
│           ├── index.js        # Re-exports trackers
│           ├── health-tracker.js    # Account health scores
│           └── token-bucket-tracker.js  # Client-side rate limiting
│
├── auth/                       # Authentication
│   ├── oauth.js                # Google OAuth with PKCE
│   ├── token-extractor.js      # Legacy token extraction from DB
│   └── database.js             # SQLite database access
│
├── webui/                      # Web Management Interface
│   └── index.js                # Express router and API endpoints
│
├── modules/                    # Feature modules
│   └── usage-stats.js          # Request tracking and history persistence
│
├── cli/                        # CLI tools
│   └── accounts.js             # Account management CLI
│
├── format/                     # Format conversion (Anthropic ↔ Google)
│   ├── index.js                # Re-exports all converters
│   ├── request-converter.js    # Anthropic → Google conversion
│   ├── response-converter.js   # Google → Anthropic conversion
│   ├── content-converter.js    # Message content conversion
│   ├── schema-sanitizer.js     # JSON Schema cleaning for Gemini
│   ├── thinking-utils.js       # Thinking block validation/recovery
│   └── signature-cache.js      # Signature cache (tool_use + thinking signatures)
│
└── utils/                      # Utilities
    ├── helpers.js              # formatDuration, sleep, isNetworkError
    ├── logger.js               # Structured logging
    └── native-module-helper.js # Auto-rebuild for native modules

Frontend Structure (public/):

public/
├── index.html                  # Main entry point
├── css/
│   ├── style.css               # Compiled Tailwind CSS (generated, do not edit)
│   └── src/
│       └── input.css           # Tailwind source with @apply directives
├── js/
│   ├── app.js                  # Main application logic (Alpine.js)
│   ├── config/                 # Application configuration
│   │   └── constants.js        # Centralized UI constants and limits
│   ├── store.js                # Global state management
│   ├── data-store.js           # Shared data store (accounts, models, quotas)
│   ├── settings-store.js       # Settings management store
│   ├── components/             # UI Components
│   │   ├── dashboard.js        # Main dashboard orchestrator
│   │   ├── account-manager.js  # Account list & OAuth handling
│   │   ├── logs-viewer.js      # Live log streaming
│   │   ├── claude-config.js    # CLI settings editor
│   │   ├── server-config.js    # Server settings UI
│   │   └── dashboard/          # Dashboard sub-modules
│   │       ├── stats.js        # Account statistics calculation
│   │       ├── charts.js       # Chart.js visualizations
│   │       └── filters.js      # Chart filter state management
│   └── utils/                  # Frontend utilities
│       ├── error-handler.js    # Centralized error handling with ErrorHandler.withLoading
│       ├── account-actions.js  # Account operations service layer (NEW)
│       ├── validators.js       # Input validation
│       └── model-config.js     # Model configuration helpers
└── views/                      # HTML partials (loaded dynamically)
    ├── dashboard.html
    ├── accounts.html
    ├── models.html
    ├── settings.html
    └── logs.html

Key Modules:

src/server.js: Express server exposing Anthropic-compatible endpoints (/v1/messages, /v1/models, /health, /account-limits) and mounting WebUI
src/webui/index.js: WebUI backend handling API routes (/api/*) for config, accounts, and logs
src/cloudcode/: Cloud Code API client with retry/failover logic, streaming and non-streaming support
- model-api.js: Model listing, quota retrieval (getModelQuotas()), and subscription tier detection (getSubscriptionTier())
src/account-manager/: Multi-account pool with configurable selection strategies, rate limit handling, and automatic cooldown
- Strategies: sticky (cache-optimized), round-robin (load-balanced), hybrid (smart distribution)
src/auth/: Authentication including Google OAuth, token extraction, database access, and auto-rebuild of native modules
src/format/: Format conversion between Anthropic and Google Generative AI formats
src/constants.js: API endpoints, model mappings, fallback config, OAuth config, and all configuration values
src/modules/usage-stats.js: Tracks request volume by model/family, persists 30-day history to JSON, and auto-prunes old data.
src/fallback-config.js: Model fallback mappings (getFallbackModel(), hasFallback())
src/errors.js: Custom error classes (RateLimitError, AuthError, ApiError, etc.)

Multi-Account Load Balancing:

Configurable selection strategy via --strategy flag or WebUI
Three strategies available:
- Sticky (--strategy=sticky): Best for prompt caching, stays on same account
- Round-Robin (--strategy=round-robin): Maximum throughput, rotates every request
- Hybrid (--strategy=hybrid, default): Smart selection using health + tokens + LRU
Model-specific rate limiting via account.modelRateLimits[modelId]
Automatic switch only when rate-limited for > 2 minutes on the current model
Session ID derived from first user message hash for cache continuity
Account state persisted to ~/.config/antigravity-proxy/accounts.json
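
The two-minute rule above can be sketched as a small decision function. This is an illustration only: the shape of each modelRateLimits entry ({ isRateLimited, resetTime }) and the name decideAction are assumptions, not the repository's API.

```javascript
// Threshold from the notes above: switch only if the wait exceeds 2 minutes.
const MAX_WAIT_MS = 2 * 60 * 1000;

// Assumption: each entry looks like { isRateLimited: boolean, resetTime: epoch ms }.
function decideAction(account, modelId, now = Date.now()) {
  const limit = account.modelRateLimits?.[modelId];
  if (!limit || !limit.isRateLimited || limit.resetTime <= now) {
    return { action: 'use', waitMs: 0 };
  }
  const waitMs = limit.resetTime - now;
  return waitMs > MAX_WAIT_MS
    ? { action: 'switch', waitMs }
    : { action: 'wait', waitMs };
}

const fixedNow = 1_000_000;
const limitedAccount = {
  email: 'user@example.com',
  modelRateLimits: {
    'gemini-3-flash': { isRateLimited: true, resetTime: fixedNow + 30_000 },
    'claude-sonnet-4-5-thinking': { isRateLimited: true, resetTime: fixedNow + 300_000 },
  },
};
console.log(decideAction(limitedAccount, 'gemini-3-flash', fixedNow).action);             // wait
console.log(decideAction(limitedAccount, 'claude-sonnet-4-5-thinking', fixedNow).action); // switch
console.log(decideAction(limitedAccount, 'gemini-3-pro-high', fixedNow).action);          // use
```

Because limits are tracked per model, the same account can be rate-limited for one model and usable for another.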

Account Selection Strategies:

Sticky Strategy (best for caching):
- Stays on current account until rate-limited or unavailable
- Waits up to 2 minutes for short rate limits before switching
- Maintains prompt cache continuity across requests
Round-Robin Strategy (best for throughput):
- Rotates to next account on every request
- Skips rate-limited/disabled accounts
- Maximizes concurrent request distribution
Hybrid Strategy (default, smart distribution):
- Uses health scores, token buckets, and LRU for selection
- Scoring formula: score = (Health × 2) + ((Tokens / MaxTokens × 100) × 5) + (LRU × 0.1)
- Health scores: Track success/failure patterns with passive recovery
- Token buckets: Client-side rate limiting (50 tokens, 6 per minute regeneration)
- LRU freshness: Prefer accounts that have rested longer
- Configuration in src/config.js under accountSelection
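
A minimal sketch of the hybrid scoring formula and the token bucket refill, assuming health and LRU arrive as plain numbers on whatever scale the trackers produce. The function names (refill, hybridScore, pickAccount) are hypothetical and do not mirror hybrid-strategy.js.

```javascript
const MAX_TOKENS = 50;        // bucket capacity, from the notes above
const REFILL_PER_MINUTE = 6;  // regeneration rate, from the notes above

// Tokens available after `elapsedMs` of rest, capped at the bucket size.
function refill(tokens, elapsedMs) {
  const regenerated = (elapsedMs / 60_000) * REFILL_PER_MINUTE;
  return Math.min(MAX_TOKENS, tokens + regenerated);
}

// score = (Health × 2) + ((Tokens / MaxTokens × 100) × 5) + (LRU × 0.1)
// Assumption: `health` and `lru` are already numeric tracker outputs.
function hybridScore({ health, tokens, lru }) {
  return health * 2 + (tokens / MAX_TOKENS) * 100 * 5 + lru * 0.1;
}

// Pick the candidate with the highest score (first one wins ties).
function pickAccount(candidates) {
  return candidates.reduce((best, c) => (hybridScore(c) > hybridScore(best) ? c : best));
}

const scored = [
  { email: 'a@example.com', health: 70, tokens: 50, lru: 10 },
  { email: 'b@example.com', health: 100, tokens: 25, lru: 600 },
];
console.log(scored.map(hybridScore));       // roughly 641 and 510
console.log(pickAccount(scored).email);     // a@example.com
console.log(refill(20, 2 * 60_000));        // 32
```

With these weights a full token bucket contributes 500 points, so token availability tends to outweigh health and LRU freshness.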

Account Data Model: Each account object in accounts.json contains:

Basic Info: email, source (oauth/manual/database), enabled, lastUsed
Credentials: refreshToken (OAuth) or apiKey (manual)
Subscription: { tier, projectId, detectedAt } - automatically detected via loadCodeAssist API
- tier: 'free' | 'pro' | 'ultra' (detected from paidTier or currentTier)
Quota: { models: {}, lastChecked } - model-specific quota cache
- models[modelId]: { remainingFraction, resetTime } from fetchAvailableModels API
Rate Limits: modelRateLimits[modelId] - temporary rate limit state (in-memory during runtime)
Validity: isInvalid, invalidReason - tracks accounts needing re-authentication
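
For orientation, an account entry might look like the object below. Field names follow the list above. Every value is made up, and the value types (epoch milliseconds, ISO timestamp) are assumptions. The helper remainingFraction is hypothetical.

```javascript
// Illustrative only: values are invented, field names come from the list above.
const exampleAccount = {
  email: 'user@example.com',
  source: 'oauth',               // 'oauth' | 'manual' | 'database'
  enabled: true,
  lastUsed: 1768688323000,       // assumption: epoch milliseconds
  refreshToken: '<redacted>',    // OAuth accounts; manual accounts carry apiKey instead
  subscription: { tier: 'pro', projectId: 'example-project', detectedAt: 1768688323000 },
  quota: {
    models: {
      // assumption: resetTime stored as an ISO timestamp
      'gemini-3-flash': { remainingFraction: 0.75, resetTime: '2026-01-18T05:00:00Z' },
    },
    lastChecked: 1768688323000,
  },
  modelRateLimits: {},           // temporary, runtime-only rate limit state
  isInvalid: false,
  invalidReason: null,
};

// Hypothetical helper: read the cached quota for one model, or null if unknown.
function remainingFraction(account, modelId) {
  return account.quota?.models?.[modelId]?.remainingFraction ?? null;
}

console.log(remainingFraction(exampleAccount, 'gemini-3-flash'));   // 0.75
console.log(remainingFraction(exampleAccount, 'gemini-3-pro-low')); // null
```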

Prompt Caching:

Cache is organization-scoped (requires same account + session ID)
Session ID is SHA256 hash of first user message content (stable across turns)
cache_read_input_tokens is returned in usage metadata on a cache hit
Token calculation: input_tokens = promptTokenCount - cachedContentTokenCount
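
The session ID and token arithmetic above can be sketched as follows. This is written as CommonJS, like the test files, and is not the code in session-manager.js: the exact hash input, the hex encoding and the function names are assumptions.

```javascript
const { createHash } = require('node:crypto');

// Assumption: the hash input is the text of the first user message, hex-encoded.
function deriveSessionId(messages) {
  const firstUser = messages.find((m) => m.role === 'user');
  if (!firstUser) return null;
  const text = typeof firstUser.content === 'string'
    ? firstUser.content
    : firstUser.content.filter((b) => b.type === 'text').map((b) => b.text).join('\n');
  return createHash('sha256').update(text).digest('hex');
}

// input_tokens = promptTokenCount - cachedContentTokenCount
function toAnthropicUsage({ promptTokenCount, cachedContentTokenCount = 0 }) {
  return {
    input_tokens: promptTokenCount - cachedContentTokenCount,
    cache_read_input_tokens: cachedContentTokenCount,
  };
}

const turnOne = [{ role: 'user', content: 'Refactor the parser' }];
const turnTwo = [...turnOne, { role: 'assistant', content: 'Done.' }, { role: 'user', content: 'Now add tests' }];
console.log(deriveSessionId(turnOne) === deriveSessionId(turnTwo)); // true
console.log(toAnthropicUsage({ promptTokenCount: 1200, cachedContentTokenCount: 800 }));
```

Hashing only the first user message keeps the ID stable as the conversation grows, which is what lets later turns hit the same cache.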

Model Fallback (--fallback flag):

When all accounts are exhausted for a model, automatically falls back to an alternate model
Fallback mappings defined in MODEL_FALLBACK_MAP in src/constants.js
Thinking models fall back to thinking models (e.g., claude-sonnet-4-5-thinking → gemini-3-flash)
Fallback is disabled on recursive calls to prevent infinite chains
Enable with npm start -- --fallback or FALLBACK=true environment variable
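
A small sketch of the fallback decision, including the guard against recursive fallback. The map below is a stand-in with only the example pair from the text. The helper bodies and the name modelAfterExhaustion are assumptions, not the contents of src/fallback-config.js.

```javascript
// Stand-in for MODEL_FALLBACK_MAP; the single entry is the example pair from the text.
const FALLBACK_MAP_SKETCH = new Map([
  ['claude-sonnet-4-5-thinking', 'gemini-3-flash'],
]);

function hasFallback(model) {
  return FALLBACK_MAP_SKETCH.has(model);
}

function getFallbackModel(model) {
  return FALLBACK_MAP_SKETCH.get(model) ?? null;
}

// Decide which model to retry with once every account is exhausted for `model`.
// `isFallbackCall` marks a request that is already a fallback, so chains cannot form.
function modelAfterExhaustion(model, { fallbackEnabled = false, isFallbackCall = false } = {}) {
  if (!fallbackEnabled || isFallbackCall || !hasFallback(model)) return null;
  return getFallbackModel(model);
}

console.log(modelAfterExhaustion('claude-sonnet-4-5-thinking', { fallbackEnabled: true }));
// gemini-3-flash
console.log(modelAfterExhaustion('claude-sonnet-4-5-thinking', { fallbackEnabled: true, isFallbackCall: true }));
// null
console.log(modelAfterExhaustion('claude-sonnet-4-5-thinking'));
// null (fallback not enabled)
```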

Cross-Model Thinking Signatures:

Claude and Gemini use incompatible thinking signatures
When switching models mid-conversation, incompatible signatures are detected and dropped
Signature cache tracks model family ('claude' or 'gemini') for each signature
hasGeminiHistory() detects Gemini→Claude cross-model scenarios
Thinking recovery (closeToolLoopForThinking()) injects synthetic messages to close interrupted tool loops
For Gemini targets: strict validation - drops unknown or mismatched signatures
For Claude targets: lenient - lets Claude validate its own signatures

Native Module Auto-Rebuild:

When Node.js is updated, native modules like better-sqlite3 may become incompatible
The proxy automatically detects NODE_MODULE_VERSION mismatch errors
On detection, it attempts to rebuild the module using npm rebuild
If rebuild succeeds, the module is reloaded; if reload fails, a server restart is required
Implementation in src/utils/native-module-helper.js and lazy loading in src/auth/database.js
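
The detect-rebuild-reload sequence above can be sketched with injected load and rebuild callbacks. This is not the code in native-module-helper.js: the function names are hypothetical, and matching on error.message is an assumption based on the text Node.js normally prints for ABI mismatches.

```javascript
// Assumption: the mismatch is visible in error.message, as in Node's usual
// "was compiled against a different Node.js version using NODE_MODULE_VERSION ..." text.
function isModuleVersionMismatch(error) {
  const message = String(error?.message ?? '');
  return message.includes('NODE_MODULE_VERSION') ||
    message.includes('was compiled against a different Node.js version');
}

// `load` returns the module or throws; `rebuild` would run something like `npm rebuild`.
function loadWithRebuild(load, rebuild) {
  try {
    return load();
  } catch (error) {
    if (!isModuleVersionMismatch(error)) throw error; // unrelated failure: do not mask it
    rebuild();
    return load(); // may still throw, in which case a server restart is needed
  }
}

// Simulated module that fails once with a version mismatch, then loads.
let attempts = 0;
let rebuilds = 0;
const fakeLoad = () => {
  attempts += 1;
  if (attempts === 1) {
    throw new Error('The module was compiled against a different Node.js version using NODE_MODULE_VERSION 108.');
  }
  return { name: 'better-sqlite3 (simulated)' };
};
const loaded = loadWithRebuild(fakeLoad, () => { rebuilds += 1; });
console.log(loaded.name, { attempts, rebuilds });
```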

Web Management UI:

Stack: Vanilla JS + Alpine.js + Tailwind CSS (local build with PostCSS)
Build System:
- Tailwind CLI with JIT compilation
- PostCSS + Autoprefixer
- DaisyUI component library
- Custom @apply directives in public/css/src/input.css
- Compiled output: public/css/style.css (auto-generated on npm install)
Architecture: Single Page Application (SPA) with dynamic view loading
State Management:
- Alpine.store for global state (accounts, settings, logs)
- Layered architecture: Service Layer (account-actions.js) → Component Layer → UI
Features:
- Real-time dashboard with Chart.js visualization and subscription tier distribution
- Account list with tier badges (Ultra/Pro/Free) and quota progress bars
- OAuth flow handling via popup window
- Live log streaming via Server-Sent Events (SSE)
- Config editor for both Proxy and Claude CLI (~/.claude/settings.json)
- Skeleton loading screens for improved perceived performance
- Empty state UX with actionable prompts
- Loading states for all async operations
Accessibility:
- ARIA labels on search inputs and icon buttons
- Keyboard navigation support (Escape to clear search)
Security: Optional password protection via WEBUI_PASSWORD env var
Smart Refresh: Client-side polling with ±20% jitter and tab visibility detection (3x slower when hidden)
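
The Smart Refresh timing can be sketched as one pure function. The name nextPollDelay and the injectable random source are hypothetical. Only the ±20% jitter and the 3x hidden-tab slowdown come from the text.

```javascript
// baseMs: configured interval; hidden: whether the tab is hidden;
// random: injectable source of randomness in [0, 1), so the function is testable.
function nextPollDelay(baseMs, hidden, random = Math.random) {
  const jitter = 1 + (random() * 2 - 1) * 0.2; // factor between 0.8 and 1.2
  const slowdown = hidden ? 3 : 1;             // poll 3x slower when the tab is hidden
  return Math.round(baseMs * slowdown * jitter);
}

console.log(nextPollDelay(30_000, false, () => 0.5)); // 30000 (no jitter at the midpoint)
console.log(nextPollDelay(30_000, true, () => 0.5));  // 90000
console.log(nextPollDelay(30_000, false, () => 0));   // 24000 (lower bound)
```

In the browser, the hidden flag would typically come from document.hidden, with the next timer scheduled through setTimeout after each poll.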

Testing Notes

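Integration tests (npm test and the individual npm run test:* scripts) require the server to be running (npm start in a separate terminal); the strategy unit tests (node tests/test-strategies.cjs) do not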
Tests are CommonJS files (.cjs) that make HTTP requests to the local proxy
Shared test utilities are in tests/helpers/http-client.cjs
Test runner supports filtering: node tests/run-all.cjs <filter> to run matching tests

Code Organization

Constants: All configuration values are centralized in src/constants.js:

API endpoints and headers
Model mappings and model family detection (getModelFamily(), isThinkingModel())
Model fallback mappings (MODEL_FALLBACK_MAP)
OAuth configuration
Rate limit thresholds
Thinking model settings

Model Family Handling:

getModelFamily(model) returns 'claude' or 'gemini' based on model name
Claude models use signature field on thinking blocks
Gemini models use thoughtSignature field on functionCall parts (cached or sentinel value)
When Claude Code strips thoughtSignature, the proxy tries to restore from cache, then falls back to skip_thought_signature_validator
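
A hedged sketch of the family handling described above. The substring matching, the 'unknown' return value, the cache keyed by tool-use ID and the helper names signatureField and restoreThoughtSignature are all assumptions. Only the field names and the sentinel string come from the text.

```javascript
const SKIP_SENTINEL = 'skip_thought_signature_validator';

// Assumption: the family is inferred from a substring of the model name.
function getModelFamily(model) {
  const name = String(model).toLowerCase();
  if (name.includes('claude')) return 'claude';
  if (name.includes('gemini')) return 'gemini';
  return 'unknown'; // not part of the documented return values; added for safety here
}

// Field that carries the thinking signature for each family.
function signatureField(model) {
  const family = getModelFamily(model);
  if (family === 'claude') return 'signature';        // on thinking blocks
  if (family === 'gemini') return 'thoughtSignature'; // on functionCall parts
  return null;
}

// Restore a stripped Gemini signature from a cache, else fall back to the sentinel.
// Assumption: the cache is a Map keyed by tool-use ID.
function restoreThoughtSignature(toolUseId, cache) {
  return cache.get(toolUseId) ?? SKIP_SENTINEL;
}

const signatureCache = new Map([['toolu_1', 'sig-abc']]);
console.log(signatureField('claude-opus-4-5-thinking'));             // signature
console.log(signatureField('gemini-3-pro-high'));                    // thoughtSignature
console.log(restoreThoughtSignature('toolu_1', signatureCache));     // sig-abc
console.log(restoreThoughtSignature('toolu_2', signatureCache));     // skip_thought_signature_validator
```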

Error Handling: Use custom error classes from src/errors.js:

RateLimitError - 429/RESOURCE_EXHAUSTED errors
AuthError - Authentication failures
ApiError - Upstream API errors
Helper functions: isRateLimitError(), isAuthError()
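
One way such error classes and helpers could be laid out is shown below. The class names come from the list above. The inheritance hierarchy, constructor parameters and the resetMs field are assumptions and may differ from src/errors.js.

```javascript
// Assumed layout: ApiError as the base, with the other two as subclasses.
class ApiError extends Error {
  constructor(message, status = 500) {
    super(message);
    this.name = 'ApiError';
    this.status = status;
  }
}

class RateLimitError extends ApiError {
  constructor(message, resetMs = null) {
    super(message, 429);
    this.name = 'RateLimitError';
    this.resetMs = resetMs; // hypothetical: time until the limit resets, if known
  }
}

class AuthError extends ApiError {
  constructor(message) {
    super(message, 401);
    this.name = 'AuthError';
  }
}

const isRateLimitError = (error) => error instanceof RateLimitError;
const isAuthError = (error) => error instanceof AuthError;

// Typical use: branch on the error type instead of parsing messages.
function describe(error) {
  if (isRateLimitError(error)) return `rate limited, retry in ${error.resetMs} ms`;
  if (isAuthError(error)) return 'account needs re-authentication';
  return `upstream error (${error.status ?? 'unknown'})`;
}

console.log(describe(new RateLimitError('RESOURCE_EXHAUSTED', 45_000)));
console.log(describe(new AuthError('invalid_grant')));
console.log(describe(new ApiError('Bad gateway', 502)));
```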

Utilities: Shared helpers in src/utils/helpers.js:

formatDuration(ms) - Format milliseconds as "1h23m45s"
sleep(ms) - Promise-based delay
isNetworkError(error) - Check if error is a transient network error
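
For reference, helpers with these signatures could look like the sketch below. The bodies are illustrative rather than copied from src/utils/helpers.js, and dropping leading zero units is an assumption beyond the "1h23m45s" example.

```javascript
// Format milliseconds as "1h23m45s", omitting leading zero units (assumed behavior).
function formatDuration(ms) {
  const totalSeconds = Math.floor(ms / 1000);
  const hours = Math.floor(totalSeconds / 3600);
  const minutes = Math.floor((totalSeconds % 3600) / 60);
  const seconds = totalSeconds % 60;
  if (hours > 0) return `${hours}h${minutes}m${seconds}s`;
  if (minutes > 0) return `${minutes}m${seconds}s`;
  return `${seconds}s`;
}

// Promise-based delay: `await sleep(500)` pauses an async function for 500 ms.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

console.log(formatDuration(5_025_000)); // 1h23m45s
console.log(formatDuration(90_000));    // 1m30s
console.log(formatDuration(500));       // 0s
```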

Data Persistence:

Subscription and quota data are automatically fetched when /account-limits is called
Updated data is saved to accounts.json asynchronously (non-blocking)
On server restart, accounts load with last known subscription/quota state
Quota is refreshed on each WebUI poll (default: 30s with jitter)

Logger: Structured logging via src/utils/logger.js:

logger.info(msg) - Standard info (blue)
logger.success(msg) - Success messages (green)
logger.warn(msg) - Warnings (yellow)
logger.error(msg) - Errors (red)
logger.debug(msg) - Debug output (magenta, only when enabled)
logger.setDebug(true) - Enable debug mode
logger.isDebugEnabled - Check if debug mode is on

WebUI APIs:

/api/accounts/* - Account management (list, add, remove, refresh)
/api/config/* - Server configuration (read/write)
/api/claude/config - Claude CLI settings
/api/logs/stream - SSE endpoint for real-time logs
/api/stats/history - Retrieve 30-day request history (sorted chronologically)
/api/auth/url - Generate Google OAuth URL
/account-limits - Fetch account quotas and subscription data
- Returns: { accounts: [{ email, subscription: { tier, projectId }, limits: {...} }], models: [...] }
- Query params: ?format=table (ASCII table) or ?includeHistory=true (adds usage stats)

Frontend Development

CSS Build System

Workflow:

Edit styles in public/css/src/input.css (Tailwind source with @apply directives)
Run npm run build:css to compile (or npm run watch:css for auto-rebuild)
Compiled CSS output: public/css/style.css (minified, committed to git)

Component Styles:

Use @apply to abstract common Tailwind patterns into reusable classes
Example: .btn-action-ghost, .status-pill-success, .input-search
Skeleton loading: .skeleton, .skeleton-stat-card, .skeleton-chart

When to rebuild:

After modifying public/css/src/input.css
After pulling changes that updated CSS source
Automatically on npm install (via prepare hook)

Error Handling Pattern

Use window.ErrorHandler.withLoading() for async operations:

async myOperation() {
  return await window.ErrorHandler.withLoading(async () => {
    // Your async code here
    const result = await someApiCall();
    if (!result.ok) {
      throw new Error('Operation failed');
    }
    return result;
  }, this, 'loading', { errorMessage: 'Failed to complete operation' });
}

Automatically manages this.loading state
Shows error toast on failure
Always resets loading state in finally block
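
To make the three behaviors above concrete, here is a hypothetical re-implementation of a withLoading-style helper. It is not the code in error-handler.js: the injected showToast option and the choice to swallow the error and return undefined are assumptions.

```javascript
// Hypothetical sketch of a withLoading-style helper.
async function withLoading(operation, component, loadingKey = 'loading', options = {}) {
  component[loadingKey] = true;                  // mark the component as busy
  try {
    return await operation();
  } catch (error) {
    // Assumption: the toast function is injected; the real helper uses the app's own toast.
    const showToast = options.showToast ?? ((message) => console.error(message));
    showToast(options.errorMessage ?? error.message, 'error');
    return undefined;                            // assumption: errors are not rethrown
  } finally {
    component[loadingKey] = false;               // always reset, success or failure
  }
}

// Usage with a plain object standing in for an Alpine component.
(async () => {
  const demoComponent = { loading: false };
  const value = await withLoading(async () => 'ok', demoComponent, 'loading');
  console.log(value, demoComponent.loading);
})();
```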

Frontend Configuration

Constants: All frontend magic numbers and configuration values are centralized in public/js/config/constants.js. Use window.AppConstants to access:

INTERVALS: Refresh rates and timeouts
LIMITS: Data quotas and display limits
UI: Animation durations and delay settings

Account Operations Service Layer

Use window.AccountActions for account operations instead of direct API calls:

// ✅ Good: Use service layer
const result = await window.AccountActions.refreshAccount(email);
if (result.success) {
  this.$store.global.showToast('Account refreshed', 'success');
} else {
  this.$store.global.showToast(result.error, 'error');
}

// ❌ Bad: Direct API call in component
const response = await fetch(`/api/accounts/${email}/refresh`);

Available methods:

refreshAccount(email) - Refresh token and quota
toggleAccount(email, enabled) - Enable/disable account (with optimistic update)
deleteAccount(email) - Delete account
getFixAccountUrl(email) - Get OAuth re-auth URL
reloadAccounts() - Reload from disk
canDelete(account) - Check if account is deletable

All methods return {success: boolean, data?: object, error?: string}

Dashboard Modules

Dashboard is split into three modules for maintainability:

stats.js - Account statistics calculation
- updateStats(component) - Computes active/limited/total counts
- Updates subscription tier distribution
charts.js - Chart.js visualizations
- initQuotaChart(component) - Initialize quota distribution pie chart
- initTrendChart(component) - Initialize usage trend line chart
- updateQuotaChart(component) - Update quota chart data
- updateTrendChart(component) - Update trend chart (with concurrency lock)
filters.js - Filter state management
- getInitialState() - Default filter values
- loadPreferences(component) - Load from localStorage
- savePreferences(component) - Save to localStorage
- autoSelectTopN(component) - Smart select top 5 active models
- Filter types: time range (1h/6h/24h/7d/all), display mode, family/model selection

Each module is well-documented with JSDoc comments.

Maintenance

When making significant changes to the codebase (new modules, refactoring, architectural changes), update this CLAUDE.md and the README.md file to keep documentation in sync.
