Files
jgor20 a43d2332ca feat: per-account quota threshold protection (#212)
feat: per-account quota threshold protection

Resolves #135

- Adds configurable quota protection with three-tier threshold resolution (per-model → per-account → global)
- New global Minimum Quota Level slider in Settings
- Per-account threshold settings via Account Settings modal
- Draggable per-account threshold markers on model quota bars
- Backend: PATCH /api/accounts/:email endpoint, globalQuotaThreshold config
- i18n: quota protection keys for all 5 languages
2026-02-01 17:15:46 +05:30

486 lines
24 KiB
Markdown
Raw Permalink Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Project Overview
Antigravity Claude Proxy is a Node.js proxy server that exposes an Anthropic-compatible API backed by Antigravity's Cloud Code service. It enables using Claude models (`claude-sonnet-4-5-thinking`, `claude-opus-4-5-thinking`) and Gemini models (`gemini-3-flash`, `gemini-3-pro-low`, `gemini-3-pro-high`) with Claude Code CLI.
The proxy translates requests from Anthropic Messages API format → Google Generative AI format → Antigravity Cloud Code API, then converts responses back to Anthropic format with full thinking/streaming support.
## Commands
```bash
# Install dependencies (automatically builds CSS via prepare hook)
npm install
# Start server (runs on port 8080)
npm start
# Start with specific account selection strategy
npm start -- --strategy=sticky # Cache-optimized (stays on same account)
npm start -- --strategy=round-robin # Load-balanced (rotates every request)
npm start -- --strategy=hybrid # Smart distribution (default)
# Start with model fallback enabled (falls back to alternate model when quota exhausted)
npm start -- --fallback
# Start with debug logging
npm start -- --debug
# Development mode (file watching)
npm run dev # Watch server files only
npm run dev:full # Watch both CSS and server files (recommended for frontend dev)
# CSS build commands
npm run build:css # Build CSS once (minified)
npm run watch:css # Watch CSS files for changes
# Account management
npm run accounts # Interactive account management
npm run accounts:add # Add a new Google account via OAuth
npm run accounts:add -- --no-browser # Add account on headless server (manual code input)
npm run accounts:list # List configured accounts
npm run accounts:verify # Verify account tokens are valid
# Run all tests (server must be running on port 8080)
npm test
# Run individual tests
npm run test:signatures # Thinking signatures
npm run test:multiturn # Multi-turn with tools
npm run test:streaming # Streaming SSE events
npm run test:interleaved # Interleaved thinking
npm run test:images # Image processing
npm run test:caching # Prompt caching
npm run test:crossmodel # Cross-model thinking signatures
npm run test:oauth # OAuth no-browser mode
npm run test:cache-control # Cache control field stripping
# Run strategy unit tests (no server required)
node tests/test-strategies.cjs
```
## Architecture
**Request Flow:**
```
Claude Code CLI → Express Server (server.js) → CloudCode Client → Antigravity Cloud Code API
```
**Directory Structure:**
```
src/
├── index.js # Entry point
├── server.js # Express server
├── constants.js # Configuration values
├── errors.js # Custom error classes
├── fallback-config.js # Model fallback mappings and helpers
├── cloudcode/ # Cloud Code API client
│ ├── index.js # Public API exports
│ ├── session-manager.js # Session ID derivation for caching
│ ├── rate-limit-parser.js # Parse reset times from headers/errors
│ ├── request-builder.js # Build API request payloads
│ ├── sse-parser.js # Parse SSE for non-streaming
│ ├── sse-streamer.js # Stream SSE events in real-time
│ ├── message-handler.js # Non-streaming message handling
│ ├── streaming-handler.js # Streaming message handling
│ └── model-api.js # Model listing and quota APIs
├── account-manager/ # Multi-account pool management
│ ├── index.js # AccountManager class facade
│ ├── storage.js # Config file I/O and persistence
│ ├── rate-limits.js # Rate limit tracking and state
│ ├── credentials.js # OAuth token and project handling
│ └── strategies/ # Account selection strategies
│ ├── index.js # Strategy factory (createStrategy)
│ ├── base-strategy.js # Abstract base class
│ ├── sticky-strategy.js # Cache-optimized sticky selection
│ ├── round-robin-strategy.js # Load-balanced rotation
│ ├── hybrid-strategy.js # Smart multi-signal distribution
│ └── trackers/ # State trackers for hybrid strategy
│ ├── index.js # Re-exports trackers
│ ├── health-tracker.js # Account health scores
│ ├── token-bucket-tracker.js # Client-side rate limiting
│ └── quota-tracker.js # Quota-aware account selection
├── auth/ # Authentication
│ ├── oauth.js # Google OAuth with PKCE
│ ├── token-extractor.js # Legacy token extraction from DB
│ └── database.js # SQLite database access
├── webui/ # Web Management Interface
│ └── index.js # Express router and API endpoints
├── modules/ # Feature modules
│ └── usage-stats.js # Request tracking and history persistence
├── cli/ # CLI tools
│ └── accounts.js # Account management CLI
├── format/ # Format conversion (Anthropic ↔ Google)
│ ├── index.js # Re-exports all converters
│ ├── request-converter.js # Anthropic → Google conversion
│ ├── response-converter.js # Google → Anthropic conversion
│ ├── content-converter.js # Message content conversion
│ ├── schema-sanitizer.js # JSON Schema cleaning for Gemini
│ ├── thinking-utils.js # Thinking block validation/recovery
│ └── signature-cache.js # Signature cache (tool_use + thinking signatures)
└── utils/ # Utilities
├── helpers.js # formatDuration, sleep, isNetworkError
├── logger.js # Structured logging
└── native-module-helper.js # Auto-rebuild for native modules
```
**Frontend Structure (public/):**
```
public/
├── index.html # Main entry point
├── css/
│ ├── style.css # Compiled Tailwind CSS (generated, do not edit)
│ └── src/
│ └── input.css # Tailwind source with @apply directives
├── js/
│ ├── app.js # Main application logic (Alpine.js)
│ ├── config/ # Application configuration
│ │ └── constants.js # Centralized UI constants and limits
│ ├── store.js # Global state management
│ ├── data-store.js # Shared data store (accounts, models, quotas)
│ ├── settings-store.js # Settings management store
│ ├── components/ # UI Components
│ │ ├── dashboard.js # Main dashboard orchestrator
│ │ ├── account-manager.js # Account list, OAuth, & threshold settings
│ │ ├── models.js # Model list with draggable quota threshold markers
│ │ ├── logs-viewer.js # Live log streaming
│ │ ├── claude-config.js # CLI settings editor
│ │ ├── server-config.js # Server settings UI
│ │ └── dashboard/ # Dashboard sub-modules
│ │ ├── stats.js # Account statistics calculation
│ │ ├── charts.js # Chart.js visualizations
│ │ └── filters.js # Chart filter state management
│ └── utils/ # Frontend utilities
│ ├── error-handler.js # Centralized error handling with ErrorHandler.withLoading
│ ├── account-actions.js # Account operations service layer (NEW)
│ ├── validators.js # Input validation
│ └── model-config.js # Model configuration helpers
└── views/ # HTML partials (loaded dynamically)
├── dashboard.html
├── accounts.html
├── models.html
├── settings.html
└── logs.html
```
**Key Modules:**
- **src/server.js**: Express server exposing Anthropic-compatible endpoints (`/v1/messages`, `/v1/models`, `/health`, `/account-limits`) and mounting WebUI
- **src/webui/index.js**: WebUI backend handling API routes (`/api/*`) for config, accounts, and logs
- **src/cloudcode/**: Cloud Code API client with retry/failover logic, streaming and non-streaming support
- `model-api.js`: Model listing, quota retrieval (`getModelQuotas()`), and subscription tier detection (`getSubscriptionTier()`)
- **src/account-manager/**: Multi-account pool with configurable selection strategies, rate limit handling, and automatic cooldown
- Strategies: `sticky` (cache-optimized), `round-robin` (load-balanced), `hybrid` (smart distribution)
- **src/auth/**: Authentication including Google OAuth, token extraction, database access, and auto-rebuild of native modules
- **src/format/**: Format conversion between Anthropic and Google Generative AI formats
- **src/config.js**: Runtime configuration with defaults (`globalQuotaThreshold`, `maxAccounts`, `accountSelection`, etc.)
- **src/constants.js**: API endpoints, model mappings, fallback config, OAuth config, and all configuration values
- **src/modules/usage-stats.js**: Tracks request volume by model/family, persists 30-day history to JSON, and auto-prunes old data.
- **src/fallback-config.js**: Model fallback mappings (`getFallbackModel()`, `hasFallback()`)
- **src/errors.js**: Custom error classes (`RateLimitError`, `AuthError`, `ApiError`, etc.)
**Multi-Account Load Balancing:**
- Configurable selection strategy via `--strategy` flag or WebUI
- Three strategies available:
- **Sticky** (`--strategy=sticky`): Best for prompt caching, stays on same account
- **Round-Robin** (`--strategy=round-robin`): Maximum throughput, rotates every request
- **Hybrid** (`--strategy=hybrid`, default): Smart selection using health + tokens + LRU
- Model-specific rate limiting via `account.modelRateLimits[modelId]`
- Automatic switch only when rate-limited for > 2 minutes on the current model
- Session ID derived from first user message hash for cache continuity
- Account state persisted to `~/.config/antigravity-proxy/accounts.json`
**Account Selection Strategies:**
1. **Sticky Strategy** (best for caching):
- Stays on current account until rate-limited or unavailable
- Waits up to 2 minutes for short rate limits before switching
- Maintains prompt cache continuity across requests
2. **Round-Robin Strategy** (best for throughput):
- Rotates to next account on every request
- Skips rate-limited/disabled accounts
- Maximizes concurrent request distribution
3. **Hybrid Strategy** (default, smart distribution):
- Uses health scores, token buckets, quota awareness, and LRU for selection
- Scoring formula: `score = (Health × 2) + ((Tokens / MaxTokens × 100) × 5) + (Quota × 1) + (LRU × 0.1)`
- Health scores: Track success/failure patterns with passive recovery
- Token buckets: Client-side rate limiting (50 tokens, 6 per minute regeneration)
- Quota awareness: Accounts below configurable quota threshold are deprioritized
- LRU freshness: Prefer accounts that have rested longer
- **Emergency/Last Resort Fallback**: When all accounts are exhausted:
- Emergency fallback: Bypasses health check, adds 250ms throttle delay
- Last resort fallback: Bypasses both health and token checks, adds 500ms throttle delay
- Configuration in `src/config.js` under `accountSelection`
**Quota Threshold (Quota Protection):**
- Configurable minimum quota level before the proxy switches to another account
- Three-tier threshold resolution (highest priority first):
1. **Per-model**: `account.modelQuotaThresholds[modelId]` - override for specific models
2. **Per-account**: `account.quotaThreshold` - account-level default
3. **Global**: `config.globalQuotaThreshold` - server-wide default (0 = disabled)
- All thresholds are stored as fractions (0-0.99), displayed as percentages (0-99%) in the UI
- Global threshold configurable via WebUI Settings → Quota Protection
- Per-account and per-model thresholds configurable via Account Settings modal or draggable markers on model quota bars
- Used by `QuotaTracker.isQuotaCritical()` in the hybrid strategy to exclude low-quota accounts
**Account Data Model:**
Each account object in `accounts.json` contains:
- **Basic Info**: `email`, `source` (oauth/manual/database), `enabled`, `lastUsed`
- **Credentials**: `refreshToken` (OAuth) or `apiKey` (manual)
- **Subscription**: `{ tier, projectId, detectedAt }` - automatically detected via `loadCodeAssist` API
- `tier`: 'free' | 'pro' | 'ultra' (detected from `paidTier` or `currentTier`)
- **Quota**: `{ models: {}, lastChecked }` - model-specific quota cache
- `models[modelId]`: `{ remainingFraction, resetTime }` from `fetchAvailableModels` API
- **Quota Thresholds**: Per-account quota protection settings
- `quotaThreshold`: Account-level minimum quota fraction (0-0.99, `undefined` = use global)
- `modelQuotaThresholds`: `{ [modelId]: fraction }` - per-model overrides (takes priority over account-level)
- **Rate Limits**: `modelRateLimits[modelId]` - temporary rate limit state (in-memory during runtime)
- **Validity**: `isInvalid`, `invalidReason` - tracks accounts needing re-authentication
**Prompt Caching:**
- Cache is organization-scoped (requires same account + session ID)
- Session ID is SHA256 hash of first user message content (stable across turns)
- `cache_read_input_tokens` returned in usage metadata when cache hits
- Token calculation: `input_tokens = promptTokenCount - cachedContentTokenCount`
**Model Fallback (--fallback flag):**
- When all accounts are exhausted for a model, automatically falls back to an alternate model
- Fallback mappings defined in `MODEL_FALLBACK_MAP` in `src/constants.js`
- Thinking models fall back to thinking models (e.g., `claude-sonnet-4-5-thinking``gemini-3-flash`)
- Fallback is disabled on recursive calls to prevent infinite chains
- Enable with `npm start -- --fallback` or `FALLBACK=true` environment variable
**Cross-Model Thinking Signatures:**
- Claude and Gemini use incompatible thinking signatures
- When switching models mid-conversation, incompatible signatures are detected and dropped
- Signature cache tracks model family ('claude' or 'gemini') for each signature
- `hasGeminiHistory()` detects Gemini→Claude cross-model scenarios
- Thinking recovery (`closeToolLoopForThinking()`) injects synthetic messages to close interrupted tool loops
- For Gemini targets: strict validation - drops unknown or mismatched signatures
- For Claude targets: lenient - lets Claude validate its own signatures
**Cache Control Handling (Issue #189):**
- Claude Code CLI sends `cache_control` fields on content blocks for prompt caching
- Cloud Code API rejects these with "Extra inputs are not permitted"
- `cleanCacheControl(messages)` strips cache_control from ALL block types at pipeline entry
- Called at the START of `convertAnthropicToGoogle()` before any other processing
- Additional sanitizers (`sanitizeTextBlock`, `sanitizeToolUseBlock`) provide defense-in-depth
- Pattern inspired by Antigravity-Manager's `clean_cache_control_from_messages()`
**Native Module Auto-Rebuild:**
- When Node.js is updated, native modules like `better-sqlite3` may become incompatible
- The proxy automatically detects `NODE_MODULE_VERSION` mismatch errors
- On detection, it attempts to rebuild the module using `npm rebuild`
- If rebuild succeeds, the module is reloaded; if reload fails, a server restart is required
- Implementation in `src/utils/native-module-helper.js` and lazy loading in `src/auth/database.js`
**Web Management UI:**
- **Stack**: Vanilla JS + Alpine.js + Tailwind CSS (local build with PostCSS)
- **Build System**:
- Tailwind CLI with JIT compilation
- PostCSS + Autoprefixer
- DaisyUI component library
- Custom `@apply` directives in `public/css/src/input.css`
- Compiled output: `public/css/style.css` (auto-generated on `npm install`)
- **Architecture**: Single Page Application (SPA) with dynamic view loading
- **State Management**:
- Alpine.store for global state (accounts, settings, logs)
- Layered architecture: Service Layer (`account-actions.js`) → Component Layer → UI
- **Features**:
- Real-time dashboard with Chart.js visualization and subscription tier distribution
- Account list with tier badges (Ultra/Pro/Free), quota progress bars, and per-account threshold settings
- Model quota bars with draggable per-account threshold markers (color-coded, with overlap handling)
- OAuth flow handling via popup window
- Live log streaming via Server-Sent Events (SSE)
- Config editor for both Proxy and Claude CLI (`~/.claude/settings.json`)
- Skeleton loading screens for improved perceived performance
- Empty state UX with actionable prompts
- Loading states for all async operations
- **Accessibility**:
- ARIA labels on search inputs and icon buttons
- Keyboard navigation support (Escape to clear search)
- **Security**: Optional password protection via `WEBUI_PASSWORD` env var
- **Config Redaction**: Sensitive values (passwords, tokens) are redacted in API responses
- **Smart Refresh**: Client-side polling with ±20% jitter and tab visibility detection (3x slower when hidden)
- **i18n Support**: English, Chinese (中文), Indonesian (Bahasa), Portuguese (PT-BR), Turkish (Türkçe)
## Testing Notes
- Tests require the server to be running (`npm start` in separate terminal)
- Tests are CommonJS files (`.cjs`) that make HTTP requests to the local proxy
- Shared test utilities are in `tests/helpers/http-client.cjs`
- Test runner supports filtering: `node tests/run-all.cjs <filter>` to run matching tests
## Code Organization
**Constants:** All configuration values are centralized in `src/constants.js`:
- API endpoints and headers
- Model mappings and model family detection (`getModelFamily()`, `isThinkingModel()`)
- Model fallback mappings (`MODEL_FALLBACK_MAP`)
- OAuth configuration
- Rate limit thresholds
- Thinking model settings
**Model Family Handling:**
- `getModelFamily(model)` returns `'claude'` or `'gemini'` based on model name
- Claude models use `signature` field on thinking blocks
- Gemini models use `thoughtSignature` field on functionCall parts (cached or sentinel value)
- When Claude Code strips `thoughtSignature`, the proxy tries to restore from cache, then falls back to `skip_thought_signature_validator`
**Error Handling:** Use custom error classes from `src/errors.js`:
- `RateLimitError` - 429/RESOURCE_EXHAUSTED errors
- `AuthError` - Authentication failures
- `ApiError` - Upstream API errors
- Helper functions: `isRateLimitError()`, `isAuthError()`
**Utilities:** Shared helpers in `src/utils/helpers.js`:
- `formatDuration(ms)` - Format milliseconds as "1h23m45s"
- `sleep(ms)` - Promise-based delay
- `isNetworkError(error)` - Check if error is a transient network error
**Data Persistence:**
- Subscription and quota data are automatically fetched when `/account-limits` is called
- Updated data is saved to `accounts.json` asynchronously (non-blocking)
- On server restart, accounts load with last known subscription/quota state
- Quota is refreshed on each WebUI poll (default: 30s with jitter)
**Logger:** Structured logging via `src/utils/logger.js`:
- `logger.info(msg)` - Standard info (blue)
- `logger.success(msg)` - Success messages (green)
- `logger.warn(msg)` - Warnings (yellow)
- `logger.error(msg)` - Errors (red)
- `logger.debug(msg)` - Debug output (magenta, only when enabled)
- `logger.setDebug(true)` - Enable debug mode
- `logger.isDebugEnabled` - Check if debug mode is on
**WebUI APIs:**
- `/api/accounts/*` - Account management (list, add, remove, refresh, threshold settings)
- `PATCH /api/accounts/:email` - Update account quota thresholds (`quotaThreshold`, `modelQuotaThresholds`)
- `/api/config/*` - Server configuration (read/write, includes `globalQuotaThreshold`)
- `/api/claude/config` - Claude CLI settings
- `/api/claude/mode` - Switch between Proxy/Paid mode (updates settings.json)
- `/api/logs/stream` - SSE endpoint for real-time logs
- `/api/stats/history` - Retrieve 30-day request history (sorted chronologically)
- `/api/auth/url` - Generate Google OAuth URL
- `/account-limits` - Fetch account quotas and subscription data
- Returns: `{ accounts: [{ email, subscription, limits, quotaThreshold, modelQuotaThresholds, ... }], models: [...], globalQuotaThreshold }`
- Query params: `?format=table` (ASCII table) or `?includeHistory=true` (adds usage stats)
## Frontend Development
### CSS Build System
**Workflow:**
1. Edit styles in `public/css/src/input.css` (Tailwind source with `@apply` directives)
2. Run `npm run build:css` to compile (or `npm run watch:css` for auto-rebuild)
3. Compiled CSS output: `public/css/style.css` (minified, committed to git)
**Component Styles:**
- Use `@apply` to abstract common Tailwind patterns into reusable classes
- Example: `.btn-action-ghost`, `.status-pill-success`, `.input-search`
- Skeleton loading: `.skeleton`, `.skeleton-stat-card`, `.skeleton-chart`
**When to rebuild:**
- After modifying `public/css/src/input.css`
- After pulling changes that updated CSS source
- Automatically on `npm install` (via `prepare` hook)
### Error Handling Pattern
Use `window.ErrorHandler.withLoading()` for async operations:
```javascript
async myOperation() {
return await window.ErrorHandler.withLoading(async () => {
// Your async code here
const result = await someApiCall();
if (!result.ok) {
throw new Error('Operation failed');
}
return result;
}, this, 'loading', { errorMessage: 'Failed to complete operation' });
}
```
- Automatically manages `this.loading` state
- Shows error toast on failure
- Always resets loading state in `finally` block
### Frontend Configuration
**Constants**:
All frontend magic numbers and configuration values are centralized in `public/js/config/constants.js`. Use `window.AppConstants` to access:
- `INTERVALS`: Refresh rates and timeouts
- `LIMITS`: Data quotas and display limits
- `UI`: Animation durations and delay settings
### Account Operations Service Layer
Use `window.AccountActions` for account operations instead of direct API calls:
```javascript
// ✅ Good: Use service layer
const result = await window.AccountActions.refreshAccount(email);
if (result.success) {
this.$store.global.showToast('Account refreshed', 'success');
} else {
this.$store.global.showToast(result.error, 'error');
}
// ❌ Bad: Direct API call in component
const response = await fetch(`/api/accounts/${email}/refresh`);
```
**Available methods:**
- `refreshAccount(email)` - Refresh token and quota
- `toggleAccount(email, enabled)` - Enable/disable account (with optimistic update)
- `deleteAccount(email)` - Delete account
- `getFixAccountUrl(email)` - Get OAuth re-auth URL
- `reloadAccounts()` - Reload from disk
- `canDelete(account)` - Check if account is deletable
All methods return `{success: boolean, data?: object, error?: string}`
### Dashboard Modules
Dashboard is split into three modules for maintainability:
1. **stats.js** - Account statistics calculation
- `updateStats(component)` - Computes active/limited/total counts
- Updates subscription tier distribution
2. **charts.js** - Chart.js visualizations
- `initQuotaChart(component)` - Initialize quota distribution pie chart
- `initTrendChart(component)` - Initialize usage trend line chart
- `updateQuotaChart(component)` - Update quota chart data
- `updateTrendChart(component)` - Update trend chart (with concurrency lock)
3. **filters.js** - Filter state management
- `getInitialState()` - Default filter values
- `loadPreferences(component)` - Load from localStorage
- `savePreferences(component)` - Save to localStorage
- `autoSelectTopN(component)` - Smart select top 5 active models
- Filter types: time range (1h/6h/24h/7d/all), display mode, family/model selection
Each module is well-documented with JSDoc comments.
## Maintenance
When making significant changes to the codebase (new modules, refactoring, architectural changes), update this CLAUDE.md and the README.md file to keep documentation in sync.