fix: strip cache_control fields from content blocks (#189)

Claude Code CLI sends cache_control on text, thinking, tool_use, and
tool_result blocks for prompt caching. Cloud Code API rejects these
with "Extra inputs are not permitted".

- Add cleanCacheControl() to proactively strip cache_control at pipeline entry
- Add sanitizeTextBlock() and sanitizeToolUseBlock() for defense-in-depth
- Update reorderAssistantContent() to use block sanitizers
- Add test-cache-control.cjs with multi-model test coverage
- Update frontend dashboard tests to match current UI design
- Update strategy tests to match v2.4.0 fallback behavior
- Update CLAUDE.md and README.md with recent features

Inspired by Antigravity-Manager's clean_cache_control_from_messages() pattern.

Co-Authored-By: Claude <noreply@anthropic.com>
This commit is contained in:
Badri Narayanan S
2026-01-25 03:27:05 +05:30
parent 6cadaee928
commit 683ca41480
9 changed files with 466 additions and 30 deletions

View File

@@ -84,6 +84,8 @@ Choose one of the following methods to authorize the proxy:
2. Navigate to the **Accounts** tab and click **Add Account**.
3. Complete the Google OAuth authorization in the popup window.
> **Headless/Remote Servers**: If running on a server without a browser, the WebUI supports a "Manual Authorization" mode. After clicking "Add Account", you can copy the OAuth URL, complete authorization on your local machine, and paste the authorization code back.
#### **Method B: CLI (Desktop or Headless)**
If you prefer the terminal or are on a remote server:
@@ -280,7 +282,7 @@ Choose a strategy based on your needs:
| Strategy | Best For | Description |
| --- | --- | --- |
| **Hybrid** (Default) | Most users | Smart selection combining health score, token bucket rate limiting, and LRU freshness |
| **Hybrid** (Default) | Most users | Smart selection combining health score, token bucket rate limiting, quota awareness, and LRU freshness |
| **Sticky** | Prompt caching | Stays on the same account to maximize cache hits, switches only when rate-limited |
| **Round-Robin** | Even distribution | Cycles through accounts sequentially for balanced load |
@@ -298,6 +300,8 @@ antigravity-claude-proxy start --strategy=round-robin # Load-balanced
- **Health Score Tracking**: Accounts earn points for successful requests and lose points for failures/rate-limits
- **Token Bucket Rate Limiting**: Client-side throttling with regenerating tokens (50 max, 6/minute)
- **Quota Awareness**: Accounts with critical quota (<5%) are deprioritized; exhausted accounts trigger emergency fallback
- **Emergency Fallback**: When all accounts appear exhausted, bypasses checks with throttle delays (250-500ms)
- **Automatic Cooldown**: Rate-limited accounts recover automatically after reset time expires
- **Invalid Account Detection**: Accounts needing re-authentication are marked and skipped
- **Prompt Caching Support**: Session IDs derived from conversation enable cache hits across turns
@@ -340,13 +344,14 @@ The proxy includes a built-in, modern web interface for real-time monitoring and
- **Real-time Dashboard**: Monitor request volume, active accounts, model health, and subscription tier distribution.
- **Visual Model Quota**: Track per-model usage and next reset times with color-coded progress indicators.
- **Account Management**: Add/remove Google accounts via OAuth, view subscription tiers (Free/Pro/Ultra) and quota status at a glance.
- **Manual OAuth Mode**: Add accounts on headless servers by copying the OAuth URL and pasting the authorization code.
- **Claude CLI Configuration**: Edit your `~/.claude/settings.json` directly from the browser.
- **Persistent History**: Tracks request volume by model family for 30 days, persisting across server restarts.
- **Time Range Filtering**: Analyze usage trends over 1H, 6H, 24H, 7D, or All Time periods.
- **Smart Analysis**: Auto-select top 5 most used models or toggle between Family/Model views.
- **Live Logs**: Stream server logs with level-based filtering and search.
- **Advanced Tuning**: Configure retries, timeouts, and debug mode on the fly.
- **Bilingual Interface**: Full support for English and Chinese (switch via Settings).
- **Multi-language Interface**: Full support for English, Chinese (中文), Indonesian (Bahasa), and Portuguese (PT-BR).
---
@@ -360,9 +365,11 @@ While most users can use the default settings, you can tune the proxy behavior v
- **WebUI Password**: Secure your dashboard with `WEBUI_PASSWORD` env var or in config.
- **Custom Port**: Change the default `8080` port.
- **Retry Logic**: Configure `maxRetries`, `retryBaseMs`, and `retryMaxMs`.
- **Rate Limit Handling**: Comprehensive rate limit detection from headers and error messages with intelligent retry-after parsing.
- **Load Balancing**: Adjust `defaultCooldownMs` and `maxWaitBeforeErrorMs`.
- **Persistence**: Enable `persistTokenCache` to save OAuth sessions across restarts.
- **Max Accounts**: Set `maxAccounts` (1-100) to limit the number of Google accounts. Default: 10.
- **Endpoint Fallback**: Automatic 403/404 endpoint fallback for API compatibility.
Refer to `config.example.json` for a complete list of fields and documentation.
@@ -421,6 +428,7 @@ npm run test:interleaved # Interleaved thinking
npm run test:images # Image processing
npm run test:caching # Prompt caching
npm run test:strategies # Account selection strategies
npm run test:cache-control # Cache control field stripping
```
---