fix: strip cache_control fields from content blocks (#189)

Claude Code CLI sends cache_control on text, thinking, tool_use, and tool_result blocks for prompt caching. Cloud Code API rejects these with "Extra inputs are not permitted". - Add cleanCacheControl() to proactively strip cache_control at pipeline entry - Add sanitizeTextBlock() and sanitizeToolUseBlock() for defense-in-depth - Update reorderAssistantContent() to use block sanitizers - Add test-cache-control.cjs with multi-model test coverage - Update frontend dashboard tests to match current UI design - Update strategy tests to match v2.4.0 fallback behavior - Update CLAUDE.md and README.md with recent features Inspired by Antigravity-Manager's clean_cache_control_from_messages() pattern. Co-Authored-By: Claude <noreply@anthropic.com>
2026-01-25 03:27:05 +05:30
parent 6cadaee928
commit 683ca41480
9 changed files with 466 additions and 30 deletions
--- a/README.md
+++ b/README.md
@@ -84,6 +84,8 @@ Choose one of the following methods to authorize the proxy:
 2. Navigate to the **Accounts** tab and click **Add Account**.
 3. Complete the Google OAuth authorization in the popup window.

+> **Headless/Remote Servers**: If running on a server without a browser, the WebUI supports a "Manual Authorization" mode. After clicking "Add Account", you can copy the OAuth URL, complete authorization on your local machine, and paste the authorization code back.
+
 #### **Method B: CLI (Desktop or Headless)**

 If you prefer the terminal or are on a remote server:
@@ -280,7 +282,7 @@ Choose a strategy based on your needs:

 | Strategy | Best For | Description |
 | --- | --- | --- |
-| **Hybrid** (Default) | Most users | Smart selection combining health score, token bucket rate limiting, and LRU freshness |
+| **Hybrid** (Default) | Most users | Smart selection combining health score, token bucket rate limiting, quota awareness, and LRU freshness |
 | **Sticky** | Prompt caching | Stays on the same account to maximize cache hits, switches only when rate-limited |
 | **Round-Robin** | Even distribution | Cycles through accounts sequentially for balanced load |

@@ -298,6 +300,8 @@ antigravity-claude-proxy start --strategy=round-robin  # Load-balanced

 - **Health Score Tracking**: Accounts earn points for successful requests and lose points for failures/rate-limits
 - **Token Bucket Rate Limiting**: Client-side throttling with regenerating tokens (50 max, 6/minute)
+- **Quota Awareness**: Accounts with critical quota (<5%) are deprioritized; exhausted accounts trigger emergency fallback
+- **Emergency Fallback**: When all accounts appear exhausted, bypasses checks with throttle delays (250-500ms)
 - **Automatic Cooldown**: Rate-limited accounts recover automatically after reset time expires
 - **Invalid Account Detection**: Accounts needing re-authentication are marked and skipped
 - **Prompt Caching Support**: Session IDs derived from conversation enable cache hits across turns
@@ -340,13 +344,14 @@ The proxy includes a built-in, modern web interface for real-time monitoring and
 - **Real-time Dashboard**: Monitor request volume, active accounts, model health, and subscription tier distribution.
 - **Visual Model Quota**: Track per-model usage and next reset times with color-coded progress indicators.
 - **Account Management**: Add/remove Google accounts via OAuth, view subscription tiers (Free/Pro/Ultra) and quota status at a glance.
+- **Manual OAuth Mode**: Add accounts on headless servers by copying the OAuth URL and pasting the authorization code.
 - **Claude CLI Configuration**: Edit your `~/.claude/settings.json` directly from the browser.
 - **Persistent History**: Tracks request volume by model family for 30 days, persisting across server restarts.
 - **Time Range Filtering**: Analyze usage trends over 1H, 6H, 24H, 7D, or All Time periods.
 - **Smart Analysis**: Auto-select top 5 most used models or toggle between Family/Model views.
 - **Live Logs**: Stream server logs with level-based filtering and search.
 - **Advanced Tuning**: Configure retries, timeouts, and debug mode on the fly.
- **Bilingual Interface**: Full support for English and Chinese (switch via Settings).
+- **Multi-language Interface**: Full support for English, Chinese (中文), Indonesian (Bahasa), and Portuguese (PT-BR).

 ---

@@ -360,9 +365,11 @@ While most users can use the default settings, you can tune the proxy behavior v
 - **WebUI Password**: Secure your dashboard with `WEBUI_PASSWORD` env var or in config.
 - **Custom Port**: Change the default `8080` port.
 - **Retry Logic**: Configure `maxRetries`, `retryBaseMs`, and `retryMaxMs`.
+- **Rate Limit Handling**: Comprehensive rate limit detection from headers and error messages with intelligent retry-after parsing.
 - **Load Balancing**: Adjust `defaultCooldownMs` and `maxWaitBeforeErrorMs`.
 - **Persistence**: Enable `persistTokenCache` to save OAuth sessions across restarts.
 - **Max Accounts**: Set `maxAccounts` (1-100) to limit the number of Google accounts. Default: 10.
+- **Endpoint Fallback**: Automatic 403/404 endpoint fallback for API compatibility.

 Refer to `config.example.json` for a complete list of fields and documentation.

@@ -421,6 +428,7 @@ npm run test:interleaved   # Interleaved thinking
 npm run test:images        # Image processing
 npm run test:caching       # Prompt caching
 npm run test:strategies    # Account selection strategies
+npm run test:cache-control # Cache control field stripping
 ```

 ---