fix: strip cache_control fields from content blocks (#189)
Claude Code CLI sends cache_control on text, thinking, tool_use, and tool_result blocks for prompt caching. Cloud Code API rejects these with "Extra inputs are not permitted".

- Add cleanCacheControl() to proactively strip cache_control at pipeline entry
- Add sanitizeTextBlock() and sanitizeToolUseBlock() for defense-in-depth
- Update reorderAssistantContent() to use block sanitizers
- Add test-cache-control.cjs with multi-model test coverage
- Update frontend dashboard tests to match current UI design
- Update strategy tests to match v2.4.0 fallback behavior
- Update CLAUDE.md and README.md with recent features

Inspired by Antigravity-Manager's clean_cache_control_from_messages() pattern.

Co-Authored-By: Claude <noreply@anthropic.com>
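
To illustrate the stripping step described in the commit message, here is a minimal sketch. It is not the code from this commit: the names `stripCacheControl` and `cleanCacheControlSketch` are hypothetical, and the message shape (a `content` array of typed blocks, with `tool_result` blocks possibly nesting further blocks) is assumed from the block types the message lists.

```javascript
// Hypothetical sketch; the real cleanCacheControl() in this commit may differ.

// Return a copy of a content block without its cache_control field.
function stripCacheControl(block) {
  if (block === null || typeof block !== 'object' || Array.isArray(block)) {
    return block; // leave strings and other non-block values untouched
  }
  const { cache_control, ...rest } = block; // drop the field, keep the rest
  // Assumption: tool_result blocks can carry nested content blocks.
  if (Array.isArray(rest.content)) {
    rest.content = rest.content.map(stripCacheControl);
  }
  return rest;
}

// Strip cache_control from every content block of every message.
// The input array and its objects are not mutated.
function cleanCacheControlSketch(messages) {
  return messages.map((message) => {
    if (!Array.isArray(message.content)) {
      return message; // plain string content has no blocks to clean
    }
    return { ...message, content: message.content.map(stripCacheControl) };
  });
}
```

Running a pass like this once at pipeline entry means later stages never see the field, and per-block sanitizers then act only as a second line of defense.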
12
README.md
@@ -84,6 +84,8 @@ Choose one of the following methods to authorize the proxy:
2. Navigate to the **Accounts** tab and click **Add Account**.
3. Complete the Google OAuth authorization in the popup window.

> **Headless/Remote Servers**: If running on a server without a browser, the WebUI supports a "Manual Authorization" mode. After clicking "Add Account", you can copy the OAuth URL, complete authorization on your local machine, and paste the authorization code back.

#### **Method B: CLI (Desktop or Headless)**

If you prefer the terminal or are on a remote server:
@@ -280,7 +282,7 @@ Choose a strategy based on your needs:

| Strategy | Best For | Description |
| --- | --- | --- |
| **Hybrid** (Default) | Most users | Smart selection combining health score, token bucket rate limiting, and LRU freshness |
| **Hybrid** (Default) | Most users | Smart selection combining health score, token bucket rate limiting, quota awareness, and LRU freshness |
| **Sticky** | Prompt caching | Stays on the same account to maximize cache hits, switches only when rate-limited |
| **Round-Robin** | Even distribution | Cycles through accounts sequentially for balanced load |

@@ -298,6 +300,8 @@ antigravity-claude-proxy start --strategy=round-robin # Load-balanced

- **Health Score Tracking**: Accounts earn points for successful requests and lose points for failures/rate-limits
- **Token Bucket Rate Limiting**: Client-side throttling with regenerating tokens (50 max, 6/minute)
- **Quota Awareness**: Accounts with critical quota (<5%) are deprioritized; exhausted accounts trigger emergency fallback
- **Emergency Fallback**: When all accounts appear exhausted, bypasses checks with throttle delays (250-500ms)
- **Automatic Cooldown**: Rate-limited accounts recover automatically after reset time expires
- **Invalid Account Detection**: Accounts needing re-authentication are marked and skipped
- **Prompt Caching Support**: Session IDs derived from conversation enable cache hits across turns
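
The token bucket bullet above (50 tokens maximum, regenerating at 6 per minute) can be illustrated with a minimal sketch. The class name `TokenBucketSketch`, its option names, and the injectable clock are assumptions made for illustration; the proxy's actual implementation may be structured differently.

```javascript
// Hypothetical sketch of client-side token bucket throttling.
class TokenBucketSketch {
  constructor({ maxTokens = 50, tokensPerMinute = 6, now = Date.now } = {}) {
    this.maxTokens = maxTokens;
    this.tokensPerMinute = tokensPerMinute;
    this.now = now; // injectable clock, useful for testing
    this.tokens = maxTokens; // start with a full bucket
    this.lastRefill = now();
  }

  // Add tokens for the time elapsed since the last refill, capped at maxTokens.
  refill() {
    const current = this.now();
    const elapsedMs = current - this.lastRefill;
    const regenerated = (elapsedMs * this.tokensPerMinute) / 60000;
    this.tokens = Math.min(this.maxTokens, this.tokens + regenerated);
    this.lastRefill = current;
  }

  // Take one token if available; return false when the caller should back off.
  tryConsume() {
    this.refill();
    if (this.tokens < 1) {
      return false;
    }
    this.tokens -= 1;
    return true;
  }
}
```

With these numbers, an account that drains its bucket regains one request roughly every 10 seconds, so a burst of up to 50 requests is allowed while sustained traffic is held to about 6 per minute.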
@@ -340,13 +344,14 @@ The proxy includes a built-in, modern web interface for real-time monitoring and
- **Real-time Dashboard**: Monitor request volume, active accounts, model health, and subscription tier distribution.
- **Visual Model Quota**: Track per-model usage and next reset times with color-coded progress indicators.
- **Account Management**: Add/remove Google accounts via OAuth, view subscription tiers (Free/Pro/Ultra) and quota status at a glance.
- **Manual OAuth Mode**: Add accounts on headless servers by copying the OAuth URL and pasting the authorization code.
- **Claude CLI Configuration**: Edit your `~/.claude/settings.json` directly from the browser.
- **Persistent History**: Tracks request volume by model family for 30 days, persisting across server restarts.
- **Time Range Filtering**: Analyze usage trends over 1H, 6H, 24H, 7D, or All Time periods.
- **Smart Analysis**: Auto-select top 5 most used models or toggle between Family/Model views.
- **Live Logs**: Stream server logs with level-based filtering and search.
- **Advanced Tuning**: Configure retries, timeouts, and debug mode on the fly.
- **Bilingual Interface**: Full support for English and Chinese (switch via Settings).
- **Multi-language Interface**: Full support for English, Chinese (中文), Indonesian (Bahasa), and Portuguese (PT-BR).

---

@@ -360,9 +365,11 @@ While most users can use the default settings, you can tune the proxy behavior v
- **WebUI Password**: Secure your dashboard with `WEBUI_PASSWORD` env var or in config.
- **Custom Port**: Change the default `8080` port.
- **Retry Logic**: Configure `maxRetries`, `retryBaseMs`, and `retryMaxMs`.
- **Rate Limit Handling**: Comprehensive rate limit detection from headers and error messages with intelligent retry-after parsing.
- **Load Balancing**: Adjust `defaultCooldownMs` and `maxWaitBeforeErrorMs`.
- **Persistence**: Enable `persistTokenCache` to save OAuth sessions across restarts.
- **Max Accounts**: Set `maxAccounts` (1-100) to limit the number of Google accounts. Default: 10.
- **Endpoint Fallback**: Automatic 403/404 endpoint fallback for API compatibility.

Refer to `config.example.json` for a complete list of fields and documentation.

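
As a rough illustration of how the three retry settings listed above could interact, the sketch below assumes capped exponential backoff. That policy is an assumption: the README names `maxRetries`, `retryBaseMs`, and `retryMaxMs` but does not state the formula. The helpers `retryDelayMs` and `withRetriesSketch` are hypothetical.

```javascript
// Hypothetical sketch: capped exponential backoff driven by the retry settings.

// Delay before the retry that follows failed attempt number `attempt` (0-based).
function retryDelayMs(attempt, { retryBaseMs, retryMaxMs }) {
  return Math.min(retryMaxMs, retryBaseMs * 2 ** attempt);
}

const defaultSleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Run `operation`, retrying up to `maxRetries` times after the first failure.
// Rethrows the last error once all attempts are used up.
async function withRetriesSketch(operation, config, sleep = defaultSleep) {
  const { maxRetries } = config;
  let lastError;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await operation(attempt);
    } catch (err) {
      lastError = err;
      if (attempt === maxRetries) break; // no delay after the final attempt
      await sleep(retryDelayMs(attempt, config));
    }
  }
  throw lastError;
}
```

Under this assumed policy, `retryBaseMs` sets the first delay and `retryMaxMs` bounds how long any single wait can grow, regardless of how large `maxRetries` is.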
@@ -421,6 +428,7 @@ npm run test:interleaved # Interleaved thinking
npm run test:images # Image processing
npm run test:caching # Prompt caching
npm run test:strategies # Account selection strategies
npm run test:cache-control # Cache control field stripping
```

---