feat: add prompt caching, sticky account selection, and non-thinking model

- Implement sticky account selection for prompt cache continuity
- Derive stable session ID from first user message (SHA256 hash)
- Return cache_read_input_tokens in usage metadata
- Add claude-sonnet-4-5 model without thinking
- Remove DEFAULT_THINKING_BUDGET (let API use its default)
- Add prompt caching test
- Update README and CLAUDE.md documentation

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Badri Narayanan S
2025-12-25 13:26:48 +05:30
parent 943a4dcb20
commit 01cda835d9
10 changed files with 464 additions and 80 deletions


@@ -35,6 +35,7 @@ npm run test:multiturn # Multi-turn with tools
npm run test:streaming # Streaming SSE events
npm run test:interleaved # Interleaved thinking
npm run test:images # Image processing
npm run test:caching # Prompt caching
```
## Architecture
@@ -57,11 +58,17 @@ Claude Code CLI → Express Server (server.js) → CloudCode Client → Antigrav
- **src/utils/helpers.js**: Shared utility functions (`formatDuration`, `sleep`)
**Multi-Account Load Balancing:**
- Round-robin rotation across configured accounts
- Automatic switch on 429 rate limit errors
- Configurable cooldown period for rate-limited accounts
- Sticky account selection for prompt caching (stays on same account across turns)
- Automatic switch only when rate-limited for > 2 minutes
- Session ID derived from first user message hash for cache continuity
- Account state persisted to `~/.config/antigravity-proxy/accounts.json`
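The sticky-selection rule in the list above can be sketched roughly as follows. This is a minimal illustration, not the proxy's actual code: `pickAccount`, `STICKY_MAX_WAIT_MS`, and the `rateLimitedUntil` field are hypothetical names, and the account shape is an assumption.

```javascript
// Hypothetical sketch of sticky account selection; every name here is assumed.
// Threshold taken from the docs: switch only when rate-limited for > 2 minutes.
const STICKY_MAX_WAIT_MS = 2 * 60 * 1000;

// accounts: array of { rateLimitedUntil?: number } (epoch ms; assumed shape)
// currentIndex: index of the account used on the previous turn
function pickAccount(accounts, currentIndex, now = Date.now()) {
  const current = accounts[currentIndex];
  const waitMs = Math.max(0, (current.rateLimitedUntil ?? 0) - now);

  // Stay on the same account if it is free or the wait is short,
  // so the prompt cache tied to that account keeps being reused.
  if (waitMs <= STICKY_MAX_WAIT_MS) {
    return { index: currentIndex, waitMs };
  }

  // Otherwise look for the next account that is not rate-limited right now.
  for (let step = 1; step < accounts.length; step++) {
    const index = (currentIndex + step) % accounts.length;
    if ((accounts[index].rateLimitedUntil ?? 0) <= now) {
      return { index, waitMs: 0 };
    }
  }

  // No account is free: keep the current one and report how long to wait.
  return { index: currentIndex, waitMs };
}
```

A caller would sleep for `waitMs` before sending the request. Switching accounts gives up the cached prefix, which is why the sketch prefers a short wait over a switch.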
**Prompt Caching:**
- Cache is organization-scoped (requires same account + session ID)
- Session ID is SHA256 hash of first user message content (stable across turns)
- `cache_read_input_tokens` returned in usage metadata when cache hits
- Token calculation: `input_tokens = promptTokenCount - cachedContentTokenCount`
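As a rough illustration of the two rules above, the session ID derivation and the token arithmetic might look like this. The helper names (`textOf`, `deriveSessionId`, `toUsage`) are hypothetical, and the assumption that message content is either a string or an array of `{ type: 'text', text }` blocks is mine, not taken from the commit.

```javascript
const { createHash } = require('node:crypto');

// Hypothetical helper: flatten message content to plain text.
// Assumes content is a string or an array of { type, text } blocks.
function textOf(content) {
  if (typeof content === 'string') return content;
  return content
    .filter((block) => block.type === 'text')
    .map((block) => block.text)
    .join('\n');
}

// Session ID = SHA256 hex digest of the first user message.
// Later turns append messages but keep the first one, so the ID is stable.
function deriveSessionId(messages) {
  const first = messages.find((m) => m.role === 'user');
  if (!first) return null;
  return createHash('sha256').update(textOf(first.content)).digest('hex');
}

// Usage mapping from the docs:
// input_tokens = promptTokenCount - cachedContentTokenCount
function toUsage(usageMetadata) {
  const prompt = usageMetadata.promptTokenCount ?? 0;
  const cached = usageMetadata.cachedContentTokenCount ?? 0;
  return {
    input_tokens: prompt - cached,
    cache_read_input_tokens: cached,
  };
}
```

For example, a response reporting `promptTokenCount: 1000` and `cachedContentTokenCount: 800` would map to `input_tokens: 200` and `cache_read_input_tokens: 800`.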
## Testing Notes
- Tests require the server to be running (`npm start` in separate terminal)
@@ -90,4 +97,4 @@ Claude Code CLI → Express Server (server.js) → CloudCode Client → Antigrav
## Maintenance
-When making significant changes to the codebase (new modules, refactoring, architectural changes), update this CLAUDE.md file to keep documentation in sync.
+When making significant changes to the codebase (new modules, refactoring, architectural changes), update this CLAUDE.md and the README.md file to keep documentation in sync.