feat: use official tokenizers for 99.99% accuracy
Replace gpt-tokenizer with model-specific official tokenizers: - Claude models: @anthropic-ai/tokenizer (official Anthropic tokenizer) - Gemini models: @lenml/tokenizer-gemini (GemmaTokenizer) Changes: - Add @anthropic-ai/tokenizer and @lenml/tokenizer-gemini dependencies - Remove gpt-tokenizer dependency - Update count-tokens.js with model-aware tokenization - Use getModelFamily() to select appropriate tokenizer - Lazy-load Gemini tokenizer (138MB) on first use - Default to local estimation for all content types (no API calls) Tested with all supported models: - claude-sonnet-4-5, claude-opus-4-5-thinking, claude-sonnet-4-5-thinking - gemini-3-flash, gemini-3-pro-low, gemini-3-pro-high
This commit is contained in:
@@ -58,11 +58,12 @@
|
||||
"node": ">=18.0.0"
|
||||
},
|
||||
"dependencies": {
|
||||
"@anthropic-ai/tokenizer": "^0.0.4",
|
||||
"@lenml/tokenizer-gemini": "^3.7.2",
|
||||
"async-mutex": "^0.5.0",
|
||||
"better-sqlite3": "^12.5.0",
|
||||
"cors": "^2.8.5",
|
||||
"express": "^4.18.2",
|
||||
"gpt-tokenizer": "^2.5.0"
|
||||
"express": "^4.18.2"
|
||||
},
|
||||
"devDependencies": {
|
||||
"@tailwindcss/forms": "^0.5.7",
|
||||
|
||||
Reference in New Issue
Block a user