imonoonoko.BitLlama
0.16.0

Pure Rust LLM inference engine with 1.58-bit ternary support and Test-Time Training
BitLlama is a pure-Rust LLM inference engine featuring 1.58-bit ternary quantization,
Test-Time Training (TTT), a Soul learning system, an MCP server/client, and private RAG.
It supports Llama, Gemma, Mistral, Qwen, and BitNet models.
An OpenAI-compatible API server is included.
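
For readers unfamiliar with the term, "1.58-bit" refers to weights restricted to the three values -1, 0, and +1, since log2(3) is about 1.58 bits per weight. The sketch below illustrates one common scheme, absmean scaling as described for BitNet b1.58. It is an assumption for illustration only, not BitLlama's actual code; the function names `ternary_quantize` and `dequantize` are hypothetical.

```rust
/// Quantize a weight slice to ternary values {-1, 0, +1}.
/// Returns the ternary codes and the per-tensor scale.
/// NOTE: illustrative sketch; not taken from BitLlama's source.
fn ternary_quantize(weights: &[f32]) -> (Vec<i8>, f32) {
    // Scale is the mean absolute weight; the epsilon avoids division by zero.
    let n = weights.len().max(1) as f32;
    let scale = weights.iter().map(|w| w.abs()).sum::<f32>() / n + 1e-8;

    // Divide by the scale, round to the nearest integer, clamp to [-1, 1].
    let codes = weights
        .iter()
        .map(|w| (w / scale).round().clamp(-1.0, 1.0) as i8)
        .collect();

    (codes, scale)
}

/// Approximate the original weights from ternary codes and the scale.
fn dequantize(codes: &[i8], scale: f32) -> Vec<f32> {
    codes.iter().map(|&c| f32::from(c) * scale).collect()
}
```

Calling `ternary_quantize(&[0.9, -0.05, 0.4, -1.2])` gives a scale near 0.6375, and small weights such as -0.05 collapse to 0 while larger ones map to +1 or -1. Storing only the codes plus one scale per tensor is what makes the representation compact.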
Download Links For Version 0.16.0
Download Links For Version 0.15.0