← All repositories

ggml-orgllama.cpp

95,400 stars14,976 forksC++mit1 view

Llama.cpp

Features

  • Hardware Abstraction LayersSupport for multiple hardware-accelerated backends to optimize model inference across diverse CPU and GPU architectures.
  • Text-Only Inference EnginesA high-performance inference engine designed for running text-based language models locally on consumer hardware.
  • Multimodal Inference EnginesAn inference engine capable of local execution for vision-language models that process both text and image inputs.
  • Inference API ServersA lightweight HTTP server providing endpoints for chat completion, embeddings, and reranking that adheres to standard API specifications.
  • Model Quantization ToolsTools for converting and quantizing models into compressed formats to reduce memory usage and improve inference performance.
  • Command Line Inference InterfacesA command-line interface for executing models, managing chat templates, and configuring inference parameters in interactive or batch modes.