Llama.cpp
llama.cpp python library is a simple Python bindings for
@ggerganov
llama.cpp.This package provides:
- Low-level access to C API via ctypes interface.
- High-level Python API for text completion
OpenAI
-like APILangChain
compatibilityLlamaIndex
compatibility- OpenAI compatible web server
- Local Copilot replacement
- Function Calling support
- Vision API support
- Multiple Models
Overviewโ
Integration detailsโ
Class | Package | Local | Serializable | JS support |
---|---|---|---|---|
ChatLlamaCpp | langchain-community | โ | โ | โ |
Model featuresโ
Tool calling | Structured output | JSON mode | Image input | Audio input | Video input | Token-level streaming | Native async | Token usage | Logprobs |
---|---|---|---|---|---|---|---|---|---|
โ | โ | โ | โ | โ | โ | โ | โ | โ | โ |
Setupโ
To get started and use all the features shown below, we recommend using a model that has been fine-tuned for tool-calling.
We will use Hermes-2-Pro-Llama-3-8B-GGUF from NousResearch.
Hermes 2 Pro is an upgraded version of Nous Hermes 2, consisting of an updated and cleaned version of the OpenHermes 2.5 Dataset, as well as a newly introduced Function Calling and JSON Mode dataset developed in-house. This new version of Hermes maintains its excellent general task and conversation capabilities - but also excels at Function Calling
See our guides on local models to go deeper: