Guide for llama.cpp, the C/C++ LLM inference framework by ggml-org. Covers the C API (llama.h), GGUF format, quantization (Q4_K_M, Q8_0, IQ4_XS), CMake builds, GPU backends (CUDA, Vulkan, Metal, ROCm), HTTP server with OpenAI-compatible API, embeddings, grammar constraints, function calling, LoRA, speculative decoding, multimodal, and UE5 integration. Use when: llama.cpp, GGUF models, local LLM in