lm-evaluation-harness - LLM Benchmarking
Quick start
lm-evaluation-harness evaluates LLMs across 60+ academic benchmarks using standardized prompts and metrics.
Installation:
pip install lm-eval
Evaluate any HuggingFace model:
lm_eval --model hf \
--model_args pretrained=meta-llama/Llama-2-7b-hf \
--tasks mmlu,gsm8k,hellaswag \
--device cuda:0 \
--batch_size 8
View available tasks:
lm_eval --tasks list
Common workflows
Wo
[Description truncada. Veja o README completo no GitHub.]