llama-bench
LLM performance benchmarking CLI tool within llama.cpp.
Generates ready-to-run benchmarking scripts based on llama-bench.cpp. Extended functionality is provided through Bash loop generation for scalar process parameters that llama-bench cannot parse as lists.
Supported range syntax: first-last, first-last+step, first-last*mult, and val1,val2,val3. Booleans support a 0,1 sweep.
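The range syntax can be illustrated with a small Bash sketch. The function name `expand_range` is hypothetical and is not part of llama-bench or of this tool; the sketch assumes non-negative integers and treats anything that is not a first-last form as a comma-separated list.

```shell
#!/usr/bin/env bash
# expand_range: hypothetical helper, not part of llama-bench.
# Prints the values of one range expression, space-separated on one line.
expand_range() {
  local spec=$1 first last op step v out=()
  local re='^([0-9]+)-([0-9]+)(([+*])([0-9]+))?$'
  if [[ $spec =~ $re ]]; then
    first=${BASH_REMATCH[1]}
    last=${BASH_REMATCH[2]}
    op=${BASH_REMATCH[4]:-+}      # plain first-last behaves like "+1"
    step=${BASH_REMATCH[5]:-1}
    # Reject steps that would never advance (x+0, x*0, x*1, or 0*mult).
    if (( step == 0 )) || { [[ $op == '*' ]] && (( step < 2 || first == 0 )); }; then
      echo "expand_range: step does not advance: $spec" >&2
      return 1
    fi
    for (( v = first; v <= last; )); do
      out+=("$v")
      if [[ $op == '*' ]]; then (( v *= step )); else (( v += step )); fi
    done
  else
    # Anything else is treated as a comma-separated list (val1,val2,val3).
    IFS=',' read -r -a out <<< "$spec"
  fi
  echo "${out[*]}"
}

expand_range '4-16+4'            # prints: 4 8 12 16
expand_range '4096-131072*2'     # prints: 4096 8192 16384 32768 65536 131072
expand_range '128,512,1024'      # prints: 128 512 1024
```

Quote the expression when calling the helper so the shell does not treat `*` as a glob.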
What is llama-bench? A tool within llama.cpp that measures how fast a model runs on your hardware. It reports prompt processing t/s (how fast it reads input) and generation t/s (how fast it writes output).
Key rules & interactions:
- ctx-size must be ≥ n-prompt + n-gen, or the benchmark is invalid.
- batch-size should usually be ≥ ubatch-size.
- -ngl 99 offloads all layers to the GPU; use this unless you want CPU fallback.
- -fa 1 (Flash Attention) is almost always faster on modern GPUs.
- The prompt (-p), Generate Tokens (-n), and Combo (-pg) fields do not override each other. Instead, each field adds a separate set of benchmarks to run.
- When --threads and --cpu-mask are used together, the thread count determines the size of the threadpool, but the CPU mask restricts which cores the threads run on, which can cause severe bottlenecks if the two do not match.
- With --fit-target, the context size is automatically increased to the sum of n-prompt, n-gen, and n-depth if the provided context is smaller.
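The context-size rule above can be checked before a run with a few lines of Bash. This is a minimal sketch: `required_ctx` and `check_ctx` are hypothetical helper names, and treating n-depth as an optional third term (default 0) is an assumption borrowed from the --fit-target rule.

```shell
#!/usr/bin/env bash
# required_ctx: hypothetical helper; prints n-prompt + n-gen (+ n-depth).
# n-depth is optional and defaults to 0 (an assumption, see the lead-in).
required_ctx() {
  local n_prompt=$1 n_gen=$2 n_depth=${3:-0}
  echo $(( n_prompt + n_gen + n_depth ))
}

# check_ctx: returns 0 when ctx_size covers the required total, 1 otherwise.
check_ctx() {
  local ctx_size=$1 need
  need=$(required_ctx "$2" "$3" "${4:-0}")
  if (( ctx_size < need )); then
    echo "ctx-size $ctx_size is too small; need at least $need" >&2
    return 1
  fi
}

check_ctx 4096 512 128 && echo "ctx-size is large enough"
check_ctx 600 512 128 || echo "raise ctx-size before benchmarking"
```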
Formula Examples (Range Syntax):
- 4096-131072*2: Multiplies by 2 each step (4096, 8192, 16384, ..., 131072).
- 4-16+4: Adds 4 each step (4, 8, 12, 16).
- 128,512,1024: Specific comma-separated list of values.
- 0-100: Every number within a range (0, 1, 2, 3, ..., 98, 99, 100).
Extended Non-Native Functionality (Bash Loops):
- Sweeps that llama-bench supports natively stay inside the llama-bench CLI; Bash loops are used only for scalar process parameters that llama-bench cannot parse as lists.
- If you enter a range for one of these parameters (for example 0-60+15), the UI detects this limitation and upgrades the output from a single command to an OS-level Bash script.
- The script defines a Bash array for each swept parameter (for example delay_list=('0' '15' '30')) and wraps the native llama-bench execution in nested for loops.
- The parameters handled this way are reps, delay, numa, and prio.
- Using .json as the llama-bench output format is not supported with Bash looping; the UI automatically converts it to .jsonl so that the appended data remains parseable.
Paste an existing command in Import Command to tweak it. Use Configuration Preview to see every run before executing.
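The Bash-loop wrapping described above might look roughly like the following. This is a dry-run sketch, not the generator's actual output: it prints the commands instead of running them, the model path `model.gguf` and the `prio_list` values are made up for illustration, and the layout of the real generated script may differ.

```shell
#!/usr/bin/env bash
# Dry-run sketch: prints one llama-bench command per loop combination.
# Set RUN to an empty string to execute them (assumes llama-bench is on PATH).
RUN=echo
MODEL=model.gguf              # hypothetical model path
delay_list=('0' '15' '30')    # scalar parameter swept by the outer loop
prio_list=('0' '1')           # scalar parameter swept by the inner loop

run_sweep() {
  local delay prio
  for delay in "${delay_list[@]}"; do
    for prio in "${prio_list[@]}"; do
      # -p takes a native list, so that sweep stays inside llama-bench itself.
      $RUN llama-bench -m "$MODEL" -p 512,1024 -n 128 \
        --delay "$delay" --prio "$prio" -o jsonl
    done
  done
}

run_sweep
```

Because each invocation writes its own output, appending the results of every iteration to one file only stays parseable with a line-oriented format such as jsonl, which is presumably why the UI switches away from .json.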