🦙 Llama-Bench Command Generator 🦙

A GUI for llama-bench, the LLM performance benchmarking CLI tool within llama.cpp.
This Web UI allows you to build, preview, and export massive Cartesian configuration sweeps before executing them on your hardware. It mirrors the internal C++ logic of llama-bench.cpp to generate ready-to-run benchmarking scripts. It also extends llama-bench by generating Bash loops for scalar process parameters that llama-bench cannot parse as lists.
Range syntax: first-last, first-last+step, first-last*mult, or val1,val2,val3. Booleans support a 0,1 sweep.
Configurations: 1
Measurements: 1
1 model × 1 config = 1 llama-bench invocation
Repetitions multiply internal measurements, not command invocations.
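
To make the Cartesian counting concrete, here is a minimal sketch of how a sweep grows when several parameters each take a list of values. The parameter names and values below are hypothetical and chosen only for illustration; this is not the UI's or llama-bench's own code.

```python
from itertools import product

def sweep(axes: dict[str, list[int]]) -> list[dict[str, int]]:
    """Return every combination of the given parameter values (Cartesian product)."""
    names = list(axes)
    return [dict(zip(names, combo)) for combo in product(*(axes[n] for n in names))]

# Hypothetical axes: 2 batch sizes x 3 thread counts x 2 flash-attention settings.
axes = {"batch_size": [512, 2048], "threads": [4, 8, 16], "flash_attn": [0, 1]}
configs = sweep(axes)
print(len(configs))  # 2 * 3 * 2 = 12
```

Each added value multiplies the total, which is why previewing the sweep before running it is worthwhile.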

Configuration Preview


Import Command

Paste an existing llama-bench command to populate the form. Supports quoted paths and all flags.
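
As an illustration of what importing a command involves, the sketch below tokenizes a pasted command with Python's shlex module so that quoted paths survive as a single value. The function parse_command and its flag-to-values mapping are assumptions made for this example; the UI's actual importer may work differently.

```python
import shlex

def _is_flag(token: str) -> bool:
    # Treat "-1" style tokens as values (negative numbers), not flags.
    return token.startswith("-") and not token.lstrip("-").isdigit()

def parse_command(command: str) -> dict[str, list[str]]:
    """Map each flag in a pasted command to the list of values given for it."""
    tokens = shlex.split(command)  # honours single and double quotes
    flags: dict[str, list[str]] = {}
    i = 1  # skip the executable name
    while i < len(tokens):
        token = tokens[i]
        if not _is_flag(token):
            raise ValueError(f"expected a flag, got {token!r}")
        if i + 1 < len(tokens) and not _is_flag(tokens[i + 1]):
            flags.setdefault(token, []).append(tokens[i + 1])
            i += 2
        else:
            flags.setdefault(token, [])  # flag given without a value
            i += 1
    return flags

parsed = parse_command('llama-bench -m "/models/my model.gguf" -p 128,512 -fa 1')
print(parsed)
```

Repeated flags accumulate in the list, so a command that passes -m twice yields two model paths.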

📖 Notes & Getting Started with llama-bench

What is llama-bench? A tool within llama.cpp that measures how fast a model runs on your hardware. It reports prompt processing speed in tokens per second, t/s (how fast it reads input), and generation speed in t/s (how fast it writes output).

Key rules & interactions:

  • ctx-size must be ≥ n-prompt + n-gen or the benchmark is invalid.
  • batch-size should usually be ≥ ubatch-size.
  • -ngl 99 offloads all layers to GPU — use this unless you want CPU fallback.
  • -fa 1 (Flash Attention) is almost always faster on modern GPUs.
  • No Conflicts for Tokens: Values set for Prompt Tokens (-p), Generate Tokens (-n), and Combo (-pg) do not override each other. Instead, each field adds a separate set of benchmarks to run.
  • Conflicts: If you use both --threads and --cpu-mask, the thread count sets the size of the threadpool, while the CPU mask restricts which cores those threads run on. A mismatch between the two can cause severe bottlenecks.
  • Fit target: When using --fit-target, the context size is automatically increased to the sum of n-prompt, n-gen, and n-depth if the provided context is smaller.
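
The size rules above can be expressed as a small pre-flight check. This is a minimal sketch under the rules as stated in these notes; check_config and fitted_ctx are hypothetical helper names, not functions from llama-bench or the UI.

```python
def check_config(ctx_size: int, n_prompt: int, n_gen: int,
                 batch_size: int, ubatch_size: int) -> list[str]:
    """Return a list of human-readable problems; an empty list means no issue found."""
    problems = []
    if ctx_size < n_prompt + n_gen:
        problems.append(
            f"ctx-size {ctx_size} is smaller than n-prompt + n-gen = {n_prompt + n_gen}"
        )
    if batch_size < ubatch_size:
        problems.append(
            f"batch-size {batch_size} is smaller than ubatch-size {ubatch_size}"
        )
    return problems

def fitted_ctx(ctx_size: int, n_prompt: int, n_gen: int, n_depth: int) -> int:
    """Context size after the --fit-target adjustment described above."""
    return max(ctx_size, n_prompt + n_gen + n_depth)

print(check_config(ctx_size=512, n_prompt=512, n_gen=128, batch_size=2048, ubatch_size=512))
print(fitted_ctx(ctx_size=1024, n_prompt=512, n_gen=128, n_depth=1024))
```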

Formula Examples (Range Syntax):

  • 4096-131072*2: Multiplies by 2 each step (4096, 8192, 16384, ..., 131072).
  • 4-16+4: Adds 4 each step (4, 8, 12, 16).
  • 128,512,1024: Specific comma-separated list of values.
  • 0-100: Every number within a range (0, 1, 2, 3, ..., 98, 99, 100).
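
The range forms listed above can be expanded with a few lines of code. The following is a sketch of one possible expander, written to match the four examples; expand_range is a hypothetical helper and is not taken from llama-bench.cpp or from this UI.

```python
import re

# first, optionally "-last", optionally followed by "+step" or "*mult"
_RANGE = re.compile(r"^(\d+)(?:-(\d+)(?:([+*])(\d+))?)?$")

def expand_range(spec: str) -> list[int]:
    """Expand a spec such as '4-16+4', '128-512*2', '0-100' or '128,512,1024'."""
    values: list[int] = []
    for part in spec.split(","):
        match = _RANGE.match(part.strip())
        if match is None:
            raise ValueError(f"invalid range: {part!r}")
        first = int(match.group(1))
        last = int(match.group(2)) if match.group(2) is not None else first
        op = match.group(3) or "+"
        step = int(match.group(4)) if match.group(4) is not None else 1
        current = first
        while current <= last:
            values.append(current)
            following = current + step if op == "+" else current * step
            if following <= current:
                # e.g. '+0', '*1' or '0-8*2' would never reach the end of the range
                raise ValueError(f"range does not advance: {part!r}")
            current = following
    return values

print(expand_range("4-16+4"))         # [4, 8, 12, 16]
print(expand_range("4096-131072*2"))  # [4096, 8192, 16384, 32768, 65536, 131072]
```

The guard against a non-advancing step matters for inputs such as 0-8*2, where multiplying zero would otherwise loop forever.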

Extended Non-Native Functionality (Bash Loops):

  • This command generator can also build longer-running, more extensive benchmarks. Native llama-bench sweeps stay inside the llama-bench CLI; Bash loops are used only for scalar process parameters that llama-bench cannot parse as lists.
  • Auto-Generated Bash Loops: If you apply range syntax to a scalar variable (e.g., setting delay to 0-60+15), the UI instantly detects this limitation and upgrades your output from a single command into an OS-level Bash script.
  • It isolates the scalar variables into Bash arrays (e.g., delay_list=('0' '15' '30' '45' '60') for a delay of 0-60+15) and wraps the native llama-bench execution in nested for loops.
  • The multi-value scalar variables that trigger Bash loops are: reps, delay, numa, and prio.
  • JSON Restriction: The .json output format is not supported with Bash looping, because appending the output of several invocations to one file does not produce valid JSON. The UI automatically converts this to .jsonl so that your appended data remains parseable.
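
To show the shape of the array-plus-nested-loop script described above, here is a minimal sketch of a generator. The function build_bash_script, the variable naming scheme, and the model path are assumptions for illustration; the script this UI actually emits may be laid out differently.

```python
import re
import shlex

def build_bash_script(base_cmd: list[str], sweeps: dict[str, list[str]]) -> str:
    """Wrap base_cmd in one nested Bash for loop per swept scalar flag."""
    lines = ["#!/usr/bin/env bash", "set -euo pipefail", ""]
    names: dict[str, str] = {}
    for flag, values in sweeps.items():
        name = flag.lstrip("-").replace("-", "_")  # "--delay" -> "delay"
        names[flag] = name
        for value in values:
            # Keep quoting trivial: only plain words and numbers are accepted.
            if not re.fullmatch(r"[A-Za-z0-9_.-]+", value):
                raise ValueError(f"unsupported value: {value!r}")
        items = " ".join(f"'{value}'" for value in values)
        lines.append(f"{name}_list=({items})")
    lines.append("")
    indent = ""
    for name in names.values():
        lines.append(f'{indent}for {name} in "${{{name}_list[@]}}"; do')
        indent += "  "
    command = " ".join(shlex.quote(arg) for arg in base_cmd)
    extra = " ".join(f'{flag} "${name}"' for flag, name in names.items())
    lines.append(f"{indent}{command} {extra}".rstrip())
    for _ in names:
        indent = indent[:-2]
        lines.append(f"{indent}done")
    return "\n".join(lines) + "\n"

script = build_bash_script(
    ["llama-bench", "-m", "model.gguf", "-o", "jsonl"],  # model.gguf is a placeholder path
    {"--delay": ["0", "15", "30"], "-r": ["3", "5"]},
)
print(script)
```

Every flag that llama-bench can sweep natively stays in base_cmd, so the Bash loops only multiply the scalar parameters.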

Paste an existing command in Import Command to tweak it. Use Configuration Preview to see every run before executing.

Models

Add one or more GGUF paths. llama-bench will compare them.

Execution

Must be non-negative integer(s) ≤ 2147483647. Range syntax: 4-16+4, 4,8,12
Must be non-negative integer(s) ≤ 2147483647. Range syntax: 0-60+15, 0,5,10

Tokens

Must be non-negative integer(s) ≤ 2147483647. Range syntax: 128-1024+128, 128,512,1024
Must be non-negative integer(s) ≤ 2147483647. Range syntax: 128-512*2, 128,256,512
Enter an even number of non-negative pp,tg values ≤ 2147483647. Ranges may pair with equal-length ranges or one static value.
Must be non-negative integer(s) ≤ 2147483647

GPU / Offload

Must be non-negative integer(s) ≤ 2147483647

CPU / Threads

Must be non-negative integer(s) ≤ 2147483647

Memory / Cache

Fit / Autotune

Must be non-negative integer(s) ≤ 18446744073709551615. Range syntax: 100-500*2, 100,200,400
Must be non-negative integer(s) ≤ 4294967295. Range syntax: 4096-32768*2, 4096,8192,16384

Bash Output Script

llama-bench