Learn to use Linux perf for hardware performance monitoring. Measure CPU cycles, cache misses, and more with practical examples for sysadmins and devs.
You’re staring at a server that’s slowing to a crawl, and top just shows high CPU usage. You need to know why the CPU is struggling. Is it stalled on memory? Retiring too few instructions per cycle? Or maybe the cache is thrashing and wasting time? That’s where Linux perf comes in. It’s the built‑in performance monitoring toolkit that talks directly to your CPU’s hardware performance counters. No extra daemons, no expensive licenses. Just you, the kernel, and raw hardware data.
Linux perf gives you direct access to CPU hardware counters for monitoring cycles, instructions, cache misses, branch mispredictions, and more. This guide covers installation, essential commands like perf stat and perf record, decoding common hardware events, common mistakes, and advanced custom events. By the end, you’ll know exactly how to diagnose a performance bottleneck using hardware data.
What is Linux Perf and Why Use It for Hardware Monitoring?
Perf (short for “performance events”) is the Linux kernel’s official profiling and monitoring subsystem. It exposes hardware performance monitoring counters (PMCs) built into every modern CPU. These counters track low‑level silicon events: CPU cycles, instructions retired, L1 cache misses, branch mispredictions, and dozens more. Unlike software profilers that sample based on wall clock time, perf samples right at the CPU level. That means you see what the chip actually does, not what the kernel scheduler thinks it does.
For system administrators and developers, that difference matters. You might have a process that uses 90% CPU but actually spends most of its time waiting for memory. Perf reveals that directly. In 2026, with CPUs featuring 20+ cores and deep cache hierarchies, hardware monitoring isn’t a luxury. It’s the only way to understand what your processor is really doing.
Installing and Preparing Perf in 2026
Most Linux distributions ship perf as a separate package. Here’s how to get it running on common distros.
- Check if perf is already installed

  Run perf --version. If you see a version number (e.g., perf version 6.8.x), you’re good. Otherwise, continue.

- Install the linux‑tools package for your kernel

  On Ubuntu (on Debian, the package is named linux-perf instead):

  ```bash
  sudo apt update && sudo apt install linux-tools-common linux-tools-$(uname -r)
  ```

  On RHEL, CentOS, or Fedora:

  ```bash
  sudo dnf install perf
  ```

  On openSUSE:

  ```bash
  sudo zypper install perf
  ```

- Give yourself access to hardware counters

  Unprivileged access to hardware events is controlled by /proc/sys/kernel/perf_event_paranoid. Either run perf as root, or lower the setting: -1 removes almost all restrictions, and 0 lets unprivileged users count and sample, including CPU‑wide and kernel events, while still blocking raw tracepoint access. For most monitoring, 0 works well. To set it temporarily (until the next reboot):

  ```bash
  sudo sysctl kernel.perf_event_paranoid=0
  ```

- Verify your CPU hardware events

  List all available events with:

  ```bash
  perf list
  ```

  You’ll see categories like hardware, cache, software, and tracepoint. Hardware events are the ones we care about.
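Before measuring, it can help to confirm what the current perf_event_paranoid level actually permits. The following is a minimal sketch, not an official tool: describe_paranoid is a hypothetical helper name, and its summaries paraphrase the level descriptions in the kernel's sysctl documentation. Some distributions patch in stricter levels above 2, which the last branch lumps together.

```bash
#!/bin/sh
# describe_paranoid is a hypothetical helper: it summarizes what an
# unprivileged user may do at a given perf_event_paranoid level.
describe_paranoid() {
    if [ "$1" -le -1 ]; then
        echo "no restrictions"
    elif [ "$1" -eq 0 ]; then
        echo "CPU-wide and kernel profiling allowed, raw tracepoints blocked"
    elif [ "$1" -eq 1 ]; then
        echo "per-process profiling allowed, CPU-wide events blocked"
    else
        echo "user-space profiling of your own processes at most"
    fi
}

paranoid_file=/proc/sys/kernel/perf_event_paranoid
if [ -r "$paranoid_file" ]; then
    level=$(cat "$paranoid_file")
    echo "perf_event_paranoid=$level: $(describe_paranoid "$level")"
else
    echo "cannot read $paranoid_file (not Linux, or perf events unavailable)"
fi
```

If the reported level is stricter than you need, the sysctl command from the step above lowers it until the next reboot.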
The Essential Perf Commands for Hardware Monitoring
Perf includes several subcommands. For hardware monitoring, focus on these:
- perf stat – Count events globally or per‑process, showing totals and averages. Great for quick benchmarks.
- perf record – Sample events over time and save a data file (perf.data). Use it for deeper analysis.
- perf report – Display the sampled data from perf record in an interactive browser.
- perf top – Live real‑time view of the hottest functions, similar to top but using hardware events.
You’ll typically start with perf stat to get an immediate overview, then move to perf record for detailed sampling.
Decoding Hardware Events: A Practical Table
When you run perf stat without any event options, it measures a default set of hardware and software events (cycles, instructions, branches, task clock, context switches, and so on). But you can specify exactly which events matter. Here’s a table of the most useful hardware events and what they tell you.
| Event Name | Hardware Counter | What It Measures | Common Use Case |
|---|---|---|---|
| cycles | CPU Cycle Counter | Number of processor clock cycles | Total CPU time consumed by a program |
| instructions | Instructions Retired | Number of instructions completed | Compare instructions per cycle (IPC) |
| cache-references | Cache References (usually the last‑level cache; CPU‑dependent) | Memory accesses that reach that cache level | High values indicate heavy memory traffic |
| cache-misses | Cache Miss Counter | Memory accesses that miss cache | Cache miss ratio: cache-misses / cache-references |
| branch-instructions | Branch Instructions | Number of branch instructions | High branch count can slow pipelines |
| branch-misses | Branch Mispredictions | Number of mispredicted branches | A miss rate above roughly 5% usually hurts performance |
| stalled-cycles-frontend | Frontend Stalls | Cycles the CPU waits for instruction fetch | Indicates frontend bottlenecks (not exposed on every CPU) |
| stalled-cycles-backend | Backend Stalls | Cycles the CPU waits for data or execution units | Indicates memory or execution unit bottlenecks (not exposed on every CPU) |
How to use one: perf stat -e cycles,instructions,cache-misses ./myapp.
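The two ratios named in the table, IPC and the cache miss ratio, are simple divisions over the counts perf stat prints. As a small sketch of that arithmetic, the helpers below (ipc and miss_pct are hypothetical names, and the counts passed to them are illustrative, not measurements) compute both:

```bash
#!/bin/sh
# Hypothetical helpers: turn raw counts from perf stat into ratios.

# ipc INSTRUCTIONS CYCLES -> instructions per cycle, two decimals
ipc() {
    awk -v insns="$1" -v cycles="$2" 'BEGIN {
        if (cycles == 0) { print "n/a"; exit }
        printf "%.2f\n", insns / cycles
    }'
}

# miss_pct MISSES REFERENCES -> cache miss ratio as a percentage
miss_pct() {
    awk -v miss="$1" -v refs="$2" 'BEGIN {
        if (refs == 0) { print "n/a"; exit }
        printf "%.1f%%\n", 100 * miss / refs
    }'
}

# Illustrative counts, not measurements from a real run.
ipc 3000000 2000000      # prints 1.50
miss_pct 250 1000        # prints 25.0%
```

perf stat already prints an "insn per cycle" figure next to the instructions count; the helpers are mainly useful when you post-process counts yourself.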
Measuring Real-World Performance: A Step-by-Step Example
Let’s say you have a data processing script that feels slow. You suspect CPU is busy, but you don’t know why. Here’s how to use perf to find out.
- Identify your hardware event of interest

  Run perf list hardware to see all hardware event names. For a first look, use the default set. Add -d to perf stat for more detail.

- Run perf stat on your program

  ```bash
  perf stat -d ./process_data
  ```

  Output will show cycles, instructions, cache misses, and branch statistics. Look at the instructions per cycle (IPC), which perf stat prints as "insn per cycle". If IPC is below 0.5 on a modern Intel/AMD CPU, the CPU is very likely spending most of its time stalled.

- If IPC is low, sample with perf record to find the stall

  ```bash
  perf record -e cycles -c 10000 ./process_data
  ```

  The -c 10000 sets the sample period: one sample every 10,000 cycles. That is a dense sampling rate, so raise the period for long runs. Then:

  ```bash
  perf report
  ```

  The report shows which functions consume the most cycles. In a low‑IPC program, the functions at the top of that list are where the stalled cycles accumulate. That’s your bottleneck.
Expert tip: Don’t fixate on a single run. Always run your measurement several times with perf stat -r 3 to see variation; perf repeats the command and reports the mean and standard deviation. Hardware counters are precise, but background noise can skew a single reading. Use -r 5 or more for production benchmarks.
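If you want to script the IPC check from the steps above, perf stat can emit CSV with -x, and a few lines of awk can pull the ratio out. The sketch below assumes the field layout described in the perf-stat man page (count in field 1, event name in field 3) and plain event names; on hybrid CPUs perf may prefix the PMU name, in which case the patterns would need adjusting. ipc_from_csv is a hypothetical helper, and the sample input uses made-up counts.

```bash
#!/bin/sh
# ipc_from_csv is a hypothetical helper: it reads perf stat CSV (-x,) on
# stdin and prints IPC. Assumes field 1 is the count and field 3 the event.
ipc_from_csv() {
    awk -F, '
        $3 ~ /^cycles/       { cycles = $1 + 0 }   # +0 turns "<not counted>" into 0
        $3 ~ /^instructions/ { insns  = $1 + 0 }
        END {
            if (cycles > 0) printf "IPC %.2f\n", insns / cycles
            else            print "IPC n/a"
        }'
}

# Intended use (needs perf and a real workload):
#   perf stat -x, -o stats.csv -e cycles,instructions ./process_data
#   ipc_from_csv < stats.csv

# Illustrative input with made-up counts:
printf '%s\n' '5000000,,cycles,1200000,100.00,,' \
              '2000000,,instructions,1200000,100.00,0.40,insn per cycle' |
    ipc_from_csv          # prints IPC 0.40
```

Writing the CSV to a file with -o keeps perf's output separate from whatever the measured program prints.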
Common Mistakes and How to Avoid Them
Even experienced sysadmins trip over these pitfalls.
- Running perf on a busy server without targeting a process – If your server runs dozens of services, system‑wide measurements (perf top, perf stat -a) show aggregate activity, not your target process. Use perf stat -p <pid> or perf record -p <pid> for process‑specific data.
- Forgetting to disable frequency scaling – CPU frequency scaling (governors like ondemand or powersave) changes the clock rate, making timings and cycle rates hard to compare across runs. Set the governor to performance:

  ```bash
  sudo cpupower frequency-set -g performance
  ```

  Run this before any serious measurement.
- Ignoring multiplexing – If you request more events than hardware counters exist, perf automatically multiplexes them. Results get scaled, but accuracy drops; perf stat flags this by printing the percentage of time each event was actually counted. Limit your event set to 4–6 for reliable numbers.
- Thinking IPC of 1.0 is good – IPC varies by architecture and workload. On Intel Skylake, 1.0 is fine for code with moderate memory access. On AMD Zen 4, higher IPC is expected. Compare against your CPU’s typical peak (often 2–4 for integer code).
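The frequency scaling check in the list above is easy to automate before a benchmark run. Here is one possible sketch that reads the governor of each CPU from sysfs; non_performance_cpus is a hypothetical helper, and its optional argument (an alternative sysfs root) exists only to make the function testable.

```bash
#!/bin/sh
# non_performance_cpus is a hypothetical helper: it lists CPUs whose cpufreq
# governor is not "performance". The optional argument overrides the sysfs
# root, which is only useful for testing.
non_performance_cpus() {
    root=${1:-/sys/devices/system/cpu}
    for f in "$root"/cpu[0-9]*/cpufreq/scaling_governor; do
        [ -r "$f" ] || continue      # no cpufreq support, or the glob matched nothing
        gov=$(cat "$f")
        if [ "$gov" != "performance" ]; then
            cpu=${f%/cpufreq/scaling_governor}
            echo "${cpu##*/}: $gov"
        fi
    done
}

offenders=$(non_performance_cpus)
if [ -n "$offenders" ]; then
    echo "Governor is not 'performance' on:"
    echo "$offenders"
else
    echo "All CPUs with cpufreq use the performance governor (or cpufreq is absent)."
fi
```

On virtual machines and some containers the cpufreq directory does not exist at all, so an empty result there says nothing about the host's scaling policy.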
Advanced Hardware Monitoring with Custom Events
Beyond the standard named events, you can access raw hardware events using hex codes. This is useful for events not listed in perf list, such as model‑specific counters.
For example, on Intel platforms, the generic cache-misses event may map to L3 misses on newer chips, so counting L2 misses specifically takes a raw event code of this form:
perf stat -e rFF04 ./app
In the raw format rUUEE, the low byte is the event select and the next byte is the unit mask, so FF04 decodes to umask=0xFF and event=0x04. Event encodings are model‑specific, so treat this code as an illustration of the format and look up the actual L2 miss encoding for your CPU in the Software Developer’s Manual (SDM).
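To double‑check how a raw code splits into its two fields, a few lines of shell arithmetic are enough. This is a sketch that assumes the x86 rUUEE layout described above; decode_raw is a hypothetical helper, not part of perf.

```bash
#!/bin/sh
# decode_raw is a hypothetical helper: it splits an x86 raw event code of
# the form rUUEE into its event select (low byte) and umask (next byte).
decode_raw() {
    hex=${1#r}                          # drop the leading "r"
    val=$((0x$hex))
    printf 'event=0x%02x umask=0x%02x\n' $(( val & 0xff )) $(( (val >> 8) & 0xff ))
}

decode_raw rFF04     # prints event=0x04 umask=0xff
```

On x86 systems where the core PMU is named cpu, perf also accepts the more readable spelling -e cpu/event=0x04,umask=0xff/ for the same raw event.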
Another advanced technique is event grouping. Use curly braces to group events that must be counted over the same time window, so perf schedules the whole group onto the counters together or not at all:
perf stat -e '{cycles,instructions},{cache-references,cache-misses}' ./app
Within each group the events are always counted together, even when perf multiplexes between groups, so the IPC and cache miss ratios are computed from matching intervals.
If you’re working with specialized hardware like FireWire devices, understanding low‑level CPU interaction can help. For instance, optimizing Linux kernel modules for enhanced hardware compatibility often relies on identifying driver‑level cache misses or stalls that perf can reveal.
Tying Perf Into Your System Administration Workflow
Perf doesn’t have to be a one‑time diagnostic tool. You can integrate it into regular monitoring. For example:
- Run perf stat --all-cpus sleep 60 from cron to produce a system‑wide summary of each 60‑second window.
- Pair perf with bpftrace or other eBPF tools for deeper dynamic tracing.
- For persistent hardware monitoring, consider perf script to dump the recorded samples from perf.data as text to a log file, then feed it to your existing metrics system (Prometheus, Grafana, etc.).
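As one way to connect the cron idea above to a metrics system, the sketch below converts perf stat CSV output into Prometheus text‑format lines. Everything named here is an assumption for illustration: csv_to_prom is a hypothetical helper, the metric name perf_event_count is invented, and the file path in the comments is arbitrary. The field layout (count first, event name third) follows the perf-stat man page.

```bash
#!/bin/sh
# csv_to_prom is a hypothetical helper: it converts perf stat CSV (-x,) on
# stdin into Prometheus text-format lines. The metric name is invented.
csv_to_prom() {
    awk -F, '
        /^#/ || NF < 3 { next }                 # header comment or blank line
        $1 ~ /^[0-9]+(\.[0-9]+)?$/ {            # skips "<not supported>" rows
            printf "perf_event_count{event=\"%s\"} %s\n", $3, $1
        }'
}

# Intended use from cron (needs perf and enough privileges for --all-cpus):
#   perf stat --all-cpus -x, -o /tmp/perf.csv -e cycles,instructions sleep 60
#   csv_to_prom < /tmp/perf.csv

# Illustrative input with made-up counts:
printf '%s\n' '# started on (timestamp)' '' \
              '1000,,cycles,60000,100.00,,' \
              '<not supported>,,stalled-cycles-frontend,0,100.00,,' |
    csv_to_prom          # prints perf_event_count{event="cycles"} 1000
```

Dropping unsupported events instead of exporting them as zero keeps a missing counter from looking like an idle one on a dashboard.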
When your server is connected to external hardware like FireWire audio interfaces or video capture cards, performance can drop due to contention on PCIe buses. perf stat -e bus-cycles (if available on your CPU) counts bus clock cycles, which is at best an indirect signal; measuring PCIe bandwidth itself usually takes the uncore events that perf list shows for your platform. For a practical example, see mastering firewire device management on linux systems.
Start Monitoring Like a Pro: Your First Hardware Check in 2026
You now have the fundamentals. Pick a program you suspect has a performance issue. Run perf stat on it. Look at the IPC and cache miss rate. If IPC is below 0.5, use perf record to find the culprit function. Keep an eye on event multiplexing and frequency scaling, and double‑check your numbers with multiple runs. Hardware monitoring isn’t just for kernel hackers anymore. It’s a daily tool for any sysadmin who wants their servers to run faster and cooler.
Fire up your terminal. Run perf list. Choose one event that matters today. You’ll be surprised what you find.




