How to Use Linux perf for Hardware Performance Monitoring in 2026

Learn to use Linux perf for hardware performance monitoring. Measure CPU cycles, cache misses, and more with practical examples for sysadmins and devs.

You’re staring at a server that’s slowing to a crawl, and top just shows high CPU usage. You need to know why the CPU is struggling. Is it stalled on memory? Retiring too few instructions per cycle? Or is the cache thrashing and wasting time? That’s where Linux perf comes in. It’s the performance monitoring toolkit maintained alongside the kernel, and it talks directly to your CPU’s hardware performance counters. No extra daemons, no expensive licenses. Just you, the kernel, and raw hardware data.

Key Takeaway

Linux perf gives you direct access to CPU hardware counters for monitoring cycles, instructions, cache misses, branch mispredictions, and more. This guide covers installation, essential commands like perf stat and perf record, decoding common hardware events, common mistakes, and advanced custom events. By the end, you’ll know exactly how to diagnose a performance bottleneck using hardware data.

What is Linux Perf and Why Use It for Hardware Monitoring?

Perf (short for “performance events”) is the Linux kernel’s official profiling and monitoring subsystem. It exposes the hardware performance monitoring counters (PMCs) built into modern CPUs. These counters track low-level silicon events: CPU cycles, instructions retired, L1 cache misses, branch mispredictions, and dozens more. Unlike profilers that sample purely on a timer, perf can take a sample whenever a hardware counter overflows, for example every N cycles or every N cache misses. That means you see what the chip actually does, not what the kernel scheduler thinks it does.

For system administrators and developers, that difference matters. You might have a process that uses 90% CPU but actually spends most of its time waiting for memory. Perf reveals that directly. In 2026, with CPUs featuring 20+ cores and deep cache hierarchies, hardware monitoring isn’t a luxury. It’s the only way to understand what your processor is really doing.

Installing and Preparing Perf in 2026

Most Linux distributions ship perf as a separate package. Here’s how to get it running on common distros.

  1. Check if perf is already installed
    Run perf --version. If you see a version number (e.g., perf version 6.8.x), you’re good. Otherwise, continue.

  2. Install the linux‑tools package for your kernel
    On Ubuntu (on Debian, the equivalent package is usually linux-perf):
    bash
    sudo apt update && sudo apt install linux-tools-common linux-tools-$(uname -r)

    On RHEL, CentOS, or Fedora:
    bash
    sudo dnf install perf

    On openSUSE:
    bash
    sudo zypper install perf

  3. Give yourself access to hardware counters
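    Hardware events need one of two things: run perf as root (or with CAP_PERFMON on newer kernels), or lower /proc/sys/kernel/perf_event_paranoid. A value of -1 removes nearly all restrictions; 0 lets unprivileged users count and sample CPU events but blocks raw tracepoint access; 1 limits them to their own processes; 2 also hides kernel-side samples. For most monitoring, 0 works well. To set it temporarily (until the next reboot):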
    bash
    sudo sysctl kernel.perf_event_paranoid=0

  4. Verify your CPU hardware events
    List all available events with:
    bash
    perf list

    You’ll see categories like hardware, cache, software, and tracepoint. Hardware events are the ones we care about.
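
Before changing the sysctl from step 3, it can help to check what the current level already allows. The sketch below is a minimal helper, assuming the level meanings from the kernel's sysctl documentation; the function name describe_paranoid is made up for this example, and levels above 2 exist only on some distro-patched kernels.

```bash
#!/usr/bin/env bash
# Describe what a perf_event_paranoid level permits for unprivileged users.
# describe_paranoid is a hypothetical helper name, not part of perf.
describe_paranoid() {
    local level="$1"
    if [ "$level" -le -1 ]; then
        echo "no restrictions"
    elif [ "$level" -eq 0 ]; then
        echo "CPU events allowed, raw tracepoints blocked"
    elif [ "$level" -eq 1 ]; then
        echo "own processes only, kernel samples allowed"
    elif [ "$level" -eq 2 ]; then
        echo "own processes only, user-space samples only"
    else
        # Assumption: values above 2 come from distro patches and are stricter.
        echo "distro-specific stricter level"
    fi
}

# Report the live value when the file is readable (it is absent on non-Linux systems).
paranoid_file=/proc/sys/kernel/perf_event_paranoid
if [ -r "$paranoid_file" ]; then
    level="$(cat "$paranoid_file")"
    echo "perf_event_paranoid=$level: $(describe_paranoid "$level")"
fi
```

If the script reports a level of 2 or higher, perf stat on your own programs usually still works, but system-wide counting will need root or a lower setting.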

The Essential Perf Commands for Hardware Monitoring

Perf includes several subcommands. For hardware monitoring, focus on these:

  • perf stat – Count events globally or per‑process, showing totals and averages. Great for quick benchmarks.
  • perf record – Sample events over time and save a data file (perf.data). Use it for deeper analysis.
  • perf report – Display the sampled data from perf record in an interactive terminal browser.
  • perf top – Live real‑time view of the hottest functions, similar to top but using hardware events.

You’ll typically start with perf stat to get an immediate overview, then move to perf record for detailed sampling.

Decoding Hardware Events: A Practical Table

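When you run perf stat without any event options, it measures a default set that mixes software events (task clock, context switches, page faults) with core hardware events (cycles, instructions, branches, branch misses). But you can specify exactly which events matter. Here’s a table of the most useful hardware events and what they tell you.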

Event Name | Hardware Counter | What It Measures | Common Use Case
-----------|------------------|------------------|----------------
cycles | CPU cycle counter | Processor clock cycles elapsed | CPU time consumed by a program
instructions | Instructions retired | Instructions completed | Compute instructions per cycle (IPC)
cache-references | Cache references (usually last-level; mapping varies by CPU) | Memory accesses that reach that cache level | High values indicate heavy memory traffic
cache-misses | Cache miss counter | Accesses that miss that cache level | Cache miss ratio: misses / references
branch-instructions | Branch instructions | Branch instructions executed | Denominator for the branch miss rate
branch-misses | Branch mispredictions | Mispredicted branches | A miss rate above roughly 5% usually hurts performance
stalled-cycles-frontend | Frontend stalls (not exposed on every CPU) | Cycles spent waiting for instruction fetch | Indicates frontend bottlenecks
stalled-cycles-backend | Backend stalls (not exposed on every CPU) | Cycles spent waiting for data or execution units | Indicates memory or execution unit bottlenecks

How to use one: perf stat -e cycles,instructions,cache-misses ./myapp.
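
The two ratios the table points to, IPC and cache miss ratio, can be computed from perf stat's CSV mode (-x,). The sketch below assumes the usual CSV field order (value, unit, event name, run time, percent running) and plain event names; the function name summarize_counters and the sample numbers are invented for illustration.

```bash
#!/usr/bin/env bash
# Compute IPC and cache miss ratio from perf stat CSV output (perf stat -x,).
# Assumed field order: value,unit,event,run-time,percent-running,...
summarize_counters() {
    awk -F',' '
        $3 == "cycles"           { cycles = $1 }
        $3 == "instructions"     { insns  = $1 }
        $3 == "cache-references" { refs   = $1 }
        $3 == "cache-misses"     { misses = $1 }
        END {
            if (cycles > 0) printf "IPC: %.2f\n", insns / cycles
            if (refs > 0)   printf "cache miss ratio: %.1f%%\n", 100 * misses / refs
        }
    '
}

# Made-up sample numbers, standing in for real perf stat output.
sample='2000000,,cycles,1000000,100.00,,
3000000,,instructions,1000000,100.00,,
400000,,cache-references,1000000,100.00,,
50000,,cache-misses,1000000,100.00,,'

printf '%s\n' "$sample" | summarize_counters
```

With a real program, perf stat writes its counters to stderr, so something like perf stat -x, -e cycles,instructions,cache-references,cache-misses ./myapp 2>&1 >/dev/null | summarize_counters should feed the helper. Event names can carry suffixes such as :u on restricted systems, in which case the exact-match patterns would need adjusting.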

Measuring Real-World Performance: A Step-by-Step Example

Let’s say you have a data processing script that feels slow. You suspect CPU is busy, but you don’t know why. Here’s how to use perf to find out.

  1. Identify your hardware event of interest
    Run perf list hardware to see all hardware event names. For a first look, use the default set. Add -d to perf stat for more detail.

  2. Run perf stat on your program
    bash
    perf stat -d ./process_data

    Output will show cycles, instructions, cache misses, and branch statistics. Look at the instructions per cycle (IPC), printed as “insn per cycle” next to the instructions count. If IPC is below 0.5 on a modern Intel/AMD CPU, the core is very likely stalling.

  3. If IPC is low, sample with perf record to find the stall
    bash
    perf record -e cycles -c 10000 ./process_data

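    The -c 10000 option takes a sample every 10,000 cycles. That is a very high rate on a multi-GHz core, so raise the period for anything but short runs, or use -F to request a sampling frequency instead. Then: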
    bash
    perf report

    The report ranks functions by their share of sampled cycles; it does not show per-function IPC. Start with the functions at the top: in a run with low overall IPC, that is where the stalls are most likely hiding.
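
To pick a sensible value for -c, it helps to estimate how many samples per second a given period produces. The sketch below does that arithmetic, assuming a constant clock; samples_per_second is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Estimate samples per second for a fixed cycle period (perf record -c <period>).
# $1 = clock speed in GHz (assumed constant), $2 = sample period in cycles
samples_per_second() {
    awk -v ghz="$1" -v period="$2" 'BEGIN { printf "%d\n", ghz * 1e9 / period }'
}

samples_per_second 3 10000      # 300000 samples per second on a 3 GHz core
samples_per_second 3 1000000    # 3000 samples per second
```

A few thousand samples per second is normally plenty for finding hot functions, and it keeps perf.data small.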

Expert tip: Don’t fixate on a single run. Always run your measurement several times with perf stat -r 3 to see variation. Hardware counters are precise, but background noise can skew a single reading. Use -r 5 for production benchmarks.

Common Mistakes and How to Avoid Them

Even experienced sysadmins trip over these pitfalls.

  • Measuring the whole machine instead of the target process – On a server running dozens of services, a system-wide run (perf stat -a) shows aggregate activity, not your target process. Launch the program under perf, or attach to a running one with perf stat -p <pid> or perf record -p <pid>.
  • Forgetting to disable frequency scaling – CPU frequency scaling (like ondemand or powersave) changes the cycle rate, making cycle counts across runs incomparable. Set the governor to performance:
    bash
    sudo cpupower frequency-set -g performance

    Run this before any serious measurement.
  • Ignoring multiplexing – If you request more events than the CPU has hardware counters, perf time-slices them and scales the results, which costs accuracy. A percentage below 100 at the end of a perf stat line is the telltale sign. Limit your event set to 4–6 for reliable numbers.
  • Treating an IPC of 1.0 as universally good – IPC depends on both the microarchitecture and the workload. On Intel Skylake, 1.0 is fine for code with moderate memory access; on AMD Zen 4, higher IPC is expected. Compare against what your CPU can sustain (often 2–4 for integer code).
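
The frequency scaling pitfall is easy to check for before a measurement. The sketch below reads the standard cpufreq sysfs files and warns about any CPU that is not on the performance governor; the function name check_governors and the optional root-directory argument are choices made for this example.

```bash
#!/usr/bin/env bash
# Warn about CPUs whose cpufreq governor is not "performance".
# $1 (optional) = sysfs CPU directory, overridable so the check can be tried on test data.
check_governors() {
    local root="${1:-/sys/devices/system/cpu}" f gov bad=0
    for f in "$root"/cpu[0-9]*/cpufreq/scaling_governor; do
        [ -r "$f" ] || continue            # no cpufreq support, or glob did not match
        gov="$(cat "$f")"
        if [ "$gov" != "performance" ]; then
            echo "warning: $f is '$gov'"
            bad=1
        fi
    done
    return "$bad"
}

if check_governors; then
    echo "governors look fine (or no cpufreq data was found)"
fi
```

Virtual machines and some containers expose no cpufreq files at all, in which case the check passes without having verified anything.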

Advanced Hardware Monitoring with Custom Events

Beyond the standard named events, you can program raw hardware events using hex codes. This is useful for model-specific counters that perf list does not name.

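For example, on Intel platforms, to count L2 cache misses specifically (the generic cache-misses event usually maps to last-level cache misses), you can pass a raw code of the form rUUEE:

perf stat -e rFF04 ./app

Here FF04 decodes as umask=0xFF and event=0x04. Treat these particular values as a placeholder: event and umask numbers differ between microarchitectures, so look up the L2 miss event for your exact CPU in the Intel Software Developer’s Manual (SDM), or use the symbolic name if perf list shows one.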
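
Building the raw string by hand is error-prone because the umask comes first and the event code second. The sketch below composes it from the two values, assuming the x86 layout where the low byte is the event code and the next byte is the unit mask; raw_event is a hypothetical helper name, and the values are the article's example rather than a verified L2 event.

```bash
#!/usr/bin/env bash
# Build a raw perf event string (rUUEE) from an event code and a unit mask.
# $1 = event code, $2 = umask; both may be given in hex, e.g. 0x04
raw_event() {
    printf 'r%02X%02X\n' "$2" "$1"
}

raw_event 0x04 0xFF    # prints rFF04
```

The result can be passed straight to perf, as in perf stat -e "$(raw_event 0x04 0xFF)" ./app. On x86, the same event can also be spelled out as cpu/event=0x04,umask=0xff/, which is easier to read in scripts.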

Another advanced technique is event grouping. Use curly braces to group events that must be counted over exactly the same time window, because you intend to divide one by the other:

perf stat -e '{cycles,instructions},{cache-references,cache-misses}' ./app

Each group is scheduled onto the hardware counters as a unit, so its members are always counted together even when perf multiplexes. That keeps the IPC and cache miss ratios accurate.

If you’re working with specialized hardware like FireWire devices, the same counters help at the driver level. Optimizing a kernel module for better hardware compatibility often starts with finding the cache misses or stalls in driver code, and perf can reveal both.

Tying Perf Into Your System Administration Workflow

Perf doesn’t have to be a one‑time diagnostic tool. You can integrate it into regular monitoring. For example:

  • Use perf stat --all-cpus sleep 60 to produce a system-wide summary covering 60 seconds, and schedule it via cron.
  • Pair perf with bpftrace or other eBPF tools for deeper dynamic tracing.
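  • For persistent hardware monitoring, have perf stat print counters at a fixed interval in CSV form (-I with -x) and write them to a log file (-o), then feed that to your existing metrics system (Prometheus, Grafana, etc.). perf script is the counterpart for dumping samples recorded by perf record.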
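
Feeding counters to a metrics system mostly comes down to reshaping perf stat's interval CSV. The sketch below converts such lines into Prometheus text format, assuming the interval layout (timestamp, value, unit, event name, ...); the metric name perf_event_count, the function name to_prometheus, and the sample data are all invented for this example.

```bash
#!/usr/bin/env bash
# Convert perf stat interval CSV (perf stat -I <ms> -x,) to Prometheus text format.
# Assumed field order: time,value,unit,event,...
# perf_event_count is a placeholder metric name, not an established convention.
to_prometheus() {
    awk -F',' '
        /^#/ || NF < 4    { next }    # skip comment lines and short lines
        $2 !~ /^[0-9]+$/  { next }    # skip <not counted>, <not supported>, non-integer values
        { printf "perf_event_count{event=\"%s\"} %s\n", $4, $2 }
    '
}

# Made-up sample lines, standing in for real interval output.
sample='1.000104,2000000,,cycles,1000000,100.00,,
1.000104,<not counted>,,cache-misses,0,100.00,,
1.000104,3000000,,instructions,1000000,100.00,,'

printf '%s\n' "$sample" | to_prometheus
```

Each interval line is a count for that interval only, so the values behave like per-interval gauges rather than ever-increasing counters. How the output reaches Prometheus (for example through a text file picked up by an exporter) depends on your setup.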

When your server is connected to external hardware like FireWire audio interfaces or video capture cards, performance can drop due to contention on PCIe buses. The generic bus-cycles event (if available on your CPU) only counts bus clock cycles, so it will not show bandwidth saturation by itself; for that, look for uncore or I/O events in perf list on your platform. For a practical example, see Mastering FireWire Device Management on Linux Systems.

Start Monitoring Like a Pro: Your First Hardware Check in 2026

You now have the fundamentals. Pick a program you suspect has a performance issue. Run perf stat on it. Look at the IPC and cache miss rate. If IPC is below 0.5, use perf record to find the culprit function. Keep an eye on event multiplexing and frequency scaling, and double‑check your numbers with multiple runs. Hardware monitoring isn’t just for kernel hackers anymore. It’s a daily tool for any sysadmin who wants their servers to run faster and cooler.

Fire up your terminal. Run perf list. Choose one event that matters today. You’ll be surprised what you find.
