How to Use Linux perf for Hardware Performance Monitoring in 2026

caleb 10 June 2026

You’re staring at a server that’s slowing to a crawl, and top just shows high CPU usage. You need to know why the CPU is struggling. Is it stalled on memory? Retiring too few instructions per cycle? Or maybe the cache is thrashing and wasting time? That’s where Linux perf comes in. It’s the built‑in performance monitoring toolkit that talks directly to your CPU’s hardware performance counters. No extra daemons, no expensive licenses. Just you, the kernel, and raw hardware data.

Key Takeaway

Linux perf gives you direct access to CPU hardware counters for monitoring cycles, instructions, cache misses, branch mispredictions, and more. This guide covers installation, essential commands like perf stat and perf record, decoding common hardware events, common mistakes, and advanced custom events. By the end, you’ll know exactly how to diagnose a performance bottleneck using hardware data.

What is Linux Perf and Why Use It for Hardware Monitoring?

Perf (short for “performance events”) is the Linux kernel’s official profiling and monitoring subsystem. It exposes the hardware performance monitoring counters (PMCs) built into modern CPUs. These counters track low‑level silicon events: CPU cycles, instructions retired, L1 cache misses, branch mispredictions, and dozens more. Unlike profilers that sample only on a timer, perf can take a sample each time a counter reaches a chosen count, for example every N cycles or every N cache misses. That means you see what the chip actually does, not just where the scheduler says the time went.

For system administrators and developers, that difference matters. You might have a process that uses 90% CPU but actually spends most of its time waiting for memory. Perf reveals that directly. In 2026, with CPUs featuring 20+ cores and deep cache hierarchies, hardware monitoring isn’t a luxury. It’s the only way to understand what your processor is really doing.

Installing and Preparing Perf in 2026

Most Linux distributions ship perf as a separate package. Here’s how to get it running on common distros.

Check if perf is already installed
Run perf --version. If you see a version number (e.g., perf version 6.8.x), you’re good. Otherwise, continue.
Install the linux‑tools package for your kernel
On Ubuntu:
    sudo apt update && sudo apt install linux-tools-common linux-tools-$(uname -r)
On Debian, where the package is named linux-perf:
    sudo apt update && sudo apt install linux-perf
On RHEL, CentOS, or Fedora:
    sudo dnf install perf
On openSUSE:
    sudo zypper install perf
Give yourself access to hardware counters
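Hardware events need one of two things: run perf as root (or with the CAP_PERFMON capability), or lower /proc/sys/kernel/perf_event_paranoid. A value of -1 removes nearly all restrictions. A value of 0 lets unprivileged users count and sample CPU events, including system‑wide, but blocks raw tracepoint access. A value of 1 limits them to their own processes, and 2 further limits them to user‑space measurements. Many distributions default to 2 or higher. For most monitoring, paranoid = 0 works well. To set it until the next reboot:
    sudo sysctl kernel.perf_event_paranoid=0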
Verify your CPU hardware events
List all available events with:
    perf list
You’ll see categories like hardware, cache, software, and tracepoint. Hardware events are the ones we care about.
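
Before measuring, a quick pre-flight check can confirm that perf is on the PATH and show what the current paranoid level permits. The following is a minimal sketch: describe_paranoid is a made-up helper name, its wording paraphrases the kernel's sysctl documentation, and levels above 2 exist only through distribution patches, so treat that branch as an assumption.

```bash
# Describe what a perf_event_paranoid level allows for unprivileged users.
# describe_paranoid is a hypothetical helper, not part of perf.
describe_paranoid() {
    local level=$1
    if   [ "$level" -le -1 ]; then echo "no restrictions"
    elif [ "$level" -eq 0 ];  then echo "CPU events allowed, raw tracepoints blocked"
    elif [ "$level" -eq 1 ];  then echo "own processes only, user and kernel space"
    elif [ "$level" -eq 2 ];  then echo "own processes only, user space only"
    else                           echo "distro-specific level, unprivileged use may be blocked"
    fi
}

# Read the current level if the sysctl file is present and readable.
paranoid_file=/proc/sys/kernel/perf_event_paranoid
if [ -r "$paranoid_file" ]; then
    level=$(cat "$paranoid_file")
    echo "perf_event_paranoid=$level: $(describe_paranoid "$level")"
fi

# Report whether the perf binary is available at all.
if command -v perf >/dev/null 2>&1; then
    perf --version || true
else
    echo "perf not found in PATH"
fi
```

If the reported level is stricter than you need, the sysctl command from the previous step changes it.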

The Essential Perf Commands for Hardware Monitoring

Perf includes several subcommands. For hardware monitoring, focus on these:

perf stat – Count events globally or per‑process, showing totals and averages. Great for quick benchmarks.
perf record – Sample events over time and save a data file (perf.data). Use it for deeper analysis.
perf report – Display the sampled data from perf record in an interactive browser.
perf top – Live real‑time view of the hottest functions, similar to top but using hardware events.

You’ll typically start with perf stat to get an immediate overview, then move to perf record for detailed sampling.

Decoding Hardware Events: A Practical Table

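When you run perf stat without any event options, it measures a default set that mixes software events (task clock, context switches, page faults) with hardware events (cycles, instructions, branches, branch misses). But you can specify exactly which events matter. Here’s a table of the most useful hardware events and what they tell you.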

Event Name	Hardware Counter	What It Measures	Common Use Case
`cycles`	CPU Cycle Counter	Core clock cycles while the program runs	CPU time consumed (comparable across runs only at a fixed frequency)
`instructions`	Instructions Retired	Number of instructions completed	Instructions per cycle (IPC): instructions / cycles
`cache-references`	Last-Level Cache References	Accesses that reach the last-level cache (exact mapping varies by CPU)	High values indicate heavy memory traffic
`cache-misses`	Last-Level Cache Misses	Last-level cache accesses that miss and go to memory	Cache miss ratio: cache-misses / cache-references
`branch-instructions`	Branch Instructions	Number of branch instructions retired	Denominator for the branch miss rate
`branch-misses`	Branch Mispredictions	Number of mispredicted branches	A miss rate above roughly 5% usually hurts performance
`stalled-cycles-frontend`	Frontend Stalls	Cycles spent waiting for instruction fetch or decode	Indicates frontend bottlenecks (not exposed on every CPU)
`stalled-cycles-backend`	Backend Stalls	Cycles spent waiting for data or execution units	Indicates memory or execution unit bottlenecks (not exposed on every CPU)

How to use one: perf stat -e cycles,instructions,cache-misses ./myapp.
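
The two ratios from the table, IPC and cache miss ratio, can be computed from perf stat's CSV mode (-x,), where the count is the first field and the event name is the third. Here is a rough sketch of that arithmetic. The helper names event_count and ratio are invented for this example, and the sample numbers are illustrative rather than a real measurement.

```bash
# Pull one event's count out of `perf stat -x,` CSV read from stdin.
# Also accepts modifier suffixes such as "cycles:u".
event_count() {
    awk -F, -v ev="$1" '$3 == ev || index($3, ev ":") == 1 { print $1; exit }'
}

# Divide two counts and print the result with two decimals.
ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { if (b > 0) printf "%.2f\n", a / b; else print "n/a" }'
}

# Illustrative numbers only, laid out like perf's CSV output.
sample_csv='2000000,,cycles,1000000,100.00,,
3000000,,instructions,1000000,100.00,1.50,insn per cycle
40000,,cache-references,1000000,100.00,,
10000,,cache-misses,1000000,100.00,25.00,of all cache refs'

cycles=$(printf '%s\n' "$sample_csv" | event_count cycles)
instructions=$(printf '%s\n' "$sample_csv" | event_count instructions)
refs=$(printf '%s\n' "$sample_csv" | event_count cache-references)
misses=$(printf '%s\n' "$sample_csv" | event_count cache-misses)

echo "IPC: $(ratio "$instructions" "$cycles")"            # IPC: 1.50
echo "cache miss ratio: $(ratio "$misses" "$refs")"       # cache miss ratio: 0.25
```

With a real workload, the CSV would come from something like perf stat -x, -o stats.csv -e cycles,instructions,cache-references,cache-misses ./myapp, and each helper would read stats.csv instead of the sample string.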

Measuring Real-World Performance: A Step-by-Step Example

Let’s say you have a data processing script that feels slow. You suspect CPU is busy, but you don’t know why. Here’s how to use perf to find out.

Identify your hardware event of interest
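Run perf list hardware to see the generic hardware event names. For a first look, use perf stat’s default set, and add -d for L1 and last‑level cache detail.
Run perf stat on your program
    perf stat -d ./process_data
Output will show cycles, instructions, and branch statistics, plus the cache loads and misses added by -d. Look at the instructions per cycle (IPC), printed as “insn per cycle” next to the instructions count. If IPC is below roughly 0.5 on a modern Intel/AMD CPU, the core is probably spending much of its time stalled, often on memory.
If IPC is low, sample with perf record to find where the cycles go
    perf record -e cycles -c 10000 ./process_data
The -c 10000 takes a sample every 10,000 cycles, which is a very high rate on a multi‑GHz core. Raise the period for longer runs, or use -F to request a sampling frequency instead. Then:
    perf report
The report shows which functions consume the most cycles. It does not show IPC per function, so to tell stalled code from code that is simply hot, record a second event such as cache-misses and look for functions that rank high in both profiles. That’s your bottleneck.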
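
Choosing a value for -c is simple arithmetic: the period is the clock rate divided by the number of samples you want per second. The sketch below works through it under assumed figures (a 3 GHz core and a target of about 1,000 samples per second); sample_period is a hypothetical helper name.

```bash
# Cycles between samples needed for roughly `rate` samples per second
# on a core running at `hz` cycles per second (integer division).
sample_period() {
    local hz=$1 rate=$2
    echo $(( hz / rate ))
}

# Assumed figures: 3 GHz clock, about 1000 samples per second.
period=$(sample_period 3000000000 1000)
echo "perf record -e cycles -c $period ./process_data"
# prints: perf record -e cycles -c 3000000 ./process_data
```

By the same arithmetic, -c 10000 on that 3 GHz core would produce about 300,000 samples per second, which is why short periods suit only short runs. Passing -F 1000 instead asks perf to adjust the period itself toward that rate.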

Expert tip: Don’t fixate on a single run. Repeat the measurement with perf stat -r 3, which runs the command three times and prints the mean and the run‑to‑run variation for each counter. Hardware counters are precise, but background noise can skew a single reading. Use -r 5 or more for production benchmarks.
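
When the runs were collected separately, for instance on different days or builds, the same sanity check can be done by hand. This is a small sketch that prints the mean and the relative standard deviation of a list of counts. run_spread is an invented name, the cycle counts are made up, and it uses the population standard deviation, so its figure will not necessarily match the one perf prints.

```bash
# Print the mean and relative standard deviation (percent) of the counts
# passed as arguments. Uses the population standard deviation.
run_spread() {
    printf '%s\n' "$@" | awk '
        { sum += $1; sumsq += $1 * $1; n++ }
        END {
            mean = sum / n
            var = sumsq / n - mean * mean
            if (var < 0) var = 0      # guard against floating-point rounding
            printf "mean=%.0f rsd=%.2f%%\n", mean, 100 * sqrt(var) / mean
        }'
}

# Made-up cycle counts from three hypothetical runs.
run_spread 1000000 1020000 980000     # mean=1000000 rsd=1.63%
```

A spread of a percent or two is usually fine for comparing builds; a much larger one suggests background load or frequency changes worth fixing before drawing conclusions.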

Common Mistakes and How to Avoid Them

Even experienced sysadmins trip over these pitfalls.

Running perf on a busy server without pinning – If your server runs dozens of services, system‑wide mode (perf stat -a) shows aggregate activity, not your target process. Attach to the process with perf stat -p <pid> or perf record -p <pid>. Pinning it to a core with taskset also cuts noise from migrations.
Forgetting to disable frequency scaling – CPU frequency scaling (governors like ondemand or powersave) changes the clock rate during a run, so the time per cycle varies and memory stalls cost a different number of cycles from one run to the next. Set the governor to performance:
    sudo cpupower frequency-set -g performance
Run this before any serious measurement.
Ignoring multiplexing – If you request more events than the CPU has hardware counters, perf time‑slices them and scales the results, which costs accuracy. perf stat flags this by printing a percentage below 100 in parentheses after the affected events. Limit your event set to 4–6 for reliable numbers.
Thinking IPC of 1.0 is good – What counts as a good IPC depends on the microarchitecture and the workload. An IPC around 1.0 can be reasonable for code with moderate memory access, while compute‑bound integer code on recent Intel and AMD cores often sustains 2–4. Compare against what your CPU achieves on similar code rather than against a fixed number.
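
Two of these pitfalls can be checked mechanically. The sketch below flags multiplexed events in perf stat's CSV output (-x,), where the fifth field is the percentage of time the counter was actually running, and then prints the current governor for cpu0 if the cpufreq sysfs file exists. multiplexed_events is a hypothetical helper, and the CSV lines are illustrative rather than measured.

```bash
# List events whose counter ran for less than 100% of the measurement,
# i.e. events perf had to multiplex. Reads `perf stat -x,` CSV on stdin:
# field 3 = event name, field 5 = percentage of time the counter ran.
multiplexed_events() {
    awk -F, '$1 !~ /^#/ && $5 != "" && $5 + 0 < 100 { print $3 " (" $5 "%)" }'
}

# Illustrative CSV, not a real measurement.
printf '%s\n' \
    '2000000,,cycles,1000000,100.00,,' \
    '3000000,,instructions,1000000,100.00,,' \
    '10000,,cache-misses,660000,66.00,,' | multiplexed_events
# prints: cache-misses (66.00%)

# Report the current governor on cpu0 if the cpufreq interface is present.
gov_file=/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
if [ -r "$gov_file" ]; then
    echo "cpu0 governor: $(cat "$gov_file")"
fi
```

If an event shows up in that list, drop events from the -e list until every counter reports 100%.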

Advanced Hardware Monitoring with Custom Events

Beyond the standard named events, you can access raw hardware events using hex codes. This is useful for model‑specific events that perf list doesn’t name on your system.

For example, on Intel platforms, to count L2 cache misses (rather than the generic cache-misses, which usually maps to last‑level cache misses), you would pass a raw code of this form:

perf stat -e rFF04 ./app

The code follows the x86 rUUEE layout: FF is the unit mask (umask=0xFF) and 04 is the event select (event=0x04). Event numbering is model‑specific, so treat this particular code as a placeholder for the format and look up the actual L2 miss encoding in your CPU’s Software Developer’s Manual (SDM) before trusting the numbers.
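
Assembling the raw code by hand is error‑prone because the unit mask comes first in the string even though manuals usually list the event select first. A minimal sketch of a helper that does the packing, assuming the x86 layout described above (raw_event is an invented name):

```bash
# Build perf's rUUEE raw-event string from an event select and a unit mask.
# x86 layout: unit mask in bits 8-15, event select in bits 0-7.
raw_event() {
    local event=$1 umask=$2
    printf 'r%02x%02x\n' "$umask" "$event"
}

raw_event 0x04 0xFF      # prints rff04, the code used above
raw_event 0x2E 0x41      # prints r412e
```

The second call uses event 0x2E with umask 0x41, which Intel documents as its architectural last‑level cache miss event. On x86, perf also accepts the more readable spelling cpu/event=0x2e,umask=0x41/, though the PMU name can differ on some systems, hybrid CPUs for example.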

Another advanced technique is event grouping. Wrap events in curly braces when they must be counted over exactly the same time window. perf schedules a group onto the hardware counters as a unit, so either all of its events are running or none are:

perf stat -e '{cycles,instructions},{cache-references,cache-misses}' ./app

Each pair is counted together, so the IPC and cache miss ratios stay consistent even if perf has to multiplex between the two groups.

If you’re working with specialized hardware like FireWire devices, understanding low‑level CPU interaction can help. For instance, optimizing Linux kernel modules for enhanced hardware compatibility often relies on identifying driver‑level cache misses or stalls that perf can reveal.

Tying Perf Into Your System Administration Workflow

Perf doesn’t have to be a one‑time diagnostic tool. You can integrate it into regular monitoring. For example:

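Run perf stat -a -- sleep 60 (-a is short for --all-cpus) from cron or a systemd timer to produce a system‑wide summary of each 60‑second window.
Pair perf with bpftrace or other eBPF‑based tools for deeper dynamic tracing.
For persistent hardware monitoring, have perf stat write machine‑readable counts with -x, (CSV) and -o <file>, then feed them to your existing metrics system (Prometheus, Grafana, etc.). perf script serves a different purpose: it dumps the samples recorded by perf record as text.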
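
One way to bridge perf stat’s CSV output and a Prometheus‑style metrics system is to rewrite each count as a line of text exposition format. The following is only a sketch of that idea: csv_to_prom and the metric name perf_event_count are invented for this example, the input lines are illustrative, and how the result reaches Prometheus (for instance through a file‑based collector) is left as an assumption about your setup.

```bash
# Convert `perf stat -x,` CSV on stdin to Prometheus text exposition format.
# Keeps integer counts only, so comment lines and "<not supported>" rows
# are skipped. The metric name is a hypothetical choice.
csv_to_prom() {
    awk -F, '
        $1 ~ /^[0-9]+$/ && $3 != "" {
            gsub(/"/, "", $3)
            printf "perf_event_count{event=\"%s\"} %s\n", $3, $1
        }'
}

# Illustrative input; in practice it could come from something like
#   perf stat -a -x, -o /tmp/perf.csv -e cycles,instructions -- sleep 60
printf '%s\n' \
    '# comment line' \
    '2000000,,cycles,1000000,100.00,,' \
    '<not supported>,,stalled-cycles-backend,0,100.00,,' \
    '3000000,,instructions,1000000,100.00,,' | csv_to_prom
# prints:
#   perf_event_count{event="cycles"} 2000000
#   perf_event_count{event="instructions"} 3000000
```

Because each perf stat run reports counts for its own window, these values behave like per‑interval gauges rather than ever‑increasing counters.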

When your server is connected to external hardware like FireWire audio interfaces or video capture cards, performance can drop due to contention on the PCI or PCIe bus. perf stat -e bus-cycles (if available on your CPU) counts bus clock cycles, which is only a coarse indicator. Where your CPU exposes uncore PMU events in perf list, those are the ones that measure I/O traffic more directly. For a practical example, see mastering firewire device management on linux systems.

Start Monitoring Like a Pro: Your First Hardware Check in 2026

You now have the fundamentals. Pick a program you suspect has a performance issue. Run perf stat on it. Look at the IPC and cache miss rate. If IPC is below 0.5, use perf record to find the culprit function. Keep an eye on event multiplexing and frequency scaling, and double‑check your numbers with multiple runs. Hardware monitoring isn’t just for kernel hackers anymore. It’s a daily tool for any sysadmin who wants their servers to run faster and cooler.

Fire up your terminal. Run perf list. Choose one event that matters today. You’ll be surprised what you find.

linux1394.org
