First, we extend a kernel module that lets users control access to ARMv8 PMU counters from userspace.
It was initially created only to enable userspace access to the Performance Monitors Cycle Count Register (PMCCNTR_EL0) for use in dataplane software such as the DPDK framework. It was later extended to provide a general-purpose interface for managing ARMv8 PMU counters.
We further added support for enabling and disabling user-mode (EL0) access controlled by the Counter-timer Kernel Control register (CNTKCTL).
```sh
cd kernel_module
# If compiling natively on an ARMv8 host
make
# If cross-compiling, pass the arguments as make arguments, not as environment variables
make CROSS_COMPILE={cross_compiler_prefix} ARCH=arm64 KDIR=/path/to/kernel/sources
```

If the module was cross-compiled, copy it to the target board. Then load it:

```sh
sudo insmod pmu_el0_cycle_counter.ko
```

Loading the module enables userspace access to PMCCNTR; unloading it disables that access.
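Because unloading the module disables access again, a program may want to check at startup whether PMCCNTR_EL0 is readable before relying on it. The sketch below assumes that a disallowed EL0 read traps and is delivered to the process as `SIGILL`, which is how Linux typically reports such a trap; the helper name `pmccntr_el0_accessible()` is hypothetical. On non-AArch64 builds it simply returns `-1`, so the same file still compiles elsewhere.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <stdint.h>

#if defined(__aarch64__)
/* Jump target used to recover from a trapped MRS instruction. */
static sigjmp_buf probe_env;

static void on_sigill(int sig)
{
    (void)sig;
    siglongjmp(probe_env, 1);   /* abandon the faulting instruction */
}

/*
 * Hypothetical helper. Returns 1 if PMCCNTR_EL0 is readable at EL0,
 * 0 if the read trapped (SIGILL), or -1 if the handler could not be set.
 */
int pmccntr_el0_accessible(void)
{
    struct sigaction sa = {0}, old;
    volatile int ok = 0;        /* volatile: survives siglongjmp */

    sa.sa_handler = on_sigill;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGILL, &sa, &old) != 0)
        return -1;

    if (sigsetjmp(probe_env, 1) == 0) {
        uint64_t val;
        asm volatile("mrs %0, pmccntr_el0" : "=r"(val));
        (void)val;
        ok = 1;                 /* reached only if the read did not trap */
    }

    sigaction(SIGILL, &old, NULL);  /* restore the previous handler */
    return ok;
}
#else
/* Not an AArch64 build: PMCCNTR_EL0 does not exist here. */
int pmccntr_el0_accessible(void)
{
    return -1;
}
#endif
```

A benchmark could call `pmccntr_el0_accessible()` once at startup and, when it returns `0`, exit with a hint to load `pmu_el0_cycle_counter.ko`.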
The PMCCNTR can be read from an application with:

```c
#include <stdint.h>

static inline uint64_t
read_pmccntr(void)
{
    uint64_t val;
    asm volatile("mrs %0, pmccntr_el0" : "=r"(val));
    return val;
}
```

Additionally, the module creates a device (`/dev/pmuctl`) that can be used to enable or disable access to PMU counters (currently PMCCNTR and CNTKCTL are supported). This device supports the following interfaces:
- `read()` - Dump the current counter configuration. Read all data in a single call, because the data may change between `read` syscalls:

  ```sh
  $ cat /dev/pmuctl
  PMCCNTR=1
  ```
- `write()` - Modify the configuration of a particular counter. The write buffer should use the `name=value` format. Both `name` and `value` are counter-specific and are described below. For example:

  ```sh
  echo "PMCCNTR=1" > /dev/pmuctl
  ```
- `ioctl()` - Similar to `write()`, but intended for use in applications rather than scripts. The supported ioctls are listed in the `pmuctl.h` header. For example:

  ```c
  #include <fcntl.h>
  #include <sys/ioctl.h>
  #include "pmuctl.h" /* struct pmuctl_pmccntr_data, PMU_IOC_PMCCNTR */

  struct pmuctl_pmccntr_data arg = { .enable = 0 };
  int fd = open("/dev/pmuctl", O_RDONLY);
  if (fd < 0 || ioctl(fd, PMU_IOC_PMCCNTR, &arg)) {
      /* error handling */
  }
  ```
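Applications that prefer the `read()`/`write()` interface over `ioctl()` could wrap it as follows. This is a minimal sketch: `pmuctl_set()`, `pmuctl_dump()` and `pmuctl_parse()` are hypothetical helpers, and the parser assumes the dump contains one `name=value` pair per line (the README only shows the single-counter case). `pmuctl_set()` writes the same bytes as the `echo` example, including the trailing newline.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Looks up `name` in a /dev/pmuctl dump (assumed: one "NAME=value" pair
 * per line) and stores its value in *value.
 * Returns 0 on success, -1 if the name is not present.
 */
int pmuctl_parse(const char *dump, const char *name, long *value)
{
    size_t len = strlen(name);
    const char *line = dump;

    while (line && *line) {
        /* Match "NAME=" at the start of the current line only. */
        if (strncmp(line, name, len) == 0 && line[len] == '=') {
            *value = strtol(line + len + 1, NULL, 10);
            return 0;
        }
        line = strchr(line, '\n');
        if (line)
            line++;             /* advance to the next line */
    }
    return -1;
}

/* Writes "NAME=value\n" to /dev/pmuctl; returns 0 on success, -1 on error. */
int pmuctl_set(const char *name, int value)
{
    char buf[64];
    int n = snprintf(buf, sizeof(buf), "%s=%d\n", name, value);
    if (n < 0 || (size_t)n >= sizeof(buf))
        return -1;              /* name too long for the buffer */

    int fd = open("/dev/pmuctl", O_WRONLY);
    if (fd < 0)
        return -1;
    ssize_t written = write(fd, buf, (size_t)n);
    close(fd);
    return written == n ? 0 : -1;
}

/* Reads the whole configuration with a single read() call. */
ssize_t pmuctl_dump(char *buf, size_t size)
{
    if (size == 0)
        return -1;
    int fd = open("/dev/pmuctl", O_RDONLY);
    if (fd < 0)
        return -1;
    ssize_t n = read(fd, buf, size - 1);
    close(fd);
    if (n >= 0)
        buf[n] = '\0';          /* NUL-terminate for pmuctl_parse() */
    return n;
}
```

With the module loaded, calling `pmuctl_set("PMCCNTR", 1)`, then `pmuctl_dump()` and `pmuctl_parse(buf, "PMCCNTR", &v)` would be expected to yield `v == 1`. Opening `/dev/pmuctl` may require root privileges.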
The following counters are supported:

- Performance Monitors Cycle Count Register: `name` is `PMCCNTR`; `value` is `0` to disable EL0 access or `1` to enable EL0 access.
- Counter-timer Kernel Control Register: `name` is `CNTKCTL`; `value` is `0` to disable EL0 access or `1` to enable EL0 access.
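Once EL0 access through CNTKCTL is enabled, the generic timer can also be used for timing from userspace. The README does not say which CNTKCTL_EL1 bits the module sets, so this sketch assumes the virtual count `CNTVCT_EL0` and the frequency register `CNTFRQ_EL0` are readable (replace `cntvct_el0` with `cntpct_el0` if the physical count is the one being exposed). `cnt_ticks_to_ns()`, `read_cntvct()` and `read_cntfrq()` are hypothetical names. Unlike PMCCNTR, which counts CPU cycles, the generic timer ticks at a fixed frequency, so tick deltas convert directly to time.

```c
#include <stdint.h>

/*
 * Converts generic-timer ticks to nanoseconds. Splitting into whole
 * seconds plus a remainder avoids overflowing ticks * 1e9.
 */
uint64_t cnt_ticks_to_ns(uint64_t ticks, uint64_t freq_hz)
{
    return (ticks / freq_hz) * 1000000000ull +
           (ticks % freq_hz) * 1000000000ull / freq_hz;
}

#if defined(__aarch64__)
/* Virtual count (assumed readable at EL0); ISB stops an early read. */
static inline uint64_t read_cntvct(void)
{
    uint64_t val;
    asm volatile("isb\n\tmrs %0, cntvct_el0" : "=r"(val) :: "memory");
    return val;
}

/* Counter frequency in Hz, as programmed by firmware. */
static inline uint64_t read_cntfrq(void)
{
    uint64_t val;
    asm volatile("mrs %0, cntfrq_el0" : "=r"(val));
    return val;
}
#endif
```

For example, with `t0 = read_cntvct()` and `t1 = read_cntvct()` around a code region, `cnt_ticks_to_ns(t1 - t0, read_cntfrq())` gives the elapsed time in nanoseconds.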
To add support for managing a new counter, a developer should do the following:
- Add a new value at the end of `enum pmu_ctls`, just before `PM_CTL_CNT`. This value identifies the new counter in read and write operations. For example:

  ```c
  enum pmu_ctls {
      PM_CTL_PMCCNTR,
      PM_CTL_NEW_CNTR, /* Short description */ // <- new entry
      PM_CTL_CNT,
  };
  ```
- Write `show` and `modify` handlers (used by `read()` and `write()`) for the new counter and add a descriptor to the `struct pmu_ctl_cfg pmu_ctls` array in the `pmu_el0_cycle_counter.c` file. The new entry must be placed at the index matching the new entry in `enum pmu_ctls`. For example:

  ```c
  static ssize_t new_cntr_show(char *arg, size_t size)
  {
      /* Dump configuration to arg buffer, up to size characters.
       * Return number of written characters or negative error code. */
  }

  static int new_cntr_modify(const char *arg, size_t size)
  {
      /* Modify config according to value parsed from arg buffer
       * of size length.
       * Return 0 on success or a negative error code. */
  }

  /* ... */

  static struct pmu_ctl_cfg pmu_ctls[PM_CTL_CNT] = {
      /* ... */
      [PM_CTL_NEW_CNTR] = {
          .name = "NEW_CNTR",
          .show = new_cntr_show,
          .modify = new_cntr_modify
      }
  };
  ```
- For `ioctl()` support, add the definition of the new ioctl and its argument structure to the `pmuctl.h` file, using `PMUCTL_IOC_MAGIC` as the ioctl magic number, the new `enum pmu_ctls` value as the sequence number, and the macros from `<linux/ioctl.h>`. For example:

  ```c
  struct pmuctl_new_cntr_arg {
      int some_argument;
  };

  /* ... */

  #define PMU_IOC_NEW_CNTR \
      _IOW(PMUCTL_IOC_MAGIC, PM_CTL_NEW_CNTR, struct pmuctl_new_cntr_arg)
  ```
- Next, add a `case` statement to the `pmuctl_ioctl()` function in the `pmu_el0_cycle_counter.c` file to handle the new ioctl. For example:

  ```c
  static long pmuctl_ioctl(struct file *f, unsigned int cmd, unsigned long arg)
  {
      /* ... */
      mutex_lock(&pmuctl_lock);
      switch (cmd) {
      /* ... */
      case PMU_IOC_NEW_CNTR:
          /* handle the ioctl() */
          break;
      /* ... */
      }
      mutex_unlock(&pmuctl_lock);
      return ret;
  }
  ```
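To see how the steps fit together without building the kernel module, the following userspace model mimics the descriptor table. It is not code from `pmu_el0_cycle_counter.c`: `pmuctl_dispatch_write()` and the handler bodies are hypothetical, and the real module's parsing may differ. It illustrates why the enum entry goes just before `PM_CTL_CNT` (which then sizes the array) and how a `name=value` write such as `echo "NEW_CNTR=1"` could reach the matching `modify` handler. The `PMCCNTR` descriptor is left out to keep the model short.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Same layout as the enum above: PM_CTL_CNT stays last, so it always
 * equals the number of counters. */
enum pmu_ctls {
    PM_CTL_PMCCNTR,
    PM_CTL_NEW_CNTR,   /* hypothetical counter from the example */
    PM_CTL_CNT,
};

/* Hypothetical userspace stand-in for the module's descriptor type. */
struct pmu_ctl_cfg {
    const char *name;
    ssize_t (*show)(char *arg, size_t size);
    int (*modify)(const char *arg, size_t size);
};

static int new_cntr_enabled;   /* stand-in for real hardware state */

static ssize_t new_cntr_show(char *arg, size_t size)
{
    int n = snprintf(arg, size, "NEW_CNTR=%d\n", new_cntr_enabled);
    return (n < 0 || (size_t)n >= size) ? -ENOSPC : n;
}

static int new_cntr_modify(const char *arg, size_t size)
{
    /* Accept "0" or "1", optionally followed by the newline echo adds. */
    if (size >= 1 && (arg[0] == '0' || arg[0] == '1') &&
        (size == 1 || arg[1] == '\n')) {
        new_cntr_enabled = arg[0] - '0';
        return 0;
    }
    return -EINVAL;
}

/* Designated initializers keep each descriptor at its enum index;
 * the PMCCNTR slot is left zeroed (name == NULL) in this model. */
static struct pmu_ctl_cfg pmu_ctls[PM_CTL_CNT] = {
    [PM_CTL_NEW_CNTR] = { .name = "NEW_CNTR",
                          .show = new_cntr_show,
                          .modify = new_cntr_modify },
};

/* Splits a "name=value" buffer and forwards the value to the matching
 * descriptor's modify() handler. */
int pmuctl_dispatch_write(const char *buf, size_t len)
{
    const char *eq = memchr(buf, '=', len);
    if (!eq)
        return -EINVAL;
    size_t name_len = (size_t)(eq - buf);

    for (int i = 0; i < PM_CTL_CNT; i++) {
        const struct pmu_ctl_cfg *c = &pmu_ctls[i];
        if (c->name && c->modify && strlen(c->name) == name_len &&
            memcmp(c->name, buf, name_len) == 0)
            return c->modify(eq + 1, len - name_len - 1);
    }
    return -EINVAL;
}
```

The handler tolerates a trailing newline because `echo` appends one; in the kernel, helpers such as `kstrtobool()` already accept it.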
- Compile the reverse-engineering benchmarks (replace `<TARGET_FEATURE>` with the benchmark's file name):

  ```sh
  gcc -O0 <TARGET_FEATURE>.c
  ```

- Test Prime+Reset:
  - `make NO_REST`: shows the memory latency when the prefetcher state is not reset.
  - `make RESET`: shows the memory latency when the prefetcher state is reset.
  - `make CACHE`: shows the cached-data latency when the prefetcher state is reset.
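The benchmark sources are not shown in this README, so the following is not the repository's Prime+Reset code. It is a generic sketch of timing a single load with PMCCNTR_EL0, assuming the kernel module is loaded; `time_load()` is a hypothetical name. The `isb` before the first sample and `dsb ish; isb` before the second are meant to keep the load inside the measured window. On non-AArch64 builds the function falls back to `CLOCK_MONOTONIC` nanoseconds so it still compiles.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/*
 * Hypothetical helper: times one load from *p. On AArch64 the result is
 * in PMCCNTR_EL0 cycles (module must be loaded); elsewhere it is in ns.
 */
uint64_t time_load(const volatile uint8_t *p)
{
#if defined(__aarch64__)
    uint64_t t0, t1;
    /* ISB: let earlier instructions finish before sampling. */
    asm volatile("isb\n\tmrs %0, pmccntr_el0" : "=r"(t0) :: "memory");
    (void)*p;                                   /* the measured load */
    /* DSB + ISB: the load completes before the second sample. */
    asm volatile("dsb ish\n\tisb\n\tmrs %0, pmccntr_el0"
                 : "=r"(t1) :: "memory");
    return t1 - t0;
#else
    struct timespec a, b;
    clock_gettime(CLOCK_MONOTONIC, &a);
    (void)*p;
    clock_gettime(CLOCK_MONOTONIC, &b);
    /* Unsigned wraparound makes a negative tv_nsec delta come out right. */
    return (uint64_t)(b.tv_sec - a.tv_sec) * 1000000000ull +
           (uint64_t)(b.tv_nsec - a.tv_nsec);
#endif
}
```

A single reading is noisy; repeating the measurement many times and taking the median or minimum gives a more stable latency estimate.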