Memory observability tools

Tool         Description
vmstat       Virtual and physical memory statistics
PSI          Memory pressure stall information
swapon       Swap device usage
sar          Historical statistics
slabtop      Kernel slab allocator statistics
numastat     NUMA statistics
ps           Process status
top          Monitor per-process memory usage
pmap         Process address space statistics
perf         Memory PMC and tracepoint analysis
drsnoop      Direct reclaim tracing
wss          Working set size estimation
bpftrace     Tracing programs for memory analysis
pmcarch      CPU cycle usage including LLC misses
tlbstat      Summarizes TLB cycles
free         Cache capacity statistics
cachestat    Page cache statistics
oomkill      Shows extra info on OOM kill events
memleak      Shows possible memory leak code paths
mmapsnoop    Traces mmap(2) calls system-wide
brkstack     Shows brk() calls with user stack traces
shmsnoop     Traces shared memory calls with details
faults       Shows page faults, by user stack trace
ffaults      Shows page faults, by filename
vmscan       Measures VM scanner shrink and reclaim times
swapin       Shows swap-ins by process
hfaults      Shows huge page faults, by process

Observability tools

Noting down some helpful sources 🙂

Package                                      Provides
procps                                       ps(1), vmstat(8), uptime(1), top(1)
util-linux                                   dmesg(1), lsblk(1), lscpu(1)
sysstat                                      iostat(1), mpstat(1), pidstat(1), sar(1)
iproute2                                     ip(8), ss(8), nstat(8), tc(8)
numactl                                      numastat(8)
linux-tools-common linux-tools-$(uname -r)   perf(1), turbostat(8)
bcc-tools (aka bpfcc-tools)                  opensnoop(8), execsnoop(8), runqlat(8), runqlen(8), softirqs(8), hardirqs(8), ext4slower(8), ext4dist(8), biotop(8), biosnoop(8), biolatency(8), tcptop(8), tcplife(8), trace(8), argdist(8), funccount(8), stackcount(8), profile(8), and many more
bpftrace                                     bpftrace, basic versions of opensnoop(8), execsnoop(8), runqlat(8), runqlen(8), biosnoop(8), biolatency(8), and more
perf-tools-unstable                          Ftrace versions of opensnoop(8), execsnoop(8), iolatency(8), iosnoop(8), bitesize(8), funccount(8), kprobe(8)
trace-cmd                                    trace-cmd(1)
nicstat                                      nicstat(1)
ethtool                                      ethtool(8)
tiptop                                       tiptop(1)
msr-tools                                    rdmsr(8), wrmsr(8)
github.com/brendangregg/msr-cloud-tools      showboost(8), cpuhot(8), cputemp(8)
github.com/brendangregg/pmc-cloud-tools      pmcarch(8), cpucache(8), icache(8), tlbstat(8), resstalls(8)

  • vmstat(8): Virtual and physical memory statistics, system-wide
  • mpstat(1): Per-CPU usage
  • iostat(1): Per-disk I/O usage, reported from the block device interface
  • nstat(8): TCP/IP stack statistics
  • sar(1): Various statistics; can also archive them for historical reporting
  • ps(1): Shows process status and various process statistics, including memory and CPU usage.
  • top(1): Shows top processes, sorted by CPU usage or another statistic.
  • pmap(1): Lists process memory segments with usage statistics.
  • perf(1): The standard Linux profiler, which includes profiling subcommands.
  • profile(8): A BPF-based CPU profiler from the BCC repository (covered in Chapter 15, BPF) that frequency counts stack traces in kernel context.
  • Intel VTune Amplifier XE: Linux and Windows profiling, with a graphical interface including source browsing.
  • gprof(1): The GNU profiling tool, which analyzes profiling information added by compilers (e.g., gcc -pg).
  • cachegrind: A tool from the valgrind toolkit, can profile hardware cache usage (and more) and visualize profiles using kcachegrind.
  • Java Flight Recorder (JFR): Programming languages often have their own special-purpose profilers that can inspect language context. For example, JFR for Java.
  • tcpdump(8): Network packet tracing (uses libpcap)
  • biosnoop(8): Block I/O tracing (uses BCC or bpftrace)
  • execsnoop(8): New processes tracing (uses BCC or bpftrace)
  • perf(1): The standard Linux profiler, can also trace events
  • perf trace: A special perf subcommand that traces system calls system-wide
  • Ftrace: The Linux built-in tracer
  • BCC: A BPF-based tracing library and toolkit
  • bpftrace: A BPF-based tracer (bpftrace(8)) and toolkit
  • strace(1): System call tracing
  • gdb(1): A source-level debugger
  • perf stat: Performance counter statistics
Ansible: Dynamic fact

The following shows how to set a dynamic fact, that is, a fact whose name is itself a variable. Both the key and the value are variables.

- set_fact:
   {"{{ groups['nginx'][groups['nodejs'].index(inventory_hostname)] }}":"{{ hostvars[inventory_hostname]['ansible_eth0']['ipv4']['address'] }}"}

Here the fact's key is the host in the nginx group that has the same index as the current host in the nodejs group, and its value is the IP address of the current host.

You can print it as follows:

- name: print
  debug:
    msg: " {{ hostvars[groups['nodejs'][groups['nginx'].index(inventory_hostname)]][groups['nginx'][groups['nodejs'].index(inventory_hostname)]] }} "

Ansible: access the index id of the host in group

The index can be accessed as:

- name: Print index
  debug:
    msg: "Index is {{ groups['nginx'].index(inventory_hostname) }}"

This will print the index id of the current host in the group “nginx”. It starts from 0.

How to Trace Execution of Commands in Shell Script with Shell Tracing

In this article of the shell script debugging series, we explain the third shell script debugging mode, shell tracing, and look at some examples that demonstrate how it works and how it can be used.

The previous part of this series covered the two other shell script debugging modes, verbose mode and syntax checking mode, with easy-to-understand examples of how to enable each one.

Shell tracing simply means tracing the execution of the commands in a shell script. To switch on shell tracing, use the -x debugging option.

This directs the shell to display all commands and their arguments on the terminal as they are executed.
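As a minimal sketch of what -x produces, the following writes a two-line throwaway script to a temporary file and runs it under bash -x. The temporary file and the greeting variable are hypothetical, chosen only for illustration. Trace lines go to standard error, so they are merged into standard output here in order to capture them.

```shell
#!/bin/bash
# Hypothetical two-line script, written to a temporary file for the demo.
demo_script=$(mktemp)
cat > "$demo_script" <<'EOF'
greeting="hello"
echo "$greeting world"
EOF

# bash -x prints each command to stderr before running it;
# 2>&1 merges that trace with the script's normal output.
trace_output=$(bash -x "$demo_script" 2>&1)
printf '%s\n' "$trace_output"

rm -f "$demo_script"
```

With the default PS4 prompt, each traced command should appear prefixed with "+ ", followed by the script's own output (hello world).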

We will use the sys_info.sh shell script below, which briefly prints your system date and time, number of users logged in and the system uptime. However, it contains syntax errors that we need to find and correct.

#!/bin/bash
#script to print brief system info

ROOT_ID="0"

DATE=`date`
NO_USERS=`who | wc -l`
UPTIME=`uptime`

check_root(){
    if [ "$UID" -ne "$ROOT_ID" ]; then
        echo "You are not allowed to execute this program!"
        exit 1;
}

print_sys_info(){
    echo "System Time : $DATE"
    echo "Number of users: $NO_USERS"
    echo "System Uptime : $UPTIME
}

check_root
print_sys_info

exit 0

Save the file and make the script executable. The script can only be run by root, so use sudo to run it as shown below:

$ chmod +x sys_info.sh
$ sudo bash -x sys_info.sh

Shell Tracing - Show Error in Script

From the output above, we can observe that a command is executed first, and only then is its output substituted as the value of a variable.

For example, date was executed first and its output was then substituted as the value of the variable DATE.
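This ordering can be reproduced in isolation. The sketch below uses a hypothetical TODAY variable and a fixed echo in place of date so that the trace is the same on every run; it is an illustration, not part of sys_info.sh.

```shell
#!/bin/bash
# Hypothetical script: a backtick command substitution feeding a variable.
subst_script=$(mktemp)
cat > "$subst_script" <<'EOF'
TODAY=`echo monday`
echo "Day: $TODAY"
EOF

# Capture the trace (stderr) together with the normal output.
subst_trace=$(bash -x "$subst_script" 2>&1)
printf '%s\n' "$subst_trace"

rm -f "$subst_script"
```

In the trace, the substituted command (echo monday) is listed before the assignment TODAY=monday, and bash marks it with an extra "+" to show that it ran one level deeper.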

We can perform syntax checking to only display the syntax errors as follows:

$ sudo bash -n sys_info.sh

Syntax Checking in Script
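To see how -n behaves without touching sys_info.sh, here is a small sketch that checks two hypothetical snippets, one missing its closing fi and one complete. The file names and the syntax_check helper are assumptions made for this illustration.

```shell
#!/bin/bash
# Hypothetical snippets: one with a missing "fi", one complete.
broken_script=$(mktemp)
fixed_script=$(mktemp)

cat > "$broken_script" <<'EOF'
if [ "$1" = "x" ]; then
    echo "this line must never run"
EOF

cat > "$fixed_script" <<'EOF'
if [ "$1" = "x" ]; then
    echo "this line must never run"
fi
EOF

# Report whether bash -n accepts a file; -n parses without executing.
syntax_check() {
    if bash -n "$1" 2>/dev/null; then
        echo "ok"
    else
        echo "syntax error"
    fi
}

broken_result=$(syntax_check "$broken_script")
fixed_result=$(syntax_check "$fixed_script")
echo "broken: $broken_result"
echo "fixed:  $fixed_result"

rm -f "$broken_script" "$fixed_script"
```

Because -n only parses, neither snippet's echo runs; the exit status alone tells the two cases apart, which makes the check easy to use in other scripts.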

If we look at the shell script critically, we will realize that the if statement is missing its closing fi keyword. Let us add it; the new script should now look like this:

#!/bin/bash
#script to print brief system info

ROOT_ID="0"

DATE=`date`
NO_USERS=`who | wc -l`
UPTIME=`uptime`

check_root(){
    if [ "$UID" -ne "$ROOT_ID" ]; then
        echo "You are not allowed to execute this program!"
        exit 1;
    fi
}

print_sys_info(){
    echo "System Time : $DATE"
    echo "Number of users: $NO_USERS"
    echo "System Uptime : $UPTIME
}

check_root
print_sys_info

exit 0

Save the file again and run a syntax check on it as root:

$ sudo bash -n sys_info.sh

Perform Syntax Check in Shell Scripts

The result of our syntax checking operation above still shows that there is one more bug in our script on line 21. So, we still have some syntax correction to do.

If we look through the script carefully one more time, the error on line 21 is due to a missing closing double quote (") in the last echo command inside the print_sys_info function.

We will add the closing double quote in the echo command and save the file. The changed script is below:

#!/bin/bash
#script to print brief system info

ROOT_ID="0"

DATE=`date`
NO_USERS=`who | wc -l`
UPTIME=`uptime`

check_root(){
    if [ "$UID" -ne "$ROOT_ID" ]; then
        echo "You are not allowed to execute this program!"
        exit 1;
    fi
}

print_sys_info(){
    echo "System Time : $DATE"
    echo "Number of users: $NO_USERS"
    echo "System Uptime : $UPTIME"
}

check_root
print_sys_info

exit 0

Now syntactically check the script one more time.

$ sudo bash -n sys_info.sh

The command above will not produce any output because our script is now syntactically correct. We can also trace the execution of the script a second time, and it should now work well:

$ sudo bash -x sys_info.sh

Trace Shell Script Execution

Now run the script.

$ sudo ./sys_info.sh

Shell Script to Show Date, Time and Uptime

Importance of Shell Script Execution Tracing

Shell script tracing helps us identify syntax errors and more importantly, logical errors. Take for instance the check_root function in the sys_info.sh shell script, which is intended to determine if a user is root or not, since the script is only allowed to be executed by the superuser.

check_root(){
    if [ "$UID" -ne "$ROOT_ID" ]; then
        echo "You are not allowed to execute this program!"
        exit 1;
    fi
}

The check hinges on the if statement expression [ "$UID" -ne "$ROOT_ID" ]. If we do not use the suitable numerical operator (-ne in this case, which means not equal), we end up with a logical error.

Suppose we used -eq (which means equal to) instead. The script would then refuse root and allow every other system user to run it, which is a logical error.

check_root(){
    if [ "$UID" -eq "$ROOT_ID" ]; then
        echo "You are not allowed to execute this program!"
        exit 1;
    fi
}
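The difference between the two operators can be checked side by side. The sketch below is an illustration under stated assumptions: the id is passed as an argument instead of being read from $UID, so both the root and non-root cases can be exercised without sudo, and the helper names check_with_ne and check_with_eq are hypothetical.

```shell
#!/bin/bash
ROOT_ID=0

# Intended check: refuse every caller whose id differs from ROOT_ID.
check_with_ne() {
    if [ "$1" -ne "$ROOT_ID" ]; then echo "denied"; else echo "allowed"; fi
}

# Buggy check: -eq inverts the test, so only root is refused.
check_with_eq() {
    if [ "$1" -eq "$ROOT_ID" ]; then echo "denied"; else echo "allowed"; fi
}

# 0 stands for root; 1000 is an arbitrary non-root id.
for id in 0 1000; do
    echo "id=$id  -ne: $(check_with_ne "$id")  -eq: $(check_with_eq "$id")"
done
```

With -ne, id 0 is allowed and id 1000 is denied; with -eq the results are reversed, which is exactly the behavior the logical error introduces.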

Note: As we saw at the start of this series, the set shell built-in command can activate debugging in a particular section of a shell script.

Therefore, wrapping the check_root call in set -x and set +x, as in the script below, will help us find this logical error by tracing only that function's execution:

The script with a logical error:

#!/bin/bash
#script to print brief system info

ROOT_ID="0"

DATE=`date`
NO_USERS=`who | wc -l`
UPTIME=`uptime`

check_root(){
    if [ "$UID" -eq "$ROOT_ID" ]; then
        echo "You are not allowed to execute this program!"
        exit 1;
    fi
}

print_sys_info(){
    echo "System Time : $DATE"
    echo "Number of users: $NO_USERS"
    echo "System Uptime : $UPTIME"
}

#turning on and off debugging of check_root function
set -x ; check_root; set +x ;
print_sys_info

exit 0

Save the file and invoke the script. As the output below shows, a regular system user can now run the script without sudo. This is because the user's UID, 100 in this run, is not equal to ROOT_ID, which is 0.

$ ./sys_info.sh

Run Shell Script Without Sudo
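The effect of wrapping a single call in set -x and set +x can also be seen in a reduced form. This sketch makes several assumptions for the sake of a repeatable demo: check_root takes the id as an argument (1000 is an arbitrary non-root id), it uses return instead of exit, and the trace is captured through a command substitution rather than shown on the terminal.

```shell
#!/bin/bash
ROOT_ID=0

check_root() {
    # Deliberately uses -eq, reproducing the logical error discussed above.
    if [ "$1" -eq "$ROOT_ID" ]; then
        echo "You are not allowed to execute this program!"
        return 1
    fi
}

# Trace only the check_root call. The trace goes to stderr, so merge it
# into stdout to capture it.
section_trace=$( { set -x; check_root 1000; set +x; } 2>&1 )
printf '%s\n' "$section_trace"

# Tracing was enabled only inside the command substitution,
# so this command is not echoed.
echo "done"
```

The captured trace includes the evaluated test with the values filled in (1000 -eq 0), which makes it clear why a non-root id slips past the check.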