
Linux Resource and Capacity Management (Part 1)

Every time your computer becomes slow, I guess the first thing we do is blame the computer for not being POWERFUL enough. But in reality, maybe it is just us not managing its resources properly. Let's explore some meaningful ways to debug our computers before we decide to get a faster and stronger one.

The Big 4 Performance Metrics:

  • CPU
  • RAM
  • Disk IO
  • Network IO

CPU

Real-Time CPU Monitoring

For dynamic, real-time monitoring of CPU resources, use top or htop. htop is typically not installed by default and brings in a lot of dependencies, so I recommend just using top. If you like fancier applications, you may want to check out bashtop, btop, or bpytop: bashtop was the original project, btop is the C++ rewrite, and bpytop is the Python version. All of them are available on Ubuntu 22.04.

USAGE: (top)

Shortcut   Description
F          re-arrange the monitoring columns
M          sort processes by memory usage
m          toggle the memory visualization
z          toggle color mode
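
top can also run non-interactively, which is handy when you want to capture a snapshot from a script or a log. A minimal sketch using batch mode:

# -b runs in batch mode (no interactive screen), -n 1 runs a single iteration
top -b -n 1 | head -n 20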

Monitor one process, along with its threads

top -Hp $PROCESS_ID
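
If you only know the process name and not the PID, one way is to combine it with pgrep (rviz2 here is just an example name):

# pgrep -o picks the oldest matching process if several match
top -Hp $(pgrep -o rviz2)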

INSTALLING:

# BASH TOP
sudo apt install bashtop
# BTOP ++
sudo apt install btop
# BPYTOP
sudo apt install bpytop

Now, you might ask: what about a frozen process? Because a frozen process does not consume any CPU, top is not good at spotting it. We can use the ps command instead.

# To list all the processes that are run by the user chanjl
ps --user chanjl
# Another old-school command
ps aux | grep $PROCESS_NAME
# a for showing all, show me everything
# u for showing who the users are
# x for showing processes that do not have a tty attached,
#   for example system services, things without a terminal
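
The STAT column from ps is what actually tells you whether a process is stuck: D means uninterruptible sleep (usually waiting on IO), T means stopped, and Z means zombie. A small sketch that filters for those states:

# Show PID, state and command, keeping the header and any stuck-looking states
ps -eo pid,stat,cmd | awk 'NR == 1 || $2 ~ /^[DTZ]/'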

Alright, what about seeing how one process is connected to another? Of course, we can use pstree.

# Run it with a pid
pstree $PROCESS_ID
# Better if we just run the command raw
pstree
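
pstree also has flags to make the tree more informative, for example showing PIDs and user transitions:

# -p shows PIDs, -u shows the user whenever it differs from the parent
pstree -p -u $PROCESS_ID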

Well, is there anything else that a process might hook itself into? Of course; sometimes it might be a library that has locked up a thread. Or it could be generating a lot of disk IO because it constantly has to go to the disk to read these libraries, as they are not cached in RAM. One solution is to move the libraries to a drive with less contention for better access.

# List the libraries that a process is hooked onto
# pmap is a process map, typically you will need sudo to view
sudo pmap $PROCESS_ID

For example, here is the memory map of an RVIZ2 application.

sudo pmap 7616
7616:   rviz2
000000004037a000    700K rw---   [ anon ]
00000000411cf000      8K r-xs- .glXXXXXX (deleted)
000000004163a000    700K rw---   [ anon ]
0000000041d23000    700K rw---   [ anon ]
00005635523a4000      8K r---- rviz2
00005635523a6000      8K r-x-- rviz2
00007fc25b59e000   2048K ----- libnvidia-glcore.so.470.223.02
00007fc25b79e000   5488K r---- libnvidia-glcore.so.470.223.02
00007fc25bcfa000   1680K rw--- libnvidia-glcore.so.470.223.02
00007fc264000000    132K rw---   [ anon ]
00007fc264021000  65404K -----   [ anon ]
00007fc26803a000    692K r---- DejaVuSans-Bold.ttf
00007fc2680e7000      4K -----   [ anon ]
00007fc2680e8000   8192K rw---   [ anon ]
00007fc2688e8000     40K r---- libstd_msgs__rosidl_typesupport_introspection_cpp.so
00007fc2688f2000     16K r-x-- libstd_msgs__rosidl_typesupport_introspection_cpp.so
00007fc2688f6000      8K r---- libstd_msgs__rosidl_typesupport_introspection_cpp.so
00007fc2688f8000      4K ----- libstd_msgs__rosidl_typesupport_introspection_cpp.so
00007fc2688f9000      8K r---- libstd_msgs__rosidl_typesupport_introspection_cpp.so
00007fc2688fb000      8K rw--- libstd_msgs__rosidl_typesupport_introspection_cpp.so
00007fc27bf28000      8K r---- ld-linux-x86-64.so.2
...
00007fc27bf2a000      8K rw--- ld-linux-x86-64.so.2
00007fff944d1000    132K rw---   [ stack ]
00007fff945e2000     12K r----   [ anon ]
00007fff945e5000      8K r-x--   [ anon ]
ffffffffff600000      4K --x--   [ anon ]
 total           649804K
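
If you also want to know how much of each mapping is actually resident in RAM, pmap has an extended mode; here is a sketch that keeps only the last few lines, including the totals:

# -x adds RSS and Dirty columns for each mapping; the final line is the total
sudo pmap -x $PROCESS_ID | tail -n 5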

Well, pmap is good for libraries, but it is bad for config files. When we start up many processes, they usually read some config files. Some applications like MySQL might even hold a file open for reading and writing. These files are typically locked, and you are not allowed to edit them. How can we know which application is holding a given file?

# List the files a process is holding open
sudo lsof -p $PROCESS_ID

For example, here are the files an RVIZ2 application is holding open.

sudo lsof -p 9283
COMMAND  PID      USER   FD      TYPE             DEVICE SIZE/OFF     NODE NAME
rviz2   9283 developer  cwd       DIR               0,66     4096 28434892 /home/developer
rviz2   9283 developer  rtd       DIR               0,66     4096 28074818 /
rviz2   9283 developer  txt       REG               0,66    23048 31397568 /opt/ros/humble/bin/rviz2
rviz2   9283 developer  DEL       REG                0,1            170731 /memfd:/.glXXXXXX
rviz2   9283 developer  mem       REG               0,66    80264 29653946 /opt/ros/humble/lib/libstd_msgs__rosidl_typesupport_introspection_cpp.so
rviz2   9283 developer  mem       REG               0,66    14560 29649382 /opt/ros/humble/lib/libunique_identifier_msgs__rosidl_typesupport_introspection_cpp.so
rviz2   9283 developer  mem       REG               0,66    66936 29664822 /opt/ros/humble/lib/libgeometry_msgs__rosidl_typesupport_introspection_cpp.so
rviz2   9283 developer  mem       REG               0,66    96616 29653944 /opt/ros/humble/lib/libstd_msgs__rosidl_typesupport_fastrtps_cpp.so
rviz2   9283 developer  mem       REG               0,66    73264 31383140 /opt/ros/humble/lib/librcl_action.so
rviz2   9283 developer  mem       REG               0,66    77200 31380191 /var/cache/fontconfig/f682ffa3-8fac-4980-8c6d-49ccdffc3673-le64.cache-7
rviz2   9283 developer  mem       REG               0,66    43488 28645646 /usr/lib/x86_64-linux-gnu/libXcursor.so.1.0.2
rviz2   9283 developer  mem       REG               0,66   149760 27927898 /usr/lib/x86_64-linux-gnu/libgpg-error.so.0.32.1
rviz2   9283 developer  mem       REG               0,66  1296312 27927892 /usr/lib/x86_64-linux-gnu/libgcrypt.so.20.3.4
rviz2   9283 developer  mem       REG               0,66    39024 28583600 /usr/lib/x86_64-linux-gnu/libcap.so.2.44
rviz2   9283 developer  mem       REG               0,66   125152 27927914 /usr/lib/x86_64-linux-gnu/liblz4.so.1.9.3
rviz2   9283 developer  mem       REG               0,66   170456 27927916 /usr/lib/x86_64-linux-gnu/liblzma.so.5.2.5
rviz2   9283 developer  mem       REG               0,66   807936 27947104 /usr/lib/x86_64-linux-gnu/libsystemd.so.0.32.0
rviz2   9283 developer  mem       REG               0,66    31096 28589241 /usr/lib/x86_64-linux-gnu/libxcb-util.so.1.0.0
rviz2   9283 developer   29u      CHR              195,0      0t0      532 /dev/nvidia0
...
rviz2   9283 developer   30uw     REG               0,66      872 31468592 /home/developer/.cache/nvidia/GLCache/66dbeada1994f20a809e9e1189583fb5/83fad13e2a2fd784/2af83171a1bb0740.toc
rviz2   9283 developer   31uw     REG               0,66     5413 31468596 /home/developer/.cache/nvidia/GLCache/66dbeada1994f20a809e9e1189583fb5/83fad13e2a2fd784/2af83171a1bb0740.bin
rviz2   9283 developer   32u      CHR              195,0      0t0      532 /dev/nvidia0
rviz2   9283 developer   33u     unix 0xffff896e2e30f000      0t0   170732 type=STREAM
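
lsof also works in the other direction: given a file, it can tell you which processes currently have it open, which answers the locked-file question directly. The path below is only a placeholder:

# Which processes have this file open?
sudo lsof /path/to/locked/file
# fuser gives a more compact answer; -v also prints the user and command
sudo fuser -v /path/to/locked/file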

Hmm… are there any other tools we could use to monitor our CPU? Oh yes, there are! mpstat is very much like sar but much more real-time. It can also get pretty granular, meaning you can even tell it to monitor a specific CPU core.

# Monitor CPU every two seconds
mpstat 2
# And when you ^C it will give you an average
Linux 5.4.0-150-generic (chanjl)        12/25/2023      _x86_64_        (16 CPU)

12:29:04 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
12:29:06 PM  all    0.53    0.00    0.28    0.00    0.00    0.00    0.00    0.00    0.00   99.18
12:29:08 PM  all    0.47    0.00    0.32    0.00    0.00    0.06    0.00    0.00    0.00   99.15
12:29:10 PM  all    0.53    0.00    0.31    0.00    0.00    0.00    0.00    0.00    0.00   99.15
Average:     all    0.51    0.00    0.30    0.00    0.00    0.02    0.00    0.00    0.00   99.16
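
Since the whole point of mpstat is granularity, you can also break the numbers down per core, or watch a single core:

# -P ALL reports every core individually, refreshing every second
mpstat -P ALL 1
# Or watch just core 0
mpstat -P 0 1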

So for testing, you might want to run the critical application and then start mpstat at 1-second intervals.

Over Time (Historical) CPU Monitoring

For monitoring CPU resources over a period of time, we have a tool called sar, which stands for System Activity Report. This tool can monitor more than just CPU resources, as we will see soon!

USAGE: (sar)

Enable performance logging

By default, after installing sysstat, you will have the file below:

sudoedit /etc/default/sysstat
# Change false to true
# ENABLED="true"

Then edit the cron job

sudoedit /etc/cron.d/sysstat
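
On Debian and Ubuntu the packaged cron file calls a small wrapper around sadc; a rough sketch of what an entry collecting data every 10 minutes looks like (your default file may differ slightly):

# Sketch of a sysstat cron entry; check your distro's default before editing
5-55/10 * * * * root command -v debian-sa1 > /dev/null && debian-sa1 1 1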

Looking for log files

ls /var/log/sysstat/
# sa17 sa18 # if enabled

Reading log files

# URB(A)N, this is just a mnemonic
# U - CPU
sar -u
# R - RAM
sar -r
# B - BYTE (disk I/O and transfer rates)
sar -b
# N - NETWORK (needs a keyword such as DEV, and collection may need to be configured)
sar -n DEV
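
By default sar reads today's data. To look back at a previous day or narrow the report to a time window, point it at one of the sa files and bound the interval (the file name below is only an example):

# Read CPU usage from the saved file for the 17th of the month
sar -u -f /var/log/sysstat/sa17
# Only show samples between 09:00 and 12:00
sar -u -s 09:00:00 -e 12:00:00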

USAGE: (uptime)

To check how long your system has been alive

uptime
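
The three numbers at the end of the uptime output are load averages over the last 1, 5, and 15 minutes; comparing them against the number of cores gives a rough sense of CPU saturation:

# If the load averages stay above the core count, the CPU is oversubscribed
uptime
nproc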

USAGE: (w)

To check whether any other users are logged in

w

INSTALLING:

# System activity report
sudo apt install sysstat

That’s all for now, let’s continue exploring RAM in the next article!

This post is licensed under CC BY 4.0 by the author.
