FreeBSD gets unresponsive on commands touching disk

in FreeBSD


Imagine your server hangs even on simple, seemingly unrelated commands such as cat file, ps aux or top. This is the issue I've been dealing with today. Short version: it was MySQL/MariaDB server clogging the VPS I/O.

Important: even though a terminal window with an established SSH connection would freeze, I could always open a new Terminal window with a new SSH connection to try a different command, or to debug the one stalled in the first windown.

Below are steps to debug.

Confirm this actually is I/O – iostat

Run iostat -x 1 and check %b.

root@:~ # iostat -x 1
                        extended device statistics  
device       r/s     w/s     kr/s     kw/s  ms/r  ms/w  ms/o  ms/t qlen  %b  
da0          201      98   7201.8   5448.0     5    30     0    13    1  98 
cd0            0       0      0.0      0.0     0     0     0     0    0   0 
pass0          0       0      0.0      0.0     0     0     0     0    0   0 
pass1          0       0      0.0      0.0     0     0     0     0    0   0 
                        extended device statistics  
device       r/s     w/s     kr/s     kw/s  ms/r  ms/w  ms/o  ms/t qlen  %b  
da0          199     101   7276.7   6159.6     4    11     0     6    0  97 
cd0            0       0      0.0      0.0     0     0     0     0    0   0 
pass0          0       0      0.0      0.0     0     0     0     0    0   0 
pass1          0       0      0.0      0.0     0     0     0     0    0   0 
                        extended device statistics  
device       r/s     w/s     kr/s     kw/s  ms/r  ms/w  ms/o  ms/t qlen  %b  
da0          195     105   7839.5   6175.6     4    22     0    10    0  97 
cd0            0       0      0.0      0.0     0     0     0     0    0   0 
pass0          0       0      0.0      0.0     0     0     0     0    0   0 
pass1          0       0      0.0      0.0     0     0     0     0    0   0 
                        extended device statistics  
device       r/s     w/s     kr/s     kw/s  ms/r  ms/w  ms/o  ms/t qlen  %b  
da0          166     125   6351.7   6129.4     5    53     0    25    0  99 
cd0            0       0      0.0      0.0     0     0     0     0    0   0 
pass0          0       0      0.0      0.0     0     0     0     0    0   0 
pass1          0       0      0.0      0.0     0     0     0     0    0   0 
^C

(Output truncated, but you get the idea.)

The above does show very heavy disk utilization, and write latency spikes:

  • %b pushing 100% almost constantly confirms the disk is busy almost all the time
  • ms/w jumping to 53 ms shows the writes are slow
  • if qlen goes up occasionally, that means queued I/O bursts (backpressure)

Confirm this actually is I/O – vmstat

Run vmstat 1 and check column for da0 from the previous test.

root@:~ # vmstat 1
 procs    memory    page                      disks       faults       cpu
 r  b  w  avm  fre  flt  re  pi  po   fr   sr  da0  cd0   in   sy   cs us sy id
 0  0  0 1.1G 455M    0   0   0   0 1.9k   60  300    0  274  654 2.1k  1  1 96
 0  0  0 1.1G 457M    0   0   0   0 2.6k   60  299    0  288  537 1.8k  0  5 93
 1  0  0 1.1G 457M    0   0   0   0 2.9k   60  301    0  298  467 1.9k  0  0 99
 0  0  0 1.1G 456M    0   0   0   0 2.0k   60  298    0  294  413 1.6k  0  5 94
^C

(Output truncated, but you get the idea.)

As you can see here, the same da0 is busy at around ~300.

Do you run a database on this server?

On FreeBSD, MariaDB is almost always configured with durability-heavy defaults.

Check the status: service mysql-server onestatus

root@:~ # service mysql-server onestatus
mysql is running as pid 2313.

Run procstat kstack <PID> | egrep 'bwait|bufwait' to spot I/O buffer waits. Grep is used as a fast filter. Use a for loop to check multiple snapshots.

for i in 0 1 2 3 4 5 6 7 8 9; do procstat kstack <PID> | egrep 'bwait|bufwait'; sleep 2; done

If this prints anything, at least one thread is blocking filesystem and/or I/O. Example:

root@:~ # procstat kstack 2313 | egrep 'bwait|bufwait'
 2313 100447 mariadbd            -                   mi_switch _sleep bwait bufwait cluster_read ffs_read VOP_READ_APV vn_read vn_io_fault_doio vn_io_fault1 vn_io_fault dofileread kern_preadv sys_pread amd64_syscall fast_syscall_common 

So this is MariaDB?

Check if your MariaDB server is tuned appropriately.

  1. How much RAM does this server have?

    root@:~ # sysctl hw.physmem
    hw.physmem: 2111275008

    2 GB

  2. How large is the database?

    root@:~ # du -sh /var/db/mysql
    40G /var/db/mysql
  3. Check innodb_buffer_pool_size, innodb_flush_log_at_trx_commit and sync_binlog.

    root@:~ # mysql -e "SELECT @@innodb_buffer_pool_size/1024/1024 AS buffer_pool_mb; SHOW VARIABLES LIKE 'innodb_flush_log_at_trx_commit'; SHOW VARIABLES LIKE 'sync_binlog';"
    
    +----------------+
    | buffer_pool_mb |
    +----------------+
    |   128.00000000 |
    +----------------+
    +--------------------------------+-------+
    | Variable_name                  | Value |
    +--------------------------------+-------+
    | innodb_flush_log_at_trx_commit | 1     |
    +--------------------------------+-------+
    +---------------+-------+
    | Variable_name | Value |
    +---------------+-------+
    | sync_binlog   | 0     |
    +---------------+-------+

    The above shows:

    • InnoDB buffer pool is 128 MB which is tiny
    • innodb_flush_log_at_trx_commit = 1 means doing fsync on every commit

So we’re running a 40 GB dataset with effectively 128 MB of cache. That guarantees constant cache misses. On top of that, trx_commit=1 forces frequent log flushes, which is a problem on storage with poor sync latency.

Fix

  1. Increase innodb_buffer_pool_size On a 2 GB box, a sane starting point is 1G if this machine is primarily for MySQL/MariaDB.
  2. Relax flush policy (optional but very impactful on a VPS) Set innodb_flush_log_at_trx_commit=2 to avoid fsync on every commit. Tradeoff: you can lose up to ~1 second of transactions if the host crashes.

Add this to /usr/local/etc/mysql/conf.d/server.cnf under [mysqld]

# Cache more of the working set in RAM (big read I/O reduction)
innodb_buffer_pool_size = 1G

# Reduce fsync pressure (reduce stalling on VPS storage)
innodb_flush_log_at_trx_commit = 2
#vps #mariadb