Friday, February 12, 2016

linux - finding disk usage culprits

Keeping my notes here on the various methods I've discovered over the years for finding disk usage culprits.

Scenario: Filesystem reports that it is full, but plenty of space is still available.
Likely Cause: The filesystem is out of inodes. You can check this with the following command:
$> df -i /
Filesystem            Inodes   IUsed   IFree IUse% Mounted on
                     53780480 53780480      0  100% /
Solution: This usually means finding directories with hundreds of thousands (or even millions) of files and cleaning them up. More on this below...
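
If you'd rather check every mount at once instead of just /, here is a minimal sketch. The function name inode_full and the 90% default threshold are my own choices, not anything standard, and it assumes a df that accepts -i and -P together (GNU df does) and mount points without spaces. It reads df output on stdin so you can also run it against saved output.

```shell
# Hypothetical helper: print "mountpoint IUse%" for filesystems whose inode
# usage is at or above a threshold (default 90, an arbitrary choice).
# Expects `df -iP` style output on stdin.
inode_full() {
    local threshold="${1:-90}"
    awk -v t="$threshold" '
        NR > 1 {
            use = $5
            sub(/%/, "", use)
            # Some filesystems report "-" instead of a percentage; skip those.
            if (use ~ /^[0-9]+$/ && use + 0 >= t) print $NF, $5
        }'
}

# Usage: df -iP | inode_full 90
```

Anything it prints is a filesystem worth checking for directories with huge file counts.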

Scenario: Filesystem is full and there are plenty of free inodes
Likely Cause: Big files!
Solution: Do a find to discover big files.
$> find / -xdev -type f -size +5000M # find files in / that are larger than 5G.
Adjust the -size parameter to the size of the filesystem. For example, on a 5 terabyte filesystem, start with a larger threshold (such as +50000M). If your first find doesn't turn up anything, lower the threshold and try again.
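
Instead of guessing at a threshold, you can also just list the biggest files and work down. This is a rough sketch, assuming GNU find (for -printf); the function name largest_files and its defaults are made up for illustration.

```shell
# Hypothetical helper: list the N largest regular files under a directory,
# biggest first, as "size_in_bytes<TAB>path". Requires GNU find for -printf.
largest_files() {
    local dir="${1:-/}" count="${2:-10}"
    find "$dir" -xdev -type f -printf '%s\t%p\n' 2>/dev/null \
        | sort -rn \
        | head -n "$count"
}

# Usage: largest_files / 20
```

On a huge filesystem the sort can take a while, so the -size filter above is still the faster first pass.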

Scenario: Filesystem is full, there are plenty of free inodes, and you didn't find any big files.
Likely Cause: Way too many small files
Solution: Do a find to discover directories with a ton of small files.
$> find / -xdev -type f |awk -F/ '{OFS="/";$(NF--)=""; print}' | sort | uniq -c | sort -n | tail
    747 /tmp/
    812 /usr/share/zsh/4.3.10/functions/
    898 /usr/bin/
    993 /usr/src/kernels/[redacted]/include/linux/
    993 /usr/src/kernels/[redacted]/include/linux/
   1124 /usr/share/man/man3p/
   1244 /usr/share/man/man1/
   4159 /usr/share/man/man3/
 928374 /tmp/example/
Generally one of the last few will have a HUGE number of files. Do a "du" on that directory to see if it adds up to a significant amount of space:
$> du -hsc /tmp/example # note: this will always take a while for huge directories
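
The awk in that pipeline is a bit cryptic (it just strips the filename off each path). If you have GNU find, a simpler variant is to let find print each file's parent directory itself. A minimal sketch, with files_per_dir being a name I made up:

```shell
# Hypothetical helper: count regular files per directory and show the N
# directories with the most files (largest last). Requires GNU find:
# %h in -printf is the directory part of each file's path.
files_per_dir() {
    local dir="${1:-/}" count="${2:-10}"
    find "$dir" -xdev -type f -printf '%h\n' 2>/dev/null \
        | sort \
        | uniq -c \
        | sort -n \
        | tail -n "$count"
}

# Usage: files_per_dir / 10
```

Unlike the awk version, the directory names come out without a trailing slash.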

Scenario: Filesystem is full, there are plenty of free inodes, you didn't find any big files, and you didn't find any directories full of small files.
Likely cause: Some large files were deleted, but there are some processes holding the file descriptors open, so they're still using up space.
Solution: See if there are any files that are still open but deleted. You can do this with the lsof command:
$> lsof | grep deleted
python    58386      user    3w      REG              253,1            0   98179200 /var/log/application.log (deleted)
If you see any open file descriptors with (deleted) as the last column, that's likely the problem. Here's how you can clear them:
  • Get the process ID (that's 58386 in the example above)
  • Look at open file descriptors for the process:
$> ls -l /proc/58386/fd
l-wx------ 1 user group 64 Oct 20 18:45 3 -> /var/log/application.log (deleted)
  • Truncate the file:
$> cat /dev/null > /proc/58386/fd/3
If there are a huge number of deleted files, you can truncate them all with this loop (of course, you can also just restart the offending program):
pids=$(lsof | awk '$NF == "(deleted)" {print $2}' | sort -u)
for pid in $pids; do
    for fd in /proc/${pid}/fd/*; do
        # readlink prints "<path> (deleted)" for files that were unlinked while still open
        if [[ "$(readlink "${fd}")" == *"(deleted)" ]]; then
            cat /dev/null > "${fd}"
        fi
    done
done
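
Before truncating anything, it helps to see how much space each deleted file is actually holding. Here's a sketch that walks /proc directly, so it works even where lsof isn't installed. The function name deleted_open_files and the output format are my own. It assumes a Linux /proc and GNU stat, and you'll want to run it as root to see other users' processes.

```shell
# Hypothetical helper: print "pid fd size_in_bytes target" for every open
# file descriptor whose target has been deleted. Entries we cannot read
# (other users' processes, processes that exit mid-scan) are skipped.
deleted_open_files() {
    local fd target pid num size
    for fd in /proc/[0-9]*/fd/*; do
        target=$(readlink "$fd" 2>/dev/null) || continue
        case "$target" in
            *" (deleted)")
                pid=${fd#/proc/}      # strip the leading /proc/
                pid=${pid%%/*}        # keep only the PID component
                num=${fd##*/}         # file descriptor number
                # -L follows the /proc link to the still-open file itself
                size=$(stat -L -c '%s' "$fd" 2>/dev/null) || size="?"
                printf '%s %s %s %s\n' "$pid" "$num" "$size" "$target"
                ;;
        esac
    done
}

# Usage: deleted_open_files | sort -k3,3n | tail
```

Sorting on the third column puts the biggest space hogs last, which tells you which process is worth restarting or truncating first.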

Scenario: Filesystem is full, there are plenty of free inodes, you didn't find any big files, you didn't find any directories full of small files, and there aren't any open files that have been deleted.
Likely cause: Sparse files! These are files with large runs of null bytes. Some filesystems (like ext3) don't actually write those null bytes to disk; instead they keep metadata marking the regions as empty (holes), so the size a file reports can be much larger than the space it really occupies.
Solution: Find disparities between reported size and actual size.
find / -xdev -type f 2>/dev/null | while IFS= read -r file; do
    # %s = reported size in bytes, %b = blocks allocated, %B = bytes per block
    if [ "$(stat -c '%s - (%b * %B)' "${file}" | bc)" -gt 0 ]; then
        echo "${file}: $(( $(stat -c '%s' "${file}") / 1024 / 1024 )) MB Reported"
        echo "${file}: $(( $(stat -c '(%b * %B)' "${file}" | bc) / 1024 / 1024 )) MB Actual"
    fi
done
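
To convince yourself what that comparison measures, you can try it on a single file. A minimal sketch, assuming GNU stat (size_gap is a name I made up): it prints the reported size and the allocated size in bytes, side by side.

```shell
# Hypothetical helper: print "reported_bytes allocated_bytes" for one file.
# %s is the size the file claims, %b * %B is what is allocated on disk.
size_gap() {
    local apparent allocated
    apparent=$(stat -c '%s' "$1") || return 1
    allocated=$(( $(stat -c '%b * %B' "$1") ))
    echo "$apparent $allocated"
}

# Usage: size_gap /var/log/lastlog
```

A quick way to see it in action is to make a sparse file with "truncate -s 10M somefile" and run size_gap on it: the first number should be 10485760, and the second should be far smaller because nothing was actually written.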