Directories with a large number of files

I once ran into an interesting situation: a single directory contained millions of files, and some of them were needed.

When you try to list the files, the command will naturally hang for a long time.
Alternatively, the files can be viewed over FTP, although FTP servers often limit a listing to 10,000 files by default. An FTP client such as FileZilla makes it convenient to move files between directories, but this approach is slow: most of the time is spent on FTP requests, while the load on the drive stays low.
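Part of the delay typically comes from ls reading and sorting every name before it prints anything. The sketch below shows two ways to peek at a few entries without waiting for the full listing; it assumes GNU or BSD ls and find, and the temporary directory and file names are made up for illustration:

```shell
# Throwaway directory standing in for the huge one (hypothetical data).
demo=$(mktemp -d)
for i in 1 2 3 4 5 6 7 8; do : > "$demo/file$i"; done

# ls -f prints entries in directory order without sorting (and includes . and ..),
# so output starts immediately; head stops after the first few names.
ls -f "$demo" | head -n 5

# find also streams names as it reads them; take a sample of 5 regular files.
sample=$(find "$demo" -maxdepth 1 -type f | head -n 5 | wc -l)
echo "sampled $sample files"

rm -rf "$demo"
```

On a real directory, replace "$demo" with the path in question and raise the head limit as needed.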

If the files are not needed, you can delete the directory with everything in it. The following command asks for confirmation only for write-protected files (or for every file if rm is aliased to rm -i):

rm -r /dir/

Or delete everything, including the directory itself, without any confirmation prompts:

rm -rf /dir/
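If the directory itself has to stay, its contents can be removed with find instead of rm. This is a minimal sketch, assuming GNU or BSD find; the temporary directory below is only a stand-in for /dir:

```shell
# Throwaway directory standing in for /dir (hypothetical data).
demo=$(mktemp -d)
mkdir "$demo/sub"
: > "$demo/a"
: > "$demo/sub/b"

# -mindepth 1 skips the starting directory itself, so only its contents go.
# -delete works depth-first, so sub/b is removed before sub.
find "$demo" -mindepth 1 -delete

remaining=$(find "$demo" -mindepth 1 | wc -l)
if [ -d "$demo" ]; then kept=yes; else kept=no; fi
echo "directory kept: $kept, entries left: $remaining"

rmdir "$demo"
```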

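In my case the small files were unnecessary, so I went to the right directory and used the command below to delete every file smaller than the specified size (-size counts in 512-byte blocks by default, and without -maxdepth 1 the command also descends into subdirectories):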

cd /dir/
find -size -2 -type f -print -delete

Before deleting, you can count the matching files and the total number of files, although this also takes a long time:

find -maxdepth 1 -size -2 -type f -print | wc -l
find -maxdepth 1 -type f -print | wc -l

If you specify 0 instead of -2, only files of zero size, that is, empty files, are matched.
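Because -size rounds file sizes up to whole units, it is worth checking what a given value actually matches before adding -delete. The sketch below builds three test files in a temporary directory (names and sizes chosen for illustration) and counts matches for -size -2, -size 0 and a byte-based variant; it assumes GNU find, although BSD find should behave the same way:

```shell
demo=$(mktemp -d)
: > "$demo/empty"                            # 0 bytes
head -c 512 /dev/zero > "$demo/one_block"    # exactly 512 bytes, 1 block
head -c 513 /dev/zero > "$demo/two_blocks"   # 513 bytes, rounds up to 2 blocks

# -size -2: fewer than two 512-byte blocks after rounding up, i.e. 0..512 bytes.
small=$(find "$demo" -maxdepth 1 -type f -size -2 | wc -l)
# -size 0: exactly zero blocks, i.e. empty files only.
empty=$(find "$demo" -maxdepth 1 -type f -size 0 | wc -l)
# The c suffix switches the unit to bytes: -size -513c means at most 512 bytes.
bytes=$(find "$demo" -maxdepth 1 -type f -size -513c | wc -l)

echo "small=$small empty=$empty bytes=$bytes"
rm -rf "$demo"
```

The same dry run works on the real directory: run the find command without -delete first, then add -delete once the list looks right.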

If you need to sort the files into directories, go to the directory with the files, create the necessary directories (for example, by date) and move the files by name pattern. In the example below, all files whose names begin with 2017 are moved; -maxdepth 1 tells find not to search subdirectories:

cd /dir/
mkdir 2017
find -maxdepth 1 -type f -name '2017*' -exec mv -vn -t /dir/2017 {} \+
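When there are several years to sort, the same command can be wrapped in a loop. This is a sketch under a few assumptions: GNU mv (for the -t option), a hypothetical list of years, and a temporary directory with made-up file names standing in for /dir:

```shell
demo=$(mktemp -d)
: > "$demo/2016-a.log"
: > "$demo/2017-a.log"
: > "$demo/2017-b.log"
: > "$demo/notes.txt"

for year in 2016 2017; do
    mkdir -p "$demo/$year"
    # -maxdepth 1 keeps find out of the year directories created above;
    # mv -n never overwrites an existing file, -t names the target directory.
    find "$demo" -maxdepth 1 -type f -name "$year*" -exec mv -n -t "$demo/$year" {} +
done

moved_2016=$(find "$demo/2016" -type f | wc -l)
moved_2017=$(find "$demo/2017" -type f | wc -l)
left=$(find "$demo" -maxdepth 1 -type f | wc -l)
echo "2016: $moved_2016, 2017: $moved_2017, unsorted: $left"
rm -rf "$demo"
```

Files that match no year (notes.txt here) stay where they were, so the loop can be rerun safely with a longer list.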

The output of the command can be written to a file by appending "> file" to it, for example:

find -maxdepth 1 -type f -name '2017*' -exec mv -vn -t /dir/2017 {} \+ > /dir/dir/file.log
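A plain > captures only standard output, so error messages from mv would still go to the terminal. The sketch below sends both streams to one log; it assumes GNU mv, and the temporary directory, file names and log path are hypothetical:

```shell
demo=$(mktemp -d)
mkdir "$demo/2017" "$demo/logs"
: > "$demo/2017-a.log"
: > "$demo/2017-b.log"
log="$demo/logs/move.log"

# > sends the -v report to the log; 2>&1 adds error messages to the same file.
find "$demo" -maxdepth 1 -type f -name '2017*' \
    -exec mv -vn -t "$demo/2017" {} + > "$log" 2>&1

logged=$(wc -l < "$log")
echo "log lines: $logged"
rm -rf "$demo"
```

Keeping the log outside the directory being scanned (here in a subdirectory) avoids the log file itself being picked up by find.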

Installing and using iotop

iotop is a console program that displays statistics on disk I/O usage by processes.

You can install it on Debian, Ubuntu or Mint with the command:

sudo apt-get install iotop

In Red Hat, Fedora, CentOS:

yum install iotop

Normal startup:

iotop

Running with an option:

iotop OPTION

I’ll describe the possible startup options:
--version (view version)
-h (view help)
-o (display only processes or threads that are actually reading or writing, instead of all of them)
-b (non-interactive mode, convenient for example for writing the output to a file)
-t (add a timestamp to each line; applies to non-interactive mode -b)
-n NUMBER (the number of iterations before iotop exits; if not specified, it runs until interrupted)
-d NUMBER (the delay between iterations in seconds; it may be a non-integer, the default is 1)
-p PID (display statistics only for the specified processes / threads; the default is all)
-u USER (display statistics only for the specified users; the default is all)
-P (display only processes, not individual threads)
-a (display accumulated I/O since iotop was started instead of bandwidth)
-k (show statistics in kilobytes)
-q (shortened output in non-interactive mode -b: with -q the column names are printed only on the first iteration, with -qq they are never printed, and with -qqq the general summary is not printed either)
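The options above combine naturally into a short logging run. The sketch below is one possible combination, not a recommended setup: the log file is a temporary file created for the example, the iteration count and delay are arbitrary, and the guard assumes that iotop needs root, which is the usual case:

```shell
# Hypothetical log location for this sketch.
log=$(mktemp)
# -b batch mode, -o only processes doing I/O, -t timestamp on each line,
# -qqq no column names or summary, -n 3 iterations, -d 1 second apart.
iotop_args="-b -o -t -qqq -n 3 -d 1"

if command -v iotop >/dev/null 2>&1 && [ "$(id -u)" -eq 0 ]; then
    # Word splitting of $iotop_args is intentional here.
    iotop $iotop_args > "$log" || echo "iotop failed; see $log"
else
    echo "iotop missing or not running as root; would run: iotop $iotop_args > $log"
fi
```

Each line in the log then starts with a timestamp, which makes it easier to match I/O spikes against other logs.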

Installing and using ioping

ioping is a simple tool for monitoring disk I/O latency in real time, much as ping shows network latency.

You can install it on Ubuntu / Debian with the command:

sudo apt-get install ioping

Here is an example of a latency test with 10 requests to the /tmp directory:

ioping -c 10 /tmp

An example with an interval of 0.2 seconds between requests, a request size of 1M and a working set size of 5M:

ioping -i 0.2 -c 10 -s 1M -S 5M /tmp

Disk tests: seek rate (-R) and sequential speed (-RL):

ioping -R /dev/sda
ioping -RL /dev/sda

I’ll describe the possible startup options:
-c count (stop after the specified number of requests)
-w deadline (stop after the specified amount of time)
-p period (print raw statistics after every specified number of requests)
-P period (print raw statistics after every specified period of time)
-i interval (the interval between requests in seconds, 1 by default)
-s size (request size, 4k by default)
-S size (size of the working set)
-k (after the command finishes, keep (do not delete) the working file ioping.tmp)
-L (sequential operations instead of random ones; this also sets the default request size to 256k, like -s 256k)
-A (asynchronous I/O)
-C (cached I/O)
-D (direct I/O)
-B (batch mode: do not display progress, print only the final statistics in raw format when the command finishes)
-q (do not display per-request output, show only the summary when the command finishes)
-h (display help)
-v (view version)
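As a quick illustration of the options, the sketch below runs a short quiet test against a temporary directory and records whether a measurement was made. The directory is created just for the example, and the guard is there because ioping may not be installed or may fail on some filesystems:

```shell
# Temporary directory used as the test target (hypothetical; use a real path in practice).
workdir=$(mktemp -d)
status=skipped

if command -v ioping >/dev/null 2>&1; then
    # -c 5: five requests, -i 0.2: 0.2 s apart, -q: print only the final summary.
    if ioping -c 5 -i 0.2 -q "$workdir"; then
        status=measured
    fi
else
    echo "ioping is not installed; skipping the test"
fi

echo "status: $status"
rm -rf "$workdir"
```

Pointing the same command at directories on different disks gives a rough side-by-side comparison of their latency.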