2

I am a regular (non-administrator) user on a remote Debian installation. I run a special-purpose network server there which is used very heavily. The server produces daily logs which are quite large, and from time to time I need to pack them into a monthly tarball and, some time later, compress them. First I tried this:

tar rf 2014-12.tar 2014-12-*.log

10 minutes later a customer called and told me that the server had stopped responding. Indeed, the server process had been pushed aside and all attention was being given to the tar process, although the total CPU load, according to top, was only 5 percent. Then I tried:

nice -n19 tar rf 2014-12.tar 2014-12-*.log &

I sent the command to the background to be able to monitor the server process. I observed that the server process had slowed down, but not too much, so it was acceptable. It all worked for maybe 5 minutes, and then suddenly the server stalled again. I killed the tar process and saw the server immediately rush to work.

The server is listening on a TCP socket where many clients are connecting. When tar is running, the server continues to work, but after some time the select call seems to stop updating readable sockets.

All this is confusing. The server seems to be cut off after a few minutes when something else is running, but otherwise it runs without problems for months. It also seems, according to top, that my processes never get more than about 5 percent of total CPU; why would that be? How can I pack the logs without disturbing my server process?

Gas Welder
    `nice` controls CPU scheduling, perhaps you could try [`ionice`](http://manpages.ubuntu.com/ionice.1)? – muru Jan 05 '15 at 18:18
  • You might also run iostat 5 to see which disks are being used heavily by the tar process and by the server. The disk or disks where the old log files are kept should be distinct from the disks needed by your server. On a data-intensive cluster I once used, log files were copied to a central administrative system several times a day, and the sorting and de-duping and archiving was done there. – Mark Plotnick Jan 05 '15 at 18:43
  • There is no `iostat` installed there, alas. I'm trying `ionice` now and I'm starting to believe it helps... Do partitions matter here in any way? The system only has one disk with three partitions (the whole "machine" is virtual, by the way). – Gas Welder Jan 05 '15 at 19:04
  • `iostat` is part of the `sysstat` package. If you only have one disk and can't make it any faster, then yes, use `ionice` or reduce your i/o rate on less-important jobs, say, by running `tar` once for each file, with a pause between each run. – Mark Plotnick Jan 05 '15 at 19:24
  • Why pause between each run? – Gas Welder Jan 05 '15 at 19:31
  • That way, you only compete with the server for i/o for a few seconds rather than a few minutes at a time, and use a smaller amount of the buffer cache. – Mark Plotnick Jan 05 '15 at 21:14
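The per-file approach suggested in the comments could be sketched roughly like this (archive name and pause length are illustrative, not from the original setup):

```shell
# Append the logs one file at a time, pausing between appends so the
# server only competes for the disk in short bursts.
for f in 2014-12-*.log; do
    tar rf 2014-12.tar "$f"   # -r appends; creates the archive if missing
    sleep 10                  # give the server the disk back for a while
done
```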

2 Answers

3

As muru stated, nice only controls CPU scheduling. You yourself noted that CPU usage was only about 5% at the time, which means the system was heavily I/O-bound: reading and writing the disk was the bottleneck.

You can control the IO priority given to a process via the ionice command. You can put a process in one of three scheduling classes:

  • idle - will only get disk time when no other program needs the disk
  • best-effort - basically the default
  • realtime - this class gets first access to the disk, no matter what else is going on

You can specify the class with the -c option, 1 = realtime, 2 = best-effort, 3 = idle.

With best-effort and realtime there are also eight priority levels (0–7, set with the -n option) to fine-tune amongst processes of the same class.

I usually execute commands such as tar or rsync with ionice -c3, i.e. the lowest class, "idle". That way other processes aren't starved of disk access.
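Put together, a low-impact invocation along these lines (filenames as in the question) might look like:

```shell
# Lowest CPU priority via nice plus the "idle" I/O class via ionice, so
# tar only gets CPU and disk time when no other process needs them.
nice -n 19 ionice -c3 tar rf 2014-12.tar 2014-12-*.log &
```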

Note that IMHO ionice works best with the CFQ I/O scheduler (a kernel parameter).

wurtel
  • That's what the manual said too, but I'd like to add that -c1 ("real time") is not available for regular users (at least in my case). Thank you all guys for your help! – Gas Welder Jan 06 '15 at 16:00
0

The behavior you observe is probably down to implementation details of your filesystem: it may collect a lot of data in memory and write it out in bulk. The simplest way to resolve this is to decrease the amount of data you write =). I would strongly suggest compressing the archive immediately; there is no point in storing an uncompressed tar if your disk is slow. Note that tar cannot append (-r) to a compressed archive, so create it in one pass with the -z option: tar czf 2014-12.tar.gz 2014-12-*.log. gzip is a good choice in most cases, as it uses very little CPU for both compression and decompression. This approach should decrease your archive creation time, and the amount of data the filesystem has to write, by roughly a factor of 10 or 20.
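Since GNU tar refuses to append (-r) to a gzip-compressed archive, the compressed tarball has to be created in a single pass; combined with ionice from the other answer, that might look like this (filenames as in the question):

```shell
# Create the compressed archive in one pass (-c, not -r: tar cannot
# append to a gzip-compressed archive) at idle I/O priority.
ionice -c3 nice -n 19 tar czf 2014-12.tar.gz 2014-12-*.log
```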

I would also note that manually creating tarballs every month is a somewhat strange idea. I suggest setting up normal logrotate with a cron/anacron job, so the system does all this for you automatically.
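A regular user can run logrotate from their own crontab with a private config and state file; a minimal sketch (all paths and log names here are assumptions, not from the original setup):

```shell
# Write a minimal per-user logrotate config; it could then be run from a
# user crontab with e.g.:
#   0 4 * * * /usr/sbin/logrotate --state "$HOME/.logrotate.state" "$HOME/logrotate.conf"
cat > "$HOME/logrotate.conf" <<'EOF'
/home/user/server/2*.log {
    monthly
    rotate 12
    compress
    missingok
    notifempty
}
EOF
```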

What kind of filesystem do you have, by the way? (Use mount|grep $(stat --printf "%m" .) to find out.)

gena2x
  • The system is ext3. I do that packing monthly, not daily. It's all still under development, so I'll come to something convenient eventually. Good point about archiving, thanks. – Gas Welder Jan 06 '15 at 16:09