I have a web server (specs below) with 12 TB of storage. I am moving massive amounts of CSV files packaged in TARs to the server, then extracting them on the server. The problem is that while the TAR files are extracting, the server becomes so slow that it's almost unusable. I'm not doing anything crazy, generally running 2-4 extractions at a time, but even running one or two slows the server down noticeably. This is going to be a massive problem for me, since I will be uploading and extracting TAR files while people want to use the site, and right now I can't do both. I'm really new to Linux and this community, so let me know if I can provide any more useful info and I'll update the post.

I'm guessing the disk is the bottleneck?

If so, can I limit the tar extraction disk usage or give everything else priority?

I/O Stat:

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.15    0.56    0.40   14.83    0.00   84.06

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
loop0             0.00         0.00         0.00       1907          2
sda             155.19       787.23      1484.89  604305327 1139862930
sdb             154.49       765.39      1493.48  587544552 1146456242
sdc             153.82       759.91      1485.53  583338594 1140353662
md4            1041.52      1861.40      4425.45 1428880721 3397151904
md3               4.78        46.70        11.08   35850458    8501904
md2               0.00         0.00         0.00       3641         98

TOP:

PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
7194 root      20   0       0      0      0 D   5.0  0.0   0:17.38 
13811 user1  20   0  121272   1620   1464 D   4.3  0.0   0:02.20 tar

Server Specs:

Intel Atom C2750, 8c/8t - 2.4GHz /2.6GHz, 16GB DDR3 ECC 1600 MHz

Jeff Schaller
lessharm

Have you tried adding another disk with a separate filesystem and extracting the tarballs there? Assuming tar is not significantly taxing your CPU and instead is mostly taxing your disk, moving the load to another disk should give your web server some relief. – Emmanuel Rosa Apr 16 '18 at 01:23

1 Answer

The ionice command is "nice for IO" and will run a command with different IO priorities, so it will (or won't) yield to other processes that want to use the disk.

ionice -c 3 tar xf ...

will run the tar command with "idle" priority, so it only uses the disk when nobody else wants to. That will prevent it interfering with other processes.
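
As a minimal sketch of how that could be wrapped up for repeated use: the function name `extract_idle` and its two arguments are hypothetical, not part of the original answer. It assumes the util-linux `ionice` tool, and it falls back to a plain `tar` if the idle class cannot be set. Note that I/O priorities are only honoured by some I/O schedulers (CFQ, for example), so whether this helps depends on which scheduler the disks use.

```shell
#!/bin/sh
# Hypothetical helper: extract ARCHIVE into DESTDIR with "idle" I/O priority.
# Usage: extract_idle ARCHIVE DESTDIR
extract_idle() {
    archive=$1
    dest=$2
    mkdir -p "$dest" || return 1

    # Probe first: ionice may be missing, or setting the idle class may fail.
    if ionice -c 3 true 2>/dev/null; then
        # -c 3 selects the idle class: tar only gets the disk when
        # no other process is asking for it.
        ionice -c 3 tar -xf "$archive" -C "$dest"
    else
        # Assumption: extracting at normal priority beats not extracting.
        tar -xf "$archive" -C "$dest"
    fi
}
```

A call would look like `extract_idle upload.tar /srv/data`, where both paths are placeholders. An extraction that is already running can be switched to the idle class with `ionice -c 3 -p PID`.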

There won't be much benefit in running multiple extractions in parallel in this case. A tar file is just concatenated data and some headers, so there's nothing much to do except reading and writing. Running in parallel might be useful if you're working on different disks, or for certain SSDs.
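
One way to act on that is to queue the archives and extract them one after another instead of side by side. The sketch below is only an illustration: `extract_sequentially` is a hypothetical name, the destination directory and archive list are whatever you pass in, and the `ionice` prefix is dropped if the idle class cannot be set on the system.

```shell
#!/bin/sh
# Hypothetical helper: extract archives one at a time into DESTDIR.
# Usage: extract_sequentially DESTDIR ARCHIVE...
extract_sequentially() {
    dest=$1
    shift
    mkdir -p "$dest" || return 1

    # Use idle I/O priority when ionice is available and working.
    prefix=""
    if ionice -c 3 true 2>/dev/null; then
        prefix="ionice -c 3"
    fi

    for archive in "$@"; do
        # $prefix is left unquoted on purpose so it splits into words.
        # Stop at the first failure so a bad archive is noticed.
        $prefix tar -xf "$archive" -C "$dest" || return 1
    done
}
```

For example, `extract_sequentially /srv/data first.tar second.tar` (placeholder paths) would finish the first archive before starting the second, so only one extraction competes with the web server for the disk at any time.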

Michael Homer