
I'd like to find out which parts of a frequent backup (with only a few changes in between) take the longest, so that I can reduce the time the backup needs and the amount of I/O it generates.

I'm using backintime (BiT) for the backups on my Debian 10/KDE machine.

Options I can think of for finding this out:

  • examining the running rsync process directly
    • For example, running `sudo lsof -c rsync | grep "backup/"` shows which files are currently being backed up. However, it only shows a single moment, so on its own this command isn't very useful.
  • analyzing rsync logs and/or
  • changing the rsync parameters (BiT has the option "paste additional options to rsync") and/or
  • getting rsync and/or BiT changed so that they output such information (preferably the relative durations of subprocesses, or logs that include the relevant timing information)
    • I have opened an issue about this in the BiT issue tracker; it doesn't seem to be possible with BiT as of now.
  • and/or maybe something else
    • One indirect option would be to check separately which of the included directories contain the most files and which files are the largest. However, these might not be the only things that take a long time, e.g. because I have enabled the BiT option "Use checksum to detect changes".
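For the rsync-log route, here is a rough sketch in Python under several assumptions. It assumes rsync is given something like `-ii --out-format='%t %i %n%L'`, so that every examined file, including unchanged ones, is printed with a timestamp (as I read the rsync man page, `%t` is the current date and time and repeating `-i` also lists unchanged files; please verify with `man rsync` for your version). BiT may rely on rsync's output for its own log, so passing an extra `--out-format` through "paste additional options to rsync" could interfere with it; a manual rsync run over the same directories is the safer place to try this. The hypothetical helpers `parse_line` and `time_per_directory` charge the time elapsed since the previous line to the entry on the current line and sum it per directory prefix. Since `%t` has one-second resolution, and rsync does part of its work (building the file list and, to my understanding, hashing on the sending side with `--checksum`) apart from where it prints each line, only totals over many files mean anything.

```python
from collections import defaultdict
from datetime import datetime

# Assumed line format (hypothetical invocation: rsync ... -ii --out-format='%t %i %n%L'):
#   "2020/07/30 11:48:01 .f..t...... home/me/docs/a.txt"
TIME_FORMAT = "%Y/%m/%d %H:%M:%S"


def parse_line(line):
    """Return (timestamp, path) for a timestamped entry, or None otherwise."""
    parts = line.split(maxsplit=3)  # date, time, item flags, path (may contain spaces)
    if len(parts) < 4:
        return None
    try:
        stamp = datetime.strptime(parts[0] + " " + parts[1], TIME_FORMAT)
    except ValueError:
        return None  # e.g. "sending incremental file list" or the final summary
    return stamp, parts[3].rstrip("\n")


def time_per_directory(lines, depth=2):
    """Sum the seconds elapsed before each entry, grouped by the first `depth` path parts.

    Returns (directory_prefix, seconds) pairs, slowest first.
    """
    totals = defaultdict(float)
    previous = None
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        stamp, path = parsed
        key = "/".join(path.split("/")[:depth])
        if previous is not None:
            # Charge the gap since the previous entry to the current entry.
            totals[key] += (stamp - previous).total_seconds()
        previous = stamp
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
```

For example, `time_per_directory(open("rsync-times.log"))[:20]` would list the 20 directory prefixes that accumulated the most time; the log file name is only a placeholder.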
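For the indirect option, a short Python sketch that walks one backup source and tallies, per top-level subdirectory, the number of regular files and their total size. The file count hints at the per-file metadata work each directory causes, and the byte total matters because, according to the rsync man page, `--checksum` makes both sides read all the data in the files (as far as I know, this is what BiT's "Use checksum to detect changes" turns on). `tally` is a hypothetical name; symlinks are not followed.

```python
import os
import stat
from collections import Counter


def tally(root, depth=1):
    """Count regular files and their total bytes per subdirectory of root.

    Files directly in root are counted under ".". Symlinks are not followed,
    and directories that cannot be read are silently skipped by os.walk.
    """
    files, size = Counter(), Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        parts = [] if rel == "." else rel.split(os.sep)
        head = parts[:depth]
        key = os.path.join(*head) if head else "."
        for name in filenames:
            try:
                st = os.lstat(os.path.join(dirpath, name))
            except OSError:
                continue  # vanished or unreadable entry
            if stat.S_ISREG(st.st_mode):
                files[key] += 1
                size[key] += st.st_size
    return files, size
```

For example, `files, size = tally("/home/me")` followed by `size.most_common(10)` or `files.most_common(10)` lists the ten heaviest subdirectories by bytes or by file count (the path is only an example).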

How to speed things up would be a separate question. For example, it might be possible to "hash" directories to detect whether anything in them has changed (modification, addition or removal) since the last backup, instead of checking every file in directories that contain many files, or to achieve something similar by changing the files' metadata. But first I'd like to find out how to determine what makes the backups take so long.
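To make the "hash directories" idea concrete, a minimal Python sketch under the assumption that a change of name, size or modification time is enough to count as a change; it is an illustration, not a BiT or rsync feature. The hypothetical `tree_fingerprint` hashes the relative path, size and mtime of every entry below a directory, so an addition, removal, rename, size change or mtime change alters the result, while file contents are never read. It still needs one `lstat` per entry, which is roughly the metadata work rsync's default quick check (size and modification time) already does, and it cannot see changes that keep both size and mtime.

```python
import hashlib
import os


def tree_fingerprint(root):
    """Hash relative path, size and mtime of every entry below root.

    File contents are never read. Entries are processed in sorted order so the
    result depends only on the tree, not on directory listing order.
    """
    digest = hashlib.sha256()
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # os.walk descends in this (now sorted) order
        rel_dir = os.path.relpath(dirpath, root)
        for name in sorted(dirnames + filenames):
            st = os.lstat(os.path.join(dirpath, name))  # symlinks are not followed
            record = f"{rel_dir}/{name}\0{st.st_size}\0{st.st_mtime_ns}\n"
            digest.update(record.encode("utf-8", "surrogateescape"))
    return digest.hexdigest()
```

Comparing the stored fingerprint of a directory with a fresh one would tell whether anything below it changed since the previous run.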

mYnDstrEAm
  • Timing is, especially in sync mode, an indicator of very little interest. The overall time is a far better one, along with the average bitrate and, of course, errors and retries. You can add verbose mode to your rsync command, but it will not show what you are looking for; you will need to write a script that checks the process, its open files, etc. from /proc in parallel with the running rsync command. If BiT is able to run a script, that is the workaround. – francois P Jul 30 '20 at 11:48
  • I generally agree with francoisP. I ended up writing a script in which each `rsync ...` command is preceded by the shell keyword `time`. It gives you the `real`, `user` and `sys` execution times, values you can collect in a log and look at after the fact. – Cbhihe Jul 30 '20 at 12:15
  • @Cbhihe Is the preceding `time` command the only thing your script does differently from standard BiT/rsync backups in that regard (e.g. there is no output sorted by duration)? And would this work with BiT as is (probably via the "paste additional options to rsync" option)? – mYnDstrEAm Jul 30 '20 at 12:45
  • I don't use BiT; I use plain `rsync` in a script that separates the material to back up according to a few custom criteria. Using `time` as suggested above is just that, plus a few ad hoc bells and whistles of my own to keep timing traces in a log file. You can then sort the logged times with a simple post-processing step. – Cbhihe Jul 31 '20 at 14:51
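francois P's suggestion of watching /proc in parallel with the running rsync can be sketched as a simple sampling profiler (Python, Linux only; all helper names are hypothetical). Every `interval` seconds it looks up all processes named `rsync` (rsync usually runs as more than one process), reads the targets of their `/proc/<pid>/fd/*` links and counts how often each file below a given prefix is seen open. A file seen in k samples was open for roughly k times `interval` seconds, so the counts point at the files, and summed per directory the directories, where rsync spends its time. Files handled faster than the sampling interval are mostly missed, and reading another user's /proc entries needs root, just like the `sudo lsof` command in the question.

```python
import os
import time
from collections import Counter


def find_pids(name):
    """PIDs whose /proc/<pid>/comm equals name (e.g. "rsync")."""
    pids = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/comm", errors="replace") as f:
                if f.read().strip() == name:
                    pids.append(int(entry))
        except OSError:
            continue  # process exited or is not readable
    return pids


def open_files(pid):
    """Targets of the /proc/<pid>/fd/* symlinks; empty if not accessible."""
    fd_dir = f"/proc/{pid}/fd"
    try:
        fds = os.listdir(fd_dir)
    except OSError:
        return []
    paths = []
    for fd in fds:
        try:
            paths.append(os.readlink(os.path.join(fd_dir, fd)))
        except OSError:
            continue  # descriptor closed in the meantime
    return paths


def sample(name="rsync", prefix="/", interval=0.5, duration=60.0):
    """Count how often each file under prefix is seen open by a `name` process."""
    seen = Counter()
    deadline = time.monotonic() + duration
    while time.monotonic() < deadline:
        for pid in find_pids(name):
            for path in open_files(pid):
                if path.startswith(prefix):
                    seen[path] += 1
        time.sleep(interval)
    return seen
```

For example, running `sample(prefix="/home/", duration=600)` as root while a snapshot is being taken, then looking at `.most_common(20)` of the result, would show the source files rsync kept open the longest (the prefix and duration are only examples).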
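Cbhihe's approach (timing each rsync invocation, logging the times and sorting them afterwards) could look like this in Python instead of bash's `time` keyword. It is a sketch with hypothetical names: `timed_run` runs one command and appends its wall-clock (`real`) time, the CPU time of the finished child process (`user` and `sys`, taken from `resource.getrusage(resource.RUSAGE_CHILDREN)`), its exit code and a label to a tab-separated log; `slowest` is the post-processing step that sorts the log by wall-clock time. Splitting the backup into one rsync call per source directory is an assumption of this approach; it times plain rsync runs outside BiT.

```python
import resource
import subprocess
import time


def timed_run(label, argv, log_path):
    """Run argv and append 'real<TAB>user<TAB>sys<TAB>returncode<TAB>label' to log_path."""
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.perf_counter()
    returncode = subprocess.run(argv).returncode  # waits for the child to finish
    real = time.perf_counter() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    user = after.ru_utime - before.ru_utime
    system = after.ru_stime - before.ru_stime
    with open(log_path, "a") as log:
        log.write(f"{real:.3f}\t{user:.3f}\t{system:.3f}\t{returncode}\t{label}\n")
    return real


def slowest(log_path, n=10):
    """Post-processing: the n slowest logged runs, sorted by wall-clock time."""
    with open(log_path) as log:
        rows = [line.rstrip("\n").split("\t", 4) for line in log if line.strip()]
    rows.sort(key=lambda row: float(row[0]), reverse=True)
    return rows[:n]
```

For example, calling `timed_run("Documents", ["rsync", "-a", "/home/me/Documents/", "/mnt/backup/Documents/"], "rsync-times.tsv")` once per source directory and then `slowest("rsync-times.tsv")` would rank the directories by how long their rsync run took; the paths and log name are placeholders.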
