
I run a Sheevaplug (small ARM server) with Debian 9. It does not have any third-party repos enabled in sources.list / sources.list.d.

I have a backup script which runs as root, and uses at. I think something broke on Sep 13, because I am getting these emails that look like they come from at. They are daily, like my backups. The body of the message just says Killed.

I can't think what would be sending SIGKILL to my process! Without gathering any more information than I have now, can you think of any reason this would happen?

It can't be from the OOM killer (Out of Memory condition), because I have a full kernel log in dmesg which does not show any OOM messages.
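
For reference, the check I mean can be sketched roughly like this. The helper name mentions_oom is hypothetical, and the search patterns are my assumption about what the kernel prints when the OOM killer runs; dmesg may also need root on some systems.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the kernel log text on stdin
# mentions the OOM killer. The patterns are an assumption about the
# kernel's wording, not an exhaustive list.
mentions_oom() {
    grep -q -i -E 'out of memory|oom-killer|oom_reaper'
}

if dmesg | mentions_oom; then
    echo "kernel log mentions the OOM killer"
else
    echo "no OOM messages in the kernel log"
fi
```

This only sees what is still in the kernel ring buffer, so it is only meaningful if the log has not wrapped since the job ran.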

The at job is

#!/bin/sh
# at uses sh shell

set -e
cd /d/backup/jenkins-desktop/

for i in */; do
    nice ionice -c 3 rdiff-backup "$i" ../jenkins-desktop.rdiff/"$i"
done

I doubt it's systemd's SystemCallFilter=, since that would send SIGSYS by default. I see that a couple of rlimits (RLIMIT_CPU and RLIMIT_RTTIME) can send SIGKILL. But I'm not doing anything to set rlimits myself; also, it looks like in both cases the process would receive SIGXCPU first, which is fatal by default and should show "CPU time limit exceeded".

I have looked in journalctl --since=-2d -p notice and there are no errors, only some success messages from anacron.


Return-path: <root@brick>
Envelope-to: root@brick
Delivery-date: Thu, 13 Sep 2018 02:14:15 +0100
Received: from root by brick with local (Exim 4.89)
        (envelope-from <root@brick>)
        id 1g0GD0-0000Xr-Bz
        for root@brick; Thu, 13 Sep 2018 02:14:14 +0100
Subject: Output from your job     1843
To: root@brick
Message-Id: <E1g0GD0-0000Xr-Bz@brick>
From: root <root@brick>
Date: Thu, 13 Sep 2018 02:14:14 +0100
X-IMAPbase: 1541805998 113
Status: O
X-UID: 1

Killed
sourcejedi

1 Answer


"The body of the message just says Killed."

Sorry, this was incorrect.

The body of the first message says Killed. I think this was a one-off killing performed by an admin (me) :-).
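
For what it's worth, here is a minimal sketch of how that one-word body can arise, assuming sh is dash or bash (both report death by signal as exit status 128 plus the signal number). describe_status is a hypothetical helper, not part of my backup script.

```shell
#!/bin/sh
# Hypothetical helper: turn an exit status into a short description.
# Assumes the dash/bash convention of 128 + signal number.
describe_status() {
    status=$1
    if [ "$status" -gt 128 ]; then
        echo "killed by signal $((status - 128))"
    else
        echo "exited with status $status"
    fi
}

# Demonstration: a child shell that sends SIGKILL to itself.
# The parent shell typically prints "Killed" on its own stderr here.
sh -c 'kill -KILL $$'
kill_status=$?
describe_status "$kill_status"
```

When at runs the job, anything the shell writes to stderr ends up in the mail, and because the job uses set -e it stops at the first rdiff-backup that does not exit with status 0.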

The reason I am getting daily messages can be investigated by looking at the subsequent messages. Or, I should be careful now and say the second and last messages look the same :-).

Previous backup seems to have failed, regressing destination now.
Exception '[Errno 28] No space left on device' raised of class '<type 'exceptions.IOError'>':
  File "/usr/lib/python2.7/dist-packages/rdiff_backup/robust.py", line 32, in check_common_error
    try: return function(*args)
  File "/usr/lib/python2.7/dist-packages/rdiff_backup/restore.py", line 468, in get_fp
    Rdiff.write_patched_fp(current_fp, delta_fp, new_fp)
  File "/usr/lib/python2.7/dist-packages/rdiff_backup/Rdiff.py", line 73, in write_patched_fp
    rpath.copyfileobj(librsync.PatchedFile(basis_fp, delta_fp), out_fp)
  File "/usr/lib/python2.7/dist-packages/rdiff_backup/rpath.py", line 64, in copyfileobj
    outputfp.write(inbuf)

Exception '[Errno 28] No space left on device' raised of class '<type 'exceptions.IOError'>':
  File "/usr/lib/python2.7/dist-packages/rdiff_backup/Main.py", line 304, in error_check_Main
    try: Main(arglist)
  File "/usr/lib/python2.7/dist-packages/rdiff_backup/Main.py", line 324, in Main
    take_action(rps)
  File "/usr/lib/python2.7/dist-packages/rdiff_backup/Main.py", line 280, in take_action
    elif action == "backup": Backup(rps[0], rps[1])

You might wonder why "regressing destination" seems to fail with "No space left on device". I'm not sure, because there seems to be a fair amount of space on the drive, but that's a question for another day.
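
One thing that could be checked, though I have not confirmed it is the cause here: ENOSPC is also returned when a filesystem runs out of inodes, even if plenty of blocks are free. A minimal sketch, with hypothetical helper names, assuming a df that accepts -i (GNU and BusyBox do; it is not in POSIX):

```shell
#!/bin/sh
# Hypothetical helpers. Column 4 is "Available" for df -Pk and
# "IFree" for df -Pi; -P keeps each filesystem on a single line.

# Free space in KiB on the filesystem holding "$1".
free_kib() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Free inodes on the filesystem holding "$1".
free_inodes() {
    df -Pi "$1" | awk 'NR == 2 { print $4 }'
}

# Pass the backup destination as the first argument;
# defaults to the current directory.
dest=${1:-.}
echo "free KiB:    $(free_kib "$dest")"
echo "free inodes: $(free_inodes "$dest")"
```

For my setup the argument would be /d/backup/jenkins-desktop.rdiff. Some filesystems do not report inode counts, in which case the second number is not meaningful.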

sourcejedi