Remove stale PID file after power failure to allow software startup via systemd service

Question

I have the following situation:

there is a power failure
the UPS battery is drained during the power failure
the servers in the rack shut off from the power loss
power is restored
the servers power up and their OSes boot
systemd does its work
and

now some software (systemd service startup scripts) is stuck in a deep black hole: its startup relies on the mere existence of the PID file, not on a check of whether the process with the PID (number) stored in the PID file (var/run) is still alive. It refuses to start, and that causes an unwanted cascade of sh..t

and now my questions

I'm not the software designer, and I'm not able to act as the software's keeper and make/force the designer to make his creation work "properly, or as it should".
I'm also not able to constantly keep my finger on the pulse and check that all services are working as expected, especially after each update.

3) How do I handle such a situation globally?

a) Is it a good idea to make a script and run it from a systemd service
on each boot of the server (Linux kernel) to verify that the processes
named in the PID files exist, and to delete the PID file if the process is gone?

b) If so, where should the entry point for such a service be?

c) How should such a script handle the systemd services,
especially the multiple locations of .service files and the PID files named in those service files (with the use of the find or grep tools)?
As a blocking service? It would be a time-costly solution...

d) Or is there already a better solution? If so, which one? Or some ready-made software?

Maybe just hook in after init (process) and rm all *.pid files, with find -name '*.pid' -exec rm? It would be 99% of the job done :D

Or add an ExecStartPre= script to the systemd unit of each such service so that it deletes its own PID file? But it would be time-consuming to trace and edit each service file, and there is an "update" issue when the systemd script gets changed by its author.

Or maybe there is a single stanza to run a pre-exec script for every systemd service in bulk/globally, without interfering with the service scripts?

ps.

And a last question... when I ask for a solution to my real problem, you (being the victim) punish me for the developers' mistakes by downvoting me. Why?

I'm not responsible for others' mistakes, or for their poor or missing knowledge.

ps2.

"I am not in a position to look at the performance of every piece of software I use and install every day, but there is a group of such software, so I am having similar problems with it and I want to be able to prevent it"

ps3. OK, when it comes to the run folder, I know the rules on my Debian and on other distros, like:

https://wiki.debian.org/ReleaseGoals/RunDirectory#How_to_transition_from_.2Fvar.2Frun_to_.2Frun_and_.2Fvar.2Flock_to_.2Frun.2Flock.3F

https://wiki.debian.org/ReleaseGoals/RunDirectory#Packages_using_.2Fvar.2Frun_and_.2Fvar.2Flock

https://wiki.debian.org/ReleaseGoals/RunDirectory#Why_can.27t_.2Fvar.2Frun_and_.2Fvar.2Flock_stay_under_.2Fvar.3F

It's not a simulation or theory, it's REAL LIFE, and it's not as colorful as you think, so I ask for help once again: HOW TO COUNTERACT AN ORPHANED PID FILE AFTER A REBOOT FOLLOWING A POWER FAILURE WHEN THE PROGRAM DOESN'T HANDLE IT PROPERLY?

From the software developer's manual:

Kea’s servers create PID files upon startup. These files are used by keactrl to determine whether a given server is running. If one or more servers are running when the start command is issued, the output looks similar to the following:

keactrl start
INFO/keactrl: kea-dhcp4 appears to be running, see: PID 10918, PID file: /usr/local/var/run/kea/kea.kea-dhcp4.pid.
INFO/keactrl: kea-dhcp6 appears to be running, see: PID 10924, PID file: /usr/local/var/run/kea/kea.kea-dhcp6.pid.
INFO/keactrl: kea-dhcp-ddns appears to be running, see: PID 10930, PID file: /usr/local/var/run/kea/kea.kea-dhcp-ddns.pid.
INFO/keactrl: kea-ctrl-agent appears to be running, see: PID 10931, PID file: /usr/local/var/run/kea/kea.kea-ctrl-agent.pid.
INFO/keactrl: kea-netconf appears to be running, see: PID 10123, PID file: /usr/local/var/run/kea/kea.kea-netconf.pid.

During normal shutdowns, these PID files are deleted; they may, however, be left over as remnants following a system crash. It is possible, though highly unlikely, that upon system restart the PIDs they contain may actually refer to processes unrelated to Kea. This condition will cause keactrl to decide that the servers are running, when in fact they are not. In such a case the PID files listed in the keactrl output must be manually deleted.

So we end up with PIDs on a non-volatile inode -> /usr/local/var/run

all best T. Best aka ceph3us

Well, you could prevent such situations in the future by configuring your servers to shut down on low UPS battery. — mashuptwice, Feb 27 '22 at 22:14
@mashuptwice As I understand it, that means binding the server (e.g. via the UPS USB interface) to measure the UPS's remaining time, and shutting itself down at some remaining power percentage? But what about the case when the UPS software is made for another OS like Windows, or is not present at all? And what about the case of a UPS failure? — ceph3us, Feb 27 '22 at 22:40

ste · Answer 1 · 2022-07-17T16:34:34.057

As you have discovered, a PID file that is still present at system start-up cannot correctly represent the new state of the system. A lingering PID file will mislead the initialisation process and break it. It is a fundamental assumption in the use of PID files that they are written to a volatile path, so that in the event of a power loss they are discarded.

The convention for PID files is to use the path /run, which is required to be a volatile filesystem. Typically this is a tmpfs, which exists only in volatile RAM. For historical reasons there is also a symbolic link called run from the non-volatile filesystem /var to /run, i.e. /var/run -> /run. Conversely, the filesystem mounted at /usr/local is not expected to be volatile. See also the Filesystem Hierarchy Standard.
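To see which of these paths is actually volatile on a given machine, a small check like the following may help. It is only a sketch: it assumes the GNU coreutils stat command, treats tmpfs and ramfs as the volatile filesystem types, and the function name is_volatile is made up for illustration.

```shell
#!/bin/sh
# Report whether a directory lives on a RAM-backed (volatile) filesystem.
# Assumption: GNU coreutils `stat`; is_volatile is a hypothetical helper name.
# Returns 0 = volatile, 1 = not volatile, 2 = path could not be examined.
is_volatile() {
    fstype=$(stat -f -c %T "$1" 2>/dev/null) || return 2
    case "$fstype" in
        tmpfs|ramfs) return 0 ;;
        *)           return 1 ;;
    esac
}

for dir in /run /var/run /usr/local/var/run; do
    if is_volatile "$dir"; then
        echo "$dir: volatile"
    else
        echo "$dir: not volatile (or missing)"
    fi
done
```

Any directory reported as not volatile will keep its PID files across a power loss.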

Link to /run

A simple solution may be to link to /run from wherever your service wants to write its PID file. The linked directory will have the same semantics as the standard path.

The problematic PID files you have identified for the kea service are in /usr/local/var/run so:

# (first stop whatever is using /usr/local/var/run)
# rm -Rf /usr/local/var/run
# ln -s /run /usr/local/var/run

This, I would argue, is a workaround; the correct solution is to fix the kea service scripts to use the standard path. But using this link should be a safe alternative and save you the trouble. Anything else using this path will also enjoy the benefit.

However, /run is owned by root, so this workaround will only work if the services using /usr/local/var/run are running as root. If not, linking to /tmp instead is an alternative workaround; see also Purge below.

Note that you may have to apply this workaround elsewhere in your installation if other services are also storing their PID files on non-volatile paths.

Reconfigure

Another option you may have is to configure the offending scripts to use the correct path. Often an initialisation script will read configuration settings from a default file, e.g. /etc/default/kea. You will have to consult the service documentation or script to find what options, if any, are available.
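As a starting point for finding which services declare a PID file at all, you could grep the unit directories for PIDFile= settings. This is a rough sketch: it assumes GNU grep, the two directories are the usual Debian locations (adjust for your system), and list_pidfiles is a made-up helper name. It only finds PID files that are declared to systemd; a program that picks its own PID path internally, as Kea appears to do, will not show up here.

```shell
#!/bin/sh
# List every PIDFile= setting found in *.service files under the given directories.
# Assumption: GNU grep (-r, --include); list_pidfiles is a hypothetical helper name.
# Directories that do not exist are skipped silently.
list_pidfiles() {
    for dir in "$@"; do
        [ -d "$dir" ] || continue
        # grep exits non-zero when nothing matches; that is not an error here
        grep -rHs --include='*.service' '^PIDFile=' "$dir" || true
    done
}

list_pidfiles /etc/systemd/system /lib/systemd/system
```

Each output line has the form unit-file:PIDFile=path, so any path that is not under /run stands out as a candidate for reconfiguration.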

Purge

For a service that is writing its PID file using a dedicated account (say kea-user) you can mimic the expected volatility of its PID file by deleting it at boot using a cron table entry for that account. For example:

@reboot rm -f /usr/local/var/run/kea.pid

That example would be edited with: # crontab -e -u kea-user.
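If you would rather not delete blindly, a script can first check whether the recorded PID still belongs to a live process, which is the check the question asks about. The following is a minimal sketch, assuming Linux with /proc mounted; purge_stale_pidfile is a made-up helper name, and the glob uses the Kea directory shown in the question's keactrl output.

```shell
#!/bin/sh
# Remove a PID file only when the PID it records is no longer a live process.
# Assumptions: Linux with /proc mounted; purge_stale_pidfile is a hypothetical
# helper name and is not part of Kea or systemd.
# Returns 1 when the file was kept, 0 otherwise.
purge_stale_pidfile() {
    pidfile=$1
    [ -f "$pidfile" ] || return 0            # nothing to clean up
    pid=$(tr -cd '0-9' < "$pidfile")         # keep only the digits
    if [ -n "$pid" ] && [ -d "/proc/$pid" ]; then
        echo "keeping $pidfile: PID $pid is alive"
        return 1
    fi
    rm -f -- "$pidfile"
    echo "removed stale $pidfile"
}

# Directory taken from the keactrl output quoted in the question
for f in /usr/local/var/run/kea/*.pid; do
    purge_stale_pidfile "$f" || true
done
```

Note the limitation that the Kea manual itself points out: after a restart the recorded PID may have been reused by an unrelated process, in which case this check keeps the file. For a clean-up that runs only at boot, the unconditional rm above avoids that problem.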
