How does journald know the PID of a process that produces log data?

Question

When I look at journalctl output, it shows the PID and the program name (or service name?) for each log entry.

Then I wondered: logs are created by other processes, so how does systemd-journald know the PID of these processes when they may only write raw strings to the Unix domain socket that systemd-journald is listening on? Also, does systemd-journald always use the same technique to detect the PID of a piece of log data, even when processes produce logs using functions like sd_journal_sendv()?

Is there any documentation I should read about this?

I read JdeBP's answer and know that systemd-journald listens on a Unix domain socket, but even if it can know the peer socket address that sent the log message, how does it know the PID? What if that sending socket is opened by many processes that are not parent and child?

See: https://www.freedesktop.org/software/systemd/man/systemd-journald.service.html — George Udosen, Dec 29 '18 at 06:11

score 5 · Accepted Answer · 2018-12-29T18:24:57.493

It receives the pid via the SCM_CREDENTIALS ancillary data on the unix socket with recvmsg(), see unix(7). The credentials don't have to be sent explicitly.

Example:

$ cc -Wall scm_cred.c -o scm_cred
$ ./scm_cred
scm_cred: received from 10114: pid=10114 uid=2000 gid=2000

Processes with the CAP_SYS_ADMIN capability can send whatever pid they want via SCM_CREDENTIALS; in the case of systemd-journald, this means they can fake entries as if logged by another process:

# cc -Wall fake.c -o fake
# setcap CAP_SYS_ADMIN+ep fake

$ ./fake `pgrep -f /usr/sbin/sshd`

# journalctl --no-pager -n 1
...
Dec 29 11:04:57 debin sshd[419]: fake log message from 14202
# rm fake
# lsb_release -d
Description:    Debian GNU/Linux 9.6 (stretch)

The code in systemd-journald that handles datagrams and the credentials sent via ancillary data is in the server_process_datagram() function from journald-server.c. Both the syslog(3) standard function from libc and sd_journal_sendv() from libsystemd send their data via a SOCK_DGRAM socket by default, and getsockopt(SO_PEERCRED) does not work on datagram (connectionless) sockets. Neither systemd-journald nor rsyslogd accepts SOCK_STREAM connections on /dev/log.
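The SO_PEERCRED limitation can be seen with a small sketch. The helper names here (`peercred_pid`, `stream_pair_peer_pid`, `lone_dgram_peer_pid`) are made up for illustration, and a `socketpair()` stands in for a real connected stream socket. The first function should report the pid of the process that created the pair; the second queries a datagram socket that has no peer, which is the position a syslog daemon is in on /dev/log, and is expected to report no useful pid (typically 0 on Linux).

```c
#define _GNU_SOURCE 1
#include <sys/socket.h>
#include <sys/types.h>
#include <assert.h>
#include <unistd.h>

/* Hypothetical helper: return the peer pid reported by SO_PEERCRED,
   or -1 if getsockopt() itself fails. */
pid_t peercred_pid(int fd){
        struct ucred uc;
        socklen_t len = sizeof uc;
        assert(fd >= 0);
        if(getsockopt(fd, SOL_SOCKET, SO_PEERCRED, &uc, &len) == -1)
                return -1;
        return uc.pid;
}

/* Peer pid as seen on one end of a connected stream pair. */
pid_t stream_pair_peer_pid(void){
        int fd[2]; pid_t p;
        if(socketpair(AF_LOCAL, SOCK_STREAM, 0, fd)) return -1;
        p = peercred_pid(fd[0]);
        close(fd[0]); close(fd[1]);
        return p;
}

/* Peer pid as seen on a datagram socket that was never connected. */
pid_t lone_dgram_peer_pid(void){
        int fd = socket(AF_LOCAL, SOCK_DGRAM, 0); pid_t p;
        if(fd == -1) return -1;
        p = peercred_pid(fd);
        close(fd);
        return p;
}
```

Calling both from a test program and printing the results next to `getpid()` shows the difference: only the stream case identifies a process.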

scm_cred.c

#define _GNU_SOURCE     1
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>
#include <err.h>

int main(void){
        int fd[2], on = 1; pid_t pid;
        if(socketpair(AF_LOCAL, SOCK_DGRAM, 0, fd)) err(1, "socketpair");
        /* ask for credentials before forking, so that the option is
           already set when the child sends its datagram */
        if(setsockopt(fd[0], SOL_SOCKET, SO_PASSCRED, &on, sizeof on))
                err(1, "setsockopt");
        if((pid = fork()) == -1) err(1, "fork");
        if(pid){ /* parent: receive the datagram and its ancillary data */
                union {
                        struct cmsghdr h;
                        char data[CMSG_SPACE(sizeof(struct ucred))];
                } buf;
                struct msghdr m = {0};
                struct ucred *uc = (struct ucred*)CMSG_DATA(&buf.h);
                m.msg_control = &buf;
                m.msg_controllen = sizeof buf;
                if(recvmsg(fd[0], &m, 0) == -1) err(1, "recvmsg");
                warnx("received from %d: pid=%d uid=%d gid=%d", pid,
                        uc->pid, uc->uid, uc->gid);
        }else   /* child: send an empty datagram, no explicit credentials */
                write(fd[1], 0, 0);
        return 0;
}
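
The listing above reads the credentials from the start of the control buffer without checking what the kernel put there. A more defensive receiver would walk the control messages with CMSG_FIRSTHDR()/CMSG_NXTHDR() and accept only an SCM_CREDENTIALS message of the expected size. The following is a minimal sketch of that idea; `recv_with_creds` and `creds_roundtrip_pid` are hypothetical names, not anything taken from systemd.

```c
#define _GNU_SOURCE 1
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <assert.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: receive one datagram into data[0..len) and copy
   the sender's credentials into *out. Returns the number of payload
   bytes, or -1 on error or if no SCM_CREDENTIALS message was attached.
   SO_PASSCRED must already be set on fd. */
ssize_t recv_with_creds(int fd, void *data, size_t len, struct ucred *out){
        union {
                struct cmsghdr h;
                char space[CMSG_SPACE(sizeof(struct ucred))];
        } ctl;
        struct iovec iov = { .iov_base = data, .iov_len = len };
        struct msghdr m = {0};
        struct cmsghdr *c;
        ssize_t n;

        assert(out != NULL);
        m.msg_iov = &iov;
        m.msg_iovlen = 1;
        m.msg_control = &ctl;
        m.msg_controllen = sizeof ctl;
        if((n = recvmsg(fd, &m, 0)) == -1) return -1;

        /* look for a credentials message instead of assuming one */
        for(c = CMSG_FIRSTHDR(&m); c; c = CMSG_NXTHDR(&m, c)){
                if(c->cmsg_level == SOL_SOCKET &&
                   c->cmsg_type == SCM_CREDENTIALS &&
                   c->cmsg_len == CMSG_LEN(sizeof *out)){
                        memcpy(out, CMSG_DATA(c), sizeof *out);
                        return n;
                }
        }
        return -1;
}

/* Self-check: send to ourselves over a datagram pair and return the
   pid the kernel attached, or -1 on failure. */
pid_t creds_roundtrip_pid(void){
        int fd[2], on = 1; char buf[16]; struct ucred uc; pid_t p = -1;
        if(socketpair(AF_LOCAL, SOCK_DGRAM, 0, fd)) return -1;
        if(setsockopt(fd[0], SOL_SOCKET, SO_PASSCRED, &on, sizeof on) == 0 &&
           send(fd[1], "hi", 2, 0) == 2 &&
           recv_with_creds(fd[0], buf, sizeof buf, &uc) == 2)
                p = uc.pid;
        close(fd[0]); close(fd[1]);
        return p;
}
```

Note that the sender in `creds_roundtrip_pid` uses a plain `send()` and never mentions credentials; they are filled in by the kernel because the receiving socket has SO_PASSCRED set.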

fake.c

#define _GNU_SOURCE     1
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>
#include <err.h>

int main(int ac, char **av){
        union {
                struct cmsghdr h;
                char data[CMSG_SPACE(sizeof(struct ucred))];
        } cm;
        int fd; char buf[256];
        struct ucred *uc = (struct ucred*)CMSG_DATA(&cm.h);
        struct msghdr m = {0};
        struct sockaddr_un ua = {AF_UNIX, "/dev/log"};
        struct iovec iov = {buf};
        if((fd = socket(AF_LOCAL, SOCK_DGRAM, 0)) == -1) err(1, "socket");
        if(connect(fd, (struct sockaddr*)&ua, SUN_LEN(&ua))) err(1, "connect");
        /* attach one SCM_CREDENTIALS control message to the datagram */
        m.msg_control = &cm;
        m.msg_controllen = cm.h.cmsg_len = CMSG_LEN(sizeof(struct ucred));
        cm.h.cmsg_level = SOL_SOCKET;
        cm.h.cmsg_type = SCM_CREDENTIALS;
        /* pid, uid and gid to claim; each defaults to our own */
        uc->pid = ac > 1 ? atoi(av[1]) : getpid();
        uc->uid = ac > 2 ? atoi(av[2]) : geteuid();
        uc->gid = ac > 3 ? atoi(av[3]) : getegid();
        iov.iov_len = snprintf(buf, sizeof buf, "<13>%s from %d",
                ac > 4 ? av[4] : "fake log message", getpid());
        if(iov.iov_len >= sizeof buf) errx(1, "message too long");
        m.msg_iov = &iov;
        m.msg_iovlen = 1;
        if(sendmsg(fd, &m, 0) == -1) err(1, "sendmsg");
        return 0;
}
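
The kernel-side check that fake.c gets around with CAP_SYS_ADMIN can be observed from an unprivileged process. The sketch below uses a hypothetical helper, `send_claiming_pid`, that sends one byte with an explicit SCM_CREDENTIALS message and returns the errno from sendmsg(). Per unix(7), claiming your own pid should succeed, while claiming another pid without CAP_SYS_ADMIN should be rejected with EPERM.

```c
#define _GNU_SOURCE 1
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: send one byte over fd with an explicit
   SCM_CREDENTIALS message claiming to come from claimed_pid (uid and
   gid are our own). Returns 0 on success or the errno value set by
   sendmsg(). */
int send_claiming_pid(int fd, pid_t claimed_pid){
        union {
                struct cmsghdr h;
                char space[CMSG_SPACE(sizeof(struct ucred))];
        } ctl;
        struct ucred uc;
        char byte = 'x';
        struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
        struct msghdr m = {0};

        assert(fd >= 0);
        memset(&ctl, 0, sizeof ctl);
        uc.pid = claimed_pid;
        uc.uid = geteuid();
        uc.gid = getegid();
        ctl.h.cmsg_level = SOL_SOCKET;
        ctl.h.cmsg_type = SCM_CREDENTIALS;
        ctl.h.cmsg_len = CMSG_LEN(sizeof uc);
        memcpy(CMSG_DATA(&ctl.h), &uc, sizeof uc);

        m.msg_iov = &iov;
        m.msg_iovlen = 1;
        m.msg_control = &ctl;
        m.msg_controllen = CMSG_SPACE(sizeof uc);
        return sendmsg(fd, &m, 0) == -1 ? errno : 0;
}
```

For example, on a datagram `socketpair()`, `send_claiming_pid(fd[1], getpid())` should return 0, and `send_claiming_pid(fd[1], 1)` should return EPERM when run as an ordinary user; run as root (or with the capability set as in the transcript above) the second call is expected to succeed.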

I see. So getting the sender's info doesn't require the sender process to send it explicitly. But what if the sender process has CAP_SYS_ADMIN and passes a PID different from its own to `sendmsg()`? Will `systemd-journald` get tricked by this behaviour? — 炸鱼薯条德里克, Dec 29 '18 at 07:36
no, the kernel checks the credentials. that's mentioned in the unix(7) manpage under SCM_CREDENTIALS. — , Dec 29 '18 at 07:38
Yeah, but it mentioned `The sender must specify its own process ID (unless it has the capability CAP_SYS_ADMIN)`. That's why I mention the CAP_SYS_ADMIN, am I misunderstanding anything? — 炸鱼薯条德里克, Dec 29 '18 at 07:40
Yes, a process with `CAP_SYS_ADMIN` can send a pid different from its own. (Haven't tested it, though) — , Dec 29 '18 at 07:43
This has apparently changed over recent years. If a process forks children and shares stdout/stderr with them, on systemd 219, the `_PID` is always the parent pid on the journal regardless of which process wrote to stdout. On the other hand, as of systemd 247 the `_PID` in the journal correctly matches the pid of the originating child. — istepaniuk, Jan 15 '21 at 19:59
@istepaniuk No, this answer predates that change, and is about a different thing. This is about old-style daemons which are using `syslog(3)` (or systemd's `sd_journal_sendv()`) to log messages in a "stateless" manner (by just sending them to a *datagram* unix-domain socket), not about processes managed (in a "stateful" manner) by systemd, which are "logging" by just writing to their stdout and stderr (redirected to a *stream* socket by systemd). It's great that they finally fixed that bug, nonetheless ;-) — , Jan 22 '21 at 07:46
@istepaniuk also read my comments to JdeBP's answer, where I tried (in vain!) to explain the difference between the `SO_PEERCRED` and `SO_PASSCRED` mechanisms. There is a lot of confusion around them, apparently shared by the systemd people, too. `SO_PEERCRED` is especially broken, but neither of them can be reliably used to determine that you're getting the data from the right user or process. — , Jan 22 '21 at 07:58
@mosvy Thanks for clarifying. I was puzzled about this so I created this other question: https://unix.stackexchange.com/questions/630145/how-can-i-have-the-pids-in-the-systemd-journal-for-proecesses-that-share-the-sta/630154#630154, specifically about what changed between systemd versions in this other aspect (identifying the PID of the stdout stream) — istepaniuk, Jan 22 '21 at 15:53

score 2 · Answer 2 · answered Dec 29 '18 at 07:56

The kernel tells it.

The EUID, EGID, and PID of the original client process that connected the AF_LOCAL stream socket at /run/systemd/journal/stdout are available from the kernel via the SO_PEERCRED socket option, which systemd-journald uses. UCSPI-UNIX tools obtain this same information via the same system call.

Child service processes inherit their standard I/O file descriptors already opened (unless the parent service process changes this), and so to systemd-journald all log output has the credentials of the original parent process.
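
The effect of inherited descriptors on SO_PEERCRED can be checked with a small sketch. This is an illustration under assumptions, not systemd's code: a stream `socketpair()` stands in for the connection to /run/systemd/journal/stdout, the parent plays the role of the log daemon, and `peer_pid_after_child_write` is a made-up name. The credentials are fixed when the socket pair is set up, so a write from a forked child does not change what SO_PEERCRED reports.

```c
#define _GNU_SOURCE 1
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <assert.h>
#include <unistd.h>

/* Hypothetical demo: create a stream pair, let a forked child write to
   the inherited end, then ask SO_PEERCRED who the peer is. Stores the
   child's pid in *child and returns the pid reported by the kernel,
   or -1 on error. */
pid_t peer_pid_after_child_write(pid_t *child){
        int fd[2]; char c; struct ucred uc; socklen_t len = sizeof uc;
        pid_t result = -1;

        assert(child != NULL);
        if(socketpair(AF_LOCAL, SOCK_STREAM, 0, fd)) return -1;
        if((*child = fork()) == -1){
                close(fd[0]); close(fd[1]);
                return -1;
        }
        if(*child == 0){ /* child: "log" through the inherited descriptor */
                if(write(fd[1], "x", 1) != 1) _exit(1);
                _exit(0);
        }
        /* parent: read the child's byte, then query the peer credentials */
        if(read(fd[0], &c, 1) == 1 &&
           getsockopt(fd[0], SOL_SOCKET, SO_PEERCRED, &uc, &len) == 0)
                result = uc.pid;
        waitpid(*child, NULL, 0);
        close(fd[0]); close(fd[1]);
        return result;
}
```

The returned pid should be that of the process that created the pair (the caller), not the pid stored in `*child`, even though the byte that was read came from the child.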

Log output generated via the AF_LOCAL socket at /run/systemd/journal/socket that speaks the idiosyncratic systemd-journald protocol is coming over a datagram socket, rather than a stream one. This socket is flagged using the SO_PASSCRED socket option so that the kernel records the same information in each datagram sent, which is pulled out of each datagram by systemd-journald.

Further reading

getsockopt(). Linux Programmers' Manual. 2017-09-15.
socket. Linux Programmers' Manual. 2018-02-02.
Jonathan de Boyne Pollard (2017). local-stream-socket-accept. nosh Guide. Softwares.
Jonathan de Boyne Pollard (2015). "Environment variables". The gen on the UNIX Client-Server Program Interface. Frequently Given Answers.

answered Dec 29 '18 at 07:56

JdeBP


no, it doesn't get it via `SO_PEERCRED`, but via ancillary data with `recvmsg`. I've `strace`'d `systemd-journald`. – Dec 29 '18 at 08:10
… and you haven't read what you are commenting on, or my previous answer referred to in the question, or indeed all of what the question asks. – JdeBP Dec 29 '18 at 17:05
Because the rude dress-down may give the wrong impressions wrt the **accuracy** of this answer, I want to make it clear: **this answer is wrong**. I'll try to explain why. **1.** Portable apps which are using `syslog()` do **not** connect to the stream socket from `/run/systemd/journal/stdout`; they simply send their data from an **unconnected** **datagram** socket to `/dev/log`. Since `SO_PEERCRED` is getting the creds of the process that **connected** to a socket, and does not work with connectionless sockets, it cannot be and is **not** used to get the pid of the process that called `syslog()`. – Dec 30 '18 at 08:01
**2.** Unless overridden by a privileged process, the creds that `systemd` gets via `recvmsg` as described in my [answer](https://unix.stackexchange.com/a/491421/308316) will be those of the process that called `send()` by way of `syslog()`, not of the parent process that created the socket or called `connect()` on it. The second paragraph is particularly misleading, because even `SO_PEERCRED` on a connection-based socket will not return the creds of the process that created the socket file descriptor, but of the process that `connect()`ed it. – Dec 30 '18 at 08:02
**3.** The `SO_PASSCRED` option should be set on the socket on which the creds are to be received, not on the socket on which they're sent, and it does **not** cause the kernel to stick the same info in each datagram sent in the way it's described in the 3rd paragraph. – Dec 30 '18 at 08:03
What you are actually making clear is that _you don't read_. You didn't read the question talking about logs going to journald from child processes, or my answer explaining how standard output and error go to journald through the _very_ mechanism that you've erroneously claimed is not used at the client end. You didn't read _this_ answer which clearly draws a distinction between that, where `SO_PEERCRED` most definitely _is_ used despite your erroneous claims to the contrary, and others. You didn't even read where this answer _showed you_ exactly where the systemd code is doing what I state. – JdeBP Jan 27 '19 at 11:54
I completely stand by the accuracy of the description from my answer and comments, that I've checked and re-checked. `systemd-journald` is only using `SO_PEERCRED` for a stream socket opened by `sd_journal_stream_fd()` (`/run/systemd/journal/stdout`) which is **absolutely not used** by the standard `syslog(3)` or the `sd_journal_send*` and `sd_journal_print*` functions, which are all using **datagram** sockets, on which `SO_PEERCRED` **does not work**. – Jan 27 '19 at 12:28
I'm not aware of any program that's using that `sd_journal_stream_fd` stream log facility and I don't think that a syslog program that keeps states of clients is a good idea in the 1st place, but that's a completely different matter, not related to this question. As to 'not reading', that simply amounts to bullying; of course I've read everything, it's simply that I prefer to base my answers on **facts**, rather than do exegesis of other people's answers and second guess what has misled them into believing things that are not true. – Jan 27 '19 at 12:42
