Process Injection on Linux - Prevention and Detection
This article is part 3 of a five-part series on process injection on Linux.
In the first two (2) parts of this series, I defined process injection and presented the various facilities and techniques that support it. It is now time to look at how to prevent and detect these techniques. I assume the reader is familiar with process injection and has read the first two (2) parts of this series. I also assume that the reader has solid Linux system administration knowledge and skills.
During this research, I would have liked to check how various EDRs perform against the injection techniques presented earlier, but vendors make it far too costly, both financially and effort-wise, for researchers (and potential clients!) to obtain a trial version. I abandoned the idea, although it would have been nice to evaluate products, collaborate with vendors on any blind spots found (yep, not my loss here), and present a table of EDRs and their current state. Hopefully, with these posts, defenders will be able to perform their own checks and assess the coverage of their chosen security solutions.
Preventing Injection
As the reader might be aware, Linux, inheriting from its UNIX roots, does not offer many lockdown mechanisms in its default configuration. Nonetheless, a variety of parameters, facilities and modules can be used by developers and system administrators to prevent injection. This section presents the privileges required to perform an injection, followed by a tour of developer-oriented prevention facilities and, finally, administrator-oriented ones.
Injection conditions
In a default configuration, a user can ptrace any process whose uid and effective user id (euid) are equal to their own uid. In other words, they cannot ptrace setuid binaries (when the executable's owner is not the user itself) or another user's processes. In practice, this makes it impossible to elevate privileges by observing or modifying another user's process. In the same spirit, memory writes using /proc/<pid>/mem or process_vm_writev are not possible when tracing is not allowed. However, processes that have the CAP_SYS_PTRACE capability have the right to trace any process1; remember that uid 0 (root) has all capabilities and that capabilities are associated with files/executables rather than users.
if (tracer-caps & CAP_SYS_PTRACE)
    allowed
else if ((tracer-uid == tracee-euid) && (tracer-uid == tracee-uid) && (tracer-uid == tracee-saved-setuid))
    allowed
else
    disallowed
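To make these conditions concrete, the minimal sketch below (an illustration of mine, not a complete tool) simply attempts a PTRACE_ATTACH on a target pid and reports whether the kernel allowed it:

/* attach-probe.c - minimal sketch: check whether the current user may
 * ptrace a given pid under the conditions described above. */
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ptrace.h>
#include <sys/wait.h>

int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s <pid>\n", argv[0]);
        return 1;
    }

    pid_t pid = (pid_t)atoi(argv[1]);

    if (ptrace(PTRACE_ATTACH, pid, NULL, NULL) == -1) {
        /* EPERM: the uid/capability checks (or an LSM) denied the attach */
        fprintf(stderr, "attach denied: %s\n", strerror(errno));
        return 1;
    }

    waitpid(pid, NULL, 0);                   /* wait for the tracee to stop */
    ptrace(PTRACE_DETACH, pid, NULL, NULL);  /* leave the target alone */
    puts("attach allowed");
    return 0;
}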
prctl/PR_SET_DUMPABLE
prctl is a system call that manipulates various aspects of the behavior of the calling thread or process. One of the many aspects that can be changed is the "dumpable" attribute of a process. This attribute determines whether a core dump is produced when the process receives a signal whose default behavior is to produce one. Preventing core dumps is useful when a process stores sensitive data (passwords, decryption keys, etc.) in memory that shouldn't be copied to disk, where it could be exposed.
prctl(PR_SET_DUMPABLE, 0);
Not quite so intuitively, this attribute also controls whether the calling process can be attached to using ptrace when the tracer does not have the CAP_SYS_PTRACE capability in the user namespace of the target process. In other words, if the process/user doesn't have the CAP_SYS_PTRACE capability, it cannot trace processes that cleared this attribute. In practice, it restricts tracing to privileged processes.
A simple way to query this attribute on a process is to try to attach to it using the strace utility.
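From the developer's side, a minimal sketch (illustrative only) of a program handling sensitive material would clear the attribute as early as possible and, optionally, verify it with PR_GET_DUMPABLE:

/* sketch: clear the dumpable attribute early and verify the result */
#include <stdio.h>
#include <sys/prctl.h>

int main(void)
{
    if (prctl(PR_SET_DUMPABLE, 0, 0, 0, 0) != 0) {
        perror("prctl(PR_SET_DUMPABLE)");
        return 1;
    }

    /* 0 means: no core dumps, and no attach from unprivileged tracers */
    printf("dumpable: %d\n", (int)prctl(PR_GET_DUMPABLE, 0, 0, 0, 0));

    /* ... load keys/passwords and enter the main loop ... */
    return 0;
}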
Most developers do not bother with threat models that assume the user account has already been breached. I won't go into the underlying debate here, but merely point out how this attribute can reduce the damage of a compromise. Well-known software that makes use of it includes OpenSSH, tor and KeePassXC, among others.
Secure Computing (seccomp)
seccomp is a BPF-based filter that can be used to disable syscalls. It was originally designed to safely run untrusted code, but can equally be used to limit the impact of an exploited vulnerability. For example, it can disable all syscalls but open, close, read, write and exit. The idea is that, once a process is about to enter its "working" state, it disables the syscalls that should not, under expected conditions, be used. Afterwards, whenever a disabled syscall is invoked, the kernel refuses to proceed and, depending on the filter's action, returns an error or terminates the thread or process.
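As an illustration, here is a minimal sketch using libseccomp (assuming the libseccomp development package is available; build with gcc -o worker worker.c -lseccomp). The raw seccomp(2)/BPF interface can be used to the same effect:

/* sketch: lock a process down to a handful of syscalls before it enters
 * its "working" state */
#include <seccomp.h>
#include <unistd.h>

static int enter_working_state(void)
{
    /* default action: kill the offending thread */
    scmp_filter_ctx ctx = seccomp_init(SCMP_ACT_KILL);
    if (ctx == NULL)
        return -1;

    /* only the syscalls needed from this point on remain available */
    seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(open), 0);
    seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(openat), 0);
    seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(close), 0);
    seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(read), 0);
    seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(write), 0);
    seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(exit), 0);
    seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(exit_group), 0);

    int rc = seccomp_load(ctx);   /* the filter applies from here on */
    seccomp_release(ctx);
    return rc;
}

int main(void)
{
    if (enter_working_state() != 0)
        return 1;

    write(1, "working\n", 8);
    /* any filtered syscall, e.g. ptrace(2) or mmap(2), now kills the thread */
    return 0;
}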
Here's a small list of well-known software that uses or supports seccomp: OpenSSH, QEMU (when used with the --sandbox option), tor, Chrome and Firefox.
The BINDNOW and RELRO linking flags
In part 2, I presented an injection technique that leveraged a process's GOT to hook and maintain execution. This technique relies on the lazy symbol resolution of a program's runtime dependencies. Without these flags, when a program uses the C library's printf function, for example, the dynamic loader only loads the C library (libc.so) in memory at startup. The actual location of printf is only resolved when the program tries to use it for the first time; subsequent calls use the address that was found. This laziness shortens startup times since no symbols (and there can be many) need to be resolved up front. As we have seen, the end result is that an attacker can redirect execution by modifying the GOT.
To prevent GOT modifications, the BINDNOW feature instructs the dynamic loader to resolve all symbols at load time. The RELRO feature, on the other hand, marks the GOT as read-only. These features are generally used together, but nothing prevents a developer from using one or the other. When only the RELRO feature is used, the ELF's internal data sections (.init_array, .fini_array, .got, etc.) of a program are positioned before the program's data sections (.data and .bss). Strictly speaking, in that context, RELRO is a misnomer.
# compiling a C program "full relro"
gcc -Wl,-z,relro -Wl,-z,now -o main main.c
Note that, technically, these features do not prevent process injections. Rather, they mitigate vulnerabilities that rely on write primitives that target the GOT. Since there are well-known bypasses to this mitigation, they should be seen as part of the organization's defense-in-depth strategy. An attacker capable of executing arbitrary code on a system can still abuse the "punch through" behavior of ptrace and /proc/<pid>/mem, or use mprotect to work around non-writable memory sections.
Yama/ptrace_scope
Yama is a Linux Security Module that collects system-wide DAC security protections that are not handled by the core kernel itself.
As mentioned in Yama's documentation, PR_SET_DUMPABLE is not sufficient to prevent process injection (and other forms of abuse) entirely. In effect, any process that has the CAP_SYS_PTRACE capability can use ptrace on any process running on the system; in practice, generally, only root.
To provide more granular control over which processes may use ptrace on another process, Yama provides the ptrace_scope kernel configuration parameter and adds an option to prctl. ptrace_scope makes it possible, for example, to only allow ptrace-ing a process (the tracee) when it is in a parent-child relationship with the tracer, or when the tracer has been explicitly declared using a prctl PR_SET_PTRACER call. It also allows limiting ptrace-ing to processes that have the CAP_SYS_PTRACE capability, effectively setting PR_SET_DUMPABLE to 0 for all processes on the system. Finally, it allows disabling ptrace-ing entirely, without the ability to re-enable it. Depending on the Linux distribution, the default value of ptrace_scope either does not enforce any protection or only does so in the context of process hierarchies.
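On the prctl side, here is a minimal sketch (for illustration; the pid comes from wherever the program learns about its helper) of a process that, under a ptrace_scope of 1, explicitly allows a specific crash handler to attach to it:

/* sketch: with kernel.yama.ptrace_scope=1, whitelist a specific tracer */
#include <sys/prctl.h>
#include <sys/types.h>

void allow_crash_handler(pid_t handler_pid)
{
    /* only handler_pid (besides our ancestors) may now attach to us */
    prctl(PR_SET_PTRACER, handler_pid, 0, 0, 0);

    /* or, to allow any process to attach (use sparingly): */
    /* prctl(PR_SET_PTRACER, PR_SET_PTRACER_ANY, 0, 0, 0); */
}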
An environment with high security requirements may be interested in disabling ptrace altogether.
# apply on the live system
sysctl -w kernel.yama.ptrace_scope=3
# preserve configuration across reboots
printf '\nkernel.yama.ptrace_scope=3\n' | sudo tee -a /etc/sysctl.conf
Yama also protects processes against /proc/<pid>/mem- and process_vm_writev-based injection techniques by disabling the "punch through" mechanism.
SELinux
In part 2, I demonstrated an injection technique that uses mmap to allocate a new memory page and mprotect to set up its protection flags according to the W^X policy. One thing that I have often seen in publications (on binary exploitation and process injection) is the use of the rwx protection flags passed to mmap. While this works in most configurations, because Linux does not enforce the W^X policy (it only starts with safe defaults), it is stealthier to use a legitimate-looking access mask. However, the SELinux security module allows system administrators to enforce the W^X policy and make many injection techniques unusable.
SELinux is a complex beast and, here, I'll focus on a few global boolean flags that can be used by administrators to apply restrictions on all processes. Simply note that process- or user-specific restrictions can also be applied by using policies.
execmem
This boolean flag enforces the W^X policy by preventing anonymous (non-file-backed) memory mappings and writable private mappings from being made executable.
// https://github.com/SELinuxProject/selinux-kernel/blob/6b9921976f0861e04828b3aff66696c1f3fd900d/security/selinux/hooks.c#L2421
#ifndef CONFIG_PPC32
    if ((prot & PROT_EXEC) && (!file || (!shared && (prot & PROT_WRITE)))) {
        /*
         * We are making executable an anonymous mapping or a
         * private file mapping that will also be writable.
         * This has an additional check.
         */
        int rc = task_has_perm(current, current, PROCESS__EXECMEM);
        if (rc)
            return rc;
    }
#endif
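From user space, the effect can be observed with a small test that requests an anonymous rwx mapping; this is a sketch of mine for illustration, and the exact errno may vary, but with the restriction enabled the mmap call is expected to fail:

/* execmem-test.c - sketch: request an anonymous rwx mapping */
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void)
{
    void *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE | PROT_EXEC,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        printf("rwx mapping denied: %s\n", strerror(errno));
        return 1;
    }

    puts("rwx mapping allowed");
    munmap(p, 4096);
    return 0;
}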
This flag, and all other boolean flags, can be controlled using the following commands.
# list boolean flags
semanage boolean -l
# get the value of a specific flag
getsebool deny_execmem
# set the value of a flag
setsebool -P deny_execmem 1
execheap
This boolean flag controls whether a process's heap can be marked executable. It is a variant of execmem, except that it only applies to heap memory.
execstack
This boolean flag controls whether a process's or thread's stack can be marked executable. It is a variant of execmem, except that it only applies to the stack.
deny_ptrace
When set, the deny_ptrace boolean disables the ptrace system call entirely. It has the potential to, effectively, make any ptrace-based injection technique unusable. It is the equivalent of setting Yama's ptrace_scope parameter to something between 2 and 3: the system call is disabled even for users with the CAP_SYS_PTRACE capability, but the ability to re-enable it, if necessary, remains.
AppArmor
AppArmor is another security module for Linux that constrains how a program (its processes) may interact with the system. It can control, for example, what files a process may read or write and what capabilities it can acquire, and it integrates with various facilities such as mount points, signals, DBus, etc.
It is designed around the concept of profiles, applied to processes, based on declarative permissions. If a permission isn't granted by an AppArmor profile, the operation is denied by the kernel. For example, the profile below for redshift enables it to communicate with the X11 server or Wayland, to use DBus, and to open files matching particular patterns.
# /etc/apparmor.d/usr.bin.redshift
/usr/bin/redshift {
  #include <abstractions/base>
  #include <abstractions/nameservice>
  #include <abstractions/dbus-strict>
  #include <abstractions/wayland>
  #include <abstractions/X>

  dbus send
    bus=system
    path=/org/freedesktop/GeoClue2/Client/[0-9]*,

  dbus receive
    bus=system
    path=/org/freedesktop/GeoClue2/Manager,

  # Allow but log any other dbus activity
  audit dbus bus=system,

  owner @{HOME}/.config/redshift.conf r,
  owner /run/user/*/redshift-shared-* rw,

  # Site-specific additions and overrides. See local/README for details.
  #include <local/usr.bin.redshift>
}
Distributions provide a relatively small set of AppArmor profiles, but there are usually profiles for common network-exposed services such as BIND9, MariaDB, tor or Squid. Profiles for highly configurable software, such as Apache, are best designed by system administrators.
Detecting Injection
Although it is much less effort to have good preventative measures in place, some things might still get through. In this section, I'll present facilities and techniques to detect a technique while it is being used or after it has been used (i.e. when it's too late). To this end, let's suppose that there are no preventative measures in place and that an attacker is targeting a system.
auditd
auditd is the Linux audit subsystem's event collection engine. It generates event messages for rules specified by the administrator. It can, for example, generate messages for any invocation of a syscall with particular arguments, and tag events for triaging. It can also watch filesystem files or hierarchies for various types of accesses or modifications. Generated messages can be logged locally and/or forwarded to a central logging server.
While it does its job reasonably well, auditd is severely limited in some respects. For instance, it does not support any kind of subpath-based filesystem event filtering. In other words, accesses to the .ssh directory of users' home directories cannot be monitored with something along the lines of /home/*/.ssh/. Instead, one has to specify a filesystem watchpoint at /home/ and then filter the events using another tool. While this approach works, it is exceptionally noisy in busy environments.
For example, this makes it particularly noisy to monitor for injection techniques leveraging /proc/<pid>/mem, as every write under /proc/ needs to be logged. As such, it's much better to forward the logs to a logging server with filtering capabilities that will, in turn, trigger alerts on interesting events.
Similarly, auditd misses some events when they're namespaced, i.e. in containers. This limitation makes it impossible, for example, to detect some injection techniques being used when processes run in a container.
In the same spirit, on some filesystem operations, the subsystem generates events that only contain a filename (not the full path). This makes it pretty difficult for analysts to quickly assess whether something should be investigated further.
In any case, like any ruleset, if the auditd rules aren't exhaustive, some attacks will fly through undetected (it could be argued that some incidents will pass through anyway). Make sure you stay up to date with the latest adversary techniques and rely on various telemetry collection systems to avoid creating blind spots.
A process injection detection ruleset for auditd is available at https://github.com/antifob/linux-prinj/blob/main/auditd.rules
Detecting tracees
While auditd provides detailed coverage over a system, it is sometimes necessary to perform a manual inspection. Whenever a process is attached to by another process, the kernel marks it in the /proc/<pid>/status file under the TracerPid entry.
# list all tracers
grep ^TracerPid: /proc/*/status
# list what processes trace what
awk '/^TracerPid:/{if($2!="0"){split(FILENAME,f,"/");print $2" -> "f[3];}}' /proc/*/status
This technique isn't suitable for real-time monitoring, however, and usage of a system such as auditd or an eBPF-based monitor is strongly recommended.
Memory mappings
The /proc/<pid>/maps file provides information on the virtual address space of a process. It indicates, for each memory segment:
- their beginning and end addresses;
- their protections (read, write, executable and private or shared);
- their offset (matches the executable's LOAD sections' offset);
- the major and minor device number of the backing file (when file-backed);
- the inode of the backing file (when file-backed);
- the path of the backing file.
It's important to note that file-backed memory segments have a 1:1 mapping with a file region (possibly the whole file). As such, the memory region should be identical to the data on disk. Dynamically-allocated memory used for computation, on the other hand, is not file-backed; it is anonymous.
When inspecting a process, one can check that the file-backed memory region matches its representation on disk to evaluate whether it was altered. For example, writing a payload to a code cave, or overwriting instructions, is noticeable when comparing memory and file.
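As a rough sketch of that comparison (an illustration of mine; it ignores relocated data sections, special regions and the error handling a real tool would need), the following program reads every file-backed executable region from /proc/<pid>/mem and compares it with the corresponding bytes of the backing file:

/* cmp-regions.c - sketch: compare file-backed executable regions of a
 * live process with the backing files on disk. Usage: ./cmp-regions <pid>
 * Assumes the caller passes the ptrace access checks described earlier. */
#include <fcntl.h>
#include <inttypes.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    if (argc != 2) { fprintf(stderr, "usage: %s <pid>\n", argv[0]); return 1; }

    char path[64], line[512];
    snprintf(path, sizeof(path), "/proc/%s/maps", argv[1]);
    FILE *maps = fopen(path, "r");
    if (maps == NULL) { perror("maps"); return 1; }

    snprintf(path, sizeof(path), "/proc/%s/mem", argv[1]);
    int memfd = open(path, O_RDONLY);
    if (memfd < 0) { perror("mem"); return 1; }

    while (fgets(line, sizeof(line), maps)) {
        uint64_t start, end, off;
        char perms[8], file[256];

        /* each line: start-end perms offset dev inode path */
        if (sscanf(line, "%" SCNx64 "-%" SCNx64 " %7s %" SCNx64 " %*s %*s %255s",
                   &start, &end, perms, &off, file) != 5)
            continue;
        if (perms[2] != 'x' || file[0] != '/')
            continue;  /* only file-backed executable regions */

        size_t len = end - start;
        char *mem = malloc(len), *dsk = malloc(len);
        int fd = open(file, O_RDONLY);

        /* read the same range from process memory and from the file */
        if (fd >= 0 && mem && dsk &&
            pread(memfd, mem, len, (off_t)start) == (ssize_t)len &&
            pread(fd, dsk, len, (off_t)off) == (ssize_t)len)
            printf("%s [%" PRIx64 "-%" PRIx64 "]: %s\n", file, start, end,
                   memcmp(mem, dsk, len) == 0 ? "matches disk" : "MODIFIED");

        if (fd >= 0) close(fd);
        free(mem);
        free(dsk);
    }
    return 0;
}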
Signals disposition
As I indicated in part 2 of this series, malware can, as a persistence mechanism, hook itself into a process's execution path by way of the signals facility. It can either make use of an unused signal number to register its own handling routine, or hook an existing handler.
For defenders, the question is: how can one check that a process's signal handlers are legitimate?
To inspect a process's signal disposition, one can read the /proc/<pid>/status file. The SigCgt field is a bitwise mask of "caught signals" (signals for which there is a handler). The other fields indicate the queued/pending signals, signals that are blocked (queued but not delivered) and signals that are ignored (never queued for delivery).
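Decoding the mask is straightforward: bit n-1 corresponds to signal n. The small sketch below (illustrative only) lists the signal numbers for which a given process registered a handler:

/* sigcgt.c - sketch: decode SigCgt from /proc/<pid>/status */
#include <stdio.h>

int main(int argc, char **argv)
{
    if (argc != 2) { fprintf(stderr, "usage: %s <pid>\n", argv[0]); return 1; }

    char path[64], line[256];
    snprintf(path, sizeof(path), "/proc/%s/status", argv[1]);
    FILE *fp = fopen(path, "r");
    if (fp == NULL) { perror("fopen"); return 1; }

    while (fgets(line, sizeof(line), fp)) {
        unsigned long long mask;
        if (sscanf(line, "SigCgt: %llx", &mask) != 1)
            continue;
        for (int sig = 1; sig <= 64; sig++)
            if (mask & (1ULL << (sig - 1)))
                printf("handler registered for signal %d\n", sig);
    }

    fclose(fp);
    return 0;
}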
While we have some information available about a process's signal handlers, it is impossible to assert, without knowledge of the program's internals, that the handlers are legitimate, nor is there a facility to collect the addresses of these handlers. Ironically, one can use process injection to dump a process's signal handlers, or use ptrace to track how a process behaves with particular signal inputs.
Note that, usually, handlers are file-backed. If a handler is positioned in an anonymous memory segment, it, effectively, means that the handler's code was dynamically generated. While it is not impossible, it is rare enough to be a serious indicator of something potentially malicious.
Timers
The /proc/<pid>/timers file provides information about the POSIX timers created by a process. For each timer, it provides four (4) fields: the timer identifier, the signal number used for delivery (only valid when using signal delivery), the notification attributes and the clock source identifier. For example, the program at https://github.com/antifob/linux-prinj/blob/main/misc/proc-timers creates two (2) timers, using thread notification and signal delivery, respectively. Note that timers created using setitimer are not listed in the timers file.
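For readers who want to reproduce such output, here is a minimal sketch (not the author's proc-timers program, just an illustration; link with -lrt on older glibc) that creates one timer of each kind and waits so that its timers file can be inspected:

/* sketch: create a SIGEV_THREAD and a SIGEV_SIGNAL POSIX timer */
#include <signal.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

static void on_thread(union sigval sv) { (void)sv; }
static void on_signal(int sig) { (void)sig; }

int main(void)
{
    timer_t t1, t2;
    struct sigevent sev = { 0 };
    struct itimerspec its = { .it_value = { 1, 0 }, .it_interval = { 1, 0 } };

    /* timer 1: thread notification */
    sev.sigev_notify = SIGEV_THREAD;
    sev.sigev_notify_function = on_thread;
    timer_create(CLOCK_MONOTONIC, &sev, &t1);
    timer_settime(t1, 0, &its, NULL);

    /* timer 2: signal delivery on SIGRTMIN */
    signal(SIGRTMIN, on_signal);
    sev.sigev_notify = SIGEV_SIGNAL;
    sev.sigev_signo = SIGRTMIN;
    timer_create(CLOCK_MONOTONIC, &sev, &t2);
    timer_settime(t2, 0, &its, NULL);

    printf("pid=%d; inspect /proc/%d/timers\n", getpid(), getpid());
    pause();  /* keep the process alive for inspection */
    return 0;
}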
Analyzing a process's timers requires intimate knowledge of how a program works. Many programs don't use timers, but many others do. Assuming that the program does make use of timers, one can see that the thread-backed timer, above, has the handling code's address stored in the signal: field. This information can be used to verify that the code resides in a file-backed memory segment and that it has not been tampered with, by comparing memory with the data on disk.
Since setitimer timers are not listed in this file, one has to rely on regular signal disposition analysis (documented above) for the SIGALRM, SIGVTALRM and SIGPROF signals.
Threads
The /proc/<pid>/task/ directory provides the list of threads/tasks of a process. Each process has at least one (1) thread (the main thread) and each thread created will have an entry in that directory. And since threads have a set of private data, one can find it in the thread's directory. For example, each thread has its own stack and this is reflected in /proc/<pid>/task/<tid>/maps.
When one knows how a program is implemented, it is possible to evaluate whether it is plausible for a process to have threads and respond accordingly. For example, the cat binary is single-threaded. If one finds an instance of cat that has multiple threads, it is highly suspicious.
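As a quick way to spot such anomalies, the small sketch below (illustrative only) counts the entries of a process's task directory:

/* threadcount.c - sketch: count the threads of a process */
#include <ctype.h>
#include <dirent.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    if (argc != 2) { fprintf(stderr, "usage: %s <pid>\n", argv[0]); return 1; }

    char path[64];
    snprintf(path, sizeof(path), "/proc/%s/task", argv[1]);
    DIR *dir = opendir(path);
    if (dir == NULL) { perror("opendir"); return 1; }

    int count = 0;
    struct dirent *ent;
    while ((ent = readdir(dir)) != NULL)
        if (isdigit((unsigned char)ent->d_name[0]))  /* skip "." and ".." */
            count++;
    closedir(dir);

    printf("pid %s has %d thread(s)\n", argv[1], count);
    return 0;
}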
Child processes
The /proc/<pid>/task/<tid>/children file provides the list of children a process has. Since the line between threads and children is blurry, the same indicators as for threads, mentioned above, apply.
What we've seen so far
In this article, I covered different ways to prevent and detect various injection techniques. As with everything, there is no silver bullet. However, a few simple steps can be taken to improve the security of systems. The simplest "quick win" for the vast majority of environments would be a strict (2 or 3) Yama ptrace_scope policy. Additionally, activating SELinux will make attackers' lives much more difficult in general, and using it in enforcing mode is strongly encouraged, even if it's just to deny execmem, execstack and execheap. In any case, defenders should assume that, to some degree, these preventative measures will fail, and should ensure that efficient telemetry systems are in place.
Articles in this series
- Part 1 - Introduction
- Part 2 - Injecting into Processes
- Part 3 - Prevention and Detection
- Part 4 - Advanced Injection Topics
- Part 5 - Technical Reference
References
- auditd(8)
- audit.rules(7)
- CentOS - SELinux Boolean List
- Linux - Yama LSM Documentation
- OpenSSH - platform-tracing.c
- prctl(2)
- proc(5)
- ptrace(2)
- seccomp(2)
Unless ptrace_scope is set to 3. ↩︎