Structs
anonymous struct for BPF_BTF_LOAD
anonymous struct used by BPF_MAP_*_ELEM commands
anonymous struct used by BPF_*_GET_*_ID
anonymous struct used by BPF_OBJ_GET_INFO_BY_FD
anonymous struct used by BPF_MAP_CREATE command
anonymous struct used by BPF_OBJ_* commands
anonymous struct used by BPF_PROG_ATTACH/DETACH commands
anonymous struct used by BPF_PROG_LOAD command
anonymous struct used by BPF_PROG_TEST_RUN command
anonymous struct used by BPF_PROG_QUERY command
Key of a BPF_MAP_TYPE_LPM_TRIE entry
Use bpf_sock_addr struct to access socket fields and sockaddr struct passed
by user and intended to be used by socket (e.g. to bind to, depends on
attach type).
Use bpf_sock_ops struct to access socket values and specify request ops
and their replies.
Some of these fields are in network (big-endian) byte order and may need
to be converted before use (bpf_ntohl() defined in samples/bpf/bpf_endian.h).
New fields can only be added at the end of this structure
Arguments for the clone3 syscall.
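The byte-order note on bpf_sock_ops can be illustrated with a short sketch. It is not taken from the kernel sources: Python's standard socket.ntohl() stands in for bpf_ntohl(), and the sample bytes are a hypothetical field value.

```python
import socket
import struct

def field_to_host(raw_be32: int) -> int:
    """Convert a 32-bit field stored in network (big-endian) order to host order."""
    return socket.ntohl(raw_be32)

# Hypothetical example: the four bytes of 127.0.0.1 as they appear on the wire.
wire_bytes = bytes([127, 0, 0, 1])

# Reading the bytes with native endianness mimics loading the raw field
# without any conversion.
raw_field = struct.unpack("=I", wire_bytes)[0]

# After conversion the value is the same on little- and big-endian hosts.
host_value = field_to_host(raw_field)
```

Fields that are already in host byte order must not be passed through the conversion.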
POSIX 1003.1g - ancillary data object information
Ancillary data consists of a sequence of pairs of
(cmsghdr, cmsg_data[])
IA64 and x86_64 need to avoid the 32-bit padding at the end,
to be compatible with the i386 ABI
from struct btrfs_ioctl_file_extent_same_info
from struct btrfs_ioctl_file_extent_same_args
And dynamically-tunable limits and defaults:
Structure for FS_IOC_FSGETXATTR[A] and FS_IOC_FSSETXATTR.
A waiter for vectorized wait.
Cache for getcpu() to speed it up. Results might be a short time
out of date, but will be faster.
Internet address.
struct inotify_event - structure read from the inotify device for each event
read() from /dev/aio returns these structures.
Filled with the offset for mmap(2)
IO completion data structure (Completion Queue Entry)
Passed in for io_uring_setup(2). Copied back with updated info on success
IO submission data structure (Submission Queue Entry)
we always use a 64bit off_t when communicating
with userland. It's up to libraries to do the
proper padding and aio_error abstraction
Berkeley style UIO structures
Request struct for multicast socket ops
The generic ipc64_perm structure:
Note extra padding because this structure is passed back and forth
between kernel and user space.
These are used to wrap system calls.
See architecture code for ugly details.
Obsolete, used only for backwards compatibility and libc5 compiles
Slot for KCMP_EPOLL_TFD
legacy timeval structure, only embedded in structures that
traditionally used ‘timeval’ to pass time intervals (not absolute times).
Do not add new users. If user space fails to compile here,
this is probably because it is not y2038 safe and needs to
be changed to use another interface.
This structure is used to hold the arguments that are used when
loading kernel binaries.
From fs/readdir.c
For recvmmsg/sendmmsg
Obsolete, used only for backwards compatibility and libc5 compiles
message buffer for msgsnd and msgrcv calls
As we do 4.4BSD message passing we use a 4.4BSD message passing
system, not 4.3. Thus msg_accrights(len) are now missing. They
belong in an obscure libc emulation or the bin.
buffer for msgctl calls IPC_INFO, MSG_INFO
Generic msqid64_ds structure.
single taken branch record layout:
Hardware event_id to monitor via a performance monitoring event:
Structure of the page that can be mapped via mmap
Structure used by below PERF_EVENT_IOC_QUERY_BPF command
to query bpf programs attached to the same perf tracepoint
as the given perf event.
This structure provides a new memory descriptor
map which mostly modifies /proc/pid/stat[m]
output for a task. This is mostly done for the
sake of checkpoint/restore functionality.
Per-thread list head:
Support for robust futexes: the kernel cleans up held futexes at thread exit time.
struct rseq_cs is aligned on 4 * 8 bytes to ensure it is always
contained within a single cache-line. It is usually declared as
link-time constant data.
struct rseq is aligned on 4 * 8 bytes to ensure it is always
contained within a single cache-line.
Extended scheduling parameters data structure.
struct seccomp_data - the format the BPF program executes over.
@nr: the system call number
@arch: indicates system call convention as an AUDIT_ARCH_* value
as defined in <linux/audit.h>.
@instruction_pointer: at the time of the system call.
@args: up to 6 system call arguments always stored as 64-bit values
regardless of the architecture.
semop system calls take an array of these.
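The seccomp_data field list above implies a fixed 64-byte record. The sketch below packs and unpacks such a record with Python's struct module; the field widths (signed 32-bit nr, unsigned 32-bit arch, 64-bit instruction pointer, six 64-bit arguments) follow that description, while the helper name and the sample values are illustrative assumptions.

```python
import struct

# Layout assumed from the field list above: int nr, u32 arch,
# u64 instruction_pointer, u64 args[6]. "=" selects standard sizes, no padding.
SECCOMP_DATA = struct.Struct("=iIQ6Q")

def unpack_seccomp_data(buf: bytes) -> dict:
    """Decode one seccomp_data record into a dictionary (hypothetical helper)."""
    nr, arch, ip, *args = SECCOMP_DATA.unpack(buf)
    return {"nr": nr, "arch": arch, "instruction_pointer": ip, "args": args}

# Sample record: syscall number 1 with arch 0xC000003E (AUDIT_ARCH_X86_64)
# and three meaningful arguments; the remaining slots are zero.
sample = SECCOMP_DATA.pack(1, 0xC000003E, 0x1000, 1, 2, 3, 0, 0, 0)
decoded = unpack_seccomp_data(sample)
```

Because the arguments are always 64 bits wide, the same layout applies to 32-bit and 64-bit callers.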
Obsolete, used only for backwards compatibility and libc5 compiles
Serial input interrupt line counters – external structure
Four lines can interrupt: CTS, DSR, RI, DCD
Serial interface for controlling ISO7816 settings on chips with suitable
support. Set with TIOCSISO7816 and get with TIOCGISO7816 if supported by
your platform.
Multiport serial configuration structure — external structure
Serial interface for controlling RS485 settings on chips with suitable
support. Set with TIOCSRS485 and get with TIOCGRS485 if supported by your
platform. The set function returns the new state, with any unsupported bits
reverted appropriately.
Obsolete, used only for backwards compatibility and libc5 compiles
Obsolete, used only for backwards compatibility
kill()
POSIX.1b signals
SIGCHLD
SIGPOLL
SIGSYS
POSIX.1b timers
user accessible metadata for SK_MSG packet hook, new fields must
be added to the end of this structure
1003.1g requires sa_family_t and that sa_data is char.
This matches struct stat64 in glibc2.1, hence the absolutely
insane amounts of padding around dev_t.
ARM needs to avoid the 32-bit padding at the end, for consistency
between EABI and OABI
Structures for the extended file attribute retrieval system call
(statx()).
Timestamp structure for the timestamps in struct statx.
syscall interface - used (mainly by NTP daemon)
to discipline kernel clock oscillator
Note: on 64bit, base and limit are ignored and you cannot set DS/ES/CS
to non-default values if you still want to do syscalls. This
call is therefore more for 32bit mode.
Most architectures have straight copies of the x86 code, with
varying levels of bug fixes on top. Usually it’s a good idea
to use this generic version instead, but be careful to avoid
ABI changes.
New architectures should not provide their own version.
user accessible metadata for XDP packet hook
new fields must be added to the end of this structure
Enums
values to program into branch_sample_type when PERF_SAMPLE_BRANCH is set
The format of the data returned by read() on a perf event fd
as specified by attr.read_format:
Bits that can be set in attr.sample_type to request information
in the overflow packets.
perf_event_type
Generalized hardware cache events:
Generalized performance event event_id types, used by the
attr.event_id parameter of the sys_perf_event_open()
syscall:
Common hardware events, generalized by the kernel:
Values to determine ABI of the registers dump.
Special “software” events provided by the kernel, even if the hardware
does not support performance events. These events measure various
physical and sw events of the kernel (and allow the profiling of them as
well):
attr.type
Constants
/proc/sys/abi
default handler for coff binaries
default handler for ELF binaries
default handler for procs using lcall7
default handler for a libc.so ELF interp
fake target utsname information
tracing flags
disable randomization of VA space
switch between adjtime/adjtimex modes
estimated time error
frequency offset
maximum time error
select microsecond resolution
select nanosecond resolution
Mode codes (timex.mode)
time offset
read-only adjtime
read-only adjtime
add ‘time’ to current time
clock status
set TAI offset
tick value
pll time constant
Algorithm sockets
AppleTalk DDP
Ash
ATM PVCs
ATM SVCs
Amateur Radio AX.25
Bluetooth sockets
Multiprotocol bridge
CAIF sockets
Controller Area Network
Reserved for DECnet project
Acorn Econet
Native InfiniBand address
IEEE802154 sockets
Internet IP Protocol
IP version 6
Novell IPX
IRDA sockets
mISDN sockets
IUCV sockets
Kernel Connection Multiplexor
PF_KEY key management API
Linux LLC
POSIX name for AF_UNIX
For now..
MPLS
Reserved for 802.2LLC project
Amateur Radio NET/ROM
NFC sockets
Packet family
Phonet sockets
PPPoX sockets
Qualcomm IPC Router
RDS sockets
Amateur Radio X.25 PLP
Alias to emulate 4.4BSD
RxRPC sockets
Security callback pseudo AF
smc sockets: reserve number for PF_SMC protocol family that reuses
AF_INET address family
Linux SNA Project (nutters!)
TIPC sockets
Unix domain sockets
Supported address families.
vSockets
Wanpipe API Sockets
Reserved for X.25 project
XDP sockets
bytes of args + environ for exec()
For the close wait times, 0 means wait forever for serial port to
flush its output. 65535 means don’t wait at all.
Allow empty relative pathname
Special value used to indicate openat should use the current working directory.
Suppress terminal automount traversal
Apply to the entire subtree
Remove directory instead of unlinking file.
Type of synchronisation required from statx()
Follow symbolic links.
Do not follow symbolic links.
hang up
Mode for BPF_FUNC_skb_adjust_room helper.
alu mode in double word width
flags for BPF_MAP_UPDATE_ELEM command
create new element or update existing
sign extending arithmetic shift right
function call
ld/ldx fields
double word (64-bit)
change endianness of a register
flags for endianness conversion:
update existing element
function return
(symbol + offset) or addr
(symbol + offset) or addr
tp name
tp name
filename + offset
filename + offset
dest is blackholed; can be dropped
fragmentation required to fwd
fwding is not enabled on ingress
packet is not forwarded
no neighbor entry for nh
dest not allowed; can be dropped
lookup successful
dest is unreachable; can be dropped
fwd requires encapsulation
DIRECT: Skip the FIB rules and go to FIB table associated with device
OUTPUT: Do lookup from egress perspective; default is ingress
cgroup-bpf attach flags used in BPF_PROG_ATTACH command
If BPF_F_ANY_ALIGNMENT is used in BPF_PROG_LOAD command, the
verifier will allow any alignment whatsoever. On platforms
with strict alignment requirements for loads and stores (such
as sparc and mips) the verifier validates that all loads and
stores provably follow this requirement. This flag turns that
checking and enforcement off.
BPF_FUNC_perf_event_output for sk_buff input context.
Current network namespace
flags used by BPF_FUNC_get_stackid only.
BPF_FUNC_l3_csum_replace and BPF_FUNC_l4_csum_replace flags.
BPF_FUNC_perf_event_output, BPF_FUNC_perf_event_read and
BPF_FUNC_perf_event_read_value flags.
BPF_FUNC_clone_redirect and BPF_FUNC_redirect flags.
spin_lock-ed map_lookup/map_update
Instead of having one common LRU list in the
BPF_MAP_TYPE_LRU_[PERCPU_]HASH map, use a percpu LRU list
which can scale and perform better.
Note, the LRU nodes (including free nodes) cannot be moved
across different LRU lists.
flags for BPF_MAP_CREATE command
Specify numa node during map creation
BPF_FUNC_l4_csum_replace flags.
flags for BPF_PROG_QUERY
Flags for accessing BPF object
BPF_FUNC_skb_store_bytes flags.
flags for both BPF_FUNC_get_stackid and BPF_FUNC_get_stack.
Flag for stack_map, store build_id+offset instead of pointer
If BPF_F_STRICT_ALIGNMENT is used in BPF_PROG_LOAD command, the
verifier will perform strict alignment checking as if the kernel
has been built with CONFIG_EFFICIENT_UNALIGNED_ACCESS not set,
and NET_IP_ALIGN defined to 2.
BPF_FUNC_skb_set_tunnel_key and BPF_FUNC_skb_get_tunnel_key flags.
flags used by BPF_FUNC_get_stack only.
BPF_FUNC_skb_set_tunnel_key flags.
Zero-initialize hash function seed. This should only be used for testing.
Mode for BPF_FUNC_skb_load_bytes_relative helper.
LE is unsigned, ‘<=’
LT is unsigned, ‘<’
Extended instruction set based on top of classic BPF
instruction classes
jmp mode in word width
jmp encodings
jump !=
SGE is signed ‘>=’, GE in x86
SGT is signed ‘>’, GT in x86
SLE is signed, ‘<=’
SLT is signed, ‘<’
Encapsulation type for BPF_FUNC_lwt_push_encap helper.
BPF syscall commands, see bpf(2) man-page for details.
alu/jmp fields
mov reg to reg
create new element if it didn’t exist
Generic BPF return codes which all BPF program types may support.
The values are binary compatible with their TC_ACT_* counter-part to
provide backwards compatibility with existing SCHED_CLS and SCHED_ACT
programs.
Note that tracing related programs such as
BPF_PROG_TYPE_{KPROBE,TRACEPOINT,PERF_EVENT,RAW_TRACEPOINT}
are not subject to a stable API since kernel internal data
structures can change from release to release and may
therefore break existing tracing BPF programs. Tracing BPF
programs correspond to a specific kernel which is to be
analyzed, and not a specific kernel and all future ones.
when bpf_call->src_reg == BPF_PSEUDO_CALL, bpf_call->imm == pc-relative
offset to another bpf function
when bpf_ldimm64->src_reg == BPF_PSEUDO_MAP_FD, bpf_ldimm64->imm == fd
Register numbers
Calls BPF program when an active connection is established
Mask of all currently supported cb flags
Get base RTT. The correct value is based on the path and may be
dependent on the congestion control algorithm. In general it indicates
a congestion threshold. RTTs above this indicate congestion
If connection’s congestion control needs ECN
Calls BPF program when a passive connection is established
Called when skb is retransmitted.
Arg1: sequence number of 1st byte
Arg2: # segments
Arg3: return value of tcp_transmit_skb (0 => success)
Called when an RTO has triggered.
Arg1: value of icsk_retransmits
Arg2: value of icsk_rto
Arg3: whether RTO has expired
Definitions for bpf_sock_ops_cb_flags
Should return initial advertised window (in packets) or -1 if default
value should be used
Called when TCP changes state.
Arg1: old_state
Arg2: new_state
Calls BPF program right before an active connection is initialized
Called on listen(2), right after socket transition to LISTEN state.
Should return SYN-RTO value to use or -1 if default value should be used
List of known BPF sock_ops operators.
New entries can only be added at the end
user space needs an empty entry to identify end of a trace
couldn’t get build_id, fallback to ip
with valid build_id and offset
Now a valid state
List of TCP states. There is a build check in net/ipv4/tcp.c to detect
changes between the TCP and BPF versions. Ideally this should never happen.
If it does, we need to add code to convert them before calling
the BPF sock_ops function.
Leave at the end!
convert to big-endian
convert to little-endian
exclusive add
SIGBUS si_codes
invalid address alignment
non-existent physical address
/proc/sys/bus/isa
hardware memory error detected in process but not consumed: action optional
hardware memory error consumed on a machine check: action required
object specific hardware error
Allow configuration of audit via unicast netlink socket.
Allow reading the audit log via multicast netlink socket.
Allow writing the audit log via unicast netlink socket.
Allow preventing system suspends.
POSIX-draft defined capabilities.
In a system with the _POSIX_CHOWN_RESTRICTED option defined, this
overrides the restriction of changing file ownership and group ownership.
Override all DAC access, including ACL execute access if
_POSIX_ACL is defined. Excluding DAC access covered by
CAP_LINUX_IMMUTABLE.
Overrides all DAC restrictions regarding read and search on files
and directories, including ACL restrictions if _POSIX_ACL is
defined. Excluding DAC access covered by CAP_LINUX_IMMUTABLE.
Overrides all restrictions about allowed operations on files, where
file owner ID must be equal to the user ID, except where CAP_FSETID
is applicable. It doesn’t override MAC and DAC restrictions.
Overrides the following restrictions that the effective user ID
shall match the file owner ID when setting the S_ISUID and S_ISGID
bits on that file; that the effective group ID (or one of the
supplementary group IDs) shall match the file owner ID when setting
the S_ISGID bit on that file; that the S_ISUID and S_ISGID bits are
cleared on successful return from chown(2) (not implemented).
Allow locking of shared memory segments
Allow mlock and mlockall (which doesn’t really have anything to do with IPC)
Override IPC ownership checks
Overrides the restriction that the real or effective user ID of a
process sending a signal must match the real or effective user ID
of the process receiving the signal.
Allow taking of leases on files.
Allow modification of S_IMMUTABLE and S_APPEND file attributes
Allow MAC configuration or state changes.
Override MAC access.
Allow the privileged aspects of mknod().
Allow interface configuration
Allow administration of IP firewall, masquerading and accounting
Allow setting debug option on sockets
Allow modification of routing tables
Allow setting arbitrary process / process group ownership on sockets
Allow binding to any address for transparent proxying (also via
NET_RAW)
Allow setting TOS (type of service)
Allow setting promiscuous mode
Allow clearing driver statistics
Allow multicasting
Allow read/write of device-specific registers
Allow activation of ATM control sockets
Allows binding to TCP/UDP sockets below 1024
Allows binding to ATM VCIs below 32
Allow broadcasting, listen to multicast
Allow use of RAW sockets
Allow use of PACKET sockets
Allow binding to any address for transparent proxying (also via
NET_ADMIN)
Allows setgid(2) manipulation
Allows setgroups(2)
Allows forged gids on socket credentials passing.
Linux-specific capabilities
Without VFS support for capabilities:
Transfer any capability in your permitted set to any pid,
remove any capability in your permitted set from any pid
With VFS support for capabilities (neither of above, but)
Add any capability from current’s capability bounding set
to the current process’ inheritable set
Allow taking bits out of capability bounding set
Allow modification of the securebits for a process
Allows set*uid(2) manipulation (including fsuid).
Allows forged pids on socket credentials passing.
Allow configuring the kernel’s syslog (printk behaviour).
Allow configuration of the secure attention key
Allow administration of the random device
Allow examination and configuration of disk quotas
Allow setting the domainname
Allow setting the hostname
Allow calling bdflush()
Allow mount() and umount(), setting up new smb connection
Allow some autofs root ioctls
Allow nfsservctl
Allow VM86_REQUEST_IRQ
Allow to read/write pci config on alpha
Allow irix_prctl on mips (setstacksize)
Allow flushing all cache on m68k (sys_cacheflush)
Allow removing semaphores
Used instead of CAP_CHOWN to “chown” IPC message queues, semaphores and shared memory
Allow locking/unlocking of shared memory segment
Allow turning swap on/off
Allow forged pids on socket credentials passing
Allow setting readahead and flushing buffers on block devices
Allow setting geometry in floppy driver
Allow turning DMA on/off in xd driver
Allow administration of md devices (mostly the above, but some extra ioctls)
Allow tuning the ide driver
Allow access to the nvram device
Allow administration of apm_bios, serial and bttv (TV) device
Allow manufacturer commands in isdn CAPI support driver
Allow reading non-standardized portions of pci configuration space
Allow DDI debug ioctl on sbpcd driver
Allow setting up serial ports
Allow sending raw qic-117 commands
Allow enabling/disabling tagged queuing on SCSI controllers and sending arbitrary SCSI commands
Allow setting encryption key on loopback filesystem
Allow setting zone reclaim policy
Allow use of reboot()
Allow use of chroot()
Insert and remove kernel modules - modify kernel without limit
Allow raising priority and setting priority on other (different UID) processes.
Allow configuration of process accounting
Allow ptrace() of any process
Allow ioperm/iopl access
Allow sending USB messages to any device via /dev/bus/usb
Override resource limits. Set resource limits.
Allow manipulation of system clock.
Allow configuration of tty devices.
Allow triggering something that will wake the system.
c_cflag bit meaning
input baud rate
stopped child has continued
child terminated abnormally
There is an additional set of SIGTRAP si_codes used by ptrace
that are of the form: ((PTRACE_EVENT_XXX << 8) | SIGTRAP)
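The ((PTRACE_EVENT_XXX << 8) | SIGTRAP) form can be sketched in a few lines. The helper names are hypothetical; SIGTRAP = 5 and the PTRACE_EVENT_FORK / PTRACE_EVENT_EXEC numbers are the usual Linux values and are stated here as assumptions rather than taken from this listing.

```python
SIGTRAP = 5             # assumed Linux value of SIGTRAP
PTRACE_EVENT_FORK = 1   # assumed values from linux/ptrace.h
PTRACE_EVENT_EXEC = 4

def ptrace_event_si_code(event: int) -> int:
    """Build the si_code reported for a ptrace event stop."""
    return (event << 8) | SIGTRAP

def split_si_code(si_code: int) -> tuple:
    """Recover (event, signal) from such an si_code."""
    return si_code >> 8, si_code & 0xFF

exec_code = ptrace_event_si_code(PTRACE_EVENT_EXEC)
```

A tracer can compare the received si_code against these composed values to tell event stops from ordinary SIGTRAP deliveries.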
SIGCHLD si_codes
child has exited
child was killed
child has stopped
traced child has trapped
The IDs of the various system clocks (for POSIX.1b interval timers):
The driver implementing this got removed. The clock ID is kept as
a place holder. Do not reuse!
sizeof first published struct
sizeof second published struct
sizeof third published struct
clear the TID in the child
set the TID in the child
Flags for the clone3() syscall.
Clear any signal handler and reset to SIG_DFL.
Unused, ignored
set if open files shared between processes
set if fs info shared between processes
Clone into a specific cgroup given the right permissions.
Clone io context
New cgroup namespace
New ipc namespace
New network namespace
New mount namespace group
New pid namespace
cloning flags intersect with CSIGNAL so can be used with unshare and clone3()
syscalls only:
New time namespace
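The remark that some cloning flags intersect with CSIGNAL can be checked numerically. In the sketch below the constant values (CSIGNAL = 0xff, CLONE_NEWTIME = 0x80, CLONE_VM = 0x100, SIGCHLD = 17) are the usual Linux x86 values and are assumptions, as is the helper name.

```python
CSIGNAL = 0x000000FF        # signal mask to be sent at exit
CLONE_NEWTIME = 0x00000080  # new time namespace
CLONE_VM = 0x00000100       # set if VM shared between processes
SIGCHLD = 17                # assumed x86 value

def overlaps_exit_signal(flag: int) -> bool:
    """True if the flag occupies bits that clone() reserves for the exit signal."""
    return bool(flag & CSIGNAL)

# With the legacy clone() interface the low byte of the flags word carries
# the exit signal, so it can be extracted with the CSIGNAL mask.
legacy_flags = CLONE_VM | SIGCHLD
exit_signal = legacy_flags & CSIGNAL
```

A flag that overlaps the mask would be misread as part of the exit signal by clone(), which is why it is limited to unshare and clone3().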
New user namespace
New utsname namespace
set if we want to have the same parent as the cloner
set the TID in the parent
set if a pidfd should be placed in parent
set if we want to let tracing continue on the child too
create a new TLS for the child
set if signal handlers and blocked signals shared
share system V SEM_UNDO semantics
Same thread group?
set if the tracing process can’t force CLONE_PTRACE on this clone
set if the parent wants the child to wake it up on mm_release
set if VM shared between processes
This cluster is free
This cluster is backing a transparent huge page
This cluster has no next cluster
mark or space (stick) parity
Flag swap_map continuation for full count
flow control
cloning flags:
signal mask to be sent at exit
Binary emulation
arlan wireless driver
Busses
CTL_BUS names:
CPU stuff (speed scaling, etc)
Debugging
Devices
frv specific sysctls
Filesystems
Top-level names:
how many path components do we allow in a call to sysctl
In other words, what is the largest acceptable value for the nlen member
of a struct sysctl_args_t to have?
Networking
frv power management
removal breaks strace(1) compilation
s390 debug
sunrpc debug
VM management
CTL_DEV names:
/proc/sys/dev/cdrom
/proc/sys/dev/ipmi
/proc/sys/dev/mac_hid
/proc/sys/dev/parport
/proc/sys/dev/parport/default
/proc/sys/dev/parport/parport n/devices/
/proc/sys/dev/parport/parport n/devices/device n
/proc/sys/dev/parport/parport n
/proc/sys/dev/raid
/proc/sys/dev/scsi
Used by the DIPC package, try and avoid reusing it
Types of directory notifications that may be requested.
File accessed
File changed attributes
File created
File removed
File modified
Don’t remove notifier
File renamed
Kernel internal flags invisible to userspace
Root squash enabled (for v1 quota format)
Quota stored in a system file
16
these are defined by POSIX and also present in glibc’s dirent.h
SIGEMT si_codes
tag overflow
Set the Edge Triggered behaviour for the target file descriptor
Set exclusive wakeup mode for the target file descriptor
Epoll event masks
Set the One Shot behaviour for the target file descriptor
Request the handling of system wakeup events so as to prevent system suspends
from happening while those events are being processed.
Flags for epoll_create1().
Valid opcodes to issue to sys_epoll_ctl()
fcntl, for BSD compatibility
userspace function ptrs point to descriptors (signal handling)
For F_[GET|SET]FL
This allows for 1024 file descriptors: if NR_OPEN is ever grown
beyond that you’ll have to change this too. But 1024 fd’s seem to be
enough even for such “real” unices like OSF/1, so hopefully this is
one limit that doesn’t have to be changed again.
extent-same (dedupe) ioctls; these MUST match the btrfs ioctl definitions
Some arches already define FIOQSIZE due to a historical
conflict with a Hayes modem-specific ioctl value.
Socket-level I/O control calls.
trap on condition
floating point divide by zero
floating point invalid operation
floating point overflow
floating point inexact result
subscript out of range
floating point underflow
undiagnosed floating-point exception
SIGFPE si_codes
integer divide by zero
integer overflow
the read-only stuff doesn’t really belong here, but any other place is
probably as bad and I don’t want to create yet another include file.
Max chars for the interface; each fs may differ
system-wide maximum number of aio requests
current system-wide number of aio requests
writes to file may only append
btree format dir
One or more compressed clusters
Compress file
dirsync behaviour (directories only)
Reserved for compression usage…
int: directory notification enabled
disc quota usage statistics and control
/proc/sys/fs/quota/
Inode used for large EA
Encryption algorithms
Removed, do not use.
Removed, do not use.
End compression flags — maybe not all used
Encrypted file
Reserved for ext4
Extents
User modifiable flags
User visible flags
Reserved for ext4
AFS directory
Immutable file
hash-indexed directory
Reserved for ext4
inotify submenu
Reserved for ext3
File system encryption support
Policy provided via an ioctl on the topmost directory
Parameters for passing an encryption key into the kernel keyring
int: leases enabled
int: maximum time to wait for a lease break
int:maximum number of dquots that can be allocated
int:maximum number of filedescriptors that can be allocated
int:maximum number of inodes that can be allocated
int:maximum number of super_blocks that can be allocated
Structure that userspace passes to the kernel keyring
do not update atime
Don’t compress
Do not cow file
do not dump file
file tail should not be merged
int:current number of allocated dquots
int:current number of allocated filedescriptors
CTL_FS names:
int:current number of allocated super_blocks
ocfs2
int: overflow GID
int: overflow UID
use master key directly
Create with parents projid
reserved for ext2 lib
Inode flags (FS_IOC_GETFLAGS / FS_IOC_SETFLAGS)
Synchronous updates
Top of directory hierarchies
Undelete
all writes append
CoW extent size allocator hint
use DAX for IO
extent size allocator hint
inherit inode extent size
use filestream allocator
no DIFLAG for this
file cannot be modified
do not update access time
do not defragment
do not include in backups
disallow symlink creation
preallocated file extents
create with parents projid
Flags for the fsx_xflags field
data in realtime volume
create with rt bit set
all writes synchronous
struct: control xfs parameters
fs on-disk file types.
Flags to specify the bit length of the futex word for futex2 syscalls.
Currently, only 32 is supported.
bitset with all bits set for the FUTEX_xxx_BITSET OPs to request a
match of any bit.
*(int *)UADDR2 += OPARG;
*(int *)UADDR2 &= ~OPARG;
if (oldval == CMPARG) wake
if (oldval >= CMPARG) wake
if (oldval > CMPARG) wake
if (oldval <= CMPARG) wake
if (oldval < CMPARG) wake
if (oldval != CMPARG) wake
Use (1 << OPARG) instead of OPARG.
*(int *)UADDR2 |= OPARG;
*(int *)UADDR2 = OPARG;
*(int *)UADDR2 ^= OPARG;
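The operation and comparison entries above describe the two halves of a FUTEX_WAKE_OP argument. The sketch below packs them the way the FUTEX_OP() macro in linux/futex.h does (4 bits of operation, 4 bits of comparison, two 12-bit operands) and then simulates the effect on a value. The numeric codes are taken from that header as an assumption; simulate_wake_op is a hypothetical helper that ignores the kernel's sign extension of the 12-bit operands.

```python
# Operation and comparison codes, assumed from linux/futex.h.
FUTEX_OP_SET, FUTEX_OP_ADD, FUTEX_OP_OR, FUTEX_OP_ANDN, FUTEX_OP_XOR = range(5)
FUTEX_OP_OPARG_SHIFT = 8  # use (1 << OPARG) instead of OPARG
(FUTEX_OP_CMP_EQ, FUTEX_OP_CMP_NE, FUTEX_OP_CMP_LT,
 FUTEX_OP_CMP_LE, FUTEX_OP_CMP_GT, FUTEX_OP_CMP_GE) = range(6)

def futex_op(op: int, oparg: int, cmp: int, cmparg: int) -> int:
    """Pack operation, comparison and both operands into one 32-bit word."""
    return (((op & 0xF) << 28) | ((cmp & 0xF) << 24)
            | ((oparg & 0xFFF) << 12) | (cmparg & 0xFFF))

def simulate_wake_op(encoded: int, oldval: int) -> tuple:
    """Return (new value stored at UADDR2, whether the waiters are woken)."""
    op = (encoded >> 28) & 0xF
    cmp = (encoded >> 24) & 0xF
    oparg = (encoded >> 12) & 0xFFF
    cmparg = encoded & 0xFFF
    if op & FUTEX_OP_OPARG_SHIFT:
        oparg = 1 << (oparg & 31)      # shift form of the operand
        op &= ~FUTEX_OP_OPARG_SHIFT
    newval = {
        FUTEX_OP_SET: oparg,
        FUTEX_OP_ADD: oldval + oparg,
        FUTEX_OP_OR: oldval | oparg,
        FUTEX_OP_ANDN: oldval & ~oparg,
        FUTEX_OP_XOR: oldval ^ oparg,
    }[op] & 0xFFFFFFFF
    wake = {
        FUTEX_OP_CMP_EQ: oldval == cmparg,
        FUTEX_OP_CMP_NE: oldval != cmparg,
        FUTEX_OP_CMP_LT: oldval < cmparg,
        FUTEX_OP_CMP_LE: oldval <= cmparg,
        FUTEX_OP_CMP_GT: oldval > cmparg,
        FUTEX_OP_CMP_GE: oldval >= cmparg,
    }[cmp]
    return newval, wake

# "Add 1, wake if the old value was greater than 0."
encoded = futex_op(FUTEX_OP_ADD, 1, FUTEX_OP_CMP_GT, 0)
```

Note that the comparison is made against the old value, before the operation is applied.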
The kernel signals via this bit that a thread holding a futex
has exited without unlocking the futex. The kernel also does
a FUTEX_WAKE on such futexes, after setting the bit, to wake
up any possible waiters:
The rest of the robust-futex field is for the TID:
Second argument to futex syscall
Are there any waiters for this robust futex:
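The three robust-futex entries above (owner-died bit, TID field, waiters bit) share one 32-bit word. A minimal decoding sketch follows; the mask values are the ones commonly defined in linux/futex.h and are assumptions here, and decode_robust_word is a hypothetical helper.

```python
FUTEX_WAITERS = 0x80000000     # are there any waiters for this robust futex
FUTEX_OWNER_DIED = 0x40000000  # owner exited without unlocking
FUTEX_TID_MASK = 0x3FFFFFFF    # the rest of the field is the TID

def decode_robust_word(word: int) -> dict:
    """Split a robust futex word into its TID and flag bits."""
    return {
        "tid": word & FUTEX_TID_MASK,
        "waiters": bool(word & FUTEX_WAITERS),
        "owner_died": bool(word & FUTEX_OWNER_DIED),
    }

# Example: a lock held by TID 1234 whose owner died while others were waiting.
state = decode_robust_word(FUTEX_WAITERS | FUTEX_OWNER_DIED | 1234)
```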
Max numbers of elements in a futex_waitv_t array.
Set/Get seals
Cancel a blocking posix lock; internal use only until we expose an
asynchronous lock api to userspace:
Create a file descriptor with FD_CLOEXEC set.
for old implementation of bsd flock()
using ‘struct flock64’
Set/Get write life time hints. {GET,SET}_RW_HINT operate on the
underlying inode, while {GET,SET}_FILE_RW_HINT operate only on
the specific file.
Request notifications on a directory.
See below for events that may be notified.
Open File Description Locks
Check for file existence.
for posix fcntl() and lockf()
prevent future writes while mapped
prevent file from growing
Types of seals
prevent further seals from being set
prevent file from shrinking
prevent writes
Set and get of pipe page size array
get all semval’s
get semncnt
semctl Command Definitions.
get sempid
get semval
get semzcnt
Shift from CBAUD to CIBAUD
c_iflag bits
Structure used for setting quota information about file via quotactl
Following flags are used to specify which fields are valid
unimplemented instruction address
internal stack error
coprocessor error
illegal addressing mode
SIGILL si_codes
illegal opcode
illegal operand
illegal trap
privileged opcode
privileged register
224.0.0.1
224.0.0.2
224.0.0.106
Address to accept any incoming messages.
Address to send to all hosts.
Address to loopback in software to local host.
127.0.0.1
224.0.0.255
Address indicating an error return.
Defines for Multicast INADDR
224.0.0.0
/proc/sys/fs/inotify/
max instances per user
max watches per user
Fixed constants first:
Initial setting for nfile rlimits
Hard limit for nfile rlimits
the following are legal, implemented events that user-space can watch for
File was accessed
All of the events - we build the list by hand so that we can add flags in
the future and not break backward compatibility. Apps will get only the
events that they originally wanted. Be sure to add new events here!
Metadata changed
Flags for sys_inotify_init1.
close
Unwritable file closed
Writable file was closed
Subfile was created
Subfile was deleted
Self was deleted
don’t follow a sym link
exclude events on unlinked objects
File was ignored
event occurred against dir
Network number for local host loopback.
add to the mask of an already existing watch
only create watches
File was modified
moves
File was moved from X
File was moved to Y
Self was moved
only send event once
special flags
only watch the path if it is a directory
File was opened
Event queue overflowed
the following are legal events. they are sent as needed to any watch
Backing fs was unmounted
Valid flags for the aio_flags member of the struct iocb.
IOCB_FLAG_RESFD - Set if the aio_resfd member of the struct iocb is valid.
IOCB_FLAG_IOPRIO - Set if the aio_reqprio member of the struct iocb is valid.
…and for the drivers/sound files…
Direction bits, which any architecture can choose to override
before including this file.
Let any architecture override either of the following before including this file.
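The direction bits mentioned above are one of four fields in an ioctl request number. The sketch below assumes the generic layout (8 bits of number, 8 bits of type, 14 bits of size, 2 bits of direction, with write = 1 and read = 2); architectures that override these values would produce different numbers. The helper names are hypothetical.

```python
# Generic field widths and direction values; assumed, since architectures
# are allowed to override them.
IOC_NRBITS, IOC_TYPEBITS, IOC_SIZEBITS = 8, 8, 14
IOC_NRSHIFT = 0
IOC_TYPESHIFT = IOC_NRSHIFT + IOC_NRBITS
IOC_SIZESHIFT = IOC_TYPESHIFT + IOC_TYPEBITS
IOC_DIRSHIFT = IOC_SIZESHIFT + IOC_SIZEBITS
IOC_NONE, IOC_WRITE, IOC_READ = 0, 1, 2

def ioc(direction: int, type_char: str, nr: int, size: int) -> int:
    """Compose an ioctl request number from its four fields."""
    return ((direction << IOC_DIRSHIFT) | (ord(type_char) << IOC_TYPESHIFT)
            | (nr << IOC_NRSHIFT) | (size << IOC_SIZESHIFT))

def ioc_decode(request: int) -> tuple:
    """Split a request number back into (direction, type, nr, size)."""
    return (
        request >> IOC_DIRSHIFT,
        chr((request >> IOC_TYPESHIFT) & 0xFF),
        (request >> IOC_NRSHIFT) & 0xFF,
        (request >> IOC_SIZESHIFT) & 0x3FFF,
    )

# A read request of type 'f', number 1, carrying an 8-byte argument.
request = ioc(IOC_READ, "f", 1, 8)
```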
8 best effort priority levels are supported
These are the io priority groups as implemented by CFQ. RT is the realtime
class, it always gets premium service. BE is the best-effort scheduling
class, the default for any process. IDLE is the idle scheduling class, it
is only served when no one else is using the disk.
Gives us 8 prio classes with 13-bits of data for each class
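The "8 prio classes with 13-bits of data" layout can be sketched as a shift and a mask. The class numbers (RT = 1, BE = 2, IDLE = 3) and the helper names are assumptions modelled on the IOPRIO_PRIO_VALUE-style macros in linux/ioprio.h.

```python
IOPRIO_CLASS_SHIFT = 13
IOPRIO_PRIO_MASK = (1 << IOPRIO_CLASS_SHIFT) - 1

# Assumed class numbers: realtime, best-effort, idle.
IOPRIO_CLASS_RT, IOPRIO_CLASS_BE, IOPRIO_CLASS_IDLE = 1, 2, 3

def ioprio_value(prio_class: int, data: int) -> int:
    """Pack a class and its 13 bits of per-class data into one value."""
    return (prio_class << IOPRIO_CLASS_SHIFT) | (data & IOPRIO_PRIO_MASK)

def ioprio_class(value: int) -> int:
    return value >> IOPRIO_CLASS_SHIFT

def ioprio_data(value: int) -> int:
    return value & IOPRIO_PRIO_MASK

# Best-effort class at level 4 (levels 0..7 are supported for this class).
be4 = ioprio_value(IOPRIO_CLASS_BE, 4)
```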
Fallback BE priority
cqe->flags
io_uring_enter(2) flags
io_uring_params->features flags
sqe->fsync_flags
Magic offsets for the application to mmap the data it needs
io_uring_register(2) opcodes and arguments
attach to existing wq
clamp SQ/CQ ring sizes
app defines CQ size
io_uring_setup() flags
io_context is polled
SQ poll thread
sq_thread_cpu is valid
sq_ring->flags
needs io_uring_enter wakeup
sqe->timeout_flags
always go async
select buffer from sqe->buf_group
sqe->flags
use fixed fileset
issue after inflight IO
like LINK, but stronger
links next sqe
resource get request flags
create if key is nonexistent
these fields are used by the DIPC package so the kernel as standard
should avoid using them if possible
make it distributed
fail if key exists
See ipcs
return error on wait
Version flags for semctl, msgctl, and shmctl commands
These are passed as bitflags or-ed with the actual command
this machine is the DIPC owner
Control commands used with semctl, msgctl and shmctl
see also specific commands in sem.h, msg.h and shm.h
remove resource
Set ipc_perm options
Get ipc_perm options
Authentication Header protocol
IP option pseudo header for BEET
Compression Header Protocol
Datagram Congestion Control Protocol
IPv6 destination options
Exterior Gateway Protocol
Encapsulation Header
Encapsulation Security Payload protocol
IPv6 fragmentation header
Cisco GRE tunnels (rfc 1701,1702)
IPV6 extension headers
IPv6 hop-by-hop options
Internet Control Message Protocol
ICMPv6
XNS IDP protocol
Internet Group Management Protocol
INET: An implementation of the TCP/IP protocol suite for the LINUX
operating system. INET is implemented using the BSD Socket
interface as the means of communication with the user level.
IPIP tunnels (older KA9Q tunnels use 94)
IPv6-in-IPv4 tunnelling
IPv6 mobility header
MPLS in IP (RFC 4023)
Multicast Transport Protocol
IPv6 no next header
Protocol Independent Multicast
PUP protocol
Raw IP packets
IPv6 routing header
RSVP Protocol
Stream Control Transmission Protocol
Transmission Control Protocol
SO Transport Protocol Class 4
User Datagram Protocol
UDP-Lite (RFC 3828)
IPV6 socket options
RFC5014: Source address selection
obsolete
Bitmask constant declarations to help applications select out the
flow label and priority fields.
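As a rough illustration of how such bitmasks are applied, the sketch below splits a flow-info word into its priority and flow label parts. The mask values are assumptions based on the kernel's IPv6 UAPI header, the function name is hypothetical, and the input is assumed to already be in host byte order.

```python
# Sketch only: mask values assumed from the kernel's linux/in6.h.
IPV6_FLOWINFO_FLOWLABEL = 0x000FFFFF  # low 20 bits: flow label
IPV6_FLOWINFO_PRIORITY = 0x0FF00000   # next 8 bits: priority / traffic class


def split_flowinfo(flowinfo_host_order):
    """Return (priority, flow_label) from a host-byte-order flowinfo word.

    A value read straight from a sockaddr_in6 would first need converting
    from network byte order; that step is deliberately left out here.
    """
    label = flowinfo_host_order & IPV6_FLOWINFO_FLOWLABEL
    priority = (flowinfo_host_order & IPV6_FLOWINFO_PRIORITY) >> 20
    return priority, label
```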
Flowlabel
RFC5082: Generalized Ttl Security Mechanism
IPV6_MTU_DISCOVER values
same as IPV6_PMTUDISC_PROBE, provided for symmetry with IPv4
also see comments on IP_PMTUDISC_INTERFACE
weaker version of IPV6_PMTUDISC_INTERFACE, which allows packets to
get fragmented if they exceed the interface mtu
These definitions are obsolete
Advanced API (RFC3542) (1)
Note: IPV6_RECVRTHDRDSTOPTS does not exist.
Advanced API (RFC3542) (2)
RFC 5570
home address option
IPv6 TLV options.
IPX options
These need to appear somewhere around here
Proxy original addresses
Always DF
IP_MTU_DISCOVER values
Never send DF frames
Always use interface mtu (ignores dst pmtu) but don’t set DF flag.
Also incoming ICMP frag_needed notifications will be ignored on
this socket to prevent accepting spoofed ones.
Ignore dst pmtu
Use per route hints
BSD compatibility
c_lflag bits
Names of the interval timers, and structure defining a timer setting:
Comparison type - enum kcmp_type.
BSD process accounting parameters
int: flags for setting up video after ACPI sleep
int: boot loader type
int: PID of the process to notify on CAD
int: print compat layer messages
string: pattern for core-file names
int: use core or core.%pid
int: allow ctl-alt-del to reboot
string: domainname
string: path to uevent helper (deprecated)
int: hppa soft-power enable
int: hppa unaligned-trap enable
int: hz timer on or off
int: ia64 unaligned userland trap enable
int: unimplemented ieee instructions
int: rtmutex’s maximum lock depth
int: Maximum nr of threads in the system
string: modprobe path
int: Maximum size of a message
int: Maximum message queue size
int: msg queue identifiers
int: Maximum system message pool size
Name translation
int: NGROUPS_MAX
int: enable/disable nmi watchdog
string: hostname
string: system release
int: system revision
CTL_KERN names:
int: overflow GID
int: overflow UID
int: panic timeout
int: whether we will panic on an unrecovered
int: whether we will panic on an oops
int: call panic() in WARN() functions
ulong: bitmask to print system info on panic
int: PID # limit
turn htab reclamation on/off on PPC
l2cr register on PPC
use nap mode for power saving
turn idle page zeroing on/off on PPC
struct: control printk logging parameters
int: tune printk ratelimiting
int: tune printk ratelimiting
table: profiling information
dir: pty driver
Random driver
int: randomize virtual address space
real root device to mount after initrd
Max queuable
Number of rt sigs queued
int: dumps of user faults
struct: maximum rights mask
struct: sysv semaphore limits
int: behaviour of dumps for setuid core
int: sg driver reserved buffer size
int: Maximum size of shared memory
long: Maximum shared memory segment
int: shm array identifiers
string: path to shm fs
reboot command on Sparc
int: serial console power-off halt
int: Sparc Stop-A enable
int: number of spinlock retries
int: Sysreq enable
int: various kernel tainted flags
int: unknown nmi panic flag
string: compile time info
These values match the ELF architecture values.
Unless there is a good reason that should continue to be the case.
Kexec file load interface flags.
kexec system call - It loads the new kernel to boot into.
kexec does not sync, or unmount filesystems so if you need
that to happen you need to do that yourself.
kexec flags for different usage scenarios
The artificial cap on the number of segments passed to kexec_load.
Key is built into kernel
Override the check on restricted keyrings
add to quota, reject if would overrun
not in quota
add to quota, permit even if overrun
allocating a user or user session keyring
authentication token / access credential / keyring
set if key is built in to the kernel
set if key type has been deleted
set if key has been invalidated
set if key consumes quota
set if key should not be removed
set if key had been revoked
set if key can be cleared by root without permission
set if key can be invalidated by root without permission
set if key is a user or user session keyring
set if key is being constructed in userspace
group permissions…
Positively instantiated
All the above permissions
Require permission to link
Require permission to read content
Require permission to search (keyring) or find (key)
Require permission to change attributes
The permissions required on a key that we’re looking up.
Require permission to view attributes
Require permission to update / modify
third party permissions…
possessor can create a link to a key/keyring
possessor can read key payload / view keyring
possessor can find a key in search / search a keyring
possessor can set key attributes
possessor can view a key’s attributes
possessor can update key payload / add link to keyring
user permissions…
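The possessor, user, group and third party permissions listed above share one 32-bit mask, one byte per class. The sketch below decodes such a mask; the bit values and shifts are assumptions based on the kernel's key permission constants (KEY_POS_*, KEY_USR_*, KEY_GRP_*, KEY_OTH_*), and the function name is hypothetical.

```python
# Sketch only: per-class bit values and byte positions are assumed from
# the kernel's key permission constants.
PERMISSION_BITS = {
    0x01: "view",
    0x02: "read",
    0x04: "write",
    0x08: "search",
    0x10: "link",
    0x20: "setattr",
}

CLASS_SHIFTS = {
    "possessor": 24,
    "user": 16,
    "group": 8,
    "other": 0,
}


def describe_key_perm(perm):
    """Map each permission class to the list of rights set in `perm`."""
    result = {}
    for cls, shift in CLASS_SHIFTS.items():
        bits = (perm >> shift) & 0x3F
        result[cls] = [name for bit, name in PERMISSION_BITS.items() if bits & bit]
    return result
```

Calling `describe_key_perm(0x3f010000)` reports all six rights for the possessor and only "view" for the user, which makes a raw mask easier to read when debugging.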
ldt.h
Definitions of structures used with the
modify_ldt system call.
Maximum number of LDT entries supported.
The size of each LDT entry.
links a file may have
Backwardly compatible definition for source code - trapped in a
32-bit world. If you find you need this, please consider using
libcap to untrap yourself…
User-level do most of the mapping between kernel and user
capabilities based on the version tag given by the kernel. The
kernel might be somewhat backwards compatible, but don’t bet on it.
Note, cap_t, is defined by POSIX (draft) to be an “opaque” pointer to
a set of three capability sets. The transposition of 3*the
following structure to such a composite is better handled in a user
library since the draft standard requires the use of malloc/free etc..
deprecated - use v3
Commands accepted by the _reboot() system call.
Magic values required to use _reboot() system call.
exclusive lock
This is a mandatory flock …
or’d with one of the above to prevent blocking
which allows concurrent read operations
which allows concurrent read & write ops
operations for bsd flock(), also used by the kernel implementation
shared lock
remove lock
which allows concurrent write operations
deactivate these pages
Clear the MADV_DONTDUMP flag
do inherit across fork
Explicitly exclude from the core dump, overrides the coredump filter bits
don’t inherit across fork
don’t need these pages
common parameters: try to keep these consistent across architectures
free pages only if memory pressure
Worth backing with hugepages
poison a page for testing
Undo MADV_WIPEONFORK
KSM may merge identical pages
Not worth backing with hugepages
no further special treatment
reclaim these pages
expect random page references
remove these pages & resources
expect sequential page references
soft offline page for testing
KSM may not merge identical pages
will need these pages
Zero memory on fork, child only
don’t use a file
ETXTBSY
mark it as an executable
compatibility flags
Interpret addr exactly
MAP_FIXED which doesn’t unmap underlying mapping
From uapi/asm-generic/mman.h
stack-like segment
create a huge page mapping
Huge page size encoding when MAP_HUGETLB is specified, and a huge page
size other than the default is desired. See hugetlb_encode.h.
All known huge page size encodings are provided here. It is the
responsibility of the application to know which sizes are supported on
the running system. See mmap(2) man page for details.
pages are locked
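The huge page size encoding mentioned for MAP_HUGETLB stores the base-2 logarithm of the page size in a small bit field of the flags word. The sketch below shows the arithmetic only; the shift and mask values are assumptions based on hugetlb_encode.h, and the function names are hypothetical.

```python
# Sketch only: shift and mask assumed from the kernel's hugetlb_encode.h.
HUGE_SHIFT = 26   # position of the size field inside the flags word
HUGE_MASK = 0x3F  # the field is six bits wide


def encode_huge_page_size(size_bytes):
    """Return the flag bits that request a huge page of `size_bytes`."""
    if size_bytes <= 0 or size_bytes & (size_bytes - 1):
        raise ValueError("huge page size must be a power of two")
    log2_size = size_bytes.bit_length() - 1
    if log2_size > HUGE_MASK:
        raise ValueError("size does not fit in the encoding field")
    return log2_size << HUGE_SHIFT


def decode_huge_page_size(flags):
    """Return the encoded page size in bytes, or None for the default size."""
    log2_size = (flags >> HUGE_SHIFT) & HUGE_MASK
    return None if log2_size == 0 else 1 << log2_size
```

Whether a given size is actually available still depends on the running system, as the text above notes; the encoding itself does not check that.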
do not block on IO
don’t check for reservations
0x0100 - 0x4000 flags are defined in asm-generic/mman.h
populate (prefault) pagetables
Changes are private
Share changes
share + validate extension flags
give out an address that is best suited for process/thread stacks
perform synchronous page faults for the mapping
0x01 - 0x03 are defined in linux/mman.h
Mask for type of mapping
For anonymous mmap, memory could be uninitialized
max frequency error (ns/s)
max phase error (ns)
max interval between updates (s)
maximum time constant (shift)
BPF has 10 general purpose 64-bit registers and stack frame.
size of the canonical input queue
size of the type-ahead buffer
MAX_SWAPFILES defines the maximum number of swaptypes: things which can
be swapped to. The swap type and the offset into that swap type are
encoded into pte’s and into pgoff_t's in the swapcache. Using five bits
for the type means that the maximum number of swapcache pages is 27 bits
on 32-bit-pgoff_t architectures. And that assumes that the architecture packs
the type/offset into the pte as 5/27 as well.
lock all current mappings
lock all future mappings
lock all pages that are faulted in
enum membarrier_cmd - membarrier system call command
@MEMBARRIER_CMD_QUERY: Query the set of supported commands. It returns
a bitmask of valid commands.
@MEMBARRIER_CMD_GLOBAL: Execute a memory barrier on all running threads.
Upon return from system call, the caller thread
is ensured that all running threads have passed
through a state where all memory accesses to
user-space addresses match program order between
entry to and return from the system call
(non-running threads are de facto in such a
state). This covers threads from all processes
running on the system. This command returns 0.
@MEMBARRIER_CMD_GLOBAL_EXPEDITED:
Execute a memory barrier on all running threads
of all processes which previously registered
with MEMBARRIER_CMD_REGISTER_GLOBAL_EXPEDITED.
Upon return from system call, the caller thread
is ensured that all running threads have passed
through a state where all memory accesses to
user-space addresses match program order between
entry to and return from the system call
(non-running threads are de facto in such a
state). This only covers threads from processes
which registered with
MEMBARRIER_CMD_REGISTER_GLOBAL_EXPEDITED.
This command returns 0. Given that
registration is about the intent to receive
the barriers, it is valid to invoke
MEMBARRIER_CMD_GLOBAL_EXPEDITED from a
non-registered process.
@MEMBARRIER_CMD_REGISTER_GLOBAL_EXPEDITED:
Register the process intent to receive
MEMBARRIER_CMD_GLOBAL_EXPEDITED memory
barriers. Always returns 0.
@MEMBARRIER_CMD_PRIVATE_EXPEDITED:
Execute a memory barrier on each running
thread belonging to the same process as the current
thread. Upon return from system call, the
caller thread is ensured that all its running
threads siblings have passed through a state
where all memory accesses to user-space
addresses match program order between entry
to and return from the system call
(non-running threads are de facto in such a
state). This only covers threads from the
same process as the caller thread. This
command returns 0 on success. The
“expedited” commands complete faster than
the non-expedited ones, they never block,
but have the downside of causing extra
overhead. A process needs to register its
intent to use the private expedited command
prior to using it, otherwise this command
returns -EPERM.
@MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED:
Register the process intent to use
MEMBARRIER_CMD_PRIVATE_EXPEDITED. Always
returns 0.
@MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE:
In addition to providing the memory ordering
guarantees described in
MEMBARRIER_CMD_PRIVATE_EXPEDITED, ensure,
upon return from the system call, that all
running sibling threads of the caller
have executed a core serializing
instruction. (architectures are required to
guarantee that non-running threads issue
core serializing instructions before they
resume user-space execution). This only
covers threads from the same process as the
caller thread. This command returns 0 on
success. The “expedited” commands complete
faster than the non-expedited ones, they
never block, but have the downside of
causing extra overhead. If this command is
not implemented by an architecture, -EINVAL
is returned. A process needs to register its
intent to use the private expedited sync
core command prior to using it, otherwise
this command returns -EPERM.
@MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_SYNC_CORE:
Register the process intent to use
MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE.
If this command is not implemented by an
architecture, -EINVAL is returned.
Returns 0 on success.
@MEMBARRIER_CMD_SHARED:
Alias to MEMBARRIER_CMD_GLOBAL. Provided for
header backward compatibility.Alias for header backward compatibility.
flags for memfd_create(2) (unsigned int)
Huge page size encoding when MFD_HUGETLB is specified, and a huge page
size other than the default is desired. See hugetlb_encode.h.
All known huge page size encodings are provided here. It is the
responsibility of the application to know which sizes are supported on
the running system. See mmap(2) man page for details.
min interval between updates (s)
Flags for mlock
Lock pages in range after they are faulted in, do not prefault
NTP userland likes the MOD_ prefix better
Policies
look up vma using address
preferred local allocation
return allowed memories
this policy wants migrate on fault
Migrate On protnone Reference On Node
Flags for get_mempolicy
return next IL mode instead of node mask
Internal flags that share the struct mempolicy flags word with
“mode flags”. These flags are allocated from bit 0 up, as they
are never OR’ed into the mode in mempolicy API arguments.
identify shared policies
Flags for set_mempolicy
always last member of enum
Internal flags start here
Modifies ’_MOVE: lazy migrate on fault
Move pages owned by this process to conform to policy
Move every page to conform to policy
Flags for mbind
Verify existing pages in the mapping
MPOL_MODE_FLAGS is the union of all possible optional mode flags passed to
either set_mempolicy() or mbind().
per-uid limit of kernel memory used by mqueue, in bytes
Parameters used to convert the timespec values:
number of entries in message map
<= INT_MAX, max size of message (bytes)
<= INT_MAX, default max size of a message queue
MSGMNI, MSGMAX and MSGMNB are default values which can be
modified by sysctl.
unused
max no. of segments
message segment size
number of system message headers
sendmmsg(): more messages coming
Set close_on_exec for file descriptor received through SCM_RIGHTS
We never have 32 bit fixups
Confirm path validity
copy (not remove) all queue messages
Nonblocking io
End of record
Fetch message from error queue
recv any msg except of specified type.
Send data in TCP SYN
Sender will send more
msgrcv options
no error if message is too big
Do not generate SIGPIPE
sendpage() internal : page frags are not shared
Flags we can use with send/ and recv.
Added those for 1003.1g not all are supported yet
Do not send. Only probe path f.e. for MTU
sendpage() internal : page may carry plain text and require encryption
sendpage() internal : do not apply policy
sendpage() internal : not the last page
ipcs ctl commands
Synonym for MSG_DONTROUTE for DECnet
Wait for a full request
recvmmsg(): block until 1+ packets avail
Use user data in kernel path
sync memory asynchronously
Directory modifications are synchronous
invalidate the caches
Update inode I_version field
this is a kern_mount call
Update the on-disk [acm]times lazily
Allow mandatory locks on an FS
Old magic mount flag and mask
Do not update access times.
Disallow access to device special files
Do not update directory access times
Disallow program execution
Ignore suid and sgid bits
VFS does not apply the umask
change to private
These are the fs-independent mount-flags: up to 32 flags are supported
Update atime relative to mtime/ctime.
Alter flags of a mounted FS
Superblock flags that can be altered by MS_REMOUNT
change to shared
change to slave
Always perform atime updates
These sb flags are internal to the kernel
synchronous memory sync
Writes are synced at once
change to unbindable
MS_VERBOSE is deprecated.
chars in a file name
/proc/sys/net/appletalk
/proc/sys/net/ax25
/proc/sys/net/bridge
CTL_NET names:
was NET_CORE_DESTROY_DELAY
/proc/sys/net/core
/proc/sys/net/dccp
/proc/sys/net/decnet/conf//
/proc/sys/net/decnet/conf/
/proc/sys/net/decnet/
/proc/sys/net/ipv4
v2.0 compatible variables
/proc/sys/net/ipv4/netfilter
obsolete since 2.6.38
obsolete since 2.6.25
obsolete since 2.6.25
/proc/sys/net/ipv6
/proc/sys/net/ipv6/icmp
/proc/sys/net/ipx
/proc/sys/net/llc
/proc/sys/net/llc/llc2/timeout
/proc/sys/net/llc/llc2
/proc/sys/net/llc/station
/proc/sys/net//neigh/
/proc/sys/net/netrom
/proc/sys/net/netfilter
/proc/sys/net/rose
/proc/sys/net/sctp
/proc/sys/net/token-ring
/proc/sys/net/ethernet
/proc/sys/net/802
/proc/sys/net/unix
/proc/sys/net/x25
supplemental group IDs are available
SIGEV_THREAD implementation:
NET: An implementation of the SOCKET network access protocol.
This is the master header file for the Linux NET layer,
or, in plain English: the networking handling part of the kernel.
number of available namespaces
NTP API version
c_oflag bits
set close_on_exec
not fcntl
direct disk access hint
must be a directory
used to be O_SYNC, see below
not fcntl
not fcntl
don’t follow links
a horrid kludge trying to make sure that this will fail on old kernels
not fcntl
chars in a path name including nul
sizeof first published struct
add: config2
add: branch_sample_type
add: sample_regs_user
add: sample_stack_user
add: sample_regs_intr
add: aux_watermark
sample collided with another
snapshot from overwrite mode
record contains gaps
PERF_RECORD_AUX::flags bits
record was truncated to fit
function call
conditional
conditional function call
conditional function return
indirect
indirect function call
function return
syscall
syscall return
unconditional
Common flow change classification
unknown
O_CLOEXEC
pid=cgroup id, per-cpu mode only
Ioctls that can be done on a perf event fd:
locked transaction
locked instruction
not available
5-0xa available
Any cache
hit level
I/O memory
Line Fill Buffer
Local DRAM
miss level
memory hierarchy (memory level, hit or miss)
not available
Remote Cache (1 hop)
Remote Cache (2 hops)
Remote DRAM (1 hop)
Remote DRAM (2 hops)
Uncached memory
code (execution)
load instruction
type of opcode (load/store/prefetch,code)
not available
prefetch
store instruction
Remote
forward
1 free
snoop hit
snoop hit modified
snoop miss
snoop mode
not available
no snoop
hit level
miss level
TLB access
not available
OS fault handler
Hardware Walker
These PERF_RECORD_MISC_* flags below are safely reused
for the following events:
Reserve the last bit to indicate some extended misc field
Following PERF_RECORD_MISC_* are used on different
events, so can reuse the same bit position:
Indicates that /proc/PID/maps parsing was truncated by a time out.
bits 32..63 are reserved for the abort code
Instruction not related
Capacity read abort
Capacity write abort
Conflict abort
Values for the memory transaction event qualifier, mostly for
abort events. Multiple bits can be set.
From elision
non-ABI
Retry possible
Instruction is related
From transaction
Security-relevant compatibility flags that must be
cleared upon setuid or setgid exec:
IRIX5 32-bit
IRIX6 64-bit
IRIX6 new 32-bit
Personality types.
OSF/1 v4
Protocol families, same as address families.
bytes in atomic write to a pipe
The clock frequency of the i8253/i8254 PIT
currently only for epoll
These are specified by iBCS2
The rest seem to be more-or-less nonstandard. Check them!
i/o error
device disconnected
SIGPOLL (or any other signal without signal specific si_codes) si_codes
data input available
input message available
output buffers available
high priority input available
Oxford Semiconductor
usurped by cyclades.c
RSA-DV II/S card
usurped by cyclades.c
These are the supported serial types.
Don’t need these pages.
Data will be accessed once.
No further special treatment.
Expect random page references.
Expect sequential page references.
Will need these pages.
page can be executed
mprotect flag: extend change to start of growsdown vma
mprotect flag: extend change to end of growsup vma
0x10 reserved for arch-specific use
0x20 reserved for arch-specific use
page can not be accessed
From uapi/asm-generic/mman-common.h
page can be read
page may be used for atomic ops
page can be written
Get/set the capability bounding set (as per security/commoncap.c)
Control the ambient capability set
True little endian mode
PowerPC pseudo little endian
silently emulate fp operations accesses
don’t emulate fp operations, send SIGFPE instead
async recoverable exception mode
FP exceptions disabled
floating point divide by zero
floating point invalid operation
async non-recoverable exc. mode
floating point overflow
precise exception mode
floating point inexact result
Use FPEXC for FP exception enables
floating point underflow
64b FP registers
32b compatibility
Get/set current->mm->dumpable
Get/set process endian
Get/set floating-point emulation control bits (if meaningful)
Get/set floating-point exception mode (if meaningful)
Get/set whether or not to drop capabilities on setuid() away from
uid 0 (as per security/commoncap.c)
Get process name
Second arg is a ptr to return the signal
Get/set process seccomp mode
Get/set securebits (as per security/commoncap.c)
Per task speculation control
Get/set whether we use statistical process timing or accurate timestamp
based process timing
Get/set the process’ ability to use the timestamp counter instruction
Get/set unaligned access control bits (if meaningful)
Set early/late kill mode for hwpoison memory corruption.
This influences when the process gets killed on a memory corruption.
No longer implemented, but left here to ensure the numbers stay reserved:
Reset arm64 pointer authentication keys
Control reclaim behavior when allocating memory
Tune up process memory map specifics.
Set process name
If no_new_privs is set, then operations that grant new privileges (i.e.
execve) will either fail or not grant them. This affects suid/sgid,
file capabilities, and LSMs.
Values to pass as first argument to prctl()
Second arg is a signal
Set specific pid that is allowed to ptrace the current task.
A value of 0 means “no process”.
Tagged user address controls for arm64
Get/set the timerslack as used by poll/select/nanosleep
A value of 0 means “use default”
Return and control values for PR_SET/GET_SPECULATION_CTRL
Speculation control variants
get task vector length
arm64 Scalable Vector Extension controls
Flag values must be kept in sync with ptrace NT_ARM_SVE interface
set task vector length
defer effect until exec
inherit across exec
Bits common to PR_SVE_SET_VL and PR_SVE_GET_VL
Normal, traditional, statistical process timing
Accurate timestamp based process timing
allow the use of the timestamp counter
throw a SIGSEGV instead of reading the TSC
silently fix up unaligned user accesses
generate SIGBUS on unaligned user access
These values are stored in task->ptrace_message
by tracehook_report_syscall_* to describe the current syscall-stop.
Wait extended result codes for the above trace options.
Extended result codes which are enabled by means other than options.
Arbitrarily choose the same ptrace numbers as used by the Sparc code.
Generic ptrace interface that exports the architecture specific regsets
using the corresponding NT_* types (which are also used in the core dump).
Please note that the NT_PRSTATUS note type in a core dump contains a full
struct elf_prstatus. But the user_regset for NT_PRSTATUS contains just the
elf_gregset_t that is the pr_reg field of struct elf_prstatus. For all the
other user_regset flavors, the user_regset layout and the ELF core dump note
payload are exactly the same layout.
only useful for access 32bit programs / kernels
eventless options
Options set using PTRACE_SETOPTIONS or using PTRACE_SEIZE @data param
Read signals from a shared (process wide) queue
0x4200-0x4300 are reserved for architecture-independent additions.
resume execution until next branch
/proc/sys/kernel/pty
First argument to waitid:
Quota format type IDs
Quota structure used for communication with userspace via quotactl
Following flags are used to specify which fields are valid
Size of block in which space limits are passed through the quota interface
Masks for quota types when used as a bitmask
Usage got below block hardlimit
Block hardlimit reached
Usage got below block softlimit
Block grace time expired
Block softlimit reached
Usage got below inode hardlimit
Inode hardlimit reached
Usage got below inode softlimit
Inode grace time expired
Inode softlimit reached
Definitions for quota netlink interface
get quota format used on given filesystem
get information about quota files
get disk limits and usage >= ID
get user quota structure
turn quotas off
turn quotas on
set information about quota files
set user quota structure
sync disk copy of a filesystems quotas
/proc/sys/kernel/random
Exchange source and dest
Don’t overwrite target
Whiteout source
address space limit
max core file size
Resource limit IDs
max data size
Maximum filesize
maximum file locks held
max locked-in-memory address space
maximum bytes in POSIX mqueues
max nice prio allowed to raise to 0-39 for nice level 19 .. -20
max number of open files
max number of processes
max resident set size
maximum realtime priority
timeout for RT tasks in us
max number of pending signals
max stack size
SuS says limits have to be unsigned.
Which makes a ton more sense anyway.
This limit protects against a deliberately circular list.
(Not worth introducing an rlimit for it)
sys_wait4() uses this
Resource control/accounting header file for linux
Definition of struct rusage taken from BSD 4.3 Reno
only the calling thread
per-IO O_APPEND
per-IO O_DSYNC
high priority request, poll if possible
per-IO, return -EAGAIN if operation would block
mask of flags supported by the kernel
per-IO O_SYNC
(1U << 31) is reserved for signed error codes
Check file is readable.
non-uapi in-kernel
SA_FLAGS for those indicates ABI for a signal frame.
SA_FLAGS values:
New architectures should not define the obsolete SA_RESTORER 0x04000000
sizeof first published struct
add: util_{min,max}
For the sched_{set,get}attr() calls
SCHED_ISO: reserved but not implemented yet
Scheduling policies
Can be ORed in to make sure the process is reverted back to SCHED_NORMAL on fork
rw: struct ucred
Ancillary data object information MACROS
Table 5-14 of POSIX 1003.1g
“Socket”-level control message types:
rw: access rights (array of int)
rw: security label
Valid flags for SECCOMP_SET_MODE_FILTER
Valid values for seccomp.mode and prctl()
seccomp is not in use.
PR_SET_SECCOMP, uses user-supplied filter.
uses hard-coded filter.
Masks for the return value sections.
allow
returns an errno
All BPF programs must return a 32-bit value.
The bottom 16-bits are for optional return data.
The upper 16-bits are ordered from least permissive values to most,
as a signed value (so 0x80000000 is negative).
kill the thread
allow after logging
pass to a tracer or disallow
disallow and force a SIGSYS
notifies userspace
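The split of a seccomp filter return value into an action (upper 16 bits) and optional data (bottom 16 bits), and the signed ordering from least to most permissive, can be sketched as follows. The mask and action values are assumptions based on the kernel's seccomp UAPI header; the shortened action names and the function names are hypothetical.

```python
# Sketch only: values assumed from the kernel's linux/seccomp.h.
SECCOMP_RET_ACTION_FULL = 0xFFFF0000  # upper 16 bits: the action
SECCOMP_RET_DATA = 0x0000FFFF         # bottom 16 bits: optional data

ACTIONS = {
    0x80000000: "KILL_PROCESS",
    0x00000000: "KILL_THREAD",
    0x00030000: "TRAP",
    0x00050000: "ERRNO",
    0x7FC00000: "USER_NOTIF",
    0x7FF00000: "TRACE",
    0x7FFC0000: "LOG",
    0x7FFF0000: "ALLOW",
}


def split_return(ret):
    """Split a 32-bit filter return value into (action, data)."""
    return ret & SECCOMP_RET_ACTION_FULL, ret & SECCOMP_RET_DATA


def as_signed32(value):
    """Reinterpret an unsigned 32-bit value as a signed one."""
    return value - (1 << 32) if value & 0x80000000 else value


def least_permissive_first():
    """Order the actions the way the signed comparison orders them."""
    return [ACTIONS[a] for a in sorted(ACTIONS, key=as_signed32)]
```

For an errno-returning filter, `split_return(0x00050000 | 13)` yields the ERRNO action together with the errno value 13 carried in the data bits.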
Valid operations for seccomp syscall.
seek relative to current file position
seek to the next data
seek relative to end of file
seek to the next hole
seek relative to beginning of file
ADI not enabled for mapped object
invalid permissions for mapped object
Disrupting MCD error
Precise MCD exception
failed address bound checks
SIGSEGV si_codes
address not mapped to object
failed protection key checks
adjust on exit max value
number of entries in semaphore map
SEMMNI, SEMMSL and SEMMNS are default values which can be
modified by sysctl.
The values have been chosen to be larger than necessary for any
known configuration.
<= INT_MAX max # of semaphores in system
num of undo structures system wide
<= INT_MAX max num of semaphores per id
<= 1 000 max num of ops per semop call
unused
max num of undo entries per process
sizeof struct sem_undo
<= 32767 semaphore maximum value
ipcs ctl cmds
semop flags
undo the operation on exit
If enabled
Logical level for RTS pin after sent
Logical level for RTS pin when sending
Enable bus termination (if supported)
set all semval’s
set semval
FLL frequency factor (shift)
SHIFT_PLL is used as a dampening factor to define how much we
adjust the frequency correction for a given offset in PLL mode.
It is also used in dampening the offset correction, to define how
much of the current value in time_offset we correct for each
second. Changing this value changes the stiffness of the ntp
adjustment code. A lower value makes it more flexible, reducing
NTP convergence time. A higher value makes it stiffer, increasing
convergence time, but making the clock more stable.
SHIFT_USEC defines the scaling (shift) of the time_freq and
time_tolerance variables, which represent the current frequency
offset and maximum frequency tolerance.
frequency offset scale (shift)
max shm system wide (pages)
max shared seg size (bytes)
SHMMNI, SHMMAX and SHMALL are default upper limits which can be
modified by sysctl. The SHMMAX and SHMALL values have been chosen to
be as large as possible without facilitating scenarios where userspace
causes overflows when adjusting the limits via operations of the form
“retrieve current limit; add X; update limit”. It is therefore not
advised to make SHMMAX and SHMALL any larger. These limits are
suitable for both 32 and 64-bit systems.
min shared seg size (bytes)
max num of segs system wide
max shared segs per process
execution access
Bits 9 & 10 are IPC_CREAT and IPC_EXCL
segment will use huge TLB pages
Huge page size encoding when SHM_HUGETLB is specified, and a huge page
size other than the default is desired. See hugetlb_encode.h
super user shmctl commands
don’t check for reservations
shmget() shmflg values.
The bottom nine bits are the same as open(2) mode flags
or S_IRUGO from <linux/stat.h>
shmat() shmflg values
read-only access
take-over region on attach
round attach address to SHMLBA boundary
ipcs ctl commands
or S_IWUGO from <linux/stat.h>
enum sock_shutdown_cmd - Shutdown types:
Shutdown receptions/transmissions
Shutdown transmissions
other notification: meaningless
sigevent definitions
deliver via thread creation
deliver to thread
These should not be considered constants from userland.
for blocking signals
default signal handling
error return from signal
ignore signal
for setting the signal mask
for unblocking signals
Get stamp (timeval)
Get stamp (timespec)
sent by AIO completion
sent by glibc async name lookup completion
sent by execve() killing subsidiary threads
sent by the kernel from somewhere
sent by real time mesq state change
sent by sigqueue
sent by queued SIGIO
sent by timer expiration
sent by tkill system call
How these fields are to be accessed.
si_code values
Digital reserves positive values for kernel-generated signals.
sent by kill, sigsend, raise
Historically, SOCKWQ_ASYNC_NOSPACE & SOCKWQ_ASYNC_WAITDATA were located
in sock->flags, but moved into sk->sk_wq->flags to be RCU protected.
Eventually all flags will be in sk->sk_wq->flags.
Flags for socket, socketpair, accept4
Datagram Congestion Control Protocol socket
Datagram (conn.less) socket
Linux specific way of getting packets at the dev level.
Raw socket
Reliably-delivered message
sequential packet socket
Structure describing an Internet (IP) socket address.
sizeof(struct sockaddr)
enum sock_type - Socket types
For writing rarp and other similar things on the user level.
Mask which covers at least up to SOCK_MASK - 1.
The remaining bits are used as flags.
ATM Adaptation Layer (packet level)
ATM layer (cell level)
setsockoptions(2) level. Thanks to BSD these must match IPPROTO_xxx
For setsockopt(2)
#define SOL_ICMP 1 No-no-no! Due to Linux :-) we cannot use SOL_ICMP=1
UDP-Lite (RFC 3828)
Maximum queue length specifiable by listen.
Socket filtering
Instruct lower device to use last 4-bytes of skb data as FCS
powerpc only differs in these
Security levels - as per NRL IPv6 - don’t actually do anything
on 64-bit and x32, avoid the ?: operator
sqe->splice_flags
pages passed in are a gift
expect more data
Flags passed in from splice/tee/vmsplice
move pages instead of copying
don’t block on the pipe splicing (but we may still block on the fd we splice from/to, of course)
bit-flags
disable sas during sighandling
connected to socket
in process of connecting
in process of disconnecting
mask for all SS_xxx flags
not allocated
unconnected to any socket
All currently supported flags
Want/got stx_atime
[I] File is append-only
Dir: Automount trigger
Attributes to be found in stx_attributes and masked in stx_attributes_mask.
[I] File requires key to decrypt in fs
[I] File is marked immutable
[I] File is not to be dumped
The stuff in the normal stat struct
Want/got stx_blocks
Want/got stx_btime
Want/got stx_ctime
Want/got stx_gid
Want/got stx_ino
Want/got stx_mode & ~S_IFMT
Want/got stx_mtime
Want/got stx_nlink
Want/got stx_size
Flags to be stx_mask
Want/got stx_uid
Reserved for future struct statx expansion
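The "Want/got" entries above describe bits that are both requested in the statx mask argument and reported back in `stx_mask`. As a minimal sketch of how a caller might check which fields were actually filled in: the bit values below are assumptions taken from the Linux statx header, and `got` is a hypothetical helper.

```rust
// Assumed bit values from the Linux statx UAPI header.
const STATX_MODE: u32 = 0x0002;
const STATX_MTIME: u32 = 0x0040;
const STATX_SIZE: u32 = 0x0200;
const STATX_BASIC_STATS: u32 = 0x07ff; // the stuff in the normal stat struct
const STATX_BTIME: u32 = 0x0800;

/// Hypothetical helper: true if every bit in `wanted` is set in the
/// `stx_mask` value returned by the kernel.
fn got(stx_mask: u32, wanted: u32) -> bool {
    stx_mask & wanted == wanted
}

fn main() {
    // Pretend the filesystem returned only the basic stats.
    let stx_mask = STATX_BASIC_STATS;
    println!("size and mtime valid: {}", got(stx_mask, STATX_SIZE | STATX_MTIME));
    println!("mode valid: {}", got(stx_mask, STATX_MODE));
    println!("birth time valid: {}", got(stx_mask, STATX_BTIME));
}
```

Checking the returned mask matters because a filesystem may be unable to supply a requested field, in which case its bit is left clear.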
clock source (0 = A, 1 = B) (ro)
clock hardware fault (ro)
delete leap (rw)
select frequency-lock mode (rw)
hold frequency (rw)
insert leap (rw)
mode (0 = PLL, 1 = FLL) (ro)
resolution (0 = us, 1 = ns) (ro)
Status codes (timex.status)
enable PLL updates (rw)
PPS signal calibration error (ro)
enable PPS freq discipline (rw)
PPS signal jitter exceeded (ro)
PPS signal present (ro)
enable PPS time discipline (rw)
PPS signal wander exceeded (ro)
read-only bits
clock unsynchronized (rw)
Command definitions for the ‘quotactl’ system call.
The commands are broken into a main command defined below
and a subcommand that is used to convey the type of
quota that is being manipulated (see above).
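The split into a main command and a subcommand described above can be sketched as follows. This assumes the shift and mask values from the Linux quota header (`SUBCMDSHIFT` of 8, `SUBCMDMASK` of 0xff) and the usual `Q_GETQUOTA`, `USRQUOTA`, and `GRPQUOTA` values; `qcmd` is an illustrative stand-in for the C `QCMD` macro, not a function provided by these bindings.

```rust
// Assumed values from the Linux quota UAPI header.
const SUBCMDMASK: u32 = 0x00ff;
const SUBCMDSHIFT: u32 = 8;
const Q_GETQUOTA: u32 = 0x80_0007;
const USRQUOTA: u32 = 0;
const GRPQUOTA: u32 = 1;

/// Illustrative equivalent of the C QCMD(cmd, type) macro: the main
/// command goes in the upper bits, the quota type in the low byte.
const fn qcmd(cmd: u32, quota_type: u32) -> u32 {
    (cmd << SUBCMDSHIFT) | (quota_type & SUBCMDMASK)
}

fn main() {
    println!("user quota query:  {:#x}", qcmd(Q_GETQUOTA, USRQUOTA));
    println!("group quota query: {:#x}", qcmd(Q_GETQUOTA, GRPQUOTA));
}
```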
One swap address space for each 64M swap space
Special value in each swap_map continuation.
enable discard for swap
discard swap area at swapon-time
discard page-clusters after use
set if swap priority specified
Bit flag in swap_map.
Note page is bad
Special value in first swap_map.
Owned by shmem/tmpfs
sys_accept4(2)
sys_accept(2)
sys_bind(2)
sys_connect(2)
sys_getpeername(2)
sys_getsockname(2)
sys_getsockopt(2)
sys_listen(2)
sys_recvfrom(2)
sys_recvmmsg(2)
sys_recvmsg(2)
sys_recv(2)
SIGSYS si_codes
seccomp triggered
Return from SYS_SECCOMP as it is already used by an syscall num.
sys_sendmmsg(2)
sys_sendmsg(2)
sys_sendto(2)
sys_send(2)
sys_setsockopt(2)
sys_shutdown(2)
sys_socketpair(2)
sys_socket(2)
struct dirent file types exposed to user via getdents(2), readdir(3)
0x54 is just a magic number to make these relatively unique (‘T’)
SYS5 TCGETX compatibility
tcflush() and TCFLSH use these
tcflow() and TCXONC use these
ECN was negotiated at TCP session init
we received at least one packet with ECT
SYN-ACK acked data in SYN sent or rcvd
for TCP_INFO socket option
Set TCP initial congestion window
Set sndcwnd_clamp
Get Congestion Control (optional) info
Congestion control algorithm
Never send partially complete segments
Wake up listener only when data arrive
Enable FastOpen on listeners
Attempt FastOpen with connect
Set the key for Fast Open (cookie)
Enable TFO without a TFO cookie
Information about this connection.
Notify bytes available to read as a cmsg on read
Number of keepalives before death
Start keepalives after this period
Interval between keepalives
Life time of orphaned FIN-WAIT-2 state
Limit MSS
TCP MD5 Signature (RFC2385)
TCP MD5 Signature with extensions
ifindex set
tcp_md5sig extension flags for TCP_MD5SIG_EXT
address prefix length
for TCP_MD5SIG socket option
TCP general constants
IPv4 (RFC1122, RFC2581)
IPv6 (tunneled), EDNS0 (RFC3226)
TCP socket options
Turn off Nagle’s algorithm.
limit number of unsent bytes in write queue
Block/reenable quick acks
TCP sock is under repair right now
Turn off without window probes
Get/set window parameters
Get SYN headers recorded for connection
Record SYN headers for new connections
Number of SYN retransmits
Fast retrans. after 1 dupack
Use linear timeouts for thin streams
delay outgoing packets by XX usec
Attach a ULP to a TCP connection
How long for loss retry before timeout
Bound advertised window
tcsetattr uses these
Needed for POSIX tcsendbreak()
CAREFUL: Check include/asm-generic/fcntl.h when defining new flags, since they might collide with O_* ones.
Located here for timespec[64]_valid_strict
The various flags for setting POSIX.1b interval timers:
bw compat
delete leap second
clock not synchronized
insert leap second
Clock states (time_state)
clock synchronized, no leap second
leap second in progress
Limits for settimeofday():
leap second has occurred
BSD compatibility
Get primary device node of /dev/console
Get exclusive mode state
read serial port inline interrupt counts
Get packet mode state
Get Pty lock state
Get Pty Number (of pty-mux device)
Safely open the slave
Return the session ID of FD
wait for a change on serial input line(s)
modem lines
Used for packet mode
BSD compatibility
Get line status register
Get multiport config
For debugging only
Set multiport config
Transmitter physically empty
pty: generate signal
Lock/unlock Pty
process taken branch trap
SIGTRAP si_codes
process breakpoint
hardware breakpoint/watchpoint
process trace trap
undiagnosed trap
UIO_MAXIOV shall be at least 16 1003.1g (5.4.1.1)
Flags for bug emulation.
c_cc characters
block dump mode
dirty_background_ratio
dirty_expire_centisecs
dirty_ratio
dirty_writeback_centisecs
int: nuke lots of pagecache
permitted hugetlb group
int: Number of available Huge Pages
vm laptop mode
legacy/compatibility virtual address space layout
reservation ratio for lower memory zones
int: Maximum number of mmaps/address-space
Minimum free kilobytes to maintain
Percent pages ignored by zone reclaim
Set min percent of unmapped pages
nr_pdflush_threads
Turn off the virtual memory safety limit
percent of RAM to allow overcommit in
struct: Control pagebuf parameters
int: set number of pages to swap together
panic at out-of-memory
int: fraction of pages in each percpu_pagelist
Tendency to steal mapped memory
default time for token time out
CTL_VM names:
was: int: Linear or sqrt() swapout for hogs
was: struct: Set free page thresholds
Spare
was: struct: Set buffer memory thresholds
was: struct: Set cache memory thresholds
was: struct: Control kswapd behaviour
was: struct: Set page table cache parameters
map VDSO into new processes?
dcache/icache reclaim pressure
reclaim local zone memory before going off node
Don’t reap, just poll status.
Check file is writable.
set value, fail if attr already exists
Security namespace
size of extended attribute namelist (64k)
chars in an extended attribute name
Namespaces
set value, fail if attr does not exist
size of an extended attribute value (64k)
User return codes for XDP prog type.
Check file is executable.
Most things should be clean enough to redefine this at will, if care is taken to make libc match.
Limit the stack to some sane default: root can always
increase this limit if needed. 8MB seems reasonable.
decimal division by zero
packed decimal error
decimal overflow
invalid ASCII digit
invalid decimal digit
bundle-update (modification) in progress
illegal break
Before Linux 2.6.33 only O_DSYNC semantics were implemented, but using
the O_SYNC flag. We continue to use the existing numerical value
for O_DSYNC semantics now, but using the correct symbolic name for it.
This new value is used to request true POSIX O_SYNC semantics. It is
defined in this strange way to make sure applications compiled against
new headers get at least O_DSYNC semantics on older kernels.
performed a listen
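The O_SYNC note above describes a value that is built from the old O_DSYNC bit plus one new bit. A small sketch of why that gives a fallback on older kernels, assuming the generic header values (O_DSYNC as octal 10000 and the extra bit as octal 4000000; some architectures differ). `old_kernel_view` is a hypothetical model of a kernel that simply does not know the new bit.

```rust
// Assumed values from the generic fcntl header.
const O_DSYNC: u32 = 0o10000;
const O_SYNC_EXTRA_BIT: u32 = 0o4000000; // named __O_SYNC in the C header
const O_SYNC: u32 = O_SYNC_EXTRA_BIT | O_DSYNC;

/// Hypothetical model: the flags as seen by a kernel that predates the
/// extra bit and therefore ignores it.
fn old_kernel_view(flags: u32) -> u32 {
    flags & !O_SYNC_EXTRA_BIT
}

fn main() {
    println!("O_SYNC = {:#o}", O_SYNC);
    // The old kernel still sees the O_DSYNC bit, so data integrity
    // semantics are kept even though full O_SYNC is not available.
    println!("old kernel sees O_DSYNC: {}", old_kernel_view(O_SYNC) == O_DSYNC);
}
```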
Wait on all children, regardless of type
Wait only on non-SIGCHLD children
Don’t wait on children of other threads in this group
Functions
FUTEX_WAKE_OP will perform atomically.
Used to create numbers.
Used to decode ioctl numbers.
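The functions above create and decode ioctl numbers. As a rough sketch of the encoding, assuming the generic Linux layout (8 bits of command number, 8 bits of type, 14 bits of size, 2 bits of direction; powerpc, MIPS and a few others use a different split): the function names below are illustrative, not the names exported by these bindings.

```rust
// Assumed field layout from the generic Linux ioctl header.
const IOC_NRBITS: u32 = 8;
const IOC_TYPEBITS: u32 = 8;
const IOC_SIZEBITS: u32 = 14;

const IOC_NRSHIFT: u32 = 0;
const IOC_TYPESHIFT: u32 = IOC_NRSHIFT + IOC_NRBITS;
const IOC_SIZESHIFT: u32 = IOC_TYPESHIFT + IOC_TYPEBITS;
const IOC_DIRSHIFT: u32 = IOC_SIZESHIFT + IOC_SIZEBITS;

const IOC_NONE: u32 = 0;
const IOC_READ: u32 = 2;

/// Build an ioctl number from direction, type byte, command number and
/// argument size in bytes.
const fn ioc(dir: u32, ty: u8, nr: u8, size: u32) -> u32 {
    (dir << IOC_DIRSHIFT)
        | (size << IOC_SIZESHIFT)
        | ((ty as u32) << IOC_TYPESHIFT)
        | ((nr as u32) << IOC_NRSHIFT)
}

/// Decode the individual fields again.
const fn ioc_dir(n: u32) -> u32 { (n >> IOC_DIRSHIFT) & 0x3 }
const fn ioc_type(n: u32) -> u8 { ((n >> IOC_TYPESHIFT) & 0xff) as u8 }
const fn ioc_nr(n: u32) -> u8 { ((n >> IOC_NRSHIFT) & 0xff) as u8 }
const fn ioc_size(n: u32) -> u32 { (n >> IOC_SIZESHIFT) & ((1 << IOC_SIZEBITS) - 1) }

fn main() {
    // 'T' is the 0x54 magic used by the terminal ioctls.
    let plain = ioc(IOC_NONE, b'T', 0x01, 0);
    let read4 = ioc(IOC_READ, b'T', 0x30, 4);
    println!("{:#x} {:#x}", plain, read4);
    println!(
        "dir={} type={} nr={:#x} size={}",
        ioc_dir(read4), ioc_type(read4) as char, ioc_nr(read4), ioc_size(read4)
    );
}
```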
Definitions of the bits in an Internet address integer.
On subnets, host and network parts are found according
to the subnet mask, not these masks.
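The class masks mentioned above can be illustrated with a short sketch. The mask values and class tests are assumptions taken from the classic `IN_CLASSA`/`IN_CLASSB`/`IN_CLASSC` definitions, applied to an address in host byte order; `classful_net_mask` is a hypothetical helper. As the text notes, on subnetted networks the real network part comes from the subnet mask, not from these masks.

```rust
use std::net::Ipv4Addr;

// Assumed values of the classic class network masks (host byte order).
const IN_CLASSA_NET: u32 = 0xff00_0000;
const IN_CLASSB_NET: u32 = 0xffff_0000;
const IN_CLASSC_NET: u32 = 0xffff_ff00;

/// Hypothetical helper: return the historical classful network mask for
/// an address in host byte order, or None for class D/E addresses.
fn classful_net_mask(addr: u32) -> Option<u32> {
    if (addr & 0x8000_0000) == 0 {
        Some(IN_CLASSA_NET) // leading bit 0
    } else if (addr & 0xc000_0000) == 0x8000_0000 {
        Some(IN_CLASSB_NET) // leading bits 10
    } else if (addr & 0xe000_0000) == 0xc000_0000 {
        Some(IN_CLASSC_NET) // leading bits 110
    } else {
        None
    }
}

fn main() {
    let addr = u32::from(Ipv4Addr::new(172, 16, 5, 9));
    if let Some(mask) = classful_net_mask(addr) {
        println!("network part: {}", Ipv4Addr::from(addr & mask));
    }
}
```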
Type Definitions
anything below here should be completely generic
Most 64-bit platforms use ‘long’, while most 32-bit platforms use ‘__u32’.
Yes, they differ in signedness as well as size.
Special cases can override it for themselves – except for S390x, which
is just a little too special for us. And MIPS, which I’m not touching
with a 10’ pole.
The default si_band type is “long”, as specified by POSIX.
However, some architectures want to override this to “int”
for historical compatibility reasons, so we allow that.
key handle permissions mask
key handle serial number
Type of a SYSV IPC key.
Anything below here should be completely generic.
at least 32 bits
The type of an index into the pagecache.
Type in which we store ids in memory
Type in which we store sizes
Flags for preadv2/pwritev2:
sa_sigaction_fn_t as usize
The type used for indexing onto a disc or disc partition.
Type of a signal handler.
signalfn_t as usize
restorefn_t as usize
Most 32 bit architectures use unsigned int size_t, and all 64 bit architectures use unsigned long size_t.
socket-state enum.
Unions
inputs to lookup
IPv6 address structure
arg for semctl system calls.