"Hahaha, look at how Rust failed here."

    isotopp@infosec.exchange wrote (#1):

    RE: https://infosec.exchange/@lcamtuf/116517194178120536

    "Hahaha, look at how Rust failed here."

    Maybe writing a utility like cp without TOCTOU, race conditions, symlink exploits and the like shouldn't be hard. Maybe copying a file shouldn't require more than a single line in userspace.

    Maybe the UNIX file API is incomplete and could do with a number of revisions and updates. Maybe, after 40 or 50 years, we have learned a few things and should go through it with a fine-tooth comb.

    Of course we shouldn't break userspace. We can still provide the old, broken calls.

    But maybe we should discuss how we can come up with something systematic that doesn't suck and invite these kinds of bugs. In any language.

      barubary@infosec.exchange wrote (#2):

      @isotopp Yes! The file system is just a big ball of race conditions bundled together (and so are processes/PIDs).

        aheadofthekrauts@social.tchncs.de wrote (#3):

        @isotopp Yet you are both right. 😑

          iwein@mas.to wrote (#4):

          @isotopp That shit is hard, and the implementations on all operating systems are weirdly different. If only we could improve that instead of another magic "paradigm shift" eh?

            monospace@floss.social wrote (#5):

            @isotopp There is no hope for a better past. But we can at least learn from it.

              masek@infosec.exchange wrote (#6):

              @isotopp You're aware that this is the start of the next systemd-like discussion 😃?

              I may not be 100% in agreement with everything the systemd project does. But the people in that project know things much better than I do, and even I can clearly see the desperate need for modernization.

              So I would give them a lot of leeway, and I am totally aghast at the amount of hate they receive.

              What you (correctly) said above would (as a project) draw 10 times the ire systemd did.

                slink@fosstodon.org wrote (#7):

                @isotopp https://fosstodon.org/@slink/116486258791687186

                  isotopp@infosec.exchange wrote (#8):

                  @iwein I don't care much. If the Linux kernel does it, the rest will eventually follow. Or not, but they are sidelined already anyway, so who cares.

                    hikhvar@norden.social wrote (#9):

                    @masek

                    The biggest thing systemd brought was consistent, well documented behaviour.

                    That behaviour is questionable sometimes, but consistent for the user.

                    Start a systemd SYSCALL API and both the fediverse and reddit will DDoS themselves.

                    @isotopp

                      isotopp@infosec.exchange wrote (#10):

                      Part of that work is already done.

                      Linux’s syscall surface has a pattern: take a narrow primitive, remove implicit global state, make it composable, and push work into the kernel to avoid copies or races. clone(), openat(), and splice() fit that pattern well.

                      There are several other clusters of similar “upgrades”.

                      First, the *at() family generalizes path-based syscalls to operate relative to a directory file descriptor, which eliminates reliance on the process-wide CWD and closes race windows.

                      Besides openat(), there are fstatat(), linkat(), renameat(), unlinkat(), mkdirat(), symlinkat(), and more recently openat2() with a struct-based argument that lets you constrain resolution (no symlinks, stay beneath a dir, etc.).

                      POSIX standardized a subset of this idea in POSIX.1-2008: the basic *at() calls exist there, but Linux-specific extensions like openat2() and its resolution flags are not in POSIX.
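A minimal sketch of the directory-fd style, using Python's os module, whose dir_fd keyword arguments map onto the *at() calls. The helper names read_at() and at_family_demo() and the file names are made up for illustration. Note that O_NOFOLLOW only guards the final path component; constraining the whole resolution is what openat2() is for, and that call is not shown here.

```python
import os
import tempfile


def read_at(dir_fd, name):
    """Read a whole file, looked up relative to an open directory fd."""
    # dir_fd turns this into an openat() call; O_NOFOLLOW rejects a symlink
    # in the final path component (it does not cover earlier components).
    fd = os.open(name, os.O_RDONLY | os.O_NOFOLLOW | os.O_CLOEXEC, dir_fd=dir_fd)
    try:
        chunks = []
        while True:
            chunk = os.read(fd, 65536)
            if not chunk:
                return b"".join(chunks)
            chunks.append(chunk)
    finally:
        os.close(fd)


def at_family_demo():
    with tempfile.TemporaryDirectory() as tmp:
        # From here on, every operation is relative to dir_fd, not to the CWD.
        dir_fd = os.open(tmp, os.O_RDONLY | os.O_DIRECTORY | os.O_CLOEXEC)
        try:
            fd = os.open("a.txt", os.O_WRONLY | os.O_CREAT | os.O_EXCL,
                         0o600, dir_fd=dir_fd)
            try:
                os.write(fd, b"hello")
            finally:
                os.close(fd)
            # renameat()
            os.rename("a.txt", "b.txt", src_dir_fd=dir_fd, dst_dir_fd=dir_fd)
            data = read_at(dir_fd, "b.txt")
            # symlinkat(), then check that read_at() refuses the symlink
            os.symlink("b.txt", "link", dir_fd=dir_fd)
            try:
                read_at(dir_fd, "link")
                symlink_refused = False
            except OSError:
                symlink_refused = True
            # unlinkat()
            os.unlink("link", dir_fd=dir_fd)
            os.unlink("b.txt", dir_fd=dir_fd)
            return data, symlink_refused
        finally:
            os.close(dir_fd)
```

Holding the directory open as an fd means a concurrent rename of the directory itself does not redirect later lookups, which is the point of the pattern.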

                      Second, file-descriptor–centric design is pushed much further than POSIX.

                      Linux prefers operations that take FDs instead of paths and adds flags and syscalls to obtain stable references: the O_PATH open flag, name_to_handle_at() and open_by_handle_at() (exportable file handles), pidfd_open() and the broader pidfd API for race-free process management, and memfd_create() for anonymous, memory-backed files.

                      POSIX largely sticks to PIDs and pathnames; pidfds, memfd, and file handles are Linux-only.

                      Third, race-free event and I/O multiplexing. Linux moved from select()/poll() to epoll (scalable readiness notification, level-triggered by default with an edge-triggered option) and then to io_uring, which is a much bigger step: shared submission/completion queues, batching, fixed buffers/files, and true async operations with fewer syscalls.

                      POSIX includes select() and poll(), and optionally AIO (aio_*), but epoll and io_uring are Linux-specific.
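A small sketch of epoll readiness notification on a pipe, via Python's select.epoll wrapper. The helper names are invented for the example, and it uses the default level-triggered mode; io_uring has no standard-library binding and is not shown.

```python
import os
import select


def wait_readable(fd, timeout):
    """Return True if fd is readable within `timeout` seconds."""
    ep = select.epoll()
    try:
        ep.register(fd, select.EPOLLIN)  # level-triggered unless EPOLLET is set
        events = ep.poll(timeout)
        return any(efd == fd and (mask & select.EPOLLIN) for efd, mask in events)
    finally:
        ep.close()


def epoll_pipe_demo():
    r, w = os.pipe()
    try:
        before = wait_readable(r, 0)     # nothing written yet
        os.write(w, b"x")
        after = wait_readable(r, 1.0)    # one byte is now pending
        return before, after
    finally:
        os.close(r)
        os.close(w)
```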

                      Fourth, zero-copy and in-kernel data movement. Beyond sendfile() → splice(), there’s tee() (duplicate a pipe buffer without copying) and vmsplice() (map user pages into a pipe).

                      These let you build pipelines where data stays in kernel space. sendfile() itself is not in POSIX; it exists only as a non-standard extension on some systems, with differing signatures. splice/tee/vmsplice are not in POSIX either.
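To make the in-kernel data movement concrete, here is a sketch that copies one regular file to another with os.sendfile(), so the payload never passes through a user-space buffer. The names copy_in_kernel() and sendfile_demo() are hypothetical, and this is not a race-free cp: it still opens both files by path and only illustrates the data path.

```python
import os
import tempfile


def copy_in_kernel(src_path, dst_path):
    """Copy src to a new file dst with sendfile(); return bytes copied."""
    src = os.open(src_path, os.O_RDONLY | os.O_CLOEXEC)
    try:
        # O_EXCL: fail rather than overwrite an existing destination.
        dst = os.open(dst_path,
                      os.O_WRONLY | os.O_CREAT | os.O_EXCL | os.O_CLOEXEC,
                      0o600)
        try:
            size = os.fstat(src).st_size
            offset = 0
            while offset < size:
                # With an explicit offset, the source file position is untouched.
                sent = os.sendfile(dst, src, offset, size - offset)
                if sent == 0:  # source shrank underneath us
                    break
                offset += sent
            return offset
        finally:
            os.close(dst)
    finally:
        os.close(src)


def sendfile_demo():
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "src.bin")
        dst = os.path.join(tmp, "dst.bin")
        payload = b"abc" * 1000
        with open(src, "wb") as f:
            f.write(payload)
        copied = copy_in_kernel(src, dst)
        with open(dst, "rb") as f:
            return copied, f.read() == payload
```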

                      Fifth, vector and message-oriented batching. readv()/writev() exist in POSIX, but Linux extends batching with preadv2()/pwritev2() flags, recvmmsg()/sendmmsg() to amortize syscall overhead for datagrams, and various flags for finer control.

                      The mmsg calls are Linux-specific.
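The POSIX baseline of this cluster, readv()/writev(), can be sketched with os.writev() and os.readv(): several buffers go out in one syscall and are scattered into separate buffers on the way back in. The function name and buffer contents are illustrative only; recvmmsg()/sendmmsg() have no standard-library wrapper and are not shown.

```python
import os


def scatter_gather_demo():
    r, w = os.pipe()
    try:
        # Gather: three buffers, one writev() call.
        written = os.writev(w, [b"head:", b"body", b":tail"])
        # Scatter: one readv() call fills the buffers in order.
        first, rest = bytearray(5), bytearray(9)
        got = os.readv(r, [first, rest])
        return written, got, bytes(first), bytes(rest)
    finally:
        os.close(r)
        os.close(w)
```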

                      Sixth, futexes for user-space synchronization. futex() lets user space do uncontended locking without syscalls and only enter the kernel on contention.

                      This is the basis for efficient pthread mutexes/condvars on Linux.

                      POSIX defines the pthread APIs, not the futex primitive; futex is Linux-specific.

                      Seventh, namespaces and capabilities. Syscalls like unshare(), setns(), and clone() flags create per-process views of resources (mount, PID, net, user namespaces).

                      This is foundational for containers.

                      POSIX has no concept of namespaces or Linux capabilities.

                      Eighth, timers, event FDs, and signal improvements. timerfd_create(), eventfd(), and signalfd() turn timers, counters, and signals into file descriptors that integrate with epoll.

                      POSIX has timers and signals, but not these FD-based forms.

                      Ninth, process creation refinement. clone3() is a modern, extensible variant of clone() with a struct argument, similar in spirit to openat2().

                      POSIX sticks with fork() and posix_spawn(); clone* is Linux-specific.

                      Tenth, memory management extensions. These include mremap(), madvise() flags beyond POSIX, userfaultfd() (handle page faults in user space), and memfd_secret() (restricted mappings).

                      POSIX defines mmap()/mprotect()/msync(); the rest are Linux extensions.

                      Eleventh, mount API overhaul. The newer mount API (open_tree(), move_mount(), fsopen(), fsconfig(), fsmount()) replaces the legacy mount() string interface with FD-based, race-resistant operations.

                      This is Linux-only.

                      Twelfth, BPF as a syscall-backed subsystem. The bpf() syscall exposes a programmable kernel data path and observability tools.

                      Entirely Linux-specific.

                      On POSIX coverage, the pattern is consistent: when Linux introduces a generalization that reduces races and global state in a way that’s broadly portable, a conservative subset may eventually appear in POSIX (the *at() family, readv/writev, posix_spawn). The more ambitious pieces that depend on Linux’s internal models or aim at performance and containerization (epoll, io_uring, pidfds, namespaces, futex, BPF, new mount API, zero-copy pipe primitives) are not in POSIX and are unlikely to be standardized in their current form.

                        isotopp@infosec.exchange wrote (#11):

                        File naming has been decoupled from the API that does things with files through these fd-based calls. So once you have an fd you should be set.

                        But:

                        Linux at the upper kernel layer does not change POSIX filename requirements. Filenames can be random garbage as long as they do not contain the path separator (the slash) or null bytes.

                        There is no clean, portable Linux syscall that says: “this directory accepts exactly UTF-8” or “this directory accepts arbitrary bytes.” The safe model is still: treat filenames as byte strings, not text, until you must display or create human-facing names.

                        That means your programming language must work with byte arrays as filenames, even when that seems silly.

                        Linux pathname rules are byte-oriented: a pathname is a null-terminated byte sequence, interior null bytes are forbidden, and / is the separator, not a filename byte.

                        Directory reads likewise return null-terminated d_name entries, not Unicode strings.

                        For creating new human-facing filenames, emit valid UTF-8, preferably normalized to NFC at the application layer. But still be prepared for EINVAL, ENAMETOOLONG, EEXIST, or filesystem-specific rejection – some filesystems have magic filenames such as nul, prn or con and you won't be able to use them.

                        For accepting existing names, accept arbitrary bytes except / and \0. A UTF-8-only application that refuses to operate on invalid names will break on normal Unix trees, backups, tar extractions, old mounts, removable media, and network filesystems.
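A sketch of the "bytes in, text only for display" rule in Python, which returns bytes names when given a bytes path. The helper names and the two sample file names are invented for the example, and it assumes the temporary directory lives on a filesystem that accepts arbitrary bytes, as typical Linux filesystems do.

```python
import os
import tempfile


def list_raw_names(directory):
    """List directory entries as raw bytes, exactly as stored."""
    # A bytes argument makes os.scandir() yield bytes names (no decoding).
    with os.scandir(directory) as entries:
        return sorted(entry.name for entry in entries)


def display_name(raw):
    """Decode for human display only; invalid bytes become visible escapes."""
    return raw.decode("utf-8", errors="backslashreplace")


def bytes_name_demo():
    with tempfile.TemporaryDirectory() as tmp:
        base = os.fsencode(tmp)
        legacy = b"caf\xe9.txt"              # Latin-1 e-acute: not valid UTF-8
        modern = "caf\u00e9.txt".encode("utf-8")
        for name in (legacy, modern):
            with open(os.path.join(base, name), "wb"):
                pass
        names = list_raw_names(base)
        return names, [display_name(n) for n in names]
```

All file operations keep using the raw bytes; the decoded form exists only for showing the name to a person and is never fed back into a syscall.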

                        For detecting constraints, the options are weak:

                        statfs() tells you the filesystem type, so you can special-case ext4, vfat, ntfs3, btrfs, overlayfs, etc., but that is not a semantic contract. You will need to know the rules for each filesystem type; there is no way to query them.

                        pathconf(path, _PC_NAME_MAX) tells you name length limits, not encoding.

                        statx() gives richer file metadata, but not a general filename-encoding capability, and certainly not lists of reserved names or other fanciness.
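The little that can be queried looks roughly like this in Python: the name-length limit via fpathconf() and via fstatvfs(), both on an already-open directory fd. name_limits() is a made-up helper, and neither call says anything about encoding or reserved names.

```python
import os


def name_limits(path):
    """Return (NAME_MAX from fpathconf, f_namemax from fstatvfs) for a dir."""
    dir_fd = os.open(path, os.O_RDONLY | os.O_DIRECTORY | os.O_CLOEXEC)
    try:
        # Both values are limits in bytes per path component, not characters.
        name_max = os.fpathconf(dir_fd, "PC_NAME_MAX")
        vfs = os.fstatvfs(dir_fd)
        return name_max, vfs.f_namemax
    finally:
        os.close(dir_fd)
```

For example, name_limits("/") reports the limit for the root filesystem; a UTF-8 name can hit that byte limit well before it reaches the same number of characters.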

                        Some filesystems have feature-specific behavior. ext4 casefolding, for example, stores a filesystem-wide encoding model for case-insensitive directories, defaulting to UTF-8 in the kernel documentation. That does not turn Linux pathname handling generally into Unicode.

                        So we DO have an API that could be race-free if you used it fully, and in order to do that you'd use Linux-specific syscalls.

                        We LACK an API that can handle structured filenames, and answer questions about naming restrictions properly.

                        There is no

                        query_name_policy(dirfd) → accepted encoding, normalization rules, case-sensitivity/case-folding, max component length in bytes and characters, reserved names, forbidden code points/bytes, equivalence rules, stable display form, and whether invalid legacy names may already exist.

                          isotopp@infosec.exchange wrote (#12):

                          @hllizi See the follow-up post: this has largely already happened (improving the kernel fd API).

                            isotopp@infosec.exchange wrote (#13):

                            What we do need is a Linux-centric update to W. Richard Stevens's APUE (Advanced Programming in the UNIX Environment): an APLE book.

                            It would discuss how to work with the Linux kernel API correctly, maybe by implementing a libc or a libc replacement, or a Python or Rust kernel API interface, with proper error handling, in actual code.

                            Stevens had a wonderful writing style, in English and in Code, showcasing the point made in a chapter without compromising on correctness and production-ready error handling.

                            But his books and the API he describes are old, and from the Linux Kernel PoV also inferior and outdated.

                              barubary@infosec.exchange
                              wrote last edited by
                              #14

                              @isotopp How do you read the contents of a directory in a race-free way?

                                isotopp@infosec.exchange
                                wrote last edited by
                                #15

                                @barubary You do not read a directory race-free in the snapshot sense.

                                A directory fd gives you a stable ref to THAT directory object, not a stable list of its children.

                                You CAN do

                                dirfd = openat2(parentfd, "subdir", ...);
                                getdents64(dirfd, ...); // or fdopendir/readdir
                                openat(dirfd, name, ...); // act relative to the same directory

                                That avoids races involving CWD, replaced parent paths, symlinked path components, and “I checked one path but opened another”.

                                The entries themselves can still change while you read. Another process can create, delete, rename, or replace name after readdir() returns it and before openat() uses it. Linux does not make readdir() a frozen transaction. A directory fd pins the directory, not its contents.

                                So you'd

                                • open directory by fd
                                • read entry name as bytes
                                • openat(dirfd, entry_name, flags that express intent)
                                • fstat the returned fd
                                • decide based on the object actually opened
                                • operate on the fd, not the path
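
The steps above can be sketched in Python roughly like this, assuming Linux. The function name regular_file_sizes is made up for illustration, and "report the size" stands in for whatever you actually want to do with the file:

```python
import os
import stat


def regular_file_sizes(dir_path):
    """Map raw entry name (bytes) -> size for regular files directly in dir_path."""
    sizes = {}
    # Pin the directory object once; everything below is relative to this fd.
    dirfd = os.open(dir_path, os.O_RDONLY | os.O_DIRECTORY | os.O_CLOEXEC)
    try:
        for name in os.listdir(dirfd):
            # fsencode() restores the exact bytes the kernel returned,
            # including bytes that are not valid UTF-8.
            raw = os.fsencode(name)
            try:
                # O_NOFOLLOW: refuse a symlink as the final component.
                # O_NONBLOCK: do not hang when the entry is a FIFO.
                fd = os.open(
                    raw,
                    os.O_RDONLY | os.O_NOFOLLOW | os.O_NONBLOCK | os.O_CLOEXEC,
                    dir_fd=dirfd,
                )
            except OSError:
                # Entry vanished, is a symlink, is unreadable, ...: skip it.
                continue
            try:
                # Decide based on the object actually opened, not on the name.
                st = os.fstat(fd)
                if stat.S_ISREG(st.st_mode):
                    sizes[raw] = st.st_size
            finally:
                os.close(fd)
    finally:
        os.close(dirfd)
    return sizes
```

Note that the listing can be stale by the time each entry is opened; the sketch copes by treating a failed open as "skip", not by pretending the listing was a snapshot.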

                                For recursive traversal, you extend the same rule: open child directories with openat() or openat2(), reject symlinks with flags/resolution constraints, keep dirfds on a stack, and perform later operations relative to those dirfds.
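
A sketch of that recursive variant, again assuming Linux; walk_no_symlinks is a made-up name. Python's os module has no openat2() wrapper, so this uses plain openat() semantics via dir_fd and gets its symlink protection from opening exactly one component at a time with O_NOFOLLOW:

```python
import os
import stat


def walk_no_symlinks(root):
    """Return relative byte paths of regular files under root, never following symlinks."""
    found = []
    flags = os.O_RDONLY | os.O_NOFOLLOW | os.O_NONBLOCK | os.O_CLOEXEC
    # Stack of (dirfd, relative prefix). The root path itself is resolved normally.
    stack = [(os.open(root, os.O_RDONLY | os.O_DIRECTORY | os.O_CLOEXEC), b"")]
    try:
        while stack:
            dirfd, prefix = stack.pop()
            try:
                for name in os.listdir(dirfd):
                    raw = os.fsencode(name)
                    try:
                        fd = os.open(raw, flags, dir_fd=dirfd)
                    except OSError:
                        continue  # vanished, symlink, no permission, ...
                    keep = False
                    try:
                        mode = os.fstat(fd).st_mode
                        if stat.S_ISDIR(mode):
                            # Keep the child fd; later opens are relative to it.
                            stack.append((fd, prefix + raw + b"/"))
                            keep = True
                        elif stat.S_ISREG(mode):
                            found.append(prefix + raw)
                    finally:
                        if not keep:
                            os.close(fd)
            finally:
                os.close(dirfd)
    finally:
        # On error, close directory fds that were never visited.
        for fd, _ in stack:
            os.close(fd)
    return found
```

The prefix is only for reporting. No path string is ever handed back to the kernel for a second lookup. The stack holds one open fd per pending directory, so very wide trees can hit the fd limit, and nothing here stops the walk from crossing mount points.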

                                The oss-sec report’s uutils examples are mostly failures of this kind: path-based second operations, permission changes after creation, missing O_NOFOLLOW, missing O_EXCL, or creating too broadly and tightening later.
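
For the creation side, a minimal sketch of "create narrowly, never tighten later", assuming Linux and a directory fd you already hold. The name create_private_file is made up, and 0o600 is just an example mode:

```python
import os


def create_private_file(dirfd: int, name: bytes, data: bytes) -> None:
    """Create name inside dirfd with its final mode and write data through the fd."""
    # O_EXCL: fail if anything already exists under that name, including a
    # dangling symlink someone planted. O_NOFOLLOW states the same intent.
    flags = os.O_WRONLY | os.O_CREAT | os.O_EXCL | os.O_NOFOLLOW | os.O_CLOEXEC
    # The mode is applied at creation; the umask can only remove bits,
    # so the file is never visible with broader permissions.
    fd = os.open(name, flags, 0o600, dir_fd=dirfd)
    try:
        view = memoryview(data)
        while view:
            # os.write() may write fewer bytes than requested.
            view = view[os.write(fd, view):]
    finally:
        os.close(fd)
```

If the name is already taken, this raises FileExistsError and the caller gets to decide what to do, instead of silently writing through whatever was there.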

                                A truly race-free directory listing would mean one of three things:

                                • A filesystem snapshot.
                                • Kernel support for transactional directory enumeration plus later object resolution against that transaction.
                                • Locking/excluding all concurrent mutation, which Unix generally does not provide as a normal directory API.

                                None of that is desirable as default readdir() behavior, because it will scale like shit.

                                A snapshot is desirable for backups, indexing, forensics, package database consistency, and reproducible tree copies. But then the right answer is usually “use a filesystem snapshot”, not “make readdir() magic”.
