83 comments

  • sedatk a day ago

    That's pretty much the Linux equivalent of the Device Simulation Framework we had for Windows back in the 2000s.

    In the presentation below, only its USB capabilities are discussed, but it was able to simulate PCI devices too.

    https://download.microsoft.com/download/5/b/9/5b97017b-e28a-...

  • tiernano 2 days ago

    Hmmm... Wondering if this could eventually be used to emulate a PCIe card using another device, like a Raspberry Pi or something more powerful. I'm thinking of a card you could stick in a machine, in anything from an x1 to an x16 slot, that emulates a network card (you could run a VPN or other stuff on the card and offload it from the host) or storage (running something with enough power to run ZFS and a few disks, and presenting them to the host as a single disk, allowing ZFS on devices that would not otherwise support it). But this is probably not something easy...

    • cakehonolulu a day ago

      Hi! Author here! You can technically offload the transactions the real driver on your host does to wherever you want, really. PCI is very delay-tolerant and it usually negotiates with the device, so I don't see much of an issue doing that, provided that you can manage the throughput efficiently throughout the architecture. The thing that kinda makes PCIem special is that you are pretty much free to do whatever you want with the accesses the driver does; you have total freedom.

      I have made a simple NVMe controller (With a 1GB drive I basically malloc'd) which pops up on the local PCI bus (And the regular Linux nvme block driver attaches to it just fine). You can format it, mount it, create files, folders... it's kinda neat. I also have a simple dumb rasteriser that I made inside QEMU that I wanted to write a driver for, but since it doesn't exist as real hardware, I used PCIem to help me redirect the driver writes to the QEMU instance hosting the card (And was thus able to run software-rendered DOOM and OpenGL 1.X-based Quake and Half-Life ports).

      • yndoendo a day ago

        Just to hijack this thread on how resilient PCIe is: PS4 Linux hackers ran PCIe over a UART serial connection to reverse engineer the GPU. [0] [1]

        [0] https://www.psdevwiki.com/ps4/PCIe

        [1] https://fail0verflow.com/blog/2016/console-hacking-2016-post...

      • topspin a day ago

        > PCI is very delay-tolerant

        That fascinates me. Intel deserves a lot of credit for PCI. They built in future proofing for use cases that wouldn't emerge for years, when their bread and butter was PC processors and peripheral PC chips, and they could have done far less. The platform independence and general openness (PCI-SIG) are also notable for something that came from 1990 Intel.

      • tonyplee a day ago

        Can one make a PCIe analyzer out of your code base by proxying all transactions through a virtual PCIem driver to a real driver?

        • cakehonolulu a day ago

          You can definitely proxy the transactions wherever you may see fit, but I'm not completely sure how that'd work.

          As in, PCIem is going to populate the bus with virtually the same card (At least, in terms of capabilities, vendor/product ID... and whatnot), so I don't see how you'd then add another layer of indirection that can somehow transparently pass the unfiltered transaction stream PCIem provides on to an actual PCIe card on the bus. I feel like there are many colliding responsibilities in this.

          I would instead suggest having some sort of behavioural model (As in, have a predefined set of data to feed from/to) and having PCIem log all the accesses your real driver does. That way the driver would have enough infrastructure not to crash, and at the same time you'd get the transport layer information.

          • tonyplee a day ago

            Maybe: if the PCIe device is on BDF 00AA:BB:00, create the proxy device on 00AA:BB:01, and the typical PCIe utils that talk to the default 00AA:BB:00 will instead be configured to talk to the 00AA:BB:01 node. Some Wireshark plugin will get the sniffed data (I/O, memory read/write, DMA read/write, etc.) from the virtual device interface.

            Ideally, the setup might be generic enough to apply to all (most?) PCIe devices/drivers...
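
For reference, Linux tools such as lspci print a device address as domain:bus:device.function, with 5 bits for the device and 3 bits for the function. A minimal sketch of parsing that form and packing the 16-bit bus/device/function ID is shown here; `struct bdf`, `bdf_parse` and `bdf_routing_id` are illustrative names, not part of PCIem or any kernel API.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Illustrative type, not taken from PCIem. */
struct bdf {
    unsigned domain;   /* PCI segment, 16 bits */
    unsigned bus;      /* 8 bits */
    unsigned device;   /* 5 bits: 0..31 */
    unsigned function; /* 3 bits: 0..7 */
};

/* Parse "dddd:bb:dd.f" (all fields hex). Returns 0 on success, -1 on bad input. */
int bdf_parse(const char *s, struct bdf *out)
{
    unsigned dom, bus, dev, fn;
    int used = 0;

    assert(s != NULL && out != NULL);
    if (sscanf(s, "%4x:%2x:%2x.%1x%n", &dom, &bus, &dev, &fn, &used) != 4)
        return -1;
    /* Reject trailing characters and out-of-range device/function numbers. */
    if (s[used] != '\0' || dev > 31 || fn > 7)
        return -1;

    out->domain = dom;
    out->bus = bus;
    out->device = dev;
    out->function = fn;
    return 0;
}

/* Pack bus/device/function into the 16-bit ID used on the wire. */
uint16_t bdf_routing_id(const struct bdf *b)
{
    return (uint16_t)((b->bus << 8) | (b->device << 3) | b->function);
}
```

With this layout, a proxy exposed as a second function of the same device would differ from the original only in the function field (for example `.0` versus `.1`).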

      • baruch a day ago

        Is it possible to put such a driver for NVMe under igb_uio or another uio interface? I have an app that uses raw NVMe devices, and being able to test strange edge cases would be a real boon!

      • jacquesm a day ago

        Fantastic tool, thank you for making this. It is one of those things that you never knew you needed until someone took the time to put it together.

      • s4mbh4 a day ago

        I wonder if it's possible to create a Wireshark plugin for analyzing PCIe?

      • gigatexal a day ago

        This is really interesting. Could it be used to carve up a host GPU for use in a guest VM?

        • anonymous123 a day ago

          Depends on the use-case. For the standard hardware-accelerated guest GPU in virtualized environments, there's already QEMU's virtio-gpu device.[1]

          For "carving up" there are technologies like SR-IOV (Single Root I/O Virtualization).[2]

          For advanced usage, like prototyping new hardware (host driver), you could use PCIem to emulate a not-yet-existing SR-IOV-capable GPU. This would allow you to develop and test the host-side driver (the one that manages the VFs) in QEMU without needing the actual hardware.

          Another advanced use-case could be a custom vGPU solution: Instead of SR-IOV, you could try to build a custom paravirtualized GPU from scratch. PCIem would let you design the low-level PCIe interface for this new device, while you write a corresponding driver on the guest. This would require significant effort but it'd provide you complete control.

          [1] https://qemu.readthedocs.io/en/v8.2.10/system/devices/virtio...

          [2] https://en.wikipedia.org/wiki/Single-root_input/output_virtu...

          • gigatexal 17 hours ago

            Really cool stuff thanks for creating it

        • cakehonolulu a day ago

          As in, getting the PCIem shim to show up on a VM (Like, passthrough)? If that's what you're asking for, then it's something being explored currently. The main challenges come from the subsystem that has to "unbind" the device from the host and do the reconfiguration (IOMMU, interrupt routing... and whatnot). But from my initial gatherings, it doesn't look like an impossible task.

        • fc417fc802 a day ago

          > carve up

          Passthru or time sharing? The latter is difficult because you need something to manage the timeslices and enforce process isolation. I'm no expert but I understand it to be somewhere between nontrivial and not realistic without GPU vendor cooperation.

          Note that the GPU vendors all deliberately include this feature as part of their market segmentation.

          • benreesman a day ago

            It would need to implement a few dozen ioctls, correctly stub the kernel module in guests, do a probably memory-safe assignment of GPU memory to guest, and then ultimately map that info to BAR/MSI-X semantics of a real kernel module. You could get VFIO pretty fast for a full start by correctly masking LTR bits, but to truly make it free you'd need a user space io_uring broker that had survived hundreds of millions of adversarial fuzz runs because there's only so fast the firmware blob can run even if it's preloaded into initramfs.

            Serious work, detail intense, but not so different in design to e.g. Carmack's Trinity engine. Doable.

    • MisterTea a day ago

      This kind of stuff is stupid easy on an OS like Plan 9 where you speak a single protocol: 9P. Ethernet devices are abstracted and served by the kernel as a file system, as explained in ether(3). Since it's all 9P, the system doesn't care where the server is running; it could be a local in-kernel/user-space server or a remote server over ANY 2-way link including TCP, IL, PCIe link, RS232 port, SPI, USB, etc. This means you can mount individual pieces of hardware or networking stacks like ip(3), or any 9P server, from other machines into a process's local namespace. Per-process namespaces let you customize a process's view of the file system, and hence that of all its children, allowing you to customize each and every program's resource view.

      There is interest in getting 9front running on the Octeon chips. This would allow one to run anything they want on an Octeon card (Plan 9 cross-platform support is first class), so one could boot the card using the host's root file system, write and test a program on the host, change the objtype env variable to mips/arm, build the binary for the Octeon and then run it on the Octeon using rcpu (like running a command remotely via ssh). All you need is a working kernel on the Octeon and a host kernel driver, and the rest works out of the box.

      • 3PS a day ago

        This is also the case with Google Fuchsia, just replace 9P with FIDL. I'm really hoping Fuchsia doesn't end up just being vaporware since it has made some very interesting technical decisions (often borrowing from Plan 9, NixOS, and others.)

    • pjc50 2 days ago

      > emulate a PCIe card using another device

      The other existing solution to this is FPGA cards: https://www.fpgadeveloper.com/list-of-fpga-dev-boards-for-pc... - note the wide spread in price. You then also have to deal with FPGA tooling. The benefit is much better timing.

      • cakehonolulu a day ago

        Indeed, and even then, there's some sw-hw-codesign stuff that kinda helps you do what PCIem does but it's usually really pricey; so I kinda thought it'd be a good thing to have for free.

        PCIe prototyping is usually not something super straightforward if you don't want to pay hefty sums IME.

        • immibis a day ago

          The "DMA cards" used for video game cheating are generic PCIe cards and (at least the one I got) come with open documentation (schematics, example projects, etc.).

          • the_biot a day ago

            What's this? Hardware specifically for game cheating? Got any links?

            • selectodude a day ago

              If you search “DMA card”, there’s a lot of DMA cards all over the internet.

            • idiotsecant a day ago

              Direct Memory Access (DMA) via PCI-e bypasses anti-cheat in the OS because the OS doesn't see the call to read or write the memory. There's no process to spy on, no weird drivers, no system calls, etc. You can imagine that the anti-cheat could maybe detect writes that perform a cheat by this method, but it has zero chance of detecting a wallhack-style cheat that just reads memory. This is getting less relevant with modern OSes, though. Windows 11 has IOMMU support, which only allows DMA to a given memory region defined per device. I think it should be impossible to do this on Win11.

              • ranger_danger 6 hours ago

                Don't you still need a driver to control the DMA card to tell it what to do with the memory?

    • Palomides a day ago

      some ARM chips can do PCIe endpoint mode, and the kernel has support for pretending to be an nvme ssd https://docs.kernel.org/nvme/nvme-pci-endpoint-target.html

    • xerxes901 2 days ago

      Something like the stm32mp2 series of MCUs can run Linux and act as a PCIe endpoint you can control from a kernel module on the MCU. So you can program an arbitrary PCIe device that way (although it won’t be setting any speed records, and I think the PHY might be limited to PCIe x1).

      • jdub a day ago

        (Ha, nice to see Jon Corbet's name on the PCI Endpoint documentation...)

      • tiernano 2 days ago

        Interesting... x1 would be too slow for large amounts of storage, but as a test, a couple of small SSDs could potentially be workable... sounds like I'm doing some digging...

        • jacquesm a day ago

          There are many workloads that would not be able to saturate even an x1 link, it all depends on how much of the processing can be done internally to whatever lives on the other side of that link. Raw storage and layer-to-layer communications in AI applications are probably the worst cases but there are many more that are substantially better than that.
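
To put rough numbers on "x1": a minimal sketch of per-link bandwidth arithmetic, assuming the standard per-lane signalling rates (2.5, 5, 8, 16 and 32 GT/s for Gen1 to Gen5) and line coding (8b/10b for Gen1/2, 128b/130b from Gen3 on). The function names are made up for illustration, and the result ignores packet and flow-control overhead, so real throughput is lower.

```c
#include <assert.h>
#include <stdio.h>

/* Per-lane signalling rate in GT/s for PCIe generations 1..5. */
static const double gts_per_lane[] = { 2.5, 5.0, 8.0, 16.0, 32.0 };

/*
 * Approximate link bandwidth in MB/s (10^6 bytes per second, one direction),
 * after line coding but before TLP/DLLP protocol overhead.
 */
double pcie_link_mbytes_per_s(int gen, int lanes)
{
    assert(gen >= 1 && gen <= 5 && lanes >= 1);

    /* Gen1/2 use 8b/10b line coding, Gen3 and later use 128b/130b. */
    double efficiency = (gen <= 2) ? 8.0 / 10.0 : 128.0 / 130.0;
    double gbit_per_s = gts_per_lane[gen - 1] * efficiency * lanes;

    return gbit_per_s * 1000.0 / 8.0;
}

/* Print the x1 figure for every generation covered above. */
void pcie_print_x1_table(void)
{
    for (int gen = 1; gen <= 5; gen++)
        printf("Gen%d x1: %.0f MB/s\n", gen, pcie_link_mbytes_per_s(gen, 1));
}
```

A Gen1 x1 link works out to about 250 MB/s and a Gen3 x1 link to a little under 1 GB/s, which is where the "depends on the workload" point above comes from.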

        • cakehonolulu a day ago

          If there's any particular feature you feel you are missing on PCIem or anything, feel free to open an Issue and I'll look into it ;)

    • unsnap_biceps a day ago

      I ordered a pair of Orange PI 5+'s to work on playing with programming a PCIe device, but haven't made the time to get it working yet.

      https://blog.reds.ch/?p=1759 and https://blog.reds.ch/?p=1813 is what inspired me to play with it.

      • tiernano 13 hours ago

        Ohh, now that's cool! Thanks for the links!

    • asdefghyk a day ago

      Could add one or more (reprogrammable?) FPGAs to such a card for extra processing power or ease of reconfiguration...

      I've often wondered why such a card (with an FPGA) is not available for retro computer emulation or simulation?

    • hsbauauvhabzb 2 days ago

      … or pcie over ethernet ;)

    • wmf a day ago

      This is what DPUs are for.

    • hhh a day ago

      this is what dma cards do

    • justsomehnguy a day ago
    • immibis a day ago

      I recently bought a DMA cheating card because it's secretly just an FPGA PCIe card. Haven't tried to play around with it yet.

      Seems unlikely you'd emulate a real PCIe card in software because PCIe is pretty high-speed.

  • krupan a day ago

    So just to be clear, you have to boot up the physical machine with a kernel command-line argument to reserve some RAM for this to work? And the amount of RAM you reserve is for BAR memory? If you wanted multiple PCIem devices (can you do that?) you'd need to reserve RAM for each of them?

    • cakehonolulu a day ago

      Hi! That's correct. We need a way to have a chunk of what Linux calls "Reserved" memory for the virtual BAR trick. Currently, PCIem only thinks about a single device (Since I first needed to have something that worked in order to check how feasible this all was), but there's planned support for multiple devices that can share a "Reserved" memory pool dynamically so you can have multiple BARs for multiple devices.
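
The shared-pool idea can be pictured with a minimal sketch of carving several BARs out of one reserved region. This is not PCIem code: `struct bar_pool` and its functions are hypothetical, and the only hard rule assumed is the PCI one that a BAR's size is a power of two and its address is aligned to that size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bump allocator over a reserved physical range. */
struct bar_pool {
    uint64_t base; /* start of the reserved region */
    uint64_t size; /* length of the reserved region in bytes */
    uint64_t next; /* first byte not yet handed out */
};

void bar_pool_init(struct bar_pool *p, uint64_t base, uint64_t size)
{
    assert(p != NULL);
    p->base = base;
    p->size = size;
    p->next = base;
}

static int is_pow2(uint64_t x)
{
    return x != 0 && (x & (x - 1)) == 0;
}

/* Returns the address for a BAR of bar_size bytes, or 0 if it does not fit. */
uint64_t bar_pool_alloc(struct bar_pool *p, uint64_t bar_size)
{
    assert(p != NULL);
    if (!is_pow2(bar_size))
        return 0;

    /* BARs are naturally aligned: round up to a multiple of the size. */
    uint64_t start = (p->next + bar_size - 1) & ~(bar_size - 1);
    if (start < p->next)
        return 0; /* address arithmetic wrapped around */

    uint64_t used = start - p->base;
    if (used > p->size || bar_size > p->size - used)
        return 0; /* pool exhausted */

    p->next = start + bar_size;
    return start;
}
```

For example, with a 1 MiB pool, a 4 KiB BAR followed by a 64 KiB BAR lands the second one at offset 0x10000, leaving the gap in between unused; that alignment waste is the main cost of a simple scheme like this.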

  • Surac 2 days ago

    That is a huge win if you are developing drivers or even real hardware. It lets you iterate on protocols with just the press of a button.

    • cakehonolulu a day ago

      Indeed, the project has gone through a few iterations already (It was first a monolithic kernel module that required a secondary module to call into the API and whatnot). I've since gone towards a more userspace-friendly design, mainly so that you can iterate on your changes much, much faster. Creating the synthetic PCI device is as easy as opening the userspace shim you program; it'll then appear on your bus. When you want to test new changes, you close the shim normally (Effectively removing it from the bus), and you can repeat this process as many times as needed.

      • LarsKrimi a day ago

        Latching on to this thread, but can you make as simple an example as possible?

        Something like just a single BAR with a register that printfs whatever is written

        • cakehonolulu a day ago

          Hi! I do have some rudimentary docs in which I made a simple device for example purposes: https://cakehonolulu.github.io/docs/pciem/simple_device_walk...

          Hopefully this is what you're searching for!

          • LarsKrimi a day ago

            Hi, thanks. That's almost it. The remaining problem is just how to tie it together (where do I put the handle_mmio_read pointer or which event should it be handled in?)

            PCIEM_EVENT_MMIO_READ is defined but not used anywhere in the codebase

            • cakehonolulu a day ago

              Hi! Sorry, this is an issue on my side; I forgot to update the documentation's example with the latest changes.

              You basically have the kernel eventfd notify you about any access triggered (Based on your configuration), so from userspace, you have the eventfd and then you mmap the shared lock-less ring buffer that actually contains the events PCIem notifies (So you don't end up busy polling).

              You basically mmap a struct pciem_shared_ring where you'll have your usual head/tail pointers.

              From then on, on your main, you'd have a select() or a poll() for the eventfd; when PCIem notifies the userspace you'd check head != tail (Which means there are events to process) and you can basically do:

              struct pciem_event *event = &event_ring->events[head];
              atomic_thread_fence(memory_order_acquire);
              if (event->type == PCIEM_EVENT_MMIO_WRITE)
                  handle_mmio_read(...);

              And that's it, don't forget to update the head pointer!

              I'll go and update the docs now. Hopefully this clears stuff up!
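
The eventfd-plus-ring pattern described above can be tried without PCIem at all. What follows is a minimal, self-contained sketch under stated assumptions: `struct ring`, `struct event`, the `EV_*` constants and both functions are stand-ins invented for illustration (the real layout is `struct pciem_shared_ring` and is not reproduced here), and `ring_push` plays the role the kernel module would play. It is Linux-only because it uses eventfd(2).

```c
#include <assert.h>
#include <poll.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/eventfd.h>
#include <unistd.h>

#define RING_SLOTS 8u /* hypothetical size; must be a power of two */

enum { EV_MMIO_READ = 1, EV_MMIO_WRITE = 2 }; /* stand-ins for PCIEM_EVENT_* */

struct event {
    uint32_t type;
    uint64_t offset; /* offset inside the BAR */
    uint64_t data;   /* value written */
};

struct ring {
    _Atomic uint32_t head; /* consumer index, free-running */
    _Atomic uint32_t tail; /* producer index, free-running */
    struct event events[RING_SLOTS];
};

/* Producer side. In the real setup this is done by the kernel module. */
void ring_push(struct ring *r, int efd, struct event ev)
{
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    uint32_t head = atomic_load_explicit(&r->head, memory_order_acquire);
    uint64_t one = 1;

    assert(tail - head < RING_SLOTS); /* sketch: caller never overfills */
    r->events[tail % RING_SLOTS] = ev;
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);

    /* Wake the consumer; eventfd adds the value to its counter. */
    if (write(efd, &one, sizeof one) != (ssize_t)sizeof one)
        perror("eventfd write");
}

/*
 * Consumer side: sleep in poll() until notified, then drain the ring.
 * Returns the number of events handled, or -1 on timeout or error.
 */
int ring_drain(struct ring *r, int efd, int timeout_ms)
{
    struct pollfd pfd = { .fd = efd, .events = POLLIN };
    uint64_t count;
    int handled = 0;

    if (poll(&pfd, 1, timeout_ms) <= 0)
        return -1;
    /* Reading resets the eventfd counter so poll() blocks again later. */
    if (read(efd, &count, sizeof count) != (ssize_t)sizeof count)
        return -1;

    uint32_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    while (head != tail) {
        const struct event *ev = &r->events[head % RING_SLOTS];
        if (ev->type == EV_MMIO_WRITE)
            printf("write: BAR offset 0x%llx <- 0x%llx\n",
                   (unsigned long long)ev->offset,
                   (unsigned long long)ev->data);
        head++;
        handled++;
    }
    /* Publish the new head so the producer can reuse the slots. */
    atomic_store_explicit(&r->head, head, memory_order_release);
    return handled;
}
```

To try it, create the descriptor with `eventfd(0, 0)`, push a couple of events with `ring_push`, and call `ring_drain(&r, efd, 1000)`; every write event is printed, which is roughly the "register that printfs whatever is written" asked for earlier in the thread.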

              • LarsKrimi a day ago

                Is this stuff written by an AI?

                The documentation still refers to PCIEM_EVENT_MMIO_READ but it's never referenced in the code on the main branch

                I'll admit that I asked for a simple compilable example illustrating something simple like the read events because it looks like it's just reading from the shared memory, and maybe generating an event for any read or write access with the PCIEM_EVENT_MMIO_WRITE event type

                • cakehonolulu 14 hours ago

                  Hi!

                  PCIEM_EVENT_MMIO_READ is kept for reference on the while (head != tail) loop just to remind the user that there can be more than 1 event registered in terms of access type.

                  Let's say that you register your watchpoint in READ mode (I still have to change the IOCTL for that, as it is currently hardcoded for writes: attr.bp_type = HW_BREAKPOINT_W); then you'd be consuming PCIEM_EVENT_MMIO_READ events instead of PCIEM_EVENT_MMIO_WRITE.

                  The PCIEM_EVENT_MMIO_READ define is there to remind me to incorporate that missing logic.

    • asimovDev a day ago

      Could you explain in layman's terms how it would help with developing PCIe hardware / drivers? I can immediately imagine something like writing more robust unit tests and maybe developing barebones drivers before you get access to actual hardware, but that's where my imagination runs out of fuel.

      • cakehonolulu a day ago

        Sure! Let's say you (Or the company you work for) are trying to develop an NVMe controller card, or a RAID card, or a NIC...

        Usually, without actual silicon, you are pretty limited on what you can do in terms of anticipating the software that'll run.

        What if you want to write a driver for it w/o having to buy auxiliary boards that act as your card? What happens if you already have a driver and want to do some security testing on it but don't have the card/don't want to use a physical one for any specific reason (Maybe some UB on the driver pokes at some register that kills the card? Just making disastrous scenarios to prove the point hah).

        What if you want to add explicit failures to the card so that you can try and make the driver as tamper-proof and as fault-tolerant as possible (Think, getting the PCI card out of the bus w/o switching the computer off)?

        Testing your driver functionally and/or behaviourally on CI/CD on any server (Not requiring the actual card!)?

        There's quite a bunch of stuff you can do with it; being in userspace means that you can get as hacky-wacky as you want (Heck, I have a dumb-framebuffer-esque, OpenGL 1.X-capable QEMU device I wanted to write a driver for, just for fun, and I used PCIem to forward the accesses to it).
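
The fault-injection scenario (pulling the card while the driver is running) can be sketched as a tiny device model. This is an illustration, not PCIem's API: `struct fake_card` and its accessors are hypothetical names. The one hardware behaviour it relies on is that reads from a PCI device that has gone away typically come back as all ones (0xFFFFFFFF), which is the condition a robust driver has to check for.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_REGS 16u /* hypothetical register file size */

/* Hypothetical device model with a "surprise removal" trigger. */
struct fake_card {
    uint32_t regs[NUM_REGS];
    bool present;        /* false once the card has been "pulled" */
    unsigned fail_after; /* remove after this many accesses; 0 = never */
    unsigned accesses;   /* accesses seen so far */
};

/* Count the access and pull the card once the budget is used up. */
static void card_tick(struct fake_card *c)
{
    c->accesses++;
    if (c->fail_after != 0 && c->accesses > c->fail_after)
        c->present = false;
}

uint32_t card_read32(struct fake_card *c, size_t offset)
{
    assert(c != NULL && offset % 4 == 0 && offset / 4 < NUM_REGS);
    card_tick(c);
    if (!c->present)
        return UINT32_MAX; /* reads from a missing device return all ones */
    return c->regs[offset / 4];
}

void card_write32(struct fake_card *c, size_t offset, uint32_t value)
{
    assert(c != NULL && offset % 4 == 0 && offset / 4 < NUM_REGS);
    card_tick(c);
    if (c->present) /* writes to a missing device are silently dropped */
        c->regs[offset / 4] = value;
}
```

Setting `fail_after` to different values in a CI loop would exercise the driver's error paths at every point in its initialisation sequence without touching real hardware.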

  • iamoutoftouch a day ago

    How is that better than emulating the device in QEMU or with something like libvfio-user (which also works on top of QEMU)?

    • cakehonolulu a day ago

      I feel like libvfio-user is a cool project and works perfectly fine, that is, if you want to have the device in the host's userspace but exposed to a VM (QEMU, in this case).

      PCIem kinda does that, but one level down; it basically pops the device onto your host PCI bus, which lets real, unmodified drivers interact with the userspace implementation of your card: no QEMU, no VM, no hypervisors.

      That's not to say you can't then, for instance, forward all the accesses to QEMU (Some people/orgs already have their cards defined in QEMU, so it'd be a bit pointless to redefine the same stuff over and over, right?), so they're free to basically glue their QEMU stuff to PCIem in case they want to try the driver directly on the host while keeping the functional emulation in QEMU. PCIem takes care of abstracting the accesses and whatnot with an API that tries to mimic what the cool people over at KVM do.

  • throwaway132448 2 days ago

    Tangential question: PCIe is a pretty future-proof technology to learn/invest in, right? As in, it is very unlikely to become obsolete in the next 5-10 years (like USB)?

    • pjc50 2 days ago

      Neither of those is going to be obsolete in 5 years. Might get rebadged and a bunch of extensions, but there's such a huge install base that rapid change is unlikely. Neither Firewire nor Thunderbolt unseated USB.

      • formerly_proven a day ago

        USB4 is the ~third USB protocol stack though (USB1/2 being basically the same iirc, USB3 being a completely separate protocol that neither logically nor physically interacts with USB1/2 at all), heavily based on Thunderbolt to the point of backwards compatibility.

        • p_l a day ago

          USB4 is essentially thunderbolt with some new features and some features being optional instead of mandatory.

          • formerly_proven a day ago

            A very noticeable feature is that USB4 can tunnel USB3, which means it works like a USB hub instead of an external PCIe USB controller (like in Thunderbolt). USB2 is still just physically separately transported over the D+/D- pins.

            • p_l a day ago

              USB4 actually provides both USB 1/2 and 3 tunnelling, but it's incorrect to say it behaves like a hub because it involves needing an appropriate endpoint on the other end. Effectively a virtual cable, iirc, though there are at least two different mechanisms.

    • CupricTea a day ago

      PCIe is probably the most future-proof technology we have right now. Even if it is upended at the hardware level, from the software perspective it just exposes a device's arbitrary registers at some memory-mapped location. Software drivers for PCIe devices will continue to work the same.
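
What "registers at a memory-mapped location" means in code can be shown with a minimal sketch. The register offsets are invented for illustration, and a plain array stands in for the BAR; on a real Linux system the pointer would come from mapping the device's BAR instead (for example by mmap()ing its sysfs `resource0` file).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register map of an imaginary device. */
enum {
    REG_CTRL   = 0x00,
    REG_STATUS = 0x04,
    REG_DATA   = 0x08,
};

/* Store a 32-bit value to a register; volatile keeps the access in place. */
static inline void mmio_write32(volatile void *bar, uint32_t off, uint32_t val)
{
    assert(off % 4 == 0);
    *(volatile uint32_t *)((volatile uint8_t *)bar + off) = val;
}

/* Load a 32-bit value from a register. */
static inline uint32_t mmio_read32(volatile void *bar, uint32_t off)
{
    assert(off % 4 == 0);
    return *(volatile uint32_t *)((volatile uint8_t *)bar + off);
}
```

From the driver's point of view nothing here depends on whether the bytes travel over copper, an optical link, or an emulator such as PCIem, which is the point being made above.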

    • neocron 2 days ago

      Might well be replaced by optical connectors in the coming years, but who knows in advance. Currently there is no competition.

      • tiernano 2 days ago

        even though it would be optical, it still is using PCIe protocols in the background...

        • bobmcnamara a day ago

          PCIe is still using PCI protocol just over serdes

        • embedding-shape a day ago

          How could you possibly know exactly what protocol they'd be using for the potential future optical PCIe connection? Your guess is as good as anyone's, no?

          • p_l a day ago

            Probably because optical PCI-E is an old thing by now.

            In fact, the "zeroth generation" of Thunderbolt used an optical link, too. Also, both Thunderbolt and DisplayPort reuse a lot of common elements from PCI-E.

      • pjc50 a day ago

        Hmm. What's the current maths on distance vs edge rate vs transceiver latency vs power consumption on when that would be a benefit? Not to mention how much of a pain it is to have good optical connectors.

        I wouldn't expect that to be mainstream until after optical networking becomes more common, and for consumer hardware that's very rare (apart from their modem).

    • checker659 2 days ago

      Curious what you mean by learning? Learning about TLPs? Learning about FPGA DMA Engines like XDMA? Learning about PCIe switches / retimers? Learning about `lspci`?

    • GrowingSideways a day ago

      PCIe expertise will certainly outlive anyone on this forum.

  • agent013 a day ago

    I've been burned before by driver bugs that only manifested under very specific timing conditions or malformed responses from the device, tnx

    • cakehonolulu a day ago

      Anytime, hopefully it fits your needs and helps you not spend more time than needed tracing issues like this. Thanks for the comment!

  • _lunix a day ago

    very interesting work! I've been exploring a different idea on the side, using SPDK+libvfio-user [0] to emulate PCIe devices inside QEMU, which doesn't require a kernel module but it's a bit less flexible than this approach.

    [0] https://movementarian.org/blog/posts/2025-08-27-vfio-user-cl...

    • cakehonolulu 9 hours ago

      Highly interesting! I kinda wanted not to rely on QEMU as a default "end" for the emulation (As in, I want the end user to be able to choose whatever userspace shim/transport layer/thing they want), but for some of my tests I did forward accesses to QEMU itself (And worked wonders). Thanks for that link! Super cool stuff!

  • JoshTriplett a day ago

    Any plans to upstream the kernel-side support?

    • cakehonolulu a day ago

      I'd love to! Sure sounds like the natural next step for this.

  • petabyt a day ago

    vhci-hcd for USB has been so useful for USB development, especially for testing USB driver code in CI.

  • brcmthrowaway a day ago

    How would I do this under macOS?

    • cakehonolulu 9 hours ago

      Unfortunately not with PCIem... I don't know how the XNU kernel does PCIe stuff; maybe something can be done there with a Kext module but no idea.