With fake memory mapping it would not access the same memory, but you'd still run the same code, right? So the timing and details of each access may change a lot, but the profile of executed code would still be meaningful. For example, if you run a loop with range checks and a loop without them, it doesn't matter what's happening in the loop, as long as it stays consistent between the runs: you can still tell things about the loop overhead.
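To make the loop comparison concrete, here is a minimal sketch of the kind of pair I mean. The function names and the summing workload are made up for illustration and are not taken from any driver; the only point is that both loops do the same work, and only the range check differs.

```rust
/// Sums `data[i]` for every index in `indices`, with the usual bounds check.
pub fn sum_checked(data: &[u64], indices: &[usize]) -> u64 {
    let mut total = 0u64;
    for &i in indices {
        // Indexing a slice panics if `i` is out of range.
        total = total.wrapping_add(data[i]);
    }
    total
}

/// Same loop body, but the bounds check is skipped.
///
/// # Safety
/// Every value in `indices` must be less than `data.len()`.
pub unsafe fn sum_unchecked(data: &[u64], indices: &[usize]) -> u64 {
    let mut total = 0u64;
    for &i in indices {
        // No range check here; the caller guarantees `i` is valid.
        total = total.wrapping_add(unsafe { *data.get_unchecked(i) });
    }
    total
}
```

Run both over the same `data` and `indices` under a profiler: whatever the memory behind `data` does, it is the same in both runs, so any difference should come from the check itself. Note that the compiler may remove the check on its own when it can prove the indices are in range, so the two versions can compile to the same code.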
You would need a somewhat realistic emulation of the NIC setting and clearing these flags. Also, by far the slowest steps are the MMIO accesses, because each one involves a full PCIe round trip, which is hard to emulate.
It will behave differently at the level we are looking at here. It's easier to use real hardware; we specifically chose this NIC because it's probably the most common 10G NIC on servers.
Or did I miss something more nuanced?