Our focus was on latency. We were able to cut it down because eBPF-based automatic instrumentation separates recording from processing.
The main factor in the reduced latency is the separation between recording the data and processing it. The eBPF programs are the only latency overhead for the instrumented process. They transfer the collected data to a separate process, which handles all the exporting. In contrast, manually adding instrumentation code to an application adds both latency and memory footprint, because the application itself has to handle the exported data.
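To make the record/process split concrete, here is a minimal Go sketch of the pattern. It is not Odigos code and it does not use eBPF: the `Event`, `Recorder`, and `Export` names are hypothetical, and a bounded channel stands in for the buffer that eBPF programs write into. The point is only that the hot path copies a small fixed-size record and never blocks, while the expensive formatting happens on the consumer side. In the real setup that consumer would be a separate process, so its allocations would not land on the application's heap.

```go
package main

import "fmt"

// Event is a small fixed-size record, similar in spirit to what a probe
// would write into a shared buffer. Hypothetical type, for illustration only.
type Event struct {
	SpanID     uint64
	DurationNs uint64
}

// Recorder is the hot-path side: it only copies an Event into a bounded buffer.
// Dropped is not synchronized; this sketch assumes a single recording goroutine.
type Recorder struct {
	buf     chan Event
	Dropped uint64
}

// NewRecorder preallocates a buffer that holds up to capacity events.
func NewRecorder(capacity int) *Recorder {
	return &Recorder{buf: make(chan Event, capacity)}
}

// Record never blocks. If the buffer is full, the event is dropped and
// counted, so a slow consumer cannot add latency to the caller.
func (r *Recorder) Record(e Event) bool {
	select {
	case r.buf <- e:
		return true
	default:
		r.Dropped++
		return false
	}
}

// Close signals that no more events will be recorded.
func (r *Recorder) Close() { close(r.buf) }

// Export drains the buffer and does the comparatively expensive work
// (string formatting here). It stands in for the separate collector process.
func Export(r *Recorder) []string {
	var out []string
	for e := range r.buf {
		out = append(out, fmt.Sprintf("span=%d duration=%dns", e.SpanID, e.DurationNs))
	}
	return out
}

func main() {
	r := NewRecorder(2)
	for i := uint64(1); i <= 3; i++ {
		r.Record(Event{SpanID: i, DurationNs: i * 100})
	}
	r.Close()
	for _, line := range Export(r) {
		fmt.Println(line)
	}
	fmt.Println("dropped:", r.Dropped)
}
```

With a capacity of 2 and three recorded events, the third is dropped rather than making the caller wait, which is the trade-off this kind of design accepts to keep the recording side cheap.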
But the processing will still cost CPU time, which takes it away from the 'main' process, unless the data is transferred off the machine and processed elsewhere. Unless eBPF can do such processing much more efficiently than the application's own code, I don't see how it reduces latency any differently from a properly threaded app. Of course, the fact that eBPF makes an app instrumentable without changes is a good enough reason to use it.
Compared to a multi-threaded process, there is still a big advantage in terms of latency. Handling all the exporting in the same process greatly affects GC operations, which require stop-the-world pauses.
Because Odigos uses different technologies for instrumentation, the overhead actually depends on the programming language of your applications. In any case, the performance overhead should be minimal.