I read the NVBit paper and am wondering how it implements instrumentation.
According to the paper, the original code is rewritten into instrumented code, which calls a trampoline to execute the user-defined instrumentation function.
- Is the instrumented code generated on the GPU side, or is it a new kernel generated on the CPU side and then transferred to the GPU?
- Is the trampoline pre-generated on the CPU side and built into the instrumented code, or is it dynamically generated on the GPU side?