TensorRT inference time

Hi,
I understand that my TensorFlow model should run faster on the Jetson TX2 using TensorRT.
But after converting my TF model to TensorRT, I found that inference is slower with the TensorRT engine: 80 ms instead of 20 ms.

My net:
Input 1 of shape 1x448x576
Input 2 of shape 1x448x576
Output of shape 5x233x297

After converting to UFF, I run this function once:

import numpy as np
import pycuda.autoinit
import pycuda.driver as cuda

def preprare_inference(self, channel_size, height, width, batch_size):
    # Allocate page-locked host memory for the output
    self.output = cuda.pagelocked_empty(5 * 233 * 297, dtype=np.float32)
    # Allocate device memory (float32 -> 4 bytes per element)
    self.d_input1 = cuda.mem_alloc(1 * 448 * 576 * 4)
    self.d_input2 = cuda.mem_alloc(1 * 448 * 576 * 4)
    self.d_output = cuda.mem_alloc(1 * 5 * 233 * 297 * 4)

    self.stream = cuda.Stream()
    self.bindings = [int(self.d_input1), int(self.d_input2), int(self.d_output)]
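
Side note: memcpy_htod_async is only fully asynchronous when the source host buffer is page-locked. A minimal sketch of page-locked staging buffers for the two inputs as well (the h_input1 / h_input2 names are hypothetical, not from the original code):

    # Hypothetical addition inside preprare_inference: page-locked host staging
    # buffers, so the async host-to-device copies do not fall back to synchronous
    # copies from pageable numpy memory.
    self.h_input1 = cuda.pagelocked_empty(1 * 448 * 576, dtype=np.float32)
    self.h_input2 = cuda.pagelocked_empty(1 * 448 * 576, dtype=np.float32)

In do_infer the inputs would then first be copied into these buffers, e.g. np.copyto(self.h_input1, input1.ravel()) followed by cuda.memcpy_htod_async(self.d_input1, self.h_input1, self.stream).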

Inference then runs with the following code:

def do_infer(self, input1, input2):
    input1 = input1.astype(np.float32)
    input2 = input2.astype(np.float32)
    cuda.memcpy_htod_async(self.d_input1, input1, self.stream)
    cuda.memcpy_htod_async(self.d_input2, input2, self.stream)

    # execute the model (batch size 1)
    self.context.enqueue(1, self.bindings, self.stream.handle, None)

    # transfer predictions back and wait for the stream to finish
    cuda.memcpy_dtoh_async(self.output, self.d_output, self.stream)
    self.stream.synchronize()

    return np.reshape(self.output, (5, 233, 297))
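
For reference, a minimal timing sketch around do_infer (the warm-up count, iteration count, random inputs, and the net object name are placeholders, not from the actual measurement):

import time

# Placeholder inputs, only for the sketch
input1 = np.random.rand(1, 448, 576).astype(np.float32)
input2 = np.random.rand(1, 448, 576).astype(np.float32)

# Warm-up runs so one-time CUDA/TensorRT initialization is not included in the timing
for _ in range(10):
    net.do_infer(input1, input2)

# Timed runs; do_infer synchronizes the stream, so wall-clock time is meaningful
n = 100
start = time.time()
for _ in range(n):
    net.do_infer(input1, input2)
print("average inference time: %.1f ms" % ((time.time() - start) / n * 1000.0))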

Can you please help me understand how this is possible?
Thanks

Hello,

Can you share the UFF with us? What versions of TF and TRT are you using?
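
For example, the versions can be printed from Python (assuming both packages expose __version__, which recent releases do):

import tensorflow as tf
import tensorrt as trt

# Print the installed TensorFlow and TensorRT versions
print("TensorFlow: " + tf.__version__)
print("TensorRT: " + trt.__version__)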

thanks