Description
I tried to convert a GPT model from PyTorch to ONNX and then to TensorRT. The conversion to a TensorRT engine succeeds, but I can't get the results I want during the inference phase. I can guarantee that the ONNX model is correct. The following two warnings appeared while converting the ONNX model to the TensorRT engine, and I don't know whether they affect the engine conversion:
[05/29/2022-19:08:00] [TRT] [W] onnx2trt_utils.cpp:392: One or more weights outside the range of INT32 was clamped
[05/29/2022-19:08:01] [TRT] [W] ShapedWeights.cpp:173: Weights transformer.h.8.attn.c_attn.weight has been transposed with permutation of (1, 0)! If you plan on overwriting the weights with the Refitter API, the new weights must be pre-transposed.
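For reference, here is a minimal sketch of the kind of check used to confirm the ONNX model matches PyTorch (assuming onnxruntime is available, the graph inputs are named input_ids and token_type_ids, and they were exported as int64; the dummy shapes follow the optimization profile below):

import numpy as np
import onnxruntime as ort
import torch
from transformers import OpenAIGPTLMHeadModel

# hypothetical dummy inputs; (1, 20) matches the "opt" shape of the profile below
input_ids = np.random.randint(0, 1000, size=(1, 20), dtype=np.int64)
token_type_ids = np.zeros((1, 20), dtype=np.int64)

model = OpenAIGPTLMHeadModel.from_pretrained("openai-gpt").eval()
with torch.no_grad():
    torch_logits = model(input_ids=torch.from_numpy(input_ids),
                         token_type_ids=torch.from_numpy(token_type_ids)).logits.numpy()

sess = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
onnx_logits = sess.run(None, {"input_ids": input_ids,
                              "token_type_ids": token_type_ids})[0]

# the two outputs should agree to within floating-point tolerance
print(np.max(np.abs(torch_logits - onnx_logits)))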
The code that converts the ONNX model to a TensorRT engine:
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

success = parser.parse_from_file('model.onnx')
# for idx in range(parser.num_errors):
#     print(parser.get_error(idx))
if not success:
    pass  # Error handling code here

config = builder.create_builder_config()
# config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 1 << 20)  # 1 MiB
config.max_workspace_size = 1 << 31

profile = builder.create_optimization_profile()
profile.set_shape("input_ids", (1, 1), (1, 20), (1, 300))
profile.set_shape("token_type_ids", (1, 1), (1, 20), (1, 300))
config.add_optimization_profile(profile)

serialized_engine = builder.build_serialized_network(network, config)
with open("sample4.engine", "wb") as f:
    f.write(serialized_engine)
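For completeness, a sketch of how the serialized engine is deserialized before the inference code below runs (the actual loading lives in RuntensorRT; the variable names engine and context are what the snippet below expects):

import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
runtime = trt.Runtime(logger)
# read back the engine serialized above and create an execution context for it
with open("sample4.engine", "rb") as f:
    engine = runtime.deserialize_cuda_engine(f.read())
context = engine.create_execution_context()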
The main inference code; input_ids and token_type_ids are the two inputs to the model:
context.active_optimization_profile = 0
origin_inputshape = context.get_binding_shape(0)
origin_inputshape[0], origin_inputshape[1] = input_ids.shape
context.set_binding_shape(0, origin_inputshape)
context.set_binding_shape(1, origin_inputshape)
inputs, outputs, bindings, stream = common.allocate_buffers(engine)
inputs[1].host = input_ids
inputs[0].host = token_type_ids
logits, *_ = common.do_inference_v2(context, bindings=bindings, inputs=inputs,
                                    outputs=outputs, stream=stream)
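One thing I am not certain about is the binding order: common.allocate_buffers fills inputs in binding-index order, so inputs[1].host = input_ids is only correct if "input_ids" really is binding index 1. A small sketch (using engine and context from above) to print the bindings, so the host buffers can be matched to the right names and dtypes:

# print each binding's index, name, direction, dtype, and current shape
# (also useful for spotting int32 vs int64 input mismatches)
for i in range(engine.num_bindings):
    print(i,
          engine.get_binding_name(i),
          "input" if engine.binding_is_input(i) else "output",
          engine.get_binding_dtype(i),
          context.get_binding_shape(i))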
The model I want to convert is OpenAIGPTLMHeadModel. I can only include one link here, but you can find the model on Hugging Face.
Environment
TensorRT Version: 8.2.5.1
GPU Type: RTX 3060
Nvidia Driver Version: 497.38
CUDA Version: 11.5.1
CUDNN Version: 8.2.1.32
Operating System + Version: Windows 11
Python Version (if applicable): 3.8.13
TensorFlow Version (if applicable):
PyTorch Version (if applicable): 1.11
Baremetal or Container (if container which image + tag):
Relevant Files
github link to my code
RuntensorRT is the inference-phase code.