Does TensorRT Model Optimizer Support aarch64 for LLaMA 3.1 Optimization?

Description

I am currently working on optimizing LLaMA 3.1 using the TensorRT Model Optimizer (nvidia-modelopt) and TensorRT-LLM. I would like to know whether the Model Optimizer is compatible with the aarch64 architecture, since the official documentation lists x86_64 as the system requirement.

Specifically, I am interested in:

  • Official support for aarch64.

  • Any potential workarounds or methods to enable its functionality on aarch64 systems if it is not supported (see the sanity-check sketch after this list).

  • Any performance considerations or limitations I should be aware of when attempting to use it on this architecture.
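
For reference, this is the quick sanity check I run on the node to see whether the package even resolves and imports under aarch64. It is a minimal sketch based on my own assumption that the nvidia-modelopt wheel exposes a top-level `modelopt` module; it is not taken from the official docs.

```python
# Quick environment / import check on the GH200 node.
import importlib.util
import platform

import torch

print(f"Machine architecture : {platform.machine()}")       # expect 'aarch64' on GH200
print(f"PyTorch version      : {torch.__version__}")
print(f"CUDA available       : {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"CUDA runtime         : {torch.version.cuda}")
    print(f"GPU                  : {torch.cuda.get_device_name(0)}")

# See whether the Model Optimizer package is installed and importable on this platform.
if importlib.util.find_spec("modelopt") is None:
    print("nvidia-modelopt is not installed (pip may have skipped or failed the wheel on aarch64).")
else:
    import modelopt
    print(f"nvidia-modelopt      : {getattr(modelopt, '__version__', 'unknown')}")
```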

Thank you for your assistance!

Environment

TensorRT-LLM version: 0.14.0.dev2024091700
GPU Type: GH200
Nvidia Driver Version: 550.90.12
CUDA Version: 12.5
Operating System + Version: Ubuntu 22.04.4 LTS

Hey, I’m right there with you. I have not been able to get any LLM servers running under aarch64. I’ve tried vLLM, llama.cpp, TensorRT, and TensorRT-LLM, and I ended up writing my own engine. It works, kind of, but it only uses a fraction of the chip’s capability: I’m only getting around 12 tok/sec when I should be pushing over 50. I think it’s a transformers issue on aarch64. I’ve been beating my head against the wall for a week. I will say that Ubuntu 24.04 with CUDA 12.8, including torch built for cu128, will compile in most cases; it won’t work on Ubuntu 22.04.
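
For what it’s worth, here is roughly how I’m measuring the tokens/sec numbers above. It is a minimal sketch using plain transformers `generate`; the model ID is just a placeholder for whatever Llama 3.1 checkpoint you actually load.

```python
# Rough tokens/sec measurement used to compare runs on the GH200.
import time

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-3.1-8B-Instruct"  # placeholder; gated checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="cuda"
)

inputs = tokenizer("Explain the GH200 memory architecture.", return_tensors="pt").to("cuda")

# Warm-up generation so kernel compilation does not skew the timing.
model.generate(**inputs, max_new_tokens=16)

torch.cuda.synchronize()
start = time.perf_counter()
out = model.generate(**inputs, max_new_tokens=256, do_sample=False)
torch.cuda.synchronize()
elapsed = time.perf_counter() - start

new_tokens = out.shape[1] - inputs["input_ids"].shape[1]
print(f"{new_tokens} tokens in {elapsed:.2f}s -> {new_tokens / elapsed:.1f} tok/s")
```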