Does TensorRT Model Optimizer Support aarch64 for LLaMA 3.1 Optimization?

Description

I am currently working on optimizing LLaMA 3.1 using the TensorRT Model Optimizer (nvidia-modelopt) and TensorRT-LLM. I would like to know whether the Model Optimizer is compatible with the aarch64 architecture, since the official documentation lists x86_64 as the system requirement.

Specifically, I am interested in:

  • Official support for aarch64.

  • Any potential workarounds or methods to enable its functionality on aarch64 systems if it is not supported (see the sanity-check sketch after this list).

  • Any performance considerations or limitations I should be aware of when attempting to use it on this architecture.
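
For reference, this is the quick sanity check I run on the node to see whether the package even resolves and imports under aarch64. It is a minimal sketch based on my own assumption that the nvidia-modelopt wheel exposes a top-level `modelopt` module; it is not taken from the official docs.

```python
# Quick environment / import check on the GH200 node.
import importlib.util
import platform

import torch

print(f"Machine architecture : {platform.machine()}")       # expect 'aarch64' on GH200
print(f"PyTorch version      : {torch.__version__}")
print(f"CUDA available       : {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"CUDA runtime         : {torch.version.cuda}")
    print(f"GPU                  : {torch.cuda.get_device_name(0)}")

# See whether the Model Optimizer package is installed and importable on this platform.
if importlib.util.find_spec("modelopt") is None:
    print("nvidia-modelopt is not installed (pip may have skipped or failed the wheel on aarch64).")
else:
    import modelopt
    print(f"nvidia-modelopt      : {getattr(modelopt, '__version__', 'unknown')}")
```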

Thank you for your assistance!

Environment

TensorRT-LLM version: 0.14.0.dev2024091700
GPU Type: GH200
Nvidia Driver Version: 550.90.12
CUDA Version: 12.5
Operating System + Version: Ubuntu 22.04.4 LTS

Hey, I’m right there with you. I have not been able to get any LLM servers running under aarch64. I’ve tried vLLM, llama.cpp, TensorRT, and TensorRT-LLM, and I ended up writing my own engine. It works, kind of, but it only uses a fraction of the chip’s capability: I’m only getting around 12 tok/sec when I should be pushing over 50. I think it’s a transformers issue on aarch64. I’ve been beating my head against the wall for a week. I will say that Ubuntu 24.04 with CUDA 12.8, including torch built for cu128, will compile in most cases; it won’t work on Ubuntu 22.04.
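
For what it’s worth, here is roughly how I’m measuring the tokens/sec numbers above. It is a minimal sketch using plain transformers `generate`; the model ID is just a placeholder for whatever Llama 3.1 checkpoint you actually load.

```python
# Rough tokens/sec measurement used to compare runs on the GH200.
import time

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-3.1-8B-Instruct"  # placeholder; gated checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="cuda"
)

inputs = tokenizer("Explain the GH200 memory architecture.", return_tensors="pt").to("cuda")

# Warm-up generation so kernel compilation does not skew the timing.
model.generate(**inputs, max_new_tokens=16)

torch.cuda.synchronize()
start = time.perf_counter()
out = model.generate(**inputs, max_new_tokens=256, do_sample=False)
torch.cuda.synchronize()
elapsed = time.perf_counter() - start

new_tokens = out.shape[1] - inputs["input_ids"].shape[1]
print(f"{new_tokens} tokens in {elapsed:.2f}s -> {new_tokens / elapsed:.1f} tok/s")
```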