We have a SLURM batch job running a TF2/Keras script that fails; the same script also fails when called directly on a node that has a GPU. Here are the contents of the Python script (stocks.py):
from datetime import date
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt
from keras.models import Sequential
from keras.layers import Dense, SimpleRNN
from keras.optimizers import adam
from keras.layers import Dropout
from tensorflow.keras.callbacks import Callback, EarlyStopping
from sklearn.preprocessing import StandardScaler
from datetime import datetime, timedelta
from sklearn.metrics import r2_score, mean_squared_error, accuracy_score
from keras.layers.core import Dense, Dropout, Activation
from keras.layers.recurrent import LSTM
from keras.models import load_model
from keras.callbacks import EarlyStopping, ModelCheckpoint
import warnings
import os

os.environ['TF_CPP_MIN_LOG_LEVEL'] = "3"
warnings.filterwarnings('ignore')

import tensorflow as tf
import logging
logging.getLogger('tesorflow').setLevel(logging.FATAL)

delay = 252
window = 60
factor = 15
K = 8.4
sbo = 1.25
sso = 1.25
sbc = 0.75
ssc = 0.5
r = 0.02
tran_cost = 0.0002
leverage = 1.0
start_val = 100
bo = 1
so = -1

X_pd = pd.read_pickle('./data/X_pd.pkl')

X = pd.DataFrame(columns=range(0, window))
Y = []
for tag in X_pd.columns[:1]:
    # i=0 ....len(X_pd.index)-window
    for i in range(0, len(X_pd.index) - window):
        X_example = X_pd.loc[i:i + window - 1][tag].values
        X = X.append(pd.Series(X_example), ignore_index=True)
        Y.append(X_pd.loc[i + window][tag])
    print('done %s stocks' % (tag))
Y = pd.DataFrame(Y)

# normalization
SS = StandardScaler()
features = SS.fit_transform(X.values)
X = features
X = pd.DataFrame(X)

# LSTM model
def trainLSTMModel(layers, neurons, d):
    model = Sequential()
    model.add(LSTM(neurons[0], input_shape=(layers[1], layers[2]), return_sequences=False, activation='relu'))
    # model.add(Dropout(d))
    # model.add(LSTM(neurons[1], input_shape=(layers[1], layers[2]), return_sequences=False))
    # model.add(Dropout(d))
    # model.add(Dense(neurons[2], kernel_initializer="uniform", activation='relu'))
    model.add(Dense(neurons[3], kernel_initializer="uniform", activation='relu'))
    optimizer = adam(learning_rate=0.001)
    # adam = Adam(decay=0.2)
    # predict up and down
    # model.compile(optimizer="adam", loss="binary_crossentropy", metrics=['accuracy'])
    model.compile(loss='mse', optimizer=optimizer)
    model.summary()
    return model

length = X.shape[0]
X = np.array(X)
Y = np.array(Y)

time_step = 60
d = 0.3
output = 1
shape = [length, time_step, output]  # feature, window, output
neurons = [64, 64, 32, 1]
epochs = 100
batch_size = 10000

model = trainLSTMModel(shape, neurons, d)

# shape from [samples, timesteps] into [samples, timesteps, features]
n_features = 1
X = X.reshape((X.shape[0], X.shape[1], n_features))

gpu_no = 0
with tf.device('/gpu:' + str(gpu_no)):
    # sess = tf.Session(config=tf.ConfigProto(allow_soft_placement=True, log_device_placement=True))
    # keras.backend.set_session(sess)
    print('model_manager: running tensorflow version: ' + tf.__version__)
    print('model_manager: will attempt to run on ' + '/gpu:' + str(gpu_no))
    model.fit(X, Y, epochs=epochs, verbose=2, batch_size=batch_size)
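For reference, this is the kind of minimal check I would run first to see whether TensorFlow can enumerate the GPU at all, independent of the Keras model. It is only a diagnostic sketch against TF 2.0's experimental device-listing API, not part of stocks.py:

# Diagnostic sketch, not part of stocks.py: ask TF 2.0 which physical GPUs it can see.
import tensorflow as tf

print('TF version:', tf.__version__)
print('GPUs visible to TF:', tf.config.experimental.list_physical_devices('GPU'))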
The log shows this:
Loading requirement: cuda10.1/toolkit/10.1.243
Loading cm-ml-python3deps/3.3.0
  Loading requirement: gcc5/5.5.0 python36
Loading tensorflow2-py36-cuda10.1-gcc/2.0.0
  Loading requirement: ml-pythondeps-py36-cuda10.1-gcc/3.3.0 openblas/dynamic/0.2.20 hdf5_18/1.8.20 keras-py36-cuda10.1-gcc/2.3.1 protobuf3-gcc/3.8.0 nccl2-cuda10.1-gcc/2.7.8
Loading openmpi/cuda/64/3.1.4
  Loading requirement: hpcx/2.4.0
2021-08-18 11:11:43.064175: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2021-08-18 11:18:08.026219: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2021-08-18 11:18:08.031771: E tensorflow/stream_executor/cuda/cuda_driver.cc:318] failed call to cuInit: CUDA_ERROR_UNKNOWN: unknown error
2021-08-18 11:18:08.031811: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:169] retrieving CUDA diagnostic information for host: node001
2021-08-18 11:18:08.031819: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:176] hostname: node001
2021-08-18 11:18:08.031921: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:200] libcuda reported version is: 460.73.1
2021-08-18 11:18:08.031958: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:204] kernel reported version is: 460.73.1
2021-08-18 11:18:08.031966: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:310] kernel version seems to match DSO: 460.73.1
2021-08-18 11:18:08.032266: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX512F
Using TensorFlow backend.
done A stocks
Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
lstm_1 (LSTM)                (None, 64)                16896
_________________________________________________________________
dense_1 (Dense)              (None, 1)                 65
=================================================================
Total params: 16,961
Trainable params: 16,961
Non-trainable params: 0
_________________________________________________________________
model_manager: running tensorflow version: 2.0.0
model_manager: will attempt to run on /gpu:0
Traceback (most recent call last):
  File "stocks.py", line 99, in <module>
    model.fit(X, Y, epochs=epochs, verbose=2,batch_size=batch_size)
  File "/cm/shared/apps/keras-py36-cuda10.1-gcc/2.3.1/lib/python3.6/site-packages/keras/engine/training.py", line 1213, in fit
    self._make_train_function()
  File "/cm/shared/apps/keras-py36-cuda10.1-gcc/2.3.1/lib/python3.6/site-packages/keras/engine/training.py", line 316, in _make_train_function
    loss=self.total_loss)
  File "/cm/shared/apps/keras-py36-cuda10.1-gcc/2.3.1/lib/python3.6/site-packages/keras/legacy/interfaces.py", line 91, in wrapper
    return func(*args, **kwargs)
  File "/cm/shared/apps/keras-py36-cuda10.1-gcc/2.3.1/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 75, in symbolic_fn_wrapper
    return func(*args, **kwargs)
  File "/cm/shared/apps/keras-py36-cuda10.1-gcc/2.3.1/lib/python3.6/site-packages/keras/optimizers.py", line 519, in get_updates
    for (i, p) in enumerate(params)]
  File "/cm/shared/apps/keras-py36-cuda10.1-gcc/2.3.1/lib/python3.6/site-packages/keras/optimizers.py", line 519, in <listcomp>
    for (i, p) in enumerate(params)]
  File "/cm/shared/apps/keras-py36-cuda10.1-gcc/2.3.1/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 963, in zeros
    v = tf.zeros(shape=shape, dtype=dtype, name=name)
  File "/cm/shared/apps/tensorflow2-py36-cuda10.1-gcc/2.0.0/lib/python3.6/site-packages/tensorflow_core/python/ops/array_ops.py", line 2349, in zeros
    output = _constant_if_small(zero, shape, dtype, name)
  File "/cm/shared/apps/tensorflow2-py36-cuda10.1-gcc/2.0.0/lib/python3.6/site-packages/tensorflow_core/python/ops/array_ops.py", line 2307, in _constant_if_small
    return constant(value, shape=shape, dtype=dtype, name=name)
  File "/cm/shared/apps/tensorflow2-py36-cuda10.1-gcc/2.0.0/lib/python3.6/site-packages/tensorflow_core/python/framework/constant_op.py", line 227, in constant
    allow_broadcast=True)
  File "/cm/shared/apps/tensorflow2-py36-cuda10.1-gcc/2.0.0/lib/python3.6/site-packages/tensorflow_core/python/framework/constant_op.py", line 235, in _constant_impl
    t = convert_to_eager_tensor(value, ctx, dtype)
  File "/cm/shared/apps/tensorflow2-py36-cuda10.1-gcc/2.0.0/lib/python3.6/site-packages/tensorflow_core/python/framework/constant_op.py", line 96, in convert_to_eager_tensor
    return ops.EagerTensor(value, ctx.device_name, dtype)
RuntimeError: /job:localhost/replica:0/task:0/device:GPU:0 unknown device.
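I also wondered whether mixing standalone keras 2.3.1 with tensorflow 2.0 is part of the problem. One variation I am considering (an untested sketch that builds the same model with tensorflow.keras only and relies on TensorFlow's default device placement instead of the hard /gpu:0 pin) would look roughly like this:

# Untested sketch: same LSTM model using tensorflow.keras only, no standalone keras,
# and no explicit tf.device('/gpu:0') context around model.fit().
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

def build_model(time_step, n_features):
    model = Sequential()
    model.add(LSTM(64, input_shape=(time_step, n_features), activation='relu'))
    model.add(Dense(1, kernel_initializer='uniform', activation='relu'))
    model.compile(loss='mse', optimizer=tf.keras.optimizers.Adam(learning_rate=0.001))
    return model

I have not confirmed whether this avoids the error, since the cuInit failure in the log suggests the device is not usable in the first place.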
Why is the script not seeing the GPU?