SLURM errors: failed call to cuInit: CUDA_ERROR_UNKNOWN: unknown error; GPU:0 unknown device

We have a SLURM batch file that fails with TF2 and Keras, and the script also fails when run directly on a node that has a GPU. Here are the contents of the Python script:

from datetime import date
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt
from keras.models import Sequential
from keras.layers import Dense, SimpleRNN
from keras.optimizers import adam
from keras.layers import Dropout
from tensorflow.keras.callbacks import Callback, EarlyStopping
from sklearn.preprocessing import StandardScaler
from datetime import datetime, timedelta
from sklearn.metrics import r2_score, mean_squared_error, accuracy_score
from keras.layers.core import Dense, Dropout, Activation
from keras.layers.recurrent import LSTM
from keras.models import load_model
from keras.callbacks import EarlyStopping, ModelCheckpoint
import warnings
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = "3"
warnings.filterwarnings('ignore')
import tensorflow as tf
import logging
logging.getLogger('tesorflow').setLevel(logging.FATAL)

delay = 252
window = 60
factor = 15
K = 8.4
sbo = 1.25
sso = 1.25
sbc = 0.75
ssc = 0.5
r = 0.02
tran_cost = 0.0002
leverage = 1.0
start_val = 100
bo = 1
so = -1

X_pd=pd.read_pickle('./data/X_pd.pkl')
X = pd.DataFrame(columns=range(0, window))
Y = []
for tag in X_pd.columns[:1]:
    # i=0 ....len(X_pd.index)-window
    for i in range(0, len(X_pd.index) - window):
        X_example = X_pd.loc[i:i + window - 1][tag].values
        X= X.append(pd.Series(X_example), ignore_index=True)
        Y.append(X_pd.loc[i + window][tag])
    print('done %s stocks' % (tag))
Y=pd.DataFrame(Y)
#normalization
SS = StandardScaler()
features = SS.fit_transform(X.values)
X=features
X=pd.DataFrame(X)

#LSTM model
def trainLSTMModel(layers, neurons, d):
    model = Sequential()
    model.add(LSTM(neurons[0], input_shape=(layers[1], layers[2]), return_sequences=False,activation='relu'))
    #model.add(Dropout(d))
    #model.add(LSTM(neurons[1], input_shape=(layers[1], layers[2]), return_sequences=False))
    #model.add(Dropout(d))
    #model.add(Dense(neurons[2], kernel_initializer="uniform", activation='relu'))
    model.add(Dense(neurons[3], kernel_initializer="uniform", activation='relu'))
    optimizer=adam(learning_rate=0.001)
    #adam = Adam(decay=0.2)
    # predict up and down
    # model.compile(optimizer="adam", loss="binary_crossentropy", metrics=['accuracy'])
    model.compile(loss='mse', optimizer=optimizer)
    model.summary()
    return model

length=X.shape[0]
X=np.array(X)
Y=np.array(Y)
time_step = 60
d = 0.3
output=1
shape = [length,time_step, output] # feature, window, output
neurons = [64, 64, 32, 1]
epochs = 100
batch_size=10000
model = trainLSTMModel(shape, neurons, d)
#shape from [samples, timesteps] into [samples, timesteps, features]
n_features = 1
X = X.reshape((X.shape[0], X.shape[1], n_features))
gpu_no = 0
with tf.device('/gpu:' + str(gpu_no)):
#    sess = tf.Session(config=tf.ConfigProto(allow_soft_placement=True, log_device_placement=True))
#    keras.backend.set_session(sess)
    print('model_manager: running tensorflow version: ' + tf.__version__)
    print('model_manager: will attempt to run on ' + '/gpu:' + str(gpu_no))
    model.fit(X, Y, epochs=epochs, verbose=2,batch_size=batch_size)

The log shows this:

Loading requirement: cuda10.1/toolkit/10.1.243
Loading cm-ml-python3deps/3.3.0
  Loading requirement: gcc5/5.5.0 python36
Loading tensorflow2-py36-cuda10.1-gcc/2.0.0
  Loading requirement: ml-pythondeps-py36-cuda10.1-gcc/3.3.0
    openblas/dynamic/0.2.20 hdf5_18/1.8.20 keras-py36-cuda10.1-gcc/2.3.1
    protobuf3-gcc/3.8.0 nccl2-cuda10.1-gcc/2.7.8
Loading openmpi/cuda/64/3.1.4
  Loading requirement: hpcx/2.4.0
2021-08-18 11:11:43.064175: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2021-08-18 11:18:08.026219: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2021-08-18 11:18:08.031771: E tensorflow/stream_executor/cuda/cuda_driver.cc:318] failed call to cuInit: CUDA_ERROR_UNKNOWN: unknown error
2021-08-18 11:18:08.031811: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:169] retrieving CUDA diagnostic information for host: node001
2021-08-18 11:18:08.031819: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:176] hostname: node001
2021-08-18 11:18:08.031921: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:200] libcuda reported version is: 460.73.1
2021-08-18 11:18:08.031958: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:204] kernel reported version is: 460.73.1
2021-08-18 11:18:08.031966: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:310] kernel version seems to match DSO: 460.73.1
2021-08-18 11:18:08.032266: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX512F
Using TensorFlow backend.
done A stocks
Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
lstm_1 (LSTM)                (None, 64)                16896
_________________________________________________________________
dense_1 (Dense)              (None, 1)                 65
=================================================================
Total params: 16,961
Trainable params: 16,961
Non-trainable params: 0
_________________________________________________________________
model_manager: running tensorflow version: 2.0.0
model_manager: will attempt to run on /gpu:0
Traceback (most recent call last):
  File "stocks.py", line 99, in <module>
    model.fit(X, Y, epochs=epochs, verbose=2,batch_size=batch_size)
  File "/cm/shared/apps/keras-py36-cuda10.1-gcc/2.3.1/lib/python3.6/site-packages/keras/engine/training.py", line 1213, in fit
    self._make_train_function()
  File "/cm/shared/apps/keras-py36-cuda10.1-gcc/2.3.1/lib/python3.6/site-packages/keras/engine/training.py", line 316, in _make_train_function
    loss=self.total_loss)
  File "/cm/shared/apps/keras-py36-cuda10.1-gcc/2.3.1/lib/python3.6/site-packages/keras/legacy/interfaces.py", line 91, in wrapper
    return func(*args, **kwargs)
  File "/cm/shared/apps/keras-py36-cuda10.1-gcc/2.3.1/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 75, in symbolic_fn_wrapper
    return func(*args, **kwargs)
  File "/cm/shared/apps/keras-py36-cuda10.1-gcc/2.3.1/lib/python3.6/site-packages/keras/optimizers.py", line 519, in get_updates
    for (i, p) in enumerate(params)]
  File "/cm/shared/apps/keras-py36-cuda10.1-gcc/2.3.1/lib/python3.6/site-packages/keras/optimizers.py", line 519, in <listcomp>
    for (i, p) in enumerate(params)]
  File "/cm/shared/apps/keras-py36-cuda10.1-gcc/2.3.1/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 963, in zeros
    v = tf.zeros(shape=shape, dtype=dtype, name=name)
  File "/cm/shared/apps/tensorflow2-py36-cuda10.1-gcc/2.0.0/lib/python3.6/site-packages/tensorflow_core/python/ops/array_ops.py", line 2349, in zeros
    output = _constant_if_small(zero, shape, dtype, name)
  File "/cm/shared/apps/tensorflow2-py36-cuda10.1-gcc/2.0.0/lib/python3.6/site-packages/tensorflow_core/python/ops/array_ops.py", line 2307, in _constant_if_small
    return constant(value, shape=shape, dtype=dtype, name=name)
  File "/cm/shared/apps/tensorflow2-py36-cuda10.1-gcc/2.0.0/lib/python3.6/site-packages/tensorflow_core/python/framework/constant_op.py", line 227, in constant
    allow_broadcast=True)
  File "/cm/shared/apps/tensorflow2-py36-cuda10.1-gcc/2.0.0/lib/python3.6/site-packages/tensorflow_core/python/framework/constant_op.py", line 235, in _constant_impl
    t = convert_to_eager_tensor(value, ctx, dtype)
  File "/cm/shared/apps/tensorflow2-py36-cuda10.1-gcc/2.0.0/lib/python3.6/site-packages/tensorflow_core/python/framework/constant_op.py", line 96, in convert_to_eager_tensor
    return ops.EagerTensor(value, ctx.device_name, dtype)
RuntimeError: /job:localhost/replica:0/task:0/device:GPU:0 unknown device.

Why is the script not seeing the GPU?

Can you try to just list the visible devices?
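A minimal sketch of such a check, in case it helps. The helper name `list_visible_gpus` is made up for illustration. It uses `tf.config.list_physical_devices` when that exists and falls back to the `tf.config.experimental` variant, which is where the call lives in TF 2.0. It returns an empty list both when TensorFlow sees no GPU and when TensorFlow cannot be imported at all, so check the import separately if the result is empty.

```python
def list_visible_gpus():
    """Return the names of the GPUs TensorFlow can see.

    Returns [] when no GPU is visible or when TensorFlow is not importable.
    """
    try:
        import tensorflow as tf
    except ImportError:
        return []
    # TF 2.1+ exposes the call directly; TF 2.0 only has the experimental one.
    lister = getattr(tf.config, "list_physical_devices", None)
    if lister is None:
        lister = tf.config.experimental.list_physical_devices
    return [device.name for device in lister("GPU")]
```

Running `print(list_visible_gpus())` inside the same Slurm job as the training script shows what TensorFlow sees in that environment, which can differ from an interactive shell on the node.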

Part of the problem was that the code requires TF > 2.0.

The only difference I see is that the user told me he got it to work by adjusting the comment tags as such:

#sess = tf.Session(config=tf.ConfigProto(allow_soft_placement=True, log_device_placement=True))
#keras.backend.set_session(sess)

Now the GPU works.

I also changed:
from keras.optimizers import adam
to
from keras.optimizers import adam_v2
and
optimizer=adam(learning_rate=0.001)
to
optimizer=adam_v2.Adam(learning_rate=0.001)
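Since the right import path for Adam depends on which Keras/TF combination the module system loads, a version-tolerant lookup may be less fragile than editing the import each time. This is only a sketch: `resolve_adam` and `CANDIDATES` are hypothetical names, and the candidate list is an assumption based on the imports that appear in this thread plus `tensorflow.keras.optimizers.Adam`.

```python
import importlib

# (module, attribute) pairs to try, in order of preference.
CANDIDATES = [
    ("tensorflow.keras.optimizers", "Adam"),
    ("keras.optimizers", "Adam"),
    ("keras.optimizers", "adam"),  # lowercase alias used by the original script
]


def resolve_adam(candidates=CANDIDATES):
    """Return the first optimizer class that can be imported, or None."""
    for module_name, attribute in candidates:
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            continue  # this package layout is not installed; try the next one
        optimizer_class = getattr(module, attribute, None)
        if optimizer_class is not None:
            return optimizer_class
    return None
```

In the script this would replace the `adam` import with something like `Adam = resolve_adam()` followed by `optimizer = Adam(learning_rate=0.001)`, after checking that the result is not `None`.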

Before this the logfile blew up to 6 GB with entries like:

2021-08-19 05:08:41.796216: I tensorflow/core/framework/op_kernel.cc:1287] No device-specific kernels found for NodeDef '{{node _SOURCE}}'Will fall back to a default kernel.
2021-08-19 05:08:41.796223: I tensorflow/core/framework/op_kernel.cc:1287] No device-specific kernels found for NodeDef '{{node _SOURCE}}'Will fall back to a default kernel.
2021-08-19 05:08:41.796232: I tensorflow/core/framework/op_kernel.cc:1487] Instantiating kernel for node: {{node _SINK}} = NoOp[]()
2021-08-19 05:08:41.796238: I tensorflow/core/framework/op_kernel.cc:1287] No device-specific kernels found for NodeDef '{{node _SINK}}'Will fall back to a default kernel.
2021-08-19 05:08:41.796245: I tensorflow/core/framework/op_kernel.cc:1287] No device-specific kernels found for NodeDef '{{node _SINK}}'Will fall back to a default kernel.
2021-08-19 05:08:41.796255: I tensorflow/core/framework/op_kernel.cc:1487] Instantiating kernel for node: {{node training/Adam/gradients/loss/dense_1_loss/mean_squared_error/Mean_grad/mod}} = FloorMod[T=DT_INT32, _class=["loc:@loss/dense_1_loss/mean_squared_error/Mean"]](training/Adam/gradients/loss/dense_1_loss/mean_squared_error/Mean_grad/add, training/Adam/gradients/loss/dense_1_loss/mean_squared_error/Mean_grad/Size)
2021-08-19 05:08:41.796283: I tensorflow/core/framework/op_kernel.cc:1487] Instantiating kernel for node: {{node training/Adam/gradients/loss/dense_1_loss/mean_squared_error/Mean_grad/add}} = AddV2[T=DT_INT32, _class=["loc:@loss/dense_1_loss/mean_squared_error/Mean"]](loss/dense_1_loss/mean_squared_error/Mean/reduction_indices, training/Adam/gradients/loss/dense_1_loss/mean_squared_error/Mean_grad/Size)
2021-08-19 05:08:41.796303: I tensorflow/core/framework/op_kernel.cc:1487] Instantiating kernel for node: {{node training/Adam/gradients/loss/dense_1_loss/mean_squared_error/Mean_grad/Size}} = Const[_class=["loc:@loss/dense_1_loss/mean_squared_error/Mean"], dtype=DT_INT32, value=Tensor<type: int32 shape: [] values: 2>]()
2021-08-19 05:08:41.796319: I tensorflow/core/framework/op_kernel.cc:1487] Instantiating kernel for node: {{node loss/dense_1_loss/mean_squared_error/Mean/reduction_indices}} = Const[dtype=DT_INT32, value=Tensor<type: int32 shape: [] values: -1>]()
2021-08-19 05:08:41.796335: I tensorflow/core/framework/op_kernel.cc:1487] Instantiating kernel for node: {{node _send_training/Adam/gradients/loss/dense_1_loss/mean_squared_error/Mean_grad/mod_0}} = _Send[T=DT_INT32, client_terminated=true, recv_device="/device:CPU:0", send_device="/device:CPU:0", send_device_incarnation=-6529568560417163830, tensor_name="training/Adam/gradients/loss/dense_1_loss/mean_squared_error/Mean_grad/mod:0"](training/Adam/gradients/loss/dense_1_loss/mean_squared_error/Mean_grad/mod)
2021-08-19 05:08:41.796357: I tensorflow/core/common_runtime/executor.cc:1717] Process node: 0 step -1 {{node _SOURCE}} = NoOp[]() device: /device:CPU:0
2021-08-19 05:08:41.796368: I tensorflow/core/common_runtime/executor.cc:1717] Process node: 4 step -1 {{node training/Adam/gradients/loss/dense_1_loss/mean_squared_error/Mean_grad/Size}} = Const[_class=["loc:@loss/dense_1_loss/mean_squared_error/Mean"], dtype=DT_INT32, value=Tensor<type: int32 shape: [] values: 2>]() device: /device:CPU:0
2021-08-19 05:08:41.796378: I tensorflow/core/common_runtime/executor.cc:1717] Process node: 5 step -1 {{node loss/dense_1_loss/mean_squared_error/Mean/reduction_indices}} = Const[dtype=DT_INT32, value=Tensor<type: int32 shape: [] values: -1>]() device: /device:CPU:0
2021-08-19 05:08:41.796390: I tensorflow/core/common_runtime/executor.cc:1717] Process node: 3 step -1 {{node training/Adam/gradients/loss/dense_1_loss/mean_squared_error/Mean_grad/add}} = AddV2[T=DT_INT32, _class=["loc:@loss/dense_1_loss/mean_squared_error/Mean"]](loss/dense_1_loss/mean_squared_error/Mean/reduction_indices, training/Adam/gradients/loss/dense_1_loss/mean_squared_error/Mean_grad/Size) device: /device:CPU:0

Anyway, it seems to be good now. Perhaps this will help someone down the line.

Well, in Slurm this still fails:

Loading cudnn7.6-cuda10.1/7.6.5.32
  Loading requirement: cuda10.1/toolkit/10.1.243
Loading cm-ml-python3deps/3.3.0
  Loading requirement: gcc5/5.5.0 python36
Loading tensorflow2-py37-cuda10.1-gcc/2.2.0
  Loading requirement: python37 ml-pythondeps-py37-cuda10.1-gcc/4.1.2
    openblas/dynamic/0.2.20 hdf5_18/1.8.20 keras-py37-cuda10.1-gcc/2.3.1
    protobuf3-gcc/3.8.0 nccl2-cuda10.1-gcc/2.7.8
Loading openmpi/cuda/64/3.1.4
  Loading requirement: hpcx/2.4.0
2021-08-20 10:36:18.057370: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
Using TensorFlow backend.
Traceback (most recent call last):
  File "stocks.py", line 9, in <module>
    from keras.models import Sequential
  File "/cm/shared/apps/keras-py37-cuda10.1-gcc/2.3.1/lib/python3.7/site-packages/keras/__init__.py", line 3, in <module>
    from . import utils
  File "/cm/shared/apps/keras-py37-cuda10.1-gcc/2.3.1/lib/python3.7/site-packages/keras/utils/__init__.py", line 6, in <module>
    from . import conv_utils
  File "/cm/shared/apps/keras-py37-cuda10.1-gcc/2.3.1/lib/python3.7/site-packages/keras/utils/conv_utils.py", line 9, in <module>
    from .. import backend as K
  File "/cm/shared/apps/keras-py37-cuda10.1-gcc/2.3.1/lib/python3.7/site-packages/keras/backend/__init__.py", line 1, in <module>
    from .load_backend import epsilon
  File "/cm/shared/apps/keras-py37-cuda10.1-gcc/2.3.1/lib/python3.7/site-packages/keras/backend/load_backend.py", line 90, in <module>
    from .tensorflow_backend import *
  File "/cm/shared/apps/keras-py37-cuda10.1-gcc/2.3.1/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py", line 5, in <module>
    import tensorflow as tf
  File "/cm/shared/apps/tensorflow2-py37-cuda10.1-gcc/2.2.0/lib/python3.7/site-packages/tensorflow/__init__.py", line 41, in <module>
    from tensorflow.python.tools import module_util as _module_util
  File "/cm/shared/apps/tensorflow2-py37-cuda10.1-gcc/2.2.0/lib/python3.7/site-packages/tensorflow/python/__init__.py", line 64, in <module>
    from tensorflow.python.framework.framework_lib import *  # pylint: disable=redefined-builtin
  File "/cm/shared/apps/tensorflow2-py37-cuda10.1-gcc/2.2.0/lib/python3.7/site-packages/tensorflow/python/framework/framework_lib.py", line 24, in <module>
    from tensorflow.python.framework.device import DeviceSpec
  File "/cm/shared/apps/tensorflow2-py37-cuda10.1-gcc/2.2.0/lib/python3.7/site-packages/tensorflow/python/framework/device.py", line 24, in <module>
    from tensorflow.python.framework import device_spec
  File "/cm/shared/apps/tensorflow2-py37-cuda10.1-gcc/2.2.0/lib/python3.7/site-packages/tensorflow/python/framework/device_spec.py", line 21, in <module>
    from tensorflow.python.util.tf_export import tf_export
  File "/cm/shared/apps/tensorflow2-py37-cuda10.1-gcc/2.2.0/lib/python3.7/site-packages/tensorflow/python/util/tf_export.py", line 48, in <module>
    from tensorflow.python.util import tf_decorator
  File "/cm/shared/apps/tensorflow2-py37-cuda10.1-gcc/2.2.0/lib/python3.7/site-packages/tensorflow/python/util/tf_decorator.py", line 64, in <module>
    from tensorflow.python.util import tf_stack
  File "/cm/shared/apps/tensorflow2-py37-cuda10.1-gcc/2.2.0/lib/python3.7/site-packages/tensorflow/python/util/tf_stack.py", line 28, in <module>
    from tensorflow.python import _tf_stack
ImportError: /cm/shared/apps/tensorflow2-py37-cuda10.1-gcc/2.2.0/lib/python3.7/site-packages/tensorflow/python/_tf_stack.so: undefined symbol: PyThread_tss_set

Is this a known issue with TF 2.2.0?
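One thing worth ruling out first: `PyThread_tss_set` belongs to the thread-specific storage C API that was added in Python 3.7 (PEP 539), and the module log above loads both python36 and python37. If the `python` the job ends up running is 3.6, an extension built under `lib/python3.7` could fail with exactly this kind of undefined symbol. That is a guess from the log, not a confirmed diagnosis. The sketch below compares the running interpreter with the version embedded in a module path; both helper names are made up for illustration.

```python
import re
import sys


def python_version_in_path(path):
    """Extract (major, minor) from a '.../lib/pythonX.Y/...' path, or None."""
    match = re.search(r"python(\d+)\.(\d+)", path)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def interpreter_matches(path, running=None):
    """True if the path names no Python version or names the running one."""
    if running is None:
        running = tuple(sys.version_info[:2])
    built_for = python_version_in_path(path)
    return built_for is None or built_for == running


# Path taken from the ImportError in the traceback above.
TF_STACK_SO = (
    "/cm/shared/apps/tensorflow2-py37-cuda10.1-gcc/2.2.0/lib/python3.7/"
    "site-packages/tensorflow/python/_tf_stack.so"
)
print(sys.version_info[:2], interpreter_matches(TF_STACK_SO))
```

Running this with the same `python` command the SBATCH file uses shows whether the interpreter and the TF module agree.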

Does it work with TF 2.6.0?

Yes. When I run this directly on a node that has Python 3.6 and TF 2.6, I get the expected results (output below). Is there a way to get the earlier TF/Keras to work with this?

done A stocks
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
lstm (LSTM)                  (None, 64)                16896
_________________________________________________________________
dense (Dense)                (None, 1)                 65
=================================================================
Total params: 16,961
Trainable params: 16,961
Non-trainable params: 0
_________________________________________________________________
model_manager: running tensorflow version: 2.6.0
model_manager: will attempt to run on /gpu:0
Epoch 1/100
7/7 - 36s - loss: 38939.2383
Epoch 2/100
7/7 - 17s - loss: 38939.2383
Epoch 3/100

I don’t know, but generally our support policy for older versions covers patch releases only for security bugs.
So I suggest you use an updated version of TF.

Even with 2.6 I see this error:

  Loading requirement: hpcx/2.4.0
2021-08-20 14:23:09.943253: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /cm/shared/apps/openmpi/cuda/64/3.1.4/lib:/cm/shared/apps/hpcx/2.4.0/sharp/lib:/cm/shared/apps/hpcx/2.4.0/hcoll/lib:/cm/shared/apps/hpcx/2.4.0/ucx/lib:/cm/shared/apps/cudnn7.6-cuda10.2/7.6.5.32/lib64:/cm/shared/apps/cuda10.2/toolkit/10.2.89/targets/x86_64-linux/lib:/cm/shared/apps/cuda10.1/toolkit/10.1.243/extras/CUPTI/lib64:/cm/local/apps/cuda/libs/current/lib64:/cm/shared/apps/cuda10.1/toolkit/10.1.243/targets/x86_64-linux/lib:/cm/local/apps/python3/lib:/cm/shared/apps/gcc5/5.5.0/lib64:/cm/shared/apps/gcc5/5.5.0/lib32:/cm/shared/apps/gcc5/5.5.0/lib:/cm/shared/apps/slurm/20.11.3/lib64/slurm:/cm/shared/apps/slurm/20.11.3/lib64:/cm/local/apps/gcc/8.2.0/lib:/cm/local/apps/gcc/8.2.0/lib64:/cm/shared/apps/openmpi/gcc/64/1.10.7/lib64
2021-08-20 14:23:09.943288: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2021-08-20 14:24:41.582692: E tensorflow/stream_executor/cuda/cuda_driver.cc:271] failed call to cuInit: CUDA_ERROR_UNKNOWN: unknown error
2021-08-20 14:24:41.582920: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:169] retrieving CUDA diagnostic information for host: node001
2021-08-20 14:24:41.582935: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:176] hostname: node001
2021-08-20 14:24:41.583068: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:200] libcuda reported version is: 460.73.1
2021-08-20 14:24:41.583108: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:204] kernel reported version is: 460.73.1
2021-08-20 14:24:41.583115: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:310] kernel version seems to match DSO: 460.73.1
2021-08-20 14:24:41.583609: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-08-20 14:24:41.871823: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:185] None of the MLIR Optimization Passes are enabled (registered 2)
WARNING: Logging before flag parsing goes to stderr.
W0820 14:24:42.032056 46912496384256 ag_logging.py:146] AutoGraph could not transform <function Model.make_train_function.<locals>.train_function at 0x2aab736d7f28> and will run it as-is.
Please report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output.
Cause: 'arguments' object has no attribute 'posonlyargs'
To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert

It is a problem with your env setup as TF doesn’t find CUDA libraries in your system paths:

W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
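A quick way to check this outside TensorFlow is to walk `LD_LIBRARY_PATH` for the file the loader asked for. The sketch below is illustrative only: `find_library` is a made-up helper, and it ignores the ldconfig cache and the default system directories that the dynamic loader also searches, so an empty result is a hint rather than proof.

```python
import os


def find_library(name, search_path=None):
    """Return every 'directory/name' that exists along a colon-separated path.

    search_path defaults to the LD_LIBRARY_PATH of the current process.
    """
    if search_path is None:
        search_path = os.environ.get("LD_LIBRARY_PATH", "")
    hits = []
    for directory in search_path.split(":"):
        if not directory:
            continue  # skip empty entries such as a trailing ':'
        candidate = os.path.join(directory, name)
        if os.path.exists(candidate):
            hits.append(candidate)
    return hits


# The library name comes from the dso_loader warning above.
print(find_library("libcudart.so.11.0"))
```

In the log earlier in the thread, the path only lists CUDA 10.1 and 10.2 directories, so an empty list for `libcudart.so.11.0` would be consistent with that warning.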

Sorry I should’ve posted more of the logs. The CUDA diagnostic does appear to find CUDA. Just not the GPU.

2021-08-20 15:21:38.393015: E tensorflow/stream_executor/cuda/cuda_driver.cc:271] failed call to cuInit: CUDA_ERROR_UNKNOWN: unknown error
2021-08-20 15:21:38.393070: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:169] retrieving CUDA diagnostic information for host: node001
2021-08-20 15:21:38.393081: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:176] hostname: node001
2021-08-20 15:21:38.393208: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:200] libcuda reported version is: 460.73.1
2021-08-20 15:21:38.393248: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:204] kernel reported version is: 460.73.1
2021-08-20 15:21:38.393256: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:310] kernel version seems to match DSO: 460.73.1
2021-08-20 15:29:06.834136: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-08-20 15:29:07.343075: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:185] None of the MLIR Optimization Passes are enabled (registered 2)
WARNING: Logging before flag parsing goes to stderr.
W0820 15:29:07.578475 46912496383040 ag_logging.py:146] AutoGraph could not transform <function Model.make_train_function.<locals>.train_function at 0x2aab74dc1840> and will run it as-is.
Please report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output.
Cause: 'arguments' object has no attribute 'posonlyargs'
To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
done A stocks
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
lstm (LSTM)                  (None, 64)                16896
_________________________________________________________________
dense (Dense)                (None, 1)                 65

Is that error, Cause: 'arguments' object has no attribute 'posonlyargs', just a red herring?

I see that CUDA has failed to initialize. Your environment is not in good shape.

We had many CUDA setup issues in the repo like:

Well, kind of. We use Bright Cluster with Slurm, so on our head node we use an “SBATCH” (Slurm batch) file that loads modules. TF 2.6 is not yet available in Bright’s packages, so I used pip to install TF 2.6 for Python 3 on a node. I now leave the TF module out of the SBATCH file and let the job pick up the TF 2.6 that I installed. It looks like we also need CUDA 11 or greater. For now it’s running, but without the GPU.
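Since the script behaves differently inside a Slurm job than on the node directly, it may help to dump what the job environment exposes before TensorFlow is imported. The sketch below is an assumption about what is worth looking at, not a known fix: it reports `CUDA_VISIBLE_DEVICES`, `SLURM_JOB_GPUS`, and the NVIDIA device files under `/dev`. The function name `gpu_environment_report` is hypothetical.

```python
import glob
import os


def gpu_environment_report(environ=None, device_glob="/dev/nvidia*"):
    """Collect GPU-related settings visible to the current process."""
    if environ is None:
        environ = os.environ
    return {
        # None means the variable is unset; "" means set but empty.
        "CUDA_VISIBLE_DEVICES": environ.get("CUDA_VISIBLE_DEVICES"),
        "SLURM_JOB_GPUS": environ.get("SLURM_JOB_GPUS"),
        # Device nodes the process can see, e.g. /dev/nvidia0, /dev/nvidiactl.
        "device_files": sorted(glob.glob(device_glob)),
    }


for key, value in gpu_environment_report().items():
    print(key, "=", value)
```

If the values differ between the batch job and an interactive run on node001, the next thing to check would be whether the job actually requested a GPU (for example through a `--gres` request in the SBATCH file).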
