Tensorflow 2.17 updates #181

Merged: lfoppiano merged 36 commits into tensorflow-2.17 from tensorflow-2.17-updates on Feb 28, 2026

lfoppiano commented May 28, 2025

Collaborator

@kermitt2 I hit an issue when training with TensorFlow 2.17 (note that the TensorFlow Docker image appears to use Python 3.11). It occurs even when training a simple model such as date:

WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
I0000 00:00:1748463701.859042 1108825 service.cc:146] XLA service 0x7f12547d4870 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
I0000 00:00:1748463701.859079 1108825 service.cc:154]   StreamExecutor device (0): Tesla V100-SXM2-16GB, Compute Capability 7.0
2025-05-28 22:21:41.865569: I tensorflow/compiler/mlir/tensorflow/utils/dump_mlir_util.cc:268] disabling MLIR crash reproducer, set env var `MLIR_CRASH_REPRODUCER_DIRECTORY` to enable.
2025-05-28 22:21:41.883818: I external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:531] Loaded cuDNN version 8906
I0000 00:00:1748463701.939038 1108825 device_compiler.h:188] Compiled cluster using XLA!  This line is logged at most once for the lifetime of the process.
10/10 [==============================] - ETA: 0s - loss: 5.3103 - crf_loss: 5.3103      f1 (micro): 85.80
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/netscratch/lfoppiano/delft/delft_tf2.17.1/delft/applications/grobidTagger.py", line 475, in <module>
    train(model, 
    ^^^^^^^^^^^^
  File "/netscratch/lfoppiano/delft/delft_tf2.17.1/delft/applications/grobidTagger.py", line 222, in train
    model.train(x_train, y_train, f_train, x_valid, y_valid, f_valid, incremental=incremental, multi_gpu=multi_gpu)
  File "/netscratch/lfoppiano/delft/delft_tf2.17.1/delft/sequenceLabelling/wrapper.py", line 165, in train
    self.train_(x_train, y_train, f_train, x_valid, y_valid, f_valid, incremental, callbacks)
  File "/netscratch/lfoppiano/delft/delft_tf2.17.1/delft/sequenceLabelling/wrapper.py", line 216, in train_
    trainer.train(x_train, y_train, x_valid, y_valid, features_train=f_train, features_valid=f_valid, callbacks=callbacks)
  File "/netscratch/lfoppiano/delft/delft_tf2.17.1/delft/sequenceLabelling/trainer.py", line 59, in train
    self.model = self.train_model(self.model, x_train, y_train, x_valid=x_valid, y_valid=y_valid,
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/netscratch/lfoppiano/delft/delft_tf2.17.1/delft/sequenceLabelling/trainer.py", line 175, in train_model
    local_model.fit(training_generator,
  File "/usr/local/lib/python3.11/dist-packages/tf_keras/src/utils/traceback_utils.py", line 70, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/netscratch/lfoppiano/delft/delft_tf2.17.1/delft/sequenceLabelling/trainer.py", line 279, in on_epoch_end
    logs.update({"lr": self.model.optimizer._decayed_lr(tf.float32)})
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'Adam' object has no attribute '_decayed_lr'
2025-05-28 22:21:54.133744: W tensorflow/core/kernels/data/generator_dataset_op.cc:108] Error occurred when finalizing GeneratorDataset iterator: FAILED_PRECONDITION: Python interpreter state is not initialized. The process may be terminated.
         [[{{node PyFunc}}]]

The fix in this branch solved the issue for me (the library updates are only minor).
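For readers hitting the same `AttributeError`: the traceback shows the `on_epoch_end` callback in `trainer.py` calling the private method `self.model.optimizer._decayed_lr(tf.float32)`, which the `Adam` optimizer used here does not provide. The following is only a minimal sketch of one way to read the learning rate defensively, not the actual change made in this branch. The helper name `current_learning_rate` is hypothetical, and the sketch assumes the optimizer exposes the public `learning_rate` attribute (and `iterations` when the learning rate is a schedule).

```python
def current_learning_rate(optimizer, dtype=None):
    """Return the optimizer's current learning rate without relying on one private API.

    Hypothetical helper for illustration; it is not part of DeLFT or Keras.
    """
    legacy_getter = getattr(optimizer, "_decayed_lr", None)
    if callable(legacy_getter):
        # Older Keras optimizers: a private method that applies the schedule.
        return legacy_getter(dtype)

    lr = optimizer.learning_rate
    if callable(lr):
        # A schedule-like object: evaluate it at the current training step.
        return lr(optimizer.iterations)
    # A plain value or variable holding the current learning rate.
    return lr
```

In the callback, the failing line could then become something like `logs.update({"lr": current_learning_rate(self.model.optimizer, tf.float32)})`, assuming the rest of the callback is unchanged.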

@lfoppiano

Collaborator Author

Separately, I also see the following warning occasionally, but I cannot figure out where it comes from. It is not fatal, but it might indicate a bug:

2025-05-28 22:21:35.126118: W tensorflow/core/common_runtime/type_inference.cc:339] Type inference failed. This indicates an invalid graph that escaped type checking. Error message: INVALID_ARGUMENT: expected compatible input types, but input 1:
type_id: TFT_OPTIONAL
args {
  type_id: TFT_PRODUCT
  args {
    type_id: TFT_TENSOR
    args {
      type_id: TFT_FLOAT
    }
  }
}
 is neither a subtype nor a supertype of the combined inputs preceding it:
type_id: TFT_OPTIONAL
args {
  type_id: TFT_PRODUCT
  args {
    type_id: TFT_TENSOR
    args {
      type_id: TFT_BOOL
    }
  }
}

        for Tuple type infernce function 0
        while inferring type of node 'cond/output/_757'

de-code commented Dec 19, 2025

Contributor

As a separate comment, I also have this occasionally, but I cannot figure out where they come from, this is not fatal, but might indicate some bugs:

I came across that too, and with the help of AI I found that enabling eager mode helped:
eLifePathways/sciencebeam-trainer-delft@87a40f2
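
As a rough illustration of what "enabling eager mode" can look like, here is a minimal sketch. It is not taken from the linked commit, whose exact change is not shown here. It uses TensorFlow's `tf.config.run_functions_eagerly`, and the wrapper name `enable_eager_functions` is hypothetical. Eager execution skips graph tracing, so training is typically slower.

```python
try:
    import tensorflow as tf
except ImportError:
    # Keeps the sketch importable on machines without TensorFlow.
    tf = None


def enable_eager_functions() -> bool:
    """Make tf.function-decorated code run eagerly.

    Returns False when TensorFlow is not installed, True otherwise.
    """
    if tf is None:
        return False
    tf.config.run_functions_eagerly(True)
    return True
```

A narrower alternative is to pass `run_eagerly=True` to `model.compile(...)`, which affects only that model's training and evaluation loops.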

lfoppiano commented Feb 21, 2026

Collaborator Author

@de-code It works. Thanks 😄

Question: why did you remove the different allocation of loss function?

EDIT: actually, I did not manage to get past the second epoch... 😭

@lfoppiano lfoppiano force-pushed the tensorflow-2.17-updates branch 3 times, most recently from 5148b00 to a212306 Compare February 21, 2026 20:30
@lfoppiano lfoppiano force-pushed the tensorflow-2.17-updates branch from 86686e2 to 2b3aa3d Compare February 22, 2026 07:35
@lfoppiano lfoppiano merged commit eb57a80 into tensorflow-2.17 Feb 28, 2026
2 checks passed
@lfoppiano lfoppiano deleted the tensorflow-2.17-updates branch February 28, 2026 19:23