Fix Tensorflow 2.17 training #191

Merged
lfoppiano merged 30 commits into tensorflow-2.17-updates from tensorflow-2.17-training
Feb 28, 2026

@lfoppiano lfoppiano commented Feb 23, 2026

Collaborator

This incremental PR fixes the issues I found with the training:

  • Configuration of nb_workers, which seemed to cause problems with Linux and GPUs
  • Breaking change: removed pickled arrays from the LMDB storage
  • Fixed LMDB access when using multiple workers
  • Added scripts for running training and training/evaluation on a SLURM cluster
  • Fixed an issue with TensorFlow and PyTorch being built against different CUDA versions; everything should now work with CUDA 12
  • Fixed the classification classes and completed the migration to TensorFlow 2.17
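
For readers affected by the breaking change in the LMDB storage: the description above does not say which encoding replaced the pickled arrays, so the following is only a minimal sketch of one common pickle-free approach, under the assumption that every stored value is a 1-D vector with a single fixed dtype. The names `VECTOR_DTYPE`, `encode_vector` and `decode_vector` are hypothetical and are not taken from this PR.

```python
import numpy as np

# Assumption: all stored vectors share one dtype, so no per-value header is needed.
VECTOR_DTYPE = np.dtype("<f4")  # little-endian float32 (hypothetical choice)


def encode_vector(vector):
    """Serialize a 1-D vector to raw bytes, usable as an LMDB value."""
    array = np.ascontiguousarray(vector, dtype=VECTOR_DTYPE)
    if array.ndim != 1:
        raise ValueError("expected a 1-D vector")
    return array.tobytes()


def decode_vector(buffer):
    """Rebuild the vector from raw bytes without unpickling anything."""
    # np.frombuffer returns a read-only view when given immutable bytes.
    return np.frombuffer(buffer, dtype=VECTOR_DTYPE)
```

Databases written with pickled values would not be readable by a decoder like this one, which is presumably why such a change is flagged as breaking; existing stores would need to be rebuilt.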

Signed-off-by: Luca Foppiano <luca@foppiano.org>
@lfoppiano lfoppiano merged commit e223d1d into tensorflow-2.17-updates Feb 28, 2026
2 checks passed
@lfoppiano lfoppiano deleted the tensorflow-2.17-training branch February 28, 2026 19:10