-
Notifications
You must be signed in to change notification settings - Fork 75.2k
Description
Hi,
OS and version: TF r0.11.0rc0, Linux 64bit, cudnn-7.5-v5.1
I am using TF-Slim and a FCN-style architecture based on ResNets. I experience extremely slow training times: 5-10x slower compared to an equivalent Caffe implementation.
I train fully-convolutionally, and my images are of arbitrary sizes and aspect ratios. The training code uses FIFOQueue and preloads data in a separate thread. I use batch_size=1 as all images are of different sizes.
If I feed dummy random numpy tensors of fixed size, it works very fast. I tried to generate input tensors of 10 predefined sizes and fed them sequentially, the first 10 iterations were slow, but then it speeds back up. Looks like it does some extra work for each input size. I only used Caffe before, and there it was possible to resize all tensors per batch efficiently, somehow.
Am I missing some simple trick or is it a bug?