
Conversation

@sguada
Contributor

@sguada sguada commented Jun 30, 2014

This is a tentative approach to adding transformation layers that would allow operations like crop_mirror and center_scale on the blobs.

The first use would be to help separate the data source from pre-processing. The source would produce a vector of blobs, one per image, which would pass through the transformation layers and finally be concatenated into one blob.

The concept is similar to doing map(transformation, bottom_blobs, top_blobs).
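The map idea can be sketched as follows; `Blob` and `MapTransform` here are illustrative stand-ins for this discussion, not Caffe's actual types:

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Minimal stand-in for a Caffe blob (illustrative only).
struct Blob { std::vector<float> data; };

// The map(transformation, bottom_blobs, top_blobs) idea: apply the same
// transformation to each (bottom, top) blob pair, one blob per image.
void MapTransform(const std::function<void(const Blob&, Blob*)>& transform,
                  const std::vector<Blob*>& bottom,
                  const std::vector<Blob*>& top) {
  for (std::size_t i = 0; i < bottom.size(); ++i) {
    transform(*bottom[i], top[i]);
  }
}
```

A concat layer would then merge the per-image top blobs back into a single batch blob.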

@Yangqing What do you think?

@jeffdonahue
Contributor

Note that this would have substantial speed and memory overhead vs. the current implementation as all of this stuff is currently done on the next batch by the prefetch thread while the current batch is being run through the net; this moves it to the forward pass. Unless I'm misunderstanding the intent here.

@shelhamer
Member

In the past the idea came up of a PREFETCH phase so that we could specify
the data processing and have it run before forward to avoid overheads like
these. Or perhaps we need a "stage" instead of just phase since these
options differ at train and test.

At any rate, I agree that whatever the design it needs to not incur a bunch
of memory and speed costs.

Evan Shelhamer

@sguada
Contributor Author

sguada commented Jun 30, 2014

I was planning on keeping the prefetch of the next batch within the data
layers.

The transform layers don't need to know about that. The data layer will
have a set of internal transform layers that run in the prefetch thread
on the next batch while the rest of the net processes the current batch.

I was also thinking of adding multiple threads to process the blobs in
parallel within the transform layers.

We could discuss the architecture later in person.
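The overlap described here can be sketched like this; the names are hypothetical and this uses plain std::thread, not Caffe's actual prefetch code:

```cpp
#include <thread>
#include <vector>

// Illustrative batch type, not Caffe's.
struct Batch { std::vector<float> data; };

// Stand-in for "read datum, then crop/mirror/scale in the prefetch thread".
Batch LoadAndTransformNext() {
  Batch b;
  b.data.assign(4, 1.0f);
  return b;
}

// A worker thread loads and transforms the next batch while the main
// thread would be running the net on the current one.
Batch ForwardWithPrefetch(const Batch& current) {
  Batch next;
  std::thread prefetch([&next] { next = LoadAndTransformNext(); });
  // ... run forward/backward on `current` here, concurrently ...
  (void)current;
  prefetch.join();  // transformed next batch is ready for the next step
  return next;
}
```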

Sergio

@jeffdonahue
Contributor

Even if these layers are run inside the prefetch thread there's still a speed and memory cost (at least for most of the transformations). For example, when cropping is enabled, we never store the entire uncropped image in a blob, which would be a huge cost in both speed and memory if you're training on small patches of the input image. Since mirroring doesn't change the size, it could be done "in place" to use almost no additional memory, but you still have the speed cost of copying the un-mirrored image in the first place.
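A rough sketch of the point about cropping during decode; `CropOnDecode` is a hypothetical helper, not Caffe's API. Copying only the crop window means the full image is never materialized as a blob:

```cpp
#include <cstddef>
#include <vector>

// Copy only crop_h*crop_w values out of a row-major width-wide image,
// so no buffer of the full image size is ever allocated downstream.
std::vector<float> CropOnDecode(const std::vector<float>& image,
                                std::size_t width,
                                std::size_t off_h, std::size_t off_w,
                                std::size_t crop_h, std::size_t crop_w) {
  std::vector<float> out;
  out.reserve(crop_h * crop_w);
  for (std::size_t h = 0; h < crop_h; ++h) {
    for (std::size_t w = 0; w < crop_w; ++w) {
      out.push_back(image[(off_h + h) * width + (off_w + w)]);
    }
  }
  return out;
}
```

A transform layer placed after the data source would instead receive the full uncropped image as a blob first, which is the overhead being discussed.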

@sguada
Contributor Author

sguada commented Jun 30, 2014

There will be some memory overhead if we transform all images at once, but we could apply the transformations one image at a time and iterate to avoid extra memory.

I think this approach should be faster if we can process several images in parallel with different threads.

We could also add a getCropMirror_fromdatum for the case of small patches, or store Blobs in LevelDB instead of datum and save one memory copy.

@shelhamer
Member

We could pull the current method into a base data layer that still does
everything as-is and then inherit, no? If we align the interfaces, that
is. There'd be a certain amount of redundant code, but much less.

Let's talk this afternoon in person like @sguada suggested.


@sguada
Contributor Author

sguada commented Jun 30, 2014

We can probably abstract most of the common parts, but that will depend
on whether we assume the input data is a datum or not. I don't see an easy
way to do the same for datum, images read from files, cv::Mat and HDF5 data.

I think that in the long run having the ability to do different
pre-processing steps will pay off, for instance if we want to add color
jittering or have videos as inputs.

Sergio

@shelhamer
Member

Agreed: the point of this re-design is to make everything more configurable,
a matter of prototxt instead of code, and that is a worthy change. Any
refactoring for simplicity is only a side goal.

but that will depend if we assume the input data is datum or not.

Maybe it's best if we turn everything into blobs, and if we pay a per-batch
memory hit, so be it.


@kloudkl
Contributor

kloudkl commented Jul 1, 2014

Back in #244, I tried many APIs to unify the preprocessing steps for different data formats but gave up.

The new design would possibly consist of data IO including prefetching, data format conversions, and finally data content transformations. The data IO takes care of the various data sources such as leveldb/lmdb, HDF5, images on disk, and images in memory. To avoid replicating the preprocessing code for each format, the raw data should be converted into Blob.

The layers have been a very important part of Caffe: they have unified method interfaces and we are all accustomed to wrapping many things into them. But certainly not everything needs to be a layer. The conversions and transformations are better put in simpler classes that allow for in-place data manipulation, for example:

template <typename Dtype>
class BaseInPlaceOperation {
 public:
  virtual ~BaseInPlaceOperation() {}
  // Transform the blob in place, without allocating a second copy.
  virtual void apply(Blob<Dtype>* data) = 0;
};

template <typename Dtype>
class Crop : public BaseInPlaceOperation<Dtype> {
  ...
};

I have used boost::thread to parallelize the performance-critical parts in a few projects. It is cross-platform and has a much more flexible API than pthread. Multiple threads for a common task can be effectively managed in a thread_group. We should definitely switch to it to be future-proof.
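The thread_group pattern can be sketched as follows; for a self-contained example this uses C++11 std::thread rather than boost::thread, but the structure (spawn one worker per blob, join them all) is the same:

```cpp
#include <thread>
#include <vector>

// One worker thread per blob, all joined before the transformed data is
// used; the transform runs in place, allocating no extra copies.
void ParallelScale(std::vector<std::vector<float> >& blobs, float factor) {
  std::vector<std::thread> workers;
  for (std::vector<float>& blob : blobs) {
    workers.emplace_back([&blob, factor] {
      for (float& v : blob) v *= factor;  // in-place transform
    });
  }
  for (std::thread& t : workers) t.join();
}
```

With boost, the workers would be added to a `boost::thread_group` and joined with `join_all()` instead of the explicit loop.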

@bhack
Contributor

bhack commented Jul 1, 2014

One of the advantages of boost threads is that it will be very easy to switch to C++11 threads once the CUDA version lets us move to a modern version of gcc.
Will we need optimized OpenCV operators when we want to handle image transformations? On one side we need to think about data transforms other than images; on the other, it could be useful to experiment with transformations for dataset augmentation / synthetic dataset generation (http://bouthilx.wordpress.com/tag/sampling/), and probably some kind of "infinite" transformation that renews a small part of the training set every n iterations, with or without a logic controlling the loss.

@bhack
Contributor

bhack commented Sep 20, 2014

@sguada Can we discuss here whether this layer could be compatible with, and useful for, data augmentation (lighting changes, elastic distortion, affine transformation, Gaussian noise, motion blur, etc.)? I think some operations (when we handle image data) are probably easier and better optimized in OpenCV, but at this level we are handling blobs, so I don't know how we could introduce pluggable augmentation operators.

@sguada
Contributor Author

sguada commented Sep 20, 2014

@bhack maybe doing data augmentation would be easier within Transform_Data. With #1070, during the transformation of cv::Mat to Blob, we could use any OpenCV routine.
Although having Transform Layers could be useful in some cases, it wouldn't be very efficient to convert a Blob back to cv::Mat for data augmentation.

@jeffdonahue
Contributor

Closing as abandoned. We agree with the motivation (data transformers becoming layers of their own rather than applied by each data layer) but this needs a bit more thought (prefetching etc., see discussion above) and a rebase.

@whjxnyzh123

@sguada Is there RGB jittering in Caffe? Thank you.

@sguada
Contributor Author

sguada commented Jun 1, 2015

I don't think so, but you could implement it easily using other layers.

Sergio
