All-in-one nets #734
Whoops, I screwed up by overwriting deploy prototxts (at least …)
Oh sweet! Sorry I dithered on this for so long. I'll review soon and think it over. One could add optional input fields for phase/level/stage.

On Saturday, July 19, 2014, Jeff Donahue notifications@github.com wrote:
Solves #57 and completes the model definition improvements for 1.0.

On Saturday, July 19, 2014, Jeff Donahue notifications@github.com wrote:
In the last four commits, I refactored this to make it somewhat more flexible. I created new proto messages (NetState and NetStateRule). To handle deploy nets, I also added a new INPUT layer type which should act exactly like an actual net input*, and an extra ….

*This might not be true if the existing inputs are handled specially somewhere outside of ….
The only other thing I'd like to do before this gets merged (modulo any revisions due to reviewer comments) is add a few ….
Oh right... the …. Maybe what I'll end up doing is removing the ….
I revised this to exclude the INPUT layer and the "solver" parameter being a part of the NetState(Rule). This is good to go as far as I know, but only (conveniently) solves a part of the overall net consolidation issue -- the need to create separate train and test nets. Technically one could use the custom "stages" provided by this PR to also further differentiate between "solver" vs. "deploy" nets, but then this would require every existing tool to correctly set this setting.

One possible solution to this would be to do something along the lines of @sguada's suggestion in #57 with the "include_net" in the proto. Then you'd make three separate files: the "main net" file that has conv1-fc8; the "solver net" file that includes the main net and also has leveldb layers and loss/accuracy; and the "deploy net" file that also includes the main net and has inputs and softmax prediction output. That solves the problem of redundancy among files, but doesn't eliminate the annoyance of having to work with many files. Still better than nothing though.

But I won't do that for this PR; it's an orthogonal change, and this PR is still useful without it (especially for those of us whose primary caffe workflow is to train/test a net for a minute, check the accuracy, and throw it away).
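As a rough sketch of that stage-based differentiation (the "deploy" stage name here is hypothetical -- exactly the kind of convention every existing tool would have to agree on, which is the objection above):

```protobuf
# Hypothetical sketch: a training-only layer that vanishes when the net is
# instantiated with a NetState containing the custom stage "deploy".
# Uses the enum-typed `layers` syntax of this era; field details are assumptions.
layers {
  name: "loss"
  type: SOFTMAX_LOSS
  bottom: "fc8"
  bottom: "label"
  exclude { stage: "deploy" }
}
```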
btw, I'd slightly prefer this one be reviewed/merged before #686 despite the ordering of the PRs. (Also, this is easily the 'safer' of the two changes, in that it's pretty much an optional proto thing.) But it's probably just 5-10 minutes' worth of extra rebase conflict resolution work if you'd prefer the opposite, Evan, so no big deal either way.
I'll review this first. Thanks for letting me know your preference. On Tuesday, July 29, 2014, Jeff Donahue notifications@github.com wrote:
re: #734 (comment) -- this is a nice step in combining the definitions, so let's review and merge as-is. To follow up after this PR, one of us should: …
Although the logging is precise, the two cases for the logic are just … I take it back: with all the possible nets there could be when combining definitions, it's best to be clear. The `net_param` inline prototxt vs. net file and the `(train_)_net_*` cases could be combined for conciseness. Do what you like.
Nice schema in 1cf797825528cb39a8dc43cbe0e9ef592c610939 -- thanks for the good inline documentation of the ….
OK, this all looks good to me. Decide if you want to change any names, then merge. Thanks Jeff! This does make the workflow of defining and training a whole host of nets much neater.
filtering rules for Layers: includes/excludes layers based on whether the NetState meets each layer's NetStateRule(s).
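For reference, the rule message being documented here ended up in caffe.proto roughly as follows (quoted from memory, so field numbers and comments should be checked against the actual proto):

```protobuf
// A NetStateRule is matched against the Net's NetState to decide whether a
// given layer is included in or excluded from that instantiated net.
message NetStateRule {
  // Require a particular phase (TRAIN or TEST) to meet this rule.
  optional Phase phase = 1;
  // Restrict the rule to a range of levels.
  optional int32 min_level = 2;
  optional int32 max_level = 3;
  // Require the NetState to contain (or not contain) these custom stages.
  repeated string stage = 4;
  repeated string not_stage = 5;
}
```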
Thanks for the thorough review, Evan! I changed the name of …. Will merge after Travis.
This PR lets you specify a `phase`, `stage`, and/or `level` in each of your layers to indicate whether or not these layers should be available. The most obvious application of this is to add "phase: TRAIN" or "phase: TEST" to special layers that are only used in one or the other, so that you need not repeat the common layers; e.g., a data layer which should have a different source at train vs. test time, and the accuracy layer which can only be used in the test net (although #686 will fix this whenever someone reviews that ;).

The `level` parameter, an `int`, allows you to turn off different layers all at once (all those that have level < the net's setting), e.g. to do layerwise training. The `stage` parameter, a `string` suggested by @shelhamer (which generalizes the `level` parameter in a sense), allows you to create arbitrary groups of layers. I'm not sure if I made the best decisions with this interface design; feel free to make suggestions.

The `train_net`, `test_net`, `train_net_param`, and `test_net_param` fields should all work as before.
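For concreteness, a minimal sketch of an all-in-one definition using these rules (layer names, sources, and batch sizes are illustrative, written in the enum-typed `layers` syntax of this era -- treat the exact fields as assumptions, not text from this PR):

```protobuf
name: "AllInOneNet"
# Train-time data source; only instantiated when the net's phase is TRAIN.
layers {
  name: "data"
  type: DATA
  top: "data"
  top: "label"
  data_param { source: "train_leveldb" batch_size: 64 }
  include { phase: TRAIN }
}
# Test-time data source with a different source and batch size.
layers {
  name: "data"
  type: DATA
  top: "data"
  top: "label"
  data_param { source: "test_leveldb" batch_size: 100 }
  include { phase: TEST }
}
# ... the shared layers (conv1 through fc8) carry no rules,
#     so they appear in every instantiated net ...
# Accuracy needs labels, so it only exists in the TEST net.
layers {
  name: "accuracy"
  type: ACCURACY
  bottom: "fc8"
  bottom: "label"
  top: "accuracy"
  include { phase: TEST }
}
```

With this in place, a single net file replaces the separate train and test prototxts: the solver instantiates it twice with different NetStates and gets the two filtered nets.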