Conversation

@jeffdonahue
Contributor

(Replaces #1872)

Based on #1977 (parameter gradient accumulation). This adds EmbedLayer (should probably change the name to EmbeddingLayer for consistency with PoolingLayer etc.), which essentially learns a lookup table for integer inputs, useful for language modeling and such. Its computation is equivalent to an InnerProductLayer with "one-hot" vector inputs, but instead of explicitly representing the one-hot vectors (which wastes lots of memory), this assumes the input itself is the indices of the "hot" index of those one-hot vectors (like the label inputs for the categorical losses). This should probably be replaced with SparseInnerProduct (#937) once that's merged, assuming that's faster -- this is a more lightweight change that continues the unfortunate trend of casting floats to ints as labels.
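
For intuition, here is a minimal NumPy sketch (illustrative only; the shapes and variable names are not from the PR) of the claimed equivalence between the lookup and an inner product with one-hot inputs:

import numpy as np

vocab_size, embed_dim = 5, 3
W = np.random.randn(vocab_size, embed_dim)   # learned lookup table / weight matrix

indices = np.array([2, 0, 4])                # integer inputs, as EmbedLayer expects

# Explicit one-hot vectors fed through an inner product (wastes memory):
one_hot = np.eye(vocab_size)[indices]        # shape (3, vocab_size), mostly zeros
out_ip = one_hot.dot(W)                      # shape (3, embed_dim)

# Direct lookup of the "hot" rows, which is what the layer computes instead:
out_embed = W[indices]                       # shape (3, embed_dim)

assert np.allclose(out_ip, out_embed)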

@jeffdonahue
Contributor Author

Rebased and ready for review. (Previously depended on gradient accumulation PR #1663.)

@futurely

This layer works as a lookup table and could be renamed to LookupTable.
https://github.com/torch/nn/blob/master/doc/convolution.md#nn.LookupTable
https://github.com/torch/nn/blob/master/LookupTable.lua

shelhamer added a commit that referenced this pull request Aug 25, 2015
Embed layer for lookup table of one hot encodings
@shelhamer merged commit 80579b8 into BVLC:master Aug 25, 2015
@shelhamer mentioned this pull request Aug 25, 2015
@jeffdonahue deleted the embed-layer branch August 26, 2015 00:55
@beniz

beniz commented Dec 8, 2015

I am very confused by this Embed layer. My hunch is that no one uses it outside of the RNN/LSTM branch, so I doubt I'll get an answer, but let's try just in case.

I've tried to use it in a simple MLP. Here is an example of a typical prototxt section:

layer {
  name: "embed"
  type: "Embed"
  bottom: "data"
  top: "embed_data"
  embed_param {
    input_dim: 5454
    num_output: 200
    weight_filler {
      type: "uniform"
      min: -0.08
      max: 0.08
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "ip0"
  type: "InnerProduct"
  bottom: "embed_data"
  top: "ip0"
  inner_product_param {
    num_output: 200
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  } 
}

Filling the data elements with the vocabulary indices of the words in a sentence, I naturally get an error from the data_transformer, since the datum channels now have varying sizes. I then tried padding the remaining elements with 0, as I understand is done in https://github.com/BVLC/caffe/pull/1873/files.

But in that case there is no memory advantage over one-hot vectors, since the input dimension is the same. Thus I am confused :)

Needless to say, any help is highly appreciated at this point!

Understood; of course, the padding is there to fix the input sequence length, not the vocabulary dimension.
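
For concreteness, a rough sketch of the shapes involved (the batch size and padded length below are assumed, not from the thread): the padding fixes the sequence length, while the memory savings come from storing one integer index per token instead of a vocabulary-sized one-hot vector.

import numpy as np

vocab_size = 5454   # input_dim from the prototxt above
T = 20              # padded sentence length (assumed)
N = 32              # batch size (assumed)

# Input to EmbedLayer: one integer index per token, zero-padded to length T.
indices = np.zeros((N, T), dtype=np.int64)      # N * T values

# Equivalent explicit one-hot input for InnerProductLayer:
one_hot = np.zeros((N, T, vocab_size))          # N * T * vocab_size values

print(indices.size, one_hot.size)               # 640 vs 3,490,560 entries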
