Forgive me if I am asking a naive question. What is the difference between Pooling and Sampling, say TemporalMaxPooling vs TemporalSubSampling, and SpatialMaxPooling and SpatialSubSampling?
My understanding is that both are layer-wise operations (special cases of convolution?), where each layer of the input is mapped independently to a layer of the output. The difference is that MaxPooling takes the max over the local region, while SubSampling takes a weighted sum of it.
So is it safe to say that SubSampling is a generalization of MaxPooling and AveragePooling? Are there any references on how to use SubSampling in practice?
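To make my understanding concrete, here is a minimal pure-Python sketch for a single channel. It is not Torch code: the function names are my own, and I am assuming SubSampling has one scalar weight and one bias per channel, as the weight[k] and bias[k] terms in the SpatialSubSampling documentation suggest.

```python
def max_pool_1d(xs, kW, dW):
    """Take the max over each window of width kW, stepping by dW."""
    n_out = (len(xs) - kW) // dW + 1
    return [max(xs[i * dW:i * dW + kW]) for i in range(n_out)]


def sub_sample_1d(xs, kW, dW, weight, bias):
    """Compute bias + weight * (sum over each window), stepping by dW.

    Assumption: a single scalar weight and bias for the whole channel.
    """
    n_out = (len(xs) - kW) // dW + 1
    return [bias + weight * sum(xs[i * dW:i * dW + kW]) for i in range(n_out)]


xs = [1.0, 3.0, 2.0, 8.0, 4.0, 6.0]

pooled = max_pool_1d(xs, kW=2, dW=2)
# With weight = 1/kW and bias = 0 the weighted sum is the window average.
averaged = sub_sample_1d(xs, kW=2, dW=2, weight=0.5, bias=0.0)

print(pooled)    # [3.0, 8.0, 6.0]
print(averaged)  # [2.0, 5.0, 5.0]
```

If this reading is right, setting weight = 1/kW and bias = 0 reproduces average pooling, but I do not see a choice of weight and bias that reproduces the max, which is part of why I am asking.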
The documentation for both TemporalSubSampling and SpatialSubSampling is a little confusing. For example:
- The TemporalSubSampling section says "If the input sequence is a 2D tensor nInputFrame x inputFrameSize, the output sequence will be inputFrameSize x nOutputFrame". Shouldn't it be nOutputFrame x inputFrameSize, based on the defining equation below it, or have I misunderstood?
- Similarly, the SpatialSubSampling section says "the output image size will be nInputPlane x oheight x owidth", but in the equation below it, output[i][j][k] = bias[k] + weight[k] sum_{s=1}^kW sum_{t=1}^kH input[dW*(i-1)+s)][dH*(j-1)+t][k], the order of the input and output dimensions is actually owidth x oheight x nInputPlane.
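Here is how I read the SpatialSubSampling equation, transcribed literally into pure Python with 0-based indices. This is only a sketch, not the Torch implementation: the function name is hypothetical, and the output-size formula (size - k) // d + 1 is my assumption for windows without padding.

```python
def spatial_sub_sample(inp, kW, kH, dW, dH, weight, bias):
    """Literal reading of the documented equation.

    inp is indexed as inp[x][y][k], i.e. width x height x plane,
    because that is the index order the equation uses.
    """
    width, height, planes = len(inp), len(inp[0]), len(inp[0][0])
    owidth = (width - kW) // dW + 1    # assumed: no padding
    oheight = (height - kH) // dH + 1  # assumed: no padding
    out = [[[0.0] * planes for _ in range(oheight)] for _ in range(owidth)]
    for i in range(owidth):
        for j in range(oheight):
            for k in range(planes):
                total = 0.0
                for s in range(kW):
                    for t in range(kH):
                        total += inp[dW * i + s][dH * j + t][k]
                out[i][j][k] = bias[k] + weight[k] * total
    return out


# Input of ones: 6 (width) x 8 (height) x 2 (planes).
inp = [[[1.0, 1.0] for _ in range(8)] for _ in range(6)]
out = spatial_sub_sample(inp, kW=2, kH=2, dW=2, dH=2,
                         weight=[1.0, 0.5], bias=[0.0, 1.0])

shape = (len(out), len(out[0]), len(out[0][0]))
print(shape)      # (3, 4, 2), i.e. owidth x oheight x nInputPlane
print(out[0][0])  # [4.0, 3.0]
```

Read this way, the output comes out as owidth x oheight x nInputPlane, which is the reverse of the nInputPlane x oheight x owidth order stated in the text.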
I would appreciate it if someone could help.