LJQ
There are skip connections; see `Add()` in `EncoderLayer`/`DecoderLayer`. The training tricks (learning-rate scheduler, etc.) should still be used when the network is deep, even with skip connections present.
The learning rate and the optimization strategy should be carefully tuned. I suggest using the same optimization strategy as BERT. MELLAH Youssef wrote on Sun, Jun 28, 2020, 6:38 AM: > Hi. > I have the...
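For reference, BERT's optimizer is Adam (with weight decay) under a linear learning-rate warmup followed by linear decay. Below is a minimal sketch of such a schedule; the peak rate and step counts are placeholder assumptions, not values from this repository:

```python
def warmup_linear_decay(step, peak_lr=1e-4, warmup_steps=1000, total_steps=10000):
    """BERT-style schedule: linear warmup to peak_lr, then linear decay to zero."""
    if step < warmup_steps:
        # Ramp up proportionally during warmup.
        return peak_lr * step / warmup_steps
    # Decay linearly over the remaining steps.
    return peak_lr * max(0.0, (total_steps - step) / (total_steps - warmup_steps))
```

The warmup phase keeps early updates small while the Adam moment estimates are still unreliable, which is one of the "tricks" that matters most for deep transformer stacks.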
Did you mean that the "eval" mode in the code does not work? Did the model fail to load? Check lines 34-35. By the way, a newer version of tf...
This function always fails in my environment, so I usually save/load only the weights and reconstruct the model in code.
CyberZHG's:

```python
variance = K.mean(K.square(inputs - mean), axis=-1, keepdims=True)
std = K.sqrt(variance + self.epsilon)
```

My:

```python
std = K.std(x, axis=-1, keepdims=True)
```

I think maybe there are input sequences...
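The two snippets compute essentially the same statistic; the only difference is that the first adds the epsilon stabilizer inside the square root. A quick numpy check of the relationship (the epsilon value and test data here are assumptions for illustration):

```python
import numpy as np

x = np.random.RandomState(0).randn(2, 5).astype('float64')
epsilon = 1e-6

# First variant: explicit variance, epsilon added before the sqrt.
mean = x.mean(axis=-1, keepdims=True)
variance = np.mean(np.square(x - mean), axis=-1, keepdims=True)
std_a = np.sqrt(variance + epsilon)

# Second variant: the K.std equivalent, with no epsilon.
std_b = np.std(x, axis=-1, keepdims=True)

# The two agree up to the epsilon term: std_a**2 == std_b**2 + epsilon.
```

Without the epsilon, a constant input sequence yields a zero standard deviation and a division by zero in the normalization step, which may explain the NaNs.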
The tf loss already contains a softmax, so in effect you apply softmax twice.
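In other words, if the loss applies a softmax internally (as `tf.nn.softmax_cross_entropy_with_logits` does), the output layer should emit raw logits. A small numpy sketch of why the double application hurts: softmaxing an already-normalized distribution pushes it toward uniform, weakening the gradient signal. The example logits are made up for illustration:

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over a 1-D array.
    e = np.exp(z - z.max())
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.1])
once = softmax(logits)    # what the loss expects to produce itself
twice = softmax(once)     # what happens if the model output is already softmaxed

# `twice` is noticeably flatter (closer to uniform) than `once`.
```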
I can run the code with tf==2.6.0; please provide your environment details. Alternatively, wrapping the tf function in a Lambda layer may help, like this (transformer.py, lines 453-457): #loss = get_loss(final_output,...
```python
from transformer import QANet_Encoder

inp = Input(shape=(max_len,), dtype='int32')
x = Embedding(words.num(), 64)(inp)
x = Dropout(0.5)(x)
mask = Lambda(lambda x: K.cast(K.greater(x, 0), 'float32'))(inp)
x = QANet_Encoder(64, n_head=4, n_conv=2, n_block=3, kernel_size=5, dropout=0.5,...
```
keras 2.1.3, tensorflow 1.4.1
We are comparing the performance of OpenLLaMA against another LLaMA reproduction of our own.