Results 10 comments of LJQ

There are skip connections; see Add() in EncoderLayer/DecoderLayer. The tricks (learning-rate scheduler, etc.) should still be used when the network is deep, even with skip connections.
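For reference, one such trick is the warm-up learning-rate schedule from the original Transformer paper: the rate rises linearly for a number of warm-up steps, then decays with the inverse square root of the step. A minimal pure-Python sketch (the default values are illustrative, not from this repo):

```python
def noam_lr(step, d_model=512, warmup_steps=4000):
    """Transformer warm-up schedule: linear ramp-up, then 1/sqrt(step) decay."""
    step = max(step, 1)  # avoid 0 ** -0.5 at the first step
    return d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)
```

The schedule peaks exactly at `warmup_steps` and decays afterwards, which is why deep Transformers often diverge without it even though the residual paths are present.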

The learning rate and the optimization strategy should be carefully tuned. I suggest using the same optimization strategy as BERT.
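BERT's published recipe is Adam with weight decay plus a linear warm-up followed by a linear decay of the learning rate. A minimal sketch of that schedule (parameter names and defaults are mine, not from the thread):

```python
def bert_lr(step, total_steps, base_lr=1e-4, warmup_frac=0.1):
    """BERT-style schedule: linear warm-up for the first fraction of
    training, then linear decay to zero (used together with AdamW)."""
    warmup = max(int(total_steps * warmup_frac), 1)
    if step < warmup:
        return base_lr * step / warmup
    return base_lr * max(0.0, (total_steps - step) / max(total_steps - warmup, 1))
```

Pair this with gradient clipping and weight decay on the non-bias/non-LayerNorm parameters, as in the BERT reference implementation.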

Did you mean that the "eval" mode in the code does not work? Did the model loading fail? Check lines 34-35. By the way, a newer version of tf...

This function always fails in my environment, so I usually save/load only the weights and construct the model in code.

CyberZHG's:
```python
variance = K.mean(K.square(inputs - mean), axis=-1, keepdims=True)
std = K.sqrt(variance + self.epsilon)
```
Mine:
```python
std = K.std(x, axis=-1, keepdims=True)
```
I think maybe there are input sequences...
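The truncated remark likely concerns constant rows (e.g. all-padding sequences): with `K.std` the standard deviation is exactly zero there, while adding epsilon under the square root keeps it strictly positive, so the later LayerNorm division cannot blow up. A small pure-Python illustration of the two epsilon placements (not the repo's code):

```python
import math

def std_eps_inside(xs, eps=1e-6):
    """CyberZHG's variant: sqrt(var + eps), strictly positive even
    for constant inputs such as all-padding rows."""
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    return math.sqrt(var + eps)

def std_plain(xs):
    """K.std equivalent: exactly zero on constant rows, which can make
    the normalization divide by (nearly) zero downstream."""
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    return math.sqrt(var)
```

For non-constant inputs the two agree to within the epsilon, so the difference only shows up on degenerate rows.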

The tf loss already contains a softmax, so in fact you apply softmax twice.
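Applying softmax to values that are already probabilities squashes the distribution toward uniform and distorts the gradients; the usual fix is to feed raw logits to a loss that applies softmax internally (e.g. `tf.nn.softmax_cross_entropy_with_logits`, or a Keras cross-entropy loss with `from_logits=True`) and drop the explicit softmax layer. A minimal pure-Python demonstration of the flattening effect:

```python
import math

def softmax(zs):
    """Numerically stable softmax over a list of scores."""
    m = max(zs)
    es = [math.exp(z - m) for z in zs]
    s = sum(es)
    return [e / s for e in es]

logits = [2.0, 0.0, 0.0]
once = softmax(logits)   # correct: softmax applied to logits
twice = softmax(once)    # the double-softmax bug: noticeably flatter
```

The second application maps values in [0, 1] through exp again, so the winning class loses most of its margin.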

I can run the code with tf=2.6.0. Please provide your environment details. Alternatively, using a Lambda layer to wrap the tf function may help, like this: transformer.py lines 453-457 #loss = get_loss(final_output,...
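A hedged, minimal illustration of the Lambda-wrapping idea (the model and the wrapped op here are made up for demonstration, not taken from transformer.py): calling a raw tf function on a symbolic tensor can fail outside the Keras graph, but packaging it as a `Lambda` layer makes Keras track it as part of the model.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

inp = layers.Input(shape=(4,))
hidden = layers.Dense(8, activation='relu')(inp)
# Instead of calling tf.reduce_mean(...) directly on the symbolic tensor,
# package the op as a layer so Keras can build the graph around it:
pooled = layers.Lambda(lambda t: tf.reduce_mean(t, axis=-1, keepdims=True))(hidden)
model = Model(inp, pooled)
```

The same wrapping applies to a custom loss computation: put the tf calls inside the Lambda body and let the layer's output feed the loss.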

```python
from transformer import QANet_Encoder

inp = Input(shape=(max_len,), dtype='int32')
x = Embedding(words.num(), 64)(inp)
x = Dropout(0.5)(x)
mask = Lambda(lambda x: K.cast(K.greater(x, 0), 'float32'))(inp)
x = QANet_Encoder(64, n_head=4, n_conv=2, n_block=3, kernel_size=5, dropout=0.5, ...
```

keras 2.1.3, tensorflow 1.4.1

We are comparing the performance of OpenLLaMA against another LLaMA that we reproduced ourselves.