Results 10 comments of LJQ

There are skip connections; see Add() in EncoderLayer/DecoderLayer. The tricks (learning-rate scheduler, etc.) should still be used when the network is deep, even with skip connections.
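For reference, one such trick is the warm-up learning-rate schedule from the original Transformer paper: the rate rises linearly for a number of warm-up steps, then decays with the inverse square root of the step. A minimal pure-Python sketch (the default values are illustrative, not from this repo):

```python
def noam_lr(step, d_model=512, warmup_steps=4000):
    """Transformer warm-up schedule: linear ramp-up, then 1/sqrt(step) decay."""
    step = max(step, 1)  # avoid 0 ** -0.5 at the first step
    return d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)
```

The schedule peaks exactly at `warmup_steps` and decays afterwards, which is why deep Transformers often diverge without it even though the residual paths are present.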

The learning rate and the optimization strategy should be carefully tuned. I suggest using the same optimization strategy as BERT.
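BERT's published recipe is Adam with weight decay plus a linear warm-up followed by a linear decay of the learning rate. A minimal sketch of that schedule (parameter names and defaults are mine, not from the thread):

```python
def bert_lr(step, total_steps, base_lr=1e-4, warmup_frac=0.1):
    """BERT-style schedule: linear warm-up for the first fraction of
    training, then linear decay to zero (used together with AdamW)."""
    warmup = max(int(total_steps * warmup_frac), 1)
    if step < warmup:
        return base_lr * step / warmup
    return base_lr * max(0.0, (total_steps - step) / max(total_steps - warmup, 1))
```

Pair this with gradient clipping and weight decay on the non-bias/non-LayerNorm parameters, as in the BERT reference implementation.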

Did you mean that the "eval" mode in the code does not work? Did the model loading fail? Check lines 34-35. By the way, a newer version of tf...

This function always fails in my environment, so I usually save/load only the weights and construct the model in code.

CyberZHG's:
```python
variance = K.mean(K.square(inputs - mean), axis=-1, keepdims=True)
std = K.sqrt(variance + self.epsilon)
```
Mine:
```python
std = K.std(x, axis=-1, keepdims=True)
```
I think maybe there are input sequences...
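The truncated remark likely concerns constant rows (e.g. all-padding sequences): with `K.std` the standard deviation is exactly zero there, while adding epsilon under the square root keeps it strictly positive, so the later LayerNorm division cannot blow up. A small pure-Python illustration of the two epsilon placements (not the repo's code):

```python
import math

def std_eps_inside(xs, eps=1e-6):
    """CyberZHG's variant: sqrt(var + eps), strictly positive even
    for constant inputs such as all-padding rows."""
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    return math.sqrt(var + eps)

def std_plain(xs):
    """K.std equivalent: exactly zero on constant rows, which can make
    the normalization divide by (nearly) zero downstream."""
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    return math.sqrt(var)
```

For non-constant inputs the two agree to within the epsilon, so the difference only shows up on degenerate rows.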

The tf loss already contains a softmax, so in fact you apply softmax twice.
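Applying softmax to values that are already probabilities squashes the distribution toward uniform and distorts the gradients; the usual fix is to feed raw logits to a loss that applies softmax internally (e.g. `tf.nn.softmax_cross_entropy_with_logits`, or a Keras cross-entropy loss with `from_logits=True`) and drop the explicit softmax layer. A minimal pure-Python demonstration of the flattening effect:

```python
import math

def softmax(zs):
    """Numerically stable softmax over a list of scores."""
    m = max(zs)
    es = [math.exp(z - m) for z in zs]
    s = sum(es)
    return [e / s for e in es]

logits = [2.0, 0.0, 0.0]
once = softmax(logits)   # correct: softmax applied to logits
twice = softmax(once)    # the double-softmax bug: noticeably flatter
```

The second application maps values in [0, 1] through exp again, so the winning class loses most of its margin.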

I can run the code with tf=2.6.0. Please provide your environment details. Alternatively, using a Lambda layer to wrap the tf function may help, like this: transformer.py lines 453-457 #loss = get_loss(final_output,...
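A hedged, minimal illustration of the Lambda-wrapping idea (the model and the wrapped op here are made up for demonstration, not taken from transformer.py): calling a raw tf function on a symbolic tensor can fail outside the Keras graph, but packaging it as a `Lambda` layer makes Keras track it as part of the model.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

inp = layers.Input(shape=(4,))
hidden = layers.Dense(8, activation='relu')(inp)
# Instead of calling tf.reduce_mean(...) directly on the symbolic tensor,
# package the op as a layer so Keras can build the graph around it:
pooled = layers.Lambda(lambda t: tf.reduce_mean(t, axis=-1, keepdims=True))(hidden)
model = Model(inp, pooled)
```

The same wrapping applies to a custom loss computation: put the tf calls inside the Lambda body and let the layer's output feed the loss.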

```python
from transformer import QANet_Encoder

inp = Input(shape=(max_len,), dtype='int32')
x = Embedding(words.num(), 64)(inp)
x = Dropout(0.5)(x)
mask = Lambda(lambda x: K.cast(K.greater(x, 0), 'float32'))(inp)
x = QANet_Encoder(64, n_head=4, n_conv=2, n_block=3, kernel_size=5, dropout=0.5, ...
```

keras 2.1.3, tensorflow 1.4.1

We are comparing the performance of OpenLLaMA against another LLaMA that we reproduced ourselves.