Sin
In theory, rotary embedding supports unlimited length, provided there is enough GPU memory to hold that much kv_cache. But if none of the training data exceeds 2048 tokens, I'm not sure whether extrapolating beyond 2048 would hurt generation quality.
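To illustrate why nothing in the formula itself caps the length: a minimal NumPy sketch of rotary embedding (standard RoPE with the usual base of 10000; function and variable names are my own). The rotation angle depends only on the integer position, so a position far beyond the training window is still perfectly well defined, and the rotation leaves the vector norm unchanged. Whether attention then *generalizes* to those positions is exactly the open question above.

```python
import numpy as np

def rope(x, pos, base=10000.0):
    """Apply rotary position embedding to one vector x at position `pos`.

    Each consecutive pair of dimensions is rotated by pos * inv_freq, so
    any integer position is computable -- only kv_cache memory limits length.
    """
    d = x.shape[-1]
    inv_freq = 1.0 / (base ** (np.arange(0, d, 2) / d))  # shape (d/2,)
    theta = pos * inv_freq
    cos, sin = np.cos(theta), np.sin(theta)
    x1, x2 = x[0::2], x[1::2]
    out = np.empty_like(x)
    out[0::2] = x1 * cos - x2 * sin
    out[1::2] = x1 * sin + x2 * cos
    return out

q = np.ones(8)
# a position far beyond a 2048-token training window is still well defined
far = rope(q, 100000)
```

Since each 2-D rotation is norm-preserving, `rope(q, p)` has the same norm as `q` for any `p`; the extrapolation risk is in the attention patterns, not the embedding math.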
LGTM. I was wondering about the performance improvement. Also, can we run the fp8 intrinsics on Volta/Ampere/Ada architectures, or is it Hopper only?
Also, which of E5M2 and E4M3 should we use for better precision and performance? I guess this may depend on the specific model.
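For a back-of-the-envelope comparison (assuming the common OCP FP8 definitions of the two formats): E4M3 spends an extra bit on the mantissa, E5M2 on the exponent, so E4M3 is more precise but has far less dynamic range. The usual rule of thumb is E4M3 for forward activations/weights and E5M2 where range matters more (e.g. gradients), but as noted above it can be model-specific.

```python
# E4M3: 4 exponent bits (bias 7), 3 mantissa bits; in the OCP convention the
# all-ones exponent with mantissa 111 is NaN, so the largest finite value is
# 2^8 * (1 + 6/8) = 448.
e4m3_max = 2.0**8 * (1 + 6 / 8)

# E5M2: 5 exponent bits (bias 15), 2 mantissa bits, IEEE-like inf/NaN, so the
# largest finite value is 2^15 * (1 + 3/4) = 57344.
e5m2_max = 2.0**15 * (1 + 3 / 4)

# smallest positive *normal* values show the same trade-off at the low end
e4m3_min_normal = 2.0**-6
e5m2_min_normal = 2.0**-14

print(f"E4M3 range: [{e4m3_min_normal}, {e4m3_max}]")
print(f"E5M2 range: [{e5m2_min_normal}, {e5m2_max}]")
```

So E5M2 covers roughly 128x more range at the top end, at the cost of one mantissa bit of precision.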
> Hi @irasin it looks like you are using an older version of MII. Your error message for line 31 of `mii/grpc_related/restful_gateway.py` indicates you are trying to get the `request`...
> @irasin can you please try the following instead?
>
> ```python
> import json
> import requests
> url = f"http://localhost:8000/mii/mii_test"
> params = {"prompts": ["DeepSpeed is", "Seattle is...
> ```
@ZihanWang314, I got the same warning, but the model still runs. It looks like there isn't enough disk space; just use `df -h` to check.
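If you'd rather check from Python than eyeball `df -h`, the standard library exposes the same numbers; the path `"/"` below is just an example (point it at wherever your model cache lives):

```python
import shutil

# same information as one row of `df -h`, in bytes
total, used, free = shutil.disk_usage("/")
print(f"free: {free / 2**30:.1f} GiB of {total / 2**30:.1f} GiB")
```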
> > > Yes, in principle the pad values should be ignored, so this feels more like a bug in transformers.
> >
> > My confusion is about whether `use_cache` is used. If it is not, padding on the right also works, as long as the next decoded token_ids are spliced in before the previous padding, like this:
> >
> > ```python
> > # original input
> > input = tensor([[1,2,3,4,5], [1,2,0,0,0]])
> > # the decoded results are 6 and 3 respectively, so the next input should be...
> > ```
> @irasin Yes, I checked: with left padding, the position_ids also start from the first non-padding position. So not just chatglm, any decoder-architecture model should be able to do batch_generate with left padding.

That's true, but chatglm is still a bit more special than other models: since it involves position_2d, the two kinds of positions are handled differently during batch_generation, which needs some care.
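To make the left-padding point concrete, here is the usual trick (I believe transformers' `generate` does something equivalent) for deriving 1-D position_ids from the attention mask, sketched in NumPy: positions count up over real tokens only, so a left-padded row still starts at position 0. For chatglm's position_2d, both position components would need the same treatment.

```python
import numpy as np

# batch of two prompts, left-padded; attention_mask marks real tokens
attention_mask = np.array([[1, 1, 1, 1, 1],
                           [0, 0, 0, 1, 1]])

# cumulative count of real tokens, shifted to start at 0
position_ids = attention_mask.cumsum(axis=-1) - 1
# the value under padding never matters, but keep it non-negative
position_ids[attention_mask == 0] = 0

# row 0 gets positions [0,1,2,3,4]; row 1's real tokens get [0, 1]
print(position_ids)
```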
Hi @chenyiwan, thanks for the reply. May I ask: in the inference code, is there anywhere that checks for eos and stops generation? I don't seem to see any related code.
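For reference, this is roughly where such a check usually lives in a greedy decoding loop; `next_token_fn` below is a stand-in for a real model forward pass, and all the names are hypothetical, not from this repo's code:

```python
# Minimal sketch of eos-based early stopping in greedy decoding.
def greedy_generate(next_token_fn, input_ids, eos_token_id, max_new_tokens=32):
    for _ in range(max_new_tokens):
        next_id = next_token_fn(input_ids)
        input_ids = input_ids + [next_id]
        if next_id == eos_token_id:  # stop as soon as eos is produced
            break
    return input_ids

# toy "model" that emits 7, 8, then eos (2); decoding stops at the eos
script = iter([7, 8, 2, 9, 9])
out = greedy_generate(lambda ids: next(script), [1], eos_token_id=2)
```

If no such check exists, generation always runs to the max-length cap, which wastes compute and can append garbage after the natural end of the answer.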