mempalace_check_duplicate misses content that clearly exists in the palace when run at its default threshold of 0.9.
I tested it against text already stored in the palace, including exact sentences copied from indexed content, and still got:
{
"is_duplicate": false,
"matches": []
}
Example 1:
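马丁·海德格尔(Martin Heidegger)出生于德国巴登——符腾堡(Baden-Württemberg)梅斯基尔希的一个贫寒的天主教家庭中。
(English: "Martin Heidegger was born into a poor Catholic family in Meßkirch, Baden-Württemberg, Germany.")
It only returned true after I lowered the threshold to 0.4.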
Example 2:
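本标准使用重新起草法参考 ISO 690:2010(E)《信息和文献 参考文献和信息资源引用指南》编制,与 ISO 690:2010 的一致性程度为非等效。
(English: "This standard was drafted using the redrafting method with reference to ISO 690:2010(E), Information and documentation: Guidelines for bibliographic references and citations to information resources. Its degree of correspondence with ISO 690:2010 is non-equivalent.")
It only returned true after I lowered the threshold to 0.15.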
So on real indexed content, the default of 0.9 appears too high to detect even text that already exists verbatim in the palace.
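To make the manual threshold search above repeatable, a small harness can step the threshold down until the tool reports a duplicate. This is a minimal sketch, not part of MemPalace: `highest_detecting_threshold` and `fake_check` are hypothetical helpers, and the `check` callable is assumed to wrap however you invoke mempalace_check_duplicate (the client-side call is not shown here) and to return a dict shaped like the response above. The `score` field in `fake_check` is invented purely for the demo.

```python
from typing import Callable, Optional, Sequence

# Assumption: check(content, threshold) wraps a call to mempalace_check_duplicate
# and returns a dict shaped like {"is_duplicate": bool, "matches": [...]}.
CheckFn = Callable[[str, float], dict]

# Hypothetical grid of thresholds to try, from strictest to loosest.
DEFAULT_GRID = (0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.15, 0.1)


def highest_detecting_threshold(
    check: CheckFn, content: str, thresholds: Sequence[float] = DEFAULT_GRID
) -> Optional[float]:
    """Return the highest threshold in `thresholds` at which `check` flags `content`.

    Returns None if no threshold in the grid yields is_duplicate == True.
    """
    # Walk from the strictest threshold down; the first hit is the highest one that works.
    for threshold in sorted(thresholds, reverse=True):
        if check(content, threshold).get("is_duplicate"):
            return threshold
    return None


def fake_check(content: str, threshold: float, best_score: float = 0.42) -> dict:
    """Hypothetical stand-in for the real tool: pretends the best stored match scores best_score."""
    hit = best_score >= threshold
    return {"is_duplicate": hit, "matches": [{"score": best_score}] if hit else []}


if __name__ == "__main__":
    # With the stub's best score of 0.42, the first grid value that detects it is 0.4.
    print(highest_detecting_threshold(fake_check, "any text"))
```

Pointed at the real tool, this records the highest grid value at which detection starts for each example. The true cut-off lies between that value and the next-higher grid entry, so a finer grid narrows it down.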