
refactor(translator): enhance prompt conversion method#637

Merged
awwaawwa merged 18 commits into PDFMathTranslate:main from Tql-ws1:change-prompt-conversion-method
Feb 21, 2025

Conversation

@Tql-ws1
Contributor

@Tql-ws1 Tql-ws1 commented Feb 16, 2025

…hen using custom prompts.

Main Changes:
  - Replaced `eval()` with `json.loads()` to convert text into a JSON object.
  - Surfaced errors that prevent further execution instead of silently continuing.
  - Used `string.Template.substitute()` instead of `string.Template.safe_substitute()` so that incorrectly formatted templates fail loudly rather than being used unintentionally.
  - Added logging so users can tell whether the custom prompt is actually being used.
Other Changes:
  Refactor!:
    - Used `num_predict` parameter to set the output length limit for large language models in ollama.
  Perf!:
    - Removed broad and redundant exception throwing.
    - Disabled streaming output and used regular expressions to remove thought chain outputs.
  Test:
    - Added unit test cases for `OllamaTranslator`.
  Style:
    - Sorted library imports.
    - Added type hints where necessary.

Related to issue #636.
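The two core changes can be sketched as follows. This is a minimal illustration, not code from the PR; the prompt text here is a hypothetical example of a user-supplied template.

```python
import json
from string import Template

# Hypothetical user-supplied prompt; the real template comes from user config.
raw = '[{"role": "user", "content": "Translate to ${lang_out}: ${text}"}]'

# json.loads() parses untrusted text as data; eval() would execute it as code.
messages = json.loads(raw)

# Template.substitute() raises KeyError on a missing or mistyped placeholder,
# while safe_substitute() would silently leave "${lang_out}" in the output.
content = Template(messages[0]["content"]).substitute(lang_out="zh", text="Hello")
print(content)  # Translate to zh: Hello
```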
@awwaawwa awwaawwa linked an issue Feb 16, 2025 that may be closed by this pull request
@awwaawwa awwaawwa marked this pull request as draft February 16, 2025 16:44
@awwaawwa
Collaborator

I have converted this PR to draft status. Please mark it as Ready after the work is completed.

@awwaawwa
Collaborator

`_remove_cot_content` can use `re.sub` with the pattern `^<think>.+?</think>`

@Tql-ws1
Contributor Author

Tql-ws1 commented Feb 16, 2025

`_remove_cot_content` can use `re.sub` with the pattern `^<think>.+?</think>`

Good idea!
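The suggested helper could look roughly like this. It is a sketch: the real `_remove_cot_content` signature is not shown in the thread, and `re.DOTALL` is added here because chain-of-thought output usually spans multiple lines.

```python
import re

def remove_cot_content(response: str) -> str:
    # Strip a leading <think>...</think> block; re.DOTALL lets ".+?"
    # match across the newlines inside the thought chain.
    return re.sub(r"^<think>.+?</think>", "", response, flags=re.DOTALL).strip()

print(remove_cot_content("<think>step 1...\nstep 2...</think>\nTranslated text"))
```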

@awwaawwa awwaawwa linked an issue Feb 16, 2025 that may be closed by this pull request
@hellofinch
Collaborator

def do_translate(self, text):
    for model in self.model.split(";"):
        try:
            response = self.client.chat(
                model=model,  # try the current model, not the whole ";"-joined string
                options=self.options,
                messages=self.prompt(text, self.prompttext),
            )
            content = response["message"]["content"].strip()
            if (
                "deepseek-r1" in model
                and "<think>" in content
                and "</think>" in content
            ):
                content = re.sub(
                    r"^<think>.+?</think>",
                    "",
                    content,
                    flags=re.DOTALL,
                ).strip()
            return content
        except Exception as e:
            print(e)
    raise Exception("All models failed")

Here you go.
Remember to test the API, CLI, and WebUI to make sure the custom prompt works correctly.
Custom prompts use `Template`; sometimes a `None` check will pass even though the template is invalid.

@Tql-ws1
Contributor Author

Tql-ws1 commented Feb 17, 2025

Completed. However, `json.loads()` only parses; it will not perform implicit conversion of invalid content. Therefore, prompts that users provide in the future must be in valid JSON format.
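To illustrate the stricter behavior (a minimal sketch, not code from the PR): Python-literal-style prompts that `eval()` used to tolerate are now rejected outright.

```python
import json

# Valid JSON: double quotes, no trailing commas.
ok = json.loads('[{"role": "system", "content": "You are a translator."}]')

# Python-literal style (single quotes) that eval() would have accepted
# now raises json.JSONDecodeError instead of being silently coerced.
try:
    json.loads("[{'role': 'system', 'content': 'You are a translator.'}]")
except json.JSONDecodeError as e:
    print("rejected:", e)
```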

@Tql-ws1
Contributor Author

Tql-ws1 commented Feb 17, 2025

And you, @hellofinch! If you wanted to solve this issue, you should have raised it four days ago and also initiated a pull request at the same time, rather than waiting until today. This way, others would have one less merge conflict to handle and wouldn't have to clean up the mess you left behind!

@Tql-ws1 Tql-ws1 marked this pull request as ready for review February 17, 2025 12:21
@awwaawwa
Collaborator

awwaawwa commented Feb 17, 2025

Please pass all CI checks, thank you!

However, it might be because the format was changed recently but the CI wasn't updated accordingly...

@awwaawwa
Collaborator

awwaawwa commented Feb 17, 2025

And you, @hellofinch! If you wanted to solve this issue, you should have raised it four days ago and also initiated a pull request at the same time, rather than waiting until today. This way, others would have one less merge conflict to handle and wouldn't have to clean up the mess you left behind!

Thank you for your feedback. We understand that merge conflicts can create additional workload. At the same time, we want to thank @hellofinch for their efforts and voluntary contributions. This is an open-source project, and everyone's time and energy are valuable - we're grateful for all participation.

To better avoid similar conflicts in the future, everyone can consider raising issues and discussing them early when problems are discovered, and plan solutions in advance. If you have any suggestions or better practices to make collaboration smoother, we very much welcome your sharing!

Once again, thank you to every contributor for your dedication in making this project better together!

@awwaawwa awwaawwa self-requested a review February 17, 2025 14:28
@Tql-ws1
Contributor Author

Tql-ws1 commented Feb 17, 2025

Undoubtedly, @hellofinch's contributions are certainly commendable. But, code reviewers! Could you please help? How did something like using eval() to convert template strings into ChatRequest messages happen? When I checked the blame for this code, I also noticed several poor changes related to @hellofinch.

When I raised issue #636 and initiated pull request #637, I wondered if @hellofinch had reviewed them. I assume they didn't, otherwise, why would they suggest more verbose and less readable code and remind me to conduct thorough testing? If you had reviewed my changes, you might not have posted such content. Interestingly, about ten minutes after posting this code, @hellofinch began submitting a related pull request, with the commit time for pdf2zh/translator.py being February 13, 2025. Merge conflicts are inevitable. Buddy, could you please take a look at what changes #637 made before submitting?

That's why I'm frustrated with @hellofinch and the reviewers who approved @hellofinch's commit. It's absurd! Making contributions is cool, but when those contributions contradict the project's direction and quality, and are accepted without refinement, they become a burden.

@Byaidu
Member

Byaidu commented Feb 17, 2025

Undoubtedly, @hellofinch's contributions are certainly commendable. But, code reviewers! Could you please help? How did something like using eval() to convert template strings into ChatRequest messages happen? When I checked the blame for this code, I also noticed several poor changes related to @hellofinch.

When I raised issue #636 and initiated pull request #637, I wondered if @hellofinch had reviewed them. I assume they didn't, otherwise, why would they suggest more verbose and less readable code and remind me to conduct thorough testing? If you had reviewed my changes, you might not have posted such content. Interestingly, about ten minutes after posting this code, @hellofinch began submitting a related pull request, with the commit time for pdf2zh/translator.py being February 13, 2025. Merge conflicts are inevitable. Buddy, could you please take a look at what changes #637 made before submitting?

That's why I'm frustrated with @hellofinch and the reviewers who approved @hellofinch's commit. It's absurd! Making contributions is cool, but when those contributions contradict the project's direction and quality, and are accepted without refinement, they become a burden.

Apologies for my mistake in reviewing, and thank you for pointing it out. We will revert the erroneous commits and work on improving the maintainability of the codebase.

@hellofinch
Collaborator

GOOD!

@awwaawwa awwaawwa added this to the v1.9.1 milestone Feb 18, 2025
Collaborator

@awwaawwa awwaawwa left a comment


Could you revise the test when you have time?

@awwaawwa awwaawwa requested review from awwaawwa and removed request for awwaawwa February 18, 2025 16:08
Collaborator

@awwaawwa awwaawwa left a comment


@hellofinch when you have time please help test the cli etc.

I've tested the webui, it works fine.


@Tql-ws1
Contributor Author

Tql-ws1 commented Feb 18, 2025

Should we mock self.client.chat here and then do the testing?

@awwaawwa Thank you for pointing out that "the object being mocked should be more concrete" and for helping me further improve the unit tests of OllamaTranslator.
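A minimal sketch of that mocking approach (the class here is a simplified stand-in; the real `OllamaTranslator` lives in `pdf2zh/translator.py` and is not reproduced in this thread):

```python
import unittest
from unittest import mock

class FakeOllamaTranslator:
    """Simplified stand-in for OllamaTranslator, for illustration only."""

    def __init__(self, client, model="deepseek-r1"):
        self.client = client
        self.model = model

    def do_translate(self, text):
        resp = self.client.chat(
            model=self.model,
            messages=[{"role": "user", "content": text}],
        )
        return resp["message"]["content"].strip()

class TestDoTranslate(unittest.TestCase):
    def test_chat_is_mocked(self):
        # Mock self.client.chat so the test never talks to a live server.
        client = mock.Mock()
        client.chat.return_value = {"message": {"content": " 你好 "}}
        translator = FakeOllamaTranslator(client)
        self.assertEqual(translator.do_translate("Hello"), "你好")
        client.chat.assert_called_once()
```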

In addition, using ast.literal_eval can safely evaluate an expression node or a string containing only a Python literal or container display. The string or node provided may only consist of the following Python literal structures: strings, bytes, numbers, tuples, lists, dicts, sets, booleans, None and Ellipsis.

ast.literal_eval() attempts to evaluate the text, but it cannot turn risky input into plain text. You can try adding an expression to the prompt, such as a2-a1, or inserting a single quote, and see what happens. I believe user input should be normalized rather than accommodated; otherwise we leave hidden issues behind and face an endless stream of parsing errors and strange exceptions.
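For example, a quick sketch of the failure mode described above:

```python
import ast

# A genuine Python literal evaluates fine.
parsed = ast.literal_eval("{'role': 'user'}")

# An expression such as a2-a1 is not a literal, so literal_eval refuses it
# with ValueError instead of evaluating it.
try:
    ast.literal_eval("a2-a1")
except ValueError as e:
    print("not a literal:", e)
```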

Here you can catch errors and send out user-friendly error messages

I reviewed your changes here, and I think we should not add any exception handling except for expected exceptions. I raised #636 because when an error occurred, it did not provide specific exception information, making it difficult to pinpoint the root cause. You can try modifying the code in `prompt()` locally (it contains only a single `raise`) and see whether the program halts.

@Tql-ws1 Tql-ws1 requested a review from awwaawwa February 18, 2025 17:31
@awwaawwa
Collaborator

awwaawwa commented Feb 19, 2025

@Tql-ws1 The exception handling is a problem with the current program architecture. I plan to solve it through pdf2zh 2.0. The current solution is a workaround that can help reduce customer service pressure.

As for ast.literal_eval(), I think it can be completely removed. Specifically: we can write all prompts into the User prompt, and then just use template strings.

Additionally, this modification can improve compatibility with models like o1/r1, since these models cannot set system prompts.
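That proposal might look roughly like this. It is a sketch under the assumption of OpenAI-style message dicts; the prompt wording is illustrative, not the project's final prompt.

```python
from string import Template

# All instructions folded into a single user message, so models such as
# o1/r1 that do not accept system prompts can still be used.
USER_PROMPT = Template(
    "You are a professional machine translation engine. "
    "Translate the following text to ${lang_out}. "
    "Output the translation directly without any additional text.\n"
    "Source Text: ${text}\nTranslated Text:"
)

def build_messages(text: str, lang_out: str) -> list:
    # No "system" entry at all: just one user message with the full prompt.
    return [{
        "role": "user",
        "content": USER_PROMPT.substitute(text=text, lang_out=lang_out),
    }]
```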

@Tql-ws1
Contributor Author

Tql-ws1 commented Feb 19, 2025

As for ast.literal_eval(), I think it can be completely removed. Specifically: we can write all prompts into the User prompt, and then just use template strings.

Additionally, this modification can improve compatibility with models like o1/r1, since these models cannot set system prompts.

Using template substitution alone can indeed avoid parsing issues caused by formatting, but what about models that rely on system prompts to constrain their behavior? This might reduce translation quality. How about considering releasing a nightly version to crowdtest and compare the effects before and after removing the system prompt?

@awwaawwa
Collaborator

Unfortunately, the current release CI doesn't handle nightly builds very well... However, I hope to improve this in #645.

After improving the release CI, releasing will become much easier, so it should be safe to include this change directly in the official version. If there are many reports of translation quality degradation, we can revert this change.


@Tql-ws1
Contributor Author

Tql-ws1 commented Feb 19, 2025

Okay, understood. Are there any other areas that need improvement regarding this submission? If not, I think I've covered everything I needed to do.

@awwaawwa
Collaborator

Okay, understood. Are there any other areas that need improvement regarding this submission? If not, I think I've covered everything I needed to do.

Next, I'll wait for @hellofinch to help test the API and CLI custom prompt functionality. After his testing, I will merge this PR.

@awwaawwa
Collaborator

@Tql-ws1 I just remembered that this file needs some modifications: https://github.com/Byaidu/PDFMathTranslate/blob/main/docs/ADVANCED.md#custom-prompt . For example, here is a prompt I used for testing before:

You are a professional,authentic machine translation engine. 如果目标语言为中文,你需要翻译成文言文。 Translate the following markdown source text to ${lang_out}. Keep the formula notation {{v*}} unchanged. Output translation directly without any additional text.\nSource Text: ${text}\nTranslated Text:

@hellofinch
Collaborator

CLI and WebUI work fine for me. @awwaawwa
The API has never supported custom prompts. I will try to fix the API part.

@awwaawwa
Collaborator

Thank you for your contribution. According to https://funstory-ai.github.io/BabelDOC/CONTRIBUTOR_REWARD/ , you can apply for a monthly membership redemption code for Immersive Translate.

@awwaawwa awwaawwa changed the title Change prompt conversion method refactor(translator): enhance prompt conversion method Feb 21, 2025
@awwaawwa awwaawwa merged commit ecb9218 into PDFMathTranslate:main Feb 21, 2025
2 checks passed
@hellofinch
Collaborator

Nice work! The API works well now!

@Tql-ws1 Tql-ws1 deleted the change-prompt-conversion-method branch February 27, 2025 03:04

Development

Successfully merging this pull request may close these issues.

An error occurred when using custom prompt template
feat(translator): remove LLM <think>xxx</think>
