add string case conversion functions: lower and upper#7235
add string case conversion functions: lower and upper#7235waruqi merged 43 commits intoxmake-io:devfrom
Conversation
Summary of ChangesHello @luadebug, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request enhances the string manipulation capabilities of the system by integrating new Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request introduces string.lower() and string.upper() functions for case conversion, with support for Unicode characters. The implementation correctly handles character set conversions.
My review focuses on improving robustness and maintainability. I've identified potential integer overflow vulnerabilities in memory allocation calculations that could lead to security issues with very large input strings. I've also suggested refactoring the main logic to simplify error handling and improve code readability.
The added tests are good and cover both ASCII and non-ASCII strings.
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces lower and upper string case conversion functions, which is a valuable addition. The implementation correctly handles Unicode by converting strings to a wide-character format before processing, and this is validated with tests for both ASCII and non-ASCII character sets. I have provided one suggestion to refactor the core logic in case.c to improve its readability, maintainability, and memory efficiency. Overall, the changes are well-implemented.
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces string.lower() and string.upper() functions for case conversion of strings, with full UTF-8 support. This is achieved by adding the utf8proc library as a new dependency. The implementation is solid, correctly handling UTF-8 characters and even preserving invalid byte sequences. The integration into the build system and the addition of comprehensive test cases are well done. I have one minor suggestion to improve code hygiene in the new implementation file.
|
Please do not include third-party dependencies. xmake strictly controls the inclusion of third-party libraries and may gradually remove some existing third-party library dependencies in the future. tbox has built-in encoding handling and cross-platform support; we should improve tbox as much as possible to improve and implement these features. |
OK. What is a solution for this situation? |
|
Supporting full character set conversion seems like a huge undertaking, potentially increasing the xmake binary by over 1-2MB. We should improve the Then, for some system platforms, we can use existing system APIs to support it (e.g., Windows This might be quite complex and time-consuming. If you can't implement it in a short time, you can leave it for now. I'll try to support it when I have more time later. |
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request adds string.lower() and string.upper() functions with Unicode support. The implementation is well-designed, with an optimization for ASCII strings and correct handling of Unicode characters via wide-character conversion. The approach to managing locales across different platforms is thorough. However, I've found a potential bug in the setlocale-based implementation where a failure to save the current locale could lead to it not being restored, which could have side effects. I've included a suggestion to make this more robust. The new tests provide good coverage for various character sets.
Looks cool, would be nice to have it clean without windows.h or some appleframework? |
|
I don't know if tb_wchar_t maps to wchar_t but beware on Windows wchar_t is UTF-16 and has surrogate pairs, which means a single wchar_t may be half a character. |
Therefore, I would prioritize using the CharUpperW API to implement complete string conversion, as is the case on Apple's platform. A while-loop + towupper is the fallback solution. @luadebug |
Well sounds good as we have them included at any case. |
|
We also need to update tbox.config.h for each platform to add detection results such as wcsupr and towlower. Usually only towlower and towupper exist. |
|
wait for this patch #7237 |
|
rebase dev and try it again. |
core/src/xmake/string/case.c
Outdated
| tb_size_t i; | ||
| for (i = 0; i < size; i++) { | ||
| // Stop at first non-ascii char | ||
| if ((tb_byte_t)str[i] >= 0x80) break; |
There was a problem hiding this comment.
we can use tb_charset_utf8_toupper directly https://github.com/tboox/tbox/blob/73028bf9165aece5ec9863124fda67500a005e24/src/tbox/charset/utf8.c#L258
core/src/xmake/string/case.c
Outdated
| /* ////////////////////////////////////////////////////////////////////////////////////// | ||
| * implementation | ||
| */ | ||
| tb_int_t xm_string_lower(lua_State* lua) { |
There was a problem hiding this comment.
please move to lower.c and upper.c file
|
like this /* string.upper
*
* @param str the string
*
* @code
* local result = string.upper(str)
* @endcode
*/
tb_int_t xm_string_toupper(lua_State *lua) {
// check
tb_assert_and_check_return_val(lua, 0);
// get string
size_t size = 0;
tb_char_t const* cstr = luaL_checklstring(lua, 1, &size);
tb_check_return_val(cstr, 0);
// empty?
if (!size) {
lua_pushstring(lua, "");
return 1;
}
// copy string to buffer
tb_char_t* buffer = tb_malloc_bytes(size + 1);
if (buffer) {
tb_memcpy(buffer, cstr, size);
buffer[size] = '\0';
// to upper
tb_long_t real_size = tb_charset_utf8_toupper(buffer, size);
// push result
if (real_size >= 0) {
lua_pushlstring(lua, buffer, real_size);
} else {
lua_pushlstring(lua, cstr, size);
}
tb_free(buffer);
} else {
lua_pushnil(lua);
}
return 1;
} |
Resolves: #7234