Skip to content

Conversation

@turchanov
Copy link
Contributor

@turchanov turchanov commented Aug 29, 2019

Fix for bug #74170.

Some functions in libmagic (distributed with fileinfo extension) perform this sequence of calls:
func() {
setlocale(LC_TYPE, "C")
.. do some work ..
setlocale(LC_TYPE, "")
}

It effectively resets LC_TYPE if it that was set before the function call.

This bug was fixed in upstream file[aka libmagic] ver 5.18
(file/file@c397fb2)

So I just humbly ported it.

@nikic
Copy link
Member

nikic commented Aug 30, 2019

Looks like this is not upstream code, but something we're patching in ourselves:

file_replace(struct magic_set *ms, const char *pat, const char *rep)
{
- file_regex_t rx;
- int rc, rv = -1;
-
- rc = file_regcomp(&rx, pat, REG_EXTENDED);
- if (rc) {
- file_regerror(&rx, rc, ms);
- } else {
- regmatch_t rm;
- int nm = 0;
- while (file_regexec(&rx, ms->o.buf, 1, &rm, 0) == 0) {
- ms->o.buf[rm.rm_so] = '\0';
- if (file_printf(ms, "%s%s", rep,
- rm.rm_eo != 0 ? ms->o.buf + rm.rm_eo : "") == -1)
- goto out;
- nm++;
- }
- rv = nm;
+ zval patt;
+ uint32_t opts = 0;
+ pcre_cache_entry *pce;
+ zend_string *res;
+ zend_string *repl;
+ size_t rep_cnt = 0;
+
+ (void)setlocale(LC_CTYPE, "C");
+
+ opts |= PCRE2_MULTILINE;
+ convert_libmagic_pattern(&patt, (char*)pat, strlen(pat), opts);
+ if ((pce = pcre_get_compiled_regex_cache(Z_STR(patt))) == NULL) {
+ zval_ptr_dtor(&patt);
+ rep_cnt = -1;
+ goto out;
}

This looks pretty bogus to me. pcre_get_compiled_regex_cache() uses BG(locale_string), not the actual locale that setlocale sets, so I suspect that the current setlocale call doesn't actually do anything at all as far as PCRE is concerned.

I think the correct way to fix this would be to extend the pcre_get_compiled_regex_cache API to accept the locale string as a parameter, so we can just directly pass the C locale without changing any global state.

@turchanov
Copy link
Contributor Author

turchanov commented Aug 31, 2019

I think the correct way to fix this would be to extend the pcre_get_compiled_regex_cache API to accept the locale string as a parameter, so we can just directly pass the C locale without changing any global state.

I could do that, if you don't mind?

But! There is another place where setlocale is used and it is not related to PCRE.

private const char *
cdf_app_to_mime(const char *vbuf, const struct nv *nv)
{
size_t i;
const char *rv = NULL;
(void)setlocale(LC_CTYPE, "C");
for (i = 0; nv[i].pattern != NULL; i++)
if (strcasestr(vbuf, nv[i].pattern) != NULL) {
rv = nv[i].mime;
break;
}
(void)setlocale(LC_CTYPE, "");
return rv;
}

That still needs fixing. (We could theoretically use Intl extension but that will create unnecessary dependency and it will complicate code as whole). What is your decision on that account?

@nikic
Copy link
Member

nikic commented Aug 31, 2019

I think the correct way to fix this would be to extend the pcre_get_compiled_regex_cache API to accept the locale string as a parameter, so we can just directly pass the C locale without changing any global state.

I could do that, if you don't mind?

Feel free :) I would suggest not changing pcre_get_compiled_regex_cache() itself, but adding a new function like pcre_get_compiled_regex_cache_with_locale() and making the old function call it with BG(locale_string). ZSTR_CHAR('C') can be used to pass the C locale without allocating a string.

But! There is another place where setlocale is used and it is not related to PCRE.

private const char *
cdf_app_to_mime(const char *vbuf, const struct nv *nv)
{
size_t i;
const char *rv = NULL;
(void)setlocale(LC_CTYPE, "C");
for (i = 0; nv[i].pattern != NULL; i++)
if (strcasestr(vbuf, nv[i].pattern) != NULL) {
rv = nv[i].mime;
break;
}
(void)setlocale(LC_CTYPE, "");
return rv;
}

That still needs fixing. (We could theoretically use Intl extension but that will create unnecessary dependency and it will complicate code as whole). What is your decision on that account?

Though backing up the old locale would be an improvement, locale changes are still fundamentally not thread-safe, so it would be best to avoid it. We don't seem to have a ready-made function for strcasestr in particular, but an easy way to emulate it is to call zend_str_tolower_dup (which is not locale-sensitive) on both strings and use strstr. This can be made more efficient using stack-allocated strings, but this code does not look particularly performance-critical to me. Once we do that, we can also drop the strcasestr fallback implementation altogether. I think this issue would be best split into a separate PR from the PCRE one.

@krakjoe
Copy link
Member

krakjoe commented Oct 4, 2019

@turchanov can we get a status update here please ?

@turchanov
Copy link
Contributor Author

Sorry for late reply. Will look into that ASAP.

Some functions in libmagic (distributed with fileinfo extension) perform this sequence of calls:
func() {
setlocale(LC_TYPE, "C")
.. do some work ..
setlocale(LC_TYPE, "")
}

It effectively resets LC_TYPE if it that was set before the function call.

To avoid manipulations with current locale at all, the problematic functions
were modified to use locale-independent functions.
@turchanov turchanov changed the base branch from master to PHP-7.2 October 9, 2019 14:17
@turchanov turchanov closed this Oct 9, 2019
@turchanov turchanov reopened this Oct 9, 2019
@turchanov
Copy link
Contributor Author

I originally created pull request against master branch by accident. So I rebased my PR against PHP-7.2 branch. Plus I decided to squash first commit (where I save current locale before setlocale("C")) since it is no longer relevant to the current solution and will clutter the commits history.

@turchanov
Copy link
Contributor Author

Belatedly noticed two cosmetic flaws. Pushed another commit. I can squash again to one commit if needed.

@turchanov
Copy link
Contributor Author

So.... What do you think?

Copy link
Member

@nikic nikic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay, this looks good!

@turchanov
Copy link
Contributor Author

So... Will you merge it?

@nikic
Copy link
Member

nikic commented Dec 20, 2019

Whoops, this dropped off my radar, again. Now merged as c62cd9a.

@nikic nikic closed this Dec 20, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants