Red Marker: Guaranteed quality reading right here!

From MSDN:

It is recommended that a C/C++ application (or library) have its manifest embedded inside the final binary because this guarantees correct runtime behavior in most scenarios. — MSDN

Guarantees behavior… in most scenarios? That’s not actually how the word “guarantees” works. Let’s try some similar constructs!

The software works 100%, most of the time!

… or …

I guarantee we’ll come in under budget and ahead of schedule most of the time!

… and when we don’t it’s because your expectation of “most” exceeded our guaranteed criteria!

most is clearly defined as 50.000000001% of the time, and it’s an aggregate over a sliding window. That means if we’re at a 35% success rate, it’s because we need more opportunities to bring the aggregate up to 50.000000001%. In other words, we guarantee success 50.000000001% of the time over an undisclosed number of data points such that the final result can be reasonably construed as "most of the time".

Guarantees require measurable and quantifiable criteria. Without those criteria there is no guarantee, and the word becomes just a nonsense term meant to invoke emotion rather than add value or informative content to the documentation.

Researching the MAX_PATH barrier on Windows

I’m working on a utility called eudo, and as part of this process I wanted to ensure that the utility made the best of modern Windows Long Path Name (LPN) support. This is not to be confused with Long File Name (LFN) support, which was hacked onto the FAT file system and raised the limit on individual file and directory names to 255 characters. This is about the LPNs which have been a basic functional capability of the NTFS file system since it was first introduced sometime around 1993, where individual file and directory names are limited to 255 characters but the total length of any given path can reach PATHCCH_MAX_CCH (32767) characters.

Supporting these LPNs has become increasingly important as developers who grew up on Linux and macOS platforms continue to produce software that relies on them. The npm / node.js community in particular runs into the 260-character limit, especially on distributed build systems with deeply nested file structures meant to sandbox categories of jobs. This means that eudo really needs to support LPNs, insofar as is possible on Windows.

For reasons mostly centering around security paranoia, Microsoft never really supported LPNs in any clear or concise fashion. The prevailing theory was (and still is?) that retrofitting LPN support into system libraries would cause buffer overruns and all the world’s software would become compromised. Never mind the fact that this happens a couple times a year already in all manner of other ways for software across all platforms, and the world keeps ticking along. Never mind the fact that such changes are as likely to help engineers find and fix existing-but-unknown exploits as they are to create new unknown exploits that could be abused before engineers discover them. Never mind that having APIs randomly and unexpectedly truncate LPNs as returned by NTFS to MAX_PATH is itself a potential security hazard.

I’ve known since the Windows 2000 days that the wide-char (Unicode) versions of most Windows file APIs support LPNs well enough. But I was hoping there was a better way today, seeing that we’re 15 years later and staring down the barrel of the bold new Windows 10 and its built-in hybrid Ubuntu Linux layer feature.

Step 1. Research Latest Microsoft Path Manipulation APIs

My first stop was to read the MSDN section about MAX_PATH. This sent me down a rabbit hole. There’s some mess about some new LPN notation \\?\ that bypasses some internal processing that might “break” LPNs.

For file I/O, the \\?\ prefix to a path string tells the Windows APIs to disable all string parsing and to send the string that follows it straight to the file system. For example, if the file system supports large paths and file names, you can exceed the MAX_PATH limits that are otherwise enforced by the Windows APIs.

That sounds like really nuanced behavior that deserves a better explanation. Which Windows APIs? Because I can tell you right now — by direct empirical evidence — that LPNs are working just fine when I feed them into Win32 APIs like CreateFileW. Is this something that’s UWP only? Does it only affect CreateFileA?

Note The maximum path of 32,767 characters is approximate, because the \\?\ prefix may be expanded to a longer string by the system at run time, and this expansion applies to the total length.

So let’s break this down by what it really means: the string that “gets no expansion or processing” still gets expanded or processed somehow, and because of that, the actual max length is unknown. In a way, that makes this more dangerous than the old stupid truncate-at-MAX_PATH behavior. At least with that one you could assert or error on a path that was too long, and be sure the user got a concrete explanation rather than a random "file not found!" error or possibly something even more vague. Now we have LPNs, but there’s some magic unknown padding somewhere between 32000 and 32767 chars (undisclosed) where things break down. You get potentially mysterious errors resulting from path name truncation, and possibly the ability to create a file on NTFS (max length 32767) that can simply never be accessed by Windows itself. Waaat?
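For the curious, here is roughly what opting into the prefix looks like from application code. This is a minimal sketch under my own assumptions, not anything lifted from MSDN: toExtendedLengthPath is a hypothetical helper name, it assumes the input is already an absolute path, and it only covers the drive-letter form and the UNC form (\\?\UNC\server\share). Because the prefix switches off path parsing, the sketch converts forward slashes to backslashes itself.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Hypothetical helper: turns an absolute Windows path into its
// extended-length (\\?\) form. Pure string manipulation, no Win32 calls.
std::wstring toExtendedLengthPath(std::wstring path) {
    assert(!path.empty() && "expects a non-empty absolute path");

    const std::wstring extended = L"\\\\?\\";  // the \\?\ prefix
    const std::wstring device = L"\\\\.\\";    // the \\.\ device prefix

    // Paths that already carry an explicit namespace prefix are left alone.
    if (path.compare(0, extended.size(), extended) == 0 ||
        path.compare(0, device.size(), device) == 0) {
        return path;
    }

    // Nothing downstream will translate '/' for us once the prefix is present.
    std::replace(path.begin(), path.end(), L'/', L'\\');

    // \\server\share\... becomes \\?\UNC\server\share\...
    if (path.compare(0, 2, L"\\\\") == 0) {
        return extended + L"UNC\\" + path.substr(2);
    }

    // C:\dir\file becomes \\?\C:\dir\file
    return extended + path;
}
```

With that, toExtendedLengthPath(L"C:\\deep\\tree") yields \\?\C:\deep\tree, which can then be handed to a wide-char API such as CreateFileW.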

And then there’s this:

A registry key allows you to enable or disable the new long path behavior. To enable long path behavior set the registry key at HKLM\SYSTEM\CurrentControlSet\Control\FileSystem LongPathsEnabled (Type: REG_DWORD). The key’s value will be cached by the system (per process) after the first call to an affected Win32 file or directory function (list follows). The registry key will not be reloaded during the lifetime of the process. In order for all apps on the system to recognize the value of the key, a reboot might be required because some processes may have started before the key was set.

A registry key? Reboot required? Affects all apps? Ok, this isn’t sounding very ideal. It’s clearly intended for use only by systems engineers in controlled production-level environments. There is a manifest-based alternative.

Finally, we get down to a point where it mentions what functions are actually affected by this new LPN support and registry key mess:

These are the directory management functions that no longer have MAX_PATH restrictions if you opt-in to long path behavior: CreateDirectoryW, CreateDirectoryExW, GetCurrentDirectoryW, RemoveDirectoryW, SetCurrentDirectoryW.

These are the file management functions that no longer have MAX_PATH restrictions if you opt-in to long path behavior: CopyFileW, CopyFile2, CopyFileExW, CreateFileW, CreateFile2, CreateHardLinkW, CreateSymbolicLinkW, DeleteFileW, FindFirstFileW, FindFirstFileExW, FindNextFileW, GetFileAttributesW, GetFileAttributesExW, SetFileAttributesW, GetFullPathNameW, GetLongPathNameW, MoveFileW, MoveFileExW, MoveFileWithProgressW, ReplaceFileW, SearchPathW, FindFirstFileNameW, FindNextFileNameW, FindFirstStreamW, FindNextStreamW, GetCompressedFileSizeW, GetFinalPathNameByHandleW.

But wait a minute! CreateFileW has supported Long Path Names since Windows 2000! I also know for a fact that SetCurrentDirectoryW does not allow LPNs under any circumstances due to a limitation of the Microsoft C Runtime (CRT). So what is actually changing when we enable this new long path name support? I’m not at all sure. None of this is adding up, which means my next task is to put together some real tests, run them, record the results, and formulate my own documentation.

Further down the rabbit hole…

The next thing I decide to do is read up on some specific LPN details in the PathCchCombine and PathCchCombineEx functions, hoping for more clues. These APIs were introduced in Windows 8, and are wide-char-only so — one would hope — they’re built from the ground up with LPN support. My hopes are dashed instantly: PathCchCombine specifically says it has the MAX_PATH limitation, and that to get past the limitation I must use PathCchCombineEx. I repeat: a brand spanking new API introduced for Windows 8 is apparently still stuck on MAX_PATH, at least according to the MSDN docs.

I add that last bit because I didn’t actually verify the behavior of either function. Why bother? I already knew from the old days of Windows 2000 programming that PathCombineW handles LPNs perfectly fine. I re-confirmed it yesterday.

And then we get to the flags section of PathCchCombineEx, which has its own special brand of complexity:

[Image: pathcch-omg (the PathCchCombineEx flags section)]

So we have a flag to opt into allowing LPNs, but it’s overridden by the registry setting described earlier, and then there are flags that override the registry setting (added in 1703). Got all that? Finally, my favorite:

PATHCCH_DO_NOT_NORMALIZE_SEGMENTS

Disables the normalization of path segments that includes removing trailing dots and spaces. This enables access to paths that win32 path normalization will block.

This one I love, because it’s actually a flag meant to disable legacy/buggy behavior of ancient Windows APIs that were designed with the original FAT filesystem’s 8.3 filename limitation in mind. The correct behavior of any system-provided path library should have been to handle filenames in a manner consistent with the user’s selected file system. This is how every other operating system operates: if you use a new advanced file system, the system libraries actually allow you to, you know, use its features. But not on Windows! Not even in an API added for Windows 8. And what is this about removal of trailing spaces? That’s technically a bug by all accounts. There’s no reason trailing spaces should have ever been automatically stripped by a path concatenation library, especially not without explicitly documenting it in the description or remarks. Keeping that kind of behavior around just because someone made the mistake of adding it 25 years ago is bad engineering.

Filtering it all down

One of the challenges of working well with Microsoft APIs and the MSDN is being able to filter the good APIs from the bad ones. The bad APIs, in many cases, are hacks and workarounds that Microsoft engineers probably only intended for use internally, but have to be disclosed publicly to avoid anti-trust lawsuits or other legal annoyances. My only guess is these flags were added as quick hacks to allow working around some specific LPN problems they discovered in the new Ubuntu Linux feature launched with Windows 10 build 1703. They aren’t really meant for use by anyone else.

Finally, PathCombineW also has the problem where it strips trailing dots and whitespace, so don’t use it either. The best conclusion here is: don’t use Microsoft-provided path tools. Writing fully functional LPN-friendly path concatenators and path normalizers in modern C++ is much easier than trying to grok these docs, verify Windows versions, break compat with Windows Vista/7 and, possibly, end up not supporting LPNs correctly anyway in the process.
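To give a sense of scale, the concatenation half fits in about a dozen lines. This is a minimal sketch under my own assumptions rather than a finished library: joinPath is a hypothetical name, the second argument is assumed to be a relative path, and backslash is the separator it inserts. It imposes no length limit and leaves trailing dots and spaces exactly as given.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: joins a base path and a relative leaf with exactly one
// separator between them. No MAX_PATH limit, no trimming of dots or spaces.
std::wstring joinPath(const std::wstring& base, const std::wstring& leaf) {
    // Assumption of this sketch: the leaf is relative (no drive letter).
    assert(leaf.size() < 2 || leaf[1] != L':');

    if (base.empty()) return leaf;
    if (leaf.empty()) return base;

    const bool baseEndsWithSep = base.back() == L'\\' || base.back() == L'/';
    const bool leafStartsWithSep = leaf.front() == L'\\' || leaf.front() == L'/';

    // Both sides supply a separator: drop the one on the leaf.
    if (baseEndsWithSep && leafStartsWithSep) return base + leaf.substr(1);

    // Exactly one side supplies a separator: plain concatenation is enough.
    if (baseEndsWithSep || leafStartsWithSep) return base + leaf;

    // Neither side supplies one: insert it.
    return base + L'\\' + leaf;
}
```

For example, joinPath(L"C:\\dir", L"file. ") returns C:\dir\file. with the trailing dot and space intact.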

Red Marker: PERL docs

I’ve gone on record a few times that I wish I could go all red marker on poorly written online programming documentation — and by that I mean take a red marker and cross out all the unnecessary superlatives and circle all the vague statements and write big red question marks over them. And now that I have a blog, I have my chance. This is the first post of what might likely be a series.

Normally my Red Marker ire is directed towards the npm/node.js community, which on the whole has the least value-per-word of any technical specs or docs. But today my focus is a PERLdoc page that graced my browser screen during some esoteric discussion of historical and current trends in variable scoping.

[disclaimer: at the time of this writing it’s described as Perl 5 version 26.1 documentation]

The link: PERL – Private-Variables-via-my()

The my is simply a modifier on something you might assign to.

Simply? What is the value of that word? Using it in this context does not alter the reality that it’s not actually simple. The length of this section of doc (several pages) is all the evidence that needs to be submitted on that account.

A my has both a compile-time and a run-time effect. At compile time, the compiler takes notice of it.

Well, thank the gods! Because if the compiler didn’t take notice of it, it wouldn’t really be a compile-time effect. That entire sentence belongs in /dev/null.

The principal usefulness of this is to quiet use strict 'vars', but it is also essential for generation of closures as detailed in perlref.

Wait. How does a warning suppression take principal priority over an essential feature such as closure generation? It’s almost as if the actual behavior of the program is secondary to the build jargon it spits out when it’s compiled. Make all the jargon go away and surely our program will run better for it!

Unlike dynamic variables created by the local operator, lexical variables declared with my are totally hidden from the outside world, including any called subroutines.

Outside world? What does that mean? Does Perl define the meaning of the words outside and world in the context of the language? I guess some people might find the occasional ‘friendly folksy’ vocabulary thing comforting. As an engineer, I’m left wondering what in the hell it means specifically. Because it’s actually important to know if and when some variable is going to behave differently than I would otherwise expect.

… and that was the extent of the PERL docs that I sampled.

StringCchVPrintf, vsnprintf, and WideChars — How did this get so complicated?

To start off, here is unequivocally the best way to format strings in C++11:

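#include <cstdarg>
#include <cstdio>
#include <string>

std::string xStringFormat(const char* fmt, ...) {
    if (!fmt || !fmt[0]) return {};

    std::string result;
    va_list list, copy;
    va_start(list, fmt);
    va_copy(copy, list);
    // calculate the length of the result string (excluding the null terminator)
    auto amt = vsnprintf(nullptr, 0, fmt, copy);
    va_end(copy);

    if (amt > 0) {
        result.resize(amt);
        // vsnprintf writes at most (size - 1) characters plus a terminator,
        // so pass amt + 1; the terminator lands on the string's own null slot
        vsnprintf(const_cast<char*>(result.c_str()), amt + 1, fmt, list);
    }
    va_end(list);
    return result;
}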

… and as of C++17, you can replace the const_cast with result.data().

What if you’re writing a native Windows-only application, something so platform-specific that you figure you might as well use Windows’ dreaded WCHAR (16-bit wide char), also known as “One of the greatest architectural mistakes in computing history”? Then things get… a little odd. The first step is to convert vsnprintf to _vsnwprintf, right? Not quite.

Thar be Dragons!

Like many ANSI C libraries, Microsoft’s original implementation of vsnprintf came out before the API was standardized. It returned -1 if the provided buffer wasn’t big enough to accommodate the formatted string. Sometime after the C99 standard (as of Visual Studio 2015), Microsoft updated vsnprintf so that it could be used to get the length of the formatted string, but only vsnprintf. All the other functions, including both _vsnprintf and _vsnwprintf, retain the legacy behavior of returning -1.
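The C99 contract is easy to see in isolation. The following is a small sketch, assuming a C99-conforming vsnprintf (so not the legacy _vsnprintf family); formattedLength and formatTruncated are hypothetical helper names made up for the illustration. It shows the two rules that matter: the return value is the full formatted length excluding the terminator, and a buffer of size n receives at most n - 1 characters plus the terminator.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: asks vsnprintf how many characters the formatted
// output needs, not counting the null terminator. Nothing is written.
int formattedLength(const char* fmt, ...) {
    assert(fmt != nullptr);
    va_list args;
    va_start(args, fmt);
    const int needed = std::vsnprintf(nullptr, 0, fmt, args);
    va_end(args);
    return needed;
}

// Hypothetical helper: formats into a buffer of exactly `capacity` chars and
// returns whatever text ended up in it, to make the truncation rule visible.
std::string formatTruncated(std::size_t capacity, const char* fmt, ...) {
    assert(fmt != nullptr);
    std::string buffer(capacity, '\0');
    va_list args;
    va_start(args, fmt);
    // At most capacity - 1 characters are stored, followed by a terminator.
    std::vsnprintf(&buffer[0], capacity, fmt, args);
    va_end(args);
    return std::string(buffer.c_str());
}
```

So formattedLength("%d-%s", 42, "abc") reports 6, while formatTruncated(4, "%s", "hello") keeps only "hel": three characters plus the terminator in a four-char buffer.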

Just to add to the fun of this great party, there’s also _vsnprintf_s(), which is extra-special complicated in a way that I can only imagine leading to more software bugs rather than fewer. There’s an extra buffer count whose function isn’t entirely clear, which has a special overloaded integer value _TRUNCATE that changes the function’s behavior significantly, and some other mess that I can’t imagine a single possible sane use case for. The only reason anyone uses it is because if you don’t use it, you’ll get this familiar warning message:

1>main.cpp(280): warning C4996: '_vsnwprintf': This function or variable may be unsafe. Consider using _vsnwprintf_s instead. To disable deprecation, use _CRT_SECURE_NO_WARNINGS. See online help for details.

The correct response is to define _CRT_SECURE_NO_WARNINGS. Seriously, it’s like the second step I take any time I create a new C++ project in Visual Studio. Not a single one of those warnings is useful, and there are plenty of ways to write insecure code even with the “secure” functions.

So after that detour…

… we are still looking for a way to figure out how long the formatted string is so that we can allocate sufficient space to store it. The answer lies in a custom API that Microsoft added by the name _vscwprintf(). It does what we need, and is available in the wide-char variety. This is not to be confused with _vcwprintf(), which is a function for printing text directly to the … “console.” Note that on Windows, the console is not tied to stdout or any other pipe or stream or anything remotely sensible. It’s a Windows-specific construct; a special type of text-output dialog box that can be opened manually by a windowed application and which has all manner of special rules that could amount to a blog post all to itself. It’s also broadly deprecated, so let’s not even bother. 🙂

The wide-char version of the printf function ends up looking like this:

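std::wstring xStringFormat(const WCHAR* fmt, ...)
{
    va_list list;
    va_list copy;
    va_start(list, fmt);
    va_copy(copy, list);
    // _vscwprintf returns the formatted length, excluding the null terminator
    auto amt = _vscwprintf(fmt, copy);
    va_end(copy);
    std::wstring result;
    if (amt > 0) {
        result.resize(amt);
        // when the output exactly fills the count, _vsnwprintf writes no
        // terminator, so passing amt (not amt + 1) is enough here
        _vsnwprintf(result.data(), amt, fmt, list);
    }
    va_end(list);
    return result;
}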

The most fascinating thing of all of this though is my discovery of the StringCchVPrintf() function and its suite of StrSafe cousins, none of which I’ve ever seen used in application code in the nearly 20 years since they were introduced. These are Windows built-in DLL-provided functions. They’re “safe” for the same reason vsnprintf is safe: they force you to provide a count of characters (Cch) for all operations. They also have no equivalent function for _vscwprintf(). There is no means by which you can calculate the required buffer space for a printf operation. None. And just in case you thought it would be clever: it’s a really bad idea to call _vscwprintf() and then use StringCchVPrintf() to format the string, since the two functions may interpret certain printf formatting parameters differently.

And then there’s StringCchVPrintfEx(), which supports back-filling unused portions of buffers, auto-nullifying strings rather than truncating, etc. Again, all provided in a manner that’s as likely to lead to bugs (thanks to terse all-caps flag constants) as to fix them.

In conclusion, don’t use StringCchVPrintf() for any reason except the remarkably narrow use case where you’re writing some specific low-level Windows driver code and it’s somehow really important to avoid statically linking any ANSI C libraries and/or avoid using any heap allocators. Aside from possibly driver authoring, I really can’t think why they exist.

Where vs. Which vs. the Common Use Case

Many CMD scripts use where.exe to determine the availability of an executable, typically something like python or 7zip. Several Git Bash scripts on Windows do this as well, including a few of my own. The typical Bash snippet looks something like this:

if ! where python; then
    >&2 echo "Error: python not found in your path!"
    exit 1
fi

I do this lazily on Windows platforms as I’m usually not interested in cross-compatibility with Linux. This is an important distinction since where.exe is not part of the CoreUtils/Bash suite. The cross-platform method is to use which, and on the surface it works exactly the same way:

if ! which python; then
    >&2 echo "Error: python not found in your path!"
    exit 1
fi

Then I got to wondering, do they actually behave the same way? And of course, the answer is no. where.exe as provided by Windows is meant to be a generic file locator tool, more similar to Unix locate than Unix which. But where.exe still does $PATHEXT extension guessing, and likewise the most common use case is to find executables in the user’s $PATH. This leads to an interesting edge case as illustrated here:

CMD.EXE using WHERE:

C:\Users\jstine\source>copy con where
 yeah
 ^Z
 1 file(s) copied.

C:\Users\jstine\source>where where
C:\Users\jstine\source\where
C:\Windows\System32\where.exe

BASH using WHICH:

jstine@JSTINE MINGW64 /c/Users/jstine/source
$ echo "woo" > which

jstine@JSTINE MINGW64 /c/Users/jstine/source
$ which which
/usr/bin/which

where returns its first result as the dummy file I created, which is not remotely executable. which ignores the file, because it is only interested in executables. If I were to create a dummy file named python anywhere in my path, then where python would return success, even though there isn’t actually an executable instance of python installed. Granted, the chances of this actually biting anyone in a way that matters are slim at best. Jot this one down as “for the curious.”
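If a tool wanted to weed out results like that dummy file, one option is to accept only names whose extension appears in the PATHEXT list. Here is a minimal sketch of that filter, with the caveat that it is my own illustration and not how where.exe or which work internally: hasPathExtExtension is a hypothetical name, and the PATHEXT value is passed in as a plain semicolon-separated string.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <sstream>
#include <string>

// Lower-cases an ASCII string so extension matching ignores case.
static std::string toLower(std::string text) {
    std::transform(text.begin(), text.end(), text.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return text;
}

// Hypothetical filter: true when `filename` ends with one of the extensions
// in a PATHEXT-style list such as ".COM;.EXE;.BAT;.CMD".
bool hasPathExtExtension(const std::string& filename, const std::string& pathext) {
    assert(!filename.empty());
    const std::string name = toLower(filename);
    std::istringstream extensions(toLower(pathext));
    std::string ext;
    while (std::getline(extensions, ext, ';')) {
        // Skip empty entries, and require at least one character before the extension.
        if (ext.empty() || name.size() <= ext.size()) continue;
        if (name.compare(name.size() - ext.size(), ext.size(), ext) == 0) return true;
    }
    return false;
}
```

Applied to the two results above, C:\Windows\System32\where.exe passes the filter and the extension-less dummy file does not.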

Conclusion: when using where.exe from the context of a CMD script, one should always specify a fully qualified executable filename whenever possible, e.g., python.exe or 7zip.exe. But given that CoreUtils versions of CLI tools are almost universally superior to Windows ones, I present a second, better conclusion:

Don’t use CMD — Install Git for Windows and use bash and associated CoreUtils for as much as possible.