Skip to content

Replace WinRT clipboard API with Win32 for pasting#15360

Merged
microsoft-github-policy-service[bot] merged 6 commits intomainfrom
dev/lhecker/win32-clipboard
Sep 21, 2023
Merged

Replace WinRT clipboard API with Win32 for pasting#15360
microsoft-github-policy-service[bot] merged 6 commits intomainfrom
dev/lhecker/win32-clipboard

Conversation

@lhecker
Copy link
Member

@lhecker lhecker commented May 15, 2023

The Win32 API is significantly faster than the WinRT one, in the order
of around 300-1000x depending on the CPU and CPU load.

This might slightly improve the situation around #15315, but I suspect
that it requires many more fixes. For instance, we don't really have a
single text input "queue" into which we write. Multiple routines that
resume_background just to WriteFile into the input pipe are thus
racing against each other, contributing to the laggy feeling.
I also fear that the modern Windows text stack might be inherently
RPC based too, producing worse lag with rising CPU load.

This might fix #14323

Validation Steps Performed

  • Paste text from Edge ✅
  • Paste text from Notepad ✅
  • Right click the address bar in Explorer, choose "Copy address",
    paste text into WT ✅

@microsoft-github-policy-service microsoft-github-policy-service bot added Issue-Bug It either shouldn't be doing this or needs an investigation. Area-TerminalControl Issues pertaining to the terminal control (input, selection, keybindings, mouse interaction, etc.) Priority-3 A description (P3) labels May 15, 2023
@lhecker lhecker force-pushed the dev/lhecker/win32-clipboard branch from 25c42f8 to ac6abff Compare May 15, 2023 22:53
@github-actions

This comment has been minimized.

@lhecker lhecker force-pushed the dev/lhecker/win32-clipboard branch from ac6abff to 515d3d3 Compare May 15, 2023 22:57
@github-actions

This comment has been minimized.

@lhecker lhecker force-pushed the dev/lhecker/win32-clipboard branch from 515d3d3 to 34caf03 Compare May 15, 2023 23:00
Copy link
Contributor

@metathinker metathinker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, former developer-maintainer of the Windows clipboard implementation here - sorry for the hit and run review, but when I see clipboard, I'm always a bit intrigued :)

Comment on lines +2545 to +2547
// The old Win32 clipboard API as used below is somewhere in the order of 1000x faster than the WinRT one,
// which has about the same latency as a fibre optics connection across the entire WA state.
// Don't use the WinRT clipboard API. RPC calls are not funny.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: I know you said you tested this, but I'm still surprised. Is calling UWP Clipboard.GetContent() or DataPackageView.GetWhateverAsync() really that slow? I'd be a bit interested to see how you tested and measured this.

My memory is rusty but my belief is that UWP DataTransfer for clipboard shouldn't be a cross-process RPC call in all cases - just ones where the source data is hosted in an app that clipped it using the UWP or OLE clipboard APIs, forcing a cross-process COM call (IDataObject::Whatever()). (As opposed to when the source item came from the Win32 clipboard; in that case, UWP clipboard impl. in the destination should "just" do the win32k system call.)

Copy link
Member Author

@lhecker lhecker May 16, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested it using some code that I have already deleted. To make the new comparison as realistic as possible, I've chosen to implement the benchmark inside Windows Terminal itself and modified like so:

fire_and_forget TerminalPage::_PasteFromClipboardHandler(const IInspectable /*sender*/, const PasteFromClipboardEventArgs eventArgs)
try
{
    const auto beg = std::chrono::high_resolution_clock::now();
    winrt::hstring text;

    for (int i = 0; i < 1000; i++)
    {
        const auto data = Clipboard::GetContent();

        if (data.Contains(StandardDataFormats::Text()))
        {
            text = co_await data.GetTextAsync();
        }
        // Windows Explorer's "Copy address" menu item stores a StorageItem in the clipboard, and no text.
        else if (data.Contains(StandardDataFormats::StorageItems()))
        {
            auto items = co_await data.GetStorageItemsAsync();
            if (items.Size() > 0)
            {
                auto item = items.GetAt(0);
                text = item.Path();
            }
        }
    }

    const auto end = std::chrono::high_resolution_clock::now();
    const auto dur = std::chrono::duration<float, std::milli>(end - beg).count() / 1000.0f;
    wchar_t buffer[64];
    swprintf_s(buffer, L"%f\n", dur);
    OutputDebugStringW(&buffer[0]);

    // ... remaining contents of the function ...
}
CATCH_LOG();

Pasting text from X takes Y ms per iteration:

  • Edge (Win32 clipboard API): 2.5ms
  • Notepad: 3ms
  • Explorer: 2.4ms
  • The same instance of Windows Terminal (WinRT API): 2.1ms
  • A different instance of Windows Terminal: also 2.1ms
  • My bodgy clipboard code: 2.7ms

It seems there's no difference in latency for us between whether an application uses the WinRT or Win32 API for setting the clipboard contents. This made me curious whether the air got to my head once again and if I got everything, I said above wrong. So, I've tried to recreate my initial test code:

#define NOMINMAX
#define WIN32_LEAN_AND_MEAN
#include <Windows.h>

// NOTE: This code relies on the latest cppwinrt version. To make matters simpler I've simply
// copy-pasted the winrt headers from terminal's "Generated Files" next to the current file.
#include "winrt/Windows.ApplicationModel.DataTransfer.h"
#include "winrt/Windows.Foundation.h"

int main() {
    winrt::init_apartment(winrt::apartment_type::single_threaded);

    []() -> winrt::fire_and_forget {
        const auto beg = std::chrono::high_resolution_clock::now();
        winrt::hstring text;

        for (int i = 0; i < 1000; i++) {
            const auto data = winrt::Windows::ApplicationModel::DataTransfer::Clipboard::GetContent();
            text = co_await data.GetTextAsync();
        }

        const auto end = std::chrono::high_resolution_clock::now();
        const auto dur = std::chrono::duration<float, std::milli>(end - beg).count() / 1000.0f;
        wprintf(L"%f\n", dur);
        PostQuitMessage(0);
    }();

    MSG msg;
    while (GetMessageW(&msg, nullptr, 0, 0)) {
        DispatchMessageW(&msg);
    }

    return 0;
}

Using it I get approximately the same latency as I get with Windows Terminal's code, which makes me believe that it's most likely correct, even if I can't immediately prove it. (If it's incorrect, please let me know!)

I did this because this allows me to run it in NVIDIA Nsight Systems to observe cross process behavior. Here's how it looks like when pasting from Edge into this application:
image

With that as an orientation, here's how one call (2.7ms) looks like:
image

You're right! There's no COM call in GetTextAsync when the clipboard content comes from Edge. GetContent however does involve 1 CoCreateInstance and 3 COM calls. I've blindly assumed that these are 1:1 directly related to the WinRT call that I'm doing. I'm genuinely sorry for this.

When the last instance of the CDataPackageView is destroyed it'll drop the CNonAgileOleRefTaker, which will end up calling OleUninitialize. This is probably not a good idea since this will also destroy the cached window it uses to call OpenClipboard. That's why every time you (or rather I, in my example project) call Clipboard::GetContent it'll recreate the window, which is very expensive overall (about half the entire CPU cost, the rest being mostly scheduling latency). Given that, I expect that the cost of the API would be significantly reduced if it wasn't calling OleUninitialize. And indeed, if I add a:

if (OleInitialize(nullptr) != S_OK) {
    abort();
}

to the start of my test application, the cost of getting the clipboard contents from Edge is consistently reduced down to about 1.8ms. It is however still in stark contrast to the Win32 API which takes roughly 0.0066ms on average on my system. I didn't run a ETW trace where the remaining cost lies. But if there's any more interest in this topic, I'd be happy to trace it.

I do realize that it could be argued that the Win32 API has an unfair advantage since it doesn't need to do any async thread switching, the way the COM code needs to do it. As such I've made the following test based on this branch:

fire_and_forget TerminalPage::_PasteFromClipboardHandler(const IInspectable /*sender*/, const PasteFromClipboardEventArgs eventArgs)
try
{
    const auto beg = std::chrono::high_resolution_clock::now();
    winrt::hstring text;

    for (int i = 0; i < 1000; i++) // <--- added
    {
        co_await winrt::resume_background(); // <--- added
        {
            // ... the win32 clipboard code from this PR ...
        }
        co_await wil::resume_foreground(Dispatcher()); // <--- added
    }

    const auto end = std::chrono::high_resolution_clock::now();
    const auto dur = std::chrono::duration<float, std::milli>(end - beg).count() / 1000.0f;
    wchar_t buffer[64];
    swprintf_s(buffer, L"%f\n", dur);
    OutputDebugStringW(&buffer[0]);

    // ... remaining contents of the function from this PR ...
}
CATCH_LOG();

This increases the cost to roughly 0.15ms per iteration on my system.
(We technically don't need to resume_foreground however so the actual cost is 0.013ms.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those benchmarks seem complete enough, I'll accept them, and you can close this thread out. :)

I raised this point because I don't like to see people go back to the win32 clipboard API from UWP clipboard API. Win32 clipboard is really much harder to use correctly; even in a simple case like this you had to build a OpenClipboard busy wait loop which isn't even guaranteed to succeed.

But clearly the speed up is significant enough that it's worth making this change.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re the other discussion about ClipboardServer.dll - I might be wrong or out of date to say that it doesn't get used for basic clipboard functions. But my guess would be that its usage when you call UWP clipboard on desktop is for 1 of 2 reasons:

  1. UWP app processes in a AppContainer use a broker to write to clipboard to filter out attempts to hijack higher-privileged processes with corrupt clipboard data. I thought that function wasn't built into ClipboardServer.dll, but maybe it is, or has become so. Within Windows Terminal, the UWP clipboard code in the caller process should be smart enough to see that it's not running from an AppContainer process and make the calls directly, but maybe it's not or there are other constraints.
  2. On desktop, ClipboardServer is used to enforce clipboard enterprise data watermarking and restriction (Windows Information Protection/Enterprise Data Protection). This protection should apply to all clipboard APIs but I don't know how or when it's enforced for usage of each API.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I raised this point because I don't like to see people go back to the win32 clipboard API from UWP clipboard API. Win32 clipboard is really much harder to use correctly; even in a simple case like this you had to build a OpenClipboard busy wait loop which isn't even guaranteed to succeed.

Yeah, that's a totally fair point and I wouldn't mind sticking with the old code. In fact, the new API is honestly way nicer and easier to use. It's genuinely just not fast enough and I think this ever so slightly contributes to the growing computational inefficiency affecting Windows.

If there's an API that's a little bit harder to use, but multiple magnitudes more power efficient 1, I think we not just should use it, but almost have a moral obligation to use it. Not just because we engineers are fickler when it comes to performance than product owners might hope 2, but more importantly because the rising layers of abstraction and corresponding computational latency are unsustainable long-term. 3

In other words, I want Windows to be fast and power efficient, where everything you do is as good as it gets to the fullest extent of the underlying hardware and what we MS engineers are capable of delivering and it's what I'm trying to work towards.

Footnotes

  1. The code I've ended up with only needs to do a single resume_background. This ends up with a latency of just 13us per iteration on my system, or roughly 200x more efficient.

  2. Everyone immediately switched to Chrome when it came out, because it was much leaner than IE, same for VS Code and Atom. Why would this be any different for Windows and Proton/Wine one day?

  3. If the WinRT API had replaced the Win32 one, so that the latter is just a thin shim, I'd totally get it, but this way? When will we get rid of the original code?

@KalleOlaviNiemitalo
Copy link

KalleOlaviNiemitalo commented May 16, 2023

If the clipboard owner hangs when Terminal is attempting to paste to a session, how much of Terminal's UI will that block? Just the one pane, the entire window, or all windows? Does Windows enforce a timeout?

With conhost, it didn't matter that much because there was only one session per window anyway.

@lhecker lhecker changed the title Replace WinRT clipboard API with Win32 Replace WinRT clipboard API with Win32 for pasting May 16, 2023
@zadjii-msft
Copy link
Member

I'm inclined to say that this is okay, for one very major reason:

void Clipboard::Paste()
{
HANDLE ClipboardDataHandle;
// Clear any selection or scrolling that may be active.
Selection::Instance().ClearSelection();
Scrolling::s_ClearScroll();
// Get paste data from clipboard
if (!OpenClipboard(ServiceLocator::LocateConsoleWindow()->GetWindowHandle()))
{
return;
}
ClipboardDataHandle = GetClipboardData(CF_UNICODETEXT);
if (ClipboardDataHandle == nullptr)
{
CloseClipboard();
return;
}
auto pwstr = (PWCHAR)GlobalLock(ClipboardDataHandle);
StringPaste(pwstr, (ULONG)GlobalSize(ClipboardDataHandle) / sizeof(WCHAR));
// WIP auditing if user is enrolled
static auto DestinationName = _LoadString(ID_CONSOLE_WIP_DESTINATIONNAME);
Microsoft::Console::Internal::EdpPolicy::AuditClipboard(DestinationName);
GlobalUnlock(ClipboardDataHandle);
CloseClipboard();
}

Conhost has used the win32 APIs for decades.

I am curious about the "what if the source hangs while we're reading it" question - maybe there's a space for us to wrap this up as an AsyncAction that we can cancel if it doesn't return. That's probably not as expensive as all the other COM calls. Honestly though I'm not super familiar with the edge cases with the win32 API so I'm not sure what we have to be afraid of.

What would this lose? There's gotta be some reason to use the modern APIs.... right? right?

@lhecker
Copy link
Member Author

lhecker commented May 19, 2023

I am curious about the "what if the source hangs while we're reading it" question - maybe there's a space for us to wrap this up as an AsyncAction that we can cancel if it doesn't return.

I recently changed this PR to yield to the background thread and untagged #15315 as being fixed, so it won't hang the UI anymore. We can't use an AsyncAction, because there's no way to cancel reading from the clipboard. It'll simply time out after 30s (as guaranteed by the win32 API). To us this doesn't really matter, because WinRT calls OLE and OLE calls the Win32 API which also hangs for 30s. It's a matryoshka.

That's probably not as expensive as all the other COM calls.

Not even remotely. Even with yielding to a background thread first the total latency is still 20000% faster than the WinRT API (using percentages like that makes it look a bit funnier).

Honestly though I'm not super familiar with the edge cases with the win32 API so I'm not sure what we have to be afraid of.

The worst that can happen is that it works as bad as the conhost implementation, but the only major complaint we've ever gotten there is that pasting breaks Unicode, but that has always been our own fault (TextToKeyEvents being UCS-2).

@github-actions

This comment has been minimized.

}

attempts++;
Sleep(10 * attempts);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get that we should wait before trying again, but why are we doing 10*attempts here?

Copy link
Member Author

@lhecker lhecker Jun 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The usual approach is to do exponential backoffs: Every iteration you increase the backoff duration by like 2x for instance. I felt like doing just a linear backoff would work well enough for our primitive use case. After all, conhost uses the same API and doesn't use any retrying at all.

@carlos-zamora carlos-zamora added the Needs-Second It's a PR that needs another sign-off label Jun 9, 2023
@microsoft-github-policy-service microsoft-github-policy-service bot added Area-Input Related to input processing (key presses, mouse, etc.) Product-Terminal The new Windows Terminal. labels Jun 9, 2023
@lhecker
Copy link
Member Author

lhecker commented Sep 19, 2023

@zadjii-msft @DHowett It'd be great if we could still get this into 1.19. We've been testing it for quite a bit already after all. It's missing just 1 more approval.

Copy link
Member

@zadjii-msft zadjii-msft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure let's give it a shot

@microsoft-github-policy-service microsoft-github-policy-service bot removed the Needs-Second It's a PR that needs another sign-off label Sep 20, 2023
@zadjii-msft
Copy link
Member

This one is from before the CI rewrite so it might need to be manually merged...

@lhecker lhecker added the AutoMerge Marked for automatic merge by the bot when requirements are met label Sep 21, 2023
Comment on lines +2685 to +2693
const auto str = static_cast<const wchar_t*>(GlobalLock(data));
if (!str)
{
const auto isNewLineLambda = [](auto c) { return c == L'\n' || c == L'\r'; };
const auto hasNewLine = std::find_if(text.cbegin(), text.cend(), isNewLineLambda) != text.cend();
warnMultiLine = hasNewLine;
co_return;
}

constexpr const std::size_t minimumSizeForWarning = 1024 * 5; // 5 KiB
const auto warnLargeText = text.size() > minimumSizeForWarning &&
_settings.GlobalSettings().WarnAboutLargePaste();
const auto dataCleanup = wil::scope_exit([&]() {
GlobalUnlock(data);
});
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

technically there's wil::unique_hglobal_locked, which takes an HGLOBAL and unlocks it later! On its own!

@microsoft-github-policy-service microsoft-github-policy-service bot deleted the dev/lhecker/win32-clipboard branch September 21, 2023 21:55
github-merge-queue bot pushed a commit that referenced this pull request Apr 10, 2024
In the spirit of #15360 this implements the copy part.
The problem is that we have an issue accessing the clipboard while
other applications continue to work just fine. The major difference
between us and the others is that we use the WinRT clipboard APIs.
So, the idea is that we just use the Win32 APIs instead.

The feel-good side-effect is that this is (no joke) 200-1000x faster,
but I suspect no one will notice the -3ms difference down to <0.01ms.

The objective effect however is that it just works.

This may resolve #16982.

## Validation Steps Performed
* Cycle through Text/HTML/RTF-only in the Interaction settings
* Paste the contents into Word each time
* Text is plain and HTML/RTF are colored ✅
DHowett pushed a commit that referenced this pull request Apr 22, 2024
In the spirit of #15360 this implements the copy part.
The problem is that we have an issue accessing the clipboard while
other applications continue to work just fine. The major difference
between us and the others is that we use the WinRT clipboard APIs.
So, the idea is that we just use the Win32 APIs instead.

The feel-good side-effect is that this is (no joke) 200-1000x faster,
but I suspect no one will notice the -3ms difference down to <0.01ms.

The objective effect however is that it just works.

This may resolve #16982.

* Cycle through Text/HTML/RTF-only in the Interaction settings
* Paste the contents into Word each time
* Text is plain and HTML/RTF are colored ✅

(cherry picked from commit 5f3a857)
Service-Card-Id: 92308708
Service-Version: 1.20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Area-Input Related to input processing (key presses, mouse, etc.) Area-TerminalControl Issues pertaining to the terminal control (input, selection, keybindings, mouse interaction, etc.) AutoMerge Marked for automatic merge by the bot when requirements are met Issue-Bug It either shouldn't be doing this or needs an investigation. Priority-3 A description (P3) Product-Terminal The new Windows Terminal. zBugBash-Consider

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Paste from clipboard doesn't always work

7 participants