Greatly reduce allocations in the conhost/OpenConsole startup path by Austin-Lamb · Pull Request #8489 · microsoft/terminal

Austin-Lamb · 2020-12-03T20:53:40Z

I was looking at conhost/OpenConsole and noticed it was being pretty
inefficient with allocations due to some usages of std::deque and
std::vector that didn't need to be done quite that way.

So this uses std::vector for the TextBuffer's storage of ROW objects,
which allows one allocation to contiguously reserve space for all the
ROWs - on Desktop this is 9001 ROW objects which means it saves 9000
allocations that the std::deque would have done. Plus it has the
benefit of increasing locality of the ROW objects since deque is going
to chase pointers more often with its data structure.

Then, within each ROW there are CharRow and ATTR_ROW objects that use
std::vector today. This changes them to use Boost's small_vector, which
is a variation of vector that allows for the so-called "small string
optimization." Since we know the typical size of these vectors, we can
pre-reserve the right number of elements directly in the
CharRow/ATTR_ROW instances, avoiding any heap allocations at all for
constructing these objects.

There are a ton of variations on this "small_vector" concept out there
in the world - this one in Boost, LLVM has one called SmallVector,
Electronic Arts' STL has a small_vector, Facebook's folly library has
one...there are a silly number of these out there. But Boost seems like
it's by far the easiest to consume in terms of integration into this
repo, the CI/CD pipeline, licensing, and stuff like that, so I went with
the boost version.

In terms of numbers, I measured the startup path of OpenConsole.exe on
my dev box for Release x64 configuration. My box is an i7-6700k @ 4
Ghz, with 32 GB RAM, not that I think machine config matters much here:

	Allocation count	Allocated bytes	CPU usage (ms)
Before	29,461	4,984,640	103
After	2,459 (-91%)	4,853,931 (-2.6%)	96 (-7%)

Along the way, I also fixed a dynamic initializer I happened to spot in
the registry code, and updated some docs.

Validation Steps Performed

Ran "runut", "runft" and "runuia" locally and confirmed results are
the same as the main branch
Profiled the before/after numbers in the Visual Studio profiler, for
the numbers shown in the table

…the bigger vector change. Also fixing a silly dynamic initializer issue in RegistrySerialization while I was here.

…TODO for now). Optimizing sizes of TextAttributeRun, ROW, and ATTR_ROW more with tighter packing.

…that.

miniksa · 2020-12-03T21:09:02Z

Why did so many size_ts change to unsigned int?

src/buffer/out/TextAttributeRun.hpp

src/buffer/out/CharRowCell.hpp

Austin-Lamb · 2020-12-03T21:23:06Z

Why did so many size_ts change to unsigned int?

It just seemed unnecessary to allow 64 bits of size for these things - we should never have greater than 4 billion characters on a row, or 4 billion rows in a buffer, right? So shrinking this means a little less data needs to flow around, and leaves more space for future state.

Originally I had hoped it'd improve data packing, but it didn't end up working out. I just left it in the change since it still seemed like a good thing for 'leaving space for the future'

zadjii-msft · 2020-12-03T21:23:46Z

Should we maybe make this two PRs - one to bring in boost, the other to do the optimizations?

DHowett · 2020-12-03T21:25:02Z

Since this is the 5th time we've had cause to bring in boost, and since other projects inside the Windows repository have transcluded boost, I think I'm comfortable bringing it in. Doing that in a separate PR would make this a lot easier to review, for sure. 😄

github-actions · 2020-12-03T21:29:14Z

New misspellings found, please review:

boostorg

To accept these changes, run the following commands

perl -e '
my @expect_files=qw('".github/actions/spell-check/expect/5757ec679b03a4240130c3c53766c91bbc5cd6a7.txt
.github/actions/spell-check/expect/655f007265b351e140d20b3976792523ad689241.txt
.github/actions/spell-check/expect/alphabet.txt
.github/actions/spell-check/expect/expect.txt
.github/actions/spell-check/expect/web.txt"');
@ARGV=@expect_files;
my @stale=qw('"AAAAA Bopomofo CParams CSV GENERATEPROJECTPRIFILE hhhh IHosted Inlines MAKEINTRESOURCEA mousemode renamer
 Reserialize rgus SGRXY tcon UDK UDKs Unfocus xe xlang "');
my $re=join "|", @stale;
my $suffix=".".time();
my $previous="";
sub maybe_unlink { unlink($_[0]) if $_[0]; }
while (<>) {
  if ($ARGV ne $old_argv) { maybe_unlink($previous); $previous="$ARGV$suffix"; rename($ARGV, $previous); open(ARGV_OUT, ">$ARGV"); select(ARGV_OUT); $old_argv = $ARGV; }
  next if /^($re)(?:$| .*)/; print;
}; maybe_unlink($previous);'
perl -e '
my $new_expect_file=".github/actions/spell-check/expect/7120a9d5365341bcba453719f9ecc40bcaf3bde0.txt";
open FILE, q{<}, $new_expect_file; chomp(my @words = <FILE>); close FILE;
my @add=qw('"boostorg csv inlines Renamer reserialize udk unfocus "');
my %items; @items{@words} = @words x (1); @items{@add} = @add x (1);
@words = sort {lc($a) cmp lc($b)} keys %items;
open FILE, q{>}, $new_expect_file; for my $word (@words) { print FILE "$word\n" if $word =~ /\w/; };
close FILE;'
git add .github/actions/spell-check/expect || echo '... you want to ensure .github/actions/spell-check/expect/7120a9d5365341bcba453719f9ecc40bcaf3bde0.txt is added to your repository...'

✏️ Contributor please read this

By default the command suggestion will generate a file named based on your commit. That's generally ok as long as you add the file to your commit. Someone can reorganize it later.

⚠️ The command is written for posix shells. You can copy the contents of each perl command excluding the outer ' marks and dropping any '"/"' quotation mark pairs into a file and then run perl file.pl from the root of the repository to run the code. Alternatively, you can manually insert the items...

If the listed items are:

... misspelled, then please correct them instead of using the command.
... names, please add them to .github/actions/spell-check/dictionary/names.txt.
... APIs, you can add them to a file in .github/actions/spell-check/dictionary/.
... just things you're using, please add them to an appropriate file in .github/actions/spell-check/expect/.
... tokens you only need in one place and shouldn't generally be used, you can add an item in an appropriate file in .github/actions/spell-check/patterns/.

See the README.md in each directory for more information.

🔬 You can test your commits without appending to a PR by creating a new branch with that extra change and pushing it to your fork. The :check-spelling action will run in response to your push -- it doesn't require an open pull request. By using such a branch, you can limit the number of typos your peers see you make. 😉

⚠️ Reviewers

At present, the action that triggered this message will not show its ❌ in this PR unless the branch is within this repository.
Thus, you should make sure that this comment has been addressed before encouraging the merge bot to merge this PR.

Austin-Lamb · 2020-12-03T21:34:47Z

Should we maybe make this two PRs - one to bring in boost, the other to do the optimizations?

I think that'd be a great way to do it. This PR looks like it's enormous (see @miniksa's comment about 400-file PR), but the vast majority of it is just the subset of Boost being used. Should I do this split? @DHowett any preference?

miniksa · 2020-12-03T21:36:13Z

Why did so many size_ts change to unsigned int?

It just seemed unnecessary to allow 64 bits of size for these things - we should never have greater than 4 billion characters on a row, or 4 billion rows in a buffer, right? So shrinking this means a little less data needs to flow around, and leaves more space for future state.

Originally I had hoped it'd improve data packing, but it didn't end up working out. I just left it in the change since it still seemed like a good thing for 'leaving space for the future'

Oh it's probably quite unnecessary to be that big... I think it was chosen because it was at a time before we had gsl::narrow<T> and so it was size_t to avoid all the casts against the .size() from STL. Also I might have thought on a 64-bit processor, it's gonna be in a 64-bit register anyway or aligned at 64-bits. And also probably I was going "well it used to be SHORT and that became a problem... let's make sure that we never have that problem ever again."

The only place I feel like we'd cross 4 billion rows is if we ever had "infinite scrollback" feature light up. You're right on >4billion columns probably being ridiculous.

Should we maybe make this two PRs - one to bring in boost, the other to do the optimizations?

I think that'd be a great way to do it. This PR looks like it's enormous (see @miniksa's comment about 400-file PR), but the vast majority of it is just the subset of Boost being used. Should I do this split? @DHowett any preference?

I'd prefer if you did "just bring in boost and use it" in 1 PR and then the other things you found in a 2nd PR.

skyline75489 · 2020-12-03T23:29:37Z

Well with boost do we still need a separate dynamic_bitset library? I suppose we should just use boost::dynamic_bitset?

And maybe other utils & libraries used could also be replaced by boost version.

WSLUser · 2020-12-04T14:42:33Z

Well with boost do we still need a separate dynamic_bitset library? I suppose we should just use boost::dynamic_bitset

I remember this discussion when I brought up using Boost before. We're only pulling in small_vector.hpp and it's dep friends. That is large enough on it's own. Considering how much smaller the current dynamic_bitset library is and it's still obvious usefulness, we should leave that be. Boost is pretty big and has been proven to not always be the best at optimizations. It just was one of the first projects to actually attempt to bring in optimizations like we're seeing here. I'm in favor of just accepting this header library for now and if we can show other places in the code that would benefit more from a particular Boost library, then we bring it in.

In my #8489 we want to use boost's small_vector type, but that PR is kinda messy by adding boost and also making a meaningful change. So here I'm splitting out the boost addition to its own PR so that one can be more focused on the allocation improvement and consumption of boost.

…ned int changes, in most places it's not buying anything. Leaving it in TextAttributeRun since it actualyl does save space there on 64-bit.

src/buffer/out/AttrRow.hpp

src/buffer/out/TextAttributeRun.hpp

src/buffer/out/textBuffer.cpp

src/common.build.pre.props

vadimkantorov · 2020-12-11T22:08:26Z

I agree, 6Kb is not much. It's just I experienced very bad Terminal behavior in low-memory conditions (i.e. Chrome ate all memory of 8Gb old-ish laptop). I think it's important that basic tools like Terminal continue to be operational (by maybe blocking Windows to unload its pages from memory?) even in these conditions (+ terminal is likely to be used for debugging these problems). cmd.exe was way better in that respect. I think it's because of working set size and its dependency on dozens of big DLLs for modern UI components and such. My investigation is in #6409

DHowett · 2020-12-11T22:07:17Z

src/buffer/out/CharRowCell.hpp

+    wchar_t _wch{ UNICODE_SPACE };
+    DbcsAttribute _attr{};


these two became public -- intended?

This is leftover from trying to do more POD types - I'll make these private again, thanks for catching.

src/buffer/out/AttrRow.hpp

src/buffer/out/AttrRow.cpp

DHowett · 2020-12-11T22:10:36Z

@vadimkantorov fortunately, the changes here impact both conhost and terminal.

conhost is much easier to launch because it's likely already paged in. There's almost always at least one already running at all times.

vadimkantorov · 2020-12-11T22:23:52Z

conhost is much easier to launch because it's likely already paged in

yep, probably... but it would be super-nice to have working set size, number of libraries, waiting for paging in etc collected as telemetry or simulated in the test lab.

DHowett · 2020-12-11T22:25:50Z

@vadimkantorov when built in Windows, conhost is automatically tested for COW usage, working set, disk I/O, loaded libraries, and power consumption. This might not match up with the community's impression of Windows as a whole, but we have a whole suite of automated tools that try to keep us from regressing performance on these metrics.

vadimkantorov · 2020-12-11T22:46:41Z

That's good news and matches what I had imagined! Unfortunately, my questions about whether this info is available in did not bring a good discussion in #6409 (maybe it would have been better for me to comment there to not take over this PR discussion). Hopefully, in future this perf measurements would become public, then the users can also observe progress in this space!

Austin-Lamb · 2020-12-12T01:40:06Z

@DHowett - I'm not sure why the checks/pipeline timed out, is it possible to schedule a re-run?

zadjii-msft

So, I'm cool with this, but I think that I'd feel more comfortable with @miniksa and @DHowett being the two signoffs (being the perf and C++ experts, respectively).

miniksa

I didn't see a follow on that we've confirmed Resize Traditional is still working. But if that passes manual test, I'm good to go here.

DHowett

Blocking mainly over the line endings. 😄 Thanks again! I'm really excited for this.

src/buffer/out/AttrRowIterator.cpp

src/buffer/out/AttrRowIterator.hpp

src/common.build.pre.props

… this PR.

DHowett · 2020-12-16T18:40:16Z

@Austin-Lamb Thanks again for doing this. It's excellent, and I'm hoping that we can keep the perf train running. 😄

DHowett · 2020-12-16T18:41:26Z

FWIW: I checked out your branch in wsl and used unix2dos to convert the line endings.

ghost · 2021-01-28T18:04:01Z

🎉Windows Terminal Preview v1.6.10272.0 has been released which incorporates this pull request.:tada:

Handy links:

…icrosoft#8489) I was looking at conhost/OpenConsole and noticed it was being pretty inefficient with allocations due to some usages of std::deque and std::vector that didn't need to be done quite that way. So this uses std::vector for the TextBuffer's storage of ROW objects, which allows one allocation to contiguously reserve space for all the ROWs - on Desktop this is 9001 ROW objects which means it saves 9000 allocations that the std::deque would have done. Plus it has the benefit of increasing locality of the ROW objects since deque is going to chase pointers more often with its data structure. Then, within each ROW there are CharRow and ATTR_ROW objects that use std::vector today. This changes them to use Boost's small_vector, which is a variation of vector that allows for the so-called "small string optimization." Since we know the typical size of these vectors, we can pre-reserve the right number of elements directly in the CharRow/ATTR_ROW instances, avoiding any heap allocations at all for constructing these objects. There are a ton of variations on this "small_vector" concept out there in the world - this one in Boost, LLVM has one called SmallVector, Electronic Arts' STL has a small_vector, Facebook's folly library has one...there are a silly number of these out there. But Boost seems like it's by far the easiest to consume in terms of integration into this repo, the CI/CD pipeline, licensing, and stuff like that, so I went with the boost version. In terms of numbers, I measured the startup path of OpenConsole.exe on my dev box for Release x64 configuration. My box is an i7-6700k @ 4 Ghz, with 32 GB RAM, not that I think machine config matters much here: | | Allocation count | Allocated bytes | CPU usage (ms) | | ------ | ------------------- | ------------------ | -------------- | | Before | 29,461 | 4,984,640 | 103 | | After | 2,459 (-91%) | 4,853,931 (-2.6%) | 96 (-7%) | Along the way, I also fixed a dynamic initializer I happened to spot in the registry code, and updated some docs. ## Validation Steps Performed - Ran "runut", "runft" and "runuia" locally and confirmed results are the same as the main branch - Profiled the before/after numbers in the Visual Studio profiler, for the numbers shown in the table Co-authored-by: Austin Lamb <austinl@microsoft.com>

Austin Lamb added 8 commits December 1, 2020 00:32

Small optimizations on CharRowCell and DbcsAttribute, setting up for …

baba63f

…the bigger vector change. Also fixing a silly dynamic initializer issue in RegistrySerialization while I was here.

TextBuffer using a vector - reduces lots of allocations (leaving one …

759f94f

…TODO for now). Optimizing sizes of TextAttributeRun, ROW, and ATTR_ROW more with tighter packing.

A little more tidying of ROW in prep for the big change.

9f8b690

minor stuff

e6d19b7

Using boost::container::small_vector to massively reduce allocations.

804b78a

DebugFull sure is nice

42413fe

Get tests passing.

143b425

Adding docs on how to debug a test since I couldn't find any docs on …

7120a9d

…that.

miniksa reviewed Dec 3, 2020

View reviewed changes

src/buffer/out/TextAttributeRun.hpp Show resolved Hide resolved

miniksa reviewed Dec 3, 2020

View reviewed changes

src/buffer/out/CharRowCell.hpp Show resolved Hide resolved

microsoft deleted a comment Dec 3, 2020

Austin-Lamb mentioned this pull request Dec 4, 2020

Add a subset of boost #8492

Merged

Austin Lamb added 4 commits December 6, 2020 06:38

Merge branch 'main' into user/austinl/optimizeAllocations

e17eb5c

Addressing some PR comments by starting to revert the size_t -> unsig…

624e1b6

…ned int changes, in most places it's not buying anything. Leaving it in TextAttributeRun since it actualyl does save space there on 64-bit.

More size_t -> unsigned int changes reverted.

db89e86

Missed one

4e8c2e7

DHowett reviewed Dec 6, 2020

View reviewed changes

Updating sources file per PR feedback

54a5d11

DHowett reviewed Dec 11, 2020

View reviewed changes

Fixing AuditMode since it failed in the PR checks.

c505fbb

vadimkantorov mentioned this pull request Dec 11, 2020

Report to users publicly results of: Test Terminal in low-memory; slow hard disk conditions, measure/reduce working set, minimize number of used shared libraries; binary size #6409

Open

Austin-Lamb added 3 commits December 11, 2020 15:07

PR feedback

52b177f

Merge branch 'main' into user/austinl/optimizeAllocations

f4bc049

Failure to run 'runformat' strikes again

292bc65

Austin-Lamb marked this pull request as ready for review December 11, 2020 23:37

zadjii-msft reviewed Dec 15, 2020

View reviewed changes

zadjii-msft assigned DHowett and miniksa Dec 15, 2020

zadjii-msft added Area-Performance Performance-related issue Product-Conhost For issues in the Console codebase labels Dec 15, 2020

miniksa approved these changes Dec 15, 2020

View reviewed changes

DHowett requested changes Dec 15, 2020

View reviewed changes

src/buffer/out/AttrRowIterator.cpp Outdated Show resolved Hide resolved

src/buffer/out/AttrRowIterator.hpp Outdated Show resolved Hide resolved

src/common.build.pre.props Outdated Show resolved Hide resolved

ghost added Needs-Author-Feedback The original author of the issue/PR needs to come back and respond to something and removed Needs-Author-Feedback The original author of the issue/PR needs to come back and respond to something labels Dec 15, 2020

Austin-Lamb and others added 2 commits December 16, 2020 09:56

Reverting the GenerateDebugInformation change, it's not important for…

716f6a0

… this PR.

Renormalize line endings

eabe92c

DHowett approved these changes Dec 16, 2020

View reviewed changes

DHowett merged commit 539a5dc into microsoft:main Dec 16, 2020

Conversation

Austin-Lamb commented Dec 3, 2020 • edited by DHowett Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Validation Steps Performed

Uh oh!

miniksa commented Dec 3, 2020

Uh oh!

Uh oh!

Uh oh!

Austin-Lamb commented Dec 3, 2020

Uh oh!

zadjii-msft commented Dec 3, 2020

Uh oh!

DHowett commented Dec 3, 2020

Uh oh!

github-actions bot commented Dec 3, 2020

New misspellings found, please review:

⚠️ Reviewers

Uh oh!

Austin-Lamb commented Dec 3, 2020

Uh oh!

miniksa commented Dec 3, 2020

Uh oh!

skyline75489 commented Dec 3, 2020

Uh oh!

WSLUser commented Dec 4, 2020

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

vadimkantorov commented Dec 11, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

DHowett Dec 11, 2020

Choose a reason for hiding this comment

Uh oh!

Austin-Lamb Dec 11, 2020

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

DHowett commented Dec 11, 2020

Uh oh!

vadimkantorov commented Dec 11, 2020

Uh oh!

DHowett commented Dec 11, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

vadimkantorov commented Dec 11, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Austin-Lamb commented Dec 12, 2020

Uh oh!

zadjii-msft left a comment

Choose a reason for hiding this comment

Uh oh!

miniksa left a comment

Choose a reason for hiding this comment

Uh oh!

DHowett left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

DHowett commented Dec 16, 2020

Uh oh!

DHowett commented Dec 16, 2020

Uh oh!

ghost commented Jan 28, 2021

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

Austin-Lamb commented Dec 3, 2020 •

edited by DHowett

Loading

vadimkantorov commented Dec 11, 2020 •

edited

Loading

DHowett commented Dec 11, 2020 •

edited

Loading

vadimkantorov commented Dec 11, 2020 •

edited

Loading