
Store sqlite writes to an in-memory DB first. Flush that db to disk at periodic intervals. #36970

Merged
109 commits merged into dotnet:release/dev16.7-preview1 from CyrusNajmabadi:sqliteInMem
Apr 4, 2020

Conversation

@CyrusNajmabadi
Contributor

@CyrusNajmabadi CyrusNajmabadi commented Jul 3, 2019

From a conversation on gitter.im, @ericsink helped me with the following idea: instead of caching sqlite writes in .NET memory, just write them to an in-memory sqlite DB, and then periodically flush that in-memory DB to disk.

This is similar to our existing approach, but makes things much simpler in terms of implementation.

Backstory:
When we moved to sqlite, an early issue I ran into was that sqlite has trouble with tons of tiny transactions. Each transaction carries some fixed cost, so if you're hammering the DB with them (which we are), you can bottleneck and cause a lot of backup as things try to get through. All our features are insulated from the actual underlying DB and only get a key/value read/write store, so they often just keep trying to dump lots of data into the DB, exacerbating the problem.
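
The fixed per-transaction cost can be seen directly with SQLite from any host language; here is a rough Python sketch (not the C# in this PR) comparing one commit per tiny write against the same writes batched under a single transaction:

```python
import os
import sqlite3
import tempfile
import time

# Disk-backed DB so each commit pays real journal/fsync cost, as described
# above. All names here are illustrative, not from the PR.
db_path = os.path.join(tempfile.mkdtemp(), "tiny_txn_demo.db")
conn = sqlite3.connect(db_path)
conn.execute("CREATE TABLE kv (key INTEGER PRIMARY KEY, value TEXT)")

rows = [(i, "value") for i in range(500)]

start = time.perf_counter()
for key, value in rows:
    with conn:  # one commit (and its fixed cost) per row
        conn.execute("INSERT INTO kv VALUES (?, ?)", (key, value))
tiny_txns = time.perf_counter() - start

conn.execute("DELETE FROM kv")
conn.commit()

start = time.perf_counter()
with conn:  # a single commit for the whole batch
    conn.executemany("INSERT INTO kv VALUES (?, ?)", rows)
one_txn = time.perf_counter() - start

print(f"500 tiny transactions: {tiny_txns:.3f}s, one batched: {one_txn:.3f}s")
```

On a disk-backed database the batched version is dramatically faster, which is the same effect the twice-a-second flush below exploits.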

To alleviate this problem, I introduced an in-memory write cache where we staged incoming writes and then wrote them out to disk twice a second, under a single transaction. This improved perf roughly 50x, but came with a complexity cost. Specifically, we needed to maintain the in-memory cache, and we had to make sure that 'reads' of data in the write cache would see the latest 'writes' (even if they hadn't made it to disk yet).
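
A minimal Python sketch of that old scheme (class and method names invented for illustration; the real code is C#) shows where the complexity came from: writes staged in ordinary process memory force every read to check the cache first so un-flushed writes stay visible:

```python
import sqlite3

class WriteCachingStore:
    """Hypothetical sketch of the *old* approach this PR replaces."""

    def __init__(self, conn):
        self._conn = conn
        self._pending = {}  # key -> value staged in process memory

    def write(self, key, value):
        self._pending[key] = value  # flushed later, e.g. twice a second

    def read(self, key):
        # Reads must observe the latest writes even before they hit disk.
        if key in self._pending:
            return self._pending[key]
        row = self._conn.execute(
            "SELECT value FROM kv WHERE key = ? LIMIT 1", (key,)).fetchone()
        return row[0] if row else None

    def flush(self):
        # All staged writes go to disk under a single transaction.
        with self._conn:
            self._conn.executemany(
                "INSERT OR REPLACE INTO kv VALUES (?, ?)",
                list(self._pending.items()))
        self._pending.clear()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kv (key TEXT PRIMARY KEY, value TEXT)")
store = WriteCachingStore(conn)
store.write("a", "1")
assert store.read("a") == "1"  # visible before any flush
store.flush()
```

The new approach removes this two-tier bookkeeping entirely by making the cache itself a sqlite DB.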

This new approach is much simpler:

  1. sqlite has the concept of sharable, in-memory DBs: a DB that can contain data, is held only in memory, and can be used from many sql connections. This is a prime location for all our writes to buffer into.
  2. It's extremely simple and efficient to transfer data from the in-memory sqlite tables to the on-disk tables (and it can be done in a single transaction). This avoids the need to keep the data in .NET memory at all.
  3. This will allow us in the future to move to more efficient Span/ReadOnlySpan variants. We used to have to be async in how we cached writes and flushed them across threads. Now we always just write things through to the write cache synchronously, and we 'read through' the write layer and the actual-disk layer synchronously as well. This lets us actually use spans, since we no longer have any unintentional display classes that would capture them.
  4. Because the in-memory DB and the disk DB have the same format, reading/writing is extremely simple. There's no need for one set of mechanisms to read/write the memory cache and another for the disk DB. The same queries/sql/etc. work uniformly over both.
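
Points 1 and 2 can be sketched in a few lines of Python (the actual PR is C#; table and URI names here are invented for illustration). A named, shared in-memory SQLite DB buffers writes from any connection, and a periodic flush ATTACHes it to the on-disk DB and copies everything across in one transaction:

```python
import os
import sqlite3
import tempfile

# A named in-memory DB with cache=shared is visible to every connection
# in the process that opens the same URI.
MEM_URI = "file:writecache?mode=memory&cache=shared"

# This connection keeps the shared in-memory DB alive for the process.
keeper = sqlite3.connect(MEM_URI, uri=True)
keeper.execute("CREATE TABLE kv (key TEXT PRIMARY KEY, value TEXT)")

# Any other connection opened with the same URI sees the same data.
writer = sqlite3.connect(MEM_URI, uri=True)
with writer:
    writer.executemany("INSERT INTO kv VALUES (?, ?)",
                       [("a", "1"), ("b", "2")])

# Periodic flush: the on-disk connection attaches the in-memory DB, then
# moves everything across and clears the cache under one transaction.
db_path = os.path.join(tempfile.mkdtemp(), "storage.db")
disk = sqlite3.connect(f"file:{db_path}", uri=True)
disk.execute("CREATE TABLE IF NOT EXISTS kv (key TEXT PRIMARY KEY, value TEXT)")
disk.execute(f"ATTACH DATABASE '{MEM_URI}' AS writecache")
with disk:
    disk.execute("INSERT OR REPLACE INTO main.kv SELECT * FROM writecache.kv")
    disk.execute("DELETE FROM writecache.kv")
```

Because both sides are plain SQLite tables with the same schema, the copy is a single `INSERT ... SELECT`, which is point 4 above in miniature.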

@CyrusNajmabadi CyrusNajmabadi requested a review from a team as a code owner July 3, 2019 22:00
@vatsalyaagrawal vatsalyaagrawal added the Area-IDE and Community (the pull request was submitted by a contributor who is not a Microsoft employee) labels Jul 3, 2019
@@ -1,245 +1,80 @@
// Copyright (c) Microsoft. All Rights Reserved. Licensed under the Apache License, Version 2.0. See License.txt in the project root for license information.

using System;
Contributor Author

this file's diff looks awful; the 'old' part is irrelevant. I recommend just looking at the 'new' side.

@CyrusNajmabadi
Contributor Author

CyrusNajmabadi commented Jul 3, 2019

tagging @heejaechang @ericsink @sharwell @jasonmalinowski .

@sharwell it would be especially great if we could test the perf out here.

I recommend disabling whitespace diffs here.

if (fetchStringTable)
{
if (!TryFetchStringTable(connection))
if (!TryFetchStringTableFromMainDB(connection))
Contributor Author

note: all the string table work is done against the main DB. This keeps a single consistent place for the string information. It's also not a place we need to optimize, as the string writing is already done in bulk in a single transaction.

/// so that all reads for the queue observe any previously completed writes.
/// </summary>
private readonly Dictionary<TWriteQueueKey, Task> _writeQueueKeyToWriteTask =
new Dictionary<TWriteQueueKey, Task>();
Contributor Author

@CyrusNajmabadi CyrusNajmabadi Jul 3, 2019

no more complex queues of keys-to-writes to maintain. Yaay!

var (dataBytes, dataLength, dataPooled) = GetBytes(stream);

await AddWriteTaskAsync(key, con =>
using (var pooledConnection = Storage.GetPooledConnection())
Contributor Author

note: we are no longer asynchronous, so we could consider just extending the scope of the higher-up pooled-connection `using` to encompass this.


// Try to bulk populate all the IDs we'll need for strings/projects/documents.
// Bulk population is much faster than trying to do everything individually.
BulkPopulateIds(connection, solution, fetchStringTable);
Contributor Author

these lines are the same as before. All that happened was I extracted out the table-initialization code, since we want to initialize both the main tables and the tables in the write cache.


// Try to bulk populate all the IDs we'll need for strings/projects/documents.
// Bulk population is much faster than trying to do everything individually.
BulkPopulateIds(connection, solution, fetchStringTable);
Contributor Author

@CyrusNajmabadi CyrusNajmabadi Jul 3, 2019

note: the logic from `var fetched` to here is the same as before; the lines above it were just extracted out into EnsureTables.

}

private const int NumThreads = 10;
private const int NumThreads = 1;
Contributor Author

local testing only; this will not be merged in.

@CyrusNajmabadi CyrusNajmabadi requested a review from sharwell March 31, 2020 18:23
@CyrusNajmabadi
Contributor Author

@sharwell have made the reg-key change.

@sharwell
Contributor

@CyrusNajmabadi I'll review after the next master→dev16.7-preview1 merge completes

Member

@jasonmalinowski jasonmalinowski left a comment

So I was doing fine until I hit the SQLitePersistentStorage_Threading.cs file, and then got terrified. It seems like we're doing a locking mechanism atop the transaction system in SQLite, and that invalidated any understanding I had of this code. (Regrettably, it was the last file I ended up looking at...)

How do we know what needs to be surrounded by PerformReadAsync and PerformWriteAsync? For example, why is the flushing not surrounded by a PerformWrite? Otherwise I could imagine it being involved in a deadlock, since it's going to be acquiring multiple read and write locks.

It's been a while since I've played with SQL transactions, but usually either the deadlock victim retries or you have additional tools to specify potential write locks that would be acquired, to avoid the cycle. Are those not usable here, or is that not how SQLite chooses to work?

protected override string GetDatabaseFilePath(string workingFolderPath)
{
Contract.ThrowIfTrue(string.IsNullOrWhiteSpace(workingFolderPath));
return Path.Combine(workingFolderPath, StorageExtension, nameof(v2), PersistentStorageFileName);
Member

Does anything clean up the old databases? Or people potentially will have the old DB on their disk forever?

Contributor Author

No, and that's an intentional part of the design: they're designed to keep working even if you have side-by-side versions of VS (even with the same solution open in all of them).

Member

Would you imagine that if we delete v1, we would then clean it up?

Contributor Author

I mean... it's a bit tough. We could delete v1 and ship only v2 in a later release of VS, but maybe you have the different releases side-by-side.

Maybe we could check if something hadn't been used in a few months... but I'm wary.

--

Note: with Roslyn itself, our full cache size is 60MB. So while I'm not a huge fan of leaving these behind, I also don't view it as a substantial problem that needs pressing resolution.

Comment on lines +145 to +147
_select_star_from_string_table = $@"select * from {StringInfoTableName}";
_insert_into_string_table_values_0 = $@"insert into {StringInfoTableName}(""{DataColumnName}"") values (?)";
_select_star_from_string_table_where_0_limit_one = $@"select * from {StringInfoTableName} where (""{DataColumnName}"" = ?) limit 1";
Member

So these aren't also involved in the in-memory table?

Contributor Author

correct. The string table is dumped to (and read from) only the on-disk DB. Strings change rarely enough (and the bulk is computed right at the beginning) that there's no need for the write cache.

Contributor Author

(We also write them in bulk anyway.)

Comment on lines +59 to +61
// Haven't been shutdown. Actually go and move any outstanding data to the real DB.
using var _ = storage.GetPooledConnection(out var connection);
storage.FlushInMemoryDataToDisk_MustRunUnderLock(connection);
Member

We're doing the actual flushing under _flushGate? I would have imagined this was kept outside, as otherwise any write to the in-memory DB is still going to block behind this writing. Presumably the SQLite transaction semantics already make it safe (otherwise this has bigger problems). And since this runs after the nulling-out of the task, running it outside the gate means an extra flush job could be queued, but that job won't do anything, so it's moot.

Contributor Author

great find. I think this is actually a pure bug (though not likely to be hit). I think we likely should not have this flush gate, tbh; we already have a mechanism to ensure exclusive access through the threading mechanism.

@CyrusNajmabadi
Contributor Author

@jasonmalinowski thanks for the feedback. I think I've addressed it all now. Fortunately, it likely made things safer and simpler, so that's great :)

}

// Create a connection to the DB and ensure it has tables for the types we care about.
using var _ = GetPooledConnection(out var connection);
Contributor Author

this commit should be viewed with whitespace diff off.

Member

@jasonmalinowski jasonmalinowski left a comment

The .Wait()s that spawn work and then immediately block during startup/shutdown do worry me a little on the perf side: if the thread pool is slammed during startup/shutdown, that might cause perf issues. If we have a reader-writer lock that supports both sync and async acquisition, I can't find it, so I don't have a better idea. (And switching to sync acquisition would be even worse, since the mainline stuff wouldn't be async anymore.)

@ghost ghost left a comment

Auto-approval

@CyrusNajmabadi CyrusNajmabadi dismissed sharwell’s stale review April 3, 2020 22:54

everything has been addressed.

@ghost ghost merged commit 3b72659 into dotnet:release/dev16.7-preview1 Apr 4, 2020
@jinujoseph jinujoseph added this to the 16.7.P1 milestone Apr 5, 2020
@CyrusNajmabadi CyrusNajmabadi deleted the sqliteInMem branch April 8, 2020 06:08

Labels

Area-IDE, auto-merge, Concept-Design, Debt (Engineering Debt, Design Debt, or poor product code quality), Need Design Review (the end user experience design needs to be reviewed and approved)

Projects

Status: Complete

Development

Successfully merging this pull request may close these issues.

9 participants