-
-
Notifications
You must be signed in to change notification settings - Fork 4.9k
chore(sqlite): use normalised tables for file names and versions #10383
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This changes the files table to use normalisation for the names and
versions. The idea is that these are often common between all remote
devices, and repeating an integer is more efficient than repeating a
long string. A new benchmark bears this out; for a database with 100k
files shared between 31 devices, with some worst case assumption on
version vector size, the database is reduced in size by 50% and the test
finishes quicker:
Current:
db_bench_test.go:322: Total size: 6263.70 MiB
--- PASS: TestBenchmarkSizeManyFilesRemotes (1084.89s)
New:
db_bench_test.go:326: Total size: 3049.95 MiB
--- PASS: TestBenchmarkSizeManyFilesRemotes (776.97s)
The other benchmarks end up about the same within the margin of
variability, with one possible exception being that RemoteNeed seems to
be a little slower on average:
old files/s new files/s
Update/n=RemoteNeed/size=1000-8 5.051k 4.654k
Update/n=RemoteNeed/size=2000-8 5.201k 4.384k
Update/n=RemoteNeed/size=4000-8 4.943k 4.242k
Update/n=RemoteNeed/size=8000-8 5.099k 3.527k
Update/n=RemoteNeed/size=16000-8 3.686k 3.847k
Update/n=RemoteNeed/size=30000-8 4.456k 3.482k
I'm not sure why, possibly that query can be optimised anyhow.
Signed-off-by: Jakob Borg <jakob@kastelo.net>
43e43c9 to
700dbf9
Compare
Signed-off-by: Jakob Borg <jakob@kastelo.net>
86e53a1 to
3754eb6
Compare
Signed-off-by: Jakob Borg <jakob@kastelo.net>
3754eb6 to
29fb471
Compare
|
There's some room for improvement on storing the fileinfos as well but I'm leaving that for a separate pr |
|
(removed a few comments that didn't make much sense) |
|
| return nil, wrap(err) | ||
| } | ||
| defer func() { | ||
| _, _ = conn.ExecContext(ctx, "PRAGMA foreign_keys = ON") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should probably add:
PRAGMA foreign_key_checksee https://www.sqlite.org/lang_altertable.html#otheralter
That link also doesn't show the need to use PRAGMA legacy_alter_table = ON ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That page seems out of date, the legacy alter table is to avoid the drop of fileinfos when we drop the (old) files table, google that pragme for the lowdown
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Found the section in the docs which proves your point:
https://www.sqlite.org/lang_altertable.html
But we should still run PRAGMA foreign_key_check as we're meddling with FKs here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added
From the sqlite docs:
|
Are we talking about this one? syncthing/internal/db/sqlite/folderdb_global.go Lines 163 to 183 in 20428cf
I guess we're paying the joins twice here. Currently on my phone but maybe we could use a single query and push the differing bits down to an |
No, the integer primary key replaces the rowid. Combining it with "without rowid" means we lose auto increment-like semantics and need to set the primary key manually, which we don't. |
Maybe? It wasn't obvious to me, I think it's good enough and can be left as a future optimisation. |
Signed-off-by: Jakob Borg <jakob@kastelo.net>
fdc57aa to
980e3d4
Compare
Signed-off-by: Jakob Borg <jakob@kastelo.net>
I'll take a stab at an optimized version. But that's unrelated to this PR as you already pointed out 😉 |
Co-authored-by: Ross Smith II <ross@smithii.com> Signed-off-by: Jakob Borg <jakob@kastelo.net>
internal/db/sqlite/sql/migrations/folder/05-normalize-files.sql
Outdated
Show resolved
Hide resolved
Signed-off-by: Jakob Borg <jakob@kastelo.net>
rasa
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
internal/db/sqlite/sql/migrations/folder/05-normalize-files.sql
Outdated
Show resolved
Hide resolved
Co-authored-by: Ross Smith II <ross@smithii.com> Signed-off-by: Jakob Borg <jakob@kastelo.net>
---------------------------------------------------------------------------------------- amiberry.mk cbca08a4af7f656754d50a4934d6315483eaf4cc # Version: Commits on Sept 13, 2025 ---------------------------------------------------------------------------------------- bugfix: P96 mode of 1024x600 would throw error on startup (fixes #1708) This mode had a duplicate ID, ---------------------------------------------------------------------------------------- applewin.mk 541f2b172780c590c8039c7a6769fad3cd80a618 # Version: Commits on Sept 13, 2025 ---------------------------------------------------------------------------------------- Merge pull request #300 from audetto/zoltanvb-gitlab-ci Enable 64 and 32-bit builds for Linux and Windows, ----------------------------------------------------------------------------------- clk.mk b698850ce8413307e56157659156132c0a5a4554 # Version: Commits on Sept 13, 2025 ----------------------------------------------------------------------------------- Merge pull request #1556 from TomHarte/StaticConstexpr Further prefer `static constexpr`., ------------------------------------------------------------------------------------------- dolphin-emu.mk 695e06ca981dee93a2be517594b7c622bf18d41f # Version: Commits on Sept 13, 2025 ------------------------------------------------------------------------------------------- Merge pull request #13930 from Dentomologist/retroachievements_avoid_leaderboard_list_crash RetroAchievements: Avoid crash due to uninitialized memory read, ------------------------------------------------------------------------------------------- duckstation.mk ee9f32185eb034bcacc3d04be206e1501047defa # Version: Commits on Sept 13, 2025 ------------------------------------------------------------------------------------------- dep/rcheevos: Remove files deleted upstream, --------------------------------------------------------------------------------------- flycast.mk ac32a8035243951803d8a8e9f45ad2317a2da91f # Version: Commits on Sept 12, 2025 --------------------------------------------------------------------------------------- Fetch translations & Recreate libretro_core_options_intl.h, ------------------------------------------------------------------------------------- fsuae.mk f362278ccd4c60991caac3b4d240d4a3f751bea2 # Version: Commits on Sept 13, 2025 ------------------------------------------------------------------------------------- Also add joystick mappings for port 0 and mouse mappings for port 1, -------------------------------------------------------------------------------------- ikemen.mk 4515f67e9716e9f73d8f82069cecd425c870de44 # Version: Commits on Sept 13, 2025 -------------------------------------------------------------------------------------- style: fix code style issues with gofmt, ------------------------------------------------------------------------------------ play.mk 19b3995e6d09d9cbd4de9871ad94ea183ca53a6e # Version: Commits on Sept 10, 2025 ------------------------------------------------------------------------------------ Implement PSUBSB., -------------------------------------------------------------------------------------- ppsspp.mk c7aa2bcd01c3aa4e243a7f9054ca5d1cedc67c23 # Version: Commits on Sept 12, 2025 -------------------------------------------------------------------------------------- Merge pull request #20808 from hrydgard/atlas-in-common Move some image atlas generation logic to Common/Render, ------------------------------------------------------------------------------------- rpcs3.mk 6106e8f79f021764f9feca56ee5541bdef8128db # Version: Commits on Sept 12, 2025 ------------------------------------------------------------------------------------- rsx: fix 3D aspect ratio, ---------------------------------------------------------------- ruffle.mk nightly-2025-09-13 # Version: Commits on Sept 13, 2025 ---------------------------------------------------------------- ## What's Changed * avm2: Dont construct frame in nested 'goto' frame for SWF 9 by @nivkner in ruffle-rs/ruffle#21572 * chore: Update translations by @RuffleBuild in ruffle-rs/ruffle#21642 **Full Changelog**: ruffle-rs/ruffle@nightly-2025-09-12...nightly-2025-09-13, ------------------------------------------------------------------------------------ ymir.mk 1d83a4a3fd25fff68f39e5cecbd2fc2db5e52278 # Version: Commits on Sept 12, 2025 ------------------------------------------------------------------------------------ feat(debug): Visually display CD device connection in Filters view, ------------------------------------------------------------------------------------- box64.mk 2c6816be8f9f6b0ed3c87cd97f9fb468ac61e042 # Version: Commits on Sept 12, 2025 ------------------------------------------------------------------------------------- Update release.yml Another fix for TERMUX CI, -------------------------------------------------------------------------------------- ecwolf.mk 7e6bd186af0b5a48ca0104e1efd8f740246a7e98 # Version: Commits on Sept 13, 2025 -------------------------------------------------------------------------------------- Fixed reverse labels for keypad plus and minus (fixes #315)\n -------------------------------------------------------------------------------------------- jazz2-native.mk 51808c1c98ab2aa954ea75709c66137303c2d994 # Version: Commits on Sept 12, 2025 -------------------------------------------------------------------------------------------- Fixed build, ---------------------------------------------------- nblood.mk r14265 # Version: Commits on Sept 13, 2025 ---------------------------------------------------- - ----------------------------------------------------------------------------------------- openmohaa.mk c8c9ae2468657fc5d48064c45b0e76ffae53cb10 # Version: Commits on Sept 12, 2025 ----------------------------------------------------------------------------------------- Basic standalone Debian package, --------------------------------------------------------------------------------------- stalker.mk bd7f2ef7a29635feed304c834ef914efff2e072a # Version: Commits on Sept 12, 2025 --------------------------------------------------------------------------------------- cmake/XRay.Build.cmake: don't set CMAKE_ARCHIVE_OUTPUT_DIRECTORY We don't want static libraries to be in the final bin folder., ------------------------------------------------------- syncthing.mk v2.0.9 # Version: Commits on Sept 13, 2025 ------------------------------------------------------- ## Major changes in 2.0 - Database backend switched from LevelDB to SQLite. There is a migration on first launch which can be lengthy for larger setups. The new database is easier to understand and maintain and, hopefully, less buggy. - The logging format has changed to use structured log entries (a message plus several key-value pairs). Additionally, we can now control the log level per package, and a new log level WARNING has been inserted between INFO and ERROR (which was previously known as WARNING...). The INFO level has become more verbose, indicating the sync actions taken by Syncthing. A new command line flag `--log-level` sets the default log level for all packages, and the `STTRACE` environment variable and GUI has been updated to set log levels per package. The `--verbose` and `--logflags` command line options have been removed and will be ignored if given. - Deleted items are no longer kept forever in the database, instead they are forgotten after fifteen months. If your use case require deletes to take effect after more than a fifteen month delay, set the `--db-delete-retention-interval` command line option or corresponding environment variable to zero, or a longer time interval of your choosing. - Modernised command line options parsing. Old single-dash long options are no longer supported, e.g. `-home` must be given as `--home`. Some options have been renamed, others have become subcommands. All serve options are now also accepted as environment variables. See `syncthing --help` and `syncthing serve --help` for details. - Rolling hash detection of shifted data is no longer supported as this effectively never helped. Instead, scanning and syncing is faster and more efficient without it. - A \default folder\ is no longer created on first startup. - Multiple connections are now used by default between v2 devices. The new default value is to use three connections: one for index metadata and two for data exchange. - The following platforms unfortunately no longer get prebuilt binaries for download at syncthing.net and on GitHub, due to complexities related to cross compilation with SQLite: - dragonfly/amd64 - solaris/amd64 - linux/ppc64 - netbsd/* - openbsd/386 and openbsd/arm - windows/arm - The handling of conflict resolution involving deleted files has changed. A delete can now be the winning outcome of conflict resolution, resulting in the deleted file being moved to a conflict copy. This release is also available as: * APT repository: https://apt.syncthing.net/ * Docker image: `docker.io/syncthing/syncthing:2.0.9` or `ghcr.io/syncthing/syncthing:2.0.9` (`{docker,ghcr}.io/syncthing/syncthing:2` to follow just the major version) ## What's Changed ### Fixes * fix(sqlite): add _txlock=immediate to modernc implementation by @calmh in syncthing/syncthing#10384 * fix(api): limit size of allowed authentication request by @calmh in syncthing/syncthing#10386 ### Other * chore(ursrv): update regex patterns for Syncthing-Fork entries by @Catfriend1 in syncthing/syncthing#10380 * chore: clean up migrated database by @calmh in syncthing/syncthing#10381 * chore(sqlite): use normalised tables for file names and versions by @calmh in syncthing/syncthing#10383 * chore(model): slightly deflake TestRecvOnlyRevertOwnID by @calmh in syncthing/syncthing#10390 **Full Changelog**: syncthing/syncthing@v2.0.8...v2.0.9, ----------------------------------------------------------------------------------------- retroarch.mk d04919199061c45a23feb3c4f75b10ea2aef28cc # Version: Commits on Sept 13, 2025 ----------------------------------------------------------------------------------------- Fetch translations from Crowdin, -------------------------------------------------------------------------------------- gzdoom.mk 1fff657204491479fd01b41a4086098d5321c204 # Version: Commits on Sept 12, 2025 -------------------------------------------------------------------------------------- Fix Hexen transcription error It's actually \archimage\, with an i, not \archmage\. Cf. https://github.com/OpenSourcedGames/Hexen/blob/bc35ff0cb70db033942cc9e5abdffae59f2f139f/Hexen%20Source/MN_MENU.C#L1008, ------------------------------------------------------------------------------------ tr1x.mk 169c6c1f5a13421a75b03d76be87ff2fd2f0cfef # Version: Commits on Sept 12, 2025 ------------------------------------------------------------------------------------ console: add backdrop to TR2 Resolves #2150., ------------------------------------------------------------------------------------ tr2x.mk 169c6c1f5a13421a75b03d76be87ff2fd2f0cfef # Version: Commits on Sept 12, 2025 ------------------------------------------------------------------------------------ console: add backdrop to TR2 Resolves #2150., ----------------------------------------------------------------------------------------------- libretro-fceumm.mk 5cd4a43e16a7f3cd35628d481c347a0a98cfdfa2 # Version: Commits on Sept 13, 2025 ----------------------------------------------------------------------------------------------- Merge pull request #637 from NRS-NewRisingSun/asic_modularize New Mappers and modularization, ------------------------------------------------------------------------------------------------ libretro-flycast.mk ac32a8035243951803d8a8e9f45ad2317a2da91f # Version: Commits on Sept 12, 2025 ------------------------------------------------------------------------------------------------ Fetch translations & Recreate libretro_core_options_intl.h, --------------------------------------------------------------------------------------------- libretro-pc98.mk 02b08deb3833305251fb3ee6c5d59b0efb5b52ff # Version: Commits on Sept 13, 2025 --------------------------------------------------------------------------------------------- fix build error, ----------------------------------------------------------------------------------------------- libretro-ppsspp.mk c7aa2bcd01c3aa4e243a7f9054ca5d1cedc67c23 # Version: Commits on Sept 12, 2025 ----------------------------------------------------------------------------------------------- Merge pull request #20808 from hrydgard/atlas-in-common Move some image atlas generation logic to Common/Render, ---------------------------------------------------------------------------------------------- libretro-tic80.mk bae9661882c4086073967207e7f1dca59aed34b7 # Version: Commits on Sept 13, 2025 ---------------------------------------------------------------------------------------------- [Linux] Associates .tic and .png to TIC-80 (#2835), -------------------------------------------------------------------------------------------- glsl-shaders.mk 503b81c81ce2b10eab5b3cf34f33f91bee85c713 # Version: Commits on Sept 12, 2025 -------------------------------------------------------------------------------------------- crt-pi rectify some errors (#530)
This changes the files table to use normalisation for the names and versions. The idea is that these are often common between all remote devices, and repeating an integer is more efficient than repeating a long string. A new benchmark bears this out; for a database with 100k files shared between 31 devices, with some worst case assumption on version vector size, the database is reduced in size by 50% and the test finishes quicker:
The other benchmarks end up about the same within the margin of variability, with one possible exception being that RemoteNeed seems to be a little slower on average:
I'm not sure why, possibly that query can be optimised anyhow.