Skip to content

Conversation

@calmh
Copy link
Member

@calmh calmh commented Sep 10, 2025

This changes the files table to use normalisation for the names and versions. The idea is that these are often common between all remote devices, and repeating an integer is more efficient than repeating a long string. A new benchmark bears this out; for a database with 100k files shared between 31 devices, with some worst case assumption on version vector size, the database is reduced in size by 50% and the test finishes quicker:

Current:
    db_bench_test.go:322: Total size: 6263.70 MiB
--- PASS: TestBenchmarkSizeManyFilesRemotes (1084.89s)

New:
    db_bench_test.go:326: Total size: 3049.95 MiB
--- PASS: TestBenchmarkSizeManyFilesRemotes (776.97s)

The other benchmarks end up about the same within the margin of variability, with one possible exception being that RemoteNeed seems to be a little slower on average:

                                      old files/s   new files/s
Update/n=RemoteNeed/size=1000-8            5.051k        4.654k
Update/n=RemoteNeed/size=2000-8            5.201k        4.384k
Update/n=RemoteNeed/size=4000-8            4.943k        4.242k
Update/n=RemoteNeed/size=8000-8            5.099k        3.527k
Update/n=RemoteNeed/size=16000-8           3.686k        3.847k
Update/n=RemoteNeed/size=30000-8           4.456k        3.482k

I'm not sure why, possibly that query can be optimised anyhow.

@github-actions github-actions bot added the chore label Sep 10, 2025
This changes the files table to use normalisation for the names and
versions. The idea is that these are often common between all remote
devices, and repeating an integer is more efficient than repeating a
long string. A new benchmark bears this out; for a database with 100k
files shared between 31 devices, with some worst case assumption on
version vector size, the database is reduced in size by 50% and the test
finishes quicker:

    Current:
        db_bench_test.go:322: Total size: 6263.70 MiB
    --- PASS: TestBenchmarkSizeManyFilesRemotes (1084.89s)

    New:
        db_bench_test.go:326: Total size: 3049.95 MiB
    --- PASS: TestBenchmarkSizeManyFilesRemotes (776.97s)

The other benchmarks end up about the same within the margin of
variability, with one possible exception being that RemoteNeed seems to
be a little slower on average:

                                          old files/s   new files/s
    Update/n=RemoteNeed/size=1000-8            5.051k        4.654k
    Update/n=RemoteNeed/size=2000-8            5.201k        4.384k
    Update/n=RemoteNeed/size=4000-8            4.943k        4.242k
    Update/n=RemoteNeed/size=8000-8            5.099k        3.527k
    Update/n=RemoteNeed/size=16000-8           3.686k        3.847k
    Update/n=RemoteNeed/size=30000-8           4.456k        3.482k

I'm not sure why, possibly that query can be optimised anyhow.

Signed-off-by: Jakob Borg <jakob@kastelo.net>
Signed-off-by: Jakob Borg <jakob@kastelo.net>
@calmh calmh marked this pull request as ready for review September 10, 2025 10:28
@calmh calmh marked this pull request as draft September 10, 2025 10:47
Signed-off-by: Jakob Borg <jakob@kastelo.net>
@calmh calmh marked this pull request as ready for review September 10, 2025 11:48
Signed-off-by: Jakob Borg <jakob@kastelo.net>
@calmh
Copy link
Member Author

calmh commented Sep 10, 2025

There's some room for improvement on storing the fileinfos as well but I'm leaving that for a separate pr

@bt90
Copy link
Contributor

bt90 commented Sep 10, 2025

(removed a few comments that didn't make much sense)

@rasa
Copy link
Member

rasa commented Sep 10, 2025

(removed a few comments that didn't make much sense)

I think file_names and file_versions would benefit from having WITHOUT ROWID. No longer true.

return nil, wrap(err)
}
defer func() {
_, _ = conn.ExecContext(ctx, "PRAGMA foreign_keys = ON")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably add:

PRAGMA foreign_key_check

see https://www.sqlite.org/lang_altertable.html#otheralter

That link also doesn't show the need to use PRAGMA legacy_alter_table = ON ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That page seems out of date, the legacy alter table is to avoid the drop of fileinfos when we drop the (old) files table, google that pragme for the lowdown

Copy link
Contributor

@bt90 bt90 Sep 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Found the section in the docs which proves your point:

https://www.sqlite.org/lang_altertable.html

But we should still run PRAGMA foreign_key_check as we're meddling with FKs here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added

@bt90
Copy link
Contributor

bt90 commented Sep 10, 2025

(removed a few comments that didn't make much sense)

I think file_names and file_versions would benefit from having WITHOUT ROWID.

From the sqlite docs:

The WITHOUT ROWID optimization is likely to be helpful for tables that have non-integer or composite (multi-column) PRIMARY KEYs and that do not store large strings or BLOBs

WITHOUT ROWID tables will work correctly (that is to say, they provide the correct answer) for tables with a single INTEGER PRIMARY KEY. However, ordinary rowid tables will run faster in that case. Hence, it is good design to avoid creating WITHOUT ROWID tables with single-column PRIMARY KEYs of type INTEGER.

@bt90
Copy link
Contributor

bt90 commented Sep 10, 2025

possibly that query can be optimised anyhow.

Are we talking about this one?

SELECT fi.fiprotobuf, bl.blprotobuf, n.name, g.size, g.modified FROM fileinfos fi
INNER JOIN files g on fi.sequence = g.sequence
LEFT JOIN blocklists bl ON bl.blocklist_hash = g.blocklist_hash
INNER JOIN file_names n ON g.name_idx = n.idx
WHERE g.local_flags & {{.FlagLocalGlobal}} != 0 AND NOT g.deleted AND g.local_flags & {{.LocalInvalidFlags}} = 0 AND NOT EXISTS (
SELECT 1 FROM FILES f
INNER JOIN devices d ON d.idx = f.device_idx
WHERE f.name_idx = g.name_idx AND f.version_idx = g.version_idx AND d.device_id = ?
)
UNION ALL
SELECT fi.fiprotobuf, bl.blprotobuf, n.name, g.size, g.modified FROM fileinfos fi
INNER JOIN files g on fi.sequence = g.sequence
LEFT JOIN blocklists bl ON bl.blocklist_hash = g.blocklist_hash
INNER JOIN file_names n ON g.name_idx = n.idx
WHERE g.local_flags & {{.FlagLocalGlobal}} != 0 AND g.deleted AND g.local_flags & {{.LocalInvalidFlags}} = 0 AND EXISTS (
SELECT 1 FROM FILES f
INNER JOIN devices d ON d.idx = f.device_idx
WHERE f.name_idx = g.name_idx AND d.device_id = ? AND NOT f.deleted AND f.local_flags & {{.LocalInvalidFlags}} = 0
)

I guess we're paying the joins twice here. Currently on my phone but maybe we could use a single query and push the differing bits down to an OR in the WHERE clause.

@calmh
Copy link
Member Author

calmh commented Sep 10, 2025

@rasa

I think file_names and file_versions would benefit from having WITHOUT ROWID.

No, the integer primary key replaces the rowid. Combining it with "without rowid" means we lose auto increment-like semantics and need to set the primary key manually, which we don't.

@calmh
Copy link
Member Author

calmh commented Sep 10, 2025

@bt90

I guess we're paying the joins twice here.

Maybe? It wasn't obvious to me, I think it's good enough and can be left as a future optimisation.

Signed-off-by: Jakob Borg <jakob@kastelo.net>
Signed-off-by: Jakob Borg <jakob@kastelo.net>
@bt90
Copy link
Contributor

bt90 commented Sep 10, 2025

@bt90

I guess we're paying the joins twice here.

Maybe? It wasn't obvious to me, I think it's good enough and can be left as a future optimisation.

I'll take a stab at an optimized version. But that's unrelated to this PR as you already pointed out 😉

rasa
rasa previously requested changes Sep 10, 2025
Co-authored-by: Ross Smith II <ross@smithii.com>
Signed-off-by: Jakob Borg <jakob@kastelo.net>
@calmh calmh requested review from bt90 and rasa September 11, 2025 08:05
Signed-off-by: Jakob Borg <jakob@kastelo.net>
Copy link
Member

@rasa rasa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Co-authored-by: Ross Smith II <ross@smithii.com>
Signed-off-by: Jakob Borg <jakob@kastelo.net>
@calmh calmh enabled auto-merge (squash) September 12, 2025 09:23
@calmh calmh merged commit 9ee208b into syncthing:main Sep 12, 2025
30 of 31 checks passed
Hancock33 added a commit to Hancock33/batocera.piboy that referenced this pull request Sep 14, 2025
----------------------------------------------------------------------------------------
amiberry.mk cbca08a4af7f656754d50a4934d6315483eaf4cc # Version: Commits on Sept 13, 2025
----------------------------------------------------------------------------------------
bugfix: P96 mode of 1024x600 would throw error on startup (fixes #1708)

This mode had a duplicate ID,

----------------------------------------------------------------------------------------
applewin.mk 541f2b172780c590c8039c7a6769fad3cd80a618 # Version: Commits on Sept 13, 2025
----------------------------------------------------------------------------------------
Merge pull request #300 from audetto/zoltanvb-gitlab-ci

Enable 64 and 32-bit builds for Linux and Windows,

-----------------------------------------------------------------------------------
clk.mk b698850ce8413307e56157659156132c0a5a4554 # Version: Commits on Sept 13, 2025
-----------------------------------------------------------------------------------
Merge pull request #1556 from TomHarte/StaticConstexpr

Further prefer `static constexpr`.,

-------------------------------------------------------------------------------------------
dolphin-emu.mk 695e06ca981dee93a2be517594b7c622bf18d41f # Version: Commits on Sept 13, 2025
-------------------------------------------------------------------------------------------
Merge pull request #13930 from Dentomologist/retroachievements_avoid_leaderboard_list_crash

RetroAchievements: Avoid crash due to uninitialized memory read,

-------------------------------------------------------------------------------------------
duckstation.mk ee9f32185eb034bcacc3d04be206e1501047defa # Version: Commits on Sept 13, 2025
-------------------------------------------------------------------------------------------
dep/rcheevos: Remove files deleted upstream,

---------------------------------------------------------------------------------------
flycast.mk ac32a8035243951803d8a8e9f45ad2317a2da91f # Version: Commits on Sept 12, 2025
---------------------------------------------------------------------------------------
Fetch translations & Recreate libretro_core_options_intl.h,

-------------------------------------------------------------------------------------
fsuae.mk f362278ccd4c60991caac3b4d240d4a3f751bea2 # Version: Commits on Sept 13, 2025
-------------------------------------------------------------------------------------
Also add joystick mappings for port 0 and mouse mappings for port 1,

--------------------------------------------------------------------------------------
ikemen.mk 4515f67e9716e9f73d8f82069cecd425c870de44 # Version: Commits on Sept 13, 2025
--------------------------------------------------------------------------------------
style: fix code style issues with gofmt,

------------------------------------------------------------------------------------
play.mk 19b3995e6d09d9cbd4de9871ad94ea183ca53a6e # Version: Commits on Sept 10, 2025
------------------------------------------------------------------------------------
Implement PSUBSB.,

--------------------------------------------------------------------------------------
ppsspp.mk c7aa2bcd01c3aa4e243a7f9054ca5d1cedc67c23 # Version: Commits on Sept 12, 2025
--------------------------------------------------------------------------------------
Merge pull request #20808 from hrydgard/atlas-in-common

Move some image atlas generation logic to Common/Render,

-------------------------------------------------------------------------------------
rpcs3.mk 6106e8f79f021764f9feca56ee5541bdef8128db # Version: Commits on Sept 12, 2025
-------------------------------------------------------------------------------------
rsx: fix 3D aspect ratio,

----------------------------------------------------------------
ruffle.mk nightly-2025-09-13 # Version: Commits on Sept 13, 2025
----------------------------------------------------------------
## What's Changed

* avm2: Dont construct frame in nested 'goto' frame for SWF 9 by @nivkner in ruffle-rs/ruffle#21572

* chore: Update translations by @RuffleBuild in ruffle-rs/ruffle#21642

**Full Changelog**: ruffle-rs/ruffle@nightly-2025-09-12...nightly-2025-09-13,

------------------------------------------------------------------------------------
ymir.mk 1d83a4a3fd25fff68f39e5cecbd2fc2db5e52278 # Version: Commits on Sept 12, 2025
------------------------------------------------------------------------------------
feat(debug): Visually display CD device connection in Filters view,

-------------------------------------------------------------------------------------
box64.mk 2c6816be8f9f6b0ed3c87cd97f9fb468ac61e042 # Version: Commits on Sept 12, 2025
-------------------------------------------------------------------------------------
Update release.yml

Another fix for TERMUX CI,

--------------------------------------------------------------------------------------
ecwolf.mk 7e6bd186af0b5a48ca0104e1efd8f740246a7e98 # Version: Commits on Sept 13, 2025
--------------------------------------------------------------------------------------
Fixed reverse labels for keypad plus and minus (fixes #315)\n

--------------------------------------------------------------------------------------------
jazz2-native.mk 51808c1c98ab2aa954ea75709c66137303c2d994 # Version: Commits on Sept 12, 2025
--------------------------------------------------------------------------------------------
Fixed build,

----------------------------------------------------
nblood.mk r14265 # Version: Commits on Sept 13, 2025
----------------------------------------------------
-

-----------------------------------------------------------------------------------------
openmohaa.mk c8c9ae2468657fc5d48064c45b0e76ffae53cb10 # Version: Commits on Sept 12, 2025
-----------------------------------------------------------------------------------------
Basic standalone Debian package,

---------------------------------------------------------------------------------------
stalker.mk bd7f2ef7a29635feed304c834ef914efff2e072a # Version: Commits on Sept 12, 2025
---------------------------------------------------------------------------------------
cmake/XRay.Build.cmake: don't set CMAKE_ARCHIVE_OUTPUT_DIRECTORY

We don't want static libraries to be in the final bin folder.,

-------------------------------------------------------
syncthing.mk v2.0.9 # Version: Commits on Sept 13, 2025
-------------------------------------------------------
## Major changes in 2.0

- Database backend switched from LevelDB to SQLite. There is a migration on

  first launch which can be lengthy for larger setups. The new database is

  easier to understand and maintain and, hopefully, less buggy.

- The logging format has changed to use structured log entries (a message

  plus several key-value pairs). Additionally, we can now control the log

  level per package, and a new log level WARNING has been inserted between

  INFO and ERROR (which was previously known as WARNING...). The INFO level

  has become more verbose, indicating the sync actions taken by Syncthing. A

  new command line flag `--log-level` sets the default log level for all

  packages, and the `STTRACE` environment variable and GUI has been updated

  to set log levels per package. The `--verbose` and `--logflags` command

  line options have been removed and will be ignored if given.

- Deleted items are no longer kept forever in the database, instead they are

  forgotten after fifteen months. If your use case require deletes to take

  effect after more than a fifteen month delay, set the

  `--db-delete-retention-interval` command line option or corresponding

  environment variable to zero, or a longer time interval of your choosing.

- Modernised command line options parsing. Old single-dash long options are

  no longer supported, e.g. `-home` must be given as `--home`. Some options

  have been renamed, others have become subcommands. All serve options are

  now also accepted as environment variables. See  `syncthing --help` and

  `syncthing serve --help` for details.

- Rolling hash detection of shifted data is no longer supported as this

  effectively never helped. Instead, scanning and syncing is faster and more

  efficient without it.

- A \default folder\ is no longer created on first startup.

- Multiple connections are now used by default between v2 devices. The new

  default value is to use three connections: one for index metadata and two

  for data exchange.

- The following platforms unfortunately no longer get prebuilt binaries for

  download at syncthing.net and on GitHub, due to complexities related to

  cross compilation with SQLite:

  - dragonfly/amd64

  - solaris/amd64

  - linux/ppc64

  - netbsd/*

  - openbsd/386 and openbsd/arm

  - windows/arm

- The handling of conflict resolution involving deleted files has changed. A

  delete can now be the winning outcome of conflict resolution, resulting in

  the deleted file being moved to a conflict copy.

This release is also available as:

* APT repository: https://apt.syncthing.net/

* Docker image: `docker.io/syncthing/syncthing:2.0.9` or `ghcr.io/syncthing/syncthing:2.0.9`

  (`{docker,ghcr}.io/syncthing/syncthing:2` to follow just the major version)

## What's Changed

### Fixes

* fix(sqlite): add _txlock=immediate to modernc implementation by @calmh in syncthing/syncthing#10384

* fix(api): limit size of allowed authentication request by @calmh in syncthing/syncthing#10386

### Other

* chore(ursrv): update regex patterns for Syncthing-Fork entries by @Catfriend1 in syncthing/syncthing#10380

* chore: clean up migrated database by @calmh in syncthing/syncthing#10381

* chore(sqlite): use normalised tables for file names and versions by @calmh in syncthing/syncthing#10383

* chore(model): slightly deflake TestRecvOnlyRevertOwnID by @calmh in syncthing/syncthing#10390

**Full Changelog**: syncthing/syncthing@v2.0.8...v2.0.9,

-----------------------------------------------------------------------------------------
retroarch.mk d04919199061c45a23feb3c4f75b10ea2aef28cc # Version: Commits on Sept 13, 2025
-----------------------------------------------------------------------------------------
Fetch translations from Crowdin,

--------------------------------------------------------------------------------------
gzdoom.mk 1fff657204491479fd01b41a4086098d5321c204 # Version: Commits on Sept 12, 2025
--------------------------------------------------------------------------------------
Fix Hexen transcription error

It's actually \archimage\, with an i, not \archmage\.

Cf. https://github.com/OpenSourcedGames/Hexen/blob/bc35ff0cb70db033942cc9e5abdffae59f2f139f/Hexen%20Source/MN_MENU.C#L1008,

------------------------------------------------------------------------------------
tr1x.mk 169c6c1f5a13421a75b03d76be87ff2fd2f0cfef # Version: Commits on Sept 12, 2025
------------------------------------------------------------------------------------
console: add backdrop to TR2

Resolves #2150.,

------------------------------------------------------------------------------------
tr2x.mk 169c6c1f5a13421a75b03d76be87ff2fd2f0cfef # Version: Commits on Sept 12, 2025
------------------------------------------------------------------------------------
console: add backdrop to TR2

Resolves #2150.,

-----------------------------------------------------------------------------------------------
libretro-fceumm.mk 5cd4a43e16a7f3cd35628d481c347a0a98cfdfa2 # Version: Commits on Sept 13, 2025
-----------------------------------------------------------------------------------------------
Merge pull request #637 from NRS-NewRisingSun/asic_modularize

New Mappers and modularization,

------------------------------------------------------------------------------------------------
libretro-flycast.mk ac32a8035243951803d8a8e9f45ad2317a2da91f # Version: Commits on Sept 12, 2025
------------------------------------------------------------------------------------------------
Fetch translations & Recreate libretro_core_options_intl.h,

---------------------------------------------------------------------------------------------
libretro-pc98.mk 02b08deb3833305251fb3ee6c5d59b0efb5b52ff # Version: Commits on Sept 13, 2025
---------------------------------------------------------------------------------------------
fix build error,

-----------------------------------------------------------------------------------------------
libretro-ppsspp.mk c7aa2bcd01c3aa4e243a7f9054ca5d1cedc67c23 # Version: Commits on Sept 12, 2025
-----------------------------------------------------------------------------------------------
Merge pull request #20808 from hrydgard/atlas-in-common

Move some image atlas generation logic to Common/Render,

----------------------------------------------------------------------------------------------
libretro-tic80.mk bae9661882c4086073967207e7f1dca59aed34b7 # Version: Commits on Sept 13, 2025
----------------------------------------------------------------------------------------------
[Linux] Associates .tic and .png to TIC-80 (#2835),

--------------------------------------------------------------------------------------------
glsl-shaders.mk 503b81c81ce2b10eab5b3cf34f33f91bee85c713 # Version: Commits on Sept 12, 2025
--------------------------------------------------------------------------------------------
crt-pi rectify some errors (#530)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants