
Delete/Cascade can be slow for large databases with complex interdependencies #1113

Description

@erikd
  • Linux
  • ghc 8.6
  • cabal-install 3.2
  • PostgreSQL backend

I have a very simple delete cascade function:

deleteCascadeSlotNo :: MonadIO m => Word64 -> ReaderT SqlBackend m Bool
deleteCascadeSlotNo slotNo = do
  -- Look up the keys of all blocks at this slot (in practice at most one).
  keys <- selectKeysList [ BlockSlotNo ==. Just slotNo ] []
  -- Delete each block and everything that references it.
  mapM_ deleteCascade keys
  pure $ not (null keys)

It is not enforced by the schema, but I know that slotNo is unique, and I have confirmed (by adding debug output) that the list of keys contains at most one element. However, the database has 25 tables (with more to come) with a relatively high level of linkage between them, in part to ensure proper delete/cascade behavior.

Currently, when deleteCascadeSlotNo is used on the block table (over 5 million entries) it only needs to delete the most recently added row or two. However, for debugging/testing I may want to delete many more rows and here is where we run into problems. For larger delete/cascade operations, the time taken increases dramatically.

Rows to delete | Time taken
---------------|-----------
          1000 | 0m44s
          2000 | 1m25s
          4000 | 3m04s
          8000 | 7m14s
         16000 | 13m18s

The time taken is pretty much linear in the number of rows to be deleted, but for large numbers of rows (say 1 million) the operation can still be slow (30 minutes or more).

The slowness seems to be due to the fact that the operation requires a lot of back-and-forth communication between Haskell/Persistent and PostgreSQL. The delete/cascade operation would be much quicker if more of the work were handed off to Postgres.
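One way to hand the work to Postgres (a sketch under assumptions, not persistent's built-in behavior): declare the foreign keys referencing the block table with ON DELETE CASCADE in the schema, then issue a single raw DELETE so the whole cascade runs server-side in one round trip. The table name "block" and column name "slot_no" are assumptions about the generated schema and would need adjusting:

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Control.Monad.IO.Class (MonadIO)
import Control.Monad.Trans.Reader (ReaderT)
import Data.Text (Text)
import Data.Word (Word64)
import Database.Persist.Sql (PersistValue (PersistInt64), SqlBackend, rawExecute)

-- Hypothetical table/column names; adjust to match the real generated schema.
deleteBySlotNoSql :: Text
deleteBySlotNoSql = "DELETE FROM block WHERE slot_no = ?"

-- Assuming every foreign key referencing "block" was created with
-- ON DELETE CASCADE, this single statement deletes the matching block
-- rows and lets PostgreSQL cascade the dependent deletes server-side,
-- instead of one round trip per dependent row.
deleteCascadeSlotNoSql :: MonadIO m => Word64 -> ReaderT SqlBackend m ()
deleteCascadeSlotNoSql slotNo =
  rawExecute deleteBySlotNoSql [PersistInt64 (fromIntegral slotNo)]
```

The trade-off is that the cascade behavior moves out of the Haskell entity definitions and into the database schema, so it has to be kept in sync with any migrations.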
