Skip to content

Locking in NodeEnvironment is completely broken #13264

@rmuir

Description

@rmuir

after the lucene 5.3 upgrade, i looked at how ES uses lucene's filesystem locking. most places are ok, obtaining a lock and doing stuff in a try/finally. However NodeEnvironment is a totally different story. Can we fix the use of locking here?

  1. deleteShardDirectorySafe is anything but safe. it calls deleteShardDirectoryUnderLock which doesn't actually delete under a lock either!!!! It calls this bogus method: acquireFSLockForPaths which acquires then releases locks. Why? Why? Why?
  2. assertEnvIsLocked is only called under assert. why? Look at findAllIndices, its about to do something really expensive, why can't the call to ensureValid be a real check?
  3. assertEnvIsLocked has a bunch of leniency, why in the hell would it return true when closed or when there are no locks at all, thats broken.

After this stuff is fixed, any places here doing heavy operations (e.g. N filesystem operations) should seriously consider calling ensureValid on any locks that are supposed to be held. It means you do N+1 operations or whatever but man, if what we are doing is not important, then why are we using fs locks?

Metadata

Metadata

Assignees

No one assigned

    Labels

    :Distributed/EngineAnything around managing Lucene and the Translog in an open shard.>bugdiscuss

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions