Share A Single File System Instance In HadoopTableOperations #92

@mccheah

Description

We shouldn't call Util.getFS every time we need a FileSystem object in HadoopTableOperations. This breaks down when file system object caching is disabled (by setting fs.<scheme>.impl.disable.cache to true). With caching disabled, each lookup constructs a new FileSystem object, so a long run of calls on HadoopTableOperations in quick succession creates and garbage-collects FileSystem objects very quickly, leading to degraded JVM behavior.
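One way to picture the proposed change is a table operations object that resolves its file system once and reuses it for every later call. The sketch below is only an illustration under stated assumptions: `FakeFileSystem` and `TableOps` are hypothetical stand-ins for `FileSystem` and `HadoopTableOperations`, and the `Supplier` stands in for whatever lookup (such as Util.getFS) would be used in the real class. It is not the actual Iceberg code.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

final class SharedFileSystemSketch {

  /** Hypothetical stand-in for org.apache.hadoop.fs.FileSystem. */
  static final class FakeFileSystem {}

  /** Hypothetical stand-in for HadoopTableOperations. */
  static final class TableOps {
    // Assumption: the loader plays the role of the per-call lookup (e.g. Util.getFS).
    private final Supplier<FakeFileSystem> loader;
    private volatile FakeFileSystem fs; // resolved once, then shared

    TableOps(Supplier<FakeFileSystem> loader) {
      this.loader = loader;
    }

    /** Returns the shared instance; the loader runs on first use only. */
    FakeFileSystem fs() {
      FakeFileSystem result = fs;
      if (result == null) {
        synchronized (this) {
          result = fs;
          if (result == null) {
            result = loader.get();
            fs = result;
          }
        }
      }
      return result;
    }

    // Every operation goes through fs() instead of invoking the loader itself.
    FakeFileSystem refresh() {
      return fs();
    }

    FakeFileSystem commit() {
      return fs();
    }
  }

  public static void main(String[] args) {
    AtomicInteger created = new AtomicInteger();
    TableOps ops = new TableOps(() -> {
      created.incrementAndGet(); // count how many file system objects get built
      return new FakeFileSystem();
    });
    for (int i = 0; i < 1000; i++) {
      ops.refresh();
      ops.commit();
    }
    System.out.println("FileSystem objects created: " + created.get()); // prints 1
  }
}
```

With this shape, the number of FileSystem objects no longer depends on how many operations are invoked, regardless of whether Hadoop's own cache is enabled.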

One reason to disable file system caching is so that different instances of HadoopTableOperations can be set up with FileSystem objects that are configured with different Configuration objects, for example to set different Hadoop properties when invoking the data source in various iterations, given that we move forward with #91. Unfortunately, Hadoop caches file system objects by URI, not by Configuration. If different HadoopTableOperations instances try to load differently configured file system objects for the same URI, they all receive the same FileSystem object back every time, unless FileSystem caching is disabled.
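The caching behavior described above can be modeled in a few lines. This is a deliberately simplified model, not Hadoop's implementation: `FakeFileSystem` and `FileSystemCacheModel` are hypothetical, a plain `Map<String, String>` stands in for `Configuration`, the host name is made up, and the cache key here uses only the URI scheme and authority (the real cache also takes the current user into account).

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

final class FileSystemCacheModel {

  /** Hypothetical stand-in for a FileSystem; remembers the settings it was built with. */
  static final class FakeFileSystem {
    final Map<String, String> conf;

    FakeFileSystem(Map<String, String> conf) {
      this.conf = conf;
    }
  }

  private static final Map<String, FakeFileSystem> CACHE = new HashMap<>();

  /** Simplified lookup: the key ignores the configuration entirely. */
  static synchronized FakeFileSystem get(URI uri, Map<String, String> conf) {
    String disableKey = "fs." + uri.getScheme() + ".impl.disable.cache";
    if (Boolean.parseBoolean(conf.getOrDefault(disableKey, "false"))) {
      return new FakeFileSystem(conf); // caching disabled: new object on every call
    }
    String key = uri.getScheme() + "://" + uri.getAuthority();
    return CACHE.computeIfAbsent(key, k -> new FakeFileSystem(conf));
  }

  public static void main(String[] args) {
    URI uri = URI.create("hdfs://namenode:8020/warehouse/table"); // made-up location
    Map<String, String> confA = Map.of("dfs.replication", "1");
    Map<String, String> confB = Map.of("dfs.replication", "3");

    FakeFileSystem a = get(uri, confA);
    FakeFileSystem b = get(uri, confB);
    // Same URI, different settings: the second caller gets the first caller's object.
    System.out.println(a == b); // prints true
    System.out.println(b.conf.get("dfs.replication")); // prints 1

    Map<String, String> confC =
        Map.of("dfs.replication", "3", "fs.hdfs.impl.disable.cache", "true");
    FakeFileSystem c = get(uri, confC);
    System.out.println(c == a); // prints false
    System.out.println(c.conf.get("dfs.replication")); // prints 3
  }
}
```

The model shows both halves of the trade-off: with caching on, the second configuration is silently ignored, and with caching off, every lookup pays for a fresh object.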
