feat(parquet): Move footerOffset into FileMetaData#217
feat(parquet): Move footerOffset into FileMetaData#217zeroshade merged 4 commits intoapache:mainfrom
Conversation
zeroshade
left a comment
There was a problem hiding this comment.
I like the idea, just some nitpicks
parquet/metadata/metadata_test.go
Outdated
| serialized, err := faccessor.SerializeString(context.Background()) | ||
| assert.NoError(t, err) | ||
| faccessorCopy, err := metadata.NewFileMetaData([]byte(serialized), nil) | ||
| faccessorCopy, err := metadata.NewFileMetaData([]byte(serialized), 0, nil) |
There was a problem hiding this comment.
Shouldn't it be invalid to use 0 for the footer offset?
parquet/metadata/file.go
Outdated
| func (f *FileMetaData) Size() int { return f.metadataLen } | ||
|
|
||
| // SourceSz is the total size of the source file | ||
| func (f *FileMetaData) SourceSz() int64 { return f.footerOffset } |
There was a problem hiding this comment.
Because the FileMetadata can potentially be separate from the file itself, it might make more sense to name this differently or at least use a different description for the comment.
parquet/metadata/file.go
Outdated
| // NewFileMetaData takes in the raw bytes of the serialized metadata to deserialize | ||
| // and will attempt to decrypt the footer if a decryptor is provided. | ||
| func NewFileMetaData(data []byte, fileDecryptor encryption.FileDecryptor) (*FileMetaData, error) { | ||
| func NewFileMetaData(data []byte, footerOffset int64, fileDecryptor encryption.FileDecryptor) (*FileMetaData, error) { |
There was a problem hiding this comment.
The only thing I'm not a fan of here is that this is a breaking change. Now, I don't think there are many (if any) packages that are importing and using this function directly (as creating new file metadata would primarily be done via the Writer rather than done directly) but it's still a concern that I'd rather not create a breaking change if we can avoid it.
Is there a way we can avoid this being a breaking change at all? If 0 is valid, maybe we create a setter?
There was a problem hiding this comment.
Yes, I forgot the backwards compatibility. Can we name the setter/getter function like:
// SetSourceFileSize get the total size of the source file from meta data.
func (f *FileMetaData) SetSourceFileSize() int64 { return f.sourceFileSize }
// GetSourceFileSize set the total size of the source file in meta data.
func (f *FileMetaData) GetSourceFileSize() int64 { return f.sourceFileSize }
// file.go
type FileMetaData struct {
....
// sourceFileSize is not a part of FileMetaData, but it is mainly used to parse meta data.
// Users can manually set this value and they are responsible for the validity of it.
sourceFileSize int64
}
parquet/metadata/file.go
Outdated
| // SetSourceFileSize get the total size of the source file from meta data. | ||
| func (f *FileMetaData) GetSourceFileSize() int64 { return f.sourceFileSize } | ||
|
|
||
| // GetSourceFileSize set the total size of the source file in meta data. | ||
| func (f *FileMetaData) SetSourceFileSize(sourceFileSize int64) { f.sourceFileSize = sourceFileSize } |
There was a problem hiding this comment.
doc strings are reversed currently, gotta swap those 😄
Rationale for this change
After looking into the code, I found that
footerOffsetis mainly used in parsing the meta data of parquet file.I think we can store it in
FileMetaData, so we don't need to getfooterOffsetagain when we useWithMetadata.What changes are included in this PR?
Move
Reader.footerOffsetintoFileMetadataAre these changes tested?
Are there any user-facing changes?