Skip to content
This repository was archived by the owner on May 10, 2024. It is now read-only.

PARQUET-1265: Segfault on static ApplicationVersion initialization#452

Closed
majetideepak wants to merge 3 commits intoapache:masterfrom
majetideepak:PARQUET-1265
Closed

PARQUET-1265: Segfault on static ApplicationVersion initialization#452
majetideepak wants to merge 3 commits intoapache:masterfrom
majetideepak:PARQUET-1265

Conversation

@majetideepak
Copy link
Copy Markdown

No description provided.

Copy link
Copy Markdown
Member

@xhochy xhochy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM

@pitrou
Copy link
Copy Markdown
Member

pitrou commented Apr 11, 2018

Another solution is to avoid using a regex for parsing a well-known version:

diff --git a/src/parquet/metadata.cc b/src/parquet/metadata.cc
index 91304cf..c3c9809 100644
--- a/src/parquet/metadata.cc
+++ b/src/parquet/metadata.cc
@@ -32,11 +32,11 @@
 namespace parquet {
 
 const ApplicationVersion ApplicationVersion::PARQUET_251_FIXED_VERSION =
-    ApplicationVersion("parquet-mr version 1.8.0");
+    ApplicationVersion("parquet-mr", 1, 8, 0);
 const ApplicationVersion ApplicationVersion::PARQUET_816_FIXED_VERSION =
-    ApplicationVersion("parquet-mr version 1.2.9");
+    ApplicationVersion("parquet-mr", 1, 2, 9);
 const ApplicationVersion ApplicationVersion::PARQUET_CPP_FIXED_STATS_VERSION =
-    ApplicationVersion("parquet-cpp version 1.3.0");
+    ApplicationVersion("parquet-cpp", 1, 3, 0);
 
 template <typename DType>
 static std::shared_ptr<RowGroupStatistics> MakeTypedColumnStats(
@@ -448,6 +448,10 @@ std::shared_ptr<const KeyValueMetadata> FileMetaData::key_value_metadata() const
 
 void FileMetaData::WriteTo(OutputStream* dst) { return impl_->WriteTo(dst); }
 
+ApplicationVersion::ApplicationVersion(const std::string& application, int major,
+                                       int minor, int patch)
+    : application_(application), version{major, minor, patch} {}
+
 ApplicationVersion::ApplicationVersion(const std::string& created_by) {
   boost::regex app_regex{ApplicationVersion::APPLICATION_FORMAT};
   boost::regex ver_regex{ApplicationVersion::VERSION_FORMAT};
diff --git a/src/parquet/metadata.h b/src/parquet/metadata.h
index 7850358..4223014 100644
--- a/src/parquet/metadata.h
+++ b/src/parquet/metadata.h
@@ -74,6 +74,7 @@ class ApplicationVersion {
 
   ApplicationVersion() {}
   explicit ApplicationVersion(const std::string& created_by);
+  ApplicationVersion(const std::string& application, int major, int minor, int patch);
 
   // Returns true if version is strictly less than other_version
   bool VersionLt(const ApplicationVersion& other_version) const;

(the advantage here is that it doesn't break the API)

@majetideepak
Copy link
Copy Markdown
Author

I think the "static initialization order fiasco" (https://isocpp.org/wiki/faq/ctors#static-init-order) can happen if the static variables are initialized via a constructor as well.
Your suggested change is definitely an improvement over calling the boost regex library.

static const ApplicationVersion PARQUET_251_FIXED_VERSION;
static const ApplicationVersion PARQUET_816_FIXED_VERSION;
static const ApplicationVersion PARQUET_CPP_FIXED_STATS_VERSION;
static const ApplicationVersion& PARQUET_251_FIXED_VERSION();
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think you need to break the API anymore by making these functions rather than attributes.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to keep these as functions to avoid "static initialization order fiasco"

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean?

"""In short, suppose you have two static objects x and y which exist in separate source files, say x.cpp and y.cpp. Suppose further that the initialization for the y object (typically the y object’s constructor) calls some method on the x object."""

That doesn't seem to apply here (there are no external static objects involved in the initialization of the application versions). That said, I'm ok with leaving these as methods if you think it's more future-proof.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, my idea is to keep it future-proof.

@robertnishihara
Copy link
Copy Markdown

Thanks for the PR! I was running into this issue, and this seems to solve it for me.

@pitrou
Copy link
Copy Markdown
Member

pitrou commented Apr 11, 2018

There's a test failure on Travis-CI, I don't know if it's related:
https://travis-ci.org/apache/parquet-cpp/jobs/365217462#L3704-L3714

@majetideepak
Copy link
Copy Markdown
Author

It is related. I am investigating.

@robertnishihara
Copy link
Copy Markdown

Any reason not to merge this?

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants