New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.
Already on GitHub? Sign in to your account
Swift: introduce usage of binlog #12745
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Design looks good aside from a few comments. I'll do a more thorough review once those are addressed.
swift/third_party/binlog/patches/01-change-severity-printing.patch
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your patience!
| A log file is produced for each run under `CODEQL_EXTRACTOR_SWIFT_LOG_DIR` (the usual DB log directory). | ||
|
|
||
| You can use the environment variable `CODEQL_EXTRACTOR_SWIFT_LOG_LEVELS` to configure levels for | ||
| loggers and outputs. This must have the form of a comma separated `spec:level` list, where |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be clearer if we called it min_level?
| matching logger names or one of `out:bin`, `out:text` or `out:console`, and `level` is one of `trace`, `debug`, `info`, | ||
| `warning`, `error`, `critical` or `no_logs` to turn logs completely off. | ||
| Current output default levels are no binary logs, `info` logs or higher in the text file and `warning` logs or higher on | ||
| standard error. By default, all loggers are configured with the lowest output level. Logger names are visible in the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"lowest output level" is ambiguous - is it the lowest level (i.e. trace and above) or the lowest output (i.e. no_logs)?
| @@ -151,7 +152,13 @@ class SwiftDispatcher { | |||
| return *l; | |||
| } | |||
| waitingForNewLabel = e; | |||
| // TODO: more generic and informational visiting one-line log | |||
| if constexpr (std::is_convertible_v<E, const swift::ValueDecl*>) { | |||
| const swift::ValueDecl* x = e; | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need a separate variable here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah, that was meant to be temporary to get the IDE to autocomplete stuff for me, I think I'll just drop this log and leave a TODO
| // TODO: more generic and informational visiting one-line log | ||
| if constexpr (std::is_convertible_v<E, const swift::ValueDecl*>) { | ||
| const swift::ValueDecl* x = e; | ||
| LOG_TRACE("{}", x->getName().getBaseIdentifier().str()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make sense to add some context here? E.g. make the message "Visiting declaration: {}"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
definitely, though I think I will be moving this kind of tracing to the translators, where I also have programmatic access to the swift entity type names because of macro metaprogramming. But I'll leave it for a PR down the road, and for now remove this draft log
| return dflt; | ||
| } | ||
|
|
||
| const char* getEnvOr(const char* var, const char* deflt) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: We use deflt here and dflt in getLevelFor. Can we pick one for consistency?
| // as we are configuring logging right now, we collect problems and log them at the end | ||
| std::vector<std::string> problems; | ||
| collectSeverityRules("CODEQL_EXTRACTOR_SWIFT_LOG_LEVELS", | ||
| {sourceRules, binary.level, text.level, console.level, problems}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm struggling to follow the data flow through LevelConfiguration&& configuration. I think it'd be clearer if collectSeverityRules were a member function that took a var and returned the problems (and had the other arguments hard-coded).
| namespace codeql { | ||
|
|
||
| // tools should define this to tweak the root name of all loggers | ||
| extern const char* const logRootName; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Make this a string_view?
| ret += *env; | ||
| ret += '\n'; | ||
| } | ||
| ret.pop_back(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm... Maybe keep the newline to make it more obvious when the multi-line log message ends in a text log?
| static auto argDump(int argc, char** argv) { | ||
| std::string ret; | ||
| for (auto arg = argv + 1; arg < argv + argc; ++arg) { | ||
| ret += *arg; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Not for this PR) Have we considered using Abseil or Boost string libraries? This looks like a job for absl::StrJoin or boost::algorithm::join, and I wouldn't be surprised if we have other places that could benefit similarly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, I think I wanted to bring in abseil at some point, would be worth it I think
| size_t strSize() const { | ||
| if (id_ == undefined) return 17; // #ffffffffffffffff | ||
| if (id_ == 0) return 2; // #0 | ||
| return /* # */ 1 + /* hex digits */ static_cast<size_t>(ceil(log2(id_ + 1) / 4)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Optional: The floating-point log looks painful. Are any of the following better options?
- C++20
std::bit_width; - Abseil's backport
absl::bit_width; - A
while (id >>= 4) digits+=4;-type loop (which unfortunately compiles to a literal shift/add/cmp loop instead of using LZCNT); - Return a fixed-size, null-terminated buffer from
str()(e.g.std::array<char, 18>) and usestrlen(str().data()).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
via the mserialize specialisations this is also used for every log. That's why I wanted to avoid option 4, to avoid each label being converted twice for each log (once to calculate its size, another to store it). Maybe I'll leave a todo to switch to bit_width, whichever comes first between abseil and C++20
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did say "optional", so a todo sounds good. That said, I wouldn't make any bets on a floating-point log being faster than an int->hex conversion!
This introduces a logging module in the Swift extractor, with some first examples.
This is built on top of binlog, though it introduces more configurability: while binlog allows to set a global minimal severity of logs, this patch allows to set minimal log levels for:
Loggerinstances), e.g.extractor/dispatcherorextractor/loggingThe logging macros will pick up the
Loggerinstance to use by callinglogger(), so the writer for a specific area of the code can be put in place by either:Logger& logger()function within the scopeLogger loggerclass member (instance or static), asLogger::operator()()returns*thisLoggercalledloggerI've been hesitant to have this implicit reference to a default logger name as opposed to explicitly passing a logger reference to the logging macros, but for the moment decided for this, as in my experience passing an explicit logger instance always ends up being a repetitive馃檪
logorloggeradditional argument everywhere in the code, not particularly adding anything to readability. But comments are welcomeSome first actual logging has been introduced at the startup of the extractor, in the dispatcher printing names of encountered declarations. The more significant addition is logging of all emitted entities, leveraging
codegento create the proper binlog adapters. This creates logs likeLogging levels are configurable per output and per
Loggerinstance via theCODEQL_EXTRACTOR_SWIFT_LOG_LEVELSenvironment variable. As a complex example:will turn off generation of a text log file, redirecting all logs to standard error, but will make all
Loggers only write warnings or above, except for trap emission logs which will output all logs.For the moment as a default:
Once this is a bit more mature we should experiment with the binary file format on real world projects, to see if we can get any benefit from it. For the moment I propose to stick with textual format to minimize initial friction. The binary format can be read either with a globally installed
breadbinary or withbazel run @binlog//:bread -- /absolute/path/to/log.blog.