Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

Swift: introduce usage of binlog #12745

Open
wants to merge 9 commits into
base: main
Choose a base branch
from
Open

Conversation

redsun82
Copy link
Contributor

@redsun82 redsun82 commented Apr 3, 2023

This introduces a logging module in the Swift extractor, with some first examples.

This is built on top of binlog, though it introduces more configurability: while binlog allows to set a global minimal severity of logs, this patch allows to set minimal log levels for:

  • the different outputs (currently three: binary log file, textual log file and standard error)
  • the different sources (Logger instances), e.g. extractor/dispatcher or extractor/logging

The logging macros will pick up the Logger instance to use by calling logger(), so the writer for a specific area of the code can be put in place by either:

  • providing a Logger& logger() function within the scope
  • providing a Logger logger class member (instance or static), as Logger::operator()() returns *this
  • having a local variable, reference or parameter of type Logger called logger

I've been hesitant to have this implicit reference to a default logger name as opposed to explicitly passing a logger reference to the logging macros, but for the moment decided for this, as in my experience passing an explicit logger instance always ends up being a repetitive log or logger additional argument everywhere in the code, not particularly adding anything to readability. But comments are welcome 馃檪

Some first actual logging has been introduced at the startup of the extractor, in the dispatcher printing names of encountered declarations. The more significant addition is logging of all emitted entities, leveraging codegen to create the proper binlog adapters. This creates logs like

2023-03-31 10:19:49.287373955 TRAC [extractor/non_empty.swift.trap] codeql::CallExpr{ id: #9, type: #a, function: #b, arguments: [#14, #21, #26] } (TrapDomain.h:22)

Logging levels are configurable per output and per Logger instance via the CODEQL_EXTRACTOR_SWIFT_LOG_LEVELS environment variable. As a complex example:

export CODEQL_EXTRACTOR_SWIFT_LOG_LEVELS=out:console:trace,out:text:no_logs,*:warning,*.trap:trace

will turn off generation of a text log file, redirecting all logs to standard error, but will make all Loggers only write warnings or above, except for trap emission logs which will output all logs.

For the moment as a default:

  • the binary log file is disabled
  • the textual log file prints info or higher
  • warning or higher get printed to standard error

Once this is a bit more mature we should experiment with the binary file format on real world projects, to see if we can get any benefit from it. For the moment I propose to stick with textual format to minimize initial friction. The binary format can be read either with a globally installed bread binary or with bazel run @binlog//:bread -- /absolute/path/to/log.blog.

@redsun82 redsun82 requested a review from sashabu April 3, 2023 12:30
@github-actions github-actions bot added the Swift label Apr 3, 2023
Copy link
Contributor

@sashabu sashabu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Design looks good aside from a few comments. I'll do a more thorough review once those are addressed.

swift/extractor/infra/log/SwiftLogging.h Outdated Show resolved Hide resolved
swift/extractor/infra/log/SwiftLogging.h Outdated Show resolved Hide resolved
swift/extractor/infra/log/SwiftLogging.h Outdated Show resolved Hide resolved
swift/extractor/infra/log/SwiftLogging.h Outdated Show resolved Hide resolved
swift/extractor/infra/log/SwiftLogging.h Show resolved Hide resolved
swift/extractor/infra/log/SwiftLogging.h Outdated Show resolved Hide resolved
@redsun82 redsun82 marked this pull request as ready for review April 5, 2023 04:14
@redsun82 redsun82 requested review from a team as code owners April 5, 2023 04:14
@redsun82 redsun82 requested a review from sashabu April 5, 2023 04:14
Copy link
Contributor

@sashabu sashabu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your patience!

A log file is produced for each run under `CODEQL_EXTRACTOR_SWIFT_LOG_DIR` (the usual DB log directory).

You can use the environment variable `CODEQL_EXTRACTOR_SWIFT_LOG_LEVELS` to configure levels for
loggers and outputs. This must have the form of a comma separated `spec:level` list, where
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be clearer if we called it min_level?

matching logger names or one of `out:bin`, `out:text` or `out:console`, and `level` is one of `trace`, `debug`, `info`,
`warning`, `error`, `critical` or `no_logs` to turn logs completely off.
Current output default levels are no binary logs, `info` logs or higher in the text file and `warning` logs or higher on
standard error. By default, all loggers are configured with the lowest output level. Logger names are visible in the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"lowest output level" is ambiguous - is it the lowest level (i.e. trace and above) or the lowest output (i.e. no_logs)?

@@ -151,7 +152,13 @@ class SwiftDispatcher {
return *l;
}
waitingForNewLabel = e;
// TODO: more generic and informational visiting one-line log
if constexpr (std::is_convertible_v<E, const swift::ValueDecl*>) {
const swift::ValueDecl* x = e;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need a separate variable here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, that was meant to be temporary to get the IDE to autocomplete stuff for me, I think I'll just drop this log and leave a TODO

// TODO: more generic and informational visiting one-line log
if constexpr (std::is_convertible_v<E, const swift::ValueDecl*>) {
const swift::ValueDecl* x = e;
LOG_TRACE("{}", x->getName().getBaseIdentifier().str());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to add some context here? E.g. make the message "Visiting declaration: {}"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

definitely, though I think I will be moving this kind of tracing to the translators, where I also have programmatic access to the swift entity type names because of macro metaprogramming. But I'll leave it for a PR down the road, and for now remove this draft log

return dflt;
}

const char* getEnvOr(const char* var, const char* deflt) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: We use deflt here and dflt in getLevelFor. Can we pick one for consistency?

// as we are configuring logging right now, we collect problems and log them at the end
std::vector<std::string> problems;
collectSeverityRules("CODEQL_EXTRACTOR_SWIFT_LOG_LEVELS",
{sourceRules, binary.level, text.level, console.level, problems});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm struggling to follow the data flow through LevelConfiguration&& configuration. I think it'd be clearer if collectSeverityRules were a member function that took a var and returned the problems (and had the other arguments hard-coded).

namespace codeql {

// tools should define this to tweak the root name of all loggers
extern const char* const logRootName;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Make this a string_view?

ret += *env;
ret += '\n';
}
ret.pop_back();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm... Maybe keep the newline to make it more obvious when the multi-line log message ends in a text log?

static auto argDump(int argc, char** argv) {
std::string ret;
for (auto arg = argv + 1; arg < argv + argc; ++arg) {
ret += *arg;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Not for this PR) Have we considered using Abseil or Boost string libraries? This looks like a job for absl::StrJoin or boost::algorithm::join, and I wouldn't be surprised if we have other places that could benefit similarly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, I think I wanted to bring in abseil at some point, would be worth it I think

size_t strSize() const {
if (id_ == undefined) return 17; // #ffffffffffffffff
if (id_ == 0) return 2; // #0
return /* # */ 1 + /* hex digits */ static_cast<size_t>(ceil(log2(id_ + 1) / 4));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optional: The floating-point log looks painful. Are any of the following better options?

  1. C++20 std::bit_width;
  2. Abseil's backport absl::bit_width;
  3. A while (id >>= 4) digits+=4;-type loop (which unfortunately compiles to a literal shift/add/cmp loop instead of using LZCNT);
  4. Return a fixed-size, null-terminated buffer from str() (e.g. std::array<char, 18>) and use strlen(str().data()).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

via the mserialize specialisations this is also used for every log. That's why I wanted to avoid option 4, to avoid each label being converted twice for each log (once to calculate its size, another to store it). Maybe I'll leave a todo to switch to bit_width, whichever comes first between abseil and C++20 馃檪

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did say "optional", so a todo sounds good. That said, I wouldn't make any bets on a floating-point log being faster than an int->hex conversion!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants