Skip to content

config: ignore duplicated entries#173

Closed
johnlinp wants to merge 2 commits intologrotate:masterfrom
johnlinp:master
Closed

config: ignore duplicated entries#173
johnlinp wants to merge 2 commits intologrotate:masterfrom
johnlinp:master

Conversation

@johnlinp
Copy link

@johnlinp johnlinp commented Feb 7, 2018

Closes: #38

@johnlinp
Copy link
Author

There are 2 commits here. The first one is a refactor and it shouldn't change any behavior of logrotate. The second one is the real fix, and it is very intuitive because it's based on the refactor.

Here is what I did:

  • refactor commit: when parsing the glob pattern in the config file, I don't want to do the glob immediately. Instead, I just stored the glob pattern for later use. After all the logInfo is parsed, I do the glob for every logInfo sequentially. This is good because missingok can be handled right away instead of previous implementation:
/* We don't yet know whether this stanza has "missingok"
 * set, so store the error message for later. */
  • real fix commit: I can handle globoverride in the glob loop, similar to how it handles missingok.

Note: a small catch on this is that I have to store the config file path and line number for each logInfo since I will have to print out the error message to show which log config contains errors.

Any comment would be appreciated.

@kdudka
Copy link
Member

kdudka commented Feb 13, 2018

Do I understand it correctly that, whenever a glob in logrotate's configuration matches a log file and sets something (or even sets nothing), then the log file can never be matched again to override the previous settings?

I would expect it to work the other way around (such that later matches override the previous ones). That would be more compatible with how logrotate handles multiple configuration files:

Any number of config files may be given on the command line. Later config files may override the options given in earlier files, so the order in which the logrotate config files are listed is important.

... or with the principle that local definitions override global ones.

@johnlinp
Copy link
Author

johnlinp commented Feb 21, 2018

Sorry, I don't understand. I thought logrotate doesn't allow 2 config files specifying duplicate log files.

For example, I have a log at /var/log/some-app.log and 2 config files:

# a.conf
/var/log/some-app.log {
    rotate 1
    addextension .a
}
# b.conf
/var/log/some-app.log {
    rotate 1
    addextension .b
}

Then

# logrotate --force a.conf b.conf

will output:

error: b.conf:1 duplicate log entry for /var/log/some-app.log

and /var/log/some-app.log was rotated as /var/log/some-app.log.1.a according to a.conf.

It seems that b.conf didn't override the option. Did I miss something? Thanks.

@kdudka
Copy link
Member

kdudka commented Feb 26, 2018 via email

@johnlinp
Copy link
Author

johnlinp commented Feb 27, 2018

If I understand correctly, you want the following behavior:

For the log files /var/log/some-app-1.log and /var/log/some-app-2.log, and a logrotate config:

# some-app.conf

/var/log/some-app-1.log {
    rotate 1
    addextension .a
}

/var/log/some-app-*.log {
    rotate 1
    addextension .b
    globoverride
}

You want both /var/log/some-app-1.log and /var/log/some-app-2.log to have extension of .b because the later log config should override the earlier one. However, I don't think it is desired since the later log config should be a "catch-all", as @wyohm described in the comment:

So the above would rotate ...reqsize.log daily and everything else weekly.

In my current implementation, the above logrotate config will give /var/log/some-app-1.log the extension .a and give /var/log/some-app-2.log the exntension .b.

@kdudka
Copy link
Member

kdudka commented Feb 27, 2018

Then I would write those entries in the reverse order and still achieve the desired result:

/var/log/some-app-*.log {
    rotate 1
    addextension .b
    globoverride
}

/var/log/some-app-1.log {
    rotate 1
    addextension .a
}

Try the following two examples in a shell:

echo .b | tee /var/log/some-app-*.log
echo .a | tee /var/log/some-app-1.log
echo .a | tee /var/log/some-app-1.log
echo .b | tee /var/log/some-app-*.log

What will be the result?

@johnlinp
Copy link
Author

@kdudka I see your point. I will modify my patch to achieve this behavior. Thanks.

@wyohm
Copy link

wyohm commented Feb 27, 2018

Was just about to write out my understanding of it, but it looks like it's agreed.

However, thinking about this made me consider how included files will behave. So if there are two conf files in /etc/logrotate.d/ called a.conf and b.conf, will settings from a.conf still be in effect when b.conf is being processed? Or does the context get unwound between processing conf files?

@kdudka
Copy link
Member

kdudka commented Feb 27, 2018

@wyohm I do not think that file inclusion should change anything. You should get the same result as if you inlined all the included config files. Or am I missing something?

@wyohm
Copy link

wyohm commented Feb 27, 2018

@kdudka That answers it, thanks. I'm not thinking that it will change behavior. Just that it highlights behavior which may not be desirable. For instance, if I want to understand the behavior of /etc/logrotate.d/mariadb I'd ideally want to only need to read /etc/logrotate.d/mariadb and /etc/logrotate.conf. But it's possible that /etc/logrotate.d/httpd (processed first because of alphabetical order) has set a global directive or a pattern with globoverride which may affect what's going on in /etc/logrotate.d/mariadb.

@johnlinp
Copy link
Author

johnlinp commented Feb 27, 2018

Let's make the semantic meaning of globoverride clear:

In @wyohm's original comment, this directive means "this log config will do a glob search and take what are not handled in the earlier log configs".

In @kdudka's comment, this directive means "this log config will do a glob search and take all the glob results, but can be taken off by the later log configs".

Do I understand this correctly?

@kdudka
Copy link
Member

kdudka commented Feb 27, 2018

@wyohm You have a similar problem even now without the globoverride option. If any config file in /etc/logrotate.d sets a global option, it will have global effect, too. I have quickly looked at files in /etc/logrotate.d on my system and none of them sets global options. But it is a matter of policy. As far as I know logrotate itself does not guarantee that included files do not set global options.

@wyohm
Copy link

wyohm commented Feb 27, 2018

@johnlinp Yes, that was the meaning of my original comment. Though I now prefer the behavior @kdudka has proposed. What I proposed may be simpler to implement and possibly understand conf files (log files can be processed as soon as they are matched), but what @kdudka proposes allows for more flexibility since multiple matches can add behavior.

@kdudka
Copy link
Member

kdudka commented Feb 27, 2018

@johnlinp Yes. I think that the options should be applied to all matched log files. And, if globoverride is used, the log files can be matched again and any options can be added, or previously used options can be overridden.

@wyohm
Copy link

wyohm commented Feb 27, 2018

@kdudka Agreed, that issue (includes setting globals) currently exists. It's just that I only started thinking about it when considering the behavior of globoverride. It gives more opportunity for a problem. If anyone else thinks its an issue, I can write a feature request (e.g.unwind include context). If not, I can live with it.

@johnlinp
Copy link
Author

Hmm.. I wasn't expect that later log config can add options to the previously matched logs. I think that completely overriden would be better. I think it's too complex for a system administrator to understand or debug since he/she has to accumulate all the effects in all matched log configs. If the behavior is completely override, it's easier to understand because the system administrator only needs to find out which the last matched log config is.

For example, for the 2 log files /a-b.log, /a-c.log and the following logrotate config:

daily
rotate 1

/a-*.log {
    globoverride
    rotate 2
}

/a-b.log {
    weekly
}

If the behavior is adding options, to figure out the options for /a-b.log, one should:

  1. check global options: daily and rotate 1
  2. see the first log config, find out it matches the pattern: change rotate 1 to rotate 2
  3. now it's daily and rotate 2
  4. see the second log config, find out it matches the pattern again: change daily to weekly
  5. it's finally weekly and rotate 2

If the behavior is completely overriding options, to figure out the options for /a-b.log, one can just:

  1. check global options: daily and rotate 1
  2. find out the last log config matches the pattern, which is the second one: weekly
  3. put them together: weekly and rotate 1

In conclusion, I think the add would increase too much complexity and I suggest the completely override behavior. Thanks.

@kdudka
Copy link
Member

kdudka commented Feb 27, 2018

Let's keep it simple. Currently it works like this:

  • All logrotate config files are read in a specified order.
  • Global options take effect on all log files.
  • Local options take effect on log files matching the given glob.
  • Local options override global ones.
  • Later options override earlier ones.

We only need to (optionally) change one detail:

  • A single log file can be matched by multiple globs.

Of course, you will be able to completely override previously parsed daily by weekly because they are mutually exclusive. On the other hand, I do not think that nomail should invalidate already read weekly.

By added complexity you mean complexity of the implementation?

As for complexity of the configuration interface, I see it exactly opposite. Let's assume we have globoverride enabled globally. If you want to quickly check that ifempty or notifempty is in effect on some logfile, you just grep /etc/logrotate* for ifempty and you will immediately see, which of them apply to the given log file. Checking whether an unrelated nomail directive invalidated a previously accepted ifempty directive would be in my opinion more complex.

@johnlinp
Copy link
Author

By increasing complexity, I mean the complexity of the configuration interface. (Although I guess the complexity of the implementation would increase too)

Another question:

/*.log {
    daily
    globoverride
}

/a-*.log {
    weekly
}

/a-b-*.log {
    monthly
}

What option should /a-b-c.loghave? Is it weekly because the second log config override the first one and the third one cannot override the second one? Or, is it monthly because the second one inherits globoverride from the first one so it can be overriden by the third one?

@kdudka
Copy link
Member

kdudka commented Feb 27, 2018 via email

@johnlinp
Copy link
Author

@kdudka I see. I will start working on it. Thanks!

@johnlinp
Copy link
Author

johnlinp commented Mar 1, 2018

I would like to implement globoverride by the following algorithm in readConfigFile():

  • At reading glob pattern: print "duplicate log entry" error message only when the earlier log entry has no globoverride option.
  • At reading "}": check if the log files overlaps any of the earlier log configs. If any log file in this log config curr_log overlaps an earlier log config earlier_log, then:
    • Create a new empty log config new_log
    • Copy earlier_log to new_log (with copyLogInfo())
    • Use curr_log to override new_log (with to-be-implement overrideLogInfo())
    • Take off the overlapped log files from curr_log and earlier_log and store them in new_log
    • Insert new_log into the global linked list logs

This algorithm make sure every log file will only appear at most once in the global linked list logs.

Any advice will be appreciated. Thanks.

@kdudka
Copy link
Member

kdudka commented Mar 1, 2018

Thanks for the analysis! I will need to look at the current code closer myself. Unfortunately I am too busy this week. Please give me some time to process this. Anyway, logrotate-3.14.0 is going to be released next Friday. So this feature will be targeted for logrotate-3.15.0.

@johnlinp
Copy link
Author

johnlinp commented Mar 1, 2018

By the way, I think the indentation in the code is a mess. The inconsistent of indentation makes it harder to develop. I would like to ask:

  1. Can I make the indentation consistent in my commit?
  2. If yes, do you prefer spaces or tabs?

Thank you.

@Saruspete
Copy link

Hello,

There's way more lines starting with a tab than with a space, so I'd say tabs:

grep -Pc '^[ ]+' *.h *.c
log.h:1
logrotate.h:34
queue.h:109
config.c:159
log.c:2
logrotate.c:422

grep -Pc '^[\t]+' *.h *.c
log.h:0
logrotate.h:2
queue.h:216
config.c:1420
log.c:56
logrotate.c:1703

@johnlinp
Copy link
Author

@Saruspete Sorry for the late reply. Thanks for the statistics. I will use tab to reorganize the indent.

@kdudka
Copy link
Member

kdudka commented Mar 20, 2018

Sorry for being silent on this PR. I have not found a sufficient block of time to have a deeper look at this. As for (re)indentation and coding style changes beyond what is strictly necessary to implement this, those need to be discussed (and eventually pushed) independently of this PR.

The up to now practice was to resemble the indentation and coding style of the surrounding code. It seems to cause problems in long term and it might now be the right time to unify the indentation and/or coding style to some extent.

However, such changes need to be discussed via appropriate channels, which a pull request named "config: ignore duplicated entries" is not.

@johnlinp
Copy link
Author

@kdudka Okay. I will create an independent issue for the indentation problem.

@johnlinp
Copy link
Author

The issue was created at #188.

@Frank-Steiner
Copy link

Hi, I'd just like to ask about the state of this. We need to do major changes to our logrotate config due to the DSGVO and have to overwrite many of the default stuff that opensuse ships. It would make things very easy if we could just add confs to overwrite the handling of certain log files without needing to change the shipped defaults. The current logrotate-master branch still seems to have the old behaviour.
cu,
Frank

@johnlinp
Copy link
Author

@Frank-Steiner Since the code indentation in current code base was kind of messy, I was waiting for #188 to close. If you need this feature, I can try to develop in current code base.

@Frank-Steiner
Copy link

Yes, I understand that you'd like to know how to write the code before you start, so whatever you decide is fine with me. As soon as there's a patch I'll be happy to test it! Thanks!

@kdudka
Copy link
Member

kdudka commented May 24, 2018

Sorry for taking too long to review this PR. I have to prioritize bug fixes over enhancements. This PR changes fundamental parts of logrotate's code, so it needs to be reviewed carefully to make sure we are not introducing any bugs or compatibility issues in the future release of logrotate.

@Saruspete
Copy link

Hello,

I'd really like to see this PR merged.
The original ticket #38 is opened since 2016, and we're waiting its upstream merge to be back-ported to RH7 (customer case 02036032).

Could you see to prioritize it please ?
Thanks a lot !

@johnlinp
Copy link
Author

@Saruspete Sure. I am developing a new patch for the algorithm described in #173 (comment).

@Saruspete
Copy link

Nice @johnlinp thanks for your work ! :)

Having reread the thread, here's what I understood :

  • Original request is to have "most specific rule wins".
  • Current implementation is to skip the full block if any matched file is already handled in another rule. So the winner is the first configuration file to be read. Inclusion is done by alphanumerical sort
  • The globoverride would allow a first-timer rule to explicitely say "I can be overridden". If not, any rule coming afterwards (from sorting order) would generate an error (like currently)
  • If a more generic (wildcard) future rule is the glob with "globoverride", as previous, more specific ones are not flagged "globoverride", that would generate an error.
  • So naming of files is important, as it defines their priority in the processing list (and subsequent duplicates blocks to be skipped). However, packages just provide it with their name.

In the unix world, we are already used to prefix / suffix our configuration files to enforce alphanumeric ordering (and export LC_ALL=C to ensure consistency...). but we either prefix the whole listing with numbers (like in rc.d) or suffix the defaults we want with ".local" (like fail2ban). As packages just push a configuration file matching their program / package name, we're likely to use suffixes to keep things clean.

The suffix is useful so that packages updates don't generate wild ".rpmnew" files (that would be interpreted too), they just replace the packaged one, and reads the user suffixed afterwards.

With that in mind, the rules specified by user (the suffixed ones) are often discovered after the original (packaged) ones.

If we'd want to say "I want this new rule to override the existing / package provided one", acting more like a "OverrideExisting".
If we'd want a "catch all not already matched" rule, we'd may just declare a wildcard glob with "SkipExisting" to say "If you find a duplicate file, skip me, keep using the old one".

Wouldn't that cover more real-world cases ?

@kdudka
Copy link
Member

kdudka commented Oct 2, 2018

I have not looked closely at the implementation yet but let's not over-complicate things. The configuration rules need to be intuitive and as simple as possible. What about the following rules?

  • If globoverride (or whatever the configuration directive will be) is not used, everything continues to work as before.
  • If globoverride is used for certain log files (either globally or by matched glob), it is possible to incrementally specify options for the affected log files by subsequent glob matches, possibly overriding any previously specified options.
  • The order in which configuration files are read stays unchanged.
  • The order in which glob results are processed stays unchanged (and by no means affect what is overridden and what not).

Am I missing anything?

@johnlinp
Copy link
Author

johnlinp commented Oct 8, 2018

@kdudka I found that the algorithm described in my previous comment cannot be easily done because the function overrideLogInfo() is impossible to be implemented with current struct logInfo definition. The reason overrideLogInfo() cannot be implemented is because struct logInfo contains lots of fields that cannot be determined if they are set by default value or set by config file.

For example, the field threshold defaults to be 1024 * 1024. And I have 2 log info:

  • log_1: has threshold of 512 * 1024
  • log_2: has threshold of 1024 * 1024

When I want to override log_1 with log_2, I have to determine whether log_2.threshold is a default value or set by user config file. If it's set by user config file, I should use it to override log_1.threshold. Otherwise (it's set by default) I should not use it.

However, current definition of struct logInfo does not contain such information about whether a field is set by default or set by user config file, so I am not able to implement the overrideLogInfo() function. May I add such information in struct logInfo? Will it change too much code for you?

@kdudka
Copy link
Member

kdudka commented Oct 8, 2018

I feel like I am repeating myself but why do we need all the complexity? Cannot we just go through all config files and apply options incrementally as we read them?

@johnlinp
Copy link
Author

johnlinp commented Oct 9, 2018

@kdudka Since a log config may override multiple log configs, it can produce multiple struct logInfos. Therefore, if you want to go through all config files and apply options incrementally as we read them, we should update a list of struct logInfo instances instead of only one (i.e. newlog in current implementation). Do you think this is a better way to do it? I was worried that the error handling might be complicated.

@kdudka
Copy link
Member

kdudka commented Oct 24, 2018

@johnlinp I had finally time to look at the code. I see what you are talking about -- your questions are pretty valid. However, there is one more question that needs to be discussed before we proceed with the implementation: What should we do with sharedscripts?

As it works now, each configuration section is associated with a set of globs and the actual set of files produced by expansion of the globs. If a prerotate/postrotate script runs with sharedscripts, it gets a line-separated list of globs as its first argument. However, if we implement the globoverride feature, there will not always be 1:1 mapping between the sets of globs and sets of files.

If we only update the set of files and keep the original set of globs while overriding a previous configuration, some files might be processed by multiple prerotate/postrotate scripts when sharedscripts is enabled. That is certainly not what the user expects. They should be able to override previously specified prerotate/postrotate script instead. The problem is that the current script interface is simply not ready for that.

Would it make sense to pass line-separated list of files (instead of globs) to prerotate/postrotate scripts when both sharedscripts and globoverride are in effect?

@johnlinp
Copy link
Author

@kdudka I didn't notice there was a sharedscripts feature. Should we move the discussion for sharedscripts to the issue #38?

@kdudka
Copy link
Member

kdudka commented Oct 25, 2018

Good idea, moved: #38 (comment)

@johnlinp
Copy link
Author

Accoding to the discussion in #38, I'm closing thie PR since I won't spend more time on it :(

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

allow several matches for one log file, most specific wins

5 participants