Skip to content

Ed's pet PR with no flake retries#17831

Draft
edsantiago wants to merge 7 commits intocontainers:mainfrom
edsantiago:ci_sqlite
Draft

Ed's pet PR with no flake retries#17831
edsantiago wants to merge 7 commits intocontainers:mainfrom
edsantiago:ci_sqlite

Conversation

@edsantiago
Copy link
Copy Markdown
Member

This should make system tests work with the requested DB

Signed-off-by: Ed Santiago santiago@redhat.com

None

@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci bot commented Mar 17, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: edsantiago

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 17, 2023
@edsantiago
Copy link
Copy Markdown
Member Author

Debian Build barfing with:

time="2023-03-17T14:06:47Z"
    level=error
    msg="reading system config \"/etc/containers/containers.conf\":
       decode configuration /etc/containers/containers.conf: toml: line 4: Key 'engine' has already been defined."

...but all other Build tasks (Fedora) pass. What is different about Debian TOML parser? And if Debian is different in this regard, what other Debian differences are there, and maybe one of those explains #17607?

@edsantiago
Copy link
Copy Markdown
Member Author

Oh, never mind; this is probably because Debian gets 'engine ... = runc'. TOML is annoying.

@edsantiago edsantiago force-pushed the ci_sqlite branch 2 times, most recently from 75d88a8 to d2c2cf4 Compare March 17, 2023 16:01
@edsantiago edsantiago marked this pull request as draft March 17, 2023 16:01
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 17, 2023
@edsantiago
Copy link
Copy Markdown
Member Author

It works! It works! Lots of failures:

Error: adding container id to database: database is locked

...but these happen in int tests, too, except that those get retried, these don't. @mheon PTAL.

@edsantiago
Copy link
Copy Markdown
Member Author

Oh, and @vrothberg, this minimizes the need for an envariable

@mheon
Copy link
Copy Markdown
Member

mheon commented Mar 17, 2023

The bash here scares me, but LGTM

@mheon
Copy link
Copy Markdown
Member

mheon commented Mar 17, 2023

# Error: adding container id to database: database is locked

That's new

@edsantiago
Copy link
Copy Markdown
Member Author

@mheon no, what I mean is, what do we do with this? Obviously we can't enable it, because sqlite isn't ready. Are these failures helpful to you? (And no, "database is locked" is not new, look at int tests logs for any PR this week. It's all over the place, just, int tests get retried so these are not usually hard failures.)

@mheon
Copy link
Copy Markdown
Member

mheon commented Mar 17, 2023

@edsantiago My understanding was that it is ready, and all tests should be passing. The fact that we're flaking frequently is news to me, but I haven't been spending much time in PRs this week.

@mheon
Copy link
Copy Markdown
Member

mheon commented Mar 17, 2023

Per SQLite docs it looks like internal concurrency issue across the same DB connection, which I thought their locking would handle... Looking further.

@edsantiago
Copy link
Copy Markdown
Member Author

Just looking at the very first int log I could find today, here ya go, search in-page for "database is locked". (Weird, though, those tests pass. The "UNIQUE constraint failed" errors are hard though. I assume you know all about those.

@mheon
Copy link
Copy Markdown
Member

mheon commented Mar 17, 2023

We have some pretty clear concurrency issues on display here. The UNIQUE constraint bits look like a DB creation race that I thought would be impossible, but should be fairly easy to code around. The DB locking one is far more problematic. @vrothberg From my reading here, we may need to start using separate DB connections within the same Podman process for each access, or otherwise introduce some sort of locking to prevent more than 1 access to the DB at a time, because the locking on the connection doesn't seem to be sufficient to prevent this sort of thing.

@edsantiago
Copy link
Copy Markdown
Member Author

I'm sorry. These errors are everywhere, constant, so I genuinely thought they were part of the WIP sqlite migration. Didn't even bother to file flakes for them. For the locking, would transactions help?

@mheon
Copy link
Copy Markdown
Member

mheon commented Mar 17, 2023

The issue appears to be that SQLite is allowing a write transaction to fire at the same time as another thread is reading the same table, resulting in errors. I don't think we want to wrap all reads in transactions for perf reasons, unfortunately.

The confusing bit is that this seems to contradict the sqlite docs, which say that by default all accesses through a single connection are sequential only. Simplest fix might just be added a mutex to control access and ensure that remains true.

Copy link
Copy Markdown
Member

@Luap99 Luap99 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing against your perl script but maybe it is not very well know: containers.conf supports the common .d config directory layout.
So instead of throwing this all in /etc/containers/containers.conf create a new file in /etc/containers/containers.conf.d/ with this entry. Seems much simpler to me.

@edsantiago
Copy link
Copy Markdown
Member Author

@vrothberg and I spent a while discussing conf options this week, including the .d possibility. The perl script isn't needed for that... but it will be needed at some point for testing containers.conf overrides. I'll follow up on Monday.

@vrothberg
Copy link
Copy Markdown
Member

Using .d should "just work" in CI even when a test sets a custom containers.conf via CONTAINERS_CONF.

@vrothberg
Copy link
Copy Markdown
Member

Using .d should "just work" in CI even when a test sets a custom containers.conf via CONTAINERS_CONF.

Actually not. Whenever CONTAINERS_CONF is set, only this file will be loaded and all system/user configs will be ignored.

@rhatdan
Copy link
Copy Markdown
Member

rhatdan commented Mar 20, 2023

We've been talking about CONTAINERS_CONF_D or something where all of the normal CONTAINERS_CONF processing would be done, but the value of then environment variable would be processed at the end, perhaps it is time to add that.

@vrothberg
Copy link
Copy Markdown
Member

We've been talking about CONTAINERS_CONF_D or something where all of the normal CONTAINERS_CONF processing would be done, but the value of then environment variable would be processed at the end, perhaps it is time to add that.

I am working on that at the moment :^)

edsantiago and others added 7 commits August 19, 2024 05:32
Signed-off-by: Ed Santiago <santiago@redhat.com>
Signed-off-by: Ed Santiago <santiago@redhat.com>
Signed-off-by: Ed Santiago <santiago@redhat.com>
Signed-off-by: Ed Santiago <santiago@redhat.com>
Signed-off-by: Ed Santiago <santiago@redhat.com>
...it's useless clutter.

Signed-off-by: Ed Santiago <santiago@redhat.com>
There is no new version yet but we like to use the new code[1] to debug
a flake[2] in the podman CI. It will not fix it but the new error might
give us a better idea what is going on.

[1] checkpoint-restore/go-criu#175
[2] containers#18856

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. machine release-note-none

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants