Concept for the Codeberg CI #78
Codeberg/Community#78
So, I'm basically planning to build a CI which is integrated with Gitea and can handle all limitations and requirements of Codeberg. This is the proposed concept, I'd be glad to get a lot of feedback!
Requirements
The plan
→ many repositories already have a `Dockerfile`, so they would build automatically. If that's not wanted for a repository, a `.ci` file would override the `Dockerfile`; an empty `.ci` file would basically disable the CI for this repository.

The CI would use `POST /repos/{owner}/{repo}/statuses/{sha}` to set the build status of a commit.

Annotations could look like this:
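A hypothetical sketch, assuming annotations live in Dockerfile comments (annotation names and syntax are my illustration of the proposal, not a finished spec):

```dockerfile
# A perfectly normal Dockerfile — `docker build` ignores the annotations below.
FROM debian:buster
COPY . /src
WORKDIR /src
RUN make && make test

# CI-only annotations, interpreted by the CI, not by Docker:
# @artifact /src/build/            <- attach this directory to the build output
# @push codeberg.org/momar/example <- push the resulting image after success
```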
The resulting graph would then look like this:
Plugins
We could link to other Dockerfile templates (e.g. using pongo2) to run specific tasks:
Finances
I'd suggest limiting by CPU cores, memory and parallel builds; an example based on the current membership fees, with Hetzner CX21 build servers, could look like this:
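Since the fee example hinges on simple arithmetic, here is a back-of-the-envelope sketch (every number below is an assumption for illustration, not an actual Codeberg or Hetzner figure):

```shell
PRICE_CX21=583           # assumed server price: ~5.83 EUR/month, in cents
BUILDS_PER_SERVER=2      # assumed parallel builds per CX21 (2 vCPUs / 4 GB)
USERS_PER_BUILD_SLOT=10  # assumed number of users sharing one build slot

USERS_PER_SERVER=$((BUILDS_PER_SERVER * USERS_PER_BUILD_SLOT))
COST_PER_USER=$((PRICE_CX21 / USERS_PER_SERVER))
echo "Estimated cost per user: ${COST_PER_USER} cents/month"
```

Under these assumptions, one server would cover 20 users at roughly 29 cents each per month.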
I'm not sure about how wrong my calculations are, and have no overview about our general finances, but it seems like it would be possible to support a CI system like that.
I really don't like the idea of misusing the Dockerfiles for annotations. This could lead to unwanted errors or problems with Docker itself.
I would prefer a separate file like the proposed `.ci` file to configure CI/CD.

Also, I don't really get what you mean by "Automatically deploy & allocate servers on Hetzner"? Why not think about the possibility of hosting our own servers? What if I want to use a different hoster?
Otherwise, the possibility to get a server from codeberg, if I don't want to host my own, sounds like a good idea.
Do you want to build this CI server only for codeberg or should this be a standalone product and you want to provide integration in gitea? Can users with self hosted gitea instances use this CI server?
The `.ci` file would just be another Dockerfile that replaces the main one in CI builds, for repositories which don't want the main `Dockerfile` to specify the CI behaviour. I think using the main `Dockerfile` by default would lead to more people using the CI "by accident", which hopefully leads to an "oh, that just works" moment.

In my opinion, Dockerfiles are a great way to specify what to do when building an application in an isolated way: if they fail, the build failed; if they succeed, the build was successful. I think annotations are required for CI features not possible with only Docker, but you're right that they should not break the normal Docker build process.
Parallel tasks are probably the biggest feature that would break Docker, and maybe were a bit over-engineered by me, so maybe we could leave them out. To keep further compatibility, `@if` could be replaced with `@include <filename> if <condition>`, so it would only apply to CI builds anyway. `@do` is probably unnecessary if there are easy-to-use Docker images for the most common CI tasks (e.g. with `FROM codeberg.org/ci/create-release` and `RUN create-release ...`).

That would leave us with the following annotations:

- `@artifact`: I think it's important that builds can have files attached; it not being executed by Docker wouldn't be a problem.
- `@push`: After the Dockerfile has been built, it will need to be pushed somewhere. We could make that a repository setting, or automatically push to `codeberg.org/<user>/<repo>` as an alternative.
- `@include`: This leaves a possibility to import another file only within the CI when using the Dockerfile directly, or under certain circumstances. Maybe the blocks should be removed though, so only other files could be included.

Codeberg is currently hosted on Hetzner AFAIK, so that would be the simplest way here to provide build resources to users; I'd propose to make it flexible enough so other hosters could be added later, though.
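Taken together, the reduced annotation set might be sketched like this (a guess at the syntax; the `codeberg.org/ci/create-release` helper image is taken from the comment above and does not actually exist):

```dockerfile
# Build stage — a normal multi-stage Docker build.
FROM golang:1.16 AS build
COPY . /src
WORKDIR /src
RUN go build -o /out/app .

# Release stage using the hypothetical helper image mentioned above.
FROM codeberg.org/ci/create-release
COPY --from=build /out/app /out/app
RUN create-release /out/app

# CI-only annotations, embedded as comments so Docker ignores them:
# @include extra-tests.Dockerfile if branch == "main"
# @artifact /out/app
# @push codeberg.org/<user>/<repo>
```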
My idea would be to have a "standalone" server, and a "master" server - the master server wouldn't run any jobs by itself (but would host the build output pages and artifacts), and would then start/stop new servers as needed (or use a fixed set of standalone servers when hosted manually), and distribute the tasks across the servers.
The plan is to keep to the requirements of Codeberg, but in the end it should be compatible with any Gitea instance, or basically any Git server that supports statuses and receive hooks.
> So if I have a normal `Dockerfile` without any of the annotations, the CI will work also?

That's the idea: the annotations are just there to do extra stuff within the CI that Docker itself is not capable of.
@momar : thumbs up, great concept. A few random notes, by no means intended to derail anything:
Most critical issue imo is the UI integration with Gitea; this will probably be the hardest part.
Assuming API calls to launch VMs are nicely wrapped and isolated in a module, various backends should be straightforward to integrate, including VMs on our own servers (libvirt & friends come to mind; there are surely other options?). This seems to be a logical second step.
Docker is nice and popular, but a huge number of large and very popular projects running CI unit tests do not use docker (llvm, gcc, +all compilers, tensorflow, mxnet, +all deep learning frameworks, just to name a few examples -- they are not at Codeberg yet, but we surely want to keep the door open;).
On the other hand, a native script can always invoke Docker. Having the `.ciconfig`, for example, as a simple script (or some format embedding the build+test script) is far more flexible, and even simpler to implement.
@Rinma :
Yes, long-term this is the most economical option, for sure. It also guarantees a fixed cost budget, and a compute pool we can fairly distribute between projects.
Please don't hesitate to contact us if helpful.
What about hosting drone.io and integrating Gitea with it?
The alternative would be Jenkins.
Integrating with Gitea is just adding a "Jenkins" user as one of the members of the project. And Jenkins/drone.io would have its own login page.
The problem is that we'd then need a solution to set up an environment - the CI scripts for some of them might work best under Debian, and for others under Ubuntu. With Docker, they can choose, add the files they need and run the scripts they need with minimal effort:
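A minimal sketch of such an environment (the distribution and script names are examples, not a prescribed layout):

```dockerfile
FROM ubuntu:20.04            # or debian:buster — whichever the project's scripts expect
RUN apt-get update && apt-get install -y build-essential
COPY . /src
WORKDIR /src
RUN ./run-tests.sh           # the project's existing CI script, unmodified
```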
As Docker is an established standard to set up an isolated environment, I think it's perfectly suited as a basis for a CI. Projects that don't want to offer Docker support could still use the example above as a `.ci` file and could skip writing documentation for Docker.

For a native script, we'd have to think about isolation and a lot more (so, does every build get its own VM? How about caching of build steps? What about providing multiple distributions? And so on...)
In my experience, Jenkins needs to be constantly updated to be secure and is mostly suited for single-user/-company deployments. Drone is a great alternative, and I thought a lot about how it could be used with Codeberg, but we'd need to add a possibility to limit resources per user (I think that's not yet possible), and to add/remove servers on demand (which probably somehow would work with Kubernetes). I also miss the possibility to host build artifacts - maybe we could add this to Drone.
So yeah, Drone would probably be the easiest and fastest to set up (although we might need to add some features ourselves), while my idea would be using a more widespread standard (Drone also uses Docker, but additionally requires a pipeline file) and could integrate more tightly with a Codeberg Docker Registry (which I think is also being planned in some issue).
Maybe we could also use Drone, but if there's no .drone.yml but a Dockerfile, use a default .drone.yml:
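Such a generated default could be as small as this (a sketch; `plugins/docker` is Drone's standard Docker plugin, the rest are assumptions):

```yaml
kind: pipeline
type: docker
name: default

steps:
  - name: build
    image: plugins/docker
    settings:
      repo: codeberg.org/${DRONE_REPO}
      dry_run: true   # only build the Dockerfile; pushing would need registry credentials
```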
@varshitbhat :
Technically very appealing, indeed.
And strategically? For a startup the most common outcome is binary: success and getting bought out or gone bust. Both scenarios seem problematic in the future ... even if the source code is public, which developer or community would take it over and continue to maintain it? Isn't an organically grown developer community preferable?
Jenkins is old-fashioned but really great for single projects (have used this at scale and a lot in the past). Unfortunately it lacks the necessary built-in security measures, and as a mature project it might be somewhat hard to add them ex post (running arbitrary code on the build nodes, not necessarily isolated in VMs or containers)?
cc: @kolaente who brought this up elsewhere
@momar :
This is actually pretty similar to docker or lxc containers. For provisioning tools like virt-install the environment is specified by command line arguments: a .ciconfig-parser would, after checking and sanitizing the input, pass the appropriate parameters to the tool.
A VM can be snapshot'd and suspended/resumed. The overhead compared to user-space containers like docker or lxc is mostly the kernel (nowadays still small in relation to build tools etc).
On a second thought, docker containers can run arbitrary scripts as well (commonly exemplified in first-step tutorials), so the practical difference is maybe not that big, and both solutions are probably workable for projects.
What do you think is the best approach for UI-integration with Gitea? Do you have a commit-status-API in mind to render the results, or something else?
IMHO we should avoid building our own CI as much as we can. Drone took a few years to get to the point where it is now, and it has a whole community behind it (nothing against the Codeberg community; it just is a huge effort to make).
I would prefer to have a tighter integration with drone in Gitea (And I think this is the general view on this among Gitea's maintainers) instead of building our own thing.
We, as a global open source community (this includes Codeberg, Drone, Gitea), should stand together and not double-spend resources on things which are already solved in that community by reinventing the wheel. Instead, we should improve the tools we all use in a way everyone can profit from.
I tweeted about this before: https://twitter.com/kolaente/status/1159588122914643968
Ok, so let me start off with a few things. It looks like you've put a ton of thought into this, and please don't feel like I am dismissing this in any way. While I am one of the project leads of Gitea (for verification you can see the email in my profile), I am speaking in a personal capacity right now. I also have contributed some code to Drone (to aid in the integration between the two projects). There are also several others on the Gitea team who contribute to Drone (one even has merge access to the main Drone repo). I also have built an integration for Gitea with another CI (buildkite).
With my background on this subject established, I agree with what @kolaente has said, in that we (Gitea) should focus on the core project; with a small team, building yet another CI is not doable, but we can instead work on better integration with one.
If you (and the Codeberg team) do decide to go ahead, that is still great news. As with Git forges (Gitea, GitLab, Pagure, etc.), there are many, but each has its own project goals. I caution against starting a new open-source project when your goals may already be met by other projects.
Maybe you could even build an interface between Gitea and Drone that parses your custom Docker image and generates a drone file, and also handles the allowed amount of builds, and build minutes per user, and then passes it to drone. I also have a half built integration between docker registry and Gitea that I could open source which may help with hosting docker artifacts (I just need to find time to clean it up, and fix some minor issues).
Some general thoughts about the longevity of Drone: I see Drone the company as already being successful, as it has paying customers that appear to be enough to support all of Drone's development (at least paying Brad a salary). These are personal guesses on Drone's financial stability, and I have no secret insight into anything. I think if Drone gets bought by a company, that will be OK, as there are many community contributors to Drone who can continue its development.
Hm, you're right on the one hand (no need to reinvent the wheel), but it's still unclear to me how we should manage servers and resources with Drone.
We'll probably need a layer anyways between Gitea and whatever CI is being used in the end, which would mean that we could build plain .drone.yml files with Drone, and Dockerfiles with a wrapper around Drone (to keep their UI & Gitea integration).
For server resources, I see 3 possibilities:
I won't get to start working on this until October, so I'll be open to more ideas.
Perhaps you could reach out to Brad as he has experience monitoring Drone use at scale. He supposedly has automated systems that monitor for abuse (lots of cryptominers, and people building custom personal android ROMS). He had to also disable cron tasks.
As for using their own cloud, Drone now has "ssh-runners" which means users could use their own servers for builds. This also means you don't need to hand out your agent secret to potentially untrusted users. I know in docker you can also limit CPU/RAM allocated to each container, maybe Drone has that concept so that users don't consume 100% of resources.
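Docker does have that concept: per-container resource limits are standard CLI flags, so a runner could wrap every build like this (the image and script names are just examples):

```shell
docker run --cpus=1.0 --memory=1g --pids-limit=256 \
  alpine:3.12 sh -c './run-build.sh'
```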
Does anyone know a good solution for connecting a CI service with Codeberg right now? I could really use it.
Yes, Thomas Leister did a great write-up here: https://thomas-leister.de/en/drone-ci-with-codeberg/.
FWIW, Jenkins connects to codeberg out-of-the box too (and every other build service that uses plain GIT SCM polling).
Thank you!
In the short term, I'd suggest to host per-user (or per-organization) controller instances of Drone (or maybe Concourse, which offers a lot more possibilities, but isn't integrated with Gitea as well) - that way, everyone could get his own Codeberg-hosted controller node (e.g. at https://ci.codeberg.org/momar/) and could then configure it with his own worker nodes (which could later be automated through Codeberg and e.g. the Hetzner API: I pay Codeberg 5€/mo, and they create a server with everything I need).
Possibly, we could offer worker nodes (for Drone or Concourse) through something like Firecracker VMs, but the question is how much money we want to budget for this, and how the user experience with that budget would turn out.
Obviously, linking to manuals for setting up Drone or Jenkins would be a great idea too for the start, but I think we should also offer something where we can guarantee a great integration (e.g. with OAuth & build indicators, access rights and maybe a direct link from the repository). Seems like Drone would currently support this the best.
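For reference, such a per-user Drone controller wired to a Gitea/Codeberg OAuth application could be as small as a single container (hostnames and secrets below are placeholders; the environment variables are Drone 1.x's standard Gitea settings):

```yaml
version: "3"
services:
  drone:
    image: drone/drone:1
    ports: ["80:80", "443:443"]
    volumes: ["./data:/data"]
    environment:
      DRONE_GITEA_SERVER: https://codeberg.org
      DRONE_GITEA_CLIENT_ID: <oauth-client-id>
      DRONE_GITEA_CLIENT_SECRET: <oauth-client-secret>
      DRONE_RPC_SECRET: <shared-runner-secret>
      DRONE_SERVER_HOST: ci.codeberg.org
      DRONE_SERVER_PROTO: https
```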
Hm, all this is possible right now (see Thomas Leister's link above and the drone documentation)?
The per-user application+secret setup required for the connection seems annoying, though. Long-term, a tighter integration seems sensible.
Github-actions are really nice in that they support at least Linux/windows/mac.
Are you also considering to support cross-platform development?
Your plan to use Docker hints at something at least heavily centered around Linux...
At least for the time being, it might be worthwhile to focus on integrating 3rd party CIs seamlessly.
@hunger you can run the Drone CI pipeline on native Linux/Windows/Mac if needed.
https://docs.drone.io/runner/exec/overview/
Docker for CI is a terrible choice as it's very limited in terms of functionality: it makes cross-platform tests resource-intensive, because Docker has to be set up to use a virtual machine, which makes it next to unusable due to the amount of processing power required.
Consider https://sr.ht/, which uses KVM by default and allows for tests on specified kernels and distributions.
I was also told that we could use a Jenkins instance, which is what Devuan is doing on their Gitea instance.
FWIW, my recommendation would be to polish the current Gitea triggers so that they can be more fine-grained when it comes to repository events.
For example, to use SourceHut with Gitea, I need to trigger a CI run on a new merge request that has been requested to be reviewed.
DroneCI seems cool!
Related: https://github.com/go-gitea/gitea/issues/13539
I do not have sufficient experience with this, but based on my deployment of drone.io with gitea internally at dayjob, where it works rather nicely to process technical docs (customized publican workflow with asciidoc→docbook→pdf) for the past 8 months, here are my 2c:
If, for example, drone.io was to be used (it has nice integration with Gitea), have a central place for the drone.io "servers", to be run by and at Codeberg. These must run all the time for projects where enabled, but are only there to actually distribute the tasks and run the web interface (show logs...), so these are likely very low-demand servers. As they are exposed on port 443, it would probably require subdomains (user.codeberg.xyz). Now, anything low CPU demand might become high CPU demand based on scale; I lack the bigger [Codeberg] picture here.
Then, have projects/users to use their own runners, where code processing happens (the server address for the runners is part of drone.io docker config). This is not part of gitea settings so it would need to be added, together with some more config info:
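The runner side of that config could look roughly like this (the environment variables are Drone's real docker-runner settings; all values are placeholders):

```shell
docker run -d --name drone-runner \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -e DRONE_RPC_PROTO=https \
  -e DRONE_RPC_HOST=ci.example.org \
  -e DRONE_RPC_SECRET=<shared-runner-secret> \
  -e DRONE_RUNNER_CAPACITY=2 \
  drone/drone-runner-docker:1
```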
I do not know how to spawn a custom on-demand instance at Amazon or Hetzner just for the worker (and then shut it down), which would mean minutes/hours per month rather than a full instance running all the time, but this would make it very cheap and accessible to most projects/users, and these workers absolutely do not need to run all the time (consuming energy and so on), only during commits (master/branches), depending on config. This config/API call would probably also need to be part of something passed to the drone.io server, but I am not sure if drone.io supports this mode of operation.
Maybe what I am describing is what k8s or OpenCloud are solving; excuse my ignorance.
For my testing with a repo at Codeberg itself, I used the cheapest Hetzner CX11 instance, and it works well and is cheap. What I do not know about are the repercussions of running drone.io pretty much out in the open, so I closed it down after testing. If anyone has experience with drone.io, please share :)
I like drone.io, but if we want to offer it as a service, we should be wary of this: https://siliconangle.com/2020/08/05/harness-buys-drone-io-automate-software-delivery/
They seem to plan making some features enterprise features (https://harness.io/pricing/), and some of them would probably be needed for a hosted CI service.
Running a single instance for each user seems a bit complex, I'd much rather see worker orchestration incorporated into the tool.
It seems like the viable other alternatives are Screwdriver.cd, Concourse and Agola, which all incorporate options to select a worker, but none of them seems to do so securely so we can't just give people the right to create their own worker. :/
The easiest option would be to host all builds ourselves (and maybe require application per organization), but that obviously would be the most expensive option.
@momar
Not having to run your own server (as a user) is what I see as the whole point of a convenience like this. Otherwise, wouldn't I just set up a webhook on Codeberg to call my server when a new commit has arrived?
The alternative is what GitLab does - everyone can run a worker node for his project with a single command on his own server, but the CI is integrated tightly into the web UI.
If we want to and can host the build nodes (and I think that's actually planned), that would of course make it a lot easier - we can just have a single Drone instance with some workers and some additional limitations in place to prevent abuse.
Regarding caching:
It seems that you can't cache encrypted traffic usefully.
See: https://www.irif.fr/~jch/software/polipo/
I'm not sure if the Ubuntu and Debian repos are still plain HTTP (without TLS).
But you can configure a Docker image mirror. If our CI is based on Docker, like Drone CI is, that would be useful.
Some links i found in a quick search:
https://docs.docker.com/registry/recipes/mirror/
https://circleci.com/developer/orbs/orb/cci-x/docker-registry-image-cache
More research needed.
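For reference, a pull-through mirror is configured on the Docker daemon side via `/etc/docker/daemon.json` (`registry-mirrors` is the real config key; the URL is a placeholder):

```json
{
  "registry-mirrors": ["https://mirror.example.org"]
}
```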
In relation to docker hub mirror, here is some documentation on how to handle that w/ Drone: https://discourse.drone.io/t/docker-hub-rate-limits-and-drone-cloud-exemption/8293/4
In today's meeting there was discussion around possibly limiting CI access to specific people (possibly just members), as we lack resources to offer unlimited CI for everyone. From a technical standpoint, with Drone we could use the admission controller: https://docs.drone.io/extensions/admission/ to limit who is able to log in to a Drone install. (Note: this is if we go with Drone, although I'm sure other CIs have similar ways to limit access.)
I'm generally for making some new features Member-only first, until we're sure that we can handle the traffic well.
Is there already a sort of testing environment up for Codeberg CI?
No, discussion in concept stage at this point.
OK, I see. I assume that this discussion issue is the place.
As far as I have understood, the Drone system is a favorite, but it has problems with proprietary code inside. Jenkins would be an option, but the user interface is a concern. Docker alone would be too limited. GitLab CI is the gold standard. Have I forgotten any option?
Also I am wondering, what the goal is (for the moment).
It could be:
Place a configuration file inside the repository, and then automatic build is activated for any tagged commit (maybe use a pattern for tags). The resulting files (binary packages .deb, .apk etc.) are automatically generated and, if successfully built, deployed into the releases section of that repository.
Additionally it would be beneficial if tests can be performed on every commit to indicate if the compile process would fail or not with that commit.
We see problems with:
I am not a fan of reinventing the wheel, but I like the principle "keep it as simple as possible". Maybe the gold standard cannot be reached at the moment, but what about just a small step in that direction? Docker can do for simple projects: compose a .pdf from .asciidoc, or compile a small command-line tool for Linux. This could already be useful and has the charming advantage (over all other solutions) that it is stupid simple.
The goal should basically be, as you said:

> Place a configuration file inside the repository, and then automatic build is activated for any branch, PR or tagged commit.

Creating a release should be a possible instruction in the configuration file (e.g. through tea), not necessarily an inherent feature of the CI.

I have recently started a project at https://codeberg.org/momar/codeberg-ci, summarizing the ideas from this issue. It's using many existing technologies, but would still require lots of implementation ourselves. As a result, it would support:
Docker has a similar problem - it requires doing lots of stuff ourselves, and then the effort for Docker and libvirt is pretty much similar.
OK, great. If you need someone to test it, I volunteer (with my project X11-Basic https://codeberg.org/kollo/X11Basic). I already have created a Dockerfile which automates the build process for debian/jessie.
Why was this issue closed?
Because I couldn't see any big use in keeping the discussion open. We had the CI/CD feedback discussion and this one, now we're about to realize something.
Most significant ideas from this discussion made it into https://codeberg.org/momar/codeberg-ci (with issues, too), so if you need to comment on something, feel free to do it there (or here, no problem, but we're probably keeping issues open for too long, which gets pretty annoying at times, so it was closed).
Please also note that this issue is 2 years old, and while there are many valid ideas, the original idea is not quite what we are about to work on now. So rather than requiring a user to read through a lot of discussion that led us somewhere, it might be a good idea to have a fresh start at some time (like with the CI/CD survey).
Also, some Codeberg members are now working with the Drone fork Woodpecker (and are also creating PRs and so on), which means that the work is currently focused there. It looks like the easiest way right now to go ahead with it, and some ideas from my CI repo might make it into Woodpecker.
As mentioned somewhere else, it would be nice to have Codeberg-related repos hosted in the official `codeberg` org rather than personal repos; it would make them easier to find, as even team members don't seem to know about them.
we don't have to put any user-contributed stuff under the codeberg org IMO, especially if it's not a collaborative effort (yet). There are more repos which have been superseded by other ones, there is a lot of abandoned stuff that might never get into production ...
And everything more people are working on // which got an official priority is in the Codeberg Org AFAIK.