Conversation

@sliverc
Contributor

@sliverc sliverc commented Nov 18, 2016

The goal is that while a write happens on the API, reading should still be possible, and multiple processes should be able to run simultaneously.

To accomplish this, the following needed to change:

  • the collection factory needs to be created for each routine (no more global access to it)
  • removal of the read and write mutexes
  • making database changes atomic per resource using batches
  • per-resource locking (see the sketch below)
  • a new task API which supports async calls
  • all lengthy processes are async and handled through the task API
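
A minimal sketch of the per-resource locking idea (purely illustrative; the names `ResourcesManager`, `Lock` and `Unlock` are not the actual identifiers used in this PR):

```go
package task

import (
	"sort"
	"sync"
)

// ResourcesManager hands out locks keyed by resource name, e.g.
// "local_repo/myrepo" or "snapshot/mysnap", so that two tasks touching
// different resources never block each other.
type ResourcesManager struct {
	mu    sync.Mutex
	locks map[string]*sync.Mutex
}

func NewResourcesManager() *ResourcesManager {
	return &ResourcesManager{locks: map[string]*sync.Mutex{}}
}

// Lock acquires the locks for all resources a task declared it needs.
// Sorting gives a deterministic lock order and avoids deadlocks between tasks.
func (r *ResourcesManager) Lock(resources []string) {
	sort.Strings(resources)
	for _, name := range resources {
		r.lockFor(name).Lock()
	}
}

// Unlock releases the locks again once the task has finished.
func (r *ResourcesManager) Unlock(resources []string) {
	for _, name := range resources {
		r.lockFor(name).Unlock()
	}
}

// lockFor lazily creates the mutex for a resource name.
func (r *ResourcesManager) lockFor(name string) *sync.Mutex {
	r.mu.Lock()
	defer r.mu.Unlock()
	if _, ok := r.locks[name]; !ok {
		r.locks[name] = &sync.Mutex{}
	}
	return r.locks[name]
}
```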

Example of how the new task API works:

(1) POST /api/repos/:name/snapshots ...
Snapshot a repository. This returns status code 202 to
indicate that the process has been accepted but is not finished yet.
Additionally, a task id is returned to follow the status of the
running task.

(2) GET /api/tasks/.../output
Get the log of the currently running task

(3) GET /api/tasks/.../wait
Waits until the task is finished
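
For illustration, a rough client-side sketch of that flow (the JSON field name `ID`, the integer task id, and the snapshot request body are assumptions, not taken from the actual implementation; here the client simply waits first and then reads the full output):

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"strings"
)

type task struct {
	ID int `json:"ID"` // assumed field name for the returned task id
}

func main() {
	base := "http://localhost:8080"

	// (1) Trigger the snapshot; the server answers 202 plus a task id.
	resp, err := http.Post(base+"/api/repos/myrepo/snapshots",
		"application/json", strings.NewReader(`{"Name": "myrepo-snap"}`))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	var t task
	if err := json.NewDecoder(resp.Body).Decode(&t); err != nil {
		panic(err)
	}

	// (2) Block until the task has finished.
	wait, err := http.Get(fmt.Sprintf("%s/api/tasks/%d/wait", base, t.ID))
	if err != nil {
		panic(err)
	}
	wait.Body.Close()

	// (3) Fetch the task's output log.
	out, err := http.Get(fmt.Sprintf("%s/api/tasks/%d/output", base, t.ID))
	if err != nil {
		panic(err)
	}
	defer out.Body.Close()
	log, _ := io.ReadAll(out.Body)
	fmt.Println(string(log))
}
```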

This also solves the problem with mirror API integration, as tasks can run in the background (as mentioned in #166).

For mirror API integration based on this PR, see #573.

As this is a lengthy PR, I will be happy to update the documentation as well once its core concept is agreed upon.

Checklist

  • unit-test added (if change is algorithm)
  • functional test added/updated (if change is functional)
  • man page updated (if applicable)
  • bash completion updated (if applicable)
  • documentation updated
  • author name in AUTHORS

@smira
Contributor

smira commented Mar 16, 2017

I think allowing concurrent DB access is not enough - we need to make sure the package pool and published repositories are safe to be used concurrently (which is not the case today); today the DB lock implicitly protects the package pool and published repos.

@sliverc
Contributor Author

sliverc commented Mar 17, 2017

Yes, this alone is not enough. Our approach to this has actually developed over time. Now we do locking on a per-task basis, and each task can define which resources it requires.

I haven't gotten around to updating the PR yet... I will do so when I find time, but if you want to have a look, see the master branch of our fork:
https://github.com/adfinis-forks/aptly/

@smira
Contributor

smira commented Mar 17, 2017

That sounds interesting, and I think locking at the per-resource level might be the way to go here...

@sliverc sliverc changed the title from "Non Locking API" to "Non locking and async API" May 22, 2017
@sliverc
Contributor Author

sliverc commented May 22, 2017

@smira
I have finally gotten around to updating this PR. I have updated the description to better explain what this PR does.

In general, there are two core concepts:

  1. locking per resource
  2. task api for async job support

Let me know what you think.

@sliverc sliverc mentioned this pull request May 22, 2017
@smira
Contributor

smira commented May 22, 2017

Thanks, I'll schedule it for post-1.1.0, as this looks scary enough for the planned June release (a lot of changes are already queued).

@sliverc
Contributor Author

sliverc commented May 23, 2017

I agree, it is probably better to postpone it and polish it a bit more. What I want to note, though, is that we already use this code in production and it has proved to be stable.
If you have any comments on the implementation, let me know so I can have a look.

@sliverc sliverc force-pushed the rm_api_locking branch 2 times, most recently from c05f09b to e46568c Compare July 7, 2017 07:27
@sliverc
Contributor Author

sliverc commented Aug 10, 2017

@smira I have updated the PR so it should cleanly apply to master again. As version 1.1.0 is now released, it would be great if you could comment on this PR with what you think and how we should move forward to possibly integrate it into the next version. I am happy to adjust the code as needed.

@sliverc sliverc force-pushed the rm_api_locking branch 2 times, most recently from 40dc02b to 4a249e5 Compare October 9, 2017 10:35
@sliverc sliverc mentioned this pull request Nov 20, 2017
@sliverc sliverc force-pushed the rm_api_locking branch 3 times, most recently from 5a34d47 to 9d7ef2c Compare December 1, 2017 08:20
@hsitter
Contributor

hsitter commented Apr 30, 2018

I love the idea.

Considering this changes the return values of existing endpoints, I'd say this needs some compatibility adjustments.

Specifically, everything should get an additional new route /api/v2/...; the existing routes in /api/... internally call the new v2 handlers, but instead of returning the task they wait for it to finish and return the result. This means POST /api/repos/:name/snapshots continues to behave the way it does right now in production, while sending to the new v2 endpoint POST /api/v2/repos/:name/snapshots gets you the task id instead. So, old endpoints stay the same, new v2 endpoints switch to tasks.
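
A rough sketch of how such a compatibility shim could look with gin (which aptly's API already uses); `startSnapshotTask` and `waitForTask` are hypothetical helpers standing in for the real task machinery, not names from this PR:

```go
package api

import (
	"net/http"

	"github.com/gin-gonic/gin"
)

// startSnapshotTask stands in for whatever the v2 handler does to enqueue the
// snapshot as a background task and obtain its id (hypothetical helper).
func startSnapshotTask(c *gin.Context) int { return 0 }

// waitForTask stands in for blocking until the task has finished (hypothetical helper).
func waitForTask(id int) (interface{}, error) { return nil, nil }

// New v2 endpoint: return the task id immediately with 202 Accepted.
func apiSnapshotRepoV2(c *gin.Context) {
	c.JSON(http.StatusAccepted, gin.H{"TaskID": startSnapshotTask(c)})
}

// Legacy endpoint: same work internally, but wait for the task and return the
// final result, so existing clients see no behaviour change.
func apiSnapshotRepoLegacy(c *gin.Context) {
	id := startSnapshotTask(c)
	result, err := waitForTask(id)
	if err != nil {
		c.JSON(http.StatusInternalServerError, gin.H{"error": err.Error()})
		return
	}
	c.JSON(http.StatusCreated, result)
}
```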

Food for thought: I am wondering if a websocket isn't the more "modern" and flexible approach to this, i.e. instead of getting a task id the client gets a websocket URI, then the client connects to the socket to get task updates from it. Calling /wait with a super long HTTP GET timeout seems somewhat underwhelming. Also, with a socket it would probably be very easy to implement "streaming" of the task's output log on the client side, whereas right now you either have to GET /output numerous times while the task is running (getting mostly the same data over and over again), or have no output until /wait returns and then get it all at once.
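
Purely as an illustration of that idea, a client-side sketch of streaming task output over a websocket using gorilla/websocket; the /api/tasks/42/ws endpoint is entirely hypothetical:

```go
package main

import (
	"fmt"
	"log"

	"github.com/gorilla/websocket"
)

func main() {
	// Hypothetical endpoint; nothing like this exists in aptly today.
	conn, _, err := websocket.DefaultDialer.Dial("ws://localhost:8080/api/tasks/42/ws", nil)
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	// Each message is assumed to be a chunk of the task's output log;
	// the server would close the connection when the task is done.
	for {
		_, msg, err := conn.ReadMessage()
		if err != nil {
			break
		}
		fmt.Print(string(msg))
	}
}
```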

Taking it a step further, I could imagine this enabling us to put the CLI on top of the API, so you could actually operate a remote aptly instance via your local CLI! Well, one can dream 😸

@sliverc
Contributor Author

sliverc commented Apr 30, 2018

@apachelogger I totally agree that we should make this change in a backwards-compatible fashion, and the way you outlined it should be straightforward.

I love the idea of web sockets... I think, though, that we should make using web sockets on the API only an additional feature and not a requirement by default, so that simple REST client scripts are still possible.

This way we can also get the async API merged and add web sockets in a future release (which would then also let the CLI use the API).

@codecov

codecov bot commented May 17, 2018

Codecov Report

Merging #459 into master will increase coverage by 0.3%.
The diff coverage is 65.66%.


@@            Coverage Diff            @@
##           master     #459     +/-   ##
=========================================
+ Coverage   60.64%   60.94%   +0.3%     
=========================================
  Files          50       54      +4     
  Lines        6276     6422    +146     
=========================================
+ Hits         3806     3914    +108     
- Misses       2033     2081     +48     
+ Partials      437      427     -10
Impacted Files Coverage Δ
deb/changes.go 59.91% <ø> (+0.17%) ⬆️
context/context.go 11.72% <0%> (-0.4%) ⬇️
deb/publish.go 64.45% <100%> (+0.78%) ⬆️
deb/package_collection.go 54.83% <100%> (+3.32%) ⬆️
deb/local.go 92.78% <100%> (+5.65%) ⬆️
task/task.go 100% <100%> (ø)
database/leveldb.go 79.24% <100%> (-2.12%) ⬇️
task/resources.go 28.57% <28.57%> (ø)
task/output.go 51.85% <51.85%> (ø)
task/list.go 75.75% <75.75%> (ø)
... and 7 more


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9a704de...52d4336.

@sliverc sliverc mentioned this pull request May 22, 2018
@sliverc sliverc modified the milestone: 1.4.0 Jun 20, 2018
Oliver Sauder added 7 commits July 6, 2018 15:06
@smira
Contributor

smira commented Jul 29, 2019

I wonder if it could be HTTP/2 Push events in the end.

I want to look into this more, but I don't see why exactly we can't make the async return optional, so that both sync and async options are available. In the end, in a local network environment there's nothing unusual about setting the HTTP response timeout to, say, 10 minutes and waiting for API completion in one call. Long polling and async responses make more sense when the API is not local and we can't control the timeouts on the way to the aptly service endpoint.
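
One way that optional async behaviour could look is a per-request flag; the sketch below uses a hypothetical `_async` query parameter and reuses the hypothetical `startSnapshotTask`/`waitForTask` helpers from the earlier gin sketch, so none of the names are taken from this PR:

```go
// Hypothetical sketch: one handler that serves both modes, depending on
// whether the client asked for async behaviour.
func apiSnapshotRepo(c *gin.Context) {
	id := startSnapshotTask(c) // hypothetical helper, as in the earlier sketch

	// Async mode: hand back the task id right away.
	if c.Query("_async") == "true" { // parameter name is an assumption
		c.JSON(http.StatusAccepted, gin.H{"TaskID": id})
		return
	}

	// Sync mode (default): behave like today and wait for the result,
	// relying on the client/proxy timeouts being long enough.
	result, err := waitForTask(id) // hypothetical helper, as in the earlier sketch
	if err != nil {
		c.JSON(http.StatusInternalServerError, gin.H{"error": err.Error()})
		return
	}
	c.JSON(http.StatusCreated, result)
}
```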

@smira
Contributor

smira commented Jul 29, 2019

I would like to try to see if it's possible to split this into smaller PRs and commit them one by one, with a thorough review for each one.

smira added a commit to smira/aptly-fork that referenced this pull request Aug 1, 2019
This is spin-off of changes from aptly-dev#459.

Transactions are not being used yet, but batches are updated to work
with the new API.

`database/` package was refactored to split abstract interfaces and
implementation via goleveldb. This should make it easier to implement
new database types.
smira added a commit to smira/aptly-fork that referenced this pull request Aug 1, 2019
This is spin-off of changes from aptly-dev#459.

Transactions are not being used yet, but batches are updated to work
with the new API.

`database/` package was refactored to split abstract interfaces and
implementation via goleveldb. This should make it easier to implement
new database types.
smira added a commit to smira/aptly-fork that referenced this pull request Aug 7, 2019
This is spin-off of changes from aptly-dev#459.

Transactions are not being used yet, but batches are updated to work
with the new API.

`database/` package was refactored to split abstract interfaces and
implementation via goleveldb. This should make it easier to implement
new database types.
smira added a commit that referenced this pull request Aug 8, 2019
This is spin-off of changes from #459.

Transactions are not being used yet, but batches are updated to work
with the new API.

`database/` package was refactored to split abstract interfaces and
implementation via goleveldb. This should make it easier to implement
new database types.
smira added a commit to smira/aptly-fork that referenced this pull request Aug 9, 2019
For any action which is multi-step (requires updating more than 1 DB
key), use transaction to make update atomic.

Also pack big chunks of updates (importing packages for importing and
mirror updates) into single transaction to improve aptly performance and
get some isolation.

Note that still layers up (Collections) provide some level of isolation,
so this is going to shine with the future PRs to remove collection
locks.

Spin-off of aptly-dev#459
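
As an illustration of the approach described in that commit, a minimal sketch of making a multi-key update atomic with a goleveldb transaction; the key names and values are made up:

```go
package main

import (
	"log"

	"github.com/syndtr/goleveldb/leveldb"
)

func main() {
	db, err := leveldb.OpenFile("/tmp/aptly-example.db", nil)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Open a transaction so either all keys are updated or none are.
	tr, err := db.OpenTransaction()
	if err != nil {
		log.Fatal(err)
	}

	// Two related keys that must stay consistent with each other.
	if err := tr.Put([]byte("repo/myrepo"), []byte("metadata"), nil); err != nil {
		tr.Discard()
		log.Fatal(err)
	}
	if err := tr.Put([]byte("repo/myrepo/packages"), []byte("package refs"), nil); err != nil {
		tr.Discard()
		log.Fatal(err)
	}

	// Commit makes both writes visible atomically.
	if err := tr.Commit(); err != nil {
		log.Fatal(err)
	}
}
```
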
smira added a commit that referenced this pull request Aug 10, 2019
For any action which is multi-step (requires updating more than 1 DB
key), use transaction to make update atomic.

Also pack big chunks of updates (importing packages for importing and
mirror updates) into single transaction to improve aptly performance and
get some isolation.

Note that still layers up (Collections) provide some level of isolation,
so this is going to shine with the future PRs to remove collection
locks.

Spin-off of #459
smira added a commit to smira/aptly-fork that referenced this pull request Sep 3, 2019
Part of PR aptly-dev#459

This prepares for more methods to be exposed via the API.
sliverc pushed a commit that referenced this pull request Sep 3, 2019
Part of PR #459

This prepares for more methods to be exposed via the API.
@lbolla lbolla closed this Jan 28, 2022