Conversation
Codecov Report

```
@@            Coverage Diff            @@
##           master    #1512     +/-  ##
=========================================
+ Coverage    53.96%   54.06%   +0.1%
=========================================
  Files          106      108      +2
  Lines        18411    18489     +78
=========================================
+ Hits          9936     9997     +61
- Misses        7269     7274      +5
- Partials      1206     1218     +12
```

Continue to review full report at Codecov.
Force-pushed 2f30529 to 8926c5e
manager/scheduler/nodeset.go (outdated)
```go
switch {
case len(descriptor) > len(nodeLabelPrefix) && strings.EqualFold(descriptor[:len(nodeLabelPrefix)], nodeLabelPrefix):
	if node.Spec.Annotations.Labels != nil {
		value = node.Spec.Annotations.Labels[descriptor[len(nodeLabelPrefix):]]
```
IIRC, Docker engine allows labels with an empty string value. That wouldn't work in this case.
In this code, an empty value would be treated as one of the values of the label. The alternative is to handle these specially somehow, but it would make the code more complicated. Any thoughts?
One way is to keep the empty value as a branch from a node (no change needed). Another way is to reject empty values and log a message about the invalid preference label.
I agree that both of those options could make sense. I disagree about logging in this case, because it would cause an enormous amount of log output unless you make sure that every node has every label that you are selecting on in any service.
I guess the question is what behavior we want. In the current code, an empty value is weighted equally with any of the other values. One advantage of doing it this way is that tasks don't become unschedulable if there are no nodes available with the labels set (you could always use constraints to limit the tasks to those nodes if you really want to).
The other approach would be to only allow scheduling the tasks on nodes that have the labels set, when a service uses those labels in a scheduling preference. This might be closer to what the user had in mind, but it's easier to end up in a bad situation (nothing can be scheduled), and hard to understand why that happens.
it would cause an enormous amount of log output unless you make sure that every node has every label that you are selecting on in any service.
I agree. If we take that approach, it should be done outside of the for loop, in a label validation procedure.
I think keeping the empty value makes things simpler for now. We can wait and see if it causes problems.
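For context, the reason an empty label value falls out naturally here is Go's map semantics: indexing a missing key (or a nil map) returns the zero value `""`, so a label explicitly set to `""` and an absent label behave identically. A minimal sketch of that behavior (hypothetical helper name, simplified from the snippet above, not the actual nodeset.go code):

```go
package main

import (
	"fmt"
	"strings"
)

const nodeLabelPrefix = "node.labels."

// lookupLabel is a simplified, hypothetical version of the switch above:
// a missing label and a label explicitly set to "" both yield the empty
// string, so both land on the same "" branch of the decision tree.
func lookupLabel(labels map[string]string, descriptor string) string {
	if len(descriptor) > len(nodeLabelPrefix) &&
		strings.EqualFold(descriptor[:len(nodeLabelPrefix)], nodeLabelPrefix) {
		// Indexing a nil map or a missing key returns the zero value "".
		return labels[descriptor[len(nodeLabelPrefix):]]
	}
	return ""
}

func main() {
	withEmpty := map[string]string{"az": ""}
	// An empty-valued label and a missing label are indistinguishable here.
	fmt.Println(lookupLabel(withEmpty, "node.labels.az") == lookupLabel(nil, "node.labels.az")) // true
}
```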
manager/scheduler/nodeset.go (outdated)
```go
	if node.Spec.Annotations.Labels != nil {
		value = node.Spec.Annotations.Labels[descriptor[len(nodeLabelPrefix):]]
	}
case len(descriptor) > len(engineLabelPrefix) && strings.EqualFold(descriptor[:len(engineLabelPrefix)], engineLabelPrefix):
```
This allows node.labels.az=az1 and engine.labels.rack=rack1 to work together to form an az/rack topology. It might make errors harder to debug.
I think this configuration should be supported. Any ideas on making it easier to debug?
In that case we would need to ask the user to provide both the node.labels. set and the engine.labels. set. I think that's ok.
```go
// TODO(aaronl): Support other items from constraint
// syntax like node ID, hostname, os/arch, etc?
default:
	continue
```
If some nodes do not have all the preference labels (for example, node1 has node.labels.az=az1 and node.labels.rack=rack1, and node2 has only node.labels.rack=rack1), the path from root to node2 would have one less level. I think the algorithm handles it. It may be good to add a comment here.
The path from root to node2 would have one less level.
I don't think that's right. The levels of the tree are determined by the scheduling preferences, not the nodes themselves. If a node is missing a particular label, that level of the tree just has "" for the value, but there are the same number of levels for all nodes.
I'll add a comment to clarify this.
You are right. It's an empty value on the node.
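The fixed-depth behavior described above can be sketched with a toy tree builder (hypothetical types and names, not the actual nodeset.go code): the tree depth equals the number of scheduling preferences, and a node missing a label simply lands under the `""` branch at that level.

```go
package main

import "fmt"

// treeNode is a hypothetical, simplified stand-in for the scheduler's
// decision tree: one level per scheduling preference, keyed by label value.
type treeNode struct {
	children map[string]*treeNode
	nodeIDs  []string
}

func newTreeNode() *treeNode {
	return &treeNode{children: map[string]*treeNode{}}
}

// insertNode walks one level per preference. A missing label yields "",
// so every node ends up at the same depth regardless of its labels.
func insertNode(root *treeNode, nodeID string, labels map[string]string, prefs []string) {
	cur := root
	for _, p := range prefs {
		v := labels[p] // "" when the label is missing
		child, ok := cur.children[v]
		if !ok {
			child = newTreeNode()
			cur.children[v] = child
		}
		cur = child
	}
	cur.nodeIDs = append(cur.nodeIDs, nodeID)
}

func main() {
	prefs := []string{"az", "rack"}
	root := newTreeNode()
	insertNode(root, "node1", map[string]string{"az": "az1", "rack": "rack1"}, prefs)
	insertNode(root, "node2", map[string]string{"rack": "rack1"}, prefs) // no az label
	// node2 still sits at depth 2, under the "" az branch.
	fmt.Println(len(root.children["az1"].children["rack1"].nodeIDs)) // 1
	fmt.Println(len(root.children[""].children["rack1"].nodeIDs))    // 1
}
```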
Force-pushed 8926c5e to c87f4f0

LGTM.
Force-pushed c87f4f0 to 6d18096

Rebased on top of #1550, and made some changes to apply that optimization inside this code. Now the leaf nodes of the tree contain heaps that store up to …
Force-pushed 6d18096 to 2358b27
Force-pushed 2358b27 to 19fad95
Force-pushed 19fad95 to f18d016

Rebased.
Force-pushed f18d016 to 9cc4c4b
```go
// so on.
for dt.nodeHeap.Len() > 0 {
	heap.Pop(&dt.nodeHeap)
}
```
I feel like the heap structure is irrelevant now, because the current state of the heap cannot be used directly. On every pass we will sort all the nodes in this leaf node, and heapsort is just one sorting implementation.

I haven't checked the other places using the heap. If they're similar, should we change the data structure to a slice? How to implement the sorting is a separate issue and we can decouple them.
It's useful for generating the original list of nodes. Originally this PR made a slice with all the nodes and used quickselect, but @LK4D4 suggested using a heap instead. It's a little more efficient because the heap only needs to contain up to n nodes at any one time, whereas the slice needs to start with all the nodes. Once we've built the tree, we could switch to a slice, but I kept using the heap for sorting for consistency.
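The bounded-heap idea can be illustrated with a toy sketch (plain ints standing in for node scores; hypothetical names, not the actual nodeHeap implementation): a max-heap capped at n entries yields the n lowest scores while never holding more than n items at once.

```go
package main

import (
	"container/heap"
	"fmt"
	"sort"
)

// maxHeap implements heap.Interface as a max-heap, so the root is always
// the worst of the current best-n candidates and can be replaced cheaply.
type maxHeap []int

func (h maxHeap) Len() int            { return len(h) }
func (h maxHeap) Less(i, j int) bool  { return h[i] > h[j] } // max-heap ordering
func (h maxHeap) Swap(i, j int)       { h[i], h[j] = h[j], h[i] }
func (h *maxHeap) Push(x interface{}) { *h = append(*h, x.(int)) }
func (h *maxHeap) Pop() interface{} {
	old := *h
	x := old[len(old)-1]
	*h = old[:len(old)-1]
	return x
}

// bestN returns the n lowest scores; the heap never exceeds n entries,
// unlike a slice that would have to start out holding every score.
func bestN(scores []int, n int) []int {
	h := &maxHeap{}
	for _, s := range scores {
		if h.Len() < n {
			heap.Push(h, s)
		} else if s < (*h)[0] {
			// Better than the current worst of the best n: replace the root.
			(*h)[0] = s
			heap.Fix(h, 0)
		}
	}
	out := append([]int(nil), (*h)...)
	sort.Ints(out) // final ordering, analogous to draining the heap
	return out
}

func main() {
	fmt.Println(bestN([]int{7, 2, 9, 4, 1, 8}, 3)) // [1 2 4]
}
```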
Force-pushed 9cc4c4b to d42beda

Rebased.

Need rebase.
Force-pushed d42beda to 5b7f29f

Rebased, thanks.

LGTM

LGTM. When will this PR be merged?

I think this needs a rebase though. @aaronlehmann
Force-pushed 5b7f29f to 3dcc0d7

Rebased. This didn't make it into Docker 1.13. Now that 1.14 development is happening, we can try to move it forward. The main thing we need to do is finalize the design, including the protobuf types. I'll work with @aluzzardi on this.
Force-pushed 3dcc0d7 to 43cc0d3

LGTM
Force-pushed 43cc0d3 to 5c6d877
Force-pushed 5c6d877 to 67c0751

Rebased again. @aluzzardi @thaJeztah: Any thoughts on what the CLI for this should look like? The draft proposal had: … This would allow some future expansion (i.e. …).

An alternative is to have two levels of nesting with different separators, for example: … I don't like this much though. It feels very hard to read, and …
Signed-off-by: Aaron Lehmann <aaron.lehmann@docker.com>
Signed-off-by: Aaron Lehmann <aaron.lehmann@docker.com>
Force-pushed 67c0751 to d3b1cd8

ping @aluzzardi @thaJeztah @dnephin
I think the most explicit version would be to repeat the flag. The order of flags on the command line is always significant. It is quite verbose, but I don't see any way around that. I think most …
Thanks for the input @dnephin. I like the simplicity of this solution. I have two concerns though: …
Possibly. There is definitely precedence for the order of flags being significant.

True, this is something we already do for any flag that populates a list or mapping. In …
I think you meant precedent, but that's a nice pun ;).
Do you have any thoughts on using …?
I would much rather stick with …
Hm, okay, but I still think this case is a bit different from DNS servers or mounts. For those, the order is usually not of primary importance, but with these "scheduling preferences" the order is crucial to what they do ("first spread tasks over the datacenters, then over the racks within the datacenters"). And what do you think about making the flag for …? Another thought: using …
Ya, I don't have any idea how to deal with that. All of this is much easier with a Compose file.
Opened moby/moby#30725 for the Docker parts.

Regarding the flags: …

Regarding the syntax, I think it would be useful to sketch out what it would look like in Compose and work our way down to the flags. @dnephin, could you help out on this? What's the most Compose-y way of representing those placement preferences?
@aaronlehmann is it fair to say that our default (and always implicitly appended) placement preference is …?
Design, protos & code LGTM. We need to work out the flags though. @aaronlehmann Since the flags are …
Yes, and I don't see why it couldn't be overridden in the future. If we add support for "pack", then packing on …
Hm, not sure where swarmctl comes in. I haven't implemented the flags for swarmctl, but I did implement them in moby/moby#30725 today based on the thread with @dnephin. But yes, I think we should go ahead and merge this, since the only area that still needs work is the flags, and this PR doesn't include flags.
I think the Compose format would be something like this:

```yaml
deploy:
  placement:
    preferences:
      - spread,engine.labels.dc
      - spread,engine.labels.rack
```

If at some point there are more types, and they take extra properties, we could add an expanded form:

```yaml
deploy:
  placement:
    preferences:
      - type: spread
        value: engine.labels.dc
        weight: 4
```
I'm not a huge fan of the comma. It's not just any field; we're spreading based on the right-hand side. How about …? Sorry for the post-merge comment.
No problem at all about continuing to comment on this. No syntax is implemented in this PR and it's just a related discussion that's going on to inform the Docker PR.
Previously, @thaJeztah mentioned a preference for … I think … Any thoughts? I can update the Docker PR (moby/moby#30725) to use …
Mapping to @dnephin's comment about short/long versions, I was thinking something like:

k/v options: …

porcelain: …

but maybe that's overkill, and we should go with the simpler approach you suggested: … In which case, @dnephin's compose snippets would look like:

```yaml
deploy:
  placement:
    preferences:
      - spread: engine.labels.dc
      - spread: engine.labels.rack
```

```yaml
deploy:
  placement:
    preferences:
      - spread: engine.labels.dc
        weight: 4
```
@aluzzardi "placement" is not supported in Compose file format v3.2, which is included in 17.04.0-ce-rc2. Is there any plan to support it? Thanks a lot.
This implements the proposal in #1473. Currently it's a non-randomized approach, but could be adapted to add random noise to the decisions it makes if that's the direction we decide we want to go.
There are two unit tests that validate this.
cc @dongluochen @aluzzardi