Project: openstack/glance 1a9e0e94c36127d525c41f03c4223c3e0c2eda03
Check first matching rule for protected properties
When using roles to define protected properties, the first matching rule
in the config file should be used to grant/deny access. This change
enforces that behaviour.
References bug 1271426
Change-Id: Id897085f93bcd143ec443f477f666d4cabd77567
(Cherry-picked from b6dd538569ebf0f1580c8e1fadc5e0f8054c9b08)
Conflicts:
glance/common/property_utils.py
glance/tests/etc/property-protections.conf
glance/tests/unit/common/test_property_utils.py
Project: openstack/nova c09a51cf2675d2821f8c8275ed23aaa23aeeca3c Setup destination disk from virt_disk_size When running live-migration --block-migrate on a qcow2-backed VM without a COW image, the destination qcow2 file should be created with the virtual disk size. For raw images, virt_disk_size is set to disk_size to ensure that virt_disk_size is always the size of the disk that should be re-created. Update unit tests to be more strict and check that sizes are correct. Closes-Bug: #1257355 Change-Id: Ie3be46024f06b9f59af92f5e3918a1958386d4f1
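The size-selection rule described in the commit above can be sketched as follows; `destination_disk_size` and the `disk_info` dict are illustrative names, not nova's actual API:

```python
def destination_disk_size(disk_info):
    # Hypothetical per-disk dict: 'type', 'disk_size' (bytes used on
    # disk) and 'virt_disk_size' (guest-visible virtual size).
    if disk_info['type'] == 'raw':
        # Mirror the commit: for raw images virt_disk_size is set to
        # disk_size, so virt_disk_size is always the size of the disk
        # that should be re-created on the destination.
        disk_info['virt_disk_size'] = disk_info['disk_size']
    return disk_info['virt_disk_size']
```

A sparse qcow2 file whose on-disk size is far smaller than its virtual size is thus recreated at the full virtual size on the destination.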
Project: openstack/keystone 73968f733d073f03a6451644997b7fca6e92d041 Remove netifaces requirement netifaces is not required to run the tests, so remove it from the requirements. Related-Bug: #1266513 Change-Id: Ifb3b262f47d629670b06c670353dbe798af4dc03
Project: openstack/keystone 9056b66a2b537fc5da9f42a6ac0ee82384b965b2
list_revoked_tokens sql speedup for havana
This consists of the following 3 patches:
Narrow columns used in list_revoked_tokens sql
Currently the SQL backend lists revoked tokens by selecting all of the
columns, including the massive "extra" column. This places a significant
burden on the client library and wastes resources. We only need the
id/expired columns to satisfy the API call.
In tests this query was several orders of magnitude faster with just two
thousand un-expired revoked tokens.
(cherry picked from commit ab7221246af394f24e47484e822b8dcda37411aa)
Add index to cover revoked token list
The individual expires and valid indexes do not fully cover the most
common query, which is the one that lists revoked tokens.
Because valid is only ever used in conjunction with expires, we do not
need it to have its own index now that there is a covering compound
index for expires and valid.
Note that the expires index is still useful alone for purging old tokens
as we do not filter for valid in that case.
(cherry picked from commit dd2c80c566f20a97a22e0d7d5a514be84772a955)
Remove unused token.valid index
Because valid is only ever used in conjunction with expires, we do not
need it to have its own index now that there is a covering compound
index for expires and valid.
Note that the expires index is still useful alone for purging old tokens
as we do not filter for valid in that case.
(cherry picked from commit 5d8a1a41420aa20d2aa21da6311c9d55b9e373b6)
Change-Id: I04d62b98d5d760a3fbc3c8db61530f7ebccb0a48
Closes-Bug: #1253755
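The effect of the three patches can be sketched with an illustrative schema (not keystone's actual migrations): select only the narrow columns, and let a compound (expires, valid) index cover the query:

```python
import sqlite3

# Illustrative schema: the compound (expires, valid) index covers the
# revoked-token listing, so the standalone "valid" index can be dropped;
# a standalone expires index (not shown) remains useful for purging.
conn = sqlite3.connect(':memory:')
conn.executescript("""
    CREATE TABLE token (
        id      TEXT PRIMARY KEY,
        expires TEXT,
        valid   INTEGER,
        extra   TEXT  -- the massive serialized blob the fix avoids reading
    );
    CREATE INDEX ix_token_expires_valid ON token (expires, valid);
""")

def list_revoked_tokens(conn):
    # Narrowed query: only id/expires are needed to satisfy the API
    # call; the "extra" column is never fetched.
    return conn.execute(
        "SELECT id, expires FROM token WHERE valid = 0"
        " AND expires > datetime('now')").fetchall()
```

Fetching two small columns instead of whole rows is what produced the orders-of-magnitude speedup the commit describes.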
Project: openstack/ceilometer 6524bf3834025d6368b6cd5e4cb0c265bd5fad02 Replace mongo aggregation with plain ol' map-reduce Fixes bug 1262571 Previously, the mongodb storage driver ran an aggregation pipeline over the meter collection in order to construct a list of resources adorned with first & last sample timestamps etc. However, the mongodb aggregation framework performs sorting in-memory, in this case operating over a potentially very large collection. It is also hardcoded to abort any sorts in an aggregation pipeline that will consume more than 10% of physical memory, which is observed in this case. Now, we avoid the aggregation framework altogether and instead use an equivalent map-reduce. Change-Id: Ibef4a95acada411af385ff75ccb36c5724068b59 (cherry picked from commit ba6641afacfc52e7391d2095751ee96d62a64c25)
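A pure-Python sketch of the map-reduce shape the driver moved to (the real change emits JavaScript map/reduce functions to mongodb; all names here are illustrative):

```python
from collections import defaultdict

def resources_first_last(samples):
    # map phase: emit (resource_id, timestamp) for every meter sample;
    # emissions are grouped per key without any global in-memory sort.
    emitted = defaultdict(list)
    for s in samples:
        emitted[s['resource_id']].append(s['timestamp'])
    # reduce phase: fold each key's emissions independently down to the
    # first/last sample timestamps the resource listing needs.
    return {rid: {'first': min(ts), 'last': max(ts)}
            for rid, ts in emitted.items()}
```

Because each key is reduced independently, no step ever needs to sort the whole collection in memory, which is what tripped the aggregation framework's 10% limit.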
Project: openstack/nova a3d861f6fe561faeb87d940ac4112c98e57a4ec1 Fix interface-attach removes existing interfaces from db The following commit 394c693e359ed4f19cc2f7d975b1f9ee5500b7f6 changed allocate_port_for_instance() to only return the ports that were created rather than all of the ports on the instance, which broke the attach-interface code. This patch fixes this issue by moving the sync decorators from allocate_for_instance, allocate_port_for_instance, and deallocate_port_for_instance to _build_instance_nw_info, which is called in all these cases. Closes-bug: #1223859 (cherry picked from commit 1957339df302e2da75e0dbe78b5d566194ab2c08) Conflicts: nova/network/neutronv2/api.py Change-Id: I66eb0c0ab926e0a8d1e2c9cfe1f7fd579ea3aa27
Project: openstack/cinder f02d4fed0c5b3019449dbf8cf81fff1e64337aa1 GlusterFS: Ensure Cinder can write to shares Ensure the Cinder user can write to the GlusterFS share. This is required for snapshot functionality, and means the admin does not have to set this permission manually. Conflicts: cinder/tests/test_glusterfs.py Closes-Bug: #1236966 Change-Id: I4a9ea40df9681ca6931ad6b390aa21b09d6cfec9 (cherry picked from commit 371fa540600b20b97eae389e1f976145866cadae) GlusterFS: Complete snapshot_delete when info doesn't exist The snapshot_delete operation will fail if the snapshot info file doesn't contain a record for the snapshot, or does not exist. This happens in cases such as when snapshot_create fails to commit anything to disk. The driver should allow the manager to delete the snapshot in this case, as there is no action required for the driver to delete anything. Closes-Bug: #1252864 (cherry picked from commit d8a11168c908fe6c6a07fbb30a5bc88a6df6e939) Change-Id: I8686a1be09dbb7984072538bff6c026bb84eeb52
Project: openstack/nova 2c44ed7587703fdc5d2be00a092d7b671982d609 VMware: fix bug when more than one datacenter exists In the case that there was more than one datacenter defined on the VC, spawning an instance would result in an exception. The reason for this was that the nova compute would not set the correct datacenter for the selected datastore. The fix also takes care of the correct folder selection. This too was a result of not selecting the correct folder for the data center. The 'fake' configuration was updated to contain an additional data center with its own datastore. Closes-Bug: #1180044 Closes-Bug: #1214850 Co-authored-by: Shawn Harsock <hartsocks@vmware.com> (cherry picked from commit a25b2ac5f440f7ace4678b21ada6ebf5ce5dff3c) Conflicts: nova/tests/virt/vmwareapi/test_vmwareapi.py nova/virt/vmwareapi/fake.py Change-Id: Ib61811fffcbc80385efc3166c9e366fdaa6432bd
Project: openstack/cinder 240c81d00a49f924e1b9257fee76a7d924246c57 GlusterFS: Use correct base argument when deleting attached snaps When deleting the most recent snapshot, the 'file_to_merge' field which translates into the base= field for libvirt's blockRebase call in Nova must be set depending on whether other snapshots exist. If there are no other snapshots, base = None, which results in libvirt clearing the qcow2 backing file pointer for the active disk image. If there are other snapshots, pass the parent of the file being deleted as the new base file. The snapshot info pointer for the prior base file must also be updated in this case. Closes-Bug: #1262880 (cherry picked from commit 186221779a92002ff9fa13c254710c0abb3803be) Conflicts: cinder/tests/test_glusterfs.py Change-Id: If7bc8259b031d0406346caafb8f688e65a38dba6
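The base-selection logic reads roughly as follows; `base_for_block_rebase` and the list-based backing chain are illustrative, not the driver's actual structures:

```python
def base_for_block_rebase(chain, path_to_delete):
    # `chain` is the qcow2 backing chain ordered oldest -> newest;
    # the file being deleted here is assumed to be the most recent
    # snapshot, as in the commit above.
    idx = chain.index(path_to_delete)
    if idx == 0:
        # No other snapshots exist: base=None makes libvirt clear the
        # backing-file pointer of the active disk image entirely.
        return None
    # Otherwise rebase onto the parent of the deleted file; the snapshot
    # info entry for that prior base must be updated as well.
    return chain[idx - 1]
```

For example, deleting 'snap1' from the chain ['base.qcow2', 'snap1'] rebases onto 'base.qcow2', while deleting the only file in a one-entry chain yields None.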
Project: openstack/ceilometer 28a2307b461a087ac981cba48be0920105a44ff2 Fix the Alarm documentation of Web API V2 Correct the Alarm example on the API documentation to contain a valid Alarm sample and complete the Note section with information about the connection between the type and rules fields of the Alarm. Fixes bug #1245362 Change-Id: I5fccf51b820330595a627fd0001beec2d5f7c6e3
Project: openstack/requirements e0416fa18b50a6ec63aac5de6a13bc69545d4c91 glance requires pyOpenSSL>=0.11 glance uses OpenSSL.crypto.sign() and OpenSSL.crypto.verify(), which are new in pyOpenSSL 0.11. Fix the global requirement first; glance will then use it. Change-Id: Id3b06be8ee203c3d15ccc2d846df0d0d8c4145ea Partial-Bug: #1268966
Project: openstack/horizon 36e0ab56136a2063ce56e7579d13393637ea0e21 Import translations for Havana 2013.2.2 update * Import the latest translations of ~100% completed languages. 12 translated languages are available. * Update POT files (English PO file) This commit is directly proposed to the stable/havana branch because strings differ between the stable/havana and master branches. Change-Id: I117ea214d121d4c70e8f3679c88d0c758c586f99
Project: openstack/neutron e631e89e2bcdf0ef9db25a0262156503dcffaa06 Send DHCP notifications regardless of agent status The Neutron service, when under load, may not be able to process agent heartbeats in a timely fashion. This can result in agents being erroneously considered inactive. Previously, DHCP notifications for which active agents could not be found were silently dropped. This change ensures that notifications for a given network are sent to agents even if those agents do not appear to be active. Additionally, if no enabled dhcp agents can be found for a given network, an error will be logged. Raising an exception might be preferable, but has such a large testing impact that it will be submitted as a separate patch if deemed necessary. Closes-bug: #1192381 (cherry picked from commit 522f9f94681de5903422cfde11b93f5c0e71e532) Change-Id: Id3e639d9cf3d16708fd66a4baebd3fbeeed3dde8
Project: openstack/ceilometer 51328a33246388b2ceabfe4afcbeb9aa83e5f865 Add keystone_authtoken.enforce_token_bind boilerplate Add boilerplate for the following config option: [keystone_authtoken] enforce_token_bind in the ceilometer.conf.sample. Change-Id: I4860ec3774385cc98c2600eb1449d356bc63b408
Project: openstack/nova abacc290caf2de0667b59dd4924e994c26eed712 Avoid deadlock when stringifying NetworkInfo model In the libvirt driver, we log information about the instance we're about to generate XML for, which includes a NetworkInfo model. In reality, this is a NetworkInfoAsyncWrapper which acquires a lock on the model. Since we're in the middle of a log statement, we also hold the logging lock. This happens right after we've fired off an async request to update information about the instance, which first acquired the network model lock and then acquired the logging lock to make a seemingly innocuous log record. The resulting deadlock is fixed by this patch by stringifying the NetworkInfo object before making the log call in the libvirt driver. Change-Id: I81f35197ab7c74daa84fb51fbaa2df28c025da13 Closes-bug: 1276268 (cherry picked from commit 072e4ad20a6a7f4fff3b59eb7ef3f6fef9aa19d1)
Project: openstack/tempest 0dfef932b7c6ed7ceead683678f5de361a3d759e Make test_show_host_detail work with multi host installation Get the host detail only if the host is a compute host, and run the tests for all compute hosts, not just the first one. Closes-Bug: #1230000 Change-Id: I1830e6071c09d1e751048d74bd828d0eeac282f8 (cherry-picked from 699183617eb99449ae778fdd35475d551ab68b15)
Project: openstack-dev/devstack 3059cc97878cbf11f346c4daa813166fa530d57f Disable key injection by default. Change-Id: Ib618da1bd21da09f8855ec4691bff79c4c3b3d9c
Project: openstack/nova 2abcc4e04c9838d1130ef18c0efd697cc8cc6918 Add HEAD api response for test s3 server BucketHandler The current nova test s3 server lacks HEAD for BucketHandler. This becomes an issue for Boto after version 2.25.0 [1], which changed the underlying implementation of the get_bucket method from a GET to a HEAD request. This change fixes the gap by adding a HEAD response to the test s3 server. It also adds cases for testing bucket existence per the Boto documentation's suggestion [2], which apply to Boto both before and after version 2.25.0. [1] http://docs.pythonboto.org/en/latest/releasenotes/v2.25.0.html [2] http://docs.pythonboto.org/en/latest/s3_tut.html#accessing-a-bucket Change-Id: If992efa40f7f36d337d1b9b1f52859aa0ae51874 Closes-bug: #1277790 (cherry picked from commit 033f3776c4bc6d0db14b1b9da7c24e207e9628ab)
Project: openstack/neutron 5f959d76a0969e5d873dbcb6e80c834f4d376a4b Avoid loading policy when processing rpc requests When Neutron server is restarted in an environment where multiple agents are sending rpc requests to Neutron, it causes loading of policy.json before API extensions are loaded. That causes different policy check failures later on. This patch avoids loading policy when creating a Context in the rpc layer. Change-Id: I66212baa937ec1457e0d284b5445de5243a8931f Partial-Bug: 1254555
Project: openstack/heat 4d213f8b082938ec2587b72b5b920ce211200c46 Improve coverage of storing credentials in parser.Stack Currently the trusts path is not directly tested, so add coverage and ensure the correct credentials are stored in each case. Related-Bug: #1247200 Conflicts: heat/tests/test_parser.py (cherry picked from commit 5397e94292dcbf61778bdaf8abdd3c14be728458) Change-Id: I0aa999e01015046946f242a9b52b484522dc6d72
Project: openstack/nova bcdc81319444474bef8c4140d58ec34ab45d8377 Fix `NoopQuotaDriver.get_(project|user)_quotas` format The quota API extension expects `get_project_quotas` and `get_user_quotas` to return a dictionary where the value is another dictionary with a `limit` key. The `DbQuotaDriver` adhered to this spec, but the `NoopQuotaDriver` didn't. This fixes the `NoopQuotaDriver` to return the results in the correct format. Fixes bug 1244842 Change-Id: Iea274dab1c3f10c3cb0a2815f431e15b4d4934b1 (cherry picked from commit 711a12b4029cd1544d26d147d8a67e110e056124)
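The expected shape can be sketched as follows (a minimal illustration of the return format only, not the full driver):

```python
class NoopQuotaDriver:
    def get_project_quotas(self, context, resources, project_id):
        # Each resource must map to a nested dict with a 'limit' key
        # (-1 meaning unlimited), the same shape DbQuotaDriver returns
        # and the quota API extension unpacks.
        return {name: {'limit': -1} for name in resources}
```

That is, `{'instances': {'limit': -1}, 'cores': {'limit': -1}}` rather than bare values, so the extension's `['limit']` lookup succeeds.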
Project: openstack/heat 9279833b8d331392c13a45b563904e18d3a3461e Add coverage for trusts parser.Stack delete path Related-Bug: #1247200 (cherry picked from commit 9904be6febc4acd39fb86afe119aa6427e890b9a) Change-Id: Ic55030be389ac71ec999e08533fa9d5fc05b5bd1
Project: openstack/heat ab5d961efd062662544218f36ae64277d39763fd Catch error deleting trust on stack delete When deleting a stack, it's possible for deleting the trust to fail, for example if the user deleting the stack is not the user who created it, or an admin (which raises a Forbidden error), or due to some other transient error, e.g. the connection to keystone being interrupted. Currently in this case, we fail to mark the stack deleted in the DB and leave the status "DELETE, COMPLETE", which is misleading. Conflicts: heat/tests/test_parser.py Closes-Bug: #1247200 (cherry picked from commit 214ba503757e5bd9bf5a5fab6692c4e94d0536fa) Change-Id: Ie8e9ea48bc4f44e56ff4764123fcca733f5bd458
Project: openstack/glance 1982ca25ec3cb7755e3ea2672dd3140d920a5051
Filter out deleted images from storage usage
All database API's currently include deleted images in the calc of
storage usage. This is not an issue when deleted images don't have
locations. However, there are cases where a deleted image has deleted
locations as well and that causes the current algorithm to count those
locations as if they were allocating space.
Besides this bug, it makes sense to not load deleted / killed /
pending_delete images from the database if we're actually not
considering them as valid images.
The patch also filters out deleted locations.
NOTE: In the case of locations, it was not possible to add a test for
the deleted locations because it requires some changes that are not
worthwhile in this patch. In order to mark a location as deleted, it's
necessary to go through the API and use a PATCH operation. Since this is
a database test, it doesn't make much sense to add API calls to it.
Calling the image_destroy function with an empty location list will
remove all the locations which won't help testing that specific case.
I'll work on a better solution for that in a follow-up patch.
DocImpact:
The patch now excludes deleted images from the count; this fixes a
bug but changes the existing behaviour.
The patch excludes images in pending_delete from the count, although
the space hasn't been freed yet. This may cause the quota to be
exceeded without raising an error until the image is finally deleted
from the store.
Conflicts:
glance/tests/functional/db/test_sqlalchemy.py
Closes-Bug: #1261738
(cherry picked from commit b35728019e0eb89c213eed7bc35a1f062c99dcca)
Change-Id: I82f08a8f522c81541e4f77597c2ba0aeb68556ce
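The corrected accounting described above can be sketched like this; the data shapes are illustrative, not Glance's actual DB models:

```python
def storage_usage(images):
    # Images in these states are no longer considered valid for the
    # usage calculation, per the commit above.
    skip = {'deleted', 'killed', 'pending_delete'}
    total = 0
    for image in images:
        if image['status'] in skip:
            continue
        # Deleted locations no longer count as allocated space either;
        # each remaining location accounts for one copy of the image.
        live_locations = [loc for loc in image['locations']
                          if not loc.get('deleted')]
        total += image['size'] * len(live_locations)
    return total
```

Note the DocImpact caveat applies here too: a pending_delete image contributes nothing even though its bytes may not have been freed yet.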
Project: openstack/tempest 50eaa8c80189d0c938f2dbced70df4d14dc2cdfa Use channel_timeout for SSH connection timeout Occasionally, SSH will get wedged so that a connection attempt is stuck forever. When this happens, we need Tempest to abort the attempt and try again. Currently, the individual connection timeout is set to the overall timeout, so there will only ever be one attempt if this happens. Using the channel_timeout instead will ensure that multiple connection attempts are made even when the connection is wedged. Fixes bug 1236524 Change-Id: Ie8dff41780bbf004cff5c880db202a8ae23a85c1 (cherry picked from commit b20cf3a30d42ed2ce0c34e338edf498258dfd721)
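The timeout split can be sketched generically; the function name, the retry cadence, and the injectable clock/sleep are illustrative, not tempest's code:

```python
import time

def connect_with_retries(connect, attempt_timeout, overall_timeout,
                         clock=time.monotonic, sleep=time.sleep):
    # Each attempt is bounded by attempt_timeout (the channel timeout),
    # so a wedged connection is abandoned and retried; attempts continue
    # until overall_timeout expires, instead of one all-or-nothing try
    # whose per-attempt timeout equals the overall timeout.
    deadline = clock() + overall_timeout
    while True:
        try:
            return connect(timeout=attempt_timeout)
        except Exception:
            if clock() >= deadline:
                raise
            sleep(1)
```

With attempt_timeout much smaller than overall_timeout, a single stuck attempt no longer consumes the whole budget.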
Project: openstack/neutron c7596bf8c952ecc6d73ff50a6ae9ec6384b94cd1 Remove and recreate interface if it already exists If the dhcp-agent machine restarts when openvswitch comes up, it logs the following warning message for every tap interface that does not exist: bridge|WARN|could not open network device tap2cf7dbad-9d (No such device) Once the dhcp-agent starts it recreates the interfaces and re-adds them to the ovs-bridge. Unfortunately, ovs does not reinitialize the interfaces as they are already in ovsdb, and does not assign them an ofport number. This situation corrects itself the next time a port is added to the ovs-bridge, which is why no one has probably noticed this issue till now. In order to correct this we should first remove interfaces that already exist and then re-add them. Closes-bug: #1268762 Change-Id: I4bb0019135ab7fa7cdfa6d5db3bff6eafe22fc85 (cherry picked from commit b78eea6146145793a7c61705a1602cf5e9ac3d3a)
Project: openstack/ceilometer 9149861ea1b2a2abc200e79ab23d5e1fca5af752 Add documentation for pipeline configuration Extend the configuration section of the development documentation of Ceilometer to contain the configuration options of pipelines. Change-Id: Ie9d01b89f7af96ba4a80a9b2f2e9141443cf3ecf Closes-Bug: #1272988
Project: openstack/neutron 711106ab5237bdc05f757a8eeeb063cf13310f0e Multiple Neutron operations using script fails on Brocade Plugin Closes-Bug: 1223754 Change-Id: Ifdeed8407a1cb3df9f17267ea582caab385a63f3
Project: openstack/neutron 927e8a645a20f9d8d9971620c2f7aace3aa294e7 Don't allow qpid receiving thread to die This patch is a partial backport of 22ec8ff616a799085239e3e529daeeefea6366c4 in oslo-incubator. https://review.openstack.org/#/c/32235/13 This patch ensures that the thread created by consume_in_thread() can not be killed off by an unexpected exception. Related-Bug: #1189711 Change-Id: I4370045b450b2b4b9b3bde1f6f3654cdecc722e2
Project: openstack/cinder 4470fdb1e7c379d118ad4e8707b47550ddd78d51 delete.start/delete.end notification for hostless Third party GUIs may rely on notifications to track the progress of volume creation and deletion. In the case that a volume is being deleted after a failed attempt to create (the volume is listed in the database but is not actually resident in a backing store) the path that is taken in volume.api.delete() makes no notifications of deletion occurring. This patch adds a volume_utils.notify_about_volume_usage call to the beginning and end of the delete with a delete.start and delete.end respectively. The notifications serve as triggers for GUIs to refresh the state of the volume. This change makes the hostless delete path's functionality more consistent with the other paths through the delete code. Change-Id: I091b9d277834b341105569d41a48ef5c1fc105ce Closes-Bug: 1257053 (cherry picked from commit a347b99c261dc1c761a8bc51c2aee99d20161ca6)
openstack-gerrit pushed a commit that referenced this pull request on Sep 10, 2015
Project: openstack/api-site 2850130977c6b742bbe921237350ce31aeef596f Add volume attributes description for Block Storage API For v1 #1: Add status attribute for JSON sample #2: Fix snapshot_id, source_volid description for response side. #3: Add attachments non-null sample For v2 Add volume info response attributes Change-Id: Iebe1eb2f12550d0e66bb594468ce6b28c9d3c756 Closes-Bug: #1331246
openstack-gerrit pushed a commit that referenced this pull request on Sep 10, 2015
Project: openstack/api-site 0c1f7cacd19e1175ea583c54f7a566fa1f3a74bd
Compute v2.1 docs clean up (part 7) (security_group_default_rules)
Add os-security-group-default-rules
It is based on the v2 ext file.
Changes are: #1 remove xml file, #2 change /v2 => /v2.1
JSON samples are not edited/changed.
Also ordering is alphabetical
os-security-groups
os-security-group-default-rules <= added
os-security-group-rules
Change-Id: I4c06148fe45b32f1aa936bba14d71cd6328fe439
Partial-Bug: #1488144
openstack-gerrit pushed a commit that referenced this pull request on Sep 10, 2015
Project: openstack/api-site a3df332507cadc3d75f15f43f6de0016424d62ec
Compute v2.1 docs clean up (part 8) (fixed_ips)
Add os-fixed-ips
It is based on the v2 ext file.
Changes are #1 remove link to xml sample file
#2 change /v2 => /v2.1
JSON samples are not edited/changed.
Change-Id: I8da2147514bc4940532953951f3310dc6ba0fef7
Partial-Bug: #1488144
openstack-gerrit pushed a commit that referenced this pull request on Sep 30, 2015
Project: openstack/python-neutronclient 7b4ef5d858e4715bb637f66b8d4efbe11de37bab neutron v2 command module cleanup #1 Purge "body[resource].update({key: value})" pattern and use "body[key] = value" pattern. The purged pattern is a bad convention in neutronclient and I commented not to use it many times but I got tired of it. Change-Id: I2fe0be30d648f59fa45c5951ccc5060c35527aff
openstack-gerrit pushed a commit that referenced this pull request on Oct 3, 2015
Project: openstack/swift c799d4de5296056b06e08d8025488472cfcb7d66
Validate against duplicate device part replica assignment
We should never assign multiple replicas of the same partition to the
same device - our on-disk layout can only support a single replica of a
given part on a single device. We should not do this, so we validate
against it and raise a loud warning if this terrible state is ever
observed after a rebalance.
Unfortunately there are currently a couple of not-uncommon
scenarios which will trigger this observed state today:
1. If we have less devices than replicas
2. If a server or zone's aggregate device weight makes it the most
appropriate candidate for multiple replicas and you're a bit unlucky
Fixing #1 would be easy, we should just not allow that state anymore.
Really we never did - if you have a 3 replica ring with one device - you
have one replica. Everything that iter_nodes'd would de-dupe. We
should just be insisting that you explicitly acknowledge your replica
count with set_replicas.
I have been lost in the abyss for days searching for a general solution
to #2. I'm sure it exists, but I will not have wrestled it to
submission by RC1. In the meantime we can eliminate a great deal of the
luck required simply by refusing to place more than one replica of a
part on a device in assign_parts.
The meat of the change is a small update to the .validate method in
RingBuilder. It basically unrolls a pre-existing (part, replica) loop
so that all the replicas of the part come out in order so that we can
build up the set of dev_id's for which all the replicas of a given part
are assigned part-by-part.
If we observe any duplicates - we raise a warning.
To clean the cobwebs out of the rest of the corner cases we're going to
delay get_required_overload from kicking in until we achieve dispersion,
and a small check was added when selecting a device subtier to validate
if it's already being used - picking any other device in the tier works
out much better. If no other devices are available in the tier - we
raise a warning. A more elegant or optimized solution may exist.
Many unittests did not meet criterion #1, but the fix was
straightforward after being identified by the pigeonhole check.
However, many more tests were affected by #2 - but again the fix came to
be simply adding more devices. The fantasy that all failure domains
contain at least replica count devices is prevalent in both our ring
placement algorithm and its tests. These tests were trying to
demonstrate some complex characteristics of our ring placement algorithm
and I believe we just got a bit too carried away trying to find the
simplest possible example to demonstrate the desirable trait. I think
a better example looks more like a real ring - with many devices in each
server and many servers in each zone - I think more devices makes the
tests better. As much as possible I've tried to maintain the original
intent of the tests - when adding devices I've either spread the weight
out amongst them or added proportional weights to the other tiers.
I added an example straw man test to validate that three devices with
different weights in three different zones won't blow up. Once we can
do that without raising warnings and assigning duplicate device part
replicas - we can add more. And more importantly change the warnings to
errors - because we would much prefer to not do that #$%^ anymore.
Co-Authored-By: Kota Tsuyuzaki <tsuyuzaki.kota@lab.ntt.co.jp>
Related-Bug: #1452431
Change-Id: I592d5b611188670ae842fe3d030aa3b340ac36f9
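The pigeonhole check at the heart of the validation can be sketched like so; `replica2part2dev` is modeled as plain lists of per-replica part-to-device mappings, where the real builder uses arrays:

```python
from collections import Counter

def duplicate_part_devices(replica2part2dev):
    # For each partition, collect the device id each replica assigns it
    # to and flag any device that holds more than one replica of the
    # same part - the state the on-disk layout cannot support.
    warnings = []
    num_parts = len(replica2part2dev[0])
    for part in range(num_parts):
        counts = Counter(row[part] for row in replica2part2dev)
        for dev_id, n in counts.items():
            if n > 1:
                warnings.append((part, dev_id, n))
    return warnings
```

An empty result means every replica of every part landed on a distinct device; any tuple returned corresponds to one of the loud warnings the commit describes.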
openstack-gerrit pushed a commit that referenced this pull request on Dec 10, 2015
Project: openstack/swift 0553d9333ed0045c4d209065b315533a33e5d7d7
Put part-replicas where they go
It's harder than it sounds. There were really three challenges.
Challenge #1 Initial Assignment
===============================
Before starting to assign parts on this new shiny ring you've
constructed, maybe we'll pause for a moment up front and consider the
lay of the land. This process is called the replica_plan.
The replica_plan approach separates part assignment failures into two
modes:
1) we considered the cluster topology and its weights and came up with
the wrong plan
2) we failed to execute on the plan
I failed at both parts plenty of times before I got it this close. I'm
sure a counter example still exists, but when we find it the new helper
methods will let us reason about where things went wrong.
Challenge #2 Fixing Placement
=============================
With a sound plan in hand, it's much easier to fail to execute on it the
less material you have to execute with - so we gather up as many parts
as we can - as long as we think we can find them a better home.
Picking the right parts for gather is a black art - when you notice a
balance is slow it's because it's spending so much time iterating over
replica2part2dev trying to decide just the right parts to gather.
The replica plan can help at least in the gross dispersion collection to
gather up the worst offenders first before considering balance. I think
trying to avoid picking up parts that are stuck to the tier before
falling into a forced grab on anything over parts_wanted helps with
stability generally - but depending on where the parts_wanted are in
relation to the full devices it's pretty easy to pick up something
that'll end up really close to where it started.
I tried to break the gather methods into smaller pieces so it looked
like I knew what I was doing.
Going with a MAXIMUM gather iteration instead of balance (which doesn't
reflect the replica_plan) doesn't seem to be costing me anything - most
of the time the exit condition is either solved or all the parts overly
aggressively locked up on min_part_hours. So far, it mostly seems that
if the thing is going to balance this round it'll get it in the first
couple of shakes.
Challenge #3 Crazy replica2part2dev tables
==========================================
I think there's lots of ways "scars" can build up a ring which can
result in very particular replica2part2dev tables that are physically
difficult to dig out of. It's repairing these scars that will take
multiple rebalances to resolve.
... but at this point ...
... lacking a counter example ...
I've been able to close up all the edge cases I was able to find. It
may not be quick, but progress will be made.
Basically my strategy just required a better understanding of how
previous algorithms were able to *mostly* keep things moving by brute
forcing the whole mess with a bunch of randomness. Then when we detect
our "elegant" careful part selection isn't making progress - we can
fall back to the same old tricks.
Validation
==========
We validate against duplicate part replica assignment after rebalance
and raise an ERROR if we detect more than one replica of a part
assigned to the same device. In order to meet that requirement we have
to have as many devices as replicas, so attempting to rebalance with
too few devices w/o changing your replica_count is also an ERROR, not a
warning.
Random Thoughts
===============
As usual with rings, the test diff can be hard to reason about -
hopefully I've added enough comments to assure future me that these
assertions make sense.
Despite being a large rewrite of a lot of important code, the existing
code is known to have failed us. This change fixes a critical bug
that's trivial to reproduce in a critical component of the system.
There's probably a bunch of error messages and exit status stuff that's
not as helpful as it could be considering the new behaviors.
Change-Id: I1bbe7be38806fc1c8b9181a722933c18a6c76e05
Closes-Bug: #1452431
openstack-gerrit pushed a commit that referenced this pull request on Apr 8, 2016
Project: openstack/nova a9459d3c41fa28dbbdac01d058d4a45e78906f7c remove alembic from requirements.txt Alembic was used in attempt #1 of online schema migrations, however that was reverted in Icae28ceee3ec975c907d73b95babab58dcb30c23 when that approach was dropped. There are no other uses of alembic directly in Nova, so we should not list this requirement. Change-Id: I452bfc8454aedff1bbaffacc99d0845186ba4234
openstack-gerrit pushed a commit that referenced this pull request on Feb 8, 2017
Project: openstack-infra/project-config 2e2d8519e3f9b2d8f9ce1330c6a0f8028c9e7c3f Step #1 - Shutting down Nova-Docker We are asking folks to evaluate Zun which has some of the use cases from nova-docker. Several emails have been sent to -dev@ and -operators@ many times over the last two years. Change-Id: I7adcc29cac151ec55f6cc322a880189e0e827db1
openstack-gerrit pushed a commit that referenced this pull request on Mar 29, 2017
Project: openstack/glance 327682e8528bf4effa6fb16e8cabf744f18a55a1 Fix incompatibilities with WebOb 1.7 WebOb 1.7 changed [0] how request bodies are determined to be readable. Prior to version 1.7, the following is how WebOb determined if a request body is readable: #1 Request method is one of POST, PUT or PATCH #2 ``content_length`` is set #3 Special flag ``webob.is_body_readable`` is set The special flag ``webob.is_body_readable`` was used to signal WebOb to consider a request body readable despite the content length not being set. #1 above is how ``chunked`` Transfer Encoding was supported implicitly in WebOb < 1.7. Now with WebOb 1.7, a request body is considered readable only if ``content_length`` is set and it's non-zero [1]. So, we are only left with #2 and #3 now. This drops implicit support for ``chunked`` Transfer Encoding Glance relied on. Hence, to emulate #1, Glance must set the special flag upon checking the HTTP methods that may have bodies. This is precisely what this patch attempts to do. [0] Pylons/webob#283 [1] https://github.com/Pylons/webob/pull/283/files#diff-706d71e82f473a3b61d95c2c0d833b60R894 Closes-bug: #1657459 Closes-bug: #1657452 Co-Authored-By: Hemanth Makkapati <hemanth.makkapati@rackspace.com> Change-Id: I19f15165a3d664d5f3a361f29ad7000ba2465a85
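The workaround amounts to setting the flag on the WSGI environ for body-carrying methods; this sketch operates on a bare environ dict, and the function name is ours, not Glance's:

```python
def mark_body_readable(environ):
    # Emulate pre-1.7 rule #1: for methods that may carry a body, tell
    # WebOb >= 1.7 the body is readable even without a Content-Length,
    # restoring implicit support for chunked Transfer-Encoding.
    if environ.get('REQUEST_METHOD') in ('POST', 'PUT', 'PATCH'):
        environ['webob.is_body_readable'] = True
    return environ
```

A chunked PUT (no Content-Length) would otherwise be treated by WebOb 1.7 as having no readable body at all.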
openstack-gerrit
pushed a commit
that referenced
this pull request
Apr 20, 2017
Project: openstack/nova ba9a42e1ac68297a86dc1118e889ad66973c7dc6 PowerVM Driver: spawn/delete #1: no-ops Initial change set introducing the PowerVM compute driver. This change set supplies the basic ComputeDriver methods to allow the n-cpu process to start successfully; and no-op spawn & delete methods. Subsequent change sets will build up to the functional spawn & delete support found in https://review.openstack.org/#/c/391288 Change-Id: Ic45bb064f4315ea9e63698a7c0e541c5b0de5051 Partially-Implements: blueprint powervm-nova-compute-driver
openstack-gerrit
pushed a commit
that referenced
this pull request
Apr 25, 2017
Project: openstack/neutron 528ec277c373dd2e7f862cdd5a501cc03be878c4 remove and shim callbacks The callback modules have been available in neutron-lib since commit [1] and are ready for consumption. As the callback registry is implemented with a singleton manager instance, sync complications can arise ensuring all consumers switch to lib's implementation at the same time. Therefore this consumption has been broken down: 1) Shim neutron's callbacks using lib's callback system and remove existing neutron internals related to callbacks (devref, UTs, etc.). 2) Switch all neutron's callback imports over to neutron-lib's. 3) Have all sub-projects using callbacks move their imports over to use neutron-lib's callbacks implementation. 4) Remove the callback shims in neutron-lib once sub-projects are moved over to lib's callbacks. 5) Follow-on patches moving our existing uses of callbacks to the new event payload model provided by neutron-lib.callback.events This patch implements #1 from above, shimming neutron's callbacks and removing devref + UTs. Rather than shimming using debtcollector, this patch leaves callback constants as-is, and simply references the lib class/function in its respective neutron callback module. This allows consumers to test callback types without changing code. For example, an except block block like that below continues to work even though the raised exception now lives in lib:: try: neutron_cb_registry.notify(...) except neutron_cb_exceptions.CallbackFailure: handle_exception() In addition this patch contains minor UT updates to support the shim approach. NeutronLibImpact [1] fea8bb64ba7ff52632c2bd3e3298eaedf623ee4f Change-Id: Ib6baee2aaeb044aaba42a97b35900d75dd43021f
openstack-gerrit
pushed a commit
that referenced
this pull request
Aug 25, 2017
Project: openstack/neutron 5d98e30e5c45e3bda70645c7dfc490d5a9deba76 fix formatting in ubuntu controller install guide The ubuntu controller install guide contains improper indentation and extraneous new lines. As a result the sub-steps for #1 are not shown in this HTML (generated) guide. This one needs to also get back-ported to pike. Change-Id: Ib2b263c8da49ccc8905cbd59331ce6694de232e6 Closes-Bug: #1712107
openstack-gerrit
pushed a commit
that referenced
this pull request
Feb 12, 2019
* Update zun from branch 'master'
- Merge "Pull image from registry"
- Pull image from registry
This commit completes the support of private docker registries.
Users can create a container with images from a specified
docker registry. The steps are as follows:
1. Register a docker registry in Zun (with options to specify
the username/password to authenticate against the registry).
2. Run a container with a reference to the registry created in #1.
Closes-Bug: #1702830
Change-Id: I92f73bf0d759d9e770905debc6f40a5697ef0856
openstack-gerrit
pushed a commit
that referenced
this pull request
Apr 25, 2019
* Update kuryr-kubernetes from branch 'master'
- Set MAC address for VF via netlink message to PF
SR-IOV binding driver uses pyroute2 library to set MAC addresses
to VFs. This is internally implemented via ioctl(SIOCSIFHWADDR)
giving it the name of that device. This is equal to calling
'ip link set dev $VFDEV address $MAC'. However, there is another
way to set MAC address for VF. It works via netlink RTM_SETLINK
message to the PF. This is equal to calling
'ip link set dev $PFDEV vf $VFID mac $MAC'.
How it works:
* ioctl(SIOCSIFHWADDR) asks the VF driver to set the MAC
--> VF driver asks PF to set MAC for it
--> PF sets the MAC for VF.
* RTM_SETLINK message asks the PF to set MAC for VF
--> PF sets the MAC for VF.
In case of setting directly via PF, PF additionally sets an
"administratively changed MAC" flag for that VF in the PF's
driver, and from that point on (until the PF's driver is
reloaded) that VF's MAC address can't be changed using the
method #1.
It's a security feature designed to forbid MAC changing by the
guest OS/app inside the container.
Above leads to the issue where SR-IOV CNI is not able to set MAC
address for VF if its MAC was previously administratively set at
least once (by hands or other software):
ioctl SIOCSIFHWADDR: Cannot assign requested address
kernel: igb 0000:05:00.0:
VF 0 attempted to override administratively set MAC address
Reload the VF driver to resume operations
After that CNI fails the whole transaction, i.e. fails to change
the interface name as well and subsequently fails the binding.
Netlink PF method to change MAC addresses should be used always.
This will additionally forbid the MAC changing from the inside
of container.
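The interaction between the two paths and the "administratively changed MAC" flag can be modelled with a toy class (purely illustrative; the real behaviour lives in the kernel PF/VF drivers):

```python
# Toy model of the two MAC-setting paths described above and of the
# admin-set flag the PF driver keeps per VF.

class PF:
    def __init__(self):
        self.vf_mac = {}
        self.admin_set = set()   # VFs whose MAC was set administratively

    def set_vf_mac_via_netlink(self, vf_id, mac):
        # RTM_SETLINK on the PF: always works and marks the VF admin-set.
        self.vf_mac[vf_id] = mac
        self.admin_set.add(vf_id)

    def set_vf_mac_via_ioctl(self, vf_id, mac):
        # SIOCSIFHWADDR on the VF: refused once the admin flag is set.
        if vf_id in self.admin_set:
            raise OSError("Cannot assign requested address")
        self.vf_mac[vf_id] = mac

pf = PF()
pf.set_vf_mac_via_ioctl(0, "aa:bb:cc:dd:ee:01")      # first ioctl set works
pf.set_vf_mac_via_netlink(0, "aa:bb:cc:dd:ee:02")    # admin set via the PF
try:
    pf.set_vf_mac_via_ioctl(0, "aa:bb:cc:dd:ee:03")  # now refused by the PF
except OSError as exc:
    print(exc)
```

This is why the CNI fails with "Cannot assign requested address" once any software has set the MAC through the PF, and why the patch always uses the netlink PF path.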
Change-Id: Ic47672e4ce645d9d37b520b6a412a44ae61036e1
Closes-Bug: 1825383
Co-authored-by: Danil Golov <d.golov@samsung.com>
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
openstack-gerrit
pushed a commit
that referenced
this pull request
Aug 22, 2019
* Update neutron from branch 'master'
- Merge "doc: remove deprecated [neutron]/url from compute install guide"
- doc: remove deprecated [neutron]/url from compute install guide
The nova [neutron]/url config option was deprecated in Queens [1]
and is being removed in Train [2]. The neutron install guide
sections about configuring compute to work with neutron were still
using the url option so this change removes them. There are a few
things to note here:
1. The url option functionality is replaced with the endpoint_override
option from keystoneauth1 but we don't really want users using that
unless there is a real need. One of the main reasons for moving the
nova configuration to use keystoneauth1 was so that the network
service endpoint can be looked up via KSA dynamically based on the
configurable interfaces (public, internal, private) and service types
authority so the endpoint URL will just be pulled from the service
catalog. That means not having to hard-code the endpoint URL in nova
config which makes deployment and config management simpler. As such,
the url option removed in the install guide here is not replaced with
the endpoint_override option.
2. Following on #1, the install guide portion about the nova/neutron config
is updated with a link back to the nova config guide for the full set
of options in case an operator needs to tweak the config, e.g. to set
valid_interfaces or endpoint_override because the KSA defaults don't work
for their deployment.
3. With the old url option, if region_name was not specified, nova would
default to 'RegionOne'. That is not the case if not using the url option
so we leave the region_name config in the install guide example, otherwise
region_name would default to None.
[1] I41724a612a5f3eabd504f3eaa9d2f9d141ca3f69
[2] I6c068a84c4c0bd88f088f9328d7897bfc1f843f1
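An illustrative nova.conf `[neutron]` fragment matching the guidance above (option names `region_name`, `valid_interfaces` and `endpoint_override` are the real keystoneauth1-backed ones mentioned in the text; the values and auth settings here are placeholders):

```ini
[neutron]
auth_type = password
auth_url = http://controller:5000
# Must be set explicitly: without the old url option it no longer
# defaults to RegionOne.
region_name = RegionOne
# Only needed when the KSA defaults don't fit the deployment:
# valid_interfaces = internal
# endpoint_override = http://neutron.example.test:9696
```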
Change-Id: I30445edeb8509330571db28c7d61dd63886e9e61
Closes-Bug: #1840930
openstack-gerrit
pushed a commit
that referenced
this pull request
Oct 24, 2019
* Update kolla-ansible from branch 'master'
- Merge "Limit open file descriptors for Neutron agent containers"
- Limit open file descriptors for Neutron agent containers
See https://bugs.launchpad.net/oslo.rootwrap/+bug/1760471, in particular
comment #1 for an explanation of why inheriting the defaults of the
docker daemon can cause poor performance:
The performance difference likely comes from close_fds=True of subprocess.
Popen. On Python 2, Popen calls close(fd) on all file descriptors from 3 to
SC_OPEN_MAX. On my Fedora 27 "host", SC_OPEN_MAX is 1,024. But in docker,
SC_OPEN_MAX is... 1,048,576: 1,000x larger. On Python 3, Popen is smarter. On
Linux, it lists the content of /proc/self/fd/ to only close open file
descriptors. It doesn't depend on SC_OPEN_MAX value.
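The difference described in the comment can be observed directly (a small sketch; the fd listing only works where `/proc` exists):

```python
import os

# Python 2's Popen(close_fds=True) called close(fd) for every fd from 3 up
# to SC_OPEN_MAX; inside a docker container this can be ~1,048,576 instead
# of a typical host value of 1,024.
print(os.sysconf('SC_OPEN_MAX'))

# Python 3 instead lists the fds that are actually open, so the container's
# inherited limit no longer matters:
if os.path.isdir('/proc/self/fd'):
    print(len(os.listdir('/proc/self/fd')))
```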
Change-Id: Iefef6039644192420abbd3bf614329cbc0d9a62a
Closes-Bug: #1848737
Related-Bug: #1760471
Related-Bug: #1757556
Related-Bug: #1824020
openstack-mirroring
pushed a commit
that referenced
this pull request
Aug 22, 2020
* Update tripleo-heat-templates from branch 'master'
- Merge "Fix pcs restart in composable HA"
- Fix pcs restart in composable HA
When a redeploy command is being run in a composable HA environment, if there
are any configuration changes, the <bundle>_restart containers will be kicked
off. These restart containers will then try and restart the bundles globally in
the cluster.
These restarts will be fired off in parallel from different nodes. So
haproxy-bundle will be restarted from controller-0, mysql-bundle from
database-0, rabbitmq-bundle from messaging-0.
This has proven to be problematic and very often (rhbz#1868113) it would fail
the redeploy with:
2020-08-11T13:40:25.996896822+00:00 stderr F Error: Could not complete shutdown of rabbitmq-bundle, 1 resources remaining
2020-08-11T13:40:25.996896822+00:00 stderr F Error performing operation: Timer expired
2020-08-11T13:40:25.996896822+00:00 stderr F Set 'rabbitmq-bundle' option: id=rabbitmq-bundle-meta_attributes-target-role set=rabbitmq-bundle-meta_attributes name=target-role value=stopped
2020-08-11T13:40:25.996896822+00:00 stderr F Waiting for 2 resources to stop:
2020-08-11T13:40:25.996896822+00:00 stderr F * galera-bundle
2020-08-11T13:40:25.996896822+00:00 stderr F * rabbitmq-bundle
2020-08-11T13:40:25.996896822+00:00 stderr F * galera-bundle
2020-08-11T13:40:25.996896822+00:00 stderr F Deleted 'rabbitmq-bundle' option: id=rabbitmq-bundle-meta_attributes-target-role name=target-role
2020-08-11T13:40:25.996896822+00:00 stderr F
or
2020-08-11T13:39:49.197487180+00:00 stderr F Waiting for 2 resources to start again:
2020-08-11T13:39:49.197487180+00:00 stderr F * galera-bundle
2020-08-11T13:39:49.197487180+00:00 stderr F * rabbitmq-bundle
2020-08-11T13:39:49.197487180+00:00 stderr F Could not complete restart of galera-bundle, 1 resources remaining
2020-08-11T13:39:49.197487180+00:00 stderr F * rabbitmq-bundle
2020-08-11T13:39:49.197487180+00:00 stderr F
After discussing it with kgaillot it seems that concurrent restarts in pcmk are just brittle:
"""
Sadly restarts are brittle, and they do in fact assume that nothing else is causing resources to start or stop. They work like this:
- Get the current configuration and state of the cluster, including a list of active resources (list #1)
- Set resource target-role to Stopped
- Get the current configuration and state of the cluster, including a list of which resources *should* be active (list #2)
- Compare lists #1 and #2, and the difference is the resources that should stop
- Periodically refresh the configuration and state until the list of active resources matches list #2
- Delete the target-role
- Periodically refresh the configuration and state until the list of active resources matches list #1
"""
So the suggestion is to replace the restarts with an enable/disable cycle of the resource.
Tested this on a dozen runs on a composable HA environment and did not observe the error
any longer.
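The list-diff behaviour in the quoted explanation can be sketched in a few lines, showing why concurrent restarts go stale (illustrative resource names only):

```python
# Sketch of the restart sequence described in the quote: the set of
# resources to stop is the difference between two cluster snapshots.

active_before = {'galera-bundle', 'rabbitmq-bundle', 'haproxy-bundle'}  # list #1
# target-role=Stopped is set; the cluster recomputes what *should* run:
should_be_active = {'haproxy-bundle'}                                   # list #2
to_stop = active_before - should_be_active

# A restart fired concurrently from another node changes the active set
# between the two snapshots, so this diff (and the waits based on it)
# no longer matches reality - hence the "resources remaining" timeouts.
assert to_stop == {'galera-bundle', 'rabbitmq-bundle'}
```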
Closes-Bug: #1892206
Change-Id: I9cc27b1539a62a88fb0bccac64e6b1ae9295f22e
openstack-mirroring
pushed a commit
that referenced
this pull request
Apr 15, 2021
* Update tripleo-heat-templates from branch 'master'
to fac82416f06ec89ce3fd71cf373a92016e91d0b2
- Merge "Expose additional network sysctl knobs"
- Expose additional network sysctl knobs
For BGP we need to expose a few additional sysctl entries.
Namely we need net.ipv4.conf.all.rp_filter and
net.ipv6.conf.all.forwarding. Let's expose them like
the other ones via the KernelIpv4ConfAllRpFilter and
KernelIpv6ConfAllForwarding heat parameters, respectively.
We set KernelIpv4ConfAllRpFilter to a default of 1 as that is
the default with RHEL >= 6
(https://access.redhat.com/solutions/53031)
We set KernelIpv6ConfAllForwarding to a default of 0 since that is
the default with at least RHEL >= 7.
Verified the defaults on RHEL/CentOS-7:
$ uname -a
Linux rhel-7.redhat.local 3.10.0-1160.24.1.el7.x86_64 #1 SMP Thu Mar 25 21:21:56 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
$ cat /proc/sys/net/ipv4/conf/all/rp_filter /proc/sys/net/ipv6/conf/all/forwarding
1
0
and RHEL/CentOS-8:
$ uname -a
Linux localhost 4.18.0-293.el8.x86_64 #1 SMP Mon Mar 1 10:04:09 EST 2021 x86_64 x86_64 x86_64 GNU/Linux
$ cat /proc/sys/net/ipv4/conf/all/rp_filter /proc/sys/net/ipv6/conf/all/forwarding
1
0
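A hypothetical environment-file fragment overriding the two heat parameters named above (parameter names are from the commit; the values shown are just the stated defaults):

```yaml
parameter_defaults:
  # default, matches RHEL >= 6
  KernelIpv4ConfAllRpFilter: 1
  # default, matches RHEL >= 7
  KernelIpv6ConfAllForwarding: 0
```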
Co-Authored-By: Carlos Gonçalves <cgoncalves@redhat.com>
Change-Id: I6d7d598e374cdc5289a61a7fb6b532c80a714458
openstack-mirroring
pushed a commit
that referenced
this pull request
May 10, 2021
* Update ironic-python-agent from branch 'master'
to 9837f1c2f008d7e02dbf14f20f520c70e1281477
- Merge "Fix NVMe Partition image on UEFI"
- Fix NVMe Partition image on UEFI
The _manage_uefi code has a check where it attempts to just
identify the precise partition number of the device, in order
for configuration to be parsed and passed. However, the same code
did not handle the existence of a `p1` partition instead of just a
partition #1. This is because the device naming format is different
with NVMe and Software RAID.
Likely, this wasn't an issue with software raid due to how complex the
code interaction is, but the docs also indicate to use only whole disk
images in that case.
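The device-naming difference described above comes down to whether the base device name ends in a digit, in which case the kernel inserts a `p` before the partition number. A hypothetical helper (not the agent's actual code) makes the rule concrete:

```python
# Illustrative partition-path helper: NVMe and software RAID device names
# end in a digit, so partition 1 is 'p1' rather than just '1'.

def partition_path(device, number):
    sep = 'p' if device[-1].isdigit() else ''
    return '%s%s%d' % (device, sep, number)

assert partition_path('/dev/sda', 1) == '/dev/sda1'
assert partition_path('/dev/nvme0n1', 1) == '/dev/nvme0n1p1'
assert partition_path('/dev/md126', 1) == '/dev/md126p1'
```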
This patch was pulled down by one of RH's professional services folks,
who confirmed it does indeed fix the issue at hand. This is noted
as a public comment on the Red Hat bugzilla.
https://bugzilla.redhat.com/show_bug.cgi?id=1954096
Story: 2008881
Task: 42426
Related: rhbz#1954096
Change-Id: Ie3bd49add9a57fabbcdcbae4b73309066b620d02
openstack-mirroring
pushed a commit
that referenced
this pull request
May 11, 2021
* Update designate-tempest-plugin from branch 'master'
to 3675bd53b0894903abf47e768bb045160c687284
- Merge "New API test cases for a Zone test suite."
- New API test cases for a Zone test suite.
"test_get_primary_zone_nameservers"
1) Create a PRIMARY Zone
2) Retrieve Zone Name Servers and validate that they are not empty
3) Get zone's "pool_id"
4) Make sure that the zone's Name Servers retrieved in #2
are the same as created in the zone's pool.
"test_create_zones" scenario"
1) Create PRIMARY zone and validate the creation
2) Get the Name Servers created in PRIMARY zone and extract hosts list.
Hosts list is used to provide "masters" on SECONDARY zone creation
3) Create a SECONDARY zone and validate the creation
# Note: the existing test was modified to cover both types:
PRIMARY and SECONDARY
"test_manually_trigger_update_secondary_zone_negative"
1) Create a Primary zone
2) Get the nameservers created in #1 and make sure that
those nameservers are not available (pingable)
3) Create a secondary zone
4) Manually trigger zone update and make sure that
the API fails with status code 500 as Nameservers aren’t available.
"test_zone_abandon"
1) Create a zone
2) Show a zone
3) Make sure that the created zone is in: Nameserver/BIND
4) Abandon a zone
5) Wait till a zone is removed from the Designate DB
6) Make sure that the zone is still in Nameserver/BIND
"test_zone_abandon_forbidden"
1) Create a zone
2) Show a zone
3) Make sure that the created zone is in: Nameserver/BIND
4) Abandon a zone as primary tenant (not admin)
5) Make sure that the API fails with: "403 Forbidden"
Change-Id: I6df991145b1a3a2e4e1d402dd31204a67fb45a11
openstack-mirroring
pushed a commit
that referenced
this pull request
Apr 27, 2022
* Update kuryr-kubernetes from branch 'master'
to b7e87c94b1a9af467806c297975b80cd8ff40de1
- Merge "Pools: Fix order of updated SGs"
- Pools: Fix order of updated SGs
According to the comments in vif_pool.py, if there are no ports in the
pool with the requested SG set, we should update the SG on another port,
starting from the ones that were created soonest. I think this logic is
to make sure we grab the ports with most outdated SGs. Anyway that code
is currently broken because of two issues:
1. _last_update dict is always updated by replacing the whole dict,
basically meaning that it only holds data for the SG that got updated
most recently.
2. There's a race condition where in _get_port_from_pool multiple
threads can steal a port from each other.
This commit solves #2 by switching to use OrderedDict to track which SG
is the one that was used most recently. This way we can just iterate the
OrderedDict when choosing which port should get its SG updated and just
choose the next port if we get IndexError on pop(). This also solves #1
because _last_update is no longer used to decide which ports have the
most outdated SGs.
Change-Id: Ia3159ee007be865db404e2dcef688abe21592553
openstack-mirroring
pushed a commit
that referenced
this pull request
May 18, 2022
* Update devstack from branch 'master'
to 9eb64896dd785b96b191ce939396420f592e53b4
- Merge "Use proper sed separator for paths"
- Use proper sed separator for paths
I941ef5ea90970a0901236afe81c551aaf24ac1d8 added a sed command that
should match and delete path values but used '/' as the sed separator. This
leads to errors in unstack.sh runs when the path also contains '/':
+./unstack.sh:main:188 sudo sed -i '/directory=/opt/stack/ d' /etc/gitconfig
sed: -e expression #1, char 13: unknown command: `o'
So this patch replaces the '/' separator with '+'.
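One way to demonstrate the fix (a sketch using a temp file; `\+` introduces '+' as the sed address delimiter, so slashes inside the path stay literal):

```shell
# The failing form uses '/' both as the delimiter and inside the path:
#   sed -i '/directory=/opt/stack/ d' /etc/gitconfig
# With '+' as the address delimiter the same deletion works:
printf '[safe]\n\tdirectory=/opt/stack\n' > /tmp/gitconfig.demo
sed -i '\+directory=/opt/stack+ d' /tmp/gitconfig.demo
cat /tmp/gitconfig.demo
```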
Change-Id: I06811c0d9ee7ecddf84ef1c6dd6cff5129dbf4b1
openstack-mirroring
pushed a commit
that referenced
this pull request
Jul 5, 2022
* Update neutron-tempest-plugin from branch 'master'
to 7b2f5c38a1b5483c0cb8a767e74ae12e3df6c63b
- Merge "Add a test for removing security group from ACTIVE instance"
- Add a test for removing security group from ACTIVE instance
Test name: "test_remove_sec_grp_from_active_vm"
1) Create SG associated with ICMP rule
2) Create Port (associated to SG #1) and use it to create the VM
3) Ping the VM, expected should be PASS
4) Remove the security group from VM by Port update
5) Ping the VM, expected should be FAIL
Change-Id: I9fbcdd0f30beeb6985bab4de4d53af639f408c75
openstack-mirroring
pushed a commit
that referenced
this pull request
Jul 15, 2022
* Update tooz from branch 'master'
to 1e86b9103584ce4360633df8e9c536a559a1f79b
- Merge "Fix inappropriate logic in memcachedlock.release()"
- Fix inappropriate logic in memcachedlock.release()
Whether 'was_deleted' was 'TRUE' or not, eventually we have to remove
self from '_acquired_locks'.
For example:
1. App #1 with coordinator 'A' wants to release lock "b"
2. 'self.coord.client.delete()' failed for some reason (e.g.,
BrokenPipeError, MemcacheUnexpectedCloseError)
3. According to the former logic, lock "b" will not be removed
from "_acquired_locks", so "self.heartbeat()" will keep it alive
forever until App #1 goes down or lock "b" expires.
4. Now App #1 with coordinator 'A' wants to acquire lock "c", which
has the same lock name as lock "b". It is clear that this will
fail and prevent the locked program from continuing to execute.
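The fixed release logic amounts to dropping the lock from the acquired set in a `finally` block, so a failed delete no longer leaves it heartbeating forever (names are illustrative, modelled on the description, not tooz's actual code):

```python
# Sketch of the fix: remove self from _acquired_locks whether or not the
# memcached delete succeeded.

class MemcachedLock:
    def __init__(self, coord, name):
        self.coord, self.name = coord, name

    def release(self):
        try:
            was_deleted = self.coord.delete(self.name)  # may raise or be False
        finally:
            # Previously this was only reached on success, so heartbeat()
            # kept refreshing a lock the app believed it had released.
            self.coord._acquired_locks.discard(self)
        return was_deleted

class FailingCoord:
    def __init__(self):
        self._acquired_locks = set()
    def delete(self, name):
        raise BrokenPipeError("connection lost")

coord = FailingCoord()
lock = MemcachedLock(coord, 'b')
coord._acquired_locks.add(lock)
try:
    lock.release()
except BrokenPipeError:
    pass
assert lock not in coord._acquired_locks   # removed despite the failure
```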
Change-Id: I6fc33b8e0a88510027bcfc30d1504489d2a91b4e
openstack-mirroring
pushed a commit
that referenced
this pull request
Aug 25, 2022
* Update nova from branch 'master'
to ccc06ac808458e009b9bee3cf8cdd43242204920
- Merge "Trigger reschedule if PCI consumption fail on compute"
- Trigger reschedule if PCI consumption fail on compute
The PciPassthroughFilter logic checks each InstancePCIRequest
individually against the available PCI pools of a given host and given
boot request. So it is possible that the scheduler accepts a host that
has a single PCI device available even if two devices are requested for
a single instance via two separate PCI aliases. Then the PCI claim on
the compute detects this but does not stop the boot just logs an ERROR.
This results in the instance booted without any PCI device.
This patch does two things:
1) changes the PCI claim to fail with an exception and trigger a
re-schedule instead of just logging an ERROR.
2) change the PciDeviceStats.support_requests that is called during
scheduling to not just filter pools for individual requests but also
consume the request from the pool within the scope of a single boot
request.
The fix in #2) would not be enough alone as two parallel scheduling
request could race for a single device on the same host. #1) is the
ultimate place where we consume devices under a compute global lock so
we need the fix there too.
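The scheduling-side fix in #2) can be sketched as checking requests against a consumable copy of the pools, so two aliases can no longer both match the same lone device (a simplified model; the real logic, including device-spec matching, lives in PciDeviceStats.support_requests):

```python
# Sketch of fix #2: consume from a copy of the pools while checking, rather
# than filtering each InstancePCIRequest against the full pools.

def support_requests(pools, requests):
    remaining = dict(pools)          # free device count per pool, copied
    for request in requests:         # one InstancePCIRequest per PCI alias
        for pool in remaining:
            if remaining[pool] >= request['count']:
                remaining[pool] -= request['count']   # consume, don't just filter
                break
        else:
            return False             # no pool can satisfy this request
    return True

pools = {'pool-a': 1}                          # a single free device on the host
requests = [{'count': 1}, {'count': 1}]        # two aliases, one device each
assert support_requests(pools, requests) is False   # old per-request check said True
```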
Closes-Bug: #1986838
Change-Id: Iea477be57ae4e95dfc03acc9368f31d4be895343
openstack-mirroring
pushed a commit
that referenced
this pull request
Sep 12, 2022
* Update tripleo-ci from branch 'master'
to a860a16e9313a7c0e8862af69ec4d94942a30721
- Mark tripleo-ci-centos-8-9-multinode-mixed-os non voting
This is temporary as described in related-bug (see comment #1)
Related-Bug: 1989341
Change-Id: I6587573bbfccdb4d83b1bae3364fc51ef4615bbb
openstack-mirroring
pushed a commit
that referenced
this pull request
Dec 13, 2022
* Update nova from branch 'master'
to 8b4104f9f78d0615720c0ba1e3e8cfced42efcc5
- Merge "Split PCI pools per PF"
- Split PCI pools per PF
Each PCI device and each PF is a separate RP in Placement and the
scheduler allocate them specifically so the PCI filtering and claiming
also needs to handle these devices individually. Nova pooled PCI devices
together if they had the same device_spec and same device type and numa
node. Now this is changed to only pool VFs from the same parent PF.
Fortunately nova already handled consuming devices for a single
InstancePCIRequest from multiple PCI pools, so this change does not
affect the device consumption code path.
The test_live_migrate_server_with_neutron test needed to be changed.
Originally this test used a compute with the following config:
* PF 81.00.0
** VFs 81.00.[1-4]
* PF 81.01.0
** VFs 81.01.[1-4]
* PF 82.00.0
And booted a VM that needed one VF and one PF. This request has two
widely different solutions:
1) allocate the VF from under 81.00 and therefore consume 81.00.0 and
allocate the 82.00.0 PF
This was what the test asserted to happen.
2) allocate the VF from under 81.00 and therefore consume 81.00.0 and
allocate the 81.00.0 PF and therefore consume all the VFs under it
This results in a different amount of free devices than #1)
AFAIK nova does not have any implemented preference for consuming PFs
without VFs. The test just worked by chance (some internal device and
pool ordering made it that way). However when the PCI pools are split
nova started choosing solution #2), making the test fail. As both
solutions are equally good from nova's scheduling contract perspective I
don't consider this a behavior change. Therefore the test is updated
not to create a situation where two different scheduling solutions are
possible.
blueprint: pci-device-tracking-in-placement
Change-Id: I4b67cca3807fbda9e9b07b220a28e331def57624
openstack-mirroring
pushed a commit
that referenced
this pull request
Jan 10, 2023
* Update ironic from branch 'master'
to 81e10265ce08bd525388111720b91ca10c99bb28
- Merge "Use association_proxy for ports node_uuid"
- Use association_proxy for ports node_uuid
This change adds 'node_uuid' to ironic.objects.port.Port
and adds a relationship using association_proxy in
models.Port. Using the association_proxy removes the need
to do the node lookup to populate node uuid for ports in
the api controller.
NOTE:
On port create a read is added to read the port from the
database, this ensures node_uuid is loaded and solves the
DetachedInstanceError which is otherwise raised.
Bumps Port object version to 1.11
With patch:
1. Returned 20000 ports in python 2.7768702507019043
seconds from the DB.
2. Took 0.433107852935791 seconds to iterate through
20000 port objects.
Ports table is roughly 12800000 bytes of JSON.
3. Took 5.662816762924194 seconds to return all 20000
ports via ports API call pattern.
Without patch:
1. Returned 20000 ports in python 1.0273635387420654
seconds from the DB.
2. Took 0.4772777557373047 seconds to iterate through
20000 port objects.
Ports table is roughly 12800000 bytes of JSON.
3. Took 147.8800814151764 seconds to return all 20000
ports via ports API call pattern.
Conclusion:
Test #1 plain dbapi.get_port_list() test is ~3 times
slower, but Test #3 doing the API call pattern test
is ~2500% better.
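What association_proxy gives the Port object can be modelled with a plain property: node_uuid resolves through the already-loaded relationship instead of a separate node lookup per port (illustrative classes, not ironic's SQLAlchemy models):

```python
# Minimal model of association_proxy('node', 'uuid') on the Port object.

class Node:
    def __init__(self, uuid):
        self.uuid = uuid

class Port:
    def __init__(self, node):
        self.node = node          # the relationship, loaded with the port

    @property
    def node_uuid(self):          # what the association proxy provides
        return self.node.uuid

port = Port(Node('6b4fb9b2'))
assert port.node_uuid == '6b4fb9b2'
```

This is why the API call pattern improves so dramatically: the per-port node query disappears, at the cost of the relationship load that makes the plain DB listing somewhat slower.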
Story: 2007789
Task: 40035
Change-Id: Iff204b3056f3058f795f05dc1d240f494d60672a