
Stable/havana #1

Closed

zhangchunlong wants to merge 511 commits into master from stable/havana

Conversation

@zhangchunlong
Contributor

No description provided.

Thomas Leaman and others added 30 commits February 1, 2014 04:51
Project: openstack/glance  1a9e0e94c36127d525c41f03c4223c3e0c2eda03
Check first matching rule for protected properties

When using roles to define protected properties, the first matching rule
in the config file should be used to grant/deny access. This change
enforces that behaviour.

References bug 1271426

Change-Id: Id897085f93bcd143ec443f477f666d4cabd77567
(Cherry-picked from b6dd538569ebf0f1580c8e1fadc5e0f8054c9b08)
Conflicts:
    glance/common/property_utils.py
    glance/tests/etc/property-protections.conf
    glance/tests/unit/common/test_property_utils.py
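The first-matching-rule behaviour described above can be sketched as follows. This is a minimal illustration, not Glance's actual property_utils classes; the rule structure and role names are assumptions:

```python
import re

class PropertyRule:
    """One protected-property rule: a property-name pattern plus the
    roles granted access by it (illustrative structure)."""
    def __init__(self, pattern, roles):
        self.pattern = re.compile(pattern)
        self.roles = set(roles)

def is_permitted(prop_name, caller_roles, rules):
    """Return the decision of the FIRST rule whose pattern matches the
    property name, instead of falling through to later (possibly
    broader) rules. No matching rule means deny."""
    for rule in rules:
        if rule.pattern.match(prop_name):
            return bool(rule.roles & set(caller_roles))
    return False
```

With rules ordered `^x_owner_.*` (owner) then `.*` (admin), an admin is denied `x_owner_foo` because the first matching rule wins.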
Project: openstack/nova  c09a51cf2675d2821f8c8275ed23aaa23aeeca3c
Setup destination disk from virt_disk_size

When running live-migration --block-migrate on a qcow2 backed
VM without cow image, the destination qcow2 file should be created
with the virtual disk size. For raw images, the virt_disk_size
is set to disk_size to ensure that virt_disk_size is always the
size of the disk that should be re-created.

Update unit tests to be more strict and check for sizes to be correct.

Closes-Bug: #1257355

Change-Id: Ie3be46024f06b9f59af92f5e3918a1958386d4f1
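The sizing rule above can be sketched as follows; field names mirror the commit message, while nova's real disk-info structures carry more fields:

```python
def destination_disk_size(info):
    """Size the destination file for block migration with the *virtual*
    disk size. For raw images virt_disk_size is forced to disk_size,
    so virt_disk_size is always the size of the disk to re-create.
    (Hedged sketch; `info` is an assumed simplified mapping.)"""
    if info['type'] == 'raw':
        info['virt_disk_size'] = info['disk_size']
    return info['virt_disk_size']
```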
Project: openstack/keystone  73968f733d073f03a6451644997b7fca6e92d041
Remove netifaces requirement

netifaces is not required to run the tests, so remove it from
the requirements.

Related-Bug: #1266513
Change-Id: Ifb3b262f47d629670b06c670353dbe798af4dc03
Project: openstack/keystone  9056b66a2b537fc5da9f42a6ac0ee82384b965b2
list_revoked_tokens sql speedup for havana

This consists of the following 3 patches:

    Narrow columns used in list_revoked_tokens sql

    Currently the SQL backend lists revoked tokens by selecting all of the
    columns, including the massive "extra" column. This places a significant
    burden on the client library and wastes resources. We only need the
    id/expired columns to satisfy the API call.

    In tests this query was several orders of magnitude faster with just two
    thousand un-expired revoked tokens.
    (cherry picked from commit ab7221246af394f24e47484e822b8dcda37411aa)

    Add index to cover revoked token list

    The individual expires and valid indexes do not fully cover the most
    common query, which is the one that lists revoked tokens.

    Because valid is only ever used in conjunction with expires, we do not
    need it to have its own index now that there is a covering compound
    index for expires and valid.

    Note that the expires index is still useful alone for purging old tokens
    as we do not filter for valid in that case.
    (cherry picked from commit dd2c80c566f20a97a22e0d7d5a514be84772a955)

    Remove unused token.valid index

    Because valid is only ever used in conjunction with expires, we do not
    need it to have its own index now that there is a covering compound
    index for expires and valid.

    Note that the expires index is still useful alone for purging old tokens
    as we do not filter for valid in that case.
    (cherry picked from commit 5d8a1a41420aa20d2aa21da6311c9d55b9e373b6)

Change-Id: I04d62b98d5d760a3fbc3c8db61530f7ebccb0a48
Closes-Bug: #1253755
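The two changes, selecting only the narrow columns and covering the query with a compound index, can be sketched with stdlib sqlite3. The schema is simplified from keystone's token table and is an assumption for illustration:

```python
import sqlite3

# Illustrative schema; keystone's real token table differs, and `extra`
# stands in for the large serialized blob column.
conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE token (id TEXT PRIMARY KEY, expires TEXT, "
             "valid INTEGER, extra TEXT)")
# Compound index covering the common revoked-token listing; the
# standalone index on `valid` is dropped since valid is only ever
# queried together with expires.
conn.execute("CREATE INDEX ix_token_expires_valid ON token (expires, valid)")

def list_revoked_tokens(conn, now):
    # Select only id/expires -- never the massive `extra` column.
    cur = conn.execute(
        "SELECT id, expires FROM token "
        "WHERE valid = 0 AND expires > ?", (now,))
    return cur.fetchall()
```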
Project: openstack/ceilometer  6524bf3834025d6368b6cd5e4cb0c265bd5fad02
Replace mongo aggregation with plain ol' map-reduce

Fixes bug 1262571

Previously, the mongodb storage driver ran an aggregation pipeline
over the meter collection in order to construct a list of resources
adorned with first & last sample timestamps etc.

However, the mongodb aggregation framework performs sorting in-memory,
in this case operating over a potentially very large collection.
It is also hardcoded to abort any sorts in an aggregation pipeline
that will consume more than 10% of physical memory, which is
observed in this case.

Now, we avoid the aggregation framework altogether and instead
use an equivalent map-reduce.

Change-Id: Ibef4a95acada411af385ff75ccb36c5724068b59
(cherry picked from commit ba6641afacfc52e7391d2095751ee96d62a64c25)
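A pure-Python illustration of the equivalent map-reduce. The real driver ships JavaScript map/reduce functions to MongoDB to run server-side; this only shows the shape of the computation (first/last sample timestamp per resource) without the in-memory aggregation sort:

```python
def map_samples(samples):
    """Map step: emit (resource_id, timestamp) pairs per sample."""
    for s in samples:
        yield s['resource_id'], s['timestamp']

def reduce_timestamps(pairs):
    """Reduce step: fold pairs into {resource_id: (first, last)}."""
    out = {}
    for rid, ts in pairs:
        first, last = out.get(rid, (ts, ts))
        out[rid] = (min(first, ts), max(last, ts))
    return out
```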
Project: openstack/nova  a3d861f6fe561faeb87d940ac4112c98e57a4ec1
Fix interface-attach removes existing interfaces from db

The following commit 394c693e359ed4f19cc2f7d975b1f9ee5500b7f6 changed
allocate_port_for_instance() to only return the ports that were created
rather than all of the ports on the instance which broke the attach-interface
code.

This patch fixes the issue by moving the sync decorators from
allocate_for_instance, allocate_port_for_instance, and
deallocate_port_for_instance to just _build_instance_nw_info, which
is called in all these cases.

Closes-bug: #1223859

(cherry picked from commit 1957339df302e2da75e0dbe78b5d566194ab2c08)

Conflicts:
	nova/network/neutronv2/api.py

Change-Id: I66eb0c0ab926e0a8d1e2c9cfe1f7fd579ea3aa27
Project: openstack/cinder  f02d4fed0c5b3019449dbf8cf81fff1e64337aa1
GlusterFS: Ensure Cinder can write to shares

Ensure the Cinder user can write to the GlusterFS share.  This
is required for snapshot functionality, and means the admin
does not have to set this permission manually.

Conflicts:
	cinder/tests/test_glusterfs.py

Closes-Bug: #1236966
Change-Id: I4a9ea40df9681ca6931ad6b390aa21b09d6cfec9
(cherry picked from commit 371fa540600b20b97eae389e1f976145866cadae)

GlusterFS: Complete snapshot_delete when info doesn't exist

The snapshot_delete operation will fail if the snapshot info file
doesn't contain a record for the snapshot, or does not exist.
This happens in cases such as when snapshot_create fails to commit
anything to disk.

The driver should allow the manager to delete the snapshot
in this case, as there is no action required for the driver
to delete anything.

Closes-Bug: #1252864

(cherry picked from commit d8a11168c908fe6c6a07fbb30a5bc88a6df6e939)

Change-Id: I8686a1be09dbb7984072538bff6c026bb84eeb52
Project: openstack/nova  2c44ed7587703fdc5d2be00a092d7b671982d609
VMware: fix bug when more than one datacenter exists

In the case that there was more than one datacenter defined on the VC,
then spawning an instance would result in an exception. The reason for this
was that the nova compute would not set the correct datacenter for the
selected datastore.

The fix also takes care of the correct folder selection. This too was a
result of not selecting the correct folder for the data center.

The 'fake' configuration was updated to contain an additional data
center with its own datastore.

Closes-Bug: #1180044
Closes-Bug: #1214850

Co-authored-by: Shawn Hartsock <hartsocks@vmware.com>

(cherry picked from commit a25b2ac5f440f7ace4678b21ada6ebf5ce5dff3c)

Conflicts:

	nova/tests/virt/vmwareapi/test_vmwareapi.py
	nova/virt/vmwareapi/fake.py

Change-Id: Ib61811fffcbc80385efc3166c9e366fdaa6432bd
Project: openstack/cinder  240c81d00a49f924e1b9257fee76a7d924246c57
GlusterFS: Use correct base argument when deleting attached snaps

When deleting the most recent snapshot, the 'file_to_merge' field
which translates into the base= field for libvirt's blockRebase
call in Nova must be set depending on whether other snapshots exist.

If there are no other snapshots, base = None, which results in
libvirt clearing the qcow2 backing file pointer for the active
disk image.

If there are other snapshots, pass the parent of the file being
deleted as the new base file.  The snapshot info pointer for the
prior base file must also be updated in this case.

Closes-Bug: #1262880

(cherry picked from commit 186221779a92002ff9fa13c254710c0abb3803be)
Conflicts:
	cinder/tests/test_glusterfs.py

Change-Id: If7bc8259b031d0406346caafb8f688e65a38dba6
Project: openstack/ceilometer  28a2307b461a087ac981cba48be0920105a44ff2
Fix the Alarm documentation of Web API V2

Correct the Alarm example on the API documentation to contain a
valid Alarm sample and complete the Note section with information
about the connection between the type and rules fields of the
Alarm.

Fixes bug #1245362

Change-Id: I5fccf51b820330595a627fd0001beec2d5f7c6e3
Project: openstack/requirements  e0416fa18b50a6ec63aac5de6a13bc69545d4c91
glance requires pyOpenSSL>=0.11

glance uses OpenSSL.crypto.sign() and OpenSSL.crypto.verify(), which are new in pyOpenSSL 0.11.
Fix the global requirement first; glance will then use it.

Change-Id: Id3b06be8ee203c3d15ccc2d846df0d0d8c4145ea
Partial-Bug: #1268966
Project: openstack/horizon  36e0ab56136a2063ce56e7579d13393637ea0e21
Import translations for Havana 2013.2.2 update

* Import the latest translations of ~100% completed languages.
  12 translated languages are available.
* Update POT files (English PO file)

This commit is directly proposed to stable/havana branch because
strings are different between stable/havana and master branches.

Change-Id: I117ea214d121d4c70e8f3679c88d0c758c586f99
Project: openstack/neutron  e631e89e2bcdf0ef9db25a0262156503dcffaa06
Send DHCP notifications regardless of agent status

The Neutron service, when under load, may not be able to process
agent heartbeats in a timely fashion.  This can result in
agents being erroneously considered inactive.  Previously, DHCP
notifications for which active agents could not be found were
silently dropped.  This change ensures that notifications for
a given network are sent to agents even if those agents do not
appear to be active.

Additionally, if no enabled dhcp agents can be found for a given
network, an error will be logged.  Raising an exception might be
preferable, but has such a large testing impact that it will be
submitted as a separate patch if deemed necessary.

Closes-bug: #1192381
(cherry picked from commit 522f9f94681de5903422cfde11b93f5c0e71e532)

Change-Id: Id3e639d9cf3d16708fd66a4baebd3fbeeed3dde8
Project: openstack/ceilometer  51328a33246388b2ceabfe4afcbeb9aa83e5f865
Add keystone_authtoken.enforce_token_bind boilerplate

Add boilerplate for the following config option:

  [keystone_authtoken]
  enforce_token_bind

in the ceilometer.conf.sample.

Change-Id: I4860ec3774385cc98c2600eb1449d356bc63b408
Project: openstack/nova  abacc290caf2de0667b59dd4924e994c26eed712
Avoid deadlock when stringifying NetworkInfo model

In the libvirt driver, we log information about the instance we're
about to generate XML for, which includes a NetworkInfo model. In
reality, this is a NetworkInfoAsyncWrapper which acquires a lock
on the model. Since we're in the middle of a log statement, we also
hold the logging lock. This happens right after we've fired off an
async request to update information about the instance, which first
acquired the network model lock and then acquired the logging lock
to make a seemingly innocuous log record.

The resulting deadlock is fixed by this patch by stringifying the
NetworkInfo object before making the log call in the libvirt
driver.

Change-Id: I81f35197ab7c74daa84fb51fbaa2df28c025da13
Closes-bug: 1276268
(cherry picked from commit 072e4ad20a6a7f4fff3b59eb7ef3f6fef9aa19d1)
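The fix pattern, resolving str() of the lock-guarded model before entering the logging call, can be sketched as follows. Class names are simplified stand-ins for NetworkInfoAsyncWrapper, not nova's actual code:

```python
import logging
import threading

class LockedModel:
    """Stand-in for a model whose __str__ acquires an internal lock,
    which is what made logging the object directly deadlock-prone."""
    def __init__(self, data):
        self._lock = threading.Lock()
        self._data = data

    def __str__(self):
        with self._lock:
            return repr(self._data)

def log_instance(log, model):
    # Stringify under the model lock FIRST, so the model lock and the
    # logging lock are never held at the same time.
    network_info_str = str(model)
    log.debug("launching with network_info=%s", network_info_str)
```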
Project: openstack/tempest  0dfef932b7c6ed7ceead683678f5de361a3d759e
Make test_show_host_detail work with multi host installation

Get the host detail only if the host is compute host
and run the tests for all the compute host not just
for the first one.

Closes-Bug: #1230000

Change-Id: I1830e6071c09d1e751048d74bd828d0eeac282f8
(cherry-picked from 699183617eb99449ae778fdd35475d551ab68b15)
Project: openstack-dev/devstack  3059cc97878cbf11f346c4daa813166fa530d57f
Disable key injection by default.

Change-Id: Ib618da1bd21da09f8855ec4691bff79c4c3b3d9c
Project: openstack/nova  2abcc4e04c9838d1130ef18c0efd697cc8cc6918
Add HEAD api response for test s3 server BucketHandler

The current nova test s3 server lacks a HEAD handler in BucketHandler.
This becomes an issue with Boto 2.25.0 and later [1], which changed the
underlying implementation of the get_bucket method from a GET to a HEAD
request.

This change fixes the gap by adding a HEAD response to the test s3
server. It also adds test cases for checking bucket existence per the
Boto documentation's suggestion [2], which apply to Boto versions both
before and after 2.25.0.

[1] http://docs.pythonboto.org/en/latest/releasenotes/v2.25.0.html
[2] http://docs.pythonboto.org/en/latest/s3_tut.html#accessing-a-bucket

Change-Id: If992efa40f7f36d337d1b9b1f52859aa0ae51874
Closes-bug: #1277790
(cherry picked from commit 033f3776c4bc6d0db14b1b9da7c24e207e9628ab)
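A hedged sketch of the handler gap: a bucket handler that answers only GET rejects the HEAD request Boto >= 2.25.0 now issues for get_bucket. The handler shape and status codes are illustrative, not nova's actual test s3 server:

```python
def bucket_handler(method, bucket_name, buckets):
    """Respond to bucket requests. Supporting HEAD alongside GET is the
    fix: HEAD carries the same status semantics but no body."""
    if method in ('GET', 'HEAD'):
        if bucket_name not in buckets:
            return 404
        return 200
    return 405  # method not supported on buckets in this sketch
```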
Project: openstack/neutron  5f959d76a0969e5d873dbcb6e80c834f4d376a4b
Avoid loading policy when processing rpc requests

When Neutron server is restarted in the environment where multiple agents
are sending rpc requests to Neutron, it causes loading of policy.json
before API extensions are loaded. That causes different policy check
failures later on.
This patch avoids loading policy when creating a Context in rpc layer.

Change-Id: I66212baa937ec1457e0d284b5445de5243a8931f
Partial-Bug: 1254555
Project: openstack/heat  4d213f8b082938ec2587b72b5b920ce211200c46
Improve coverage of storing credentials in parser.Stack

Currently the trusts path is not directly tested, so add coverage
and ensure the correct credentials are stored in each case.

Related-Bug: #1247200
Conflicts:
	heat/tests/test_parser.py

(cherry picked from commit 5397e94292dcbf61778bdaf8abdd3c14be728458)
Change-Id: I0aa999e01015046946f242a9b52b484522dc6d72
Project: openstack/nova  bcdc81319444474bef8c4140d58ec34ab45d8377
Fix `NoopQuotaDriver.get_(project|user)_quotas` format

The quota API extension expects `get_project_quotas` and `get_user_quotas` to
return a dictionary where the value is another dictionary with a `limit` key.

The `DbQuotaDriver` adhered to this spec, but the `NoopQuotaDriver` didn't.

This fixes the `NoopQuotaDriver` to return the results in the correct format.

Fixes bug 1244842

Change-Id: Iea274dab1c3f10c3cb0a2815f431e15b4d4934b1
(cherry picked from commit 711a12b4029cd1544d26d147d8a67e110e056124)
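The expected format can be sketched as below: every resource maps to a dict carrying a `limit` key, as `DbQuotaDriver` already returned. The resource names and the -1 "unlimited" value are illustrative assumptions:

```python
def noop_get_project_quotas(resources):
    """Return quotas in the {resource: {'limit': value}} shape the
    quota API extension expects, rather than bare values."""
    return {name: {'limit': -1} for name in resources}
```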
Project: openstack/heat  9279833b8d331392c13a45b563904e18d3a3461e
Add coverage for trusts parser.Stack delete path

Related-Bug: #1247200
(cherry picked from commit 9904be6febc4acd39fb86afe119aa6427e890b9a)
Change-Id: Ic55030be389ac71ec999e08533fa9d5fc05b5bd1
Project: openstack/heat  ab5d961efd062662544218f36ae64277d39763fd
Catch error deleting trust on stack delete

When deleting a stack, it's possible for deleting the trust to fail,
for example if the user deleting the stack is not the user who created
it, or an admin (which raises a Forbidden error), or due to some
other transient error, e.g. the connection to keystone being interrupted.

Currently in this case, we fail to mark the stack deleted in the DB
and leave the status "DELETE, COMPLETE", which is misleading.

Conflicts:
	heat/tests/test_parser.py

Closes-Bug: #1247200
(cherry picked from commit 214ba503757e5bd9bf5a5fab6692c4e94d0536fa)
Change-Id: Ie8e9ea48bc4f44e56ff4764123fcca733f5bd458
Project: openstack/glance  1982ca25ec3cb7755e3ea2672dd3140d920a5051
Filter out deleted images from storage usage

All database API's currently include deleted images in the calc of
storage usage. This is not an issue when deleted images don't have
locations. However, there are cases where a deleted image has deleted
locations as well and that causes the current algorithm to count those
locations as if they were allocating space.

Besides this bug, it makes sense to not load deleted / killed /
pending_delete images from the database if we're actually not
considering them as valid images.

The patch also filters out deleted locations.

NOTE: In the case of locations, it was not possible to add a test for
the deleted locations because it requires some changes that are not
worth making in this patch. In order to mark a location as deleted, it's
necessary to go through the API and use a PATCH operation. Since this is
a database test, it doesn't make much sense to add API calls to it.
Calling the image_destroy function with an empty location list will
remove all the locations which won't help testing that specific case.

I'll work on a better solution for that in a follow-up patch.

DocImpact:

    The patch now excludes deleted images from the count, this fixes a
    bug but changes the existing behaviour.

    The patch excludes images in pending_delete from the count, although
    the space hasn't been freed yet. This may cause the quota to be
    exceeded without raising an error until the image is finally deleted
    from the store.

Conflicts:
	glance/tests/functional/db/test_sqlalchemy.py

Closes-Bug: #1261738
(cherry picked from commit b35728019e0eb89c213eed7bc35a1f062c99dcca)
Change-Id: I82f08a8f522c81541e4f77597c2ba0aeb68556ce
Project: openstack/tempest  50eaa8c80189d0c938f2dbced70df4d14dc2cdfa
Use channel_timeout for SSH connection timeout

Occasionally, SSH will get wedged so that a connection attempt is stuck
forever. When this happens, we need Tempest to abort the attempt and try
again. Currently, the individual connection timeout is set to the
overall timeout, so there will only ever be one attempt if this happens.
Using the channel_timeout instead will ensure that multiple connection
attempts are made even when the connection is wedged.

Fixes bug 1236524

Change-Id: Ie8dff41780bbf004cff5c880db202a8ae23a85c1
(cherry picked from commit b20cf3a30d42ed2ce0c34e338edf498258dfd721)
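The timeout split can be sketched as a generic retry loop. Here `connect` is a caller-supplied callable standing in for the paramiko connection attempt Tempest makes; the point is that each attempt gets the shorter channel_timeout while the loop as a whole runs until the overall timeout:

```python
import time

def connect_with_retries(connect, timeout, channel_timeout, sleep=0.0):
    """Retry `connect` until `timeout` elapses, giving each attempt only
    `channel_timeout`. With the old behaviour (per-attempt timeout ==
    overall timeout) one wedged attempt consumed the whole budget."""
    deadline = time.monotonic() + timeout
    last_err = None
    while time.monotonic() < deadline:
        try:
            return connect(timeout=channel_timeout)
        except OSError as err:  # socket errors/timeouts are OSErrors
            last_err = err
            time.sleep(sleep)
    raise last_err
```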
Project: openstack/neutron  c7596bf8c952ecc6d73ff50a6ae9ec6384b94cd1
Remove and recreate interface if already exists

If the dhcp-agent machine restarts when openvswitch comes up it logs the
following warning messages for all tap interfaces that do not exist:

bridge|WARN|could not open network device tap2cf7dbad-9d (No such device)

Once the dhcp-agent starts it recreates the interfaces and re-adds them to the
ovs-bridge. Unfortunately, ovs does not reinitialize the interfaces as they
are already in ovsdb and does not assign them a ofport number.

This situation corrects itself though the next time a port is added to the
ovs-bridge which is why no one has probably noticed this issue till now.

In order to correct this we should first remove interfaces that already
exist and then re-add them.

Closes-bug: #1268762

Change-Id: I4bb0019135ab7fa7cdfa6d5db3bff6eafe22fc85
(cherry picked from commit b78eea6146145793a7c61705a1602cf5e9ac3d3a)
Project: openstack/ceilometer  9149861ea1b2a2abc200e79ab23d5e1fca5af752
Add documentation for pipeline configuration

Extend the configuration section of the development documentation
of Ceilometer to contain the configuration options of pipelines.

Change-Id: Ie9d01b89f7af96ba4a80a9b2f2e9141443cf3ecf
Closes-Bug: #1272988
Project: openstack/neutron  711106ab5237bdc05f757a8eeeb063cf13310f0e
Multiple Neutron operations using script fails on Brocade Plugin

Closes-Bug: 1223754

Change-Id: Ifdeed8407a1cb3df9f17267ea582caab385a63f3
Project: openstack/neutron  927e8a645a20f9d8d9971620c2f7aace3aa294e7
Don't allow qpid receiving thread to die

This patch is a partial backport of
22ec8ff616a799085239e3e529daeeefea6366c4 in oslo-incubator.

https://review.openstack.org/#/c/32235/13

This patch ensures that the thread created by consume_in_thread() can
not be killed off by an unexpected exception.

Related-Bug: #1189711

Change-Id: I4370045b450b2b4b9b3bde1f6f3654cdecc722e2
Project: openstack/cinder  4470fdb1e7c379d118ad4e8707b47550ddd78d51
delete.start/delete.end notification for hostless

Third party GUIs may rely on notifications to track the
progress of volume creation and deletion.  In the case that
a volume is being deleted after a failed attempt to create
(the volume is listed in the database but is not actually
resident in a backing store) the path that is taken in
volume.api.delete() makes no notifications of deletion
occurring.

This patch adds a volume_utils.notify_about_volume_usage
call to the beginning and end of the delete with a
delete.start and delete.end respectively.  The notifications
serve as triggers for GUIs to refresh the state of the
volume.  This change makes the hostless delete path's
functionality more consistent with the other paths through
the delete code.

Change-Id: I091b9d277834b341105569d41a48ef5c1fc105ce
Closes-Bug: 1257053
(cherry picked from commit a347b99c261dc1c761a8bc51c2aee99d20161ca6)
openstack-gerrit pushed a commit that referenced this pull request Sep 10, 2015
Project: openstack/api-site  2850130977c6b742bbe921237350ce31aeef596f

Add volume attributes description for Block Storage API

For v1
  #1: Add status attribute for JSON sample
  #2: Fix snapshot_id, source_volid description for response side.
  #3: Add attachments non-null sample

For v2
  Add volume info response attributes

Change-Id: Iebe1eb2f12550d0e66bb594468ce6b28c9d3c756
Closes-Bug: #1331246
openstack-gerrit pushed a commit that referenced this pull request Sep 10, 2015
Project: openstack/api-site  0c1f7cacd19e1175ea583c54f7a566fa1f3a74bd

Compute v2.1 docs clean up (part 7) (security_group_default_rules)

Add os-security-group-default-rules
  it is based on v2 ext file.
    changes are #1 remove xml file, #2 change /v2 => /v2.1
    JSON samples are not edited/changed.

  Also ordering is alphabetical
    os-security-groups
    os-security-group-default-rules <= added
    os-security-group-rules

Change-Id: I4c06148fe45b32f1aa936bba14d71cd6328fe439
Partial-Bug: #1488144
openstack-gerrit pushed a commit that referenced this pull request Sep 10, 2015
Project: openstack/api-site  a3df332507cadc3d75f15f43f6de0016424d62ec

Compute v2.1 docs clean up (part 8) (fixed_ips)

Add os-fixed-ips
  it is based on v2 ext file
    Changes are #1 remove link to xml sample file
                #2 change /v2 => /v2.1
    JSON samples are not edited/changed.

Change-Id: I8da2147514bc4940532953951f3310dc6ba0fef7
Partial-Bug: #1488144
openstack-gerrit pushed a commit that referenced this pull request Sep 30, 2015
Project: openstack/python-neutronclient  7b4ef5d858e4715bb637f66b8d4efbe11de37bab

neutron v2 command module cleanup #1

Purge "body[resource].update({key: value})" pattern
and use "body[key] = value" pattern.
The purged pattern is a bad convention in neutronclient and
I commented not to use it many times but I got tired of it.

Change-Id: I2fe0be30d648f59fa45c5951ccc5060c35527aff
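The two patterns side by side; resource and field names are illustrative. Both build the same request body, but the update() form is the one purged by this cleanup:

```python
resource = 'network'

# Discouraged pattern: nested update() calls.
body_old = {resource: {}}
body_old[resource].update({'name': 'net1'})
body_old[resource].update({'admin_state_up': True})

# Preferred pattern: direct key assignment.
body_new = {resource: {'name': 'net1'}}
body_new[resource]['admin_state_up'] = True
```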
openstack-gerrit pushed a commit that referenced this pull request Oct 3, 2015
Project: openstack/swift  c799d4de5296056b06e08d8025488472cfcb7d66

Validate against duplicate device part replica assignment

We should never assign multiple replicas of the same partition to the
same device - our on-disk layout can only support a single replica of a
given part on a single device.  We should not do this, so we validate
against it and raise a loud warning if this terrible state is ever
observed after a rebalance.

Unfortunately currently there's a couple not necessarily uncommon
scenarios which will trigger this observed state today:

 1. If we have less devices than replicas
 2. If a server or zones aggregate device weight make it the most
    appropriate candidate for multiple replicas and you're a bit unlucky

Fixing #1 would be easy, we should just not allow that state anymore.
Really we never did - if you have a 3 replica ring with one device - you
have one replica.  Everything that iter_nodes'd would de-dupe.  We
should just be insisting that you explicitly acknowledge your replica
count with set_replicas.

I have been lost in the abyss for days searching for a general solutions
to #2.  I'm sure it exists, but I will not have wrestled it to
submission by RC1.  In the meantime we can eliminate a great deal of the
luck required simply by refusing to place more than one replica of a
part on a device in assign_parts.

The meat of the change is a small update to the .validate method in
RingBuilder.  It basically unrolls a pre-existing (part, replica) loop
so that all the replicas of the part come out in order so that we can
build up the set of dev_id's for which all the replicas of a given part
are assigned part-by-part.

If we observe any duplicates - we raise a warning.

To clean the cobwebs out of the rest of the corner cases we're going to
delay get_required_overload from kicking in until we achieve dispersion,
and a small check was added when selecting a device subtier to validate
if it's already being used - picking any other device in the tier works
out much better.  If no other devices are available in the tier - we
raise a warning.  A more elegant or optimized solution may exist.

Many unit tests did not meet criterion #1, but the fix was
straightforward after being identified by the pigeonhole check.

However, many more tests were affected by #2 - but again the fix came to
be simply adding more devices.  The fantasy that all failure domains
contain at least replica count devices is prevalent in both our ring
placement algorithm and its tests.  These tests were trying to
demonstrate some complex characteristics of our ring placement algorithm
and I believe we just got a bit too carried away trying to find the
simplest possible example to demonstrate the desirable trait.  I think
a better example looks more like a real ring - with many devices in each
server and many servers in each zone - I think more devices makes the
tests better.  As much as possible I've tried to maintain the original
intent of the tests - when adding devices I've either spread the weight
out amongst them or added proportional weights to the other tiers.

I added an example straw man test to validate that three devices with
different weights in three different zones won't blow up.  Once we can
do that without raising warnings and assigning duplicate device part
replicas - we can add more.  And more importantly change the warnings to
errors - because we would much prefer to not do that #$%^ anymore.

Co-Authored-By: Kota Tsuyuzaki <tsuyuzaki.kota@lab.ntt.co.jp>
Related-Bug: #1452431
Change-Id: I592d5b611188670ae842fe3d030aa3b340ac36f9
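The pigeonhole check can be sketched as a walk over replica2part2dev, collecting the partitions whose replicas share a device. The structure mirrors the description above, not swift's actual RingBuilder internals:

```python
def find_duplicate_assignments(replica2part2dev):
    """Return the partitions that have more than one replica assigned
    to the same device id. replica2part2dev is a list of rows, one per
    replica, each mapping part index -> device id."""
    dupes = []
    num_parts = len(replica2part2dev[0])
    for part in range(num_parts):
        devs = [row[part] for row in replica2part2dev]
        if len(set(devs)) != len(devs):
            dupes.append(part)
    return dupes
```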
openstack-gerrit pushed a commit that referenced this pull request Dec 10, 2015
Project: openstack/swift  0553d9333ed0045c4d209065b315533a33e5d7d7

Put part-replicas where they go

It's harder than it sounds.  There were really three challenges.

Challenge #1 Initial Assignment
===============================

Before starting to assign parts on this new shiny ring you've
constructed, maybe we'll pause for a moment up front and consider the
lay of the land.  This process is called the replica_plan.

The replica_plan approach is separating part assignment failures into
two modes:

 1) we considered the cluster topology and its weights and came up with
    the wrong plan

 2) we failed to execute on the plan

I failed at both parts plenty of times before I got it this close.  I'm
sure a counter example still exists, but when we find it the new helper
methods will let us reason about where things went wrong.

Challenge #2 Fixing Placement
=============================

With a sound plan in hand, it's much easier to fail to execute on it the
less material you have to execute with - so we gather up as many parts
as we can - as long as we think we can find them a better home.

Picking the right parts for gather is a black art - when you notice a
balance is slow it's because it's spending so much time iterating over
replica2part2dev trying to decide just the right parts to gather.

The replica plan can help at least in the gross dispersion collection to
gather up the worst offenders first before considering balance.  I think
trying to avoid picking up parts that are stuck to the tier before
falling into a forced grab on anything over parts_wanted helps with
stability generally - but depending on where the parts_wanted are in
relation to the full devices it's pretty easy to pick up something that'll
end up really close to where it started.

I tried to break the gather methods into smaller pieces so it looked
like I knew what I was doing.

Going with a MAXIMUM gather iteration instead of balance (which doesn't
reflect the replica_plan) doesn't seem to be costing me anything - most
of the time the exit condition is either solved or all the parts overly
aggressively locked up on min_part_hours.  So far, it mostly seems that if
the thing is going to balance this round it'll get it in the first
couple of shakes.

Challenge #3 Crazy replica2part2dev tables
==========================================

I think there's lots of ways "scars" can build up a ring which can
result in very particular replica2part2dev tables that are physically
difficult to dig out of.  It's repairing these scars that will take
multiple rebalances to resolve.

... but at this point ...

... lacking a counter example ...

I've been able to close up all the edge cases I was able to find.  It
may not be quick, but progress will be made.

Basically my strategy just required a better understanding of how
previous algorithms were able to *mostly* keep things moving by brute
forcing the whole mess with a bunch of randomness.  Then when we detect
our "elegant" careful part selection isn't making progress - we can fall
back to same old tricks.

Validation
==========

We validate against duplicate part replica assignment after rebalance
and raise an ERROR if we detect more than one replica of a part assigned
to the same device.

In order to meet that requirement we have to have as many devices as
replicas, so attempting to rebalance with too few devices w/o changing
your replica_count is also an ERROR not a warning.

Random Thoughts
===============

As usual with rings, the test diff can be hard to reason about -
hopefully I've added enough comments to assure future me that these
assertions make sense.

Despite being a large rewrite of a lot of important code, the existing
code is known to have failed us.  This change fixes a critical bug that's
trivial to reproduce in a critical component of the system.

There's probably a bunch of error messages and exit status stuff that's
not as helpful as it could be considering the new behaviors.

Change-Id: I1bbe7be38806fc1c8b9181a722933c18a6c76e05
Closes-Bug: #1452431
openstack-gerrit pushed a commit that referenced this pull request Apr 8, 2016
Project: openstack/nova  a9459d3c41fa28dbbdac01d058d4a45e78906f7c

remove alembic from requirements.txt

Alembic was used in attempt #1 of online schema migrations, however
that was reverted in Icae28ceee3ec975c907d73b95babab58dcb30c23 when
that approach was dropped.

There are no other uses of alembic directly in Nova, so we should not
list this requirement.

Change-Id: I452bfc8454aedff1bbaffacc99d0845186ba4234
openstack-gerrit pushed a commit that referenced this pull request Feb 8, 2017
Project: openstack-infra/project-config  2e2d8519e3f9b2d8f9ce1330c6a0f8028c9e7c3f

Step #1 - Shutting down Nova-Docker

We are asking folks to evaluate Zun which has some of
the use cases from nova-docker. Several emails have been
sent to -dev@ and -operators@ many times over the last
two years.

Change-Id: I7adcc29cac151ec55f6cc322a880189e0e827db1
openstack-gerrit pushed a commit that referenced this pull request Mar 29, 2017
Project: openstack/glance  327682e8528bf4effa6fb16e8cabf744f18a55a1

Fix incompatibilities with WebOb 1.7

WebOb 1.7 changed [0] how request bodies are determined to be
readable. Prior to version 1.7, the following is how WebOb
determined if a request body is readable:
  #1 Request method is one of POST, PUT or PATCH
  #2 ``content_length`` length is set
  #3 Special flag ``webob.is_body_readable`` is set

The special flag ``webob.is_body_readable`` was used to signal
WebOb to consider a request body readable despite the content length
not being set. #1 above is how ``chunked`` Transfer Encoding was
supported implicitly in WebOb < 1.7.

Now with WebOb 1.7, a request body is considered readable only if
``content_length`` is set and it's non-zero [1]. So, we are only left
with #2 and #3 now. This drops the implicit support for ``chunked``
Transfer Encoding that Glance relied on. Hence, to emulate #1, Glance must
set the special flag upon checking the HTTP methods that may have
bodies. This is precisely what this patch attempts to do.

[0] Pylons/webob#283
[1] https://github.com/Pylons/webob/pull/283/files#diff-706d71e82f473a3b61d95c2c0d833b60R894

Closes-bug: #1657459
Closes-bug: #1657452
Co-Authored-By: Hemanth Makkapati <hemanth.makkapati@rackspace.com>
Change-Id: I19f15165a3d664d5f3a361f29ad7000ba2465a85
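The readability rules above can be sketched in plain Python. This is a minimal emulation of the documented behavior, not WebOb's actual implementation:

```python
# Minimal sketch of the readability rules described above; this emulates
# the documented behavior and is not WebOb's actual code.

def body_readable_pre_17(method, content_length, flag=False):
    # WebOb < 1.7: any of the three conditions makes the body readable.
    return method in ("POST", "PUT", "PATCH") or bool(content_length) or flag

def body_readable_17(content_length, flag=False):
    # WebOb 1.7: only a non-zero content_length or the explicit flag counts.
    return bool(content_length) or flag

# A chunked PUT carries no Content-Length, so under 1.7 the application
# must set the webob.is_body_readable flag itself to emulate old rule #1.
environ = {"REQUEST_METHOD": "PUT"}
if environ["REQUEST_METHOD"] in ("POST", "PUT", "PATCH"):
    environ["webob.is_body_readable"] = True
```

With that flag set, a chunked request body stays readable under both WebOb versions.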
openstack-gerrit pushed a commit that referenced this pull request Apr 20, 2017
Project: openstack/nova  ba9a42e1ac68297a86dc1118e889ad66973c7dc6

PowerVM Driver: spawn/delete #1: no-ops

Initial change set introducing the PowerVM compute driver.  This change
set supplies the basic ComputeDriver methods to allow the n-cpu process
to start successfully, and no-op spawn & delete methods.

Subsequent change sets will build up to the functional spawn & delete
support found in https://review.openstack.org/#/c/391288

Change-Id: Ic45bb064f4315ea9e63698a7c0e541c5b0de5051
Partially-Implements: blueprint powervm-nova-compute-driver
openstack-gerrit pushed a commit that referenced this pull request Apr 25, 2017
Project: openstack/neutron  528ec277c373dd2e7f862cdd5a501cc03be878c4

remove and shim callbacks

The callback modules have been available in neutron-lib since commit [1]
and are ready for consumption.

As the callback registry is implemented with a singleton manager
instance, sync complications can arise ensuring all consumers switch to
lib's implementation at the same time. Therefore this consumption has
been broken down:
1) Shim neutron's callbacks using lib's callback system and remove
existing neutron internals related to callbacks (devref, UTs, etc.).
2) Switch all neutron's callback imports over to neutron-lib's.
3) Have all sub-projects using callbacks move their imports over to use
neutron-lib's callbacks implementation.
4) Remove the callback shims in neutron-lib once sub-projects are moved
over to lib's callbacks.
5) Follow-on patches moving our existing uses of callbacks to the new
event payload model provided by neutron-lib.callback.events

This patch implements #1 from above, shimming neutron's callbacks and
removing devref + UTs. Rather than shimming using debtcollector, this
patch leaves callback constants as-is, and simply references the lib
class/function in its respective neutron callback module. This allows
consumers to test callback types without changing code. For example,
an except block like the one below continues to work even though
the raised exception now lives in lib::

try:
     neutron_cb_registry.notify(...)
except neutron_cb_exceptions.CallbackFailure:
     handle_exception()

In addition this patch contains minor UT updates to support the shim
approach.

NeutronLibImpact

[1] fea8bb64ba7ff52632c2bd3e3298eaedf623ee4f

Change-Id: Ib6baee2aaeb044aaba42a97b35900d75dd43021f
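The shim pattern the commit describes can be illustrated with a tiny, self-contained sketch. The two "modules" are collapsed into one file here; the real layout lives in neutron/callbacks and neutron_lib/callbacks:

```python
# Illustrative sketch of the shim pattern: neutron's callback module
# simply re-exports neutron-lib's class, so existing except clauses keep
# matching. Module boundaries are collapsed into one file for brevity.

# --- stands in for neutron_lib.callbacks.exceptions ---
class CallbackFailure(Exception):
    pass

# --- stands in for neutron.callbacks.exceptions (the shim) ---
# In the real shim this is: CallbackFailure = lib_exceptions.CallbackFailure
ShimCallbackFailure = CallbackFailure  # same class object, just re-exported

def notify():
    # neutron-lib's registry raises its own exception type.
    raise CallbackFailure("consumer callback failed")

try:
    notify()
except ShimCallbackFailure:  # the old-style import still catches it
    caught = True
```

Because the shim aliases the very same class object, isinstance checks and except clauses behave identically on either import path.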
openstack-gerrit pushed a commit that referenced this pull request Aug 25, 2017
Project: openstack/neutron  5d98e30e5c45e3bda70645c7dfc490d5a9deba76

fix formatting in ubuntu controller install guide

The ubuntu controller install guide contains improper indentation and
extraneous new lines. As a result the sub-steps for #1 are not shown
in this HTML (generated) guide.

This one also needs to be back-ported to pike.

Change-Id: Ib2b263c8da49ccc8905cbd59331ce6694de232e6
Closes-Bug: #1712107
openstack-gerrit pushed a commit that referenced this pull request Feb 12, 2019
* Update zun from branch 'master'
  - Merge "Pull image from registry"
  - Pull image from registry
    
    This commit completes the support of private docker registries.
    Users can create a container with images from a specified
    docker registry. The steps are as follows:
    
    1. Register a docker registry in Zun (with options to specify
       the username/password to authenticate against the registry).
    2. Run a container with a reference to the registry created in #1.
    
    Closes-Bug: #1702830
    Change-Id: I92f73bf0d759d9e770905debc6f40a5697ef0856
openstack-gerrit pushed a commit that referenced this pull request Apr 25, 2019
* Update kuryr-kubernetes from branch 'master'
  - Set MAC address for VF via netlink message to PF
    
    SR-IOV binding driver uses pyroute2 library to set MAC addresses
    to VFs. This is internally implemented via ioctl(SIOCSIFHWADDR)
    giving it the name of that device. This is equal to calling
    'ip link set dev $VFDEV address $MAC'. However, there is another
    way to set MAC address for VF. It works via netlink RTM_SETLINK
    message to the PF. This is equal to calling
    'ip link set dev $PFDEV vf $VFID mac $MAC'.
    
    How it works:
    * ioctl(SIOCSIFHWADDR) asks the VF driver to set the MAC
      --> VF driver asks PF to set MAC for it
      --> PF sets the MAC for VF.
    * RTM_SETLINK message asks the PF to set MAC for VF
      --> PF sets the MAC for VF.
    
    In case of setting directly via PF, PF additionally sets an
    "administratively changed MAC" flag for that VF in the PF's
    driver, and from that point on (until the PF's driver is
    reloaded) that VF's MAC address can't be changed using the
    method #1.
    
    It's a security feature designed to forbid MAC changing by the
    guest OS/app inside the container.
    
    The above leads to the issue where SR-IOV CNI is not able to set the
    MAC address of a VF if its MAC was previously administratively set at
    least once (by hand or by other software):
    
      ioctl SIOCSIFHWADDR: Cannot assign requested address
    
      kernel: igb 0000:05:00.0:
        VF 0 attempted to override administratively set MAC address
        Reload the VF driver to resume operations
    
    After that CNI fails the whole transaction, i.e. fails to change
    the interface name as well and subsequently fails the binding.
    
    The netlink PF method should always be used to change MAC addresses.
    This will additionally forbid MAC changes from inside the
    container.
    
    Change-Id: Ic47672e4ce645d9d37b520b6a412a44ae61036e1
    Closes-Bug: 1825383
    Co-authored-by: Danil Golov <d.golov@samsung.com>
    Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
openstack-gerrit pushed a commit that referenced this pull request Aug 22, 2019
* Update neutron from branch 'master'
  - Merge "doc: remove deprecated [neutron]/url from compute install guide"
  - doc: remove deprecated [neutron]/url from compute install guide
    
    The nova [neutron]/url config option was deprecated in Queens [1]
    and is being removed in Train [2]. The neutron install guide
    sections about configuring compute to work with neutron were still
    using the url option so this change removes them. There are a few
    things to note here:
    
    1. The url option functionality is replaced with the endpoint_override
       option from keystoneauth1 but we don't really want users using that
       unless there is a real need. One of the main reasons for moving the
       nova configuration to use keystoneauth1 was so that the network
       service endpoint can be looked up via KSA dynamically based on the
       configurable interfaces (public, internal, private) and service types
       authority so the endpoint URL will just be pulled from the service
       catalog. That means not having to hard-code the endpoint URL in nova
       config which makes deployment and config management simpler. As such,
       the url option removed in the install guide here is not replaced with
       the endpoint_override option.
    
    2. Following on #1, the install guide portion about the nova/neutron config
       is updated with a link back to the nova config guide for the full set
       of options in case an operator needs to tweak the config, e.g. to set
       valid_interfaces or endpoint_override because the KSA defaults don't work
       for their deployment.
    
    3. With the old url option, if region_name was not specified, nova would
       default to 'RegionOne'. That is not the case when the url option is
       not used, so we leave the region_name config in the install guide
       example; otherwise region_name would default to None.
    
    [1] I41724a612a5f3eabd504f3eaa9d2f9d141ca3f69
    [2] I6c068a84c4c0bd88f088f9328d7897bfc1f843f1
    
    Change-Id: I30445edeb8509330571db28c7d61dd63886e9e61
    Closes-Bug: #1840930
openstack-gerrit pushed a commit that referenced this pull request Oct 24, 2019
* Update kolla-ansible from branch 'master'
  - Merge "Limit open file descriptors for Neutron agent containers"
  - Limit open file descriptors for Neutron agent containers
    
    See https://bugs.launchpad.net/oslo.rootwrap/+bug/1760471, in particular
    comment #1 for an explanation of why inheriting the defaults of the
    docker daemon can cause poor performance:
    
    The performance difference likely comes from close_fds=True of
    subprocess.Popen. On Python 2, Popen calls close(fd) on all file descriptors from 3 to
    SC_OPEN_MAX. On my Fedora 27 "host", SC_OPEN_MAX is 1,024. But in docker,
    SC_OPEN_MAX is... 1,048,576: 1,000x larger. On Python 3, Popen is smarter. On
    Linux, it lists the content of /proc/self/fd/ to only close open file
    descriptors. It doesn't depend on SC_OPEN_MAX value.
    
    Change-Id: Iefef6039644192420abbd3bf614329cbc0d9a62a
    Closes-Bug: #1848737
    Related-Bug: #1760471
    Related-Bug: #1757556
    Related-Bug: #1824020
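The quoted comment can be demonstrated with a short sketch. This is illustrative only; the interpreters' real close loops live elsewhere (and partly in C):

```python
import os

# Sketch of the cost difference described above: Python 2's Popen closed
# every fd from 3 up to SC_OPEN_MAX, while Python 3 lists only the fds
# that are actually open.

def fds_python2_style():
    """Number of close(fd) calls Python 2's Popen would issue."""
    limit = os.sysconf("SC_OPEN_MAX")
    if limit <= 0:
        limit = 1024  # fall back when the limit is indeterminate
    return limit - 3

def _fd_open(fd):
    try:
        os.fstat(fd)
        return True
    except OSError:
        return False

def fds_python3_style():
    """Python 3 closes only fds that are actually open."""
    try:
        return len(os.listdir("/proc/self/fd"))  # Linux fast path
    except FileNotFoundError:
        return sum(1 for fd in range(256) if _fd_open(fd))

py2_closes = fds_python2_style()
py3_closes = fds_python3_style()
```

Inside a container whose open-files limit is 1,048,576, py2_closes grows by three orders of magnitude while py3_closes stays small, which is the performance gap the kolla-ansible change avoids.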
openstack-mirroring pushed a commit that referenced this pull request Aug 22, 2020
* Update tripleo-heat-templates from branch 'master'
  - Merge "Fix pcs restart in composable HA"
  - Fix pcs restart in composable HA
    
    When a redeploy command is being run in a composable HA environment, if there
    are any configuration changes, the <bundle>_restart containers will be kicked
    off. These restart containers will then try and restart the bundles globally in
    the cluster.
    
    These restarts will be fired off in parallel from different nodes. So
    haproxy-bundle will be restarted from controller-0, mysql-bundle from
    database-0, rabbitmq-bundle from messaging-0.
    
    This has proven to be problematic and very often (rhbz#1868113) it would fail
    the redeploy with:
    2020-08-11T13:40:25.996896822+00:00 stderr F Error: Could not complete shutdown of rabbitmq-bundle, 1 resources remaining
    2020-08-11T13:40:25.996896822+00:00 stderr F Error performing operation: Timer expired
    2020-08-11T13:40:25.996896822+00:00 stderr F Set 'rabbitmq-bundle' option: id=rabbitmq-bundle-meta_attributes-target-role set=rabbitmq-bundle-meta_attributes name=target-role value=stopped
    2020-08-11T13:40:25.996896822+00:00 stderr F Waiting for 2 resources to stop:
    2020-08-11T13:40:25.996896822+00:00 stderr F * galera-bundle
    2020-08-11T13:40:25.996896822+00:00 stderr F * rabbitmq-bundle
    2020-08-11T13:40:25.996896822+00:00 stderr F * galera-bundle
    2020-08-11T13:40:25.996896822+00:00 stderr F Deleted 'rabbitmq-bundle' option: id=rabbitmq-bundle-meta_attributes-target-role name=target-role
    2020-08-11T13:40:25.996896822+00:00 stderr F
    
    or
    
    2020-08-11T13:39:49.197487180+00:00 stderr F Waiting for 2 resources to start again:
    2020-08-11T13:39:49.197487180+00:00 stderr F * galera-bundle
    2020-08-11T13:39:49.197487180+00:00 stderr F * rabbitmq-bundle
    2020-08-11T13:39:49.197487180+00:00 stderr F Could not complete restart of galera-bundle, 1 resources remaining
    2020-08-11T13:39:49.197487180+00:00 stderr F * rabbitmq-bundle
    2020-08-11T13:39:49.197487180+00:00 stderr F
    
    After discussing it with kgaillot it seems that concurrent restarts in pcmk are just brittle:
    """
    Sadly restarts are brittle, and they do in fact assume that nothing else is causing resources to start or stop. They work like this:
    
    - Get the current configuration and state of the cluster, including a list of active resources (list #1)
    - Set resource target-role to Stopped
    - Get the current configuration and state of the cluster, including a list of which resources *should* be active (list #2)
    - Compare lists #1 and #2, and the difference is the resources that should stop
    - Periodically refresh the configuration and state until the list of active resources matches list #2
    - Delete the target-role
    - Periodically refresh the configuration and state until the list of active resources matches list #1
    """
    
    So the suggestion is to replace the restarts with an enable/disable cycle of the resource.
    
    Tested this on a dozen runs on a composable HA environment and did not observe the error
    any longer.
    
    Closes-Bug: #1892206
    
    Change-Id: I9cc27b1539a62a88fb0bccac64e6b1ae9295f22e
openstack-mirroring pushed a commit that referenced this pull request Apr 15, 2021
* Update tripleo-heat-templates from branch 'master'
  to fac82416f06ec89ce3fd71cf373a92016e91d0b2
  - Merge "Expose additional network sysctl knobs"
  - Expose additional network sysctl knobs
    
    For BGP we need to expose a few additional sysctl entries.
    Namely we need net.ipv4.conf.all.rp_filter and
    net.ipv6.conf.all.forwarding. Let's expose them like
    the other ones via the KernelIpv4ConfAllRpFilter and
    KernelIpv6ConfAllForwarding heat parameters, respectively.
    
    We set KernelIpv4ConfAllRpFilter to a default of 1 as that is
    what is the default with RHEL >= 6
    (https://access.redhat.com/solutions/53031)
    
    We set KernelIpv6ConfAllForwarding to a default of 0 since that is
    the default with at least RHEL >= 7.
    
    Verified the defaults on RHEL/CentOS-7:
    $ uname -a
    Linux rhel-7.redhat.local 3.10.0-1160.24.1.el7.x86_64 #1 SMP Thu Mar 25 21:21:56 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
    $ cat /proc/sys/net/ipv4/conf/all/rp_filter /proc/sys/net/ipv6/conf/all/forwarding
    1
    0
    
    and RHEL/CentOS-8:
    $ uname -a
    Linux localhost 4.18.0-293.el8.x86_64 #1 SMP Mon Mar 1 10:04:09 EST 2021 x86_64 x86_64 x86_64 GNU/Linux
    $ cat /proc/sys/net/ipv4/conf/all/rp_filter /proc/sys/net/ipv6/conf/all/forwarding
    1
    0
    
    Co-Authored-By: Carlos Gonçalves <cgoncalves@redhat.com>
    
    Change-Id: I6d7d598e374cdc5289a61a7fb6b532c80a714458
openstack-mirroring pushed a commit that referenced this pull request May 10, 2021
* Update ironic-python-agent from branch 'master'
  to 9837f1c2f008d7e02dbf14f20f520c70e1281477
  - Merge "Fix NVMe Partition image on UEFI"
  - Fix NVMe Partition image on UEFI
    
    The _manage_uefi code has a check where it attempts to just
    identify the precise partition number of the device, in order
    for configuration to be parsed and passed. However, the same code
    did not handle the existence of a `p1` partition instead of just a
    partition #1. This is because the device naming format is different
    with NVMe and Software RAID.
    
    Likely, this wasn't an issue with software raid due to how complex the
    code interaction is, but the docs also indicate to use only whole disk
    images in that case.
    
    This patch was pulled down by one of Red Hat's professional services
    folks, who confirmed it does indeed fix the issue at hand. This is
    noted as a public comment on the Red Hat bugzilla.
    https://bugzilla.redhat.com/show_bug.cgi?id=1954096
    
    Story: 2008881
    Task: 42426
    Related: rhbz#1954096
    Change-Id: Ie3bd49add9a57fabbcdcbae4b73309066b620d02
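The naming difference behind the bug can be captured in a few lines. This is a simplified sketch of the rule, not the ironic-python-agent code itself:

```python
# Simplified sketch of the partition-naming rule the fix accounts for:
# devices whose names end in a digit (NVMe namespaces, software RAID,
# mmcblk, loop) insert a 'p' before the partition number.

def partition_path(device, number):
    """Return the device node for partition `number` of `device`."""
    if device[-1].isdigit():
        return "%sp%d" % (device, number)  # /dev/nvme0n1 -> /dev/nvme0n1p1
    return "%s%d" % (device, number)       # /dev/sda    -> /dev/sda1
```

Code that blindly appends the bare partition number works for /dev/sda but builds the nonexistent /dev/nvme0n11 on NVMe, which is the failure the patch addresses.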
openstack-mirroring pushed a commit that referenced this pull request May 11, 2021
* Update designate-tempest-plugin from branch 'master'
  to 3675bd53b0894903abf47e768bb045160c687284
  - Merge "New API test cases for a Zone test suite."
  - New API test cases for a Zone test suite.
    
    "test_get_primary_zone_nameservers"
    1) Create a PRIMARY Zone
    2) Retrieve the zone's Name Servers and validate that they are not empty
    3) Get the zone's "pool_id"
    4) Make sure that the zone's Name Servers retrieved in #2
       are the same as those created in the zone's pool.
    
    "test_create_zones" scenario
    1) Create PRIMARY zone and validate the creation
    2) Get the Name Servers created in PRIMARY zone and extract hosts list.
       Hosts list is used to provide "masters" on SECONDARY zone creation
    3) Create a SECONDARY zone and validate the creation
      # Note: the existing test was modified to cover both types:
        PRIMARY and SECONDARY
    
    "test_manually_trigger_update_secondary_zone_negative"
    1) Create a Primary zone
    2) Get the nameservers created in #1 and make sure that
       those nameservers are not available (pingable)
    3) Create a secondary zone
    4) Manually trigger zone update and make sure that
       the API fails with status code 500 as Nameservers aren’t available.
    
    "test_zone_abandon"
    1) Create a zone
    2) Show a zone
    3) Make sure that the created zone is in: Nameserver/BIND
    4) Abandon a zone
    5) Wait till a zone is removed from the Designate DB
    6) Make sure that the zone is still in Nameserver/BIND
    
    "test_zone_abandon_forbidden"
    1) Create a zone
    2) Show a zone
    3) Make sure that the created zone is in: Nameserver/BIND
    4) Abandon a zone as primary tenant (not admin)
    5) Make sure that the API fails with: "403 Forbidden"
    
    Change-Id: I6df991145b1a3a2e4e1d402dd31204a67fb45a11
openstack-mirroring pushed a commit that referenced this pull request Apr 27, 2022
* Update kuryr-kubernetes from branch 'master'
  to b7e87c94b1a9af467806c297975b80cd8ff40de1
  - Merge "Pools: Fix order of updated SGs"
  - Pools: Fix order of updated SGs
    
    According to the comments in vif_pool.py, if there are no ports in the
    pool with the requested SG set, we should update the SG on another port,
    starting from the ones that were created soonest. I think this logic is
    to make sure we grab the ports with most outdated SGs. Anyway that code
    is currently broken because of two issues:
    
    1. _last_update dict is always updated by replacing the whole dict,
       basically meaning that it only holds data for the SG that got
       updated most recently.
    2. There's a race condition where in _get_port_from_pool multiple
       threads can steal a port from each other.
    
    This commit solves #2 by switching to use OrderedDict to track which SG
    is the one that was used most recently. This way we can just iterate the
    OrderedDict when choosing which port should get its SG updated and just
    choose the next port if we get IndexError on pop(). This also solves #1
    because _last_update is no longer used to decide which ports have the
    most outdated SGs.
    
    Change-Id: Ia3159ee007be865db404e2dcef688abe21592553
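The recency tracking described above can be sketched with an OrderedDict. The names are illustrative, not kuryr's actual vif_pool.py:

```python
from collections import OrderedDict

# Illustrative sketch of the fix: keep one OrderedDict keyed by SG and
# move a key to the end whenever its ports are refreshed, so iteration
# always runs from the most-outdated SG to the most-recently-updated one.

last_update = OrderedDict()

def record_update(sg_id):
    last_update[sg_id] = True
    last_update.move_to_end(sg_id)  # most recently updated goes last

def most_outdated():
    # The first key in iteration order is the least recently updated SG.
    return next(iter(last_update), None)

record_update("sg-a")
record_update("sg-b")
record_update("sg-a")  # sg-a refreshed again; sg-b is now most outdated
```

A single ordered structure replaces the replace-the-whole-dict bookkeeping, so data about every SG survives each update.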
openstack-mirroring pushed a commit that referenced this pull request May 18, 2022
* Update devstack from branch 'master'
  to 9eb64896dd785b96b191ce939396420f592e53b4
  - Merge "Use proper sed separator for paths"
  - Use proper sed separator for paths
    
    I941ef5ea90970a0901236afe81c551aaf24ac1d8 added a sed command that
    should match and delete path values but used '/' as sed separator. This
    leads to error in unstack.sh runs when the path also contains '/':
    
    +./unstack.sh:main:188 sudo sed -i '/directory=/opt/stack/ d' /etc/gitconfig
    sed: -e expression #1, char 13: unknown command: `o'
    
    So this patch replaces the '/' separator with '+'.
    
    Change-Id: I06811c0d9ee7ecddf84ef1c6dd6cff5129dbf4b1
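The failure and the fix can be reproduced in isolation; the file path below is a stand-in for /etc/gitconfig so the demo touches nothing real:

```shell
# Stand-in for /etc/gitconfig so the demo is self-contained.
echo 'directory=/opt/stack/' > /tmp/gitconfig.demo

# Broken: the '/' inside the path terminates the sed address early,
# leaving a stray 'o' where sed expects a command.
sed -i '/directory=/opt/stack/ d' /tmp/gitconfig.demo 2>/dev/null \
  || echo "sed failed as expected"

# Fixed: use '+' as the delimiter; a non-'/' address delimiter must be
# introduced with a backslash, i.e. \+RE+.
sed -i '\+directory=/opt/stack/+ d' /tmp/gitconfig.demo
cat /tmp/gitconfig.demo  # the matching line is gone; the file is empty
```

Any delimiter character that cannot appear in the pattern works; '+' is safe here because it never occurs in the path values being matched.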
openstack-mirroring pushed a commit that referenced this pull request Jul 5, 2022
* Update neutron-tempest-plugin from branch 'master'
  to 7b2f5c38a1b5483c0cb8a767e74ae12e3df6c63b
  - Merge "Add a test for removing security group from ACTIVE instance"
  - Add a test for removing security group from ACTIVE instance
    
    Test name: "test_remove_sec_grp_from_active_vm"
    1) Create SG associated with ICMP rule
    2) Create Port (assoiated to SG #1) and use it to create the VM
    3) Ping the VM, expected should be PASS
    4) Remove the security group from VM by Port update
    5) Ping the VM, expected should be FAIL
    
    Change-Id: I9fbcdd0f30beeb6985bab4de4d53af639f408c75
openstack-mirroring pushed a commit that referenced this pull request Jul 15, 2022
* Update tooz from branch 'master'
  to 1e86b9103584ce4360633df8e9c536a559a1f79b
  - Merge "Fix inappropriate logic in memcachedlock.release()"
  - Fix inappropriate logic in memcachedlock.release()
    
    Whether 'was_deleted' is 'True' or not, eventually we have to remove
    self from '_acquired_locks'.
    For example:
    1. App #1 with coordinator 'A' wants to release lock "b"
    2. 'self.coord.client.delete()' failed for some reason (e.g.
    BrokenPipeError, MemcacheUnexpectedCloseError)
    3. According to the former logic, lock "b" will not be removed
    from "_acquired_locks", so "self.heartbeat()" will keep it alive
    forever until App #1 goes down or lock "b" expires.
    4. Now App #1 with coordinator 'A' wants to acquire lock "c", which
    has the same lock name as lock "b". It is clear that this will
    fail and prevent the locked program from continuing to execute.
    
    Change-Id: I6fc33b8e0a88510027bcfc30d1504489d2a91b4e
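The corrected release() logic amounts to a try/finally. This is a minimal sketch with a fake client and illustrative names, not tooz's actual code:

```python
# Minimal sketch of the corrected release logic; the client and class
# are fakes standing in for tooz's memcached coordinator internals.

class FakeClient:
    def __init__(self, fail=False):
        self.fail = fail

    def delete(self, name):
        if self.fail:
            raise BrokenPipeError("connection to memcached lost")
        return True

class Lock:
    def __init__(self, client, name, acquired_locks):
        self.client = client
        self.name = name
        self._acquired_locks = acquired_locks
        acquired_locks.append(self)

    def release(self):
        try:
            self.client.delete(self.name)
        except OSError:
            pass  # the key will simply expire on the server side
        finally:
            # The old logic skipped this on failure, so heartbeat() kept
            # refreshing the abandoned lock. Always forget it locally.
            if self in self._acquired_locks:
                self._acquired_locks.remove(self)

acquired = []
Lock(FakeClient(fail=True), "b", acquired).release()
```

Even when delete() raises, the finally clause drops the lock from the local list, so the heartbeat stops refreshing a lock the app has given up on.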
openstack-mirroring pushed a commit that referenced this pull request Aug 25, 2022
* Update nova from branch 'master'
  to ccc06ac808458e009b9bee3cf8cdd43242204920
  - Merge "Trigger reschedule if PCI consumption fail on compute"
  - Trigger reschedule if PCI consumption fail on compute
    
    The PciPassthroughFilter logic checks each InstancePCIRequest
    individually against the available PCI pools of a given host and given
    boot request. So it is possible that the scheduler accepts a host that
    has a single PCI device available even if two devices are requested for
    a single instance via two separate PCI aliases. Then the PCI claim on
    the compute detects this but does not stop the boot just logs an ERROR.
    This results in the instance booted without any PCI device.
    
    This patch does two things:
    1) changes the PCI claim to fail with an exception and trigger a
       re-schedule instead of just logging an ERROR.
    2) change the PciDeviceStats.support_requests that is called during
       scheduling to not just filter pools for individual requests but also
       consume the request from the pool within the scope of a single boot
       request.
    
    The fix in #2) would not be enough alone, as two parallel scheduling
    requests could race for a single device on the same host. #1) is the
    ultimate place where we consume devices under a compute global lock so
    we need the fix there too.
    
    Closes-Bug: #1986838
    Change-Id: Iea477be57ae4e95dfc03acc9368f31d4be895343
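Change #2) can be sketched as a consuming check over a scratch copy of the pools. This is heavily simplified; nova's real PciDeviceStats pools carry full device specs, and the keys below are illustrative:

```python
# Simplified sketch of the consuming check in change #2): requests are
# consumed from a scratch copy of the pools across the whole boot
# request, so two aliases cannot both claim the same single device.

def support_requests(pools, requests):
    """pools: {pool_key: free_count}; requests: [(pool_key, count), ...]"""
    remaining = dict(pools)
    for pool_key, count in requests:
        if remaining.get(pool_key, 0) < count:
            return False
        remaining[pool_key] -= count
    return True

# One free device, two separate requests for it: the old per-request
# filter would pass both; the consuming check rejects the host.
ok = support_requests({"alias-a": 1}, [("alias-a", 1), ("alias-a", 1)])
```

Consuming from a copy leaves the real pools untouched when the host is rejected, which is why the check can run safely during scheduling.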
openstack-mirroring pushed a commit that referenced this pull request Sep 12, 2022
* Update tripleo-ci from branch 'master'
  to a860a16e9313a7c0e8862af69ec4d94942a30721
  - Mark tripleo-ci-centos-8-9-multinode-mixed-os non voting
    
    This is temporary as described in related-bug (see comment #1)
    
    Related-Bug: 1989341
    Change-Id: I6587573bbfccdb4d83b1bae3364fc51ef4615bbb
openstack-mirroring pushed a commit that referenced this pull request Dec 13, 2022
* Update nova from branch 'master'
  to 8b4104f9f78d0615720c0ba1e3e8cfced42efcc5
  - Merge "Split PCI pools per PF"
  - Split PCI pools per PF
    
    Each PCI device and each PF is a separate RP in Placement and the
    scheduler allocate them specifically so the PCI filtering and claiming
    also needs to handle these devices individually. Nova pooled PCI devices
    together if they had the same device_spec, device type, and numa
    node. Now this is changed so that only VFs from the same parent PF
    are pooled together.
    Fortunately nova already handled consuming devices for a single
    InstancePCIRequest from multiple PCI pools, so this change does not
    affect the device consumption code path.
    
    The test_live_migrate_server_with_neutron test needed to be changed.
    Originally this test used a compute with the following config:
    * PF 81.00.0
    ** VFs 81.00.[1-4]
    * PF 81.01.0
    ** VFs 81.01.[1-4]
    * PF 82.00.0
    
    And booted a VM that needed one VF and one PF. This request has two
    widely different solutions:
    1) allocate the VF from under 81.00 and therefore consume 81.00.0 and
       allocate the 82.00.0 PF
       This was what the test asserted to happen.
    2) allocate the VF from under 81.00 and therefore consume 81.00.0 and
       allocate the 81.00.0 PF and therefore consume all the VFs under it
       This results in a different amount of free devices than #1)
    
    AFAIK nova does not have any implemented preference for consuming PFs
    without VFs. The test just worked by chance (some internal device and
    pool ordering made it that way). However when the PCI pools are split
    nova started choosing solution #2) making the test fail. As both
    solution is equally good from nova's scheduling contract perspective I
    don't consider this as a behavior change. Therefore the test is updated
    not to create a situation where two different scheduling solutions are
    possible.
    
    blueprint: pci-device-tracking-in-placement
    Change-Id: I4b67cca3807fbda9e9b07b220a28e331def57624
openstack-mirroring pushed a commit that referenced this pull request Jan 10, 2023
* Update ironic from branch 'master'
  to 81e10265ce08bd525388111720b91ca10c99bb28
  - Merge "Use association_proxy for ports node_uuid"
  - Use association_proxy for ports node_uuid
    
    This change adds 'node_uuid' to ironic.objects.port.Port
    and adds a relationship using association_proxy in
    models.Port. Using the association_proxy removes the need
    to do the node lookup to populate node uuid for ports in
    the api controller.
    
    NOTE:
     On port create a read is added to read the port from the
     database, this ensures node_uuid is loaded and solves the
     DetachedInstanceError which is otherwise raised.
    
    Bumps Port object version to 1.11
    
    With patch:
      1. Returned 20000 ports in python 2.7768702507019043
         seconds from the DB.
      2. Took 0.433107852935791 seconds to iterate through
         20000 port objects.
         Ports table is roughly 12800000 bytes of JSON.
      3. Took 5.662816762924194 seconds to return all 20000
         ports via ports API call pattern.
    
    Without patch:
      1. Returned 20000 ports in python 1.0273635387420654
         seconds from the DB.
      2. Took 0.4772777557373047 seconds to iterate through
         20000 port objects.
         Ports table is roughly 12800000 bytes of JSON.
      3. Took 147.8800814151764 seconds to return all 20000
         ports via ports API call pattern.
    
    Conclusion:
      Test #1 plain dbapi.get_port_list() test is ~3 times
      slower, but Test #3 doing the API call pattern test
      is ~2500% better.
    
    Story: 2007789
    Task: 40035
    Change-Id: Iff204b3056f3058f795f05dc1d240f494d60672a
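The indirection association_proxy provides can be emulated in plain Python. SQLAlchemy is omitted so the sketch stays self-contained; in the real model it would be declared as association_proxy('node', 'uuid'):

```python
# Plain-Python emulation of what association_proxy provides here:
# the port exposes node_uuid by traversing its node relationship, so
# the API controller no longer does a second lookup per port.

class Node:
    def __init__(self, id, uuid):
        self.id = id
        self.uuid = uuid

class Port:
    def __init__(self, node):
        self.node = node  # the relationship the proxy traverses

    @property
    def node_uuid(self):
        # SQLAlchemy generates this traversal automatically from
        # association_proxy('node', 'uuid').
        return self.node.uuid

port = Port(Node(1, "de305d54-75b4-431b-adb2-eb6b9e546014"))
```

The per-port speedup in the benchmark comes from replacing 20000 separate node lookups with one relationship traversal that SQLAlchemy can satisfy from the joined query.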