rgw: qpid-proton amqp1.0 bucket notification#1
Closed
Fixes: https://tracker.ceph.com/issues/50691 Signed-off-by: Sebastian Wagner <sewagner@redhat.com>
…ging image Fixes: https://tracker.ceph.com/issues/50687 Signed-off-by: Adam King <adking@redhat.com>
crash on multipart upload to bucket with policy Fixes: https://tracker.ceph.com/issues/50556 Signed-off-by: Or Friedmann <ofriedma@redhat.com>
extend the common logic used by the deploy, ceph-volume, and shell commands for validating the `--config` arg during bootstrap Signed-off-by: Michael Fritch <mfritch@suse.com>
use the standard error message from FileNotFound:
```
cephadm bootstrap --mon-ip 192.168.1.1 --config ~/foobar
ERROR: [Errno 2] No such file or directory: '/root/foobar'
```
Signed-off-by: Michael Fritch <mfritch@suse.com>
Fixes: https://tracker.ceph.com/issues/50113 Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
hash(str) is non-deterministic across processes, because Python salts str hashes with a per-process random seed unless PYTHONHASHSEED is set. Explicitly hash the string content and use that instead. Also, sort the input pre-shuffle to ensure that variations in the original host list ordering don't screw with the result. Signed-off-by: Sage Weil <sage@newdream.net>
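A minimal sketch of the idea in C++ (not the cephadm Python code): derive the shuffle seed from the string content and sort the hosts before shuffling. The names `content_hash` and `stable_shuffle` are hypothetical, and FNV-1a is only one possible content hash.

```cpp
#include <algorithm>
#include <cstdint>
#include <random>
#include <string>
#include <vector>

// FNV-1a over the bytes of the string: the result depends only on the
// content, so it is the same in every process and on every run.
inline std::uint64_t content_hash(const std::string& s) {
  std::uint64_t h = 14695981039346656037ULL;  // FNV-1a 64-bit offset basis
  for (unsigned char c : s) {
    h ^= c;
    h *= 1099511628211ULL;  // FNV-1a 64-bit prime
  }
  return h;
}

// Hypothetical helper: the output order depends only on the seed string
// and on the set of hosts, not on the order in which hosts were passed in.
inline std::vector<std::string> stable_shuffle(std::vector<std::string> hosts,
                                               const std::string& seed) {
  std::sort(hosts.begin(), hosts.end());  // normalize input ordering first
  std::mt19937_64 rng(content_hash(seed));
  std::shuffle(hosts.begin(), hosts.end(), rng);
  return hosts;
}
```

With this shape, two callers that pass the same hosts in a different order and the same seed string get the same shuffled list.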
Signed-off-by: Sage Weil <sage@newdream.net>
Look in dict, not encoded JSON string Signed-off-by: Sage Weil <sage@newdream.net>
('orch ps')
Signed-off-by: Sage Weil <sage@newdream.net>
This makes 'orch ls' match up daemons to services (and probably cleans up other bits and pieces) when the old daemon id -> service name calc code can't do its thing. Signed-off-by: Sage Weil <sage@newdream.net>
The rank_map is a bit of state to keep track of which ranks are occupied by which generation and daemon_id. Signed-off-by: Sage Weil <sage@newdream.net>
DaemonDescription CephadmDaemonDeploySpec DaemonPlacement unit.meta get_unique_name() (we include it in the daemon_id) Signed-off-by: Sage Weil <sage@newdream.net>
If we are passed a rank_map, use it to maintain one daemon per rank, where the ranks are consecutive non-negative integers starting from 0. A bit of refactoring in place() so that we only do the rank allocations on slots we are going to use (no more than count). Signed-off-by: Sage Weil <sage@newdream.net>
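The rank allocation described above can be sketched as follows. This is an illustrative C++ simplification, not the mgr/cephadm Python code: `RankMapSketch` and `assign_ranks` are hypothetical names, and the generation that the real rank_map tracks per rank is left out.

```cpp
#include <map>
#include <string>
#include <vector>

// rank -> daemon_id. Hypothetical simplification: the rank_map in the
// source also records a generation for each rank.
using RankMapSketch = std::map<int, std::string>;

// Give each new daemon the lowest rank that is not yet occupied, so the
// occupied ranks fill in from 0 upwards.
inline std::vector<int> assign_ranks(RankMapSketch& rank_map,
                                     const std::vector<std::string>& daemon_ids) {
  std::vector<int> assigned;
  int candidate = 0;
  for (const auto& id : daemon_ids) {
    while (rank_map.count(candidate) != 0) {
      ++candidate;  // skip ranks that already have a daemon
    }
    rank_map[candidate] = id;
    assigned.push_back(candidate);
  }
  return assigned;
}
```

Callers would pass only the daemons they actually intend to deploy, which mirrors the point about allocating ranks only for the slots that will be used.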
This is more informative than just the hostnames. Signed-off-by: Sage Weil <sage@newdream.net>
- we need to assign all names and update the rank_map before we start creating daemons.
- if we are using ranks, we should delete old daemons first, and fence them from the cluster (where possible).
Signed-off-by: Sage Weil <sage@newdream.net>
Use ranked daemons for NFS. Ganesha does not like it if multiple
instances start up with the same rank, but we need stable ranks so that
a rank can "fail over" to a new instance of a new daemon on another host
(with the same rank) for NFS client reclaim to work.
Specify a nodeid of '{service_name}.{rank}' for ganesha.
Include a unique id in the daemon_id just because this avoids some issues
with the create/destroy ordering, and because the daemon_id doesn't matter
much anymore since we are using a stable rank.
Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
Do the grace file manipulation from the mgr module. For add, this isn't especially important, but for remove it is very important. Clean out old ranks from the grace table before we record that the rank has been purged from the rank_map. Signed-off-by: Sage Weil <sage@newdream.net>
This avoids any hangs due to rados. Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
- use consistent hashing
- statically map across ranks
- disable backend checks so that clients don't move
Signed-off-by: Sage Weil <sage@newdream.net>
Remove the grace object if we purge the service. Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
Better to raise an error; eth0 will never be correct. Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
less error-prone, and it's simpler to manage the resource using RAII Signed-off-by: Kefu Chai <kchai@redhat.com>
before this change, cot never destructs the created ObjectStore instances. after this change, they are destructed upon returning from main(). Signed-off-by: Kefu Chai <kchai@redhat.com>
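A minimal sketch of the RAII pattern these two commits describe, assuming a factory that returns `std::unique_ptr` (as `ObjectStore::create()` does after the related change listed further down). `StoreSketch`, `create_store` and `run_tool` are hypothetical stand-ins, not Ceph code.

```cpp
#include <memory>

// Hypothetical stand-in for a store object; it counts live instances so
// that destruction can be observed from outside.
struct StoreSketch {
  explicit StoreSketch(int* live) : live_count(live) { ++*live_count; }
  ~StoreSketch() { --*live_count; }
  StoreSketch(const StoreSketch&) = delete;
  StoreSketch& operator=(const StoreSketch&) = delete;
  int* live_count;
};

// The factory hands out unique ownership instead of a raw pointer.
inline std::unique_ptr<StoreSketch> create_store(int* live_count) {
  return std::make_unique<StoreSketch>(live_count);
}

// Stand-in for a tool's main(): the store is destroyed automatically on
// every return path, with no explicit delete.
inline int run_tool(int* live_count) {
  auto store = create_store(live_count);
  return *live_count;  // number of live stores while `store` is in scope
}
```

After `run_tool()` returns, the live count is back to zero, which is the behaviour the commit describes for main().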
just for the sake of correctness, as they don't need a full-blown std::string; all they need is a string-like object. and they always create a std::string instance as a member variable if they want to have a copy of it. Signed-off-by: Kefu Chai <kchai@redhat.com>
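As an illustration, a constructor can accept `std::string_view` and copy it into an owned `std::string` member only when a copy is wanted. `AllocatorSketch` is a hypothetical class, not the BlueStore `Allocator`.

```cpp
#include <string>
#include <string_view>

class AllocatorSketch {
public:
  // Accepts std::string, string literals, or any other string-like
  // source without forcing the caller to build a std::string first.
  explicit AllocatorSketch(std::string_view name) : name_(name) {}

  const std::string& name() const { return name_; }

private:
  std::string name_;  // owned copy, independent of the caller's buffer
};
```

Because the member owns its copy, later changes to the caller's string do not affect the stored name.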
The following crash occurred at Sepia [1]:
```
INFO 2021-05-26 20:16:32,872 [shard 0] ms - [osd.0(client) v2:172.21.15.119:6803/31733 >> unknown.? -@55220] ProtocolV2::start_accept(): target_addr=172.21.15.119:55220/0
DEBUG 2021-05-26 20:16:32,872 [shard 0] ms - [osd.0(client) v2:172.21.15.119:6803/31733 >> unknown.? -@55220] TRIGGER ACCEPTING, was NONE
DEBUG 2021-05-26 20:16:32,873 [shard 0] ms - [osd.0(client) v2:172.21.15.119:6803/31733 >> unknown.? -@55220] SEND(26) banner: len_payload=16, supported=1, required=0, banner="ceph v2
"
DEBUG 2021-05-26 20:16:32,873 [shard 0] ms - [osd.0(client) v2:172.21.15.119:6803/31733 >> unknown.? -@55220] RECV(10) banner: "ceph v2
"
DEBUG 2021-05-26 20:16:32,873 [shard 0] ms - [osd.0(client) v2:172.21.15.119:6803/31733 >> unknown.? -@55220] GOT banner: payload_len=16
DEBUG 2021-05-26 20:16:32,873 [shard 0] ms - [osd.0(client) v2:172.21.15.119:6803/31733 >> unknown.? -@55220] RECV(16) banner features: supported=1 required=0
DEBUG 2021-05-26 20:16:32,873 [shard 0] ms - [osd.0(client) v2:172.21.15.119:6803/31733 >> unknown.? -@55220] WRITE HelloFrame: my_type=osd, peer_addr=172.21.15.119:55220/0
DEBUG 2021-05-26 20:16:32,873 [shard 0] ms - [osd.0(client) v2:172.21.15.119:6803/31733 >> unknown.? -@55220] GOT HelloFrame: my_type=client peer_addr=v2:172.21.15.119:6803/31733
INFO 2021-05-26 20:16:32,873 [shard 0] ms - [osd.0(client) v2:172.21.15.119:6803/31733 >> client.? -@55220] UPDATE: peer_type=client, policy(lossy=true server=true standby=false resetcheck=false)
DEBUG 2021-05-26 20:16:32,873 [shard 0] ms - [osd.0(client) v2:172.21.15.119:6803/31733 >> client.? -@55220] GOT AuthRequestFrame: method=2, preferred_modes={1, 2}, payload_len=174
/home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.0.0-4622-gaa1dc559/rpm/el8/BUILD/ceph-17.0.0-4622-gaa1dc559/src/crimson/mon/MonClient.cc:399:10: runtime error: member access within null pointer of type 'struct Connection'
Segmentation fault on shard 0.
Backtrace:
0# 0x000055E84CF44C1F in ceph-osd
1# FatalSignal::signaled(int, siginfo_t const*) in ceph-osd
2# FatalSignal::install_oneshot_signal_handler<11>()::{lambda(int, siginfo_t*, void*)#1}::_FUN(int, siginfo_t*, void*) in ceph-osd
3# 0x00007F2BC88C0B20 in /lib64/libpthread.so.0
4# crimson::mon::Connection::get_conn() in ceph-osd
5# crimson::mon::Client::handle_auth_request(seastar::shared_ptr<crimson::net::Connection>, seastar::lw_shared_ptr<AuthConnectionMeta>, bool, unsigned int, ceph::buffer::v15_2_0::list const&, ceph::buffer::v15_2_0::list*) in ceph-osd
6# crimson::net::ProtocolV2::_handle_auth_request(ceph::buffer::v15_2_0::list&, bool) in ceph-osd
7# 0x000055E84DF67669 in ceph-osd
8# 0x000055E84DF68775 in ceph-osd
9# 0x000055E846F47F60 in ceph-osd
10# 0x000055E85296770F in ceph-osd
11# 0x000055E85296CC50 in ceph-osd
12# 0x000055E852B1ECBB in ceph-osd
13# 0x000055E85267C73A in ceph-osd
14# main in ceph-osd
15# __libc_start_main in /lib64/libc.so.6
16# _start in ceph-osd
Fault at location: 0x98
```
[1]: http://pulpito.front.sepia.ceph.com/rzarzynski-2021-05-26_12:20:26-rados-master-distro-basic-smithi/6136907
When `handle_auth_request()` is called, there is no guarantee that
`active_con` is available. The classical implementation reflects this
by working on the connection it is handed:
```cpp
int MonClient::handle_auth_request(
  Connection *con,
  // ...
  ceph::buffer::list *reply)
{
  // ...
  bool isvalid = ah->verify_authorizer(
    cct,
    *rotating_secrets,
    payload,
    auth_meta->get_connection_secret_length(),
    reply,
    &con->peer_name,
    &con->peer_global_id,
    &con->peer_caps_info,
    &auth_meta->session_key,
    &auth_meta->connection_secret,
    ac);
```
The patch transplants the same logic to crimson.
Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
If the host IP/addr is known, use that. The addr might even be a FQDN instead of an IP address, in which case we want to look that up instead of the bare hostname. Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
- Use a centralized method get_mgr_ip()
- Look up the hostname via DNS. This is a bit more reliable than getfqdn() since it will work even when podman adds the container name to /etc/hosts.
Signed-off-by: Sage Weil <sage@newdream.net>
Previously we allowed the host.addr to be a DNS name (short or fqdn). This is problematic because of the inconsistent way that docker and podman handle /etc/hosts, and undesirable because relying on external DNS is an external source of failure for the cluster without any benefit in return (simply updating DNS is not sufficient to make ceph behave). So: update any non-IP to an IP as soon as we start up (presumably on upgrade). If we get a loopback address (127.0.0.1 or 127.0.1.1), then wait and hope that the next instance of the manager has better luck. Signed-off-by: Sage Weil <sage@newdream.net>
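The two checks this commit relies on (is the stored address already an IP, and is a resolved address a loopback) can be sketched in C++ with POSIX `inet_pton`. This is an illustration only, not the mgr/cephadm Python code; the function names are hypothetical, the DNS lookup itself is omitted, and the loopback test covers all of 127.0.0.0/8, which is broader than the two addresses named in the commit.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <string>

// True if addr parses as an IPv4 or IPv6 literal, i.e. no lookup is needed.
inline bool is_ip_literal(const std::string& addr) {
  in_addr v4{};
  in6_addr v6{};
  return inet_pton(AF_INET, addr.c_str(), &v4) == 1 ||
         inet_pton(AF_INET6, addr.c_str(), &v6) == 1;
}

// True if addr is an IPv4 literal in 127.0.0.0/8 (assumption: wider than
// the 127.0.0.1 and 127.0.1.1 cases mentioned in the commit message).
inline bool is_ipv4_loopback(const std::string& addr) {
  in_addr v4{};
  if (inet_pton(AF_INET, addr.c_str(), &v4) != 1) {
    return false;
  }
  return (ntohl(v4.s_addr) >> 24) == 127;
}
```

A caller would resolve any address for which `is_ip_literal()` is false, and keep the old value for now if the result satisfies `is_ipv4_loopback()`.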
Signed-off-by: Sage Weil <sage@newdream.net>
This reverts cfc1f91, which is no longer necessary because (1) we don't use socket.getfqdn(), and (2) we generally do not rely on DNS or /etc/hosts at all anymore (with the exception of the upgrade transition). Signed-off-by: Sage Weil <sage@newdream.net>
…rvice-status-improvement-2021-05-26 doc/cephadm: enrich "service status" Reviewed-by: Sebastian Wagner <sewagner@redhat.com>
* refs/pull/41483/head:
  cephadm: stop passing --no-hosts to podman
  mgr/nfs: use host.addr for backend IP where possible
  mgr/cephadm: convert host addr if non-IP to IP
  mgr/dashboard,prometheus: new method of getting mgr IP
  doc/cephadm: remove any reference to the use of DNS or /etc/hosts
  mgr/cephadm: use known host addr
  mgr/cephadm: resolve IP at 'orch host add' time
Reviewed-by: Sebastian Wagner <swagner@suse.com>
doc: 15.2.13 Release Notes Reviewed-by: Ilya Dryomov <idryomov@redhat.com> Reviewed-by: Josh Durgin <jdurgin@redhat.com> Reviewed-by: Casey Bodley <cbodley@redhat.com> Reviewed-by: Sebastian Wagner <sebastian.wagner@suse.com> Reviewed-by: Ramana Raja <rraja@redhat.com> Reviewed-by: Neha Ojha <nojha@redhat.com>
less repeating this way Signed-off-by: Kefu Chai <kchai@redhat.com>
doc/mgr: use confval directive to define options Reviewed-by: Neha Ojha <nojha@redhat.com>
crimson/monc: handle_auth_request() doesn't depend on active_con. Reviewed-by: Kefu Chai <kchai@redhat.com>
Signed-off-by: Yingxin Cheng <yingxin.cheng@intel.com>
Signed-off-by: Yingxin Cheng <yingxin.cheng@intel.com>
Signed-off-by: Yingxin Cheng <yingxin.cheng@intel.com>
…xtent crimson/seastore: introduce and adopt LBAManager::get_mapping(t, offset) Reviewed-by: Kefu Chai <kchai@redhat.com>
os/bluestore: pass string_view to ctor of Allocator Reviewed-by: Igor Fedotov <ifedotov@suse.com>
os: let ObjectStore::create() return unique_ptr<> Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
…user-no-hosts mgr/cephadm: Don't call _check_host without hosts Reviewed-by: Juan Miguel Olmo <jolmomar@redhat.com> Reviewed-by: Adam King <adking@redhat.com>
…blocking-io-during-index-resharding rgw: add the description of blocking io during index resharding Reviewed-by: Matt Benjamin <mbenjamin@redhat.com> Reviewed-by: J. Eric Ivancich <ivancich@redhat.com>
wangxuw pushed a commit that referenced this pull request on Jun 15, 2021
`OpSequencer` assumes that ID of a previous client request
is always lower than ID of current one. This is reflected
by the assertion in `OpSequencer::start_op()`. It triggered
the following failure [1] in Teuthology:
```
DEBUG 2021-05-07 08:01:41,227 [shard 0] osd - client_request(id=1, detail=osd_op(client.4171.0:1 2.2 2.7c339972 (undecoded) ondisk+retry+read+known_if_redirected e29) v8) same_interval_since: 31
ceph-osd: /home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.0.0-3910-g1b18e076/rpm/el8/BUILD/ceph-17.0.0-3910-g1b18e076/src/crimson/osd/osd_operation_sequencer.h:38: seastar::futurize_t<Result> crimson::osd::OpSequencer::start_op(HandleT&, uint64_t, uint64_t, FuncT&&) [with HandleT = crimson::PipelineHandle; FuncT = crimson::interruptible::interruptor<InterruptCond>::wrap_function(Func&&) [with Func = crimson::osd::ClientRequest::start()::<lambda()> mutable::<lambda(Ref<crimson::osd::PG>)> mutable::<lambda()> mutable::<lambda()>; InterruptCond = crimson::osd::IOInterruptCondition]::<lambda()>; Result = crimson::interruptible::interruptible_future_detail<crimson::osd::IOInterruptCondition, seastar::future<> >; seastar::futurize_t<Result> = crimson::interruptible::interruptible_future_detail<crimson::osd::IOInterruptCondition, seastar::future<> >; uint64_t = long unsigned int]: Assertion `prev_op < this_op' failed.
Aborting on shard 0.
Backtrace:
Segmentation fault.
Backtrace:
0# 0x00005592B028932F in ceph-osd
1# FatalSignal::signaled(int, siginfo_t const*) in ceph-osd
2# FatalSignal::install_oneshot_signal_handler<6>()::{lambda(int, siginfo_t*, void*)#1}::_FUN(int, siginfo_t*, void*) in ceph-osd
3# 0x00007F57B72E7B20 in /lib64/libpthread.so.0
4# gsignal in /lib64/libc.so.6
5# abort in /lib64/libc.so.6
6# 0x00007F57B58E2B09 in /lib64/libc.so.6
7# 0x00007F57B58F0DE6 in /lib64/libc.so.6
8# 0x00005592ABB8484D in ceph-osd
9# 0x00005592ABB8ACB3 in ceph-osd
10# seastar::continuation<seastar::internal::promise_base_with_type<seastar::bool_class<seastar::stop_iteration_tag> >, seastar::noncopyable_function<seastar::future<seastar::bool_class<seastar::stop_iteration_tag> > (boost::intrusive_ptr<crimson::osd::PG>&&)>, seastar::future<boost::intrusive_ptr<crimson::osd::PG> >::then_impl_nrvo<seastar::noncopyable_function<seastar::future<seastar::bool_class<seastar::stop_iteration_tag> > (boost::intrusive_ptr<crimson::osd::PG>&&)>, seastar::future<seastar::bool_class<seastar::stop_iteration_tag> > >(seastar::noncopyable_function<seastar::future<seastar::bool_class<seastar::stop_iteration_tag> > (boost::intrusive_ptr<crimson::osd::PG>&&)>&&)::{lambda(seastar::internal::promise_base_with_type<seastar::bool_class<seastar::stop_iteration_tag> >&&, seastar::noncopyable_function<seastar::future<seastar::bool_class<seastar::stop_iteration_tag> > (boost::intrusive_ptr<crimson::osd::PG>&&)>&, seastar::future_state<boost::intrusive_ptr<crimson::osd::PG> >&&)#1}, boost::intrusive_ptr<crimson::osd::PG> >::run_and_dispose() in ceph-osd
11# 0x00005592B357F88F in ceph-osd
12# 0x00005592B3584DD0 in ceph-osd
```
[1]: http://pulpito.front.sepia.ceph.com/rzarzynski-2021-05-07_07:41:02-rados-master-distro-basic-smithi/6104530
Crash analysis resulted in two observations:
1. during the request execution the acting set changed,
the request was interrupted, and an attempt to
re-execute it followed;
2. the interrupted request was the very first client
request the OSD had ever seen.
Code analysis showed a problem in how `ClientRequest`
establishes `prev_op_id`: although supposed to be performed
only once for a request, it can get executed twice but only
for the very first request `OpSequencer` saw.
```cpp
void ClientRequest::may_set_prev_op()
{
  // set prev_op_id if it's not set yet
  if (__builtin_expect(prev_op_id == 0, true)) {
    prev_op_id = sequencer.get_last_issued();
  }
}
```
Unfortunately, `0` isn't a distinct sentinel: it can also
be returned by `get_last_issued()`:
```cpp
class OpSequencer {
  // ...
  uint64_t get_last_issued() const {
    return last_issued;
  }
  // ...
  // the id of last op which is issued
  uint64_t last_issued = 0;
```
As a result, on the second call `OpSequencer` returned
a new value (actually `this_op`), violating the assertion.
The commit fixes the problem by switching from a designated
value to `std::optional`.
Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
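A minimal sketch of the `std::optional` approach, under the assumption that "not set yet" is tracked by the optional's empty state rather than by the value `0`. `OpSequencerSketch` and `ClientRequestSketch` are simplified, hypothetical stand-ins for the crimson classes, and `issue()` is an invented helper for the example.

```cpp
#include <cstdint>
#include <optional>

class OpSequencerSketch {
public:
  std::uint64_t get_last_issued() const { return last_issued; }
  void issue(std::uint64_t id) { last_issued = id; }  // hypothetical helper

private:
  std::uint64_t last_issued = 0;  // 0 is a legitimate value here
};

class ClientRequestSketch {
public:
  explicit ClientRequestSketch(OpSequencerSketch& s) : sequencer(s) {}

  // Records the previous op id at most once, even when that id is 0.
  void may_set_prev_op() {
    if (!prev_op_id.has_value()) {
      prev_op_id = sequencer.get_last_issued();
    }
  }

  // Precondition: may_set_prev_op() has been called at least once.
  std::uint64_t get_prev_op_id() const { return prev_op_id.value(); }

private:
  OpSequencerSketch& sequencer;
  std::optional<std::uint64_t> prev_op_id;  // empty means "not set yet"
};
```

With this shape, a retried first request keeps `prev_op_id == 0` on the second call instead of picking up its own id.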
wangxuw pushed a commit that referenced this pull request on Jun 15, 2021
f7181ab optimized client parallelism. To achieve that, `PG::do_osd_ops()` was converted to return, basically, a future of a pair of futures. Unfortunately, the lifetime management of `OpsExecuter` was left unchanged. As a result, the object was valid only until the outer future was fulfilled, whereas, due to the `rollbacker` instances, it should stay alive until `all_completed` becomes available.
This issue can explain the following problem observed in a Teuthology job [1]:
```
DEBUG 2021-05-20 08:03:22,617 [shard 0] osd - do_op_call: method returned ret=-17, outdata.length()=0 while num_read=1, num_write=0
DEBUG 2021-05-20 08:03:22,617 [shard 0] osd - rollback_obc_if_modified: object 19:e17d4708:test-rados-api-smithi095-38404-2::foo:head got error generic:17, need_rollback=false
=================================================================
==33626==ERROR: AddressSanitizer: heap-use-after-free on address 0x60d0000b9320 at pc 0x560f486b8222 bp 0x7fffc467a1e0 sp 0x7fffc467a1d0
READ of size 4 at 0x60d0000b9320 thread T0
#0 0x560f486b8221 (/usr/bin/ceph-osd+0x2c610221)
#1 0x560f4880c6b1 in seastar::continuation<seastar::internal::promise_base_with_type<boost::intrusive_ptr<MOSDOpReply> >, seastar::noncopyable_function<crimson::interruptible::interruptible_future_detail<crimson::osd::IOInterruptCondition, crimson::errorator<crimson::unthrowable_wrapper<std::error_code const&, crimson::ec<(std::errc)11> > >::_future<crimson::errorated_future_marker<boost::intrusive_ptr<MOSDOpReply> > > > ()>, seastar::future<void>::then_impl_nrvo<seastar::noncopyable_function<crimson::interruptible::interruptible_future_detail<crimson::osd::IOInterruptCondition, crimson::errorator<crimson::unthrowable_wrapper<std::error_code const&, crimson::ec<(std::errc)11> > >::_future<crimson::errorated_future_marker<boost::intrusive_ptr<MOSDOpReply> > > > ()>, crimson::interruptible::interruptible_future_detail<crimson::osd::IOInterruptCondition, crimson::errorator<crimson::unthrowable_wrapper<std::error_code const&, crimson::ec<(std::errc)11> > >::_future<crimson::errorated_future_marker<boost::intrusive_ptr<MOSDOpReply> > > > >(seastar::noncopyable_function<crimson::interruptible::interruptible_future_detail<crimson::osd::IOInterruptCondition, crimson::errorator<crimson::unthrowable_wrapper<std::error_code const&, crimson::ec<(std::errc)11> > >::_future<crimson::errorated_future_marker<boost::intrusive_ptr<MOSDOpReply> > > > ()>&&)::{lambda(seastar::internal::promise_base_with_type<boost::intrusive_ptr<MOSDOpReply> >&&, seastar::noncopyable_function<crimson::interruptible::interruptible_future_detail<crimson::osd::IOInterruptCondition, crimson::errorator<crimson::unthrowable_wrapper<std::error_code const&, crimson::ec<(std::errc)11> > >::_future<crimson::errorated_future_marker<boost::intrusive_ptr<MOSDOpReply> > > > ()>&, seastar::future_state<seastar::internal::monostate>&&)#1}, void>::run_and_dispose() (/usr/bin/ceph-osd+0x2c7646b1)
#2 0x560f5352c3ae (/usr/bin/ceph-osd+0x374843ae)
#3 0x560f535318ef (/usr/bin/ceph-osd+0x374898ef)
#4 0x560f536e395a (/usr/bin/ceph-osd+0x3763b95a)
#5 0x560f532413d9 (/usr/bin/ceph-osd+0x371993d9)
#6 0x560f476af95a in main (/usr/bin/ceph-osd+0x2b60795a)
#7 0x7f7aa0af97b2 in __libc_start_main (/lib64/libc.so.6+0x237b2)
#8 0x560f477d2e8d in _start (/usr/bin/ceph-osd+0x2b72ae8d)
```
[1]: http://pulpito.front.sepia.ceph.com/rzarzynski-2021-05-20_07:28:16-rados-master-distro-basic-smithi/6124735/
The commit deals with the problem by repacking the outer future. An alternative would be switching from `std::unique_ptr` to `seastar::shared_ptr` for managing `OpsExecuter`.
Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
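The lifetime problem can be illustrated without Seastar. The sketch below shows the alternative mentioned in the commit message (shared ownership), not the fix that was actually applied (repacking the outer future). It uses `std::shared_ptr` and `std::function` as stand-ins for `seastar::shared_ptr` and futures, and every name in it is hypothetical.

```cpp
#include <functional>
#include <memory>
#include <utility>

// Hypothetical stand-in for OpsExecuter: state that the later-completing
// step still needs to read.
struct ExecuterSketch {
  int rollback_state = 42;
};

// Returns two deferred steps standing in for the two inner futures
// ("submitted" first, "all_completed" later). Each captures the executer
// by shared_ptr, so it stays alive until the last step is destroyed.
inline std::pair<std::function<int()>, std::function<int()>>
start_ops_sketch() {
  auto ox = std::make_shared<ExecuterSketch>();
  std::function<int()> submitted = [ox] { return 0; };
  std::function<int()> all_completed = [ox] { return ox->rollback_state; };
  return {std::move(submitted), std::move(all_completed)};
}
```

Dropping the first step does not destroy the executer, because the second step still holds a reference. With unique ownership tied to the first step only, the second step would read freed memory, which matches the heap-use-after-free reported above.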