Skip to content

Conversation

@hwware
Copy link
Contributor

@hwware hwware commented Sep 29, 2021

This PR adds the a human readable name to a node in clusters that are visible as part of error logs. This is useful so that admins and operators of Redis cluster have better visibility into failures without having to cross-reference the generated ID with some logical identifier (such as pod-ID or EC2 instance ID). This is mentioned in #8948. Specific nodenames can be set by using the variable cluster-announce-human-nodename. The nodename is gossiped using the clusterbus extension in #9530.

Dependent PRs

PR #9530 is done
PR #10536 is done

Release notes
Include a new configuration, `cluster-announce-human-nodename`, which allows configuring a unique identifier for a node that is be used in logs for debugging.

@hwware
Copy link
Contributor Author

hwware commented Sep 29, 2021

@madolson Could you please update #8948 on part: Human readable names for nodes to add this PR link, thanks

@madolson
Copy link
Contributor

madolson commented Oct 3, 2021

@hwware Sure. Should we be propagating the cluster node name to other nodes?

@hwware
Copy link
Contributor Author

hwware commented Oct 5, 2021

@madolson do you mean propogating nodename or hostname?
I have added the humanreadable nodename to the CLUSTER NODES commands shown below:
image

@madolson
Copy link
Contributor

madolson commented Oct 6, 2021

Yes, sorry, I meant node name.

@hwware
Copy link
Contributor Author

hwware commented Oct 12, 2021

Hey @madolson , is there anything that needs to added for this PR or is it fine?

Copy link
Contributor

@madolson madolson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My original thought was that the node name could be assigned by an admin, say use the EC2 instance ID or some type of container identifier. This isn't all that simpler, as we need to propagate it through the cluster. Having an identifier that is composed of the ip + port could work as a simpler version of a user defined hostname. We don't need to persist the IP + port based nodename to the nodes.conf file, since we can derive it dynamically. I still think the admin defined codename is more useful, but I think we could go either way.

I also thought it would be useful to print the "codename" in the logs wherever we print the 40 character hex identifier, since that can be hard to retroactively identify. This can be implemented regardless of how the codename is implemented.

@hwware
Copy link
Contributor Author

hwware commented Nov 1, 2021

Hey @madolson , I have updated the code and added the gossip functionality for the human readable name. And the node's name can be set manually by using the command: CLUSTER SETNAME. Please take a look.
Thank you.

@hwware hwware force-pushed the cluster_nodename_feature branch from f35fd09 to 4476b94 Compare November 18, 2021 20:53
Copy link
Contributor

@madolson madolson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll try to stay more ontop of this to see if we can merge this into 7.

@hwware hwware changed the title Cluster nodename feature WIP Cluster nodename feature Dec 3, 2021
@madolson
Copy link
Contributor

@hwware Hey, now that the hostname extension framework is merged, do you want to follow up on this? I think it should be easy to add a new extension that is the nodename.

@hwware
Copy link
Contributor Author

hwware commented Jan 11, 2022

@hwware Hey, now that the hostname extension framework is merged, do you want to follow up on this? I think it should be easy to add a new extension that is the nodename.

Thanks Madolson, I will start working on this PR. Thanks

@hwware hwware force-pushed the cluster_nodename_feature branch from 4476b94 to b1161cb Compare January 12, 2022 19:09
@hwware hwware requested a review from oranagra January 19, 2022 20:03
@hwware hwware changed the title WIP Cluster nodename feature Cluster nodename feature Jan 19, 2022
Copy link
Contributor

@madolson madolson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks mostly good. Main comment I have is just making sure we're clear about the intention. I was thinking of it mostly as a debugging/human readable name that would be exposed in places like info and logging.

@hwware
Copy link
Contributor Author

hwware commented Feb 7, 2022

Hi @madolson , I agree that this can be used as debugging/human readable name that would be exposed in places like info and logging. This would be better than having a node ID when trying to debug any issue with a particular node.

@hwware hwware changed the title Cluster nodename feature WIP Cluster nodename feature Feb 24, 2022
@hwware hwware changed the base branch from unstable to 6.2 February 24, 2022 17:56
@hwware hwware changed the base branch from 6.2 to unstable February 24, 2022 17:57
@hwware hwware changed the title WIP Cluster nodename feature Cluster nodename feature Mar 1, 2022
@hwware hwware changed the base branch from unstable to 7.0 March 25, 2022 16:40
@hwware hwware changed the base branch from 7.0 to unstable March 25, 2022 16:40
@zeekling
Copy link
Contributor

i think we can change the ip to hostname, and we can use hostname to identify the node. no need add nodename

image

@enjoy-binbin
Copy link
Contributor

@zeekling we cannot easily modify the output of the command (any commands), it will be a breaking change
and also node IP is useful information in any case i think, no reason to "remove" it

@zeekling
Copy link
Contributor

zeekling commented Apr 9, 2022

@zeekling we cannot easily modify the output of the command (any commands), it will be a breaking change and also node IP is useful information in any case i think, no reason to "remove" it

ok, Another question, can you confirm the impact on large clusters.For example: the cluster has 300 masters and 300 slaves

Copy link
Contributor

@madolson madolson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM now, only thing missing now is tests.

@hwware hwware force-pushed the cluster_nodename_feature branch from 48a7eb4 to 0c24af6 Compare May 23, 2023 19:10
@hwware
Copy link
Contributor Author

hwware commented Jun 1, 2023

I thought I commented on this, we decided to merge this change, #12166, first and then we will merge this.

Hello @madolson , #12166 is merged and I have also resolved all the conflicts. Kindly review.

@madolson madolson self-requested a review June 6, 2023 04:22
@madolson
Copy link
Contributor

madolson commented Jun 6, 2023

@hwware We discussed this previously, but I'm going to double check no one else wants to review it, and then will merge it tomorrow.

@oranagra
Copy link
Member

the top comment needs an update, right?
i.e. that's a new config that affects the generated config files, and some logging, but doesn't affect any commands at the moment, right?
please make sure the top comment / commit comment mentions that, as well as reasoning and future plans.
thanks.

@madolson
Copy link
Contributor

Sorry for the delay, just got behind. Everything should be updated and consistent now.

@madolson
Copy link
Contributor

@hwware Will you double check the top comment and look at my last commit. I tried to write a slightly better test, simplify the code, and updated the redis.conf text.

@hwware
Copy link
Contributor Author

hwware commented Jun 15, 2023

Thanks @madolson for updating them. It looks good , Sorry I missed the @oranagra comments.

@madolson
Copy link
Contributor

@madolson madolson merged commit 070453e into redis:unstable Jun 18, 2023
enjoy-binbin added a commit to enjoy-binbin/redis that referenced this pull request Jun 18, 2023
auxHumanNodenameGetter limited to %.40s, since we did not limit the
length of config cluster-announce-human-nodename, %.40s will cause
nodename data loss (we will persist it in nodes.conf).

Additional modified auxHumanNodenamePresent to use sdslen.

Introduced in redis#9564.
@oranagra
Copy link
Member

sporadic test failure:

*** [err]: Human nodenames are visible in log messages in tests/unit/cluster/human-announced-nodename.tcl
log message of '"*Node * (nodename-1) reported node * (nodename-0) as not reachable*"' not found in ./tests/tmp/server.6806.100/stdout after line: 0 till line: 42

https://github.com/redis/redis/actions/runs/5313162983/jobs/9618669303?pr=12159

@madolson
Copy link
Contributor

I think this should fix it: #12330.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

release-notes indication that this issue needs to be mentioned in the release notes state:major-decision Requires core team consensus

Projects

Status: Done
Status: Done

Development

Successfully merging this pull request may close these issues.

7 participants