Add _gce_ network host setting #13612

Merged
dadoonet merged 1 commit into elastic:master from dadoonet:pr/13605-gce-add-network-host-setting
Oct 7, 2015

@dadoonet
Contributor

When running on the GCE platform, an instance has access to:

http://metadata.google.internal/computeMetadata/v1/instance/network-interfaces/0/ip

which returns the private IP address, for example 10.240.0.2, and to:

http://metadata.google.internal/computeMetadata/v1/instance/network-interfaces/0/externalIp

which returns the public IP address, for example 130.211.108.21.

As we have for ec2, we can support new network host settings:

  • _gce:privateIp:X_: The private IP address of the machine for a given network interface.
  • _gce:hostname_: The hostname of the machine.
  • _gce_: Same as _gce:privateIp:0_ (recommended).
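
To make the mapping concrete, here is a minimal Python sketch (not the plugin's Java code) of how the three values listed above could translate into metadata URLs. The function name `gce_metadata_url` is hypothetical, and rejecting any other value with an error is an assumption made for the illustration.

```python
import re

# Base URL taken from the metadata endpoints quoted in this description.
METADATA_BASE = "http://metadata.google.internal/computeMetadata/v1/instance"

# Matches _gce:privateIp:X_ where X is the network interface index.
_PRIVATE_IP = re.compile(r"^_gce:privateIp:(\d+)_$")


def gce_metadata_url(network_host: str) -> str:
    """Map a _gce..._ network.host value to the metadata URL it would read.

    Hypothetical helper for illustration only; unknown values raise ValueError.
    """
    if network_host == "_gce_":
        # _gce_ is documented above as an alias for _gce:privateIp:0_.
        network_host = "_gce:privateIp:0_"
    if network_host == "_gce:hostname_":
        return f"{METADATA_BASE}/hostname"
    match = _PRIVATE_IP.match(network_host)
    if match:
        index = int(match.group(1))
        return f"{METADATA_BASE}/network-interfaces/{index}/ip"
    raise ValueError(f"unsupported GCE network host value: {network_host!r}")


print(gce_metadata_url("_gce_"))
print(gce_metadata_url("_gce:hostname_"))
```
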

Closes #13605.
Closes #13590.

BTW resolveIfPossible now throws IOException so code is also updated for ec2 discovery and
some basic tests have been added.

@dadoonet
Contributor Author

For information, I just tested it on the GCE platform.

Without network.host: _gce_, it gives:

[2015-09-16 13:13:13,204][INFO ][org.elasticsearch.transport] [Sleek] bound_address {127.0.0.1:9300}, publish_address {127.0.0.1:9300}
[2015-09-16 13:13:18,969][INFO ][org.elasticsearch.http   ] [Sleek] bound_address {127.0.0.1:9200}, publish_address {127.0.0.1:9200}

With network.host: _gce_, it gives:

[2015-09-16 13:15:30,097][INFO ][org.elasticsearch.transport] [Ooze] bound_address {10.240.0.2:9300}, publish_address {10.240.0.2:9300}
[2015-09-16 13:15:34,509][INFO ][org.elasticsearch.http   ] [Ooze] bound_address {10.240.0.2:9200}, publish_address {10.240.0.2:9200}

But it fails with _gce:publicIp_. The URL we check in that case should be http://metadata.google.internal/computeMetadata/v1/instance/network-interfaces/0/access-configs/0/external-ip.

Will push a fix.

Note that we can also add _gce:privateDns_, as http://metadata.google.internal/computeMetadata/v1/instance/hostname returns an internal hostname like instance-2.c.dadoonet95.internal.

Will update the PR.

@dadoonet
Contributor Author

Was doing some more tests. Actually, even if a VM is accessible externally using its public IP address, you can't bind to this IP:

[2015-09-16 14:34:11,883][DEBUG][org.elasticsearch.cloud.gce] [Aged Genghis] get network information for [access-configs/0/external-ip]
[2015-09-16 14:34:12,202][DEBUG][org.elasticsearch.cloud.gce] [Aged Genghis] ip found [130.211.108.21]
Exception in thread "main" BindTransportException[Failed to bind to [9300-9400]]; nested: ChannelException[Failed to bind to: /130.211.108.21:9400]; nested: BindException[Cannot assign requested address];
Likely root cause: java.net.BindException: Cannot assign requested address
    at sun.nio.ch.Net.bind0(Native Method)
    at sun.nio.ch.Net.bind(Net.java:433)
    at sun.nio.ch.Net.bind(Net.java:425)
    at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
    at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
    at org.jboss.netty.channel.socket.nio.NioServerBoss$RegisterTask.run(NioServerBoss.java:193)
    at org.jboss.netty.channel.socket.nio.AbstractNioSelector.processTaskQueue(AbstractNioSelector.java:391)
    at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:315)
    at org.jboss.netty.channel.socket.nio.NioServerBoss.run(NioServerBoss.java:42)
    at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
    at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Refer to the log for complete error details.

So it does not really make sense to support _gce:publicIp_...
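
The failure above is the operating system refusing to bind a socket to an address that is not assigned to any local interface. A small Python sketch of the same check, assuming a hypothetical helper `can_bind` and using 192.0.2.1 (a documentation-only address) as a stand-in for an address the machine does not own:

```python
import socket


def can_bind(host: str, port: int = 0) -> bool:
    """Return True if a TCP socket can be bound to host on this machine.

    Port 0 lets the OS pick any free port, so only the address matters here.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Typically "Cannot assign requested address" when the IP is
            # not configured on a local interface.
            return False
        return True


print(can_bind("127.0.0.1"))   # loopback is a local address
print(can_bind("192.0.2.1"))   # stand-in for an address this machine does not own
```

On a GCE instance the external IP is not configured on the instance's own interface, which is consistent with the BindException in the log above.
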

@clintongormley clintongormley changed the title [discovery-gce] add _gce_ network host setting Add _gce_ network host setting Sep 16, 2015
rmuir added a commit to rmuir/elasticsearch that referenced this pull request Sep 17, 2015
@dadoonet
Contributor Author

@bleskes Do you think you can review this?

@dadoonet dadoonet force-pushed the pr/13605-gce-add-network-host-setting branch from a37a898 to 1fcee31 Compare September 26, 2015 08:06
Contributor

why not call this hostname, using the same name GCE uses?

Contributor Author

I actually used the same naming as we have for ec2. But I agree.

@dadoonet
Contributor Author

@bleskes Updated to address your comments. Thanks for the review!

Contributor

I find the hostname part of the naming confusing (as it is just one of the options to use). Can we call this GceAddressResolverType or something else?

@bleskes
Contributor

bleskes commented Sep 26, 2015

Thx David for implementing this. I left some comments. My biggest concern is that we are hard-wired to the first (0) interface. I glanced at the docs and could not find this to be guaranteed by Google. Is it? If so, we need to document what we do. If not, we need to retrieve all of the interfaces, choose the first (in a predictable order) and document that.

@dadoonet
Contributor Author

Hey @bleskes

I added a new commit:

  • GceHostnameType becomes GceAddressResolverType. Maybe we should do the same in the ec2 discovery plugin?
  • Add a new parameter cloud.gce.network.card if you want to use gce to get the IP address from another network card
  • GceComputeServiceMock can now read either .json or .txt files
  • Add new tests for network.host equal to _gce_, _gce:privateIp_ or _gce:doesnotexist_ and for cloud.gce.network.card
  • Replace www.elastic.co in tests with localhost so they work better when no network connection is available (sometimes it's good to code in a plane :) )

@rjernst
Member

rjernst commented Sep 28, 2015

> GceComputeServiceMock can now read either .json or .txt files

Do we really need this leniency? Can we just support one?

Contributor

Can we try to use Google terminology here? They call this a network interface. Also, are we sure interface 0 is there always, or at least 95% of the time? I'm afraid that the OOB behavior will be bad.

@dadoonet dadoonet force-pushed the pr/13605-gce-add-network-host-setting branch from b73f7ce to 0c258a3 Compare October 6, 2015 16:48
@dadoonet
Contributor Author

dadoonet commented Oct 6, 2015

@bleskes I pushed a new commit (and rebased BTW). Let me know.

@rmuir
Contributor

rmuir commented Oct 6, 2015

I don't see anything guaranteeing that the "first interface" is a private address. I don't think we should solve the problem this way, it's too arbitrary and risky.

Instead we should add _site_local_ to ensure that private addresses are really private addresses if that is the intent and use it for these cloud providers.

@bleskes
Contributor

bleskes commented Oct 6, 2015

@rmuir I presume you refer to #13969. I respond here to make sure it's all in the same place, but if we continue the discussion (and it does refer to the other ticket), let's continue it there.

The _gce_ is designed to retrieve the instance's internal ip, meaning:

Every instance also has a network IP address that is addressable only within the network. Within the network, instances can also be addressed by instance name and the network will resolve an instance name into a network IP address.

From the docs about what a network is:

Every instance is a member of a single network ... Any communication between instances in different networks, even within the same project, must be through external IP addresses.

So it seems we're good here?

@dadoonet I failed to find the reference for the exact metadata URL you are using. Can you dig it up for future reference?

@dadoonet
Contributor Author

dadoonet commented Oct 6, 2015

@bleskes @rmuir actually when you query metadata:

curl "http://metadata.google.internal/computeMetadata/v1/instance/?recursive=true" -H "Metadata-Flavor:Google"

You get back something like (I'm hiding non relevant parts):

{
   "hostname":"blabla.projectname.internal",
   "networkInterfaces":[
      {
         "accessConfigs":[
            {
               "externalIp":"104.155.53.203",
               "type":"ONE_TO_ONE_NAT"
            }
         ],
         "forwardedIps":[

         ],
         "ip":"10.240.0.2",
         "network":"projects/896329523726/networks/default"
      }
   ]
}

The public address is exposed within networkInterfaces.accessConfigs.externalIp if any.
networkInterfaces.ip will only give the private IP so I think we are safe here.
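
As a rough illustration of that layout, the sketch below parses the sample response shown above and pulls out the private IP and any external IPs for one interface. The helper name `interface_addresses` is hypothetical, and the sample is the trimmed JSON from this comment, not a full metadata response.

```python
import json
from typing import List, Tuple

# Trimmed sample copied from the metadata response shown above.
SAMPLE = """
{
  "hostname": "blabla.projectname.internal",
  "networkInterfaces": [
    {
      "accessConfigs": [
        {"externalIp": "104.155.53.203", "type": "ONE_TO_ONE_NAT"}
      ],
      "forwardedIps": [],
      "ip": "10.240.0.2",
      "network": "projects/896329523726/networks/default"
    }
  ]
}
"""


def interface_addresses(metadata: dict, index: int = 0) -> Tuple[str, List[str]]:
    """Return (private_ip, external_ips) for the interface at the given index."""
    interfaces = metadata.get("networkInterfaces", [])
    if not 0 <= index < len(interfaces):
        raise LookupError(f"no network interface at index {index}")
    iface = interfaces[index]
    # External addresses, if any, live under accessConfigs[*].externalIp.
    external_ips = [
        config["externalIp"]
        for config in iface.get("accessConfigs", [])
        if config.get("externalIp")
    ]
    return iface["ip"], external_ips


private_ip, external_ips = interface_addresses(json.loads(SAMPLE))
print(private_ip, external_ips)  # prints: 10.240.0.2 ['104.155.53.203']
```
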

Is there any method which can "double-check" that the IP we get is actually non routable on internet - so it's a private one? If so, I could add a safe-guard and either stop the process or put a big WARN saying that the _gce_ address is not private.
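
For what such a double-check might look like, here is a hedged Python sketch using the standard `ipaddress` module; the wrapper name `is_private_address` is hypothetical. Note that `is_private` only classifies the address range (it is also true for loopback addresses, for instance) and says nothing about NAT or firewall rules in front of the machine.

```python
import ipaddress


def is_private_address(ip: str) -> bool:
    """Return True if ip falls in a range reserved for private use."""
    return ipaddress.ip_address(ip).is_private


# Addresses taken from the examples in this thread.
print(is_private_address("10.240.0.2"))      # internal address -> True
print(is_private_address("104.155.53.203"))  # external address -> False
```
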

@dadoonet
Contributor Author

dadoonet commented Oct 6, 2015

Also, it's confirmed by running this command:

$ gcloud compute instances list
NAME ZONE           MACHINE_TYPE  PREEMPTIBLE INTERNAL_IP EXTERNAL_IP    STATUS
cfp  europe-west1-b n1-standard-2             10.240.0.2  104.155.53.203 RUNNING

@rmuir
Contributor

rmuir commented Oct 6, 2015

> Is there any method which can "double-check" that the IP we get is actually non routable on internet - so it's a private one? If so, I could add a safe-guard and either stop the process or put a big WARN saying that the _gce_ address is not private.

I don't agree with a warning or any leniency like this. I don't know how we are going from binding to localhost by default, to potentially binding to a public address by default, just with a warning. I don't want to see this PR rushed through, I am very concerned about this.

@rjernst
Member

rjernst commented Oct 6, 2015

@dadoonet if you bind to the internal IP for gce, can you confirm external traffic is blocked? IIRC for aws, there was routing magic that made external requests look like they were going to the internal ip (but I might be confused and it was just the ip address was always the internal).

@dadoonet
Contributor Author

dadoonet commented Oct 6, 2015

@rjernst Thanks. So here is what I did:

  • start a GCE node using _local_ as the IP. It binds to 10.240.0.2.
  • try to access the instance externally with the public IP: 104.155.53.203:9200 -> fails
  • add a firewall route to accept connections on 9200.
  • try to access the instance externally with the public IP: 104.155.53.203:9200 -> works

So I guess it's "bad" as we only bound to the private IP so it should not be accessible from a public IP, right?
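
The "fails" / "works" steps above amount to a TCP reachability probe. A minimal Python sketch of such a probe, assuming a hypothetical helper `is_reachable`; the address and port in the usage comment are the ones from this test and are not contacted by the code as written:

```python
import socket


def is_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable networks.
        return False


# Example usage from an external machine (not executed here):
# is_reachable("104.155.53.203", 9200)
```

A successful connect only shows that something accepts connections at that address and port; it does not tell you which local address the node is bound to.
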

@rjernst
Member

rjernst commented Oct 6, 2015

> So I guess it's "bad" as we only bound to the private IP so it should not be accessible from a public IP, right?

That's what I would expect, yes. Sounds like routing trickery...

@dadoonet
Contributor Author

dadoonet commented Oct 6, 2015

I hear you @rmuir. I can only see two choices:

  • We bind by default to _site_ when we have discovery.type: gce. So it binds to 127.0.0.1 by default and can be accessed only by a process running on the same physical machine. But it's useless in terms of building a cluster of nodes. And it means that users will have to explicitly define network.host: _gce_, which I believe is what they will do in 99.9% of the cases.
  • We bind by default to _local_ or _gce_ when we have discovery.type: gce. So it binds to a private IP address. Any other machine running within this private network (within this GCE project) will be able to connect to this node.
    It's safe IMO until someone opens a firewall route to this machine on port 9200, which is obviously not recommended.
    If someone really wants to refuse any connection within the same project, they will have to set network.host: _site_

My comments apply only if your PR #13954 is merged obviously.

Any other solution I could try to implement?

@rmuir
Contributor

rmuir commented Oct 6, 2015

For GCE I do not think _site_ (which will bind to 10.x, 192.x, 172.16.x) is an option if that cloud provider is configuring static NAT so that they are still in fact "public".

Basically I don't think we should bind to anything publicly reachable.

I'd really rather us just be consistent and bind to _local_ always by default, that is so simple and easy to understand and way more secure. I do not think we should be "reverting" this change in the name of ease of use. I am sorry, I just don't agree with that.

@rmuir
Contributor

rmuir commented Oct 6, 2015

Can we separate the "add GCE resolver logic", "add better exception handling for custom resolvers", "add lots of GCE tests" which are all in this PR, and seem like good changes, from the changing of the default, and just make a separate issue for that?

@dadoonet
Contributor Author

dadoonet commented Oct 6, 2015

Just a note:

> and just make a separate issue for that?

The current PR is not about adding _gce_ as default. Sorry if there was confusion.
Adding _gce_ as default will be fixed (if we fix it) with #13969 which is another thing.

So I think we are all on the same page here.

@bleskes
Contributor

bleskes commented Oct 7, 2015

LGTM. Thx @dadoonet

@dadoonet dadoonet force-pushed the pr/13605-gce-add-network-host-setting branch from 09e84b6 to 289cd5d Compare October 7, 2015 20:05
@dadoonet dadoonet removed the review label Oct 7, 2015
@dadoonet
Contributor Author

dadoonet commented Oct 7, 2015

Cool! Rebased and tested. Just need now to apply the changes to 2.x, 2.1 and 2.0:

[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO] 
[INFO] Build Tools and Resources .......................... SUCCESS [  1.773 s]
[INFO] Rest API Specification ............................. SUCCESS [  1.412 s]
[INFO] Elasticsearch: Parent POM .......................... SUCCESS [ 13.108 s]
[INFO] Elasticsearch: Core ................................ SUCCESS [22:08 min]
[INFO] Distribution: Parent POM ........................... SUCCESS [  5.734 s]
[INFO] Distribution: TAR .................................. SUCCESS [01:21 min]
[INFO] Distribution: ZIP .................................. SUCCESS [01:19 min]
[INFO] Distribution: Deb .................................. SUCCESS [  7.956 s]
[INFO] Distribution: RPM .................................. SUCCESS [01:16 min]
[INFO] Plugin: Parent POM ................................. SUCCESS [  3.957 s]
[INFO] Plugin: Analysis: ICU .............................. SUCCESS [ 31.603 s]
[INFO] Plugin: Analysis: Japanese (kuromoji) .............. SUCCESS [ 25.595 s]
[INFO] Plugin: Analysis: Phonetic ......................... SUCCESS [ 24.531 s]
[INFO] Plugin: Analysis: Smart Chinese (smartcn) .......... SUCCESS [ 24.221 s]
[INFO] Plugin: Analysis: Polish (stempel) ................. SUCCESS [ 24.839 s]
[INFO] Plugin: Cloud: Google Compute Engine ............... SUCCESS [ 24.511 s]
[INFO] Plugin: Delete By Query ............................ SUCCESS [ 44.614 s]
[INFO] Plugin: Discovery: Azure ........................... SUCCESS [ 38.441 s]
[INFO] Plugin: Discovery: EC2 ............................. SUCCESS [ 31.232 s]
[INFO] Plugin: Discovery: Multicast ....................... SUCCESS [ 52.638 s]
[INFO] Plugin: Language: Expression ....................... SUCCESS [ 58.571 s]
[INFO] Plugin: Language: Groovy ........................... SUCCESS [05:24 min]
[INFO] Plugin: Language: JavaScript ....................... SUCCESS [ 38.987 s]
[INFO] Plugin: Language: Python ........................... SUCCESS [ 43.542 s]
[INFO] Plugin: Mapper: Murmur3 ............................ SUCCESS [ 27.527 s]
[INFO] Plugin: Mapper: Size ............................... SUCCESS [ 28.513 s]
[INFO] Plugin: Repository: Azure .......................... SUCCESS [ 33.372 s]
[INFO] Plugin: Repository: S3 ............................. SUCCESS [ 30.935 s]
[INFO] Plugin: Store: SMB ................................. SUCCESS [ 34.707 s]
[INFO] Plugin: JVM example ................................ SUCCESS [ 18.298 s]
[INFO] Plugin: Example site ............................... SUCCESS [ 15.260 s]
[INFO] QA: Parent POM ..................................... SUCCESS [  0.847 s]
[INFO] QA: Smoke Test Plugins ............................. SUCCESS [ 38.715 s]
[INFO] QA: Smoke Test Multi-Node IT ....................... SUCCESS [ 19.855 s]
[INFO] QA: Smoke Test Client .............................. SUCCESS [ 13.322 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 44:11 min
[INFO] Finished at: 2015-10-07T22:49:47+02:00
[INFO] Final Memory: 100M/696M
[INFO] ------------------------------------------------------------------------

@dadoonet dadoonet merged commit 289cd5d into elastic:master Oct 7, 2015
@dadoonet dadoonet deleted the pr/13605-gce-add-network-host-setting branch October 7, 2015 21:24
@clintongormley clintongormley added :Distributed/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs and removed :Plugin Cloud GCE labels Feb 14, 2018
Labels

:Distributed/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs >enhancement v2.0.0 v2.2.0

Development

Successfully merging this pull request may close these issues.

Add _gce_ network host setting [discovery-gce] nodes don't see each others

5 participants