Skip to content

Add Google Cloud Storage repository plugin#13578

Merged
tlrx merged 1 commit intoelastic:masterfrom
tlrx:add-google-cloud-storage-support
May 19, 2016
Merged

Add Google Cloud Storage repository plugin#13578
tlrx merged 1 commit intoelastic:masterfrom
tlrx:add-google-cloud-storage-support

Conversation

@tlrx
Copy link
Copy Markdown
Member

@tlrx tlrx commented Sep 15, 2015

This pull request adds a new type of repository gcs to allow to snapshot/restore indices on Google Cloud Storage service:

curl -XPUT 'localhost:9200/_snapshot/my_gcs_repo' -d '                       
{                 
  "type": "gcs", 
  "settings":  {
      "project_id": "my-project-id", 
      "bucket":"my-bucket", 
      "credentials": "/path/to/service-account-file.json"
   }
}

For now it only supports authentication using Service Account files.

Many things still need to be done but I'd be glad to benefit from a first round of review.

  • add integration tests
  • add license files
  • add documentation
  • add HTTP proxy support
  • add HTTP proxy support with custom java key stores

Closes #12880

@tlrx tlrx added review :Distributed/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs labels Sep 15, 2015
@tlrx
Copy link
Copy Markdown
Member Author

tlrx commented Sep 15, 2015

Note: it needs #13574 to work

@tlrx tlrx force-pushed the add-google-cloud-storage-support branch from 82b95d1 to 1e12955 Compare September 30, 2015 11:03
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we ever inject container, so, it shouldn't be annotated. We can also make the constructor package private, since we don't really inherit from GoogleCloudStorageBlobContainer. Lastly, why don't we just pass bucket and client since this is the only part of the store that we need here and it will simplify code below a bit.

@imotov
Copy link
Copy Markdown
Contributor

imotov commented Oct 7, 2015

Looks like a good start. Left a couple of comments.

@tlrx tlrx force-pushed the add-google-cloud-storage-support branch from 1e12955 to 0fbd402 Compare November 30, 2015 21:21
@tlrx
Copy link
Copy Markdown
Member Author

tlrx commented Nov 30, 2015

@imotov Thanks for your review. I updated the code following your comments and I also rebased the pull request so that it now works with gradle and the security manager.

Once #14050 is in I should be able to add tests.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we bind to concrete implementation here, maybe it makes sense to get rid of GoogleCloudStorageService interface all together and rename InternalGoogleCloudStorageService to GoogleCloudStorageService. It feels like these interfaces don't buy us anything except added complexity. Replacing them with mock implementation would make little sense here since we already test all upper levels with mock implementation based on fs repo.

@imotov
Copy link
Copy Markdown
Contributor

imotov commented Dec 9, 2015

Left some comments. Let's figure out how to add some tests to this thing.

@ghost
Copy link
Copy Markdown

ghost commented Jan 17, 2016

Hi, is there any updates on this issue?

@chrislovecnm
Copy link
Copy Markdown

:1

@dakrone
Copy link
Copy Markdown
Member

dakrone commented Apr 6, 2016

ping @tlrx, I think this needs comments to be addressed and then another review?

@tlrx tlrx force-pushed the add-google-cloud-storage-support branch from b9b6d52 to 38b9e89 Compare April 8, 2016 16:13
@tlrx
Copy link
Copy Markdown
Member Author

tlrx commented Apr 8, 2016

ping @tlrx, I think this needs comments to be addressed and then another review?

Sure, thanks for the kindly reminder @dakrone . Let's add an update to this pull request and some explanation why it takes so long to get in.

A bit of history

I started to work on this plugin when elasticsearch was around version 1.4. At this time, the Google Client library was almost incompatible with the way Snapshot/Restore worked in ES. We try to find a workaround for some time but the solution was hardly maintainable and had poor performance. Around ES 1.5 or 1.6 the internal snapshot/restore changed a bit and at the same time the Google library improved a lot. Great! But... Elasticsearch was on its way to 2.x and a lot of things changed and improved too: we moved the plugins from their dedicated repository back in elasticsearch repo, then move from maven to gradle, enabled the security manager, changed the way integration tests worked for plugins. It has been a lot of amazing improvements but keeping this pull request up-to-date required a LOT of attention and time. Knowing that things would stabilize, I put this on a shelve, I strongly apologize for that.

Current status

Anyway, I rebased and updated the code today to make it work with elasticsearch 5.0.0-alpha1-SNAPSHOT. It has been tested locally and on Google Compute Engine and it works nicely.

If you want to test it, you can check out this pull request, build it locally using Gradle and deploy it in a elasticsearch 5.0.0-alpha1-SNAPSHOT (download it here). This pull request contains a documentation page.

@imotov Can you please have another look when you have time? That would be great!

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this file intentionally empty?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like we add empty file for libs that don't provide NOTICE.txt (I spotted few others like this in other plugins)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have seen this code so many times. Maybe it's time to move it somewhere where it can be reused?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. Maybe we could move it to the BlobPath.buildAsString() because it is always used with this logic. If we do that, it makes sense to do it in another PR because other plugin like repository-s3 can benefit from it.

@imotov
Copy link
Copy Markdown
Contributor

imotov commented Apr 8, 2016

Left a few comments. Looks pretty good. I think we can simplify thing by getting rid of GoogleCloudStorageClient and just move all its functionality directly into GoogleCloudStorageBlobStore and GoogleCloudStorageBlobContainer. Also, what about ESBlobStoreRepositoryIntegTestCase?

@haizaar
Copy link
Copy Markdown

haizaar commented Apr 18, 2016

Current status

Anyway, I rebased and updated the code today to make it work with elasticsearch 5.0.0-alpha1-SNAPSHOT. It has been tested locally and on Google Compute Engine and it works nicely.

If you want to test it, you can check out this pull request, build it locally using Gradle and deploy it in a elasticsearch 5.0.0-alpha1-SNAPSHOT (download it here). This pull request contains a documentation page.

@tlrx thanks for the update. Do I read you right that the current plan is to release it as part of ES 5.0?

@haizaar
Copy link
Copy Markdown

haizaar commented Apr 19, 2016

/CC @motinani

@tlrx
Copy link
Copy Markdown
Member Author

tlrx commented Apr 29, 2016

@imotov Thanks for your review! I removed the GoogleCloudStorageClient class and moved the implementation in the blob store.

Concerning the tests, I moved the ESBlobStoreRepositoryIntegTestCase, ESBlobStoreContainerTestCase and ESBlobStoreTestCase classes into the test framework so that they can be inherited in plugins (it looks like we already have such abstract test classes like AbstractNumericTestCase in the test framework). Then I created a dedicated test for each of theses classes. I had to turn the GoogleCloudStorageService into an interface so that it can be mocked and I implemented a MockHttpTransport that simulates the Google Cloud Storage service similarly to what is done for the discovery-gce plugin. I think we're good on testing side.

Can you please have another look? That would be great.

@haizaar I can't give you any estimated date for this but I'd be happy to see it ready and testable for a beta release of 5.

@haizaar
Copy link
Copy Markdown

haizaar commented Apr 29, 2016

Tanguy, thanks for the update. Really looking forward to it.
On 29 Apr 2016 17:52, "Tanguy Leroux" notifications@github.com wrote:

@imotov https://github.com/imotov Thanks for your review! I removed the
GoogleCloudStorageClient class and moved the implementation in the blob
store.

Concerning the tests, I moved the ESBlobStoreRepositoryIntegTestCase,
ESBlobStoreContainerTestCase and ESBlobStoreTestCase classes into the
test framework so that they can be inherited in plugins (it looks like we
already have such abstract test classes like AbstractNumericTestCase in
the test framework). Then I created a dedicated test for each of theses
classes. I had to turn the GoogleCloudStorageService into an interface so
that it can be mocked and I implemented a MockHttpTransport that
simulates the Google Cloud Storage service similarly to what is done for
the discovery-gce plugin. I think we're good on testing side.

Can you please have another look? That would be great.

@haizaar https://github.com/haizaar I can't give you any estimated date
for this but I'd be happy to see it ready and testable for a beta release
of 5.


You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub
#13578 (comment)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe rename it to createClient?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure

@imotov
Copy link
Copy Markdown
Contributor

imotov commented May 16, 2016

Left a minor comment. LGTM.

@tlrx tlrx force-pushed the add-google-cloud-storage-support branch from 9e7fab1 to 35d3bda Compare May 19, 2016 12:05
@tlrx tlrx merged commit 35d3bda into elastic:master May 19, 2016
@tlrx tlrx added v5.0.0-alpha3 and removed review labels May 19, 2016
@tlrx
Copy link
Copy Markdown
Member Author

tlrx commented May 19, 2016

Thanks @imotov !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Distributed/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs >feature v5.0.0-alpha3

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add Google Cloud Storage Repository

7 participants