Skip to content

Use Azure upload method instead of our own implementation#26751

Merged
dadoonet merged 12 commits intoelastic:masterfrom
dadoonet:pr/use-azure-upload-method
Sep 28, 2017
Merged

Use Azure upload method instead of our own implementation#26751
dadoonet merged 12 commits intoelastic:masterfrom
dadoonet:pr/use-azure-upload-method

Conversation

@dadoonet
Copy link
Copy Markdown
Contributor

We are not following the Azure documentation about uploading blobs to Azure storage. https://docs.microsoft.com/en-us/azure/storage/blobs/storage-java-how-to-use-blob-storage#upload-a-blob-into-a-container

Instead we are using our own implementation which might cause some troubles and rarely some blobs can be not immediately commited just after we close the stream. Using the standard implementation provided by Azure team should allow us to benefit from all the magic Azure SDK team already wrote.

And well... Let's just read the doc!

We are not following the Azure documentation about uploading blobs to Azure storage. https://docs.microsoft.com/en-us/azure/storage/blobs/storage-java-how-to-use-blob-storage#upload-a-blob-into-a-container

Instead we are using our own implementation which might cause some troubles and rarely some blobs can be not immediately commited just after we close the stream. Using the standard implementation provided by Azure team should allow us to benefit from all the magic Azure SDK team already wrote.

And well... Let's just read the doc!
@dadoonet dadoonet self-assigned this Sep 22, 2017
@dadoonet dadoonet requested a review from imotov September 22, 2017 10:31
dadoonet added a commit that referenced this pull request Sep 22, 2017
While working on #26751, I found that we are passing the container name on every single method although we don't need it as it is stored within the blobstore object already.

This PR simplifies a bit that part of the code.
Copy link
Copy Markdown
Contributor

@imotov imotov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for finding it out! Can we backport it? I think it would be great to switch to this sooner rather then later. It looks like we are missing quite a bit of logic by uploading the blob our old way.

@dadoonet
Copy link
Copy Markdown
Contributor Author

@imotov Thanks for the review. I did some manual testings this morning and it does not work.

Apparently the file master.dat-temp is not written in the azure container...
Getting an exception saying that the container does not exist although I can see it in the azure Web interface...

I'm digging... Probably something stupid on my end. :)

dadoonet added a commit to dadoonet/elasticsearch that referenced this pull request Sep 25, 2017
While working on elastic#26751 and doing some manual integration testing I found that this elastic#22858 removed an important line of our code:

`AzureRepository` overrides default `initializeSnapshot` method which creates metadata files and do other stuff.

But with PR elastic#22858, I wrote:

```java
    @OverRide
    public void initializeSnapshot(SnapshotId snapshotId, List<IndexId> indices, MetaData clusterMetadata) {
        if (blobStore.doesContainerExist(blobStore.container()) == false) {
            throw new IllegalArgumentException("The bucket [" + blobStore.container() + "] does not exist. Please create it before " +
                " creating an azure snapshot repository backed by it.");
        }
    }
```

instead of

```java
    @OverRide
    public void initializeSnapshot(SnapshotId snapshotId, List<IndexId> indices, MetaData clusterMetadata) {
        if (blobStore.doesContainerExist(blobStore.container()) == false) {
            throw new IllegalArgumentException("The bucket [" + blobStore.container() + "] does not exist. Please create it before " +
                " creating an azure snapshot repository backed by it.");
        }
        super.initializeSnapshot(snapshotId, indices, clusterMetadata);
    }
```

As we never call `super.initializeSnapshot(...)` files are not created and we can't restore what we saved.

Closes elastic#26777.
jasontedor pushed a commit that referenced this pull request Sep 25, 2017
While working on #26751 and doing some manual integration testing I found that this #22858 removed an important line of our code:

`AzureRepository` overrides default `initializeSnapshot` method which creates metadata files and do other stuff.

But with PR #22858, I wrote:

```java
    @OverRide
    public void initializeSnapshot(SnapshotId snapshotId, List<IndexId> indices, MetaData clusterMetadata) {
        if (blobStore.doesContainerExist(blobStore.container()) == false) {
            throw new IllegalArgumentException("The bucket [" + blobStore.container() + "] does not exist. Please create it before " +
                " creating an azure snapshot repository backed by it.");
        }
    }
```

instead of

```java
    @OverRide
    public void initializeSnapshot(SnapshotId snapshotId, List<IndexId> indices, MetaData clusterMetadata) {
        if (blobStore.doesContainerExist(blobStore.container()) == false) {
            throw new IllegalArgumentException("The bucket [" + blobStore.container() + "] does not exist. Please create it before " +
                " creating an azure snapshot repository backed by it.");
        }
        super.initializeSnapshot(snapshotId, indices, clusterMetadata);
    }
```

As we never call `super.initializeSnapshot(...)` files are not created and we can't restore what we saved.

Closes #26777.
jasontedor pushed a commit that referenced this pull request Sep 25, 2017
While working on #26751 and doing some manual integration testing I found that this #22858 removed an important line of our code:

`AzureRepository` overrides default `initializeSnapshot` method which creates metadata files and do other stuff.

But with PR #22858, I wrote:

```java
    @OverRide
    public void initializeSnapshot(SnapshotId snapshotId, List<IndexId> indices, MetaData clusterMetadata) {
        if (blobStore.doesContainerExist(blobStore.container()) == false) {
            throw new IllegalArgumentException("The bucket [" + blobStore.container() + "] does not exist. Please create it before " +
                " creating an azure snapshot repository backed by it.");
        }
    }
```

instead of

```java
    @OverRide
    public void initializeSnapshot(SnapshotId snapshotId, List<IndexId> indices, MetaData clusterMetadata) {
        if (blobStore.doesContainerExist(blobStore.container()) == false) {
            throw new IllegalArgumentException("The bucket [" + blobStore.container() + "] does not exist. Please create it before " +
                " creating an azure snapshot repository backed by it.");
        }
        super.initializeSnapshot(snapshotId, indices, clusterMetadata);
    }
```

As we never call `super.initializeSnapshot(...)` files are not created and we can't restore what we saved.

Closes #26777.
jasontedor pushed a commit that referenced this pull request Sep 25, 2017
While working on #26751 and doing some manual integration testing I found that this #22858 removed an important line of our code:

`AzureRepository` overrides default `initializeSnapshot` method which creates metadata files and do other stuff.

But with PR #22858, I wrote:

```java
    @OverRide
    public void initializeSnapshot(SnapshotId snapshotId, List<IndexId> indices, MetaData clusterMetadata) {
        if (blobStore.doesContainerExist(blobStore.container()) == false) {
            throw new IllegalArgumentException("The bucket [" + blobStore.container() + "] does not exist. Please create it before " +
                " creating an azure snapshot repository backed by it.");
        }
    }
```

instead of

```java
    @OverRide
    public void initializeSnapshot(SnapshotId snapshotId, List<IndexId> indices, MetaData clusterMetadata) {
        if (blobStore.doesContainerExist(blobStore.container()) == false) {
            throw new IllegalArgumentException("The bucket [" + blobStore.container() + "] does not exist. Please create it before " +
                " creating an azure snapshot repository backed by it.");
        }
        super.initializeSnapshot(snapshotId, indices, clusterMetadata);
    }
```

As we never call `super.initializeSnapshot(...)` files are not created and we can't restore what we saved.

Closes #26777.
This commit:

* removes IT `testForbiddenContainerName()` as it is useless. The plugin does not create anymore the container but expects that the user has created it before registering the repository
* merges 2 IT classes so all IT tests are ran from one single class
* We don't remove/create anymore the container between each single test but only for the test suite
@dadoonet
Copy link
Copy Markdown
Contributor Author

@imotov I worked on IT so we can now pass them when needed (still a manual operation).
I tried to simplify and remove non needed things.

I tested everything manually:

  • Install elasticsearch 7.0.0-alpha1-SNAPSHOT
  • Install repository-azure plugin
  • Run the following test:
# Clean test env
curl -XDELETE localhost:9200/foo?pretty
curl -XDELETE localhost:9200/_snapshot/my_backup1/snap1?pretty
curl -XDELETE localhost:9200/_snapshot/my_backup1?pretty

# Create data
curl -XPUT localhost:9200/foo/doc/1?pretty -H 'Content-Type: application/json' -d '{
 "foo": "bar"
}'
curl -XPOST localhost:9200/foo/_refresh?pretty

# Create repository using default account
curl -XPUT localhost:9200/_snapshot/my_backup1?pretty -H 'Content-Type: application/json' -d '{
 "type": "azure"
}'

# Backup
curl -XPOST "localhost:9200/_snapshot/my_backup1/snap1?pretty&wait_for_completion=true"

# Delete existing index
curl -XDELETE localhost:9200/foo?pretty

# Restore using default account
curl -XPOST "localhost:9200/_snapshot/my_backup1/snap1/_restore?pretty&wait_for_completion=true"

# Check
curl -XGET localhost:9200/foo/_search?pretty

# Remove backup
curl -XDELETE localhost:9200/_snapshot/my_backup1/snap1?pretty
curl -XDELETE localhost:9200/_snapshot/my_backup1?pretty

Everything is correct. I'm going to test with a bigger dataset now and check everything works.
Could you give a final review on the code please as I changed some code recently?

Thanks!

@dadoonet
Copy link
Copy Markdown
Contributor Author

I tested with much more data (300mb) and everything is working well.
LMK! :)

@imotov
Copy link
Copy Markdown
Contributor

imotov commented Sep 26, 2017

@dadoonet would it make sense to base this tests on ESBlobStoreRepositoryIntegTestCase? I think this base class has most of the tests that we want to run a repo to ensure that it behaves reasonably. If you find it lacking something, I think it would make sense to extend it so all other repos would benefit

We already have nice tests written. Let's just use them.
@dadoonet
Copy link
Copy Markdown
Contributor Author

@imotov Great! I did not remember about that class. Yeah. Definitely better using it as well.

I pushed new changes.

Copy link
Copy Markdown
Contributor

@imotov imotov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for moving the tests! Now we just need to figure out how to get it into CI to run on a regular basis.

import org.junit.BeforeClass;

import java.net.URISyntaxException;
import java.net.UnknownHostException;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unused.

@dadoonet dadoonet merged commit 1ccb497 into elastic:master Sep 28, 2017
@dadoonet dadoonet deleted the pr/use-azure-upload-method branch September 28, 2017 11:15
dadoonet added a commit that referenced this pull request Sep 28, 2017
* Use Azure upload method instead of our own implementation

We are not following the Azure documentation about uploading blobs to Azure storage. https://docs.microsoft.com/en-us/azure/storage/blobs/storage-java-how-to-use-blob-storage#upload-a-blob-into-a-container

Instead we are using our own implementation which might cause some troubles and rarely some blobs can be not immediately commited just after we close the stream. Using the standard implementation provided by Azure team should allow us to benefit from all the magic Azure SDK team already wrote.

And well... Let's just read the doc!

* Adapt integration tests to secure settings

That was a missing part in #23405.

* Simplify all the integration tests and *extends ESBlobStoreRepositoryIntegTestCase tests

    * removes IT `testForbiddenContainerName()` as it is useless. The plugin does not create anymore the container but expects that the user has created it before registering the repository
   * merges 2 IT classes so all IT tests are ran from one single class
   * We don't remove/create anymore the container between each single test but only for the test suite

Backport of #26751 in 6.x branch
@dadoonet
Copy link
Copy Markdown
Contributor Author

I backported it on 6.x yet.

I'm planning to backport on 6.0 but it's a bit harder as some PR have not been merged to 6.0 like #23518 and #23405.

dadoonet added a commit to dadoonet/elasticsearch that referenced this pull request Sep 29, 2017
)

* Use Azure upload method instead of our own implementation

We are not following the Azure documentation about uploading blobs to Azure storage. https://docs.microsoft.com/en-us/azure/storage/blobs/storage-java-how-to-use-blob-storage#upload-a-blob-into-a-container

Instead we are using our own implementation which might cause some troubles and rarely some blobs can be not immediately commited just after we close the stream. Using the standard implementation provided by Azure team should allow us to benefit from all the magic Azure SDK team already wrote.

And well... Let's just read the doc!

* Adapt integration tests to secure settings

That was a missing part in elastic#23405.

* Simplify all the integration tests and *extends ESBlobStoreRepositoryIntegTestCase tests

    * removes IT `testForbiddenContainerName()` as it is useless. The plugin does not create anymore the container but expects that the user has created it before registering the repository
   * merges 2 IT classes so all IT tests are ran from one single class
   * We don't remove/create anymore the container between each single test but only for the test suite

Backport of elastic#26751 in 6.0 branch
@dadoonet
Copy link
Copy Markdown
Contributor Author

Backported to 6.0 as well with 9aa5595

dadoonet added a commit to dadoonet/elasticsearch that referenced this pull request Sep 29, 2017
)

* Use Azure upload method instead of our own implementation

We are not following the Azure documentation about uploading blobs to Azure storage. https://docs.microsoft.com/en-us/azure/storage/blobs/storage-java-how-to-use-blob-storage#upload-a-blob-into-a-container

Instead we are using our own implementation which might cause some troubles and rarely some blobs can be not immediately commited just after we close the stream. Using the standard implementation provided by Azure team should allow us to benefit from all the magic Azure SDK team already wrote.

And well... Let's just read the doc!

* Adapt integration tests
* Simplify all the integration tests and extends ESBlobStoreRepositoryIntegTestCase tests

    * removes IT `testForbiddenContainerName()` as it is useless. The plugin does not create anymore the container but expects that the user has created it before registering the repository
   * merges 2 IT classes so all IT tests are ran from one single class
   * We don't remove/create anymore the container between each single test but only for the test suite

Backport of elastic#26751 in 5.6 branch
dadoonet added a commit that referenced this pull request Oct 3, 2017
…26839)

* Use Azure upload method instead of our own implementation

We are not following the Azure documentation about uploading blobs to Azure storage. https://docs.microsoft.com/en-us/azure/storage/blobs/storage-java-how-to-use-blob-storage#upload-a-blob-into-a-container

Instead we are using our own implementation which might cause some troubles and rarely some blobs can be not immediately commited just after we close the stream. Using the standard implementation provided by Azure team should allow us to benefit from all the magic Azure SDK team already wrote.

And well... Let's just read the doc!

* Adapt integration tests
* Simplify all the integration tests and extends ESBlobStoreRepositoryIntegTestCase tests

    * removes IT `testForbiddenContainerName()` as it is useless. The plugin does not create anymore the container but expects that the user has created it before registering the repository
   * merges 2 IT classes so all IT tests are ran from one single class
   * We don't remove/create anymore the container between each single test but only for the test suite

Backport of #26751 in 5.6 branch
@dadoonet
Copy link
Copy Markdown
Contributor Author

dadoonet commented Oct 3, 2017

Backported to 5.6 with 28f17a7 (see #26839)

dadoonet added a commit that referenced this pull request Oct 4, 2017
While working on #26751, I found that we are passing the container name on every single method although we don't need it as it is stored within the blobstore object already.

This commit simplifies a bit that part of the code.

It also removes `repositoryName` from AzureBlobStore which was not used anymore.
Also we move some properties in AzureBlobContainer to `private` members.
dadoonet added a commit that referenced this pull request Oct 4, 2017
While working on #26751, I found that we are passing the container name on every single method although we don't need it as it is stored within the blobstore object already.

This commit simplifies a bit that part of the code.

It also removes `repositoryName` from AzureBlobStore which was not used anymore.
Also we move some properties in AzureBlobContainer to `private` members.

Backport of #26752 in 6.x branch
@lcawl lcawl added v6.0.0-rc2 and removed v6.0.0 labels Oct 30, 2017
@lcawl lcawl removed the v6.1.0 label Dec 12, 2017
@clintongormley clintongormley added :Distributed/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs and removed :Plugin Repository Azure labels Feb 14, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

>bug :Distributed/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs v6.0.0-rc2 v7.0.0-beta1

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants