Skip to content

Add middleware to override storage class.#666

Merged
gaul merged 1 commit intomasterfrom
tier-blobstore
Jul 26, 2024
Merged

Add middleware to override storage class.#666
gaul merged 1 commit intomasterfrom
tier-blobstore

Conversation

@gaul
Copy link
Owner

@gaul gaul commented Jul 20, 2024

This is best-effort and some storage classes do not map exactly, particularly for non-S3 object stores. Fixes #625.

@gaul gaul requested a review from timuralp July 20, 2024 14:36
@gaul
Copy link
Owner Author

gaul commented Jul 20, 2024

@michaelcourcy could you take a look at this to see if it meets your needs?

@michaelcourcy
Copy link

Hi @gaul, is there any reason that you chose the word tier instead of storageClass in the way you name your constant and class ?

public static final String PROPERTY_TIER_BLOBSTORE = "s3proxy.tier-blobstore";
public final class TierBlobStore extends ForwardingBlobStore

Maybe tier relates to something else, something more global that make sense in the context of s3proxy ?

@michaelcourcy
Copy link

michaelcourcy commented Jul 21, 2024

Ok I understand S3 is a more specific use case of jcloud. org.jclouds.blobstore.domain.Tier map to org.jclouds.s3.domain.StorageClass.

ARCHIVE -> REDUCED_REDUDANCY
INFREQUENT -> STANDARD_IA
STANDARD -> STANDARD

@gaul gaul force-pushed the tier-blobstore branch from ba64f5d to f71597c Compare July 21, 2024 12:02
@gaul
Copy link
Owner Author

gaul commented Jul 21, 2024

Using Tier is probably confusing to users -- perhaps the user-facing interface should only reference S3 storage classes? I imagine most users will use S3-S3 mapping although S3-other object store will probably require understanding the jclouds concepts. BTW the jclouds ARCHIVE tier actually maps to S3 deep archive storage class. Unfortunately there will be some lossy conversions unless we add further support to jclouds.

@michaelcourcy
Copy link

BTW the jclouds ARCHIVE tier actually maps to S3 deep archive storage class.

Ah I missed the mapping code

         case STANDARD: return StorageClass.STANDARD;
         case INFREQUENT: return StorageClass.STANDARD_IA;
         case COOL: return StorageClass.STANDARD_IA;
         case COLD: return StorageClass.GLACIER_IR;
         case ARCHIVE: return StorageClass.DEEP_ARCHIVE;

Unfortunately there will be some lossy conversions unless we add further support to jclouds.

It's already a lot and far enough for starting this new feature IMO.

I will try to test it against AWS S3 and then later against CEPH. It's been a long time I haven't build a maven project.

@gaul
Copy link
Owner Author

gaul commented Jul 21, 2024

Thinking about this some more, anything that requires reading the source code is a non-starter. I think this would work better if the user specifies the desired storage class in S3 terms and S3Proxy should log when it does a lossy conversion. If we round-trip the StorageClass -> Tier -> StorageClass we have lossless conversions for:

      STANDARD(Tier.STANDARD),
      STANDARD_IA(Tier.COOL),
      GLACIER_IR(Tier.COLD),
      DEEP_ARCHIVE(Tier.ARCHIVE);

And lossy conversions for:

      INTELLIGENT_TIERING(Tier.STANDARD),
      REDUCED_REDUNDANCY(Tier.STANDARD),
      ONEZONE_IA(Tier.COOL),
      GLACIER(Tier.ARCHIVE),

Logging the conversions makes even more sense for non-S3 object stores. But I imagine most users will use S3 and be happy with the lossless mapping of the first four storage classes.

@michaelcourcy
Copy link

michaelcourcy commented Jul 21, 2024

I think this would work better if the user specifies the desired storage class in S3 terms

If user proxy against a s3 for sure but if he proxies against Azure or GCP then this s3 mapping may be confusing and against the spirit of the project ? (I'm asking I'm really not sure)

anything that requires reading the source code is a non-starter.

Sure but we'll add a wiki page

@michaelcourcy
Copy link

@gaul My test fails but maybe I missed the configuration.

I checkout tier-blobstore branch and build

mvn package

s3proxy.properties I introduce s3proxy.tier-blobstore=INFREQUENT

s3proxy.tier-blobstore=INFREQUENT
s3proxy.endpoint=http://127.0.0.1:8080
s3proxy.authorization=aws-v2-or-v4
s3proxy.identity=local-identity
s3proxy.credential=local-credential 
jclouds.provider=aws-s3
jclouds.identity=<redacted>
jclouds.credential=<redacted>

launch s3proxy ./s3proxy --properties s3proxy.properties

then check my configuration works

export AWS_ACCESS_KEY_ID=local-identity
export AWS_SECRET_ACCESS_KEY=local-credential
aws s3 --endpoint-url http://127.0.0.1:8080  ls s3://mcourcy-testia/
                           PRE k10/
                           PRE mic/

And try to put a file

echo "put me in test" > test.txt
aws s3 --endpoint-url http://127.0.0.1:8080 cp test.txt s3://mcourcy-testia/test.txt

But when I check the storageclass of test.txt it's STANDARD instead of STANDARD_IA

aws  s3api head-object --endpoint-url http://127.0.0.1:8080 --bucket mcourcy-testia --key test.txt

I get

aws  s3api head-object --endpoint-url http://127.0.0.1:8080 --bucket mcourcy-testia --key test.txt
{
    "LastModified": "2024-07-21T14:52:10+00:00",
    "ContentLength": 15,
    "ETag": "\"a99a4b4b9e8204c1489f9f37dd21ec1f\"",
    "ContentEncoding": "",
    "ContentType": "text/plain",
    "Metadata": {},
    "StorageClass": "STANDARD"
}

I confirm by checking in AWS console
image

@michaelcourcy
Copy link

Also tried with

s3proxy.tier-blobstore=COOL

But did not work.

@gaul gaul force-pushed the tier-blobstore branch from f71597c to e52ce01 Compare July 21, 2024 15:41
@gaul
Copy link
Owner Author

gaul commented Jul 21, 2024

Sorry please try the latest proposed PR -- previously it lacked the changes to Main.java to actually accept the configuration.

@michaelcourcy
Copy link

It works now !
Here is the configuration I used

s3proxy.endpoint=http://127.0.0.1:8080
s3proxy.authorization=aws-v2-or-v4
s3proxy.identity=local-identity
s3proxy.credential=local-credential
s3proxy.tier-blobstore=INFREQUENT
jclouds.provider=aws-s3
jclouds.identity=<redacted>
jclouds.credential=<redacted>

@michaelcourcy
Copy link

I will test it against CEPH S3.

I created this wiki page to document it https://github.com/gaul/s3proxy/wiki/Middleware-Tier-blobstore

Copy link

@michaelcourcy michaelcourcy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM except the little change I suggest in the comments but not a blocker IMO.

* s3proxy.tier-blobstore = VALUE
*
* VALUE can be anything from org.jclouds.blobstore.domain.Tier, e.g.,
* STANRDARD, COOL, COLD, ARCHIVE. org.jclouds.s3.domain.StorageClass

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I use COOL in the configuration

s3proxy.tier-blobstore=COOL

I get this error message

Exception in thread "main" java.lang.IllegalArgumentException: No enum constant org.jclouds.blobstore.domain.Tier.COOL
        at java.base/java.lang.Enum.valueOf(Enum.java:293)
        at org.jclouds.blobstore.domain.Tier.valueOf(Tier.java:24)
        at org.gaul.s3proxy.Main.parseMiddlewareProperties(Main.java:278)
        at org.gaul.s3proxy.Main.main(Main.java:125)

Look like we are limited by the enum values of org.jclouds.blobstore.domain.Tier which are

STANDARD
INFREQUENT
ARCHIVE

I would suggest to change this comment accordingly unless I pick up the wrong version of jcloud in the maven dependencies.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most likely that's what happened in this version there is COOL and COLD
https://github.com/apache/jclouds/blob/master/blobstore/src/main/java/org/jclouds/blobstore/domain/Tier.java
I did not figure out why when I built s3proxy but that does not matter please ignore my comment.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

jclouds 2.6.0 only has a few Tier constants but 2.7.0-SNAPSHOT includes a wider variety.

@gaul gaul force-pushed the tier-blobstore branch from e52ce01 to 635efc0 Compare July 22, 2024 12:48
@gaul gaul changed the title Add TierBlobStore to override Tier during putBlob Add middleware to override storage class. Jul 22, 2024
@gaul
Copy link
Owner Author

gaul commented Jul 22, 2024

This now maps and logs S3 storage classes instead of jclouds Tiers as previously discussed.

blobStore = storageClassBlobStore;
System.err.println("Requested tier: tier");
System.err.println("Overriden to: " +
StorageClass.fromTier(storageClassBlobStore.getTier()));
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This only works for S3-S3 mappings and is confusing for other storage backends.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed the wiki accordingly.

@gaul gaul force-pushed the tier-blobstore branch from 635efc0 to 52099cb Compare July 22, 2024 14:35
This is best-effort and some storage classes do not map exactly,
particularly for non-S3 object stores.  Fixes #625.
@gaul gaul force-pushed the tier-blobstore branch from 52099cb to 1ed2c92 Compare July 25, 2024 19:36
Copy link
Collaborator

@timuralp timuralp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense to me! Unfortunate that we're limited by the jclouds set of storage classes.

@gaul gaul merged commit 82e50ee into master Jul 26, 2024
@gaul gaul deleted the tier-blobstore branch July 26, 2024 07:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Create a middleware to change the storageclass

3 participants