Does Object Lock in S3 help prevent an object from being overwritten?

I am currently analyzing the S3 Object Lock feature, and AWS states the following:

S3 Object Lock is used to prevent an object from being deleted or overwritten for a fixed amount of time or indefinitely

But AWS also requires that versioning be enabled in order to use Object Lock.

If versioning is enabled, objects can’t be overwritten by default, since every write creates a new version. How, then, does Object Lock in S3 help prevent an object from being overwritten? Am I missing something here?

Solution:

From Locking Objects Using Amazon S3 Object Lock – Amazon Simple Storage Service:

Amazon S3 Object Lock works only in versioned buckets, and retention periods and legal holds apply to individual object versions. When you lock an object version, Amazon S3 stores the lock information in the metadata for that object version. Placing a retention period or legal hold on an object protects only the version specified in the request. It doesn’t prevent new versions of the object from being created. If you put an object into a bucket that has the same key name as an existing, protected object, Amazon S3 creates a new version of that object, stores it in the bucket as requested, and reports the request as completed successfully. The existing, protected version of the object remains locked according to its retention configuration.

So, Object Lock does not prevent an object from being overwritten or a new version from being created. Only the specific locked version of the object is protected and cannot be deleted; other operations are permitted.

For example, I created an object and locked it with a Legal Hold. I then renamed the object. This added a Delete Marker, and a new object was created with the changed name.

Export big data from PostgreSQL to AWS S3

I have ~10TB of data in a PostgreSQL database. I need to export this data to an AWS S3 bucket.

I know how to export to a local file, for example:

\c DATABASE_NAME
COPY (SELECT ID, NAME, ADDRESS FROM CUSTOMERS) TO 'CUSTOMERS_DATA.CSV' WITH DELIMITER '|' CSV;

but I don’t have a local drive with 10TB of space.

How can I export directly to an AWS S3 bucket?

Solution:

When exporting a large data dump, your biggest concern should be mitigating failures. Even if you could saturate a gigabit network connection, moving 10 TB of data will take the better part of a day. You don’t want to have to restart the whole thing due to a failure (such as a database connection timeout).

This implies that you should break the export into multiple pieces. You can do this by adding an ID range to the select statement inside the copy (I’ve just edited your example, so there may be errors):


COPY (SELECT ID, NAME, ADDRESS FROM CUSTOMERS WHERE ID BETWEEN 0 AND 999999) TO 'CUSTOMERS_DATA_0.CSV' WITH DELIMITER '|' CSV;

You would, of course, generate these statements with a short program; don’t forget to change the name of the output file for each one. I recommend picking an ID range that gives you a gigabyte or so per output file, resulting in 10,000 intermediate files.
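As a sketch, that generator program could look like the following Python (table, column, and file names follow the example above; total_rows and chunk are placeholders you would adjust to your data):

```python
# Emit one COPY statement per ID range, so the export can be restarted
# or parallelized chunk by chunk. Each range writes to its own file.
def copy_statements(total_rows, chunk):
    statements = []
    for i, start in enumerate(range(0, total_rows, chunk)):
        end = start + chunk - 1
        statements.append(
            f"COPY (SELECT ID, NAME, ADDRESS FROM CUSTOMERS "
            f"WHERE ID BETWEEN {start} AND {end}) "
            f"TO 'CUSTOMERS_DATA_{i}.CSV' WITH DELIMITER '|' CSV;"
        )
    return statements

# Example: 3 million rows, 1 million rows per file -> 3 statements.
for stmt in copy_statements(total_rows=3_000_000, chunk=1_000_000):
    print(stmt)
```

You would feed the generated statements to psql one at a time, retrying any chunk that fails.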

Where you write these files is up to you; if S3FS is sufficiently reliable, writing directly to the mounted bucket is a reasonable choice.

By breaking the unload into multiple smaller pieces, you can also divide it among multiple EC2 instances (you’ll probably saturate the database machine’s bandwidth with only a few readers). Also be aware that AWS charges $0.01 per GB for cross-AZ data transfer (with 10 TB, that’s $100), so make sure these EC2 machines are in the same AZ as the database machine.

It also means that you can perform the unload while the database is not otherwise busy (i.e., outside of normal working hours).

Lastly, it means that you can test your process, and you can fix any data errors without having to run the entire export (or process 10TB of data for each fix).

On the import side, Redshift can load multiple files in parallel. This should improve your overall time, although I can’t really say how much.

One caveat: use a manifest file rather than an object name prefix. I’ve run into cases where S3’s eventual consistency caused files to be dropped during a load.
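For reference, a load manifest is just a JSON file that lists every object explicitly (the bucket and file names below are placeholders matching the example statements above):

```json
{
  "entries": [
    { "url": "s3://mybucket/export/CUSTOMERS_DATA_0.CSV", "mandatory": true },
    { "url": "s3://mybucket/export/CUSTOMERS_DATA_1.CSV", "mandatory": true }
  ]
}
```

The COPY command then references this file with the MANIFEST keyword, so only the listed files are loaded, and a missing mandatory file fails the load instead of being silently skipped.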

AWS S3 Can't do anything with one file

I’m having issues trying to remove a file from my S3 bucket with the following name: Patrick bla bla 1 PV@05-06-2018-19:42:01.jpg

If I try to rename it through the s3 console, it just says that the operation failed. If I try to delete it, the operation will “succeed” but the file will still be there.

I’ve tried removing it through the AWS CLI. When listing the object, I get this back:

 {
        "LastModified": "2018-06-05T18:42:05.000Z",
        "ETag": "\"b67gcb5f8166cab8145157aa565602ab\"",
        "StorageClass": "STANDARD",
        "Key": "test/\bPatrick bla bla 1 PV@05-06-2018-19:42:01.jpg",
        "Owner": {
            "DisplayName": "dev",
            "ID": "bd65671179435c59d01dcdeag231786bbf6088cb1ca4881adf3f5e17ea7e0d68"
        },
        "Size": 1247277
    },

But if I try to delete or head it, the cli won’t find it.

aws s3api head-object --bucket mybucket --key "test/\bPatrick bla bla 1 PV@05-06-2018-20:09:37.jpg"

An error occurred (404) when calling the HeadObject operation: Not Found

Is there any way to remove, rename or just move this image from the folder?

Regards

Solution:

It looks like your object’s key contains a backspace (\b) character just after the test/ prefix. I’m sure there is a way to manage this using the awscli, but I haven’t worked out what it is yet.

Here’s a Python script that works for me:

import boto3

s3 = boto3.client('s3')
bucket = 'avondhupress'
key = 'test/\bPatrick bla bla 1 PV@05-06-2018-19:42:01.jpg'
s3.delete_object(Bucket=bucket, Key=key)

Or the equivalent in node.js:

const aws = require('aws-sdk');
const s3 = new aws.S3({ region: 'us-east-1', signatureVersion: 'v4' });

const params = {
  Bucket: 'avondhupress',
  Key: 'test/\bPatrick bla bla 1 PV@05-06-2018-19:42:01.jpg',
};

s3.deleteObject(params, (err, data) => {
  if (err) console.error(err, err.stack);
});
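If you’re not sure which invisible character is hiding in a key, a quick check on the key string itself (pure string inspection, no AWS calls) can reveal it:

```python
# Find non-printable characters lurking in an S3 key name.
def find_control_chars(key):
    """Return (index, repr) pairs for every non-printable character."""
    return [(i, repr(c)) for i, c in enumerate(key) if not c.isprintable()]

key = "test/\bPatrick bla bla 1 PV@05-06-2018-19:42:01.jpg"
print(find_control_chars(key))  # the backspace shows up as '\x08' at index 5
```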

How to securely access a file in the application using s3 bucket URL

In my application, we have to open some PDF files in a new tab when an icon is clicked, using a direct S3 URL like this:

http://MyBucket.s3.amazonaws.com/Certificates/1.pdf?AWSAccessKeyId=XXXXXXXXXXXXX&Expires=1522947975&Signature=XXXXXXXXXXXXXXXXX

Somehow I feel this is not secure, as the user can see the bucket name, AWSAccessKeyId, Expires, and Signature. Is this still considered secure? Or is there a better way to handle this?

Solution:

Allowing the user to see these parameters is not a problem because:

  1. The AWSAccessKeyId can be public (do not confuse it with the SecretAccessKey).
  2. Expires and Signature are signed with your SecretAccessKey, so no one can manipulate them (AWS validates them against your secret key).
  3. Since you don’t have public objects and your bucket itself is not public, it is OK for the user to know your bucket name; a valid signature is always needed to access the objects.

But I have two suggestions for you:

  1. Use your own domain, so the bucket name is not visible (you can use the free SSL certificate provided by AWS if you use CloudFront).
  2. Use HTTPS instead of plain HTTP.

And if, for any reason, you absolutely don’t want your users to see the AWS parameters, then I suggest proxying access to S3 via your own API (though I consider it unnecessary).
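To see concretely what the user is exposed to, you can split a presigned URL (the placeholder one from the question) into its query parameters; note that the SecretAccessKey itself never appears anywhere in the URL:

```python
from urllib.parse import urlparse, parse_qs

# Placeholder presigned URL with the same shape as the one in the question.
url = ("http://MyBucket.s3.amazonaws.com/Certificates/1.pdf"
       "?AWSAccessKeyId=XXXXXXXXXXXXX&Expires=1522947975&Signature=XXXXXXXXXXXXXXXXX")

params = parse_qs(urlparse(url).query)
# Only the public access key ID, the expiry time, and the signature are visible.
print(sorted(params))  # ['AWSAccessKeyId', 'Expires', 'Signature']
```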

Transfer files from S3 to untrusted server via intermediary

I have two servers that have an encrypted line of communication between them. One of these servers I trust with my AWS credentials, and it can access files in my S3 bucket via boto, the AWS CLI, etc. The other one I do not trust with any AWS credentials, so normally it cannot access my files.

I am trying to come up with a way to get a file from S3 to the untrusted server but my best idea so far is to download the files to the trusted server and then send them to the other. Is there any better way to do this?

I don’t want to use a CDN or make my files public, since I don’t want anyone but the two servers to access them.

Thanks!

Solution:

You can generate presigned URLs on the trusted server and send them to the untrusted server, which can then use those URLs to safely download the files. The URLs don’t require the untrusted server to hold any keys, and they have a limited time-to-live, so you can limit your exposure if one leaks for some reason.

This way you can allow the untrusted server to access only the files you want for the period of time you want.

aws s3 presign s3://mybucket/myfile --expires-in 60

S3 signature does not match on getSignedUrl serverside node

I’m trying to put a video file to my bucket using a pre-signed URL in Angular 4.

Node:

let s3 = new AWS.S3();
s3.config.update({
  accessKeyId: process.env.VIDEO_ACCESS_KEY,
  secretAccessKey: process.env.VIDEO_SECRET_KEY
})
let videoId = await Video.createVideo()
let params = {
  ACL: "public-read",
  Bucket: process.env.BUCKET_NAME,
  ContentType: 'video/mp4',
  Expires: 100,
  Key: req.jwt.username+"/"+videoId,
}
return s3.getSignedUrl('putObject', params, function (err, url) {
  if(!err) {
    console.log(url);
    res.status(200);
    res.json({
      url: url,
      reference: `${process.env.BUCKET_NAME}/${req.jwt.username}/${videoId}`,
      acl: params.ACL,
      bucket: params.Bucket,
      key: params.Key,
      contentType: params.ContentType,
    });
  } else {
    console.log(err);
    res.status(400);
    res.json({
      message: "Something went wrong"
    })
  }
});

This successfully generates a url for me, and I try to use it in my post request in the front end.

Angular:

this.auth.fileUpload().subscribe((result) => {
  console.log(result["key"], result["acl"], result["bucket"], result["contentType"])
  if(!result["message"]) {
    let formData = new FormData();
    formData.append('file', file.files[0]);
    const httpOptions = {
      headers: new HttpHeaders({
        "Key": result["key"],
        "ACL": result["acl"],
        "Bucket": result["bucket"],
        "Content-Type": result["contentType"],
      })
    };
    this.http.post(result["url"], formData, httpOptions).subscribe((response) => {
      console.log("response");
      console.log(response);
      let reference = `https://s3.amazonaws.com/${result["reference"]}`
      this.auth.makeVideo(result["reference"]).subscribe((result) => {
        console.log(result);
      });
    }, (error) => {
      console.log("error");
      console.log(error);
    });
  }
});

But this generates an error.

SignatureDoesNotMatch
The request signature we calculated does not match the signature you provided. Check your key and signing method

Here’s the URL that I generate

https://MY_BUCKET_HERE.s3.amazonaws.com/admin/87f314f1-9f2e-462e-84ff-25cba958ac50?AWSAccessKeyId=MY_ACCESS_KEY_HERE&Content-Type=video%2Fmp4&Expires=1520368428&Signature=Ks0wfzGyXmBTiAxGkHNgcYblpX8%3D&x-amz-acl=public-read

I’m pretty sure I’m just making a simple mistake, but I can’t figure it out for the life of me. Do I need to do something with my headers? Do I need to change the way I read the file for the post? I’ve gotten it to work with a public bucket with FormData and a simple post request with no headers, but now that I’m working with Policies and a private bucket, my understanding is much less. What am I doing wrong?

Solution:

If you generate a pre-signed URL for PutObject then you should use the HTTP PUT method to upload your file to that pre-signed URL. The POST method won’t work (it’s designed for browser uploads).

Also, don’t supply HTTP headers when you invoke the PUT. They should be supplied when generating the pre-signed URL, but not when using the pre-signed URL.
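In the Angular code, that means calling this.http.put(result["url"], file.files[0]) with no custom headers, rather than http.post with FormData. As a language-neutral sketch in Python (the presigned URL below is a placeholder), the shape of the request looks like this:

```python
import urllib.request

# Placeholder presigned PutObject URL of the kind generated above.
presigned_url = ("https://example-bucket.s3.amazonaws.com/admin/video-id"
                 "?AWSAccessKeyId=EXAMPLE&Expires=1520368428&Signature=EXAMPLE")

body = b"raw video bytes, not a multipart form"

# PUT the raw bytes. No Key/ACL/Bucket headers are added; everything S3
# needs is already encoded in the signed query string.
req = urllib.request.Request(presigned_url, data=body, method="PUT")

# urllib.request.urlopen(req)  # uncomment to actually perform the upload
```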

Any way to get the cost per file on s3 or cloudfront?

I have a small static site hosted on s3, served through cloudfront. I’ve been trying to reduce the costs but now I’m at a point that I really just need to know which files are getting downloaded the most. How can I figure that out?

Thanks.

Solution:

Take a look at your CloudFront distribution statistics. In the AWS Management Console, go to CloudFront, and select “Popular Objects” from the left navigation panel.

It will show you the following statistics:

  • the requested URL (into CloudFront)
  • whether it was a cache hit or miss
  • bytes for misses (which tells you how many bytes were read from your origin)

Edit base.css file from S3 Bucket

I’m using AWS S3 to serve my static files. However, I’ve just found out you can’t edit them directly in S3, which kind of makes it pointless, as I will be continuously changing things on my website. So, is the conventional way to make the changes locally and then re-upload the file? Or do most developers store their base.css file in their repository so it’s easier to change?

Because I’m using Django for my project, there is only supposed to be one static path (for me, that’s my S3 bucket). Or is there another content delivery network where I can directly edit the contents of a file on the go, which would be better?

Solution:

Yes; IMO it would be unusual to edit the files of your production website directly where they are served from.

Edit them locally, check them into your repo, and then deploy them to S3 from the repo, perhaps using a tool like Jenkins. If you make a mistake, you have something to roll back to.

I can’t think of any circumstances where editing your files directly in production is a good idea.

Why does AWS warn "Do not grant public read access to this object(s) (Recommended)"

This is the only way to make files/images available to everyone on your website; otherwise they won’t be able to see them. The same goes for CSS files: if they aren’t publicly readable, the website won’t have any styling. So why does it warn not to grant access?

Solution:

It warns everyone because lots of people make the wrong things public by accident (check the news lately), but if you are serving up web content, you can safely ignore the warning.

Installing aws-sdk-php in local machine — http request failed

I am trying to install aws-sdk-php in my local machine using this reference url

When I try: php composer.phar require aws/aws-sdk-php

It gives me the following error:

[Composer\Downloader\TransportException] The
http://packagist.org/p/aws/aws-sdk-php%24a8b264f89dd462e84ffdc6487d616b6126bbc20d00351e8a9daf81c5d66f2305.json
file could not be downloaded: failed to open stream: HTTP request
failed!

require [--dev] [--prefer-source] [--prefer-dist] [--no-progress]
[--no-suggest] [--no-update] [--no-scripts] [--update-no-dev]
[--update-with-dependencies] [--update-with-all-dependencies]
[--ignore-platform-reqs] [--prefer-stable] [--prefer-lowest]
[--sort-packages] [-o|--optimize-autoloader]
[-a|--classmap-authoritative] [--apcu-autoloader] [--] []…

Any help/hint is highly appreciated. Thanks in advance.

Solution:

php -r "file_get_contents('http://packagist.org/p/aws/aws-sdk-php%24a8b264f89dd462e84ffdc6487d616b6126bbc20d00351e8a9daf81c5d66f2305.json');" 

Run that first to check that you can reach Packagist at all.

If it fails, switch Composer to the HTTPS endpoint:

composer config -g repo.packagist composer https://packagist.org

and then try your command again.