UKHomeOffice/file-vault

File-vault: a RESTful service to store and retrieve files

File-vault is a simple REST service that allows POSTing a file to an S3 bucket. Upon a successful virus check the service will return with a URL that can be used to retrieve the file.

How this works

Filevault adds objects (such as images) to an S3 bucket and removes them from it. To do this it uses an AWS Access Key ID, a Secret Access Key and a KMS Key ID, which are supplied by ACP (or by the team in charge of creating them).

Filevault can also be configured with a whitelist of file types, the lifetime of the signed URLs it receives from S3, and the AWS signature version to use.

When a user POSTs a file to this service, the file is uploaded to the S3 bucket and S3 returns a signed URL. Filevault encrypts the signed URL using a default encryption algorithm and a randomly generated initialisation vector (IV), with the AWS Password secret as the key. It then returns a JSON response to the server that called it. The URL in that response is made up of three parts:

  • the path ID (/file/:id), which is the S3 bucket key of the uploaded file
  • a date query parameter (/file/:id?date=), which records when the upload occurred
  • an id query parameter (/file/:id?date=&id=<encrypted_signed_S3_link>), which is the encrypted signed link that S3 sent back for accessing the file

To reiterate, the id query parameter is the signed URL returned by S3, encrypted with Node's crypto module and the AWS Password as described above.
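The encryption step described above can be sketched roughly as follows. This is a minimal illustration and not the service's actual code: the algorithm name (aes-128-cbc), the way the 16-byte key is derived from the password, and the function names are all assumptions.

```javascript
const crypto = require('crypto');

// Assumption: AES-128-CBC stands in for the service's "default encryption algorithm".
const ALGORITHM = 'aes-128-cbc';

// Assumption: the 16-byte key is built by repeating or truncating the password.
function keyFromPassword(password) {
  return Buffer.alloc(16, password, 'utf8');
}

// Encrypts a signed S3 URL and returns "<iv hex>:<ciphertext hex>".
function encryptSignedUrl(signedUrl, password) {
  const iv = crypto.randomBytes(16); // fresh random IV for every upload
  const cipher = crypto.createCipheriv(ALGORITHM, keyFromPassword(password), iv);
  const encrypted = Buffer.concat([cipher.update(signedUrl, 'utf8'), cipher.final()]);
  return `${iv.toString('hex')}:${encrypted.toString('hex')}`;
}

// Builds the URL handed back to the caller: /file/:id?date=...&id=...
function buildFileUrl(baseUrl, bucketKey, date, encryptedId) {
  return `${baseUrl}/file/${bucketKey}?date=${date}&id=${encryptedId}`;
}
```

Because the IV is random, encrypting the same signed URL twice gives two different id values, and both decrypt to the same link.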

Decrypt_Deprecated - This uses crypto.createDecipher, which is deprecated and will be removed from future versions of Node. To maintain backwards compatibility, the service looks for a ':' separator in the ID query parameter. The new method places an encoded IV before the separator, so if no ':' is present the old method of decryption is used. Example:

http://localhost:3000/file/ecf69543b7475297e885d6e94450fc75?date=20210804T152612Z&id=98cc7a349e0543b3e92758461a674a0919b423d17d27a42414ba5cf0c20b4c4e8a2dd838c6f536c84f929f3df02e93f7fe2f2316f3888c894f7d2eee6f924835

Decrypt - This uses crypto.createDecipheriv, which requires the key derived from the original password to be 16 bytes in length (this is done using a Buffer). It also requires an IV, which is randomly generated at encryption time and passed along in the ID query parameter before the ':' separator. Example:

http://localhost:3000/file/97ebbf4916250d24c7724044d1e1a54d?date=20210804T153817Z&id=75219fd49fe3d34a46b213f162bf05dc:c38868e0cad4596bb62c0feb04f86245ed188c944a2c231d718ecd83a8e988351900e01f2ecf958e8334e02a6e44cbb8ccebfbe1b1cb84d6d997017fc33e3d6d
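The choice between the two decryption paths can be sketched as below. This is an illustration under stated assumptions, not the service's code: the algorithm (aes-128-cbc), the key derivation and the names hasIv and decryptId are hypothetical. The legacy branch is guarded because crypto.createDecipher is not available in newer Node releases.

```javascript
const crypto = require('crypto');

// Assumption: AES-128-CBC stands in for the service's default algorithm.
const ALGORITHM = 'aes-128-cbc';

// Assumption: the 16-byte key is built by repeating or truncating the password.
function keyFromPassword(password) {
  return Buffer.alloc(16, password, 'utf8');
}

// An id produced by the new method looks like "<iv hex>:<ciphertext hex>".
function hasIv(id) {
  return id.includes(':');
}

function decryptId(id, password) {
  if (!hasIv(id)) {
    // Legacy path: no IV in the id, so fall back to the deprecated API.
    if (typeof crypto.createDecipher !== 'function') {
      throw new Error('Legacy ids need crypto.createDecipher, which this Node version lacks');
    }
    const legacy = crypto.createDecipher(ALGORITHM, password);
    return legacy.update(id, 'hex', 'utf8') + legacy.final('utf8');
  }
  // New path: split the IV from the ciphertext at the first ':'.
  const separator = id.indexOf(':');
  const iv = Buffer.from(id.slice(0, separator), 'hex');
  const data = Buffer.from(id.slice(separator + 1), 'hex');
  const decipher = crypto.createDecipheriv(ALGORITHM, keyFromPassword(password), iv);
  return Buffer.concat([decipher.update(data), decipher.final()]).toString('utf8');
}
```

If the password or algorithm differs from the one used at encryption time, decipher.final() will usually throw or the output will not be a valid URL, which is consistent with the failure mode described in the note that follows.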

N.B. If you are getting 502 errors through Nginx and the Nginx logs say 'upstream prematurely closed connection while reading response header', or you see the error below when running filevault locally, decryption has failed because the AWS Password or the signature in the ID query parameter has fallen out of sync with the service. This is usually caused by a code change or by using a different filevault image in your drone file (i.e. switching the filevault image SHA). It is most likely to be discovered during testing and should not be an issue in production unless the AWS Password or default algorithm is changed suddenly. Beware: such a change could block caseworkers from accessing material previously submitted to S3.

{
  "code": "ERR_UNESCAPED_CHARACTERS"
}

Configuration

The following environment variables are used to configure file-vault.

  Variable                  | Description
  FILE_VAULT_URL            | URL of file-vault; used when returning a URL after a successful upload to S3
  CLAMAV_REST_URL           | Location of the ClamAV REST service
  AWS_ACCESS_KEY_ID         | AWS Access Key ID
  AWS_SECRET_ACCESS_KEY     | AWS Secret Access Key
  AWS_KMS_KEY_ID            | AWS KMS Key ID
  AWS_REGION                | AWS Region (defaults to eu-west-1)
  AWS_SIGNATURE_VERSION     | AWS Signature Version (defaults to v4)
  AWS_BUCKET                | AWS Bucket Name
  AWS_EXPIRY_TIME           | Length of time (in seconds) the signed URL will be valid for (defaults to 1 hour)
  AWS_PASSWORD              | A password used to encrypt the params that are returned by file-vault
  STORAGE_FILE_DESTINATION  | Temp directory for storing the uploaded file (the file is deleted once the upload succeeds or fails; defaults to 'uploads')
  REQUEST_TIMEOUT           | Length of time (in seconds) before HTTP requests made by file-vault to ClamAV and S3 time out (defaults to 15s)
  FILE_EXTENSION_WHITELIST  | A comma-separated list of file types that you want to whitelist (defaults to everything). If the file's type is not in this list, file-vault will respond with an error.
  MAX_FILE_SIZE             | The maximum file size (in bytes) that ClamAV will scan. Default is 105 MB.
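As a rough illustration of how these variables and their documented defaults fit together, the sketch below reads a subset of them from the environment. The function names (loadConfig, isAllowed), the shape of the returned object and the treatment of the whitelist as a list of file extensions are assumptions and are not taken from the file-vault source.

```javascript
const path = require('path');

// Reads a subset of the documented variables, applying the documented defaults.
function loadConfig(env = process.env) {
  // Assumption: the whitelist holds file extensions such as "pdf,jpg".
  const whitelist = (env.FILE_EXTENSION_WHITELIST || '')
    .split(',')
    .map((ext) => ext.trim().toLowerCase().replace(/^\./, ''))
    .filter(Boolean);

  return {
    fileVaultUrl: env.FILE_VAULT_URL,
    clamavRestUrl: env.CLAMAV_REST_URL,
    bucket: env.AWS_BUCKET,
    region: env.AWS_REGION || 'eu-west-1',
    signatureVersion: env.AWS_SIGNATURE_VERSION || 'v4',
    expirySeconds: Number(env.AWS_EXPIRY_TIME || 3600), // 1 hour
    storageDestination: env.STORAGE_FILE_DESTINATION || 'uploads',
    requestTimeoutSeconds: Number(env.REQUEST_TIMEOUT || 15),
    whitelist, // an empty list means every file type is accepted
  };
}

// Returns true when the file's extension passes the whitelist.
function isAllowed(filename, whitelist) {
  if (whitelist.length === 0) return true;
  const extension = path.extname(filename).slice(1).toLowerCase();
  return whitelist.includes(extension);
}
```

For example, with FILE_EXTENSION_WHITELIST set to "pdf,jpg", isAllowed('scan.PDF', config.whitelist) passes while isAllowed('tool.exe', config.whitelist) does not.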

Tutorial

This tutorial explains how to set up AWS S3, Keycloak and the filevault configuration file. This will then allow you to run a local instance of filevault with docker-compose so you can POST a document.

AWS S3

Make sure you have an AWS S3 bucket created.

AWS S3 Secrets

Grab the secrets. In Kubernetes you can do this with:

kubectl get secrets notify-secret -o yaml

This should return your secrets like so:

  access_key_id: <your-access-key-id>
  kms_key_id: <your-kms-key-id>
  name: <your-bucket-name>
  secret_access_key: <your-secret-access-key>

Note: each item in the secret is likely to be base64 encoded, so you'll need to decode it. You can do this in the terminal like so (the -D flag is the macOS form; on Linux use base64 -d):

echo <secret> | base64 -D
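If you would rather decode every field at once, the same step can be sketched in Node. The function name decodeSecretData is hypothetical, and the input is assumed to be the key/value pairs copied from the secret's data section.

```javascript
// Decodes every base64 value of a Kubernetes secret's data section.
function decodeSecretData(data) {
  return Object.fromEntries(
    Object.entries(data).map(([field, encoded]) => [
      field,
      Buffer.from(encoded, 'base64').toString('utf8'),
    ])
  );
}
```

The result keeps the same field names (access_key_id, kms_key_id, name, secret_access_key) with plain-text values.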

AWS CLI

Now check that these secrets are valid. The best way to do this is to use the AWS-CLI. You'll need to download & install it.

AWS Credentials

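You'll need to set up your AWS credentials, for example by running aws configure and entering the access key ID and secret access key retrieved above.

Now you should be able to access your bucket: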

aws s3 ls s3://<your-s3-bucket-name>

If your bucket is empty, this will not return anything.

Upload to AWS

Next, try uploading a file to the bucket:

aws s3 cp --sse aws:kms --sse-kms-key-id <kms-key-id> <file> s3://<bucket-name>

If the upload was successful, the command line will return something like the following:

upload: ./myfile.txt to s3://my-bucket/myfile.txt

Keycloak

Keycloak realm

You will need a Keycloak realm set up, with a URL something like:

https://sso-dev.notprod.homeoffice.gov.uk/auth/realms/<my-realm>

Client ID and Client secret

You will need to create a client in Keycloak. You may need to ask your administrator to do this if you do not have access.

  • Go to Keycloak -> Applications -> Security Admin console -> Clients -> Create
  • Name the client ID
  • Enable Direct Access Grants
  • Select the Credentials tab
  • Keep a note of the Client secret. You will need this later
  • Set the Valid Redirect URIs to localhost

Roles

You will also need to create a role

  • Go to Keycloak -> Roles (located on the left) -> Add role
  • Call the role caseworkers

Groups

You will also need to create a group

  • Go to Keycloak -> Groups (located on the left) -> New
  • Call the group something
  • Open the group -> Role Mappings -> assign the caseworkers role

Users

You will also need to create a user

  • Go to Keycloak -> Users (located on the left) -> Add user
  • Give the user a username and password

Docker-compose

The best way to run the service is to use docker-compose. First, though, you'll need to obtain the following configuration details and set them in the docker-compose.yml file:

- PROXY_CLIENT_SECRET=<client-secret>
- PROXY_CLIENT_ID=<client-id>
- PROXY_DISCOVERY_URL=<keycloak-realm-url>

You can grab the client-id, client-secret and keycloak-realm-url from Keycloak as described above.

Build & Run

  • docker-compose build
  • docker-compose up

Bearer token

Request a bearer token from Keycloak. Note that the Keycloak URL is different from your normal URL.

curl -X POST https://<domain-of-host-realm>/auth/realms/<my-realm>/protocol/openid-connect/token -d "username=<your-username>" -d 'password=<your-password>' -d 'grant_type=password' -d 'client_id=<your-client-id>' -d 'client_secret=<your-client-secret>'

This will return a long bearer token in JSON:

{"access_token":"<bearer-token-returned>","expires_in":300,"refresh_expires_in":1800,"refresh_token":"<refresh-token-returned>","token_type":"bearer","not-before-policy":0,"session_state":"<session-state-number>","scope":"email profile"}
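The same token request can be sketched in Node 18+ using the built-in fetch. The function names and the settings object are hypothetical; the endpoint path and form fields mirror the curl command above.

```javascript
// Builds the URL and fetch options for a password-grant token request.
function buildTokenRequest({ host, realm, username, password, clientId, clientSecret }) {
  const body = new URLSearchParams({
    grant_type: 'password',
    username,
    password,
    client_id: clientId,
    client_secret: clientSecret,
  });
  return {
    url: `https://${host}/auth/realms/${realm}/protocol/openid-connect/token`,
    options: {
      method: 'POST',
      headers: { 'Content-Type': 'application/x-www-form-urlencoded' },
      body: body.toString(),
    },
  };
}

// Sends the request and returns the access token with its lifetime in seconds.
async function requestBearerToken(settings) {
  const { url, options } = buildTokenRequest(settings);
  const response = await fetch(url, options);
  if (!response.ok) {
    throw new Error(`Token request failed with status ${response.status}`);
  }
  const { access_token: accessToken, expires_in: expiresIn } = await response.json();
  return { accessToken, expiresIn };
}
```

In the sample response above, expires_in is 300, so the token would need to be used within five minutes of being issued.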

Upload a document via filevault

Ensure you have the bearer token and you use it before it expires.

Also ensure you have the path of a file to POST.

curl -H "Authorization: Bearer <bearer-token>" -F 'document=@/Users/Name/my-file.txt' https://localhost/file -kv

Note: the endpoint is localhost/file.

This will return a URL something like:

{"url":"http://localhost/file/<filename>?date=<date>&id=<random-id>"}

Copy and paste the URL into the browser. You will need to log into Office 365. Your file should then be available.
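For callers that upload from code rather than curl, the request can be sketched in Node 18+ with the built-in fetch, FormData and Blob. The function names are hypothetical; the form field name (document) and the Authorization header come from the curl command above.

```javascript
// Builds fetch options for a multipart upload with the file in the "document" field.
function buildUploadRequest(bearerToken, fileBytes, filename) {
  const form = new FormData();
  form.append('document', new Blob([fileBytes]), filename);
  return {
    method: 'POST',
    headers: { Authorization: `Bearer ${bearerToken}` },
    body: form, // fetch sets the multipart Content-Type and boundary itself
  };
}

// POSTs the file and returns the retrieval URL from the JSON response.
async function uploadDocument(endpoint, bearerToken, fileBytes, filename) {
  const response = await fetch(endpoint, buildUploadRequest(bearerToken, fileBytes, filename));
  if (!response.ok) {
    throw new Error(`Upload failed with status ${response.status}`);
  }
  const { url } = await response.json();
  return url;
}
```

The curl example passes -k to skip TLS certificate verification. Node's fetch rejects untrusted certificates by default, so a local instance with a self-signed certificate would need that certificate to be trusted first.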

Git Tags and Release Workflow

This repository uses Git tags to trigger the release pipeline, build container images, and push them to the Quay.io container registry.

Workflow Overview

  • Developers push a Git tag following Semantic Versioning (e.g., 1.0.0).
  • The Drone CI pipeline is triggered automatically, but only when the tag is pushed from the master branch.
  • A Docker image is built and pushed to Quay.io.
  • The image is tagged with the semantic version (e.g., 1.0.0) and a content-addressable digest (@sha256:...).

The complete image reference can be used in the format: quay.io/yourorg/your-image:1.0.0@sha256:

Tagging for Releases

To release a new version, follow these steps on the master branch only.

Make sure you're on the master branch:

git checkout master

Create and push a semantic version tag:

git tag 1.2.3
git push origin 1.2.3

Creating tags from the Git hosting UI instead of the CLI

Alternatively, you can create tags directly from your Git hosting provider's web interface (e.g., GitHub):

  • Go to the Releases or Tags section of the repository
  • Click "Create a new release" or "Add tag"
  • Use the proper version format (e.g., 1.2.3) and make sure it points to the master branch

This is a convenient way for team members to trigger a release without using the command line.

Release Tagging Guidelines for Contributors

When creating a new Git tag (either via the CLI or the Git UI), please follow these practices to ensure clear, traceable, and production-ready releases:

  • Attach release notes or changelogs to the tag
  • Link to issues, PRs, and milestones
  • Create pre-releases for testing before full deployment

This turns a simple tag into a full-fledged release artifact.

Important:

  • Use a valid Semantic Versioning format: MAJOR.MINOR.PATCH (e.g., 1.0.0, 2.3.1)
  • The Drone CI pipeline is configured to only trigger on tags created from the master branch.
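A tag can be checked against the version format before it is pushed. The sketch below is a simplified check and not part of the pipeline: it assumes tags carry no v prefix (as in the examples 1.0.0 and 2.3.1), and the name isReleaseTag is hypothetical.

```javascript
// Simplified Semantic Versioning pattern: MAJOR.MINOR.PATCH with an optional
// pre-release suffix such as -rc.1. Build metadata (+...) is not handled.
const SEMVER_TAG =
  /^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)(-[0-9A-Za-z-]+(\.[0-9A-Za-z-]+)*)?$/;

// Returns true when the tag looks like a valid release or pre-release version.
function isReleaseTag(tag) {
  return SEMVER_TAG.test(tag);
}
```

Under these assumptions, 1.2.3 and 1.2.3-rc.1 are accepted, while 1.2, 01.2.3 and v1.2.3 are rejected.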

Reason for Usage of image:tag@digest

The format image:tag@digest combines:

  • Tag (human-readable version, like 1.2.3)
  • Digest (immutable SHA-256 content identifier)

The digest SHA (sha256:) is a cryptographic hash that uniquely identifies the image content. You can retrieve it from Quay.io after the image is pushed.

Using the combined format guarantees:

  • Consistency: the image always resolves to the same content.
  • Traceability: you can trace exactly which build and source it came from.
  • Security: prevents tampering or tag overwriting in registries.
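To make the image:tag@digest format concrete, the sketch below composes and splits such a reference. The function names are hypothetical, and the digest is assumed to be sha256: followed by 64 lowercase hexadecimal characters.

```javascript
// A SHA-256 digest as it appears in an image reference.
const DIGEST = /^sha256:[0-9a-f]{64}$/;

// Joins repository, tag and digest into repository:tag@digest.
function imageReference(repository, tag, digest) {
  if (!DIGEST.test(digest)) {
    throw new Error(`Not a sha256 digest: ${digest}`);
  }
  return `${repository}:${tag}@${digest}`;
}

// Splits a reference back into its parts; the tag is undefined when absent.
function parseImageReference(reference) {
  const at = reference.lastIndexOf('@');
  if (at === -1) {
    throw new Error('Reference has no digest');
  }
  const named = reference.slice(0, at);
  const digest = reference.slice(at + 1);
  const colon = named.lastIndexOf(':');
  // A ':' before the last '/' belongs to a registry port, not to a tag.
  const hasTag = colon > named.lastIndexOf('/');
  return {
    repository: hasTag ? named.slice(0, colon) : named,
    tag: hasTag ? named.slice(colon + 1) : undefined,
    digest,
  };
}
```

The tag stays in the reference for readability, while the digest identifies the exact image content.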
