CloudFront in front of S3 using response-content-disposition

At the Science History Institute Digital Collections, we have a public collection of digitized historical materials (mostly photographic images of pages). We store these digitized assets — originals as well as various resizes and thumbnails used on our web pages — in AWS S3.

Currently, we provide access to these assets directly from S3. For some of our deliveries, we also use the S3 feature of a response-content-disposition query parameter in a signed, expiring S3 URL. This makes the response include an HTTP Content-Disposition header with a filename (and often an attachment disposition), so when the end-user saves the file they get a nice humanized filename (instead of our UUID filename on S3), supplied dynamically at download time. Meanwhile, we still send the user directly to S3, avoiding the need for a custom app proxy layer.
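
As a quick sketch of what such a URL looks like (bucket name, key, and filename here are made up), the query param value is just a URL-escaped Content-Disposition header value appended to the object URL. In real use the URL must also be presigned, since S3 ignores these params on unsigned requests, but the param itself is simply:

```ruby
require "cgi"

# Hypothetical humanized filename; the object itself is stored at a UUID key.
disposition = 'attachment; filename="oral-history-interview.pdf"'

# The query param value is just the URL-escaped Content-Disposition value.
param = "response-content-disposition=#{CGI.escape(disposition)}"

url = "https://example-bucket.s3.amazonaws.com/0b0a9264-e4e2-4a38.pdf?#{param}"
puts url
```

(With the aws-sdk-s3 gem, `Aws::S3::Object#presigned_url` accepts a `response_content_disposition:` keyword that handles the escaping and signing for you.)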

While currently we’re sending the user directly to urls in S3 buckets set with public non-authenticated access, we understand a better practice is putting a CDN in front like AWS’s own CloudFront. In addition to the geographic distribution of a CDN, we believe this will give us: better more consistent performance even in the same AWS region; possibly some cost savings (although it’s difficult for me to compare the various different charges over our possibly unusual access patterns); and additionally access to using AWS WAF in front of traffic, which was actually our most immediate motivation.

But can we keep using the response-content-disposition query param feature to dynamically specify a content-disposition header via the URL? It turns out you certainly can keep using response-content-disposition through CloudFront. But we found it a bit confusing to set up, and to think through the right combination of features and their implications, with not a lot of clear material online.

So I'll try to document here the basic recipe we have used, as well as discuss considerations and details!

Recipe for CloudFront distribution forwarding response-content-disposition to S3

  • We need CloudFront to forward the response-content-disposition query param to S3 — by default it leaves off the query string (after ? in a URL) when forwarding to the origin. You might reach for a custom Origin Request Policy, but it turns out we're not going to need it, because a Cache Policy will take care of it for us.
  • If we’re returning varying content-disposition headers, we need a non-default Cache Policy such that the cache key varies based on response-content-disposition too — otherwise changing the content-disposition in the query param might get you a cached response with an old, stale content-disposition.
    • We can create a Cache Policy based on the managed CachingOptimized policy, but adding the query params we are interested in.
    • It turns out including URL query params in a Cache Policy automatically leads to them being included in origin requests, so we do NOT need a custom Origin Request Policy — only a custom Cache Policy that includes response-content-disposition.
  • OK, but for the S3 origin to actually pay attention to the response-content-disposition query param, you need to set up a CloudFront Origin Access Control (OAC) given access to the S3 bucket, and set to “sign requests”, since S3 only respects these params on signed requests.
    • You don’t actually need to restrict the bucket to only allow requests from CloudFront, but you probably want to make sure all your bucket’s requests are going through CloudFront?
    • You don’t need to set the CloudFront distro to “Restrict viewer access”, but there may be security implications of setting up response-content-disposition forwarding with a non-restricted distro. More discussion below.
    • Some older tutorials you may find use AWS “Origin Access Identity (OAI)” for this, but OAC is the new non-deprecated way, don’t follow those tutorials.
    • Setting this all up has a few steps, but this CloudFront documentation page leads you through it.
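
To make the cache-key point concrete, here is a toy illustration (not CloudFront's actual algorithm, just the idea our custom Cache Policy implements): the cache key is derived from the path plus only the whitelisted query params, so varying response-content-disposition yields a different key, while an un-whitelisted param does not fragment the cache:

```ruby
require "digest"
require "uri"

# Toy model of a cache key under a whitelist-style Cache Policy.
WHITELISTED_PARAMS = ["response-content-disposition", "response-content-type"]

def cache_key(url)
  uri = URI.parse(url)
  params = URI.decode_www_form(uri.query.to_s)
              .select { |name, _value| WHITELISTED_PARAMS.include?(name) }
              .sort
  Digest::SHA256.hexdigest([uri.path, params].inspect)
end

base = "https://mydistro.cloudfront.net/content.jpg"

# Different disposition => different cache key, so no stale header served:
key_a = cache_key("#{base}?response-content-disposition=attachment")
key_b = cache_key("#{base}?response-content-disposition=inline")

# An un-whitelisted param does not change the key, preserving cache hits:
key_c = cache_key("#{base}?utm_source=newsletter")
key_d = cache_key(base)
```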

At this point your CloudFront distribution is working to forward response-content-disposition query params to S3, and return the resultant content-disposition headers in the response — CloudFront forwards on all response headers from origin by default, if you haven’t set a distribution behavior “Response headers policy”. Even setting a response headers policy like Managed-CORS-with-preflight-and-SecurityHeadersPolicy (which is what I often need), it seems it forwards on other response headers like content-disposition no problem.

Security Implications of a Public CloudFront with response-content-disposition

An S3 bucket can be set to allow public access, as I’ve done with some buckets with public content. But to use the response-content-disposition or response-content-type query param to construct a URL that dynamically chooses a content-disposition or content-type, you need to use an S3 presigned URL (or some other form of auth, I guess), even on a public bucket! As the S3 docs say: “These parameters cannot be used with an unsigned (anonymous) request.”

Is this design intentional? If this wasn’t true, anyone could construct a URL to your content that would return a response with their chosen content-type or content-disposition headers. I can think of some general vague hypothetical ways this could be used maliciously, maybe?

But by setting up a CloudFront distribution as above, it is possible to set things up so an unsigned request can do exactly that: http://mydistro.cloudfront.net/content.jpg?response-content-type=application%2Fx-malicious will just work without being signed. Is that a potential security vulnerability? I’m not sure, but if so, you should not set this up without also setting the distribution to have restricted viewer access and require (e.g.) signed URLs. That will require all URLs to the distribution to be signed though, not just the ones with the potentially sensitive params.

What if you want to use public un-signed URLs when they don’t have these sensitive params, but require signed URLs when they do? (As we want the default no-param URLs to be long-cacheable, we don’t want them all to be unique and time-limited!)

Since CloudFront “restricted access” is set for the entire distribution/behavior, you’d maybe need to use two different distributions pointed at the same origin (but with different config). Or perhaps different “behaviors” at different prefix paths within the same distribution. Or maybe there is a way to use custom CloudFront functions or lambdas to implement this, or restrict it in some other way? I don’t know much about that. It is certainly more convoluted to try to set up something like how S3 alone works, where straight URLs are public and persistent, but URLs specifying response headers are signed and expiring.
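
One sketch of the “different behaviors” idea in terraform (the path pattern is hypothetical, and the behavior blocks are elided to just the relevant arguments): the default behavior uses a cache policy that does not forward the response-* params, so S3 never sees them, while a /signed/* behavior forwards them but requires signed URLs:

resource "aws_cloudfront_distribution" "example-test2" {
  # etc

  # Public, long-cacheable URLs: response-* params are NOT forwarded here,
  # so S3 ignores them and no dynamic headers are possible.
  default_cache_behavior {
    cache_policy_id = "658327ea-f89d-4fab-a63d-7e88639e58f6" # Managed-CachingOptimized
    # etc
  }

  # Signed, expiring URLs under a prefix: forwards response-* params,
  # but only for requests signed with our trusted key group.
  ordered_cache_behavior {
    path_pattern       = "/signed/*"
    cache_policy_id    = aws_cloudfront_cache_policy.jrochkind-test-caching-optimized-plus-s3-params.id
    trusted_key_groups = [aws_cloudfront_key_group.example-test2.id]
    # etc
  }
}

(One catch: the object would then need to be reachable at both path forms, or you’d need to strip the /signed/ prefix before the request hits the origin, e.g. with a CloudFront function. I haven’t tried this.)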

Other Considerations

You may want to turn on logging for your CloudFront distro. You may want to add tags to make cost analysis easier.

In my buckets, all keys have unique names using UUID or content digests, such that all URLs should be immutable and cacheable forever. I want the actual user-agents making the request to get far-future cache-control headers. I try to set S3 cache-control metadata with far-future expiration. But if some got missed, or I change my mind about what these should look like, it is cumbersome (and has some costs) to try to check/reset metadata on many keys. Perhaps I want the CloudFront distro/behavior to force add/overwrite a far-future cache-control header itself? I could do that either with a custom response headers policy (might want to start with one of the managed policies, and copy/paste it, modifying to add the cache-control header), or perhaps a custom origin request policy that adds on an S3 response-cache-control query param to ask S3 to return a far-future cache-control header. (You might want to make sure you aren’t telling the user-agent to cache error messages from origin though!)
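
A sketch of the first option, a custom response headers policy that force-overwrites Cache-Control (the resource name and header value here are just examples):

resource "aws_cloudfront_response_headers_policy" "far-future-cache-control" {
  name = "far-future-cache-control"

  custom_headers_config {
    items {
      header   = "Cache-Control"
      value    = "public, max-age=31536000, immutable"
      override = true # overwrite whatever Cache-Control the origin sent
    }
  }
}

It would then be attached via response_headers_policy_id on the distribution behavior, alongside the cache policy.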

You may be interested in limiting to a CloudFront price class to control costs.

Terraform example

Terraform files demonstrating what is described here can be found: https://gist.github.com/jrochkind/4edcc8a4a1abf090a771a3e0324f6187

More detailed explanation below.

Detailed Implementation Notes and Examples

Custom Cache Policy

Creating cache policies is discussed in the AWS docs.

That a Cache Policy results in query params being included in origin requests is documented in Control origin requests with a policy:

Although the two kinds of policies are separate, they are related. All URL query strings, HTTP headers, and cookies that you include in the cache key (using a cache policy) are automatically included in origin requests. Use the origin request policy to specify the information that you want to include in origin requests, but not include in the cache key. Just like a cache policy, you attach an origin request policy to one or more cache behaviors in a CloudFront distribution.

You set a cache policy for your distribution (or specific behavior) by editing a Behavior in the CloudFront console.

I created the Cache Policy with TTL values from the “CachingOptimized” managed policy, adding the query params I was interested in.

Which looks like this in terraform:

resource "aws_cloudfront_distribution" "example-test2" {
  # etc
  default_cache_behavior {
    cache_policy_id = aws_cloudfront_cache_policy.jrochkind-test-caching-optimized-plus-s3-params.id
  }
}

resource "aws_cloudfront_cache_policy" "jrochkind-test-caching-optimized-plus-s3-params" {
  name        = "jrochkind-test-caching-optimized-plus-s3-params"
  comment     = "Based on Managed-CachingOptimized, but also forwarding select S3 query params"
  default_ttl = 86400
  max_ttl     = 31536000
  min_ttl     = 1
  parameters_in_cache_key_and_forwarded_to_origin {
    enable_accept_encoding_brotli = true
    enable_accept_encoding_gzip   = true

    cookies_config {
      cookie_behavior = "none"
    }
    headers_config {
      header_behavior = "none"
    }
    query_strings_config {
      query_string_behavior = "whitelist"
      query_strings {
        items = [
          "response-content-disposition",
          "response-content-type"
        ]
      }
    }
  }
}

CloudFront Origin Access Control (OAC) to sign requests to S3

Covered in the CloudFront docs Restrict access to an Amazon Simple Storage Service origin, which leads you through it pretty nicely.

While you could leave off the parts that actually restrict access (say, allowing public access), and just follow the parts for setting up an OAC to sign requests… you probably also want to restrict access to S3 so only CloudFront has it, not the public?

Relevant terraform follows. (You may want to use the templating feature for the JSON policy, as shown in the complete example above.)

resource "aws_cloudfront_distribution" "example-test2" {
    # etc
    origin {
        connection_attempts = 3
        connection_timeout  = 1
        domain_name         = aws_s3_bucket.example-test2.bucket_regional_domain_name
        origin_id           = aws_s3_bucket.example-test2.bucket_regional_domain_name
        origin_access_control_id = aws_cloudfront_origin_access_control.example-test2.id
    }
}

resource "aws_s3_bucket_policy" "example-test2" {
    bucket = "example-test2"
    
    policy = jsonencode(
        {
            Id        = "PolicyForCloudFrontPrivateContent"
            Statement = [
                {
                    Action    = "s3:GetObject"
                    Condition = {
                        StringEquals = {
                            "AWS:SourceArn" = aws_cloudfront_distribution.example-test2.arn
                        }
                    }
                    Effect    = "Allow"
                    Principal = {
                        Service = "cloudfront.amazonaws.com"
                    }
                    Resource  = "arn:aws:s3:::example-test2/*"
                    Sid       = "AllowCloudFrontServicePrincipal"
                  },
            ]
            Version   = "2008-10-17"
        }
    )
}

resource "aws_cloudfront_origin_access_control" "example-test2" {
  description                       = "Cloudfront signed s3"
  name                              = "example-test2"
  origin_access_control_origin_type = "s3"
  signing_behavior                  = "always"
  signing_protocol                  = "sigv4"
}

Restrict public access to CloudFront

We want to require signed URLs with our CloudFront distro, similar to what would be required with a non-public S3 bucket directly. Be aware that CloudFront uses a different signature algorithm and type of key than S3, and expirations can be further out.

See AWS doc at Serve private content with signed URLs and signed cookies.

  • Create a public/private RSA key pair
    • openssl genrsa -out private_key.pem 2048
    • extract just the public key with openssl rsa -pubout -in private_key.pem -out public_key.pem
    • Upload the public_key.pem to CloudFront “Public Keys”, and keep the private key in a secure place yourself.
  • Create a CloudFront “Key Group”, and select that public key from select menu
  • In the Distribution “Behavior”, select “Restrict Viewer Access”, to a “Trusted Key Group”, and choose the Trusted Key Group you just created.

Now all CloudFront URLs for this distribution/behavior will need to be signed to work, or else you’ll get an error Missing Key-Pair-Id query parameter or cookie value. See Use signed URLs. (You could also use a signed cookie, but that’s not useful to me right now.)

You’ll need the private key to sign a URL. Note that CloudFront uses an entirely different key signing algorithm, protocol, and key than S3 signed URLs! Shrine’s S3 docs have a good ruby example of using the ruby AWS SDK Aws::CloudFront::UrlSigner, which will by default use a “canned” policy. (I’m not sure what default expiration you’ll get without specifying it in the call, as in that example.)

Using a “canned” policy that just has a simple expiration, passing a custom expiration for 7 days in the future might look something like this:

# signer is an Aws::CloudFront::UrlSigner, configured with our key_pair_id and private key
signed_url = signer.signed_url(
  "https://mydistro.cloudfront.net/content.jpg?response-content-disposition=etc",
  expires: Time.now.utc.to_i + 7 * 24 * 60 * 60,
)

Terraform for creating restricted CloudFront access as above:

resource "aws_cloudfront_public_key" "example-test2" {
  comment     = "public key used by our app for signing urls"
  encoded_key = file("public_key-example-test2.pem")
  name        = "example-test2"
}

resource "aws_cloudfront_key_group" "example-test2" {
  comment = "key group used by our app for signing urls"
  items   = [aws_cloudfront_public_key.example-test2.id]
  name    = "example-test2"
}

resource "aws_cloudfront_distribution" "example-test2" {
  # etc
  default_cache_behavior {
    # etc
    trusted_key_groups = [aws_cloudfront_key_group.example-test2.id]
  }
}

(Warning: with terraform aws provider v5.53.0, to have terraform remove the trusted_key_groups and make the distro public again, you have to leave in trusted_key_groups = [], rather than removing the key entirely. Perhaps that’s just part of how terraform works.)
