AWS S3 and Cloudfront Adventures

Amazon’s S3 bucket Website and Cloudfront

Amazon Web Services’ (AWS) S3 storage is a service that stores files and provides both distribution and versioning. The S3 service is also touted as a place to prop up a static website and serve it to the Internet over HTTPS, a common use-case that many AWS evangelists, in their blog postings, promise is easy and trouble-free.

To be frank, it is easy, once all the problems are solved. But I had a different starting state than a blank storage bucket and an unassigned domain name. And I had zero experience with S3 storage at that point.

I had previously set up an EC2 instance on AWS (OK, warning: if you do not know what an EC2 instance is and how awesome they are, go here to find out) and had a domain name claimed and paid for, with an existing DNS record set. Since the domain name was maintained through AWS, I figured it would be simple to switch it over to a static webpage served from an S3 bucket and deployed to the Internet through Cloudfront.

Let’s go through what it took to set up an S3 bucket. This is not a step-by-step tutorial, but rather a brief description of some of the problems I encountered and how I solved them.

Create a bucket in S3

First, you need to create a bucket in S3. Follow the instructions; it is simple. Done, next step. I left it empty until I was ready to deploy my site documents. We need to do this step first so that we have an origin object in S3 that Cloudfront can connect to.
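For reference, the bucket can also be created from the AWS CLI instead of the console. This is a sketch, not what I ran at the time; the bucket name and region are placeholders for your own values:

```shell
# Create an empty bucket that will act as the Cloudfront origin.
# <your_bucket_name> must be globally unique; pick your own region.
aws s3 mb s3://<your_bucket_name> --region us-east-1
```

Either way, the end result is the same: an empty bucket waiting for site documents.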

Setup of HTTPS in Cloudfront

The use of HTTPS in Cloudfront does present some hurdles that need to be addressed to be successful. The most important point to be aware of throughout the process is to make sure that the alternate domain name that appears on the underlying TLS X.509 certificate is entered correctly in the Cloudfront alternate domain field. It took a couple of tries to get the two to align correctly. Once everything agrees, it should work. I set Cloudfront to redirect HTTP to HTTPS.
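The certificate itself comes from AWS Certificate Manager (ACM), and one hurdle worth flagging: a certificate used by Cloudfront must live in the us-east-1 region, no matter where your bucket is. As a sketch (covering both the apex domain and the www subdomain), a DNS-validated certificate can be requested with:

```shell
# Request a DNS-validated certificate covering the apex and www subdomain.
# Cloudfront only accepts certificates issued in the us-east-1 region.
aws acm request-certificate \
  --domain-name northerntechie.org \
  --subject-alternative-names www.northerntechie.org \
  --validation-method DNS \
  --region us-east-1
```

After requesting it, ACM gives you a CNAME record to add in Route 53 to prove you control the domain, and the names on this certificate are exactly what must match the Cloudfront alternate domain field.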

DNS entry for Cloudfront

Cloudfront serves HTTPS webpages through its own domain URL. Cloudfront assigns a unique hashed subdomain that appears as <UID>.cloudfront.net. Since the website is not at a fixed IPv4/IPv6 address, an alias record needs to be used to redirect traffic from a domain name to the newly generated host.

Using the AWS Route 53 service, I modified the original DNS record set I had set up for my established northerntechie.org website. My original DNS record set had the A record pointing to the fixed IPv4 address that hosted the EC2 instance. There were also NS and SOA records; we want to keep those, as they do not need to be changed and are necessary to maintain the northerntechie.org domain name. I deleted the A and CNAME records because I wanted to point the domain name to the new Cloudfront website.

The A record is necessary to establish the apex domain. There are two options for this record: a fixed IP or an alias. I chose the alias option and entered the <UID>.cloudfront.net URL that points to the static website. I also added a www.northerntechie.org CNAME record to redirect www requests to the apex domain.
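For the curious, the alias record can also be written through the CLI instead of the console. This is a sketch: the hosted zone ID and the <UID> distribution domain are placeholders for your own values, while Z2FDTNDATAQYW2 is the fixed hosted zone ID that AWS uses for all Cloudfront alias targets:

```shell
# Upsert an A record on the apex that aliases to the Cloudfront distribution.
# <your_hosted_zone_id> and <UID> are placeholders for your own values.
aws route53 change-resource-record-sets \
  --hosted-zone-id <your_hosted_zone_id> \
  --change-batch '{
    "Changes": [{
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "northerntechie.org",
        "Type": "A",
        "AliasTarget": {
          "HostedZoneId": "Z2FDTNDATAQYW2",
          "DNSName": "<UID>.cloudfront.net",
          "EvaluateTargetHealth": false
        }
      }
    }]
  }'
```

The console hides most of this JSON behind the alias dropdown, which is why I only discovered the fixed Cloudfront zone ID when poking at the CLI.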

How do I deploy?

Deployment is as easy as the traditional FTP approach (no, really). There are many different ways to deploy the website from development to production, and a lot of them involve some sort of CI/CD pipeline. For my personal site, I prefer a more manual method so I have instant feedback that my site is deployed as intended. The danger with the CI/CD workflow is the lack of attention given to a build-failure notice on a repository. It can easily be missed during the daily grind, especially because the jobs of a CI/CD pipeline run asynchronously. It is doubly dangerous if the build and deployment are handled by someone else and a developer simply ignores errors until someone holds them responsible.

This is a personal blog and I only answer to myself, so I go with the most automated manual approach I can muster. The AWS CLI is your friend. The AWS CLI installation instructions are here. I am using version 2. You should also use a separate user that is limited to the minimum permissions required to deploy the website. I created a web admin user in the AWS IAM service and set a policy for the S3 and Cloudfront services.

You could enter the three separate commands on the command line, which will keep you abreast of the subtleties of the CLI, but writing a script is better, and it documents your workflow. Here is my script,

#!/bin/bash
# Clear out the previous build before generating the site.
rm -r public/*
# Build with hugo, sync the output to the S3 bucket, then tell Cloudfront
# to invalidate its cached copies so the fresh pages get served.
hugo && aws s3 sync public/ s3://<your_bucket_name>/ --acl public-read --delete && \
aws cloudfront create-invalidation --distribution-id <your_cloudfront_distribution_id> \
--paths '/*'

The aws s3 sync command is similar to the Unix rsync and will update files that have a newer timestamp. It isn’t a foolproof method of updating files, but if you are comfortable with rsync, it will suffice. A periodic wipe of the entire S3 bucket (the website) may be necessary before moving files onto the bucket.
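When the sync heuristics get out of step with reality, the blunt fix is a wipe and a full re-upload. A sketch, with the bucket name as a placeholder:

```shell
# Delete every object in the bucket, then upload a fresh copy of the site.
# Destructive: make sure the local public/ directory is a complete build.
aws s3 rm s3://<your_bucket_name>/ --recursive
aws s3 cp public/ s3://<your_bucket_name>/ --recursive --acl public-read
```

I only reach for this occasionally; the sync in the script handles day-to-day changes.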

The aws cloudfront create-invalidation command clears the cached versions within the Cloudfront CDN and reloads the new pages. As with almost all AWS activities, there are costs involved. But with a personal blog, the number of transactions should stay below the free tier threshold. Hopefully my blog posts are not so popular that the number of requests exceeds the free tier limit.

In order for the aws s3 commands to work, your AWS user credentials need to be set up with the required permissions. The policy permissions required to allow the script to work are,

  "s3:PutObject",
  "s3:PutObjectAcl",
  "s3:DeleteObject",
  "cloudfront:CreateInvalidation"

Where do all these settings reside? They are in the AWS IAM service, under the policy that you set up for the web admin user. I will leave that for another blog post.
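For illustration, the four actions above slot into a policy document roughly like the following. This is a sketch rather than my exact policy: the bucket name, account ID, and distribution ID are placeholders, and you should scope the Resource fields as tightly as you can. Depending on your setup, aws s3 sync may also need s3:ListBucket on the bucket itself to compare local and remote files.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:PutObjectAcl",
        "s3:DeleteObject"
      ],
      "Resource": "arn:aws:s3:::<your_bucket_name>/*"
    },
    {
      "Effect": "Allow",
      "Action": "cloudfront:CreateInvalidation",
      "Resource": "arn:aws:cloudfront::<your_account_id>:distribution/<your_cloudfront_distribution_id>"
    }
  ]
}
```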

Why do my pages stay as previous versions?

There is caching taking place within the Cloudfront service that provides high performance while serving your webpages; that is a good thing and makes for a low-latency website. Unfortunately, it also confuses the heck out of someone because the website served to the Internet does not reflect the file you just changed within the S3 bucket. Do not worry, it is simply a matter of invalidating the files in the Cloudfront cache, which in turn will force Cloudfront to retrieve the fresh new version from the S3 bucket, as explained here and here.

Cloudfront caches can be invalidated and forced to use the most recent versions using the aws cloudfront command as follows,

$ aws cloudfront create-invalidation --distribution-id <your_cloudfront_distribution_id> --paths '/*'

I have my AWS CLI set up for JSON responses. After running the command, the JSON response showed the successful invalidation.

Conclusion

There is much more to deploying an AWS S3 bucket static website via the Cloudfront service, and I have only scratched the surface. One final point of interest: look to the documentation, and do not rely solely on blog posts (including this one).