routeToS3: Storing Messages in S3 via Lambda and the API Gateway

(If you want to cut to the chase, skip the first four paragraphs.)

Over the past year, Amazon Web Services (AWS) has previewed and released several new services that have the potential to drive the cost of IT down. This includes services like EFS and Aurora, but the service I was most excited about was Lambda. Lambda is a service that executes code on-demand so you don’t have to pay for an entire EC2 instance to sit around waiting for events. I recall at my previous position having a server that only existed to execute scheduled tasks. As supported languages expand, Lambda has the potential to completely replace such utility servers.

There are many ways to trigger Lambda functions, including S3 events, SNS messages, and schedules. But, until recently, it wasn’t straightforward to trigger a Lambda event from outside your AWS environment. Enter Amazon’s fairly new API Gateway. The API Gateway is a super simple way to set up HTTP endpoints that communicate with AWS resources, including Lambda functions. And, you don’t have to be a seasoned developer to use it. In fact, I had only recently started learning some of the standard API concepts while playing around with the Slim Framework for PHP. While understanding RESTful APIs will help the API Gateway feel more natural, you can get started without knowing everything.

Let me back up a bit and explain why I came across the API Gateway in the first place. SendGrid has become our go-to service for sending email from various applications. I can’t say enough good things about SendGrid, but it has some intentional limitations. One of those is that it will store no more than 500 events or 7 days’ worth (whichever comes first) at a time. You still get all your stats, but if you need to look up what happened to a specific email two weeks ago (or two minutes ago, depending on your volume), you’re out of luck. Fortunately, SendGrid thought this through and made an event webhook available that will POST these events as a JSON object to any URL you give it. “Perfect!” I thought, “We can build something to store it in RDS.” But first, I thought it prudent to explore the Internet for pre-built solutions.

My research brought me to Keen.io, which was the only out-of-the-box solution I found that would readily accept and store SendGrid events. If you are here for the exact same solution that I was looking for, I strongly recommend checking out Keen.io. The interface is a little slow, but the features and price are right. We would have gone this route in a heartbeat, but we had some requirements that the terms of service could not satisfy. With that option gone, I was back to the drawing board. After brainstorming many times with my teammates, we finally came up with a simple solution: SendGrid would POST to an HTTP endpoint via the API Gateway, which would in turn fire up a Lambda function, which would take the JSON event and write it to an S3 bucket. The reason for S3 instead of something more structured like RDS or SimpleDB is that we can use Splunk to ingest S3 contents. Your requirements may be different, so be sure to check out other storage options like those I have mentioned already.

SendGrid Logging Diagram

The initial plan. The API structure changed, but the flow of events is still accurate.

Now that we have introductions out of the way, let’s jump in and start building this thing. You will need to be familiar with creating Lambda functions and general S3 storage management. Note that I will borrow heavily from the API Gateway Getting Started guide and the Lambda with S3 tutorial. Most of my testing took place on my personal AWS account and cost me $0.02.

Create an S3 Bucket

The first thing you need to do is create your S3 bucket or folder that will store SendGrid events as files (you can also use an existing bucket). The simple GUI way is to open your AWS console and access the S3 dashboard. From there, click the Create Bucket button. Give your bucket a unique name, choose a region and click Create.
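If you prefer the command line, the AWS CLI can do the same thing. This is just a sketch; the bucket name and region below are placeholders, so substitute your own:

aws s3 mb s3://my-sendgrid-events-bucket --region us-east-1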

Create a Lambda Function

This won’t be an in-depth guide into creating Lambda functions, but we will cover what you need to know in order to get this up and running. At the time of writing, Lambda supports three languages: Java, Node.js, and Python. I will use Node.js in this guide.

The Code

Create a file called index.js and add the following contents:


//Modified from AWS example: http://docs.aws.amazon.com/lambda/latest/dg/with-s3.html

var AWS = require('aws-sdk');

exports.handler = function(event, context) {
    console.log("routeToS3 Lambda function invoked");

    //Restrict this function so that not just anyone can invoke it.
    var validToken = event.validToken;
    //Check the supplied token and kill the process if it is incorrect
    var token = event.token;
    if (token != validToken) {
        console.log('routeToS3: The token supplied (' + token + ') is invalid. Aborting.');
        context.fail('{ "result" : "fail", "reason" : "Invalid token provided" }');
    } else {
        uploadBody(event, context);
    }
};

var uploadBody = function(event, context) {
    var bucket = event.bucket;
    var app = event.app;
    var timestamp = Date.now();
    var key = app + '_' + timestamp;
    var body = JSON.stringify(event.body);

    var s3 = new AWS.S3();
    var param = {Bucket: bucket, Key: key, Body: body};

    console.log("routeToS3: Uploading body to S3 - " + bucket);
    s3.upload(param, function(err, data) {
        if (err) {
            console.log(err, err.stack); // an error occurred, log to CloudWatch
            context.fail('{ "result" : "fail", "reason" : "Unable to upload file to S3" }');
        } else {
            console.log('routeToS3: Body uploaded to S3 successfully'); // successful response
            context.succeed('{ "result" : "success" }');
        }
    });
};

This script will become your Lambda function and has a few key elements to take note of. First, it declares a variable named AWS with “require(‘aws-sdk’)”. This pulls in the aws-sdk Node.js module, which is required for writing to S3. With most Node.js modules, you will need to zip up the module files with your Lambda function. However, the AWS SDK is baked in, so you don’t need to worry about uploading any dependency files with the above function.

Next, the function declares a series of variables, starting with “validToken” and “token.” This might be where most seasoned API engineers roll their eyes at me. When possible, it makes sense to handle authentication at the API level and not inside your function. In fact, the API Gateway has this functionality built in. However, the supported method requires a change to the incoming request’s headers. That is not an option with SendGrid’s event webhook, which only gives you control over the URL, not the data. So, I had to cheat a little. We will cover this a little more when we set up the API, but for now it is sufficient to understand that token must match validToken for the function to work. Otherwise, the function will exit with an error.

Moving on to the other important variables:

  • bucket – The bucket or bucket/path combination (e.g.: my-bucket/SendGridEvents)
  • app – The name of the app these events are coming from; will be used as the resulting file’s prefix
  • timestamp – The current timestamp, which will be used to make the file name/key unique
  • key – constructed from app and timestamp to generate the file name

All of these variables will be passed in via the API Gateway as part of the event variable. That is why they all look something like “bucket = event.bucket”.

When this script is run, the very first thing Lambda will do is call the “exports.handler” function. In our case, exports.handler simply checks the token and, if it is correct, calls the “uploadBody” function. Otherwise, it exits the script and writes an error to CloudWatch via console.log.

Zip up index.js and use it to create a new Lambda function named “routeToS3.” You can do this all through the GUI, but I am more familiar with the CLI method. Not because I am a CLI snob, but because when Lambda first came out, only account admins could access the Lambda GUI.
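If you go the CLI route, the commands look something like the following. Treat this as a sketch: the role ARN is a placeholder for an execution role that can write to your bucket (and to CloudWatch Logs), and the runtime identifier will depend on which Node.js runtimes Lambda offers when you run it.

zip routeToS3.zip index.js
aws lambda create-function --function-name routeToS3 --runtime nodejs --handler index.handler --role arn:aws:iam::123456789012:role/routeToS3-execution-role --zip-file fileb://routeToS3.zip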

Create your API

The AWS API Gateway enables people to build APIs without typing a line of code. It’s really fast to get something up and running. In fact, when all I meant to do was make sure my permissions were set correctly, I accidentally built the whole thing. I recommend checking out AWS’s guide, but you can also learn a bit by following along here.

To start…

  1. Log into your AWS console, open up the API Gateway service, and click the Create API button.
  2. Name your API routeToS3 and click the next Create API button.
  3. With the root resource selected (it should be your only resource at this point), click Actions -> Create Resource.
  4. Name the resource “input” and set the path to “input” as well.
  5. Select /input from the Resources menu on the left.
  6. Click Actions -> Create Method.
  7. In the dropdown that appears on the Resources menu, select POST and click the checkmark that appears to the right.
  8. For Integration Type, choose Lambda Function.
  9. Set your Lambda Region (choose the same region as your S3 bucket).
  10. Type or select the name of your Lambda function (routeToS3) in the Lambda Function field.
  11. Click Save.
  12. When prompted to Add Permission to Lambda Function, click OK.

Congratulations! You just built an API in about two minutes. Now, in order to make sure the Lambda function gets all the parameters we mentioned earlier (body, bucket, app, etc.), we need to configure query strings, a mapping template, and a stage variable. We won’t be able to create a stage variable just yet, so that will come a little later.

With your POST method selected in the Resources menu, you should see a diagram with boxes titled Method Request, Integration Request, Method Response, and Integration Response:

POST Function

Click on Method Request to set up our query strings. From here, click to expand the URL Query String Parameters section. Any query string we add here will act as what some of us might refer to as GET parameters (e.g.: /?var1=a&var2=b&var3=etc). To set up the strings we will need, follow these steps:

  1. Click the Add query string link.
  2. Name the string token and click the checkmark to the right.
  3. Repeat for app and bucket.

Go back to the method execution overview by clicking POST in the Resources menu or <- Method Execution at the top.

Next, we will add a mapping template:

  1. Click Integration Request.
  2. Expand the Body Mapping Templates section.
  3. Click Add mapping template.
  4. Type application/json (even though it is already filled in and doesn’t disappear when you click inside the text box) and click the checkmark to the right.
  5. Click the pencil icon next to Input Passthrough (it’s possible you could see “Mapping template” instead).
  6. Add the following JSON object and click the checkmark.

{
  "bucket": "$input.params('bucket')",
  "app": "$input.params('app')",
  "token": "$input.params('token')",
  "validToken": "$stageVariables.validToken",
  "body": $input.json('$')
}

This mapping will take the body of the request and our variables, and pass them along as part of the event object to Lambda. Note that all of the values, like "$input.params('bucket')", are wrapped in double quotes, except for $input.json('$'). That one returns the request body as raw JSON, which is already valid on its own, so wrapping it in quotes would break the object.
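To make that concrete, here is a hypothetical example of the event object Lambda would receive once the template has done its work (all of the values are made up):

{
  "bucket": "my-bucket/routeToS3/SendGrid",
  "app": "SendGrid",
  "token": "1234567890",
  "validToken": "1234567890",
  "body": {"message": "success"}
}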

Now, it’s time to deploy our API, which will make it accessible over HTTP. But, we haven’t tested it yet and that validToken variable is still undefined. Don’t worry, we haven’t forgotten those two critical pieces. But, we have to create a stage first, which is part of the deployment process.

  1. Click the Deploy API button at the top of the screen.
  2. On the screen that appears, choose [New Stage] for the Deployment Stage.
  3. Choose a name for the stage (Stages are like different environments, for example dev or prod).
  4. Enter a Deployment description and click Deploy.

On the screen that follows, you will see a tab labeled Stage Variables. Open this tab and click Add Stage Variable. Name the variable validToken and enter a token of your choosing for the Value. Use something strong.

Go back to the Settings tab and take a look at the options there. You may be interested in throttling your API, especially if this is a development stage. Remember that, although the API Gateway and Lambda are fairly cheap, too much traffic could rack up a bill. Since we aren’t using a client certificate to authenticate the calling app, we have to invoke the Lambda function to verify the provided token. Just something to keep in mind when considering throttling your API.

Now that I’ve distracted you with some prose, click Save Settings at the bottom of the page.

At the top of the screen, you will see an Invoke URL. This is the address for the stage you just deployed to. All of our magic happens in the /input resource, so whatever that invoke URL is, add “/input” to the end of it. For example, https://yudfhjky.execute-api.region.amazonaws.com/dev would become https://yudfhjky.execute-api.region.amazonaws.com/dev/input.
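If you want to exercise the deployed endpoint directly, a quick curl call works too. This is only a sketch reusing the example URL above; swap in your own invoke URL, bucket path, and app name, and make sure the token matches the validToken stage variable you just created:

curl -X POST "https://yudfhjky.execute-api.region.amazonaws.com/dev/input?bucket=my-bucket/routeToS3/SendGrid&token=1234567890&app=SendGrid" -H "Content-Type: application/json" -d '{"message": "success"}'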

With our stage set up, we can now test the method.

  1. Go back to the routeToS3 API and click on the POST method in the Resources menu.
  2. Click Test.
  3. Enter a token, app, and a valid bucket/folder path (e.g.: my-bucket/routeToS3/SendGrid).
  4. Enter a value for validToken (this should be the same as token if you want the test to succeed).
  5. For Request Body, type something like {"message": "success"}.
  6. Click Test.

You should see the CloudWatch logs that indicate the results of your test. If all is well, you will get a 200 status back and a corresponding file will appear in the bucket you provided. The file contents should be {"message": "success"} or whatever you set for the request body.
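You can also confirm the upload from the AWS CLI. This assumes the bucket/folder path used in the test; the key name shown is hypothetical, since the real one ends with whatever timestamp the function generated:

aws s3 ls s3://my-bucket/routeToS3/SendGrid/
aws s3 cp s3://my-bucket/routeToS3/SendGrid/SendGrid_1443812345678 -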

If things are working as expected, then it is time to head over to SendGrid and configure your event webhook:

  1. Log into SendGrid.
  2. Click Settings -> Mail Settings.
  3. Find Event Notification.
  4. Click the gray Off button to turn event notifications on.
  5. If needed, click edit to enter the HTTP POST URL.
  6. Enter the URL to your API endpoint, along with all necessary query strings (e.g.: https://yudfhjky.execute-api.region.amazonaws.com/dev/input?bucket=my-bucket/routeToS3/SendGrid&token=1234567890&app=SendGrid).
  7. Click the checkmark.
  8. Check all the events you want to log.
  9. Click the Test Your Integration button.
  10. Wait a couple minutes and then check your bucket to see if SendGrid’s test events arrived.

Tada! You should now be logging SendGrid events to an S3 bucket. Honestly, it’s much simpler than you might think based on the length of this post. Just keep the perspective that all of this is accomplished with three lightweight and low-cost services: the API Gateway to receive the event from SendGrid, Lambda to process that event and upload it to S3, and S3 to store the body of the SendGrid event. I hope you find this as helpful and straightforward as I have.

FTP to Google Drive

Let’s be clear that Google Drive does not provide FTP access to your content. But, that doesn’t mean it isn’t possible. I’ve been playing recently with a wireless security camera that can send images to an FTP server fairly easily. But, I didn’t have anything reliable in the cloud handy at the right price. Google Drive seemed like an excellent storage solution, but there was no way for the camera to utilize it… Directly.

At some point, I remembered I had a 2006 Mac Mini sitting around. Older versions of OS X make it really simple to get an FTP server up and running, which is the boat I found myself in:

  • Open System Preferences
  • Go to Sharing
  • Enable File Sharing
  • Modify permissions and paths to your liking
  • FTP will now be available on port 21

If you want to use a Mac for this exercise and you have a newer OS installed, you may need to follow these steps.

First half of your work? Done. The next step is pretty simple: download and install the Google Drive app (tip: limit the folders Google Drive will sync if this will be a single-use computer/server). Google Drive content will be accessible at /Users/username/Google Drive. However, if, like me, your camera or other client doesn’t play nicely with spaces, a symlink (or a shortcut in Windows) takes care of this. I ran a command like this to create a space-free symlink:

ln -s ~/Google\ Drive/ ~/googleDrive

The backslash (“\”) escapes the space in a *nix environment. Now, anytime you write to /Users/username/googleDrive, you will actually be writing to your Google Drive folder. That means, if you use this path in your FTP configuration, you are essentially writing to Google Drive using FTP. Sneaky, sneaky. It worked beautifully for me. In fact, it worked a little too well. I didn’t quite nail the security camera’s sensitivity level and woke up to more than 10,400 images synced to Google Drive.
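If you want to sanity-check the chain before pointing a camera (or anything else) at it, curl can push a file over FTP. This is a sketch with a made-up username, password, and address; the target directory depends on how you scoped sharing earlier:

curl -T test.jpg ftp://ftpuser:secret@192.168.1.50/googleDrive/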

But, are there downsides? Of course. First and foremost is that, at least in my setup, it means one more device powered up. The Mini isn’t the worst thing to have going, but it also isn’t your only option if you want to be a little more green. You could set something up in AWS or Azure, use a Raspberry Pi, etc., but keep in mind there is no official Google Drive app for Linux yet. The second downside is that Google Drive only runs, and therefore only syncs, when you are logged in. That means my Mac Mini is set up for automatic login, never goes to sleep, and starts up immediately after a power failure.

It’s not a perfect setup, but it worked in a pinch. My next task is setting up a secure proxy to the camera’s web interface. Another need the Mini can easily fill.

Crawling Commented Styling with Heritrix

This post isn’t something I can take credit for. The purpose is to make two potential solutions discoverable for someone like me, looking for an answer. Credit will be given where it is absolutely due.

As I’ve written about before, I inherited a Wayback/Heritrix server in my role and have had the pleasure of hacking my way through it on occasion. A recent challenge arose when I needed Heritrix to crawl an old Plone site. The Plone template placed HTML comments around all the style tags to hide the CSS from older browsers, which couldn’t understand it. The result looks something like this:

<style type="text/css"><!--
/* - base.css - */
@media screen {
/* http://servername.domain.net/portal_css/base.css?original=1 */
/* */
/* */
.documentContent ul {
list-style-image: url(http://servername.domain.net/bullet.gif);
list-style-type: square;
margin: 0.5em 0 0 1.5em;
}
--></style>

Unfortunately, it seems Heritrix happily skips past any URLs within comments by default and does not follow them, regardless of your seeds and other configurations. Because, hey, they’re only comments, right? The end result is that it looks like the site was crawled successfully, but some resources were actually missed. In the above example, the Wayback version of the site was still pointing to http://servername.domain.net/bullet.gif for the list-style-image, rather than http://wayback.domain.net:3366/wayback/20151002173414im_/http://servername.domain.net/bullet.gif. Therefore, it was not a complete archive of the site and its contents.

In my case, this was an internal site that I had total control over. However, try as I might, I could not figure out how to remove the comments from the old Plone template. Grepping for '<style type="text/css"><!--' turned up _SkeletonPage.py. I tried modifying it and then running buildout to no avail. I am sure people more experienced with Plone could tell you where to change this in a heartbeat, but it’s beyond my knowledge with the application at this point. After coming up short on searches for solutions with Heritrix (thus, this post), I started looking for ways to remove the comment tags with something like Apache’s mod_substitute, since Plone was being reverse-proxied through Apache anyway.

Solution 1: Mod_Substitute/Mod_Filter

Eventually, I stumbled upon this configuration from Chris, regarding mod_substitute and mod_filter. Mod_filter needed to be used for mod_substitute to work properly because of the content being reverse-proxied. A simple modification of Chris’s configuration worked to remove the comment tags beautifully (using CentOS/Httpd for reference):

LoadModule substitute_module modules/mod_substitute.so
LoadModule filter_module modules/mod_filter.so
FilterDeclare replace
FilterProvider replace SUBSTITUTE Content-Type $text/html
FilterChain +replace
FilterTrace replace 1
Substitute "s/css\"><!--/css\">/n"
Substitute "s|--></style>|</style>|n"

(Note: I probably could have made this a little more efficient by using a single regex instead of two separate substitutes. But, meh. This was good enough.)

Chris recommended loading this into a new file: /etc/httpd/conf.d/replace.conf.

Solution 2: Hack Heritrix

While exploring my options with Apache, I also decided to reach out to the archive-crawler community on Yahoo! for help. A user identified as “eleklr” shared a patch that he used often for this kind of scenario. I think this is the better route to go, though I have not had an opportunity to try it out yet. Its biggest strength is that it doesn’t require you to have complete control over the site you are crawling, as is necessary for solution 1.

If you’ve found yourself in my position, rejoice in the fact that it’s not just you and there are solutions out there. Hopefully, one of the two listed above will help you on your way. Please share if you’ve discovered other solutions to this or similar problems.

The Single Best Improvement in Outlook 2016

Ok, maybe I haven’t been using Outlook 2016 long enough to say this is the single best improvement. In fact, I really hope it isn’t. But, I breathed a long-awaited “Finally” the first time I didn’t have to close an Office Reminder, switch to my calendar, and then double-click an event to see the details. Reminders have always taken up an excessive amount of screen space and, up until 2016, most of this space was wasted. You couldn’t click on any of that yellow expanse to view more information. It just sat there, taunting you. I recently referred to it as the second worst UI choice I had ever encountered. But, no more in Outlook 2016! Now, you can double-click anywhere in the blue expanse to open the event. The level of excitement I feel is on par with that of our CRM users who just learned SugarCRM finally allows resizable columns.

What’s more interesting is that after sharing my elation with a coworker, he pointed out that in Outlook 2011, you can actually open an event from the reminder pop-up. However, it is not in the great yellow expanse. Instead, it is the small calendar icon, which does not present itself as a button or anything remotely clickable, that holds the key. He knew this because two weeks before we upgraded to 2016, he got fed up and went searching. So, for a whole two weeks, he was able to open his events in Outlook 2011 without having to go through the process described above. Here is a visual representation of what Microsoft decided to do with that pop-up in 2011:

outlook-2011-reminder-arrows

Note how their little secret is safely tucked away, surrounded by everything you can’t click. Fast-forward five years and we find that Microsoft decided to put all that extra space to work:

outlook-2016-reminders-arrows

Oo. Ah!

Now, what is still a bit odd is that you have to double-click in the blue area, while the little calendar icon still gets special treatment. You only have to click once on the calendar icon to open the event. Maybe it is a nod to those “in the know,” which is probably a far greater number of people than my ego would care to hear.

It is funny and frustrating what impact a minor UI decision can have. However, I write this not with frustration, but with honest fascination. Of course, it might be a handy tip for someone using 2011 or 2016, but how much more interesting it is to consider the journey this small, but critical feature must have endured thus far.

Who Isn’t Taking Out the Trash? Use WinDirStat and PowerShell to Find Out.

Using WinDirStat to find unnecessary files on a hard drive is a pretty routine task. A common find is that someone’s Recycle Bin is holding large zip or executable files. WinDirStat is helpful for showing this to you, but it only reveals the user’s local SID, such as:

S-1-1-12-1234567890-123456789-123456789-123

It’s not terribly difficult to track down the associated profile using regedit. Still, clicking through a series of plus buttons in a GUI seems inefficient. Here is a simple method I used today to make this process a little quicker. Ok, so it took a bit longer than clicking through the first time, but it will be quicker for me next time:


((get-itemproperty "hklm:\Software\Microsoft\Windows NT\CurrentVersion\ProfileList\*") | where {$_.pschildname -like "S-1-1-12-1234567890-123456789-123456789-123"}).ProfileImagePath

This will return the ProfileImagePath value, which is the file path to the guilty profile. If you want to cut straight to the username, try this:


(((get-itemproperty "hklm:\Software\Microsoft\Windows NT\CurrentVersion\ProfileList\*") | where {$_.pschildname -like "S-1-1-12-1234567890-123456789-123456789-123"}).ProfileImagePath).split("\")[-1]
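If you would rather not paste the SID in at all, this variation lists every profile’s SID alongside its path so you can eyeball the match (same registry location as above):

Get-ItemProperty "hklm:\Software\Microsoft\Windows NT\CurrentVersion\ProfileList\*" | Select-Object PSChildName, ProfileImagePath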

Taylor Swift and 9/23

Conspiracy theorists across the Internet have spouted fears about impending doom on 9/23/2015 (today, if you haven’t checked your phone yet) over the past 499 days. But, I believe they have all overlooked something crucial, which has been hiding in plain sight. What is the connection between Taylor Swift and 9/23? Here is a list of convincing evidence that clearly needs no explanation…

This morning, the scholarly news site JSTOR Daily published two articles. Both are concerned with Taylor Swift and Taylor Swift only.

jstor-daily-swift

Coincidence? Seems unlikely. A quick search of Google reveals these chilling results. Note the recurrence of 923 in varying formats:

k923-swift

swift-user-923

swift-page923

swift-auction

Sorry, sold already

everyone-talking-swift

Everyone talking about Taylor today? They will be.

You can search “taylor swift 923” on Google yourself and you will see that the results produce many o’s chock full of signs (probably more than 38 minutes and 57 seconds worth). And don’t forget her birthday. It’s not today, you say. Not even this month. But, let’s look closer.

December 13

We all know that “Decem” is the Latin word for 10 and there were originally ten months in the year, December being the tenth month. With the addition of July and August, all the latter months got displaced. So, her birthday could be viewed as 10/13. Still not close enough for you? What happens when you subtract 1 from the month and add it to the tens column of the day: 9/23!

The evidence is clear, but the question remains: what part will Taylor Swift play in today’s potentially life-altering events? If she uses her “guardian-angel-spirit-animal” (credit: Katya) powers for good, we may all live to still be talking about her tomorrow.

Update – The following video exposes even more of Taylor’s connection to this date: https://youtu.be/xFAKBrEC-lM?t=53. Shocking!

We’re All a Part

We see the passing of another birthday for a child no longer with us. Still alive, but growing up in another home: the first little girl we loved and raised for a year as foster parents. She has been gone for nearly three years, but it still hurts, still brings tears. Yet, there are reminders that, even though we may feel the loss the greatest, we are not alone in our pain. These children connected with our family and left holes in their hearts, too.

My father-in-law recently ran into another child we cared for over the course of eleven months. This boy had come to know him as grandpa and called us mommy and daddy. He saw my father-in-law pass by in a store and called out “Hi!” My father-in-law returned the greeting, but didn’t recognize our son. It wasn’t until he was nearly home that he realized who it had been. He started to cry as he relayed the story to us. I know how much it hurt that he was so close and missed an opportunity to tarry and maybe even get a hug. He apologized for the mistake. There was nothing to apologize to us for. How can you apologize for loving a child too much?

Our daughter had just as strong an impact on our family. So many nights we stayed up praying for her. I wonder how many hours the whole family lost to prayer. At my stepfather’s reading chair, there sits a small round end table with her photo. I couldn’t take my eyes off it the first time I saw it there, after she had left. Three years later, it’s still there. The same little smile and big round eyes. Sometimes, I fear the day when the photo will no longer be relevant and it will be replaced with something else. Such a thought may only prove how little I realize she is loved by my family. I forget, we’re all a part in this.

A few times each year, a new child enters into our home. Most of them take a piece of our hearts–all our hearts–when they leave. And somehow, our family is always the same. They never keep our kids at arm’s length for fear of being hurt once more. Instead, they greet our kids with a smile and say, “Call us grandma and grandpa!”