Google App Engine’s datastore meets most of our backend storage needs, but we sometimes find ourselves limited by the maximum entity size of one megabyte. One option for storing larger files is to build a separate system on top of Amazon S3. A downside of this approach, however, is that we cannot take advantage of Google’s edge cache, which acts as a free CDN.

A second option is the new Google Cloud Storage service. Google Cloud Storage is the unofficial successor to the Google App Engine Blobstore, and both services are built on the same underlying infrastructure. Yet unlike the Blobstore, which is bundled with App Engine, Google Cloud Storage is a standalone service for storing and managing data. As such, Cloud Storage is Google’s attempt to roll out an Infrastructure as a Service (IaaS) offering that can compete with Amazon S3.

Getting Started

In order to use Google Cloud Storage with App Engine, the first step is to grant your application access to your storage bucket. The documentation instructs you to add the application’s service account name (application-id@appspot.gserviceaccount.com) as a team member to your Google APIs console project.

However, since we created our project with a Google Apps account, this takes a bit more effort. Only users from our domain (xxx@yourdomain.com) can be added to the team via the console. The solution is to use the GSUtil command line tool to edit the storage bucket's Access Control List (ACL) directly.

Run gsutil getacl gs://bucketname > acl.txt to save your bucket's current ACL to a file. Then add an entry like the following inside the file's <Entries> element:

<Entry>
  <Scope type="UserByEmail">
    <EmailAddress>application-id@appspot.gserviceaccount.com</EmailAddress>
    <Name>Service Account</Name>
  </Scope>
  <Permission>FULL_CONTROL</Permission>
</Entry>

Finally, run gsutil setacl acl.txt gs://bucketname to apply the new ACL.
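If you would rather not hand-edit the XML, the middle step can be scripted. The following is a minimal Python 3 sketch using only the standard library. The helper name add_service_account is hypothetical, and it assumes the ACL document uses no XML namespace and keeps its grants under an <Entries> element.

```python
import xml.etree.ElementTree as ET


def add_service_account(acl_xml, email, permission="FULL_CONTROL",
                        name="Service Account"):
    """Return acl_xml (str or bytes) with a UserByEmail entry for `email`.

    Assumption: the ACL is un-namespaced and lists grants under <Entries>;
    an <Entries> element is created if it is missing.
    """
    root = ET.fromstring(acl_xml)
    entries = root.find("Entries")
    if entries is None:
        entries = ET.SubElement(root, "Entries")
    # Don't add a duplicate entry if this address is already listed,
    # so the script can be re-run safely.
    for address in entries.iter("EmailAddress"):
        if (address.text or "").strip() == email:
            return ET.tostring(root, encoding="unicode")
    # Build the same structure as the hand-written entry shown earlier.
    entry = ET.SubElement(entries, "Entry")
    scope = ET.SubElement(entry, "Scope", {"type": "UserByEmail"})
    ET.SubElement(scope, "EmailAddress").text = email
    ET.SubElement(scope, "Name").text = name
    ET.SubElement(entry, "Permission").text = permission
    return ET.tostring(root, encoding="unicode")
```

To use it, read acl.txt, pass its contents together with application-id@appspot.gserviceaccount.com, and write the result back to acl.txt before running setacl.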

Storing Data

Google provides an experimental API to integrate Cloud Storage with App Engine, which supports reading files from and writing files to a storage bucket. While testing, I had already preloaded some test files into our bucket using the barebones but functional Cloud Storage Manager web application; the GSUtil tool would have worked as well.

Moving forward, we wanted to load files programmatically from within App Engine. The API documentation clearly explains how to create, write to, finalize, and read from Cloud Storage objects. Note that files.gs.create(), the API function that creates a Google Cloud Storage object, takes a number of useful parameters; for instance, this is where you specify the object's ACL and Cache-Control header.
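For reference, the create, write, and finalize sequence looks roughly like the sketch below. The files module (google.appengine.api.files) is passed in as an argument purely so the function can be exercised outside App Engine; the helper name write_gcs_object and the default ACL and Cache-Control values are illustrative choices, not recommendations.

```python
def write_gcs_object(files_api, bucket, object_name, data,
                     mime_type="application/octet-stream",
                     acl="public-read",
                     cache_control="public, max-age=3600"):
    """Create, write and finalize a Cloud Storage object via the Files API.

    `files_api` is expected to be google.appengine.api.files; it is a
    parameter (not an import) only so this sketch can run without the SDK.
    """
    # The Files API addresses Cloud Storage objects as /gs/<bucket>/<object>.
    filename = "/gs/%s/%s" % (bucket, object_name)
    # create() returns a writable file name; the ACL and Cache-Control
    # header are attached to the object here.
    writable = files_api.gs.create(filename, mime_type=mime_type, acl=acl,
                                   cache_control=cache_control)
    # Writable files are opened in append mode.
    with files_api.open(writable, "a") as f:
        f.write(data)
    # The object is not readable until the file is finalized.
    files_api.finalize(writable)
    return filename
```

On App Engine this would be called as write_gcs_object(files, 'bucketname', 'reports/summary.csv', data, mime_type='text/csv') after from google.appengine.api import files.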

The documentation does not address the case in which the object you wish to save is a user upload. Storing uploaded files in a bucket can be accomplished using the Blobstore, as suggested by this StackOverflow answer. The blobstore_helper module is useful for adapting this code to Django: simply replace self.get_uploads('file') with blobstore_helper.get_uploads(request, 'file') to retrieve the uploaded files.
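One way this might look, assuming the upload lands in the Blobstore first and is then copied into the bucket, is a chunked copy loop. On App Engine, src would presumably be a blobstore.BlobReader opened on the uploaded blob's key and dst a file opened with files.open(writable_name, 'a'); the helper name and the 512 KB chunk size are assumptions, the latter meant to keep each individual write call small.

```python
def copy_in_chunks(src, dst, chunk_size=512 * 1024):
    """Copy everything readable from file-like `src` into file-like `dst`.

    Reads at most `chunk_size` bytes at a time so the whole upload is never
    held in memory or sent in a single write. Returns the bytes copied.
    """
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:  # empty read means end of input
            break
        dst.write(chunk)
        total += len(chunk)
    return total
```

The standard library's shutil.copyfileobj(src, dst, chunk_size) does the same job; the explicit loop simply also reports how much was copied.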

Serving Content

The Cloud Storage API does not offer a way to serve files directly from a storage bucket. Instead, you can use the Blobstore API to serve the file from a URL handled by your application.

First, generate a blob key for the Cloud Storage object by passing its /gs/bucketname/objectname path to the Blobstore API's create_gs_key() function. Then serve the object as you would a traditional Blobstore object. The example given for the Blobstore Python API assumes Google's webapp framework, whose helper functions (such as self.send_blob()) hide the underlying implementation. This makes it a little tricky to port the code to a different framework, but once again the blobstore_helper module offers some insight.

The module defines its own send_blob function, whose key line is response[blobstore.BLOB_KEY_HEADER] = str(blob_key). In other words, if the response carries a special header containing the blob key, App Engine automatically fills the response body with the blob's content.
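Stripped of any framework, the trick can be sketched as a plain WSGI application. This is an illustration under assumptions: create_gs_key is injected (on App Engine it would be blobstore.create_gs_key) so the code runs without the SDK, the header constant mirrors the value of blobstore.BLOB_KEY_HEADER, and the factory name, bucket, and URL layout (the request path is the object name) are hypothetical.

```python
# Mirrors blobstore.BLOB_KEY_HEADER; on App Engine, import that constant instead.
BLOB_KEY_HEADER = "X-AppEngine-BlobKey"


def make_gcs_serving_app(create_gs_key, bucket):
    """Return a WSGI app that serves /<object> from the given bucket."""
    def app(environ, start_response):
        object_name = environ.get("PATH_INFO", "").lstrip("/")
        if not object_name:
            start_response("404 Not Found", [("Content-Type", "text/plain")])
            return [b"Not found"]
        blob_key = create_gs_key("/gs/%s/%s" % (bucket, object_name))
        # App Engine sees this header and replaces the empty body with the
        # blob's content. Content-Type is not set here (see the next paragraph).
        start_response("200 OK", [(BLOB_KEY_HEADER, str(blob_key))])
        return [b""]
    return app
```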

To properly serve the blob, it is also necessary to set a correct Content-Type header for the response. Although the Cloud Storage REST API does support retrieving an object’s metadata, it seems that the API for App Engine does not. Currently, we rely on Python’s mimetypes module, which can guess content type from a filename: response['Content-Type'] = mimetypes.guess_type(filename)[0].
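One caveat: mimetypes.guess_type() returns a (type, encoding) pair, and the type is None for extensions it does not recognize, which would leave the header without a usable value. A small wrapper with a fallback avoids that; the helper name and the application/octet-stream default are illustrative choices.

```python
import mimetypes


def guess_content_type(filename, default="application/octet-stream"):
    """Guess a Content-Type from a filename, falling back to `default`."""
    # guess_type() returns (type, encoding); only the type matters here.
    content_type, _encoding = mimetypes.guess_type(filename)
    return content_type or default
```

With this, the header becomes response['Content-Type'] = guess_content_type(filename).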

An alternative approach to serving files from Cloud Storage, which applies to images only, is App Engine's Images API. As of App Engine version 1.7.0, the get_serving_url() function works with Cloud Storage objects: generate the blob key as before and pass it to this function to get a URL for the image. One benefit of this approach is that the serving URL supports on-the-fly cropping and resizing through optional parameters.
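These options are appended to the serving URL itself: =sNN scales the image so its longest side is NN pixels, and adding -c crops it to that size instead of only scaling it. (get_serving_url() also accepts size and crop arguments directly.) The helper below is a hypothetical convenience for deriving several sizes from one stored URL; it assumes the URL it receives has no options appended yet, and the example URL in the usage note is made up.

```python
def resized_serving_url(serving_url, size, crop=False):
    """Append Images API size/crop options to a bare get_serving_url() URL."""
    if not isinstance(size, int) or size < 0:
        raise ValueError("size must be a non-negative integer")
    # =s<size> resizes; a trailing -c requests a crop instead of a plain resize.
    return "%s=s%d%s" % (serving_url, size, "-c" if crop else "")
```

For example, resized_serving_url('https://lh3.ggpht.com/EXAMPLE', 64, crop=True) yields 'https://lh3.ggpht.com/EXAMPLE=s64-c', which is handy for thumbnails.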

We will continue to investigate best practices for using Google Cloud Storage with App Engine to store and serve large files. For anyone interested, a helpful Google I/O session, Storing Your Application's Data in the Google Cloud, covers the basics of this new service. Of course, there are other options to consider as well, such as the Blobstore or Amazon S3. It remains to be seen which service will best meet our needs, but we're glad that there is now a strong option on the Google side.