Integrating with External APIs
In February, we released our first set of API-driven feeds, including Flickr, YouTube, and Reddit. Every week, Pulse features a new set of feeds to help our users discover new content. RSS feeds served this purpose simply and effectively, since many news sites, blogs, and social sites offer high-quality RSS feeds. However, we realized that many feeds are mediocre in terms of content, image size and quality, and device compatibility (i.e., how articles are formatted across phones, tablets, and Pulse.me).
Fortunately, many of these services offer external APIs for developers, so we started experimenting with these APIs to create custom feeds for our users. This post discusses some of the design choices we made to bring API content to Pulse.
Integrating with the Backend
As outlined in a previous post, our custom feeds are served from our backend, which is built on Google App Engine (GAE). We were able to prototype our API feeds quickly using the Datastore and the deferred library. Because this component lives inside our existing GAE backend, we serve the API feeds the same way as our other RSS feeds, with no modifications to our client-side code.
We use the GAE Datastore to store individual articles built from the API data, and GAE Deferred/TaskQueue to poll the APIs for updated content. The article content is generated as HTML so it can be displayed easily in a WebView on the client (iOS and Android).
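As a rough illustration of the HTML step, here is a minimal Python 3 sketch that turns one API item into a standalone HTML page a WebView could load. The function name `render_article_html` and its fields are hypothetical, not our production code. Escaping every value guards against provider text injecting markup, and the viewport tag is one common way to keep the page readable on both phones and tablets.

```python
import html


def render_article_html(title, image_url, caption, source_url):
    """Build a small, self-contained HTML page for one API item.

    The fields are hypothetical; real API responses differ per provider.
    Every value is escaped (including quotes) so provider text cannot
    break out of attributes or inject markup.
    """
    return (
        "<!DOCTYPE html>\n"
        "<html><head>"
        # Scale to the device width so phones and tablets both render sensibly.
        '<meta name="viewport" content="width=device-width, initial-scale=1">'
        "<title>{title}</title></head>\n"
        "<body>\n"
        "<h1>{title}</h1>\n"
        # Never let an image overflow the WebView horizontally.
        '<img src="{image}" style="max-width:100%" alt="{title}">\n'
        "<p>{caption}</p>\n"
        '<p><a href="{link}">View original</a></p>\n'
        "</body></html>"
    ).format(
        title=html.escape(title),
        image=html.escape(image_url),
        caption=html.escape(caption),
        link=html.escape(source_url),
    )
```

The resulting string is what would be stored alongside the article and handed to the client as-is.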
Here are some issues that we ran into while integrating the APIs into our backend, and tips on how to deal with them:
1) No two APIs are created equal
Since there’s no standard for how APIs are served, you really need a separate script for each one, as they all use different endpoints for different functions. Some return JSON, and some return XML. It’s tedious, but learning each API is a one-time investment. Once your script calls the right functions and uses the data correctly, you don’t have to look at the API ever again…
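One way to keep the per-API scripts manageable is to give each provider its own small parser that converts its raw response (JSON or XML) into a common article shape, so everything downstream is shared. The sketch below assumes two made-up response formats; the provider names, fields, and helpers are hypothetical and do not match any real API.

```python
import json
import xml.etree.ElementTree as ET


def parse_json_photos(body):
    """Hypothetical JSON response: {"items": [{"title": ..., "url": ...}]}."""
    data = json.loads(body)
    return [{"title": item["title"], "url": item["url"]} for item in data["items"]]


def parse_xml_videos(body):
    """Hypothetical XML response: <feed><entry><name/><link href=""/></entry></feed>."""
    root = ET.fromstring(body)
    return [
        {"title": entry.findtext("name"), "url": entry.find("link").get("href")}
        for entry in root.findall("entry")
    ]


# One parser per provider; everything downstream only sees the common shape.
PARSERS = {
    "photos": parse_json_photos,
    "videos": parse_xml_videos,
}


def normalize(provider, body):
    """Dispatch a raw response body to the parser registered for its provider."""
    return PARSERS[provider](body)
```

Adding a provider then means writing one parser and registering it in `PARSERS`; the storage and rendering code stays the same.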
2) Ever again? APIs change!
The golden rule of using an API is that APIs change! You are always at the mercy of the API providers. Just because you have access to a piece of content now doesn’t mean you will have access to it in the future: its endpoint may change, or it may be removed entirely. As the developer, you have to write code that doesn’t grind to a halt when the data isn’t where it expects. Protip: wrap API calls in try/catch so your script has a chance to finish even when some calls fail.
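In Python, the equivalent of try/catch is try/except. This minimal sketch (the helpers `safe_get` and `poll_all` are hypothetical) shows two layers of defense: looking up nested fields without assuming they exist, and isolating each API call so one broken endpoint does not abort the whole polling run.

```python
import logging


def safe_get(data, *keys, default=None):
    """Walk nested dicts/lists; return `default` if any step is missing."""
    for key in keys:
        try:
            data = data[key]
        except (KeyError, IndexError, TypeError):
            return default
    return data


def poll_all(fetchers):
    """Call each fetcher; one failing API must not stop the others."""
    results = {}
    for name, fetch in fetchers.items():
        try:
            results[name] = fetch()
        except Exception:
            # Log and move on; the next polling run will try again.
            logging.exception("API call for %s failed", name)
    return results
```

Catching broad `Exception` is deliberate here: at the polling boundary, any failure from a third-party API should be logged rather than crash the task.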
3) Rate Limiting on GAE
From the start, we were careful to follow each API’s rate limit quota, specified as X requests per day or Y requests per second. Using Deferred/TaskQueue functions on GAE, these rate limits can be followed faithfully. However, within the first week, YouTube and Digg started blocking us for exceeding our limit. After digging deeper, we found the culprit: in the GAE implementation, Google sends out URL requests for multiple apps from the same physical server. Therefore, if an API provider enforces its rate limit by counting requests per IP address, that shared GAE server will quickly hit the limit when many of its apps make requests to the same service.
You can work around this problem in two ways. A short-term solution is to set up a server outside of GAE (for example, on Amazon EC2) to relay requests for your application. The long-term solution is to ask the API provider to enforce the rate limit per API key, not per IP address.
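A client-side limiter like the one below is roughly what following each quota looks like; it is a sketch under stated assumptions, not our implementation. The `RateLimiter` class is hypothetical, assumes the daily quota resets at midnight UTC, and takes the current time as a parameter so it is easy to test. Note that it only controls our own request rate: it cannot fix the shared-IP problem, because other apps on the same GAE server still count against the provider's per-IP limit.

```python
class RateLimiter:
    """Track a per-second and a per-day request budget for one API.

    Hypothetical helper. Times are seconds since the epoch, passed in
    explicitly; a real task handler would pass time.time().
    Assumes the provider's daily quota resets at midnight UTC.
    """

    def __init__(self, per_second, per_day):
        self.min_interval = 1.0 / per_second  # minimum spacing between requests
        self.per_day = per_day
        self.day = None
        self.count_today = 0
        self.next_allowed = 0.0

    def reserve(self, now):
        """Reserve a slot for one request.

        Returns the number of seconds to wait before sending it, or None
        if today's quota is already used up.
        """
        day = int(now // 86400)
        if day != self.day:
            # New UTC day: reset the daily counter.
            self.day = day
            self.count_today = 0
        if self.count_today >= self.per_day:
            return None
        start = max(now, self.next_allowed)
        self.next_allowed = start + self.min_interval
        self.count_today += 1
        return start - now
```

The returned delay could be used as a task's countdown when enqueuing the next poll (for example, through the deferred library's `_countdown` option). Since tasks may run on different instances, the limiter's state would have to live in shared storage such as the Datastore or memcache rather than in process memory.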
As we push out new feeds every week, we are always looking for better ways to get content to our users. Whether it be through improving on existing RSS feeds or creating new feeds through APIs, we want to provide our users with an experience that they can only get through Pulse.
Ever used another service’s external API? Liked it? Hated it? Comments and suggestions appreciated!