Line-by-line Speed Analysis for iOS Apps

[This is a guest post written by Tyler Neylon. Tyler is an iOS programmer and founder of Bynomial. He regularly blogs about iOS coding tips at the Bynomial blog.]

The iOS SDK comes bundled with tools in the Instruments app for finding speed bottlenecks in your code. These tools work by regularly inspecting the call stack of your app, and then providing an aggregate view of these call stacks. This is a good method for doing top-down speed analysis, where you want to find the methods eating up the most CPU time on a very large scale.

This post introduces some easy-to-use tools for speed analysis from a bottom-up perspective. Let’s say you’re writing or improving a particular time-critical function. For example, you may be writing a function that reacts to a user’s tap and needs to quickly compute and display results from that tap. In this case, the time profiler in Instruments is not ideal, since it’s hard to isolate a useful amount of data around the exact call stacks and time interval you care about. Instead, you can use CodeTimestamps, the open source tool we’re introducing here.

How to use CodeTimestamps

CodeTimestamps contains two macro sets. There is the simple LogTimestamp macro, which is useful for quick speed checks, and then the LogTimestamp{Start,Mid,End}Chunk macros, which are better for in-depth analysis, such as comparing many runs of multiple functions or loop bodies.

Simple case

Let’s say you want a detailed picture of how much time certain lines in your code take. You can add calls to LogTimestamp like this:

- (void)myMethod {
  LogTimestamp;
  // do some CPU-heavy stuff
  LogTimestamp;  // Line 44.
}

When this code executes, every line with LogTimestamp after the first one will produce a debug log line like this:

* -[MyClass myMethod]:  44 -    864719 nsec since last timestamp

This tells you the class and method name, the line number, and the number of nanoseconds that passed since the previous timestamp. In other words, you learn how long the code between the two timestamps took, reported in nanoseconds. Sprinkling in a bunch of LogTimestamp lines gives you a line-by-line breakdown of which parts of your method are slowest, so you know where to focus your speed-up efforts.
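To make the idea concrete, here is a minimal sketch of how a LogTimestamp-style macro can work. It is not the CodeTimestamps source (which is Objective-C and iOS-specific); it is portable C that assumes a POSIX system with clock_gettime and CLOCK_MONOTONIC, and the names now_ns, log_timestamp_at and LOG_TIMESTAMP are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L  /* expose clock_gettime/nanosleep under strict C */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

/* Reads a monotonic clock in nanoseconds. */
static uint64_t now_ns(void) {
    struct timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);
    assert(rc == 0);
    (void)rc;
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Hypothetical stand-in for LogTimestamp: prints the nanoseconds since the
   previous call and returns them. The first call only records a starting
   point and returns 0. */
static uint64_t log_timestamp_at(const char *func, int line) {
    static bool has_last = false;
    static uint64_t last_ns;
    uint64_t now = now_ns();
    uint64_t elapsed = 0;
    if (has_last) {
        elapsed = now - last_ns;
        printf("* %s: %3d - %9llu nsec since last timestamp\n",
               func, line, (unsigned long long)elapsed);
    }
    /* Restart the clock after printing so the cost of printf is not
       charged to the next interval. */
    last_ns = now_ns();
    has_last = true;
    return elapsed;
}

#define LOG_TIMESTAMP log_timestamp_at(__func__, __LINE__)
```

Placing LOG_TIMESTAMP; at the same points as LogTimestamp in myMethod gives the same kind of per-line report. Printing in the middle of timed code still adds overhead of its own; this sketch restarts the clock after printing to keep that cost out of the next interval, while the chunk macros avoid the problem by deferring output entirely.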

Advanced case

The above technique is great for looking at a small number of specific code segments, and when no data aggregation is needed. But let’s say you want to find out which iteration of a function out of 10,000 calls was the slowest, or perhaps you will be using many timestamps within a function, and would like to quickly locate the slowest piece. Time to bust out the big guns.

You use the LogTimestamp{Start,Mid,End}Chunk macros much like LogTimestamp, except that, within a function, they are expected to execute in order: Start, then any number of Mids (including zero), then End. The data you get from these macros is best explained by a simple example. Suppose we want to know which line of the function MyFunction is slowest. Here’s our code:

#include <unistd.h>  // sleep(), usleep()
#import "CodeTimestamps.h"

void SlowCall() {
  sleep(1);  // pause for 1 sec
}

void FastCall() {
  usleep(100000);  // pause for 0.1 sec
}

void MyFunction() {
  LogTimestampStartChunk;
  FastCall();
  LogTimestampMidChunk;
  SlowCall();
  LogTimestampMidChunk;
  FastCall();
  LogTimestampEndChunk;
}

And here are the resulting debug log lines when we run the code with one call to MyFunction:

Temp20[36546:207] ==== Start chunk timestamp data (from "LogTimestamp{Start,Mid,End}Chunk") ====
Temp20[36546:207] --- Data for thread 0x4b0dbb0 ---
Temp20[36546:207] + Chunk = MyFunction:28 - MyFunction:34, time = 1.2003s
Temp20[36546:207]     83% in MyFunction:30 - MyFunction:32
Temp20[36546:207]      8% in MyFunction:32 - MyFunction:34
Temp20[36546:207]      8% in MyFunction:28 - MyFunction:30
Temp20[36546:207] ++ Chunk = MyFunction:28 - MyFunction:34, avg time = 1200328449 nsec
Temp20[36546:207] ==== Slowest chunks so far ====
Temp20[36546:207] # Chunk = MyFunction:28 - MyFunction:34, time = 1.2003s
Temp20[36546:207] ==== End timestamp data ====

If we add these chunk macros to another function called MyFunction2, and execute both MyFunction and MyFunction2 about 1,000 times each, then we can quickly find out which one is slower by looking at the “Slowest chunks so far” aggregated data, which is a summary based on all previous calls to all chunks.

The debug output is not supplied immediately; it is deferred by up to 10 seconds. This is because NSLog calls are notoriously slow. If data were logged immediately, all the timing information would be hijacked by NSLog’s inexplicable processor waylaying. Why, NSLog? But alas, the best we can do is work around NSLog’s lackadaisical lollygagging ways by gathering the timing information in memory and reporting it periodically, outside of the timestamp chunks.
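The deferred-reporting idea can be sketched in a few lines of C. This is a hypothetical model, not how CodeTimestamps is implemented: each chunk records (line, timestamp) pairs in a fixed in-memory buffer during the timed run, keeps a running "slowest so far" total, and only formats output later, in chunk_report, outside the timed region. Timestamps are passed in explicitly (in practice they would come from a monotonic clock) so the bookkeeping can be checked with fixed numbers.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define MAX_MARKS 16

/* One run of a chunk: the line and time of each Start/Mid/End mark. */
typedef struct {
    int line[MAX_MARKS];
    uint64_t t_ns[MAX_MARKS];
    int n;
} chunk_t;

/* Longest chunk total seen so far, across all runs. */
static uint64_t slowest_chunk_ns = 0;

/* Mid mark: just record; no logging inside the timed region. */
static void chunk_mark(chunk_t *c, int line, uint64_t t_ns) {
    assert(c->n < MAX_MARKS);
    c->line[c->n] = line;
    c->t_ns[c->n] = t_ns;
    c->n++;
}

static void chunk_start(chunk_t *c, int line, uint64_t t_ns) {
    c->n = 0;
    chunk_mark(c, line, t_ns);
}

/* End mark: records the last point, updates the running maximum,
   and returns this run's total time. */
static uint64_t chunk_end(chunk_t *c, int line, uint64_t t_ns) {
    chunk_mark(c, line, t_ns);
    uint64_t total = c->t_ns[c->n - 1] - c->t_ns[0];
    if (total > slowest_chunk_ns)
        slowest_chunk_ns = total;
    return total;
}

/* Share (in percent) of the total spent between mark i-1 and mark i. */
static double chunk_segment_percent(const chunk_t *c, int i) {
    assert(i >= 1 && i < c->n);
    uint64_t total = c->t_ns[c->n - 1] - c->t_ns[0];
    if (total == 0)
        return 0.0;
    return 100.0 * (double)(c->t_ns[i] - c->t_ns[i - 1]) / (double)total;
}

/* Called later, outside any timed chunk, so the cost of printing
   does not show up in the measurements. */
static void chunk_report(const chunk_t *c, const char *func) {
    uint64_t total = c->t_ns[c->n - 1] - c->t_ns[0];
    printf("+ Chunk = %s:%d - %s:%d, time = %.4fs\n", func, c->line[0],
           func, c->line[c->n - 1], (double)total / 1e9);
    for (int i = 1; i < c->n; i++)
        printf("    %3.0f%% in %s:%d - %s:%d\n", chunk_segment_percent(c, i),
               func, c->line[i - 1], func, c->line[i]);
}
```

Unlike the real log above, this sketch prints segments in execution order rather than sorted by cost, and it keeps a single global maximum instead of a list of the slowest chunks.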

This output is not as beautiful as the graphs in Instruments, but I personally find this approach much easier to use for the purpose of speeding up specific sections of my code.

How to get the code

The files CodeTimestamps.{h,m} are freely available as open source code under the Apache 2 license. They can be downloaded as part of the moriarty library. Click on “Downloads” (near the upper-right corner of the GitHub page), unzip the downloaded file, and copy the two files (CodeTimestamps.{h,m}) into your Xcode project, making sure Xcode knows about them (right-click on Classes, then choose Add > Existing Files… to add them within Xcode). From there, just #import "CodeTimestamps.h" in whichever file you want to add timestamps to, and use the macros as described above.

I wrote this code while working with Ankit to speed up Pulse on the iPad. I’ve always been impressed by the Pulse team’s steadfast dedication to setting new standards in mobile app user experience, and building custom performance tools like this is just one of many techniques used. I also think it’s awesome that Pulse chose to offer this code as open source. Keep up the great work!

5 Tips for Honeycomb Design

The explosion of Android-powered devices has been both a boon and a burden to Android app makers. An application’s potential reach has never been so vast, yet the diversity of hardware can throw designers and developers for a loop. Today, we’ll talk about how to design an application that takes advantage of all the slick new Honeycomb tablets (present and future) so your app can shine at every size!

1. Modular components make full use of screen real estate

Compared to phones, the amount of space available on tablets can be confounding – how on earth can you fill up all those pixels?! One thing you absolutely should not do is blindly scale up a design meant for a phone. It’s hard to envision something looking sillier than a 10” list view.

Please don't do this

Luckily, there are several ways to tackle this issue. One of them is Fragments, a concept introduced in Honeycomb. Fragments are UI components that are meant to be modular and dynamic. The clearest example of this is the official Gmail app. On a phone, the user can view a list of all emails, which takes up the entire screen. Selecting an email opens another screen showing the email conversation itself.

With fragments, the list and the email can appear side by side, so a user can read an email while keeping the additional context of their entire inbox.

2. Use dialogs

Sometimes it simply doesn’t make sense for a view to take up an entire screen. For situations like these, dialogs are quite handy. In Pulse, we let users connect to social networks when they want to share stories, and each network requires a login screen. On phones, the login screen uses the full window, but on a tablet such a view would be visually offensive. Dialogs provide an easy way to show these smaller components in a more natural manner.

3. Utilize 9-patches creatively

Unpredictable screen dimensions create some interesting challenges for the visual designer. Being pixel-perfect is possible, but not without some ingenuity. 9-patch image files allow you to determine how images get stretched when resized, which happens quite often when using relative layouts.
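To see why only the marked region stretches, here is a simplified one-axis model in C. It assumes a single stretchable region between two fixed caps (real 9-patch files mark stretchable areas with a one-pixel border and may define more than one), and the function name ninepatch_src_x is hypothetical: for each destination pixel it returns the source pixel that gets drawn there.

```c
#include <assert.h>

/* Simplified one-axis 9-patch model: 'left' and 'right' pixels at the
   edges are fixed caps; the pixels between them form one stretchable
   region. Maps a destination pixel to the source pixel drawn there. */
static int ninepatch_src_x(int dst_x, int dst_w, int src_w, int left, int right) {
    int src_mid = src_w - left - right;
    int dst_mid = dst_w - left - right;
    assert(src_mid > 0 && dst_mid > 0);
    assert(dst_x >= 0 && dst_x < dst_w);
    if (dst_x < left)
        return dst_x;                    /* left cap: copied 1:1 */
    if (dst_x >= dst_w - right)
        return dst_x - (dst_w - src_w);  /* right cap: copied 1:1 */
    /* Middle: scaled proportionally to fill the extra space. */
    return left + (int)((long long)(dst_x - left) * src_mid / dst_mid);
}
```

With a 10-pixel-wide source whose 3-pixel caps stay fixed, drawing it 20 pixels wide copies the caps 1:1 and spreads the 4 middle pixels across the remaining 14, which is why an arrowhead placed in a cap keeps its shape at any size.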

In the tutorial for saving stories with Pulse.Me, we use an arrow that points from the star button to the “.me” button. With the magic of 9-patches this arrow behaves correctly on all screen sizes and orientations.

4. Design for both landscape and portrait

Never assume a user will only use the app in one orientation. People have wildly different (and very strong) opinions on proper tablet usage, and your app must accommodate both orientations. In landscape mode, the wide screen makes horizontal layouts more natural. Avoid stretching things out to span the entire width unless absolutely necessary; otherwise you’ll most likely end up with a lot of blank space.

5. Remember the Honeycomb UI conventions

There are several UI patterns for Android tablet apps that users have come to expect from all applications. The first is the lack of a menu button. While users may have overlooked menu buttons on phones, they’re completely MIA in Honeycomb. In their place, the concept of an Action Bar governs what the user expects to be possible on a particular screen.

Even if you don’t explicitly use the Action Bar classes in the Android SDK, rolling your own is a good idea to fit the expectations of Honeycomb users. In general, the top left is reserved for going back to a previous screen, with the rest of the bar containing other actions. By following this pattern, a user will never feel lost or confused when using your application.

Now that you’re newly equipped with these tips, start creating compelling, beautiful apps!

Integrating with External APIs

In February, we released our first set of API-driven feeds, including Flickr, YouTube, and Reddit. Every week, Pulse features a new set of feeds to help our users discover new content. RSS feeds have served this purpose simply and effectively, since many news sites, blogs, and social sites offer high-quality RSS feeds. However, we realized that many feeds are mediocre in terms of content served, image size and quality, and device compatibility (i.e. article formatting across phones, tablets and Pulse.me).

Fortunately, many of these services offer external APIs for developers, so we started playing around with these APIs to create custom feeds for our users. This post discusses some of the design choices we made in order to bring API content to Pulse.

Integrating with the Backend

As outlined in a previous post, our custom feeds are served from our backend, which is built on Google App Engine (GAE). We were able to quickly prototype our API feeds using the datastore and deferred features. Because this component is integrated into GAE, we can serve the API feeds in the same manner as other RSS feeds, with no modifications to our client-side code.

We use GAE Datastore to store individual articles crafted from the APIs, and GAE Deferred/TaskQueue to poll the APIs for updated content. The article content is created in HTML so it can be easily displayed in a WebView on the client (iOS and Android).

API Issues

Here are some issues that we ran into while integrating the APIs into our backend, and tips on how to deal with them:

1) No two APIs are created equal
Since there’s no standard for how APIs are served, you really need a separate script for each of them, as they all have different endpoints for different functions. Some return JSON, and some return XML. It’s tedious, but learning each API is only a one-time investment. Once your script calls the right functions and uses the data in the correct way, you don’t have to look at the API ever again…

2) Ever again? APIs change!
The golden rule of using an API is that APIs change! You are always at the mercy of the API providers. Just because you have access to a piece of content now doesn’t mean you will have access to it in the future. Its endpoint may change, or it may be removed entirely. As the developer, you have to write code that doesn’t grind to a halt when it doesn’t find the data where it expects it. Protip: use try/catch when making API calls so your script has a chance to finish even when some calls fail.
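In a language without exceptions, the same protip takes the form of status codes. The hypothetical C sketch below shows one way to structure it: one adapter per API (each hiding that API's endpoints and JSON or XML format) fills in a common article struct, and the polling loop logs and skips any adapter that fails, so one broken API cannot stop the whole run. The types, adapter names and stub functions are illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Common format every adapter produces (hypothetical). */
typedef struct {
    char title[128];
} article_t;

/* One adapter per API: knows that API's endpoints and JSON/XML format.
   Returns 0 on success, nonzero if the data is missing or the call failed. */
typedef int (*adapter_fn)(article_t *out);

typedef struct {
    const char *name;
    adapter_fn fetch;
} api_source_t;

/* Stub adapters for illustration only. */
static int example_ok(article_t *out) {
    snprintf(out->title, sizeof out->title, "%s", "Photo of the day");
    return 0;
}

static int example_missing_field(article_t *out) {
    (void)out;
    return -1;  /* e.g. the provider moved or removed a field we relied on */
}

/* Polls every source. A failing source is logged and skipped, so the rest
   of the run still finishes. Returns the number of articles written. */
static int poll_all(const api_source_t *sources, size_t n,
                    article_t *out, size_t *n_failed) {
    assert(n_failed != NULL && (n == 0 || (sources != NULL && out != NULL)));
    int ok = 0;
    *n_failed = 0;
    for (size_t i = 0; i < n; i++) {
        if (sources[i].fetch(&out[ok]) == 0) {
            ok++;
        } else {
            fprintf(stderr, "skipping %s: fetch failed\n", sources[i].name);
            (*n_failed)++;
        }
    }
    return ok;
}
```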

3) Rate Limiting on GAE
From the start, we were very careful about following each API’s rate limit quota, specified as X requests a day or Y requests per second. Using Deferred/TaskQueue functions on GAE, we could faithfully follow these rate limits. However, within the first week, we started getting blocked by YouTube and Digg for exceeding our limit. After digging deeper, we found the culprit: in the GAE implementation, Google sends out URL requests for multiple apps from the same physical server. Therefore, if an API provider enforces its rate limit by counting requests per IP address, that shared server will quickly hit the limit when many of its apps make requests to the same service. You can circumvent this problem in two ways. A short-term solution is to set up a server outside of GAE (for example, on Amazon EC2) to relay requests for your application. The long-term solution is to ask the API provider to check the rate limit based on API key, not IP address.
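To see why shared IP addresses cause trouble, consider how a provider might enforce a quota. The sketch below is a hypothetical fixed-window limiter in C (providers may use other schemes, such as sliding windows or token buckets): it allows at most limit requests per key per window, and the same code can be keyed either by IP address or by API key.

```c
#include <assert.h>
#include <string.h>

#define MAX_KEYS 32

/* Per-key counter for the current window. */
typedef struct {
    char key[64];
    long window_start;  /* seconds */
    int count;
} bucket_t;

/* Hypothetical provider-side fixed-window limiter. */
typedef struct {
    bucket_t buckets[MAX_KEYS];
    int n;
    int limit;      /* max requests per key per window */
    long window_s;  /* window length in seconds */
} limiter_t;

/* Returns 1 if a request for 'key' at time now_s is allowed, 0 if over quota. */
static int limiter_allow(limiter_t *l, const char *key, long now_s) {
    bucket_t *b = NULL;
    for (int i = 0; i < l->n; i++) {
        if (strcmp(l->buckets[i].key, key) == 0) {
            b = &l->buckets[i];
            break;
        }
    }
    if (b == NULL) {  /* first request for this key */
        assert(l->n < MAX_KEYS);
        b = &l->buckets[l->n++];
        strncpy(b->key, key, sizeof b->key - 1);
        b->key[sizeof b->key - 1] = '\0';
        b->window_start = now_s;
        b->count = 0;
    }
    if (now_s - b->window_start >= l->window_s) {  /* window expired: reset */
        b->window_start = now_s;
        b->count = 0;
    }
    if (b->count >= l->limit)
        return 0;
    b->count++;
    return 1;
}
```

With a limit of 2 requests per window, a limiter keyed by IP blocks our app's second request because another app on the same server already used one slot, while a limiter keyed by API key still allows it.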

Recap

As we push out new feeds every week, we are always looking for better ways to get content to our users. Whether it be through improving on existing RSS feeds or creating new feeds through APIs, we want to provide our users with an experience that they can only get through Pulse.

Ever used another service’s external API? Liked it? Hated it? Comments and suggestions appreciated!

Concurrent Downloads using NSOperationQueues

Most iOS applications have to download data from the internet. For a mobile developer, scarce resources, limited bandwidth, and the need to keep the app responsive make this an interesting problem to tackle. If you are lucky, you might have just one URL that you ping to download all the data for your application, but usually you need to download from multiple URLs simultaneously. Since Pulse aggregates your news from multiple sources, this is a particularly important part of the application. After a long search for an elegant and efficient solution, we discovered that NSOperations and NSOperationQueues make managing simultaneous downloads very easy. This post lists some of our learnings and gives code samples showing how to implement this in your own app.

Why NSOperationQueues?

Ideally, you never want to block the main thread, which handles user input, so downloading data has to happen in the background. We have found NSURLConnection to be an excellent class for downloading data asynchronously. But to maintain multiple connections simultaneously, you usually have to handle three concerns: (1) throttling the number of simultaneous downloads, (2) prioritizing connections over one another, and (3) cancelling and cleaning up connections easily. These require a lot of tedious bookkeeping, and NSOperationQueues are extremely helpful here.

An NSOperationQueue is essentially a pool of threads, each of which runs tasks described by NSOperation objects. It is extremely easy to wrap an asynchronous NSURLConnection in an NSOperation, as we shall see in the next section. Each NSOperation object can be given a priority and added to the queue.

[myOperation setQueuePriority:NSOperationQueuePriorityVeryHigh];
[operationQueue addOperation:myOperation];

An NSOperationQueue lets you cap the number of operations that run concurrently, and you can cancel all of its operations with a single call.

[operationQueue setMaxConcurrentOperationCount:3];
[operationQueue cancelAllOperations];

Based on system resources and operation priorities, an NSOperationQueue runs its operations until they finish. You can suspend and resume the queue at any time; while it is suspended, it does not start any new operations (operations that are already running continue). This gives you a lot of control over background tasks with very few lines of code.

[operationQueue setSuspended:YES];
for (NSOperation *operation in newOperations) {
  [operationQueue addOperation:operation];
}
[operationQueue setSuspended:NO];
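NSOperationQueue's internals are Apple's, but the three concerns it handles for you can be modeled in plain C. The following is a hypothetical sketch using POSIX threads (pool_t, pool_add, pool_run and the other names are made up): a fixed number of workers provides the throttle, each worker takes the highest-priority job that has not started yet, and a cancelled flag stops new jobs from starting.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_JOBS 64
#define MAX_WORKERS 8

/* Hypothetical model of a throttled, prioritized download queue. */
typedef struct {
    int priority[MAX_JOBS];   /* higher runs first */
    bool taken[MAX_JOBS];     /* job has been started */
    int n_jobs;
    bool cancelled;           /* stop starting new jobs */
    int running, max_running; /* concurrency actually observed */
    int start_order[MAX_JOBS];
    int n_started;
    pthread_mutex_t mu;
} pool_t;

static void pool_init(pool_t *p) {
    *p = (pool_t){0};
    pthread_mutex_init(&p->mu, NULL);
}

static void pool_add(pool_t *p, int priority) {
    assert(p->n_jobs < MAX_JOBS);
    p->priority[p->n_jobs++] = priority;
}

static void pool_cancel_all(pool_t *p) {
    pthread_mutex_lock(&p->mu);
    p->cancelled = true;
    pthread_mutex_unlock(&p->mu);
}

/* Placeholder for the real work, e.g. one download. */
static void do_job(int job) { (void)job; }

/* Each worker repeatedly takes the highest-priority job not yet started. */
static void *worker(void *arg) {
    pool_t *p = arg;
    for (;;) {
        pthread_mutex_lock(&p->mu);
        int best = -1;
        for (int i = 0; i < p->n_jobs && !p->cancelled; i++)
            if (!p->taken[i] && (best < 0 || p->priority[i] > p->priority[best]))
                best = i;
        if (best < 0) {  /* nothing left to start, or cancelled */
            pthread_mutex_unlock(&p->mu);
            return NULL;
        }
        p->taken[best] = true;
        p->start_order[p->n_started++] = best;
        if (++p->running > p->max_running)
            p->max_running = p->running;
        pthread_mutex_unlock(&p->mu);

        do_job(best);  /* runs outside the lock */

        pthread_mutex_lock(&p->mu);
        p->running--;
        pthread_mutex_unlock(&p->mu);
    }
}

/* Runs all jobs with at most max_concurrent in flight, by starting exactly
   that many workers, then waits for them to finish. */
static void pool_run(pool_t *p, int max_concurrent) {
    pthread_t threads[MAX_WORKERS];
    assert(max_concurrent >= 1 && max_concurrent <= MAX_WORKERS);
    for (int i = 0; i < max_concurrent; i++) {
        int rc = pthread_create(&threads[i], NULL, worker, p);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < max_concurrent; i++)
        pthread_join(threads[i], NULL);
}
```

One difference worth noting: this model's pool_cancel_all only prevents jobs that have not started, whereas cancelAllOperations also sends cancel to operations that are already running, which are expected to check isCancelled and stop themselves.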

Asynchronous downloads using NSOperations

To wrap an NSURLConnection object in an NSOperation, we create an NSOperation subclass. When the operation starts, it initiates an NSURLConnection and implements the connection's delegate methods to collect the downloaded data. An NSOperation exposes three key properties that define its state: isExecuting, isFinished, and isConcurrent. Because our operation manages its own asynchronous connection instead of doing its work synchronously, the subclass returns YES from isConcurrent. Changes to isExecuting and isFinished must be key-value observing (KVO) compliant: the subclass brackets each change with willChangeValueForKey: and didChangeValueForKey:. The queue (like any other observer) considers the operation finished only when isFinished changes to YES. Check out DownloadURLOperation.h and DownloadURLOperation.m to learn more about how this is done.
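The KVO requirement is easier to see with the state machine spelled out. The hypothetical C sketch below models only the bookkeeping (op_state_t, op_start, op_finish and the observer callback are illustrative names): every real change to isExecuting or isFinished is reported to an observer, which is the role willChangeValueForKey:/didChangeValueForKey: play in an Objective-C subclass, and nothing counts as finished until isFinished flips to true.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Called after a state property changes, like a KVO observer. */
typedef void (*observer_fn)(const char *key, bool value, void *ctx);

/* Hypothetical model of a concurrent operation's observable state. */
typedef struct {
    bool is_executing;
    bool is_finished;
    observer_fn observer;
    void *observer_ctx;
} op_state_t;

/* Changes one flag and notifies the observer, but only on a real change.
   In Objective-C, the assignment would be wrapped in
   willChangeValueForKey: / didChangeValueForKey:. */
static void set_flag(op_state_t *op, bool *flag, const char *key, bool value) {
    if (*flag == value)
        return;
    *flag = value;
    if (op->observer != NULL)
        op->observer(key, value, op->observer_ctx);
}

/* Mirrors -start: the operation begins executing (e.g. opens its connection). */
static void op_start(op_state_t *op) {
    assert(!op->is_finished);  /* a finished operation cannot be restarted */
    set_flag(op, &op->is_executing, "isExecuting", true);
}

/* Called when the connection finishes or fails. Until isFinished flips to
   true, observers treat the operation as still in progress. */
static void op_finish(op_state_t *op) {
    set_flag(op, &op->is_executing, "isExecuting", false);
    set_flag(op, &op->is_finished, "isFinished", true);
}

/* Example observer: counts isFinished == true notifications. */
static void count_finished(const char *key, bool value, void *ctx) {
    if (value && strcmp(key, "isFinished") == 0)
        (*(int *)ctx)++;
}
```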

New Subclass in Action

Now that we have our DownloadURLOperation subclass in place, we can add instances of it to an operation queue to start downloading data. If the queue is not suspended and has spare capacity, a newly added operation starts executing right away.

downloadOperation_ = [[DownloadURLOperation alloc] initWithURL:url];
// Add an observer to get notified when the download finishes
[downloadOperation_ addObserver:self forKeyPath:@"isFinished" options:NSKeyValueObservingOptionNew context:NULL];
[queue addOperation:downloadOperation_];

To process the downloaded data after an operation is finished, we need to observe KVO notifications.

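- (void)observeValueForKeyPath:(NSString *)keyPath ofObject:(id)object change:(NSDictionary *)change context:(void *)context {
  if (object == downloadOperation_ && [keyPath isEqualToString:@"isFinished"]) {
    // Read the results *before* releasing the operation; they may be freed along with it.
    NSData *data = [[[downloadOperation_ data] retain] autorelease];
    NSError *error = [[[downloadOperation_ error] retain] autorelease];
    [downloadOperation_ removeObserver:self forKeyPath:@"isFinished"];
    [downloadOperation_ release];
    downloadOperation_ = nil;
    // Note: KVO notifications arrive on the thread that changed isFinished.
    if (error != nil) {
      // handle error
    } else {
      // process data
    }
  } else {
    [super observeValueForKeyPath:keyPath ofObject:object change:change context:context];
  }
}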

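If the observing object can be deallocated before the download finishes, remember to remove the observer and cancel the operation, for example in dealloc: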

- (void)dealloc {
  if (downloadOperation_ != nil) {
    [downloadOperation_ removeObserver:self forKeyPath:@"isFinished"];
    [downloadOperation_ cancel];
    [downloadOperation_ release];
    downloadOperation_ = nil;
  }
  [super dealloc];
}

Benefits

Here is a quick summary of some important benefits of wrapping NSURLConnections in NSOperations and using NSOperationQueues to manage simultaneous downloads.

  1. Throttling: NSOperationQueue allows you to set the maximum concurrent operations allowed to run. Thus, we can fine tune this number based on device performance and network constraints.
  2. Chained Downloads: If you need multiple downloads to happen one after the other in a sequence, using NSOperations allows you to execute chained downloads easily in a single class.
  3. Code Cleanliness: This pattern allows you to encapsulate downloading and processing data in the same class. For example, YAJLParserOperation.h and YAJLParserOperation.m parse JSON data on the fly as it is downloaded.
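One way NSOperation supports chaining is through dependencies (addDependency:), which keep an operation from starting until the operations it depends on have finished. The hypothetical C sketch below models only that ordering rule: each op names at most one op it waits for, and ops run one at a time as soon as their dependency is done. The chain_t type and run_chain function are illustrative names, not part of any library.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_OPS 16

/* dep[i] is the op that op i waits for (an index below n), or -1 for none. */
typedef struct {
    int dep[MAX_OPS];
    int n;
} chain_t;

/* Runs ops one at a time, each only after its dependency has finished.
   Writes the run order to out and returns how many ops ran; fewer than
   n means some ops could never start (a dependency cycle). */
static int run_chain(const chain_t *c, int *out) {
    bool done[MAX_OPS] = {false};
    int ran = 0;
    bool progress = true;
    assert(c->n <= MAX_OPS);
    while (progress) {
        progress = false;
        for (int i = 0; i < c->n; i++) {
            if (done[i])
                continue;
            if (c->dep[i] == -1 || done[c->dep[i]]) {
                done[i] = true;  /* the download for op i would happen here */
                out[ran++] = i;
                progress = true;
            }
        }
    }
    return ran;
}
```

For example, if op 0 needs op 2's result and op 1 needs op 0's, the ops run in the order 2, 0, 1 regardless of the order in which they were declared.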

Although an asynchronous NSURLConnection does its network I/O in the background, it delivers delegate messages on the run loop of the thread that started it, by default only in NSDefaultRunLoopMode. For a connection started on the main thread, this means delegate callbacks are deferred while the user is interacting with the interface (say, while scrolling a table view), because the main run loop is then running in a tracking mode rather than the default mode. Even though our connection runs inside an NSOperation, we preserve this important property by making sure the connection starts on the main thread.

Here is some sample code that you can play around with to learn more. Please leave comments for any suggestions or improvements.