<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Pulse News Engineering Blog</title>
	<atom:link href="http://eng.pulse.me/feed/" rel="self" type="application/rss+xml" />
	<link>http://eng.pulse.me</link>
	<description>Engineering Blog</description>
	<lastBuildDate>Fri, 01 Mar 2013 02:18:46 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.5.1</generator>
		<item>
		<title>Recap: Tech &amp; Templeton</title>
		<link>http://eng.pulse.me/recap-tech-templeton/</link>
		<comments>http://eng.pulse.me/recap-tech-templeton/#comments</comments>
		<pubDate>Fri, 01 Mar 2013 02:18:46 +0000</pubDate>
		<dc:creator>Katie</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://eng.pulse.me/?p=188</guid>
		<description><![CDATA[Yesterday&#8217;s Tech Talk was a special event, focusing on collaboration between designers and engineers. After a tasting from Al Capone favorite Templeton Rye, Head of Product Design Stuart Norrie and Lead Android Developer Albert Lai presented tips, tricks, and common misconceptions about working with designers and engineers. From Photoshop to Machine Learning, these two blurred [...]]]></description>
				<content:encoded><![CDATA[<p><img src="http://eng.pulse.me/wp-content/uploads/2013/03/techtalktalk-300x300.jpg" alt="techtalktalk" width="260" height="260" class="alignright size-medium wp-image-189" /><br />
Yesterday&#8217;s Tech Talk was a special event, focusing on collaboration<br />
 between designers and engineers. After a tasting from Al Capone<br />
favorite <a href="http://www.templetonrye.com" target="_blank">Templeton Rye</a>, <strong>Head of Product Design Stuart Norrie</strong> and<br />
<strong>Lead Android Developer Albert Lai</strong> presented tips, tricks, and common misconceptions about working with designers and engineers. From Photoshop to Machine Learning, these two blurred the boundaries and examined the benefits of multifaceted product development.</p>
<p>What would you like to see from our future tech talks? We&#8217;d love to hear your feedback, speaker suggestions, and more. Talk to us on <a href="http://www.twitter.com/pulsepad" target="_blank">Twitter</a>, <a href="http://www.facebook.com/pulsepad" target="_blank">Facebook</a>, or leave a comment here on the Engineering Blog.</p>
<p>Check out our presentation slides below, and stay tuned for news about<br />
our next Pulse event!</p>
<p><p>
<strong>Designing Engineers</strong><br />
<em>By Stuart Norrie</em></p>
<p><iframe src="http://www.slideshare.net/slideshow/embed_code/16847474" width="427" height="356" frameborder="0" marginwidth="0" marginheight="0" scrolling="no" style="border:1px solid #CCC;border-width:1px 1px 0;margin-bottom:5px" allowfullscreen webkitallowfullscreen mozallowfullscreen> </iframe>
<div style="margin-bottom:5px"> <strong> <a href="http://www.slideshare.net/pulsepad/designing-engineers" title="Designing Engineers" target="_blank">Designing Engineers</a> </strong> from <strong><a href="http://www.slideshare.net/pulsepad" target="_blank">Pulse</a></strong> </div>
<p><em>How far do designers and developers need to go into each other’s worlds? What things can each camp learn about the other to make the transition from sketches and mocks to working builds as smooth as possible? This talk will outline what works and what doesn’t work from a designer’s perspective.</em></p>
<p><strong>Engineering Designers</strong><br />
<em>By Albert Lai</em></p>
<p><iframe src="http://www.slideshare.net/slideshow/embed_code/16847479" width="427" height="356" frameborder="0" marginwidth="0" marginheight="0" scrolling="no" style="border:1px solid #CCC;border-width:1px 1px 0;margin-bottom:5px" allowfullscreen webkitallowfullscreen mozallowfullscreen> </iframe>
<div style="margin-bottom:5px"> <strong> <a href="http://www.slideshare.net/pulsepad/engineering-designers" title="Engineering Designers" target="_blank">Engineering Designers</a> </strong> from <strong><a href="http://www.slideshare.net/pulsepad" target="_blank">Pulse</a></strong> </div>
<p><em>Get a look into the mind of an engineer! Learn about the various systems they use and how those systems could impact the design of a product. We will discuss popular frameworks, how mobile apps are architected, common problems that keep engineers up at night, and the real reasons behind those “are you crazy?!” looks we sometimes give you when you ask if something is feasible.</em></p>
<p><p>
<strong>For more on all things Pulse, check out our main blog at <a href="http://blog.pulse.me" target="_blank">blog.pulse.me</a>.</strong></p>
]]></content:encoded>
			<wfw:commentRss>http://eng.pulse.me/recap-tech-templeton/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Join Us for Tech &amp; Templeton!</title>
		<link>http://eng.pulse.me/join-us-for-tech-templeton/</link>
		<comments>http://eng.pulse.me/join-us-for-tech-templeton/#comments</comments>
		<pubDate>Mon, 25 Feb 2013 23:58:56 +0000</pubDate>
		<dc:creator>Katie</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://eng.pulse.me/?p=186</guid>
		<description><![CDATA[After last month&#8217;s awesome (and jam-packed!) Tech Talk, we have high hopes for Wednesday&#8217;s Tech &#038; Templeton event. Join us at Pulse HQ for an evening of talks on a crucial piece of the app puzzle: collaboration between engineers and designers. Make sure to RSVP here! You know them, you love them, but how do [...]]]></description>
				<content:encoded><![CDATA[<p><a href="http://eng.pulse.me/wp-content/uploads/2013/02/facebook_bg.png"><img src="http://eng.pulse.me/wp-content/uploads/2013/02/facebook_bg.png" alt="facebook_bg" width="700" class="aligncenter size-full wp-image-187" /></a></p>
<p><strong>After last month&#8217;s awesome (and jam-packed!) Tech Talk, we have high hopes for Wednesday&#8217;s <b>Tech &#038; Templeton</b> event. Join us at Pulse HQ for an evening of talks on a crucial piece of the app puzzle: collaboration between engineers and designers.</strong></p>
<p><b><a href="http://pulsetech3.eventbrite.com/" target="_blank">Make sure to RSVP here!</a></b></p>
<p>You know them, you love them, but how do you work with them? Hear from our Android and Design teams to get insider tips and tricks for building the best products with the other side of the aisle. We look forward to seeing you at Pulse HQ!</p>
<p>Your evening will start off at 6:45pm with a tasting of Al Capone&#8217;s &#8220;good stuff&#8221;—the deliciously infamous <a href="http://www.templetonrye.com/" target="_blank">Templeton Rye</a>. Our first presentation will begin at 7:15pm. Read more about this unique set of talks below:</p>
<p><b>Engineering Designers</b><br />
<em>By Lead Android Developer Albert Lai</em><br />
Get a look into the mind of an engineer! Learn about the various systems they use and how those systems could impact the design of a product. We will discuss popular frameworks, how mobile apps are architected, common problems that keep engineers up at night, and the real reasons behind those &#8220;are you crazy?!&#8221; looks we sometimes give you when you ask if something is feasible.</p>
<p><b>Designing Engineers</b><br />
<em>By Lead Product Designer Stuart Norrie</em><br />
How far do designers and developers need to go into each other&#8217;s worlds? What things can each camp learn about the other to make the transition from sketches and mocks to working builds as smooth as possible? This talk will outline what works and what doesn&#8217;t work from a designer&#8217;s perspective.</p>
]]></content:encoded>
			<wfw:commentRss>http://eng.pulse.me/join-us-for-tech-templeton/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Recap: Pulse&#8217;s Tech &amp; Tonic</title>
		<link>http://eng.pulse.me/recap-pulses-tech-tonic/</link>
		<comments>http://eng.pulse.me/recap-pulses-tech-tonic/#comments</comments>
		<pubDate>Tue, 05 Feb 2013 01:38:16 +0000</pubDate>
		<dc:creator>Katie</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://eng.pulse.me/?p=184</guid>
		<description><![CDATA[Wednesday saw Pulse&#8217;s second official Tech Talk, Tech &#038; Tonic! The event was packed to the gills, and we thank those of you who made it out to our San Francisco HQ for presentations, absinthe, and great conversation. We were also honored to have people following along on Twitter, located everywhere from the Mission to [...]]]></description>
				<content:encoded><![CDATA[<p><b>Wednesday saw Pulse&#8217;s second official Tech Talk, <a href="http://pulsetech2.eventbrite.com/" target="_blank">Tech &#038; Tonic</a>!</b> </p>
<p>The event was packed to the gills, and we thank those of you who made it out to our San Francisco HQ for presentations, absinthe, and great conversation. We were also honored to have people following along on Twitter, located everywhere from the Mission to Malaysia. We appreciate the support, and we&#8217;d love to hear any questions or feedback you have about the event! Let us know on <a href="http://www.facebook.com/pulsepad" target="_blank">Facebook</a> and <a href="http://www.twitter.com/pulsepad" target="_blank">Twitter</a>.</p>
<p>For those of you who couldn&#8217;t make it, check out our recaps:</p>
<p><b><a href="http://www.slideshare.net/pulsepad/speed-up-your-web-app-by-asynchronously-loading-resources" target="_blank">Speed Up Your Web App By Asynchronously Loading Resources</a></b><br />
<i><b>By Lead Web Developer Filip Mares</b></i></p>
<p><center><iframe src="http://www.slideshare.net/slideshow/embed_code/16299016" width="427" height="356" frameborder="0" marginwidth="0" marginheight="0" scrolling="no" style="border:1px solid #CCC;border-width:1px 1px 0;margin-bottom:5px" allowfullscreen webkitallowfullscreen mozallowfullscreen> </iframe>
<div style="margin-bottom:5px"> <strong> <a href="http://www.slideshare.net/pulsepad/speed-up-your-web-app-by-asynchronously-loading-resources" title="Speed Up Your Web App By Asynchronously Loading Resources" target="_blank">Speed Up Your Web App By Asynchronously Loading Resources</a> </strong> from <strong><a href="http://www.slideshare.net/pulsepad" target="_blank">Pulse</a></strong> </div>
<p></center></p>
<p><em>From Filip Mares:</em> &#8220;The Pulse web app is built with a mixture of backbone.js/Django/YepNope.js. The old ways of packaging your static resources are not optimal for single page apps. Using yepnope.js you can load all of these files asynchronously and speed up load times for the core of your app. At Pulse we lowered our bandwidth use and sped up our initial load times through the use of asynchronous resource loading.&#8221;</p>
<p><b><a href="http://www.slideshare.net/pulsepad/syncing-nontrivial-user-data-across-mobile-devices" target="_blank">Syncing Non-Trivial User Data Across Mobile Devices</a><br />
<em>By Pulse Co-Founder &#038; CTO Ankit Gupta &#038; Lead Backend Engineer Greg Bayer</em></b></p>
<p><center><iframe src="http://www.slideshare.net/slideshow/embed_code/16299033" width="427" height="356" frameborder="0" marginwidth="0" marginheight="0" scrolling="no" style="border:1px solid #CCC;border-width:1px 1px 0;margin-bottom:5px" allowfullscreen webkitallowfullscreen mozallowfullscreen> </iframe>
<div style="margin-bottom:5px"> <strong> <a href="http://www.slideshare.net/pulsepad/syncing-nontrivial-user-data-across-mobile-devices" title="Syncing Non-Trivial User Data Across Mobile Devices" target="_blank">Syncing Non-Trivial User Data Across Mobile Devices</a> </strong> from <strong><a href="http://www.slideshare.net/pulsepad" target="_blank">Pulse</a></strong> </div>
<p></center></p>
<p><em>From Greg Bayer:</em> &#8220;The core concepts we covered are:<br />
1) Mobile &#038; offline support means not having a single source of truth. Implications of this.<br />
2) Making sure the user&#8217;s mental model matches what the client and syncing service does.<br />
3) Some challenges and techniques for making syncing data across mobile devices fast.&#8221;</p>
<p>Thanks for joining us, and stay tuned for our next event!</p>
]]></content:encoded>
			<wfw:commentRss>http://eng.pulse.me/recap-pulses-tech-tonic/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Announcing: Tech &amp; Tonic!</title>
		<link>http://eng.pulse.me/announcing-tech-tonic/</link>
		<comments>http://eng.pulse.me/announcing-tech-tonic/#comments</comments>
		<pubDate>Tue, 22 Jan 2013 19:10:18 +0000</pubDate>
		<dc:creator>Katie</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://eng.pulse.me/?p=181</guid>
		<description><![CDATA[We&#8217;re bringing a winning combination to 2 Shaw Alley: Tech and Tonic! Join us for a gin and absinthe tasting courtesy of Raff Distillerie, then hear from members of Pulse&#8217;s iOS and Web teams. Make sure to RSVP here, and read more about the tech talks below: Speed Up Your Web App By Asynchronously Loading [...]]]></description>
				<content:encoded><![CDATA[<p>We&#8217;re bringing a winning combination to 2 Shaw Alley: <a href="http://pulsetech2.eventbrite.com" target="_blank">Tech and Tonic</a>! Join us for a <strong>gin and absinthe tasting</strong> courtesy of <a href="http://www.raffdistillerie.com/" target="_blank">Raff Distillerie</a>, then hear from members of Pulse&#8217;s <strong>iOS and Web</strong> teams.</p>
<p><strong>Make sure to RSVP <a href="http://pulsetech2.eventbrite.com/" target="_blank">here</a></strong>, and read more about the tech talks below:</p>
<div><strong>Speed Up Your Web App By Asynchronously Loading Resources</strong></div>
<div><em>By Lead Web Developer Filip Mares</em></div>
<p>The Pulse web app is built with a mixture of backbone.js/Django/YepNope.js. This talk will give an overview of our architecture and conventions for developing the app, as well as discuss some limitations in performance and how we&#8217;ve overcome them.</p>
<div><strong>Syncing Non-Trivial User Data Across Mobile Devices</strong></div>
<div><em>By Pulse Co-Founder &#038; CTO Ankit Gupta and Lead Backend Engineer Greg Bayer</em></div>
<p>One of the most delightful features in Pulse is the automatic syncing of users&#8217; sources across their devices. This tech talk will cover the architecture and implementation of our syncing service, from both app and server perspectives. We will go over real world considerations, including speed, efficiency, offline access and user interface decisions. We will provide specific best practices for iOS and Android apps as well.</p>
<p>We can&#8217;t wait to see you!</p>
<p><strong>Wednesday, January 30th<br />
6:45 PM<br />
2 Shaw Alley, 5th Floor<br />
San Francisco</strong></p>
<p><em>As always, get more updates about all things Pulse at <a href="http://www.facebook.com/pulsepad" target="_blank">Facebook</a> and <a href="http://www.twitter.com/pulsepad" target="_blank">Twitter</a>.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://eng.pulse.me/announcing-tech-tonic/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Recap: Pulse&#8217;s First Tech Talk</title>
		<link>http://eng.pulse.me/recap-pulses-first-tech-talk/</link>
		<comments>http://eng.pulse.me/recap-pulses-first-tech-talk/#comments</comments>
		<pubDate>Tue, 18 Dec 2012 20:45:57 +0000</pubDate>
		<dc:creator>Katie</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://eng.pulse.me/?p=176</guid>
		<description><![CDATA[Pulse had its first Tech Talk last Wednesday at our HQ in San Francisco! Featuring a tequila tasting and two presentations from members of the team, the event was a success—and the first of many Tech Talks to come. Thank you to everyone who attended! If you&#8217;d like to hear about our upcoming events, keep [...]]]></description>
				<content:encoded><![CDATA[<p><strong>Pulse had its first <a href="http://pulsetechtalk1.eventbrite.com/"><strong>Tech Talk</strong></a> last Wednesday at our HQ in San Francisco!</strong> Featuring a tequila tasting and two presentations from members of the team, the event was a success—and the first of many Tech Talks to come. Thank you to everyone who attended!</p>
<p>If you&#8217;d like to hear about our upcoming events, keep your eyes on this page, follow us on <strong><a href="http://www.facebook.com/pulsepad" target="_blank">Facebook</a></strong> and <strong><a href="http://www.twitter.com/pulsepad" target="_blank">Twitter</a></strong>, or send us an <strong><a href="mailto:feedback@pulse.me" target="_blank">email</a></strong> to be added to our priority mailing list.</p>
<p>If you missed the presentations, check out the slides and recaps below:</p>
<p><strong><a href="http://www.slideshare.net/pulsepad/building-on-the-shoulders-of-giants-how-we-boostrapped-an-mvp-data-product-on-aws-and-gae" target="_blank">Building on the Shoulders Of Giants:</a><br />
<em>How We Bootstrapped an MVP Data Product on AWS and GAE</em></strong></p>
<p><center><iframe style="border: 1px solid #CCC; border-width: 1px 1px 0; margin-bottom: 5px;" src="http://www.slideshare.net/slideshow/embed_code/15689394" height="356" width="427" allowfullscreen="" frameborder="0" marginwidth="0" marginheight="0" scrolling="no"></iframe></center>&nbsp;</p>
<p><em>From Backend Engineer <a href="http://gbayer.com" target="_blank">Greg Bayer</a>:</em> &#8220;Pulse&#8217;s backend runs on both Amazon Web Services (AWS) and Google App Engine (GAE). Our engineering philosophy is to minimize ops and low-level system development, so we can focus on core product features. To that end, we try to build things on GAE&#8217;s more managed infrastructure first and switch over to AWS if a particular feature doesn&#8217;t fit well with Google&#8217;s architecture.</p>
<p>In Wednesday&#8217;s talk we highlighted our Scribe-based event log collection and EMR-based analysis systems on AWS. We also talked about using GAE to manage user accounts, and how we repurpose GAE&#8217;s map reduce to efficiently send email to large sets of users.&#8221;</p>
<p><em>From Product Engineer Elliot Babchick:</em> &#8220;We covered the steps involved in Pulse&#8217;s data pipeline for creating an email digest. We covered the raw logging of our user events all the way up to our App Engine map reduce job to send the emails. Results of the effectiveness were presented (~2x engagement compared with our non-personalized emails), along with several tips about how to make the process easier to debug.&#8221;</p>
<p>&nbsp;</p>
<p><strong><a href="http://www.slideshare.net/pulsepad/albert-tech-talk" target="_blank">One Screen, Two Screen, Red Screen, Blue Screen:</a><br />
<em>Designing and Engineering a Mobile Application for Multiple Screen Sizes</em></strong></p>
<p><center><iframe style="border: 1px solid #CCC; border-width: 1px 1px 0; margin-bottom: 5px;" src="http://www.slideshare.net/slideshow/embed_code/15689393" height="356" width="427" allowfullscreen="" frameborder="0" marginwidth="0" marginheight="0" scrolling="no"></iframe></center>&nbsp;</p>
<p><em>From Lead Android Developer Albert Lai:</em> &#8220;Creating an app is more complicated now than it was three years ago; you can&#8217;t just design for the iPhone screen and be done with it! You&#8217;ll have to be prepared to run your app on a diverse array of screen sizes, and lucky for you, there are some guidelines to help your code be as adaptable as you are.</p>
<p>1. use space judiciously<br />
2. use popups for small actions in tablets<br />
3. you&#8217;ll have to write size dependent code and use dynamic layouts</p>
<p>Best practices:<br />
1. Use Fragments in Android (they&#8217;re in the compatibility library) and ModalViewControllers in iOS<br />
2. There are classes in the respective SDKs that tell you what kind of device your code is running on. Use them.<br />
3. Never use absolute coordinates to lay out your app<br />
4. Use RelativeLayouts in Android and Autolayout in iOS<br />
5. Calculate dimensions for UI elements on-the-fly in your app&#8221;</p>
<p>&nbsp;</p>
<p>Stay tuned for our next event!</p>
]]></content:encoded>
			<wfw:commentRss>http://eng.pulse.me/recap-pulses-first-tech-talk/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Join us at Pulse&#8217;s First Tech Talk!</title>
		<link>http://eng.pulse.me/join-us-at-pulses-first-tech-talk/</link>
		<comments>http://eng.pulse.me/join-us-at-pulses-first-tech-talk/#comments</comments>
		<pubDate>Wed, 28 Nov 2012 20:47:40 +0000</pubDate>
		<dc:creator>Katie</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://eng.pulse.me/?p=172</guid>
		<description><![CDATA[Join us at Pulse HQ on December 12th from 6:30 to 8:00 PM to learn about the technology that drives Pulse, from the people who make it happen. Meet the team, check out our San Francisco office space, and even enjoy a tequila tasting. Make sure to RSVP here! The presentations include: Building on the Shoulders [...]]]></description>
				<content:encoded><![CDATA[<p><img src="http://eng.pulse.me/wp-content/uploads/2012/11/facebook_bg.png" alt="" title="facebook_bg" width="690" class="aligncenter size-full wp-image-173" /></p>
<p>Join us at Pulse HQ on December 12th from 6:30 to 8:00 PM to learn about the technology that drives Pulse, from the people who make it happen. Meet the team, check out our San Francisco office space, and even enjoy a tequila tasting. <strong>Make sure to RSVP <a href="http://pulsetechtalk1.eventbrite.com/">here!</a></strong></p>
<p>The presentations include:</p>
<div><strong>Building on the Shoulders Of Giants</strong></div>
<p><em>How We Bootstrapped an MVP Data Product on AWS and GAE</em></p>
<p>This talk, led by backend lead Greg Bayer, will attempt to cover several major components of Pulse&#8217;s backend infrastructure, including an overview of the services we leverage, and how we specifically make use of them. Along the way, product engineer Elliot Babchick will be relating how each one of these pieces allowed us to create a product feature driven by user generated data, at scale, in just under a week, under a single engineer&#8217;s supervision.</p>
<div><strong>One Screen, Two Screen, Red Screen, Blue Screen</strong></div>
<p><em>Designing and Engineering a Mobile Application for Multiple Screen Sizes</em></p>
<p>Screens are everywhere; they&#8217;re even on little devices inside our pockets. Yet how do we know how big they are when we load our software onto those portable screens? Android lead Albert Lai will walk through some of the key heuristics and techniques used to ensure a seamless Pulse experience that adapts to virtually any mobile device size.</p>
<p>
______________</p>
<p><i>Immediately following our tech talk, we&#8217;ll be discussing (and tasting) the technology of <a href="http://www.tequilaspremium.com/">tequilas premium</a>, from a blanco to a reposado to an anejo. Our brand representative will walk you through the tasting profiles and teach you what makes an alcohol a tequila, what makes a tequila a blanco vs a reposado, and more. For guests 21 and up.</i></p>
]]></content:encoded>
			<wfw:commentRss>http://eng.pulse.me/join-us-at-pulses-first-tech-talk/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Backend Tips – App Engine, meet Redis on AWS</title>
		<link>http://eng.pulse.me/gae-meet-redis-on-aws/</link>
		<comments>http://eng.pulse.me/gae-meet-redis-on-aws/#comments</comments>
		<pubDate>Thu, 13 Sep 2012 15:16:53 +0000</pubDate>
		<dc:creator>Leonard</dc:creator>
				<category><![CDATA[Backend]]></category>
		<category><![CDATA[App Engine]]></category>
		<category><![CDATA[AWS]]></category>
		<category><![CDATA[Caching]]></category>
		<category><![CDATA[Performance]]></category>
		<category><![CDATA[Redis]]></category>

		<guid isPermaLink="false">http://eng.pulse.me/?p=156</guid>
		<description><![CDATA[Since snappy performance is critical to providing a good user experience, we try to keep the latency of all common Pulse backend API requests under 500ms. Most of the time we achieve this by using Google App Engine&#8217;s memcache to cache all data which might be reused by many requests. Less commonly requested data is pulled from [...]]]></description>
				<content:encoded><![CDATA[<p><a href="http://eng.pulse.me/wp-content/uploads/2012/09/redis.png"><img class="alignleft  wp-image-162" style="margin-left: 10px; margin-right: 10px;" title="redis" src="http://eng.pulse.me/wp-content/uploads/2012/09/redis-300x116.png" alt="" width="210" height="81" /></a>Since snappy performance is critical to providing a good user experience, we try to keep the latency of all common <a href="http://www.pulse.me" target="_blank">Pulse</a> backend API requests under 500ms. Most of the time we achieve this by using Google App Engine&#8217;s <a href="https://developers.google.com/appengine/docs/python/memcache/usingmemcache" target="_blank">memcache</a> to cache all data which might be reused by many requests. Less commonly requested data is pulled from the datastore, resulting in such requests taking a bit longer than we like.</p>
<p>When these slower requests are rare, we accept them. However, for features that access a broad range of data, the likelihood of missing the cache increases. Some data required for a request may be cached, but some will almost always not be, resulting in high latency for <em>most</em> requests.</p>
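<p>A rough back-of-the-envelope sketch shows why broad requests suffer. It assumes every key is cached independently with the same hit rate, which is a simplification, and the 95% hit rate is an illustrative number rather than a measurement from Pulse:</p>

```python
def all_hit_probability(hit_rate, keys_needed):
    """Chance that every key a request needs is already in the cache.

    Assumes each key hits independently with the same probability,
    which is a simplification made for illustration.
    """
    return hit_rate ** keys_needed


if __name__ == "__main__":
    # Illustrative numbers only: a 95% per-key hit rate looks healthy,
    # but a request touching many keys still usually falls through.
    for keys_needed in (1, 10, 50):
        chance = all_hit_probability(0.95, keys_needed)
        print(f"{keys_needed:3d} keys: {chance:.1%} chance of a fully cached request")
```

<p>Under these assumptions, a request that needs 10 keys is fully cached only about 60% of the time, and one that needs 50 keys less than 8% of the time.</p>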
<p>To implement these types of features efficiently, one option is to dramatically increase the size of our memcache. This would allow us to keep all required data in cache. However, it would be expensive and is somewhat at odds with the <a href="http://en.wikipedia.org/wiki/Cache_algorithms#Least_Recently_Used" target="_blank">LRU cache policy</a> we like to use for other features. This approach is also currently unsupported on Google App Engine (since memcache capacity is not directly tunable).</p>
<p>We investigated several other options and finally settled on using Redis as a persistent, in-memory datastore. Redis strikes a great balance between simplicity, powerful primitives, and proven stability. Instead of increasing our memcache or switching entirely to a larger in-memory store, we created a second Redis-based system on AWS. This system is specifically designed to hold data which is important to have available at in-memory speeds (with no expected misses). Achieving this is more expensive than providing a similar LRU cache (which could be smaller), so we reserve it specifically for features that require such guarantees.</p>
<h2>Architecture</h2>
<p>We wanted to use Redis, but also to make sure that our implementation was both scalable and easily recoverable in the case of failure. From here on out, we will discuss the infrastructure and tools we use to build this system. Here&#8217;s a visual overview of the system:</p>
<p><img class="alignleft size-full wp-image-163" title="Redis Blog" src="http://eng.pulse.me/wp-content/uploads/2012/09/Redis-Blog.png" alt="" width="720" height="540" /></p>
<p>&nbsp;</p>
<h3>Amazon Elastic Load Balancer</h3>
<p>This is a really nice utility that AWS gives us. We set up an <a href="http://aws.amazon.com/elasticloadbalancing/" target="_blank">ELB</a> that points to as many <a href="http://aws.amazon.com/ec2/" target="_blank">EC2 machines</a> as we need (we’ll call them redis frontends). The ELB gives us automatic round-robin balancing across those machines; it also detects failing machines, gives us a warning, and transfers the load to the machines that are still running. Some important dos:</p>
<ol>
<li>The load balancer can deal with HTTPS requests, so use them! Some security is always better than none.</li>
<li>You should make sure that the machines you provide to the load balancer are distributed among the different availability zones that AWS offers within your region.</li>
<li>You can also use Auto Scaling by putting instances into an Auto Scaling group and attaching the group to the load balancer.</li>
</ol>
<h3>HA Proxy</h3>
<p>Our redis frontend machines use <a href="http://en.wikipedia.org/wiki/Tornado" target="_blank">Tornado</a> as the webserver. Tornado is fast (great!) and single-threaded. Being single-threaded prevents many headaches, scales predictably, and has minimal overhead, but a single process doesn&#8217;t benefit from multiple cores on a machine. The larger Amazon machines have multiple cores, so we really want to use them to our advantage. Enter HA Proxy, a nice utility that allows you to build a reverse proxy in front of several Tornado processes. Here’s a bare-bones version of the configuration we use:</p>
<div class="codecolorer-container bash default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:100%;height:300px;"><div class="bash codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">global<br />
maxconn <span style="color: #000000;">1024</span><br />
daemon<br />
log 127.0.0.1 local0<br />
frontend load_balancer<br />
<span style="color: #666666; font-style: italic;"># We process all requests hitting port 8080</span><br />
<span style="color: #7a0874; font-weight: bold;">bind</span> <span style="color: #000000; font-weight: bold;">*</span>:<span style="color: #000000;">8080</span><br />
<span style="color: #666666; font-style: italic;"># We will point them to the backend we describe later</span><br />
default_backend tornado_servers<br />
mode http<br />
option httplog<br />
option dontlognull<br />
clitimeout <span style="color: #000000;">20000</span><br />
backend tornado_servers<br />
<span style="color: #666666; font-style: italic;"># The balancing strategy</span><br />
balance roundrobin<br />
<span style="color: #666666; font-style: italic;"># The tornado servers, in this case, the machine has 4 cores</span><br />
server tornado_1 127.0.0.1:<span style="color: #000000;">13371</span> check rise <span style="color: #000000;">2</span> fall <span style="color: #000000;">5</span><br />
server tornado_2 127.0.0.1:<span style="color: #000000;">13372</span> check rise <span style="color: #000000;">2</span> fall <span style="color: #000000;">5</span><br />
server tornado_3 127.0.0.1:<span style="color: #000000;">13373</span> check rise <span style="color: #000000;">2</span> fall <span style="color: #000000;">5</span><br />
server tornado_4 127.0.0.1:<span style="color: #000000;">13374</span> check rise <span style="color: #000000;">2</span> fall <span style="color: #000000;">5</span><br />
retries <span style="color: #000000;">1</span><br />
mode http<br />
contimeout <span style="color: #000000;">5000</span><br />
srvtimeout <span style="color: #000000;">20000</span><br />
<span style="color: #666666; font-style: italic;"># We also get stats from HA Proxy about our tornado servers</span><br />
stats <span style="color: #7a0874; font-weight: bold;">enable</span><br />
stats uri <span style="color: #000000; font-weight: bold;">/</span>lb?stats</div></div>
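<p>Since the backend needs one <code>server</code> line per Tornado process, the list can be generated rather than typed by hand. This is a minimal sketch, assuming the processes listen on consecutive ports starting at 13371 as in the configuration above; the helper name <code>backend_server_lines</code> is made up for this example:</p>

```python
import multiprocessing


def backend_server_lines(num_processes, base_port=13371):
    """Build one HAProxy `server` line per local Tornado process.

    Each line gets its own name (tornado_1, tornado_2, ...) and a
    consecutive port, with the same health-check settings as above.
    """
    lines = []
    for i in range(num_processes):
        lines.append(
            "server tornado_%d 127.0.0.1:%d check rise 2 fall 5"
            % (i + 1, base_port + i)
        )
    return lines


if __name__ == "__main__":
    # One Tornado process per core, matching the approach in the post.
    for line in backend_server_lines(multiprocessing.cpu_count()):
        print(line)
```

<p>The output can be pasted into the <code>backend tornado_servers</code> section whenever the instance size changes.</p>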
<h3>Tornado Frontends</h3>
<p>Each of these Tornado instances provides a thin Python API layer. The implementation is both simple and very specific to our own use cases. I won’t go into the specific details, but the frontend takes care of all of the security and implements the internal API we provide to our client teams. Certain general tasks like deserialization, error handling, and batching requests before hitting the backend were also very important. We run one instance per core on the machine, and they all rely on the sharded redis interface to actually access the data.</p>
<h3>Sharded Redis Interface</h3>
<p>This is based heavily on redis-py by Andy McCurdy, so many thanks to him. You can take a look at <a href="https://github.com/andymccurdy/redis-py/">https://github.com/andymccurdy/redis-py/</a>.</p>
<p>The thing we needed to add was the ability to split our data amongst several different machines. Andy is working on a general solution for this called cluster redis, but we opted to go with something simpler in the meantime.</p>
<p>The first thing was to implement the actual sharding, something like:</p>
<div class="codecolorer-container python default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:100%;"><div class="python codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap"><span style="color: #ff7700;font-weight:bold;">def</span> find_shard<span style="color: black;">&#40;</span>key<span style="color: black;">&#41;</span>:<br />
&nbsp; &nbsp; hash_value <span style="color: #66cc66;">=</span> some_consistent_hash_function<span style="color: black;">&#40;</span>key<span style="color: black;">&#41;</span><br />
&nbsp; &nbsp; <span style="color: #ff7700;font-weight:bold;">return</span> hash_value % num_machines</div></div>
<p>With that little snippet, it was pretty easy to send operations through a wrapper class around StrictRedis (see redis-py) and have all the Tornado frontends behave as if a single machine were serving the data. This works as long as you don’t want to use pipelines.</p>
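<p>A minimal sketch of such a wrapper, under stated assumptions: <code>FakeRedis</code> is a hypothetical in-memory stand-in for a StrictRedis client so the example runs without a server, <code>ShardedRedis</code> is an illustrative name rather than our actual class, and <code>zlib.crc32</code> is swapped in as the hash function because it returns the same value in every process (Python&#8217;s built-in <code>hash()</code> is not guaranteed to).</p>

```python
import zlib


class FakeRedis(object):
    """Hypothetical in-memory stand-in for a StrictRedis client."""

    def __init__(self):
        self.data = {}

    def set(self, key, value):
        self.data[key] = value
        return True

    def get(self, key):
        return self.data.get(key)


class ShardedRedis(object):
    """Routes each single-key command to the client that owns the key."""

    def __init__(self, clients):
        self.clients = list(clients)

    def find_shard(self, key):
        # crc32 is stable across processes, so every frontend
        # agrees on which machine holds a given key.
        hash_value = zlib.crc32(key.encode("utf-8"))
        return hash_value % len(self.clients)

    def set(self, key, value):
        return self.clients[self.find_shard(key)].set(key, value)

    def get(self, key):
        return self.clients[self.find_shard(key)].get(key)


shards = [FakeRedis(), FakeRedis()]
store = ShardedRedis(shards)
store.set("user:1", "alice")
store.set("user:2", "bob")
```

<p>Callers only ever see <code>store.get()</code> and <code>store.set()</code>; in a real deployment the list would hold one client per Redis machine instead of the fakes.</p>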
<p>However, it turns out that you really do want to use pipelines. Whenever you have multiple requests that you can send out at the same time, a pipeline saves you the round-trip time of each individual request. Without pipelines, it doesn’t matter how blazingly fast Redis is: you are stuck on network I/O latency.</p>
<p>Getting pipelines to work is a little more involved. When a command is queued on the sharded pipeline, we record its position in the order it came in and store that position alongside the per-machine pipeline that receives the command. An example with two machines:</p>
<div class="codecolorer-container xml default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:100%;"><div class="xml codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">command1 key1 value1 (key1 -<span style="color: #ddbb00;">&gt;</span> machine 1)<br />
command2 key2 value2 (key2 -<span style="color: #ddbb00;">&gt;</span> machine 2)<br />
command3 key3 value3 (key3 -<span style="color: #ddbb00;">&gt;</span> machine 1)<br />
command4 key4 value4 (key4 -<span style="color: #ddbb00;">&gt;</span> machine 1)</div></div>
<p>We will remember it like this:<br />
Pipeline index for machine 1:<br />
[1, 3, 4]<br />
Pipeline for machine 1 will contain:<br />
command1 key1 value1<br />
command3 key3 value3<br />
command4 key4 value4<br />
Pipeline index for machine 2:<br />
[2]<br />
Pipeline for machine 2 will contain:<br />
command2 key2 value2</p>
<p>Now when we execute all the pipelines, we can reconstitute the return values in the order the commands came in to the sharded_redis interface. With solutions to both sharding and pipelines, we have an interface that hides the fact that we need multiple machines to serve all the data. Note that because each Tornado frontend uses the interface independently, we need to update all of them at the same time when we make changes!</p>
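<p>The index bookkeeping described above can be sketched roughly as follows. This is an illustration, not our production code: <code>FakePipeline</code> is a hypothetical in-memory stand-in for a redis-py pipeline, <code>ShardedPipeline</code> is an assumed name, and only <code>get</code> and <code>set</code> are covered.</p>

```python
import zlib


class FakePipeline(object):
    """Hypothetical in-memory stand-in for a redis-py pipeline."""

    def __init__(self, data):
        self.data = data
        self.commands = []

    def set(self, key, value):
        self.commands.append(("set", key, value))

    def get(self, key):
        self.commands.append(("get", key))

    def execute(self):
        results = []
        for command in self.commands:
            if command[0] == "set":
                self.data[command[1]] = command[2]
                results.append(True)
            else:
                results.append(self.data.get(command[1]))
        self.commands = []
        return results


class ShardedPipeline(object):
    def __init__(self, pipelines):
        self.pipelines = list(pipelines)
        # positions[i] holds the caller-side index of every
        # command queued on the pipeline for machine i.
        self.positions = [[] for _ in self.pipelines]
        self.count = 0

    def find_shard(self, key):
        return zlib.crc32(key.encode("utf-8")) % len(self.pipelines)

    def _queue(self, name, key, *args):
        shard = self.find_shard(key)
        self.positions[shard].append(self.count)
        self.count += 1
        getattr(self.pipelines[shard], name)(key, *args)
        return self

    def set(self, key, value):
        return self._queue("set", key, value)

    def get(self, key):
        return self._queue("get", key)

    def execute(self):
        # Run every per-machine pipeline, then put each result
        # back at the position its command was queued in.
        results = [None] * self.count
        for pipeline, positions in zip(self.pipelines, self.positions):
            for position, value in zip(positions, pipeline.execute()):
                results[position] = value
        self.positions = [[] for _ in self.pipelines]
        self.count = 0
        return results


pipe = ShardedPipeline([FakePipeline({}), FakePipeline({})])
pipe.set("key1", "a").set("key2", "b").get("key1").get("key2")
results = pipe.execute()
```

<p>Whichever machine each key lands on, <code>results</code> comes back in the order the four commands were queued, which is what lets the frontends treat the sharded pipeline like a single one.</p>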
<h3>Redis Backend</h3>
<p>Here are a few tips for setting up redis:</p>
<ol>
<li>Use a password, and make it a long password</li>
<li>Set a memory limit and a reasonable policy to deal with exceeding max memory</li>
<li>Change your machine overcommit_memory setting to 1
<div class="codecolorer-container bash default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:100%;"><div class="bash codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">sysctl <span style="color: #660033;">-w</span> vm.overcommit_memory=<span style="color: #000000;">1</span></div></div>
</li>
<li>Don’t run anything except redis on this machine</li>
<li>If you are using AOF files and backup machines (recommended), don&#8217;t bother with persistence on the master! Instead, make sure you have an aggressive fsync policy (everysec works) for the slave.</li>
</ol>
<div>For those who want the &#8220;why&#8221; behind each of the tips:</div>
<ol>
<li>From Redis Documentation:<br />
<blockquote><p>The password is set by the system administrator in clear text inside the redis.conf file. It should be long enough to prevent brute force attacks for two reasons:</p>
<ul>
<li>Redis is very fast at serving queries. Many passwords per second can be tested by an external client.</li>
<li>The Redis password is stored inside the redis.conf file and inside the client configuration, so it does not need to be remembered by the system administrator, and thus it can be very long.</li>
</ul>
<p>The goal of the authentication layer is to optionally provide a layer of redundancy. If firewalling or any other system implemented to protect Redis from external attackers fail, an external client will still not be able to access the Redis instance without knowledge of the authentication password.</p>
<p>Note: The AUTH command, like every other Redis command, is sent unencrypted, so it does not protect against an attacker that has enough access to the network to perform eavesdropping.</p></blockquote>
</li>
<li>We monitor both the machine&#8217;s memory usage and Redis&#8217;s own memory usage so that we can shard our Redis backend further as needed. Even so, it&#8217;s safer to set a reasonable limit on the memory Redis may use, so that we don&#8217;t end up in a scenario where Redis uses all available memory on a machine and then crashes.</li>
<li>From Redis Documentation:<br />
<blockquote><p>Redis background saving schema relies on the copy-on-write semantic of fork in modern operating systems: Redis forks (creates a child process) that is an exact copy of the parent. The child process dumps the DB on disk and finally exits. In theory the child should use as much memory as the parent being a copy, but actually thanks to the copy-on-write semantic implemented by most modern operating systems the parent and child process will <em>share</em> the common memory pages. A page will be duplicated only when it changes in the child or in the parent. Since in theory all the pages may change while the child process is saving, Linux can&#8217;t tell in advance how much memory the child will take, so if the overcommit_memory setting is set to zero fork will fail unless there is as much free RAM as required to really duplicate all the parent memory pages, with the result that if you have a Redis dataset of 3 GB and just 2 GB of free memory it will fail.</p>
<p>Setting overcommit_memory to 1 says Linux to relax and perform the fork in a more optimistic allocation fashion, and this is indeed what you want for Redis.</p></blockquote>
</li>
<li>Because of the large memory footprint we expect redis to use and the fact that we have to use an optimistic memory allocation setting, running anything else that might use up a lot of memory on the same machine can lead to failures.</li>
<li>This is an optimization to make sure the master Redis instance does not bottleneck on disk writes. The work associated with persistence is offloaded as much as possible to a backup machine. That being said, it&#8217;s important that the slave/backup machine is robust.</li>
</ol>
<h3>Backup</h3>
<p>This is simply a second machine running Redis that is set as a slave to the master Redis instance. In AWS, remember to use internal ip addresses when setting this up, since it saves you money. Backups are a must when you are running redis in production for several reasons:</p>
<ol>
<li>It’s a backup! If your serving machine goes down, you fail over to the backup while you fix the first machine. More often than not, when you are running on AWS, you can just promote the backup and set up a new backup.</li>
<li>If you ever need to expand the number of machines used for serving, you can just promote your backup to a serving machine and set up new backups for both machines. I would be remiss not to mention that you do have to then go through both machines to delete the extra keys later, or else you really won’t have expanded your memory limit.</li>
<li>You can run data analytics on the backup without affecting the all-important performance of the actual serving machine.</li>
</ol>
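<p>As a rough illustration of the key cleanup mentioned in the second point, here is a sketch for the simplest case of going from one serving machine to two, where both machines start with the full key set. The function names are hypothetical, and <code>zlib.crc32</code> again stands in for whatever hash function the sharding interface uses.</p>

```python
import zlib


def find_shard(key, num_machines):
    # Same routing rule the frontends would use after the expansion.
    return zlib.crc32(key.encode("utf-8")) % num_machines


def extra_keys(keys, machine_index, num_machines):
    """Return the keys this machine still holds but no longer owns."""
    return [key for key in keys
            if find_shard(key, num_machines) != machine_index]


# Both machines hold every key right after the backup is promoted.
all_keys = ["user:%d" % n for n in range(10)]
to_delete_on_0 = extra_keys(all_keys, 0, 2)
to_delete_on_1 = extra_keys(all_keys, 1, 2)
```

<p>Each key ends up in exactly one of the two delete lists, so after both passes every key lives on exactly one machine and the memory is actually freed.</p>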
]]></content:encoded>
			<wfw:commentRss>http://eng.pulse.me/gae-meet-redis-on-aws/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Backend Tips &#8211; Conquering Big Tables with MapReduce</title>
		<link>http://eng.pulse.me/backend-tips-google-app-engine-map-reduce/</link>
		<comments>http://eng.pulse.me/backend-tips-google-app-engine-map-reduce/#comments</comments>
		<pubDate>Thu, 16 Aug 2012 22:23:01 +0000</pubDate>
		<dc:creator>Simon Tao</dc:creator>
				<category><![CDATA[Backend]]></category>

		<guid isPermaLink="false">http://eng.pulse.me/?p=148</guid>
		<description><![CDATA[As some of our readers already know, Pulse uses Google App Engine (GAE) to serve content from thousands of publishers to millions of users. We have been very happy with the minimal operational overhead App Engine requires and were thrilled to see App Engine scale without hiccups when we were preloaded on the Kindle Fire. [...]]]></description>
				<content:encoded><![CDATA[<p><img class="alignleft wp-image-141" src="http://eng.pulse.me/wp-content/uploads/2012/08/elephant.png" alt="mapreduce" />As some of our readers already know, <a href="http://www.pulse.me/">Pulse</a> uses Google App Engine (GAE) to serve content from thousands of publishers to millions of users. We have been very happy with the minimal operational overhead App Engine requires and were thrilled to see <a href="http://googleappengine.blogspot.com/2011/11/scaling-with-kindle-fire.html">App Engine scale without hiccups</a> when we were preloaded on the Kindle Fire.</p>
<p>As a backend engineer, it is inevitable that some engineering tasks involve heavy data processing. In our case, this often happens on data in the App Engine datastore. We have always relied on the very flexible and easy <a href="https://developers.google.com/appengine/articles/remote_api">remote shell</a> to do this type of work. However, this approach is too slow for many use cases, especially those touching millions of records.</p>
<p>For larger tasks, App Engine&#8217;s built-in <a href="http://code.google.com/p/appengine-mapreduce/">MapReduce</a> is often the right tool. It allows us to operate on millions of datastore entities in a very short amount of time. To give a few examples, we use MapReduce to migrate existing data from legacy datastore models to new models after architectural changes, to load test our system with hundreds of shards simulating millions of users, and to inform our users of Pulse’s latest updates by sending out millions of emails or push notifications.</p>
<h3>Data Migration</h3>
<p>When making product changes, we sometimes move large amounts of data away from a legacy <a href="https://github.com/django-nonrel/django-nonrel" target="_blank">django-nonrel</a> model. The speed of MapReduce ensures that minimal transition time is required and that the transition is painless enough that it is preferred over simply living with the wrong data model.</p>
<h3>Load Testing</h3>
<p>We use MapReduce to simulate load tests that would otherwise be unrealistic if we only used a few physical machines. A simple load test might use MapReduce to make thousands of requests within a very short period. These requests can simulate millions of users using Pulse throughout a day.</p>
<h3>Lessons Learned</h3>
<p>You should plan and test any large Map Reduce task that will consume quota-limited resources before running the full job. It&#8217;s a good idea to estimate the amount of datastore reads/writes, url fetch calls, and other API requests beforehand. In some cases, it may be necessary to contact App Engine support to ask for increased quotas (for those that cannot be increased in the admin console).</p>
<p>For those using a framework on top of App Engine, make sure you initialize it at the top of your handler file (see below). In some cases, you may also need to add the initialization code to the mapreduce module (at the top of mapreduce/main.py). In Django-nonrel, the init line you&#8217;ll need looks like this:</p>
<div class="codecolorer-container python default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:100%;"><div class="python codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap"><span style="color: #ff7700;font-weight:bold;">from</span> djangoappengine <span style="color: #ff7700;font-weight:bold;">import</span> main</div></div>
<h3>Getting Started</h3>
<p>For those of you new to Map Reduce on App Engine, here&#8217;s how to create jobs of your own. The App Engine team has made it pretty easy.</p>
<p><strong>Download the mapreduce library via svn and add it to your app:</strong></p>
<div class="codecolorer-container bash default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:100%;"><div class="bash codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">&nbsp;<span style="color: #c20cb9; font-weight: bold;">svn checkout</span> http:<span style="color: #000000; font-weight: bold;">//</span>appengine-mapreduce.googlecode.com<span style="color: #000000; font-weight: bold;">/</span>svn<span style="color: #000000; font-weight: bold;">/</span>trunk<span style="color: #000000; font-weight: bold;">/</span>python<span style="color: #000000; font-weight: bold;">/</span>src<span style="color: #000000; font-weight: bold;">/</span>mapreduce</div></div>
<p><strong>Register the MapReduce handler in your app.yaml:</strong></p>
<div class="codecolorer-container bash default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:100%;"><div class="bash codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">handlers:<br />
- url: <span style="color: #000000; font-weight: bold;">/</span>mapreduce<span style="color: #7a0874; font-weight: bold;">&#40;</span><span style="color: #000000; font-weight: bold;">/</span>.<span style="color: #000000; font-weight: bold;">*</span><span style="color: #7a0874; font-weight: bold;">&#41;</span>?<br />
&nbsp; script: mapreduce<span style="color: #000000; font-weight: bold;">/</span>main.py<br />
&nbsp; login: admin</div></div>
<p>url &#8211; The MapReduce endpoints.<br />
script &#8211; The MapReduce library&#8217;s own request handler (mapreduce/main.py), which serves those endpoints.<br />
login &#8211; Restricts access to app admins only.</p>
<p><strong>Create the mapper file (mr_email_users.py) containing the function that will be called with each entity of the model you want to map over:</strong></p>
<div class="codecolorer-container python default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:100%;"><div class="python codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap"><span style="color: #ff7700;font-weight:bold;">def</span> run<span style="color: black;">&#40;</span>user_entity<span style="color: black;">&#41;</span>:<br />
&nbsp; &nbsp; send_email<span style="color: black;">&#40;</span>user_entity.<span style="color: #dc143c;">email</span><span style="color: black;">&#41;</span></div></div>
<p>Note: See the official Map Reduce guide below for more advanced options &#038; examples.</p>
<p><strong>Register and configure the MapReduce job in mapreduce.yaml:</strong></p>
<div class="codecolorer-container bash default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:100%;"><div class="bash codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">mapreduce:<br />
- name: MapReduce Email Users Job<br />
&nbsp; mapper:<br />
&nbsp; &nbsp; input_reader: mapreduce.input_readers.DatastoreInputReader<br />
&nbsp; &nbsp; handler: mr_email_users.run<br />
&nbsp; &nbsp; params:<br />
&nbsp; &nbsp; - name: entity_kind<br />
&nbsp; &nbsp; &nbsp; default: user<br />
&nbsp; &nbsp; - name: shard_count<br />
&nbsp; &nbsp; &nbsp; default: <span style="color: #000000;">50</span><br />
&nbsp; &nbsp; - name: processing_rate<br />
&nbsp; &nbsp; &nbsp; default: <span style="color: #000000;">1000</span></div></div>
<p>input_reader &#8211; The input reader for this job; you can find other types <a href="http://code.google.com/p/appengine-mapreduce/wiki/UserGuidePython#Specifying_readers">here</a>.<br />
handler &#8211; The entry point to this MapReduce job.<br />
entity_kind &#8211; The datastore model being mapped over.<br />
shard_count &#8211; The number of mapper workers to run concurrently.<br />
processing_rate &#8211; The aggregated maximum number of inputs processed per second by all mappers. Can be used to avoid exhausting your quota or interfering with online users.</p>
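<p>Because processing_rate caps the total throughput of all mappers, it also gives a quick lower bound on how long a job will take. A back-of-the-envelope sketch, assuming a hypothetical one million user entities and the rate of 1000 from the configuration above (the helper name is ours, not part of the library):</p>

```python
def minimum_job_seconds(entity_count, processing_rate):
    """Lower bound on run time when throughput is capped at processing_rate."""
    return float(entity_count) / processing_rate


# Hypothetical job: one million entities at 1000 inputs per second.
seconds = minimum_job_seconds(1000000, 1000)
minutes = seconds / 60
```

<p>Here the job cannot finish in less than 1000 seconds, a little under 17 minutes. The real run time may be longer, since the rate is a maximum rather than a guarantee.</p>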
<p><strong>Access the MapReduce admin console panel to view and launch jobs:</strong></p>
<div class="codecolorer-container bash default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:100%;"><div class="bash codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">http:<span style="color: #000000; font-weight: bold;">//</span><span style="color: #7a0874; font-weight: bold;">&#40;</span>your app name<span style="color: #7a0874; font-weight: bold;">&#41;</span>.appspot.com<span style="color: #000000; font-weight: bold;">/</span>mapreduce<span style="color: #000000; font-weight: bold;">/</span>status</div></div>
<h3>More Info</h3>
<p>You may be interested in the official MapReduce Get Started Guide for <a href="http://code.google.com/p/appengine-mapreduce/wiki/GettingStartedInPython">Python</a> or <a href="http://code.google.com/p/appengine-mapreduce/wiki/GettingStartedInJava">Java</a>. In addition, this <a href="http://www.google.com/events/io/2011/sessions/app-engine-mapreduce.html">2011 Google IO talk</a> includes many new useful MapReduce tips. Please leave any questions and comments below, and we will be happy to answer / discuss!</p>
]]></content:encoded>
			<wfw:commentRss>http://eng.pulse.me/backend-tips-google-app-engine-map-reduce/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Backend Tips &#8211; Google Cloud Storage</title>
		<link>http://eng.pulse.me/google-cloud-storage/</link>
		<comments>http://eng.pulse.me/google-cloud-storage/#comments</comments>
		<pubDate>Fri, 10 Aug 2012 01:18:44 +0000</pubDate>
		<dc:creator>Lili Dworkin</dc:creator>
				<category><![CDATA[Backend]]></category>

		<guid isPermaLink="false">http://eng.pulse.me/?p=144</guid>
		<description><![CDATA[Google App Engine&#8217;s datastore meets most of our backend storage needs, but we sometimes find ourselves limited by the maximum entity size of one megabyte. One option for storing larger files is to build a separate system on top of Amazon S3. A downside of this approach, however, is that we cannot take advantage of [...]]]></description>
				<content:encoded><![CDATA[<p>Google App Engine&#8217;s datastore meets most of our backend storage needs, but we sometimes find ourselves limited by the maximum entity size of one megabyte. One option for storing larger files is to build a separate system on top of <a href="http://aws.amazon.com/s3/" target="_blank">Amazon S3</a>. A downside of this approach, however, is that we cannot take advantage of Google&#8217;s edge cache, which acts as a <a href="http://eng.pulse.me/backend-tips-the-free-cdn/" target="_blank">free CDN</a>.</p>
<p>A second option is the new <a href="http://cloud.google.com/products/cloud-storage.html" target="_blank">Google Cloud Storage</a> service. Google Cloud Storage is the unofficial successor to the <a href="https://developers.google.com/appengine/docs/python/blobstore/" target="_blank">Google App Engine Blobstore</a>, and both services are built on the same underlying infrastructure. Yet unlike the Blobstore, which is bundled with App Engine, Google Cloud Storage is a standalone service for storing and managing data. As such, Cloud Storage is Google’s attempt to roll out an Infrastructure as a Service (IaaS) offering that can compete with Amazon S3.</p>
<h3>Getting Started</h3>
<p>In order to use Google Cloud Storage with App Engine, the first step is to grant your application access to your storage bucket. The <a href="https://developers.google.com/appengine/docs/python/googlestorage/overview" target="_blank">documentation</a> instructs you to add the application’s service account name (<a href="mailto:application-id@appspot.gserviceaccount.com" target="_blank">application-id@appspot.<wbr>gserviceaccount.com</wbr></a>) as a team member to your Google APIs console project.</p>
<p>However, since we created our project with a Google Apps account, this takes a bit more effort. Only users from our domain (<a href="mailto:xxx@yourdomain.com" target="_blank">xxx@yourdomain.com</a>) could be added to the team via the console. The <a href="http://stackoverflow.com/questions/8072436/google-cloud-storage-authentication-for-app-engine/8828849#8828849" target="_blank">solution</a> is to use the <a href="https://developers.google.com/storage/docs/gsutil" target="_blank">GSUtil command line tool</a> to edit the storage bucket’s Access Control List (ACL).</p>
<p>Run the following command to retrieve your bucket’s current ACL: <code class="codecolorer text default"><span class="text">gsutil getacl gs://bucketname &gt; acl.txt</span></code>. Then add an entry that looks like this:</p>
<div class="codecolorer-container xml default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:100%;"><div class="xml codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap"><span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;Entry<span style="color: #000000; font-weight: bold;">&gt;</span></span></span><br />
<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;Scope</span> <span style="color: #000066;">type</span>=<span style="color: #ff0000;">&quot;UserByEmail&quot;</span><span style="color: #000000; font-weight: bold;">&gt;</span></span><br />
<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;EmailAddress<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>application-id@appspot.gserviceaccount.com<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/EmailAddress<span style="color: #000000; font-weight: bold;">&gt;</span></span></span><br />
<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;Name<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>Service Account<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/Name<span style="color: #000000; font-weight: bold;">&gt;</span></span></span><br />
<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/Scope<span style="color: #000000; font-weight: bold;">&gt;</span></span></span><br />
<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;Permission<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>FULL_CONTROL<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/Permission<span style="color: #000000; font-weight: bold;">&gt;</span></span></span><br />
<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/Entry<span style="color: #000000; font-weight: bold;">&gt;</span></span></span></div></div>
<p>Finally, run this command to set the new ACL: <code class="codecolorer text default"><span class="text">gsutil setacl acl.txt gs://bucketname</span></code>.</p>
<h3>Storing Data</h3>
<p>Google provides an <a href="https://developers.google.com/appengine/docs/python/googlestorage/" target="_blank">experimental API</a> to integrate Cloud Storage with App Engine. This API allows for reading and writing of files to a storage bucket. While testing, I had already preloaded some test files into our bucket using the (barebones, but functional) <a href="https://developers.google.com/storage/docs/gsmanager" target="_blank">Cloud Storage Manager web application</a>. I could also have used the GSUtil tool.</p>
<p>Moving forward, we wanted to start loading files programmatically from within App Engine. The API documentation clearly explains how to create, write to, save, and read from Cloud Storage objects. Note that the function provided by the API to create a Google Cloud Storage object —<a href="https://developers.google.com/appengine/docs/python/googlestorage/functions#create" target="_blank">files.gs.create()</a> — takes a number of useful parameters. For instance, this is where you can specify the ACL and Cache-Control header for the object.</p>
<p>The documentation does not address the case in which the object you wish to save is a user upload. Storing uploaded files in a bucket can be accomplished using the Blobstore, as suggested by <a href="http://stackoverflow.com/questions/9237747/sending-images-to-google-cloud-storage-using-google-app-engine" target="_blank">this StackOverflow answer</a>. The <a href="https://gist.github.com/305322" target="_blank">blobstore_helper</a> module is useful for adapting this code for Django.  Simply replace <code class="codecolorer python default"><span class="python"><span style="color: #008000;">self</span>.<span style="color: black;">get_uploads</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">'file'</span><span style="color: black;">&#41;</span></span></code> with <code class="codecolorer python default"><span class="python">blobstore_helper.<span style="color: black;">get_uploads</span><span style="color: black;">&#40;</span>request<span style="color: #66cc66;">,</span> <span style="color: #483d8b;">'file'</span><span style="color: black;">&#41;</span></span></code> in order to retrieve the uploaded files.</p>
<h3>Serving Content</h3>
<p>The Cloud Storage API does not offer a way to serve files directly from a storage bucket. Instead, you can use the Blobstore API to create a url that points at your file.</p>
<p>First, generate a blob key for the Cloud Storage object using the Blobstore API’s <a href="https://developers.google.com/appengine/docs/python/blobstore/functions" target="_blank">create_gs_key()</a> function. Then serve the object as you would a traditional blobstore object. The <a href="https://developers.google.com/appengine/docs/python/blobstore/overview#Serving_a_Blob" target="_blank">example</a> given for the Blobstore Python API assumes use of Google’s webapp framework, which provides helper functions (such as <code class="codecolorer python default"><span class="python"><span style="color: #008000;">self</span>.<span style="color: black;">send_blob</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span></span></code>) that obscure the underlying implementation. This makes it a little tricky to understand how to port the code to a different framework, but once again the <a href="https://gist.github.com/305322" target="_blank">blobstore_helper</a> module offers some insight. The module defines its own <code class="codecolorer python default"><span class="python">send_blob</span></code> function, in which the key line of code is <code class="codecolorer python default"><span class="python">response<span style="color: black;">&#91;</span>blobstore.<span style="color: black;">BLOB_KEY_HEADER</span><span style="color: black;">&#93;</span> <span style="color: #66cc66;">=</span> <span style="color: #008000;">str</span><span style="color: black;">&#40;</span>blob_key<span style="color: black;">&#41;</span></span></code>. Essentially, if you put a special header in the response containing the blob key, then App Engine will automatically fill the body of the response with the content of the blob.</p>
<p>To properly serve the blob, it is also necessary to set a correct Content-Type header for the response. Although the Cloud Storage REST API does support <a href="https://developers.google.com/storage/docs/reference-methods#headobject" target="_blank">retrieving an object’s metadata</a>, it seems that the API for App Engine does not. Currently, we rely on Python’s <a href="http://docs.python.org/library/mimetypes.html" target="_blank">mimetypes</a> module, which can guess content type from a filename: <code class="codecolorer python default"><span class="python">response<span style="color: black;">&#91;</span><span style="color: #483d8b;">'Content-Type'</span><span style="color: black;">&#93;</span> <span style="color: #66cc66;">=</span> <span style="color: #dc143c;">mimetypes</span>.<span style="color: black;">guess_type</span><span style="color: black;">&#40;</span>filename<span style="color: black;">&#41;</span><span style="color: black;">&#91;</span><span style="color: #ff4500;">0</span><span style="color: black;">&#93;</span></span></code>.</p>
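<p>One caveat is that <code>mimetypes.guess_type()</code> returns <code>None</code> as the type when it does not recognize the filename. A small sketch of a wrapper with a fallback; the helper name and the choice of <code>application/octet-stream</code> as the default are assumptions for illustration:</p>

```python
import mimetypes


def guess_content_type(filename, default="application/octet-stream"):
    """Guess a Content-Type from the filename, falling back to a default."""
    content_type = mimetypes.guess_type(filename)[0]
    return content_type or default


png_type = guess_content_type("photo.png")
unknown_type = guess_content_type("README")
```

<p>The result can then be assigned to <code>response['Content-Type']</code> without risking an empty header for files that have no recognizable extension.</p>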
<p>An alternative approach to serving files from Cloud Storage, which applies to images only, is to use App Engine’s Image API. As of App Engine version 1.7.0, it is possible to use the <a href="https://developers.google.com/appengine/docs/python/images/functions" target="_blank">get_serving_url()</a> function with Cloud Storage objects. Simply generate the blob key as before and pass it to this function to generate a url for the image. One benefit of this approach is that the serving url supports cropping and resizing on the fly through optional parameters.</p>
<p>We will continue to investigate the best practices for using Google Cloud Storage with App Engine as a service for storing and serving large files. For others who might be interested, there was a helpful session at Google IO, entitled <a href="http://www.google.com/events/io/2011/sessions/storing-your-application-s-data-in-the-google-cloud.html" target="_blank">Storing Your Application&#8217;s Data in the Google Cloud</a>, that covers the basics of this new service. Of course, there are other options to consider as well, such as the Blobstore or Amazon S3. It remains to be seen which service will best meet our needs, but we&#8217;re glad that there is now a strong option on the Google side.</p>
]]></content:encoded>
			<wfw:commentRss>http://eng.pulse.me/google-cloud-storage/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Backend Tips &#8211; The Free CDN</title>
		<link>http://eng.pulse.me/backend-tips-the-free-cdn/</link>
		<comments>http://eng.pulse.me/backend-tips-the-free-cdn/#comments</comments>
		<pubDate>Wed, 08 Aug 2012 00:33:25 +0000</pubDate>
		<dc:creator>Greg Bayer</dc:creator>
				<category><![CDATA[Backend]]></category>
		<category><![CDATA[App Engine]]></category>
		<category><![CDATA[Caching]]></category>
		<category><![CDATA[CDN]]></category>
		<category><![CDATA[Scalability]]></category>

		<guid isPermaLink="false">http://eng.pulse.me/?p=135</guid>
		<description><![CDATA[New Blog Post Series This is the first in a series of blog posts in which we will offer a peek into some of the challenges we tackle on the Backend Team and discuss some tips and tricks we have discovered. These posts will focus on the ways in which we use GAE and AWS [...]]]></description>
				<content:encoded><![CDATA[<h3><a href="http://eng.pulse.me/wp-content/uploads/2012/08/eng_tips.png"><img class="alignleft  wp-image-141" title="Pulse Backend Engineering Tips" src="http://eng.pulse.me/wp-content/uploads/2012/08/eng_tips.png" alt="" width="170" height="169" /></a>New Blog Post Series</h3>
<p><em>This is the first in a series of blog posts in which we will offer a peek into some of the challenges we tackle on the <a href="http://eng.pulse.me/category/backend" target="_blank">Backend Team</a> and discuss some tips and tricks we have discovered. These posts will focus on the ways in which we use <a href="https://developers.google.com/appengine/" target="_blank">GAE</a> and <a href="http://aws.amazon.com/" target="_blank">AWS</a> to build simple features that have helped us to deliver an amazing product. We plan to dive a little deeper into topics we&#8217;ve covered before and to highlight some new ones. Upcoming topics will include <a href="https://developers.google.com/appengine/docs/python/dataprocessing/overview" target="_blank">GAE MapReduce</a>, <a href="http://redis.io/" target="_blank">Redis</a>, <a href="https://cloud.google.com/products/cloud-storage.html" target="_blank">Google Cloud Storage</a>, and duplicate detection via <a href="http://en.wikipedia.org/wiki/Tf*idf" target="_blank">TF-IDF</a>. Our first entry in the series discusses how to use Google’s edge cache as a free content delivery network (CDN).</em></p>
<h3 dir="ltr">The Free CDN</h3>
<p>At the end of last year, we briefly mentioned Google’s edge cache as a useful feature in our <a href="http://googleappengine.blogspot.com/2011/11/scaling-with-kindle-fire.html" target="_blank">guest post</a> on the <a href="http://googleappengine.blogspot.com/" target="_blank">App Engine blog</a>. Since this is one of our favorite services, I’d like to take a few minutes to explain it in more detail. It is an extremely simple feature that can significantly reduce content serving latency and save money compared with other CDNs. Hopefully it will be clear by the end of this post why you should think about using it for your next project.</p>
<h3 dir="ltr">Content Delivery Networks</h3>
<p><a href="http://en.wikipedia.org/wiki/Content_delivery_network" target="_blank">Content Delivery Networks (CDNs)</a> offer several benefits that are typically desired for both web and mobile apps. They are designed to cache content on many geographically distributed servers, as close to the end user as possible, thereby minimizing latency for requests to the cached content. There are several major CDN providers; the big ones that come to mind are <a href="http://www.akamai.com/">Akamai</a> and <a href="http://aws.amazon.com/cloudfront/" target="_blank">Amazon&#8217;s CloudFront</a>. CDNs vary in quality and price, but generally one should expect to pay a <a href="http://aws.amazon.com/cloudfront/pricing/" target="_blank">premium</a> for this type of service.</p>
<h3 dir="ltr">Google&#8217;s Edge Cache (a.k.a. CDN)</h3>
<p>It turns out that if you&#8217;re using <a href="https://developers.google.com/appengine/">Google App Engine</a> (or other Google services like the newly announced <a href="http://cloud.google.com/products/cloud-storage.html" target="_blank">Google Cloud Storage</a>) and you configure things correctly, you get the same service for free. By simply setting public cache control headers wherever possible, you allow Google’s edge caches to serve unchanged content directly to users. Here&#8217;s an example of a set of response headers that will activate the cache:</p>
<div class="codecolorer-container python default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:100%;"><div class="python codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">&nbsp;Cache-Control: public<span style="color: #66cc66;">,</span> max-age<span style="color: #66cc66;">=</span><span style="color: #ff4500;">900</span><span style="color: #66cc66;">,</span> must-revalidate</div></div>
<p>The most important component of the header is the word &#8216;public&#8217;. It tells Google&#8217;s network that the content in this response is not specific to a particular user or private in any way, so it&#8217;s safe to cache it as aggressively as possible. &#8216;max-age&#8217; sets how long, in seconds, a cached copy may be served before it is refreshed from your servers (900 seconds is 15 minutes), and &#8216;must-revalidate&#8217; tells any cache, whether Google&#8217;s or the client&#8217;s, to strictly follow this timeout rather than serve stale content.</p>
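<p>To make the header above concrete, here is a minimal sketch of a handler that emits it. It is written as a plain WSGI application so that it is self-contained. The names <code>cache_control</code> and <code>application</code> are illustrative assumptions, not code from our servers, and whether a given response is actually served from the edge cache still depends on Google&#8217;s infrastructure.</p>

```python
def cache_control(max_age_seconds, public=True, must_revalidate=True):
    """Build a Cache-Control header value from its directives."""
    directives = ["public" if public else "private",
                  "max-age=%d" % max_age_seconds]
    if must_revalidate:
        directives.append("must-revalidate")
    return ", ".join(directives)


def application(environ, start_response):
    """Hypothetical WSGI handler for content that is the same for every user."""
    body = b"content that is identical for every user"
    headers = [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
        # 900 seconds: caches may serve this response for up to 15 minutes.
        ("Cache-Control", cache_control(900)),
    ]
    start_response("200 OK", headers)
    return [body]
```

<p>Responses that vary per user should not be marked &#8216;public&#8217;. In this sketch, passing <code>public=False</code> emits &#8216;private&#8217; instead, which tells shared caches not to store the response.</p>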
<p>This technique has been mentioned in at least one <a href="http://www.google.com/events/io/2011/sessions/scaling-app-engine-applications.html">Google I/O talk</a>, but for some reason hasn&#8217;t been widely publicized. Because of the scale of Google&#8217;s network, this is perhaps the best CDN available. Best of all, there is no cost for this caching. It&#8217;s actually a win for both you and Google, since it minimizes the traffic that has to cross their internal networks and servers.</p>
<p>At <a href="http://www.pulse.me/">Pulse</a> we use this feature very heavily. It lets us serve high-quality, mobile-optimized images with under 50 ms of latency, while also saving us lots of App Engine instance hours by preventing these requests from hitting our frontend servers. As you can see from the graph below, for this particular App Engine app, we are serving the majority of requests out of Google&#8217;s edge cache (shown in red). I encourage you to try it out. It&#8217;s almost too easy to be true! If you have questions, feel free to leave comments below or ping me <a href="https://twitter.com/gregbayer">@gregbayer</a>.</p>
<div><a href="http://eng.pulse.me/wp-content/uploads/2012/07/google_edge_cache.png"><img class="aligncenter  wp-image-136" title="google_edge_cache" src="http://eng.pulse.me/wp-content/uploads/2012/07/google_edge_cache.png" alt="" width="625" height="167" /></a></div>
]]></content:encoded>
			<wfw:commentRss>http://eng.pulse.me/backend-tips-the-free-cdn/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
	</channel>
</rss>

<!-- Dynamic page generated in 1.835 seconds. -->
<!-- Cached page generated by WP-Super-Cache on 2013-05-19 14:05:08 -->
