improving bloggart regeneration performance with memcache-based memoization

Sat, 3 Jul 2010 13:07:27 +0800 | Filed under bloggart

If you've been following my gists and my bloggart fork, then you're probably aware that some time back, I added a simple memcache-based memoizer. I applied it to the BlogPost operations that I thought were computationally expensive - hashing and markup rendering.
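For readers who haven't seen the memoizer, here is a minimal sketch of the general idea, not the actual code from my fork. The names `DictCache`, `memoize` and `render_markup` are made up for illustration, and the dict-backed cache is only a stand-in so the snippet runs outside App Engine. On GAE you would call `memcache.get` and `memcache.set` instead.

```python
import functools
import hashlib
import pickle


class DictCache:
    """In-process stand-in for a memcache client (assumption, for illustration)."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        # Returns None on a miss, like a memcache-style get.
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value

    def flush_all(self):
        # Clearing the cache gives a cold "control" run.
        self._data.clear()


cache = DictCache()


def memoize(prefix):
    """Cache a function's result under a key derived from its arguments."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args):
            # Hash the pickled arguments to get a short, safe cache key.
            digest = hashlib.sha1(pickle.dumps(args)).hexdigest()
            key = "%s:%s" % (prefix, digest)
            hit = cache.get(key)
            if hit is not None:
                return hit
            # Miss: compute, then store. A None result is never treated
            # as a hit, so it is recomputed on every call.
            result = func(*args)
            cache.set(key, result)
            return result
        return wrapper
    return decorator


calls = []  # records every real (uncached) computation


@memoize("render")
def render_markup(body):
    # Hypothetical stand-in for an expensive markup rendering step.
    calls.append(body)
    return "<p>%s</p>" % body


render_markup("hello")
render_markup("hello")  # served from the cache, no second computation
print(len(calls))
```

The first call pays for both the computation and the cache write. Later calls with the same arguments only pay for the key hash and the cache lookup.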

Of course, you can't call something an optimization if you don't have the numbers to show, can you?

Here's the (pseudo-)methodology I used:

1. write a new post
2. write another new post
3. delete a post
4. delete another post

Each new post/post deletion triggers a regeneration, so it gives the memoization code path a chance to shine.

To help with stat collection, I wrote a simple Python command-line script to parse the GAE logs for stats, as well as some code to clear the memoizer cache as a "control" aid.

Here are the numbers without memoization:

time: 5830ms; cpu: 5105ms; api: 1568ms
time: 5838ms; cpu: 5089ms; api: 1749ms
time: 5903ms; cpu: 6640ms; api: 1568ms
time: 4773ms; cpu: 2632ms; api: 1273ms

With memoization:

time: 10533ms; cpu: 5163ms; api: 1568ms
time: 7535ms; cpu: 2777ms; api: 1691ms
time: 5096ms; cpu: 3880ms; api: 1568ms
time: 3491ms; cpu: 2607ms; api: 1541ms

Notice how the first step has a high cpu time (5163ms) - we are doing the first computation, plus some extra work to cache the result. In the second step, we can leverage the cached results, so we get a much lower cpu time: 2777ms, versus 5089ms for the same step without memoization.
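To make the comparison easier to eyeball, here is a small sketch that parses the stat lines exactly as printed in this post and computes the per-step cpu difference. It is not the log-parsing script mentioned earlier. The regex only assumes the `time: ...; cpu: ...; api: ...` format shown above, and the names `parse` and `cpu_deltas` are made up for illustration.

```python
import re

# Matches one stat line in the format used in this post.
LINE_RE = re.compile(r"time: (\d+)ms; cpu: (\d+)ms; api: (\d+)ms")

WITHOUT = """
time: 5830ms; cpu: 5105ms; api: 1568ms
time: 5838ms; cpu: 5089ms; api: 1749ms
time: 5903ms; cpu: 6640ms; api: 1568ms
time: 4773ms; cpu: 2632ms; api: 1273ms
"""

WITH = """
time: 10533ms; cpu: 5163ms; api: 1568ms
time: 7535ms; cpu: 2777ms; api: 1691ms
time: 5096ms; cpu: 3880ms; api: 1568ms
time: 3491ms; cpu: 2607ms; api: 1541ms
"""


def parse(text):
    """Return a list of (time, cpu, api) tuples in milliseconds."""
    return [tuple(int(g) for g in m.groups()) for m in LINE_RE.finditer(text)]


def cpu_deltas(baseline, memoized):
    """Per-step cpu difference; negative means memoization was cheaper."""
    return [m[1] - b[1] for b, m in zip(baseline, memoized)]


deltas = cpu_deltas(parse(WITHOUT), parse(WITH))
print(deltas)  # [58, -2312, -2760, -25]
```

The first step costs slightly more cpu with memoization because it also fills the cache. Steps two and three come out more than two seconds cheaper, and the last step is roughly a wash.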