Improving bloggart regeneration performance with memcache-based memoization
If you've been following my gists and my bloggart fork, then you're probably aware that some time back I added a simple memcache-based memoizer. I applied it to the BlogPost operations that I thought were computationally expensive: hashing and markup rendering.
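For readers who have not seen the memoizer, here is a minimal sketch of the general idea. It is not the code from the fork. The names `memoize`, `DictCache` and `render_markup` are hypothetical, and `DictCache` is an in-memory stand-in so the example runs outside App Engine. On GAE, the `google.appengine.api.memcache` module offers `get`, `set` and `flush_all` functions that could presumably fill the same role.

```python
import functools
import hashlib
import pickle


class DictCache(object):
    """In-memory stand-in (hypothetical) with a memcache-like get/set/flush_all."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        # Returns None on a miss, like a memcache client would.
        return self._data.get(key)

    def set(self, key, value, time=0):
        # The expiry time is accepted but ignored by this stand-in.
        self._data[key] = value
        return True

    def flush_all(self):
        # Drop every cached entry, e.g. before a "control" run.
        self._data.clear()
        return True


def memoize(cache, prefix):
    """Cache a function's result under a key derived from its arguments."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            raw = pickle.dumps((args, sorted(kwargs.items())))
            key = "%s:%s" % (prefix, hashlib.sha1(raw).hexdigest())
            cached = cache.get(key)
            if cached is not None:
                return cached
            # Cache miss: compute, store, return. A None result is never
            # treated as a hit, so it is recomputed on every call.
            result = func(*args, **kwargs)
            cache.set(key, result)
            return result
        return wrapper
    return decorator


cache = DictCache()
calls = []  # records every real (non-cached) rendering


@memoize(cache, "render")
def render_markup(body):
    # Placeholder for an expensive rendering step.
    calls.append(body)
    return "<p>%s</p>" % body
```

Calling `render_markup` twice with the same body should only do the real work once. Calling `cache.flush_all()` forces the next call to recompute.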
Of course, you can't call something an optimization if you don't have the numbers to show, can you?
Here's the (pseudo-)methodology I used:
- write a new post
- write another new post
- delete a post
- delete another post
Each new post/post deletion triggers a regeneration, so it gives the memoization code path a chance to shine.
To help with stat collection, I wrote a simple Python command-line script that parses the GAE logs for stats, as well as some code to clear the memoizer cache as a "control" aid.
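A rough sketch of what such a log parser might look like follows. It is not the actual script. It assumes each request log line carries its timings as `ms=<n> cpu_ms=<n> api_cpu_ms=<n>`, so adjust the pattern if your logs look different. The function names and the sample line are made up for illustration.

```python
import re

# Assumed field layout of a request log line: ms=, cpu_ms=, api_cpu_ms=.
STATS_RE = re.compile(r"\bms=(\d+)\s+cpu_ms=(\d+)\s+api_cpu_ms=(\d+)")


def parse_stats(lines):
    """Extract timing stats from every log line that contains them."""
    stats = []
    for line in lines:
        match = STATS_RE.search(line)
        if match:
            time_ms, cpu_ms, api_ms = (int(g) for g in match.groups())
            stats.append({"time": time_ms, "cpu": cpu_ms, "api": api_ms})
    return stats


def format_stats(entry):
    """Render one entry in the 'time: ...; cpu: ...; api: ...' style."""
    return "time: %dms; cpu: %dms; api: %dms" % (
        entry["time"], entry["cpu"], entry["api"])


# Hypothetical sample input: one request line and one unrelated line.
sample_lines = [
    '"POST /example HTTP/1.1" 200 ms=120 cpu_ms=80 api_cpu_ms=30',
    "some unrelated log message",
]
parsed = parse_stats(sample_lines)
```

Lines without the three timing fields are skipped, so the script can be pointed at a full log dump.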
Here are the numbers without memoization:
- time: 5830ms; cpu: 5105ms; api: 1568ms
- time: 5838ms; cpu: 5089ms; api: 1749ms
- time: 5903ms; cpu: 6640ms; api: 1568ms
- time: 4773ms; cpu: 2632ms; api: 1273ms
With memoization:
- time: 10533ms; cpu: 5163ms; api: 1568ms
- time: 7535ms; cpu: 2777ms; api: 1691ms
- time: 5096ms; cpu: 3880ms; api: 1568ms
- time: 3491ms; cpu: 2607ms; api: 1541ms
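Notice how the first step with memoization still has a high cpu time (5163ms, about the same as the 5105ms without memoization): we are doing the first computation, plus some work to cache the result. In the second step, we can leverage the cached results, so cpu time drops to 2777ms, compared with 5089ms without memoization.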
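To make the comparison easier to read, here is a small sketch that computes the per-step cpu difference from the numbers listed above. The variable and function names are just for illustration, and the "step N" labels assume the measurements are listed in the same order as the four methodology steps.

```python
# cpu times (ms) copied from the two lists above, in step order.
WITHOUT_MEMO_CPU = [5105, 5089, 6640, 2632]
WITH_MEMO_CPU = [5163, 2777, 3880, 2607]


def cpu_savings(without, with_memo):
    """Return (saved_ms, saved_percent) per step; negative means slower."""
    savings = []
    for before, after in zip(without, with_memo):
        saved = before - after
        savings.append((saved, 100.0 * saved / before))
    return savings


savings = cpu_savings(WITHOUT_MEMO_CPU, WITH_MEMO_CPU)
for step, (saved, percent) in enumerate(savings, 1):
    print("step %d: saved %dms cpu (%.1f%%)" % (step, saved, percent))

# Total cpu time saved across all four steps.
total_saved = sum(WITHOUT_MEMO_CPU) - sum(WITH_MEMO_CPU)
```

On these four samples, the cpu savings show up mostly in the second and third steps. The first and last steps are within a few dozen milliseconds of each other.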