Random Thoughts About Caching

Imagine for a moment that there is a place where the results of all non-trivial computations are kept. Kind of a universal cache with unlimited capacity. To access this cache and get the result of a desired computation, you specify the computation type and the problem set. For example, the computation type may be sorting, and the problem set the collection of numbers you wish to sort.
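As a rough sketch of what "specifying the computation type and the problem set" could look like, the cache key might simply be a hash of both. Everything here (the function name, the serialization choice) is an assumption for illustration, not part of any real service:

```python
import hashlib
import json

def cache_key(computation_type, problem_set):
    # Serialize the request in a canonical form so that the same
    # computation over the same problem set always maps to the same key.
    payload = json.dumps({"type": computation_type, "input": problem_set},
                         sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

# Example: the key for sorting a particular collection of numbers.
key = cache_key("sort", [5, 3, 8, 1])
```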

Let's further assume that this cache isn't local to the computer that needs the result. In that case, even finding out whether the result of your problem exists costs quite a lot, and you have to weigh whether it's worth trying the cache, possibly getting a cache miss, and computing the result yourself anyway. Considering the amount of storage required to store everything, and the time it would take to send the problem set, access the data and send the result back, most problems aren't good candidates for such a universal cache. Today.
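One rough way to frame that trade-off: trying the cache pays off only when the lookup cost is smaller than the hit probability times the local compute time. The sketch below assumes the simplest model, where a miss means you pay the lookup and then still do the full local computation; the numbers in the example are made up:

```python
def worth_trying_cache(t_lookup, t_compute, p_hit):
    # Expected cost if we try the cache first: we always pay the lookup,
    # and on a miss (probability 1 - p_hit) we still compute locally.
    expected_with_cache = t_lookup + (1 - p_hit) * t_compute
    # Worth it only if that beats just computing the result ourselves,
    # which simplifies to: t_lookup < p_hit * t_compute.
    return expected_with_cache < t_compute

# Example: a 200 ms lookup is worthwhile for a 10 s computation
# even with only a 30% hit rate.
print(worth_trying_cache(t_lookup=0.2, t_compute=10.0, p_hit=0.3))  # True
```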

However, storage space is becoming cheaper and network speeds keep rising, so more and more problems will benefit from a universal cache service like this. Creating such a service isn't too hard. It can be implemented, for example, using Amazon's Simple Storage Service (S3) for result storage. The universal cache can be self-learning: if there is a high percentage of cache misses, it can enlarge its storage space automatically. In addition, you can use Amazon's Elastic Compute Cloud (EC2) to compute the cache misses instead of performing the calculations yourself, so when the miss rate is high you may choose to add more machines to the grid.
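A minimal sketch of what the S3-backed lookup and store could look like, assuming boto3 and a hypothetical bucket name; the EC2 fallback for misses and the self-tuning logic are left out:

```python
import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")
BUCKET = "universal-cache-results"  # hypothetical bucket name

def lookup(key):
    # Return the cached result bytes, or None on a cache miss.
    try:
        obj = s3.get_object(Bucket=BUCKET, Key=key)
        return obj["Body"].read()
    except ClientError as err:
        if err.response["Error"]["Code"] in ("NoSuchKey", "404"):
            return None  # miss: compute locally or hand off to EC2
        raise

def store(key, result_bytes):
    # Publish a freshly computed result so later callers get a hit.
    s3.put_object(Bucket=BUCKET, Key=key, Body=result_bytes)
```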