I just started Python and I've got no idea what memoization is and how to use it. Also, may I have a simplified example?
-
295When the second sentence of the relevant wikipedia article contains the phrase "mutually-recursive descent parsing[1] in a general top-down parsing algorithm[2][3] that accommodates ambiguity and left recursion in polynomial time and space," I think it is entirely appropriate to ask SO what is going on.Clueless– Clueless2010-01-02 14:17:35 +00:00Commented Jan 2, 2010 at 14:17
-
14@Clueless: That phrase is preceded by "Memoization has also been used in other contexts (and for purposes other than speed gains), such as in". So it's just a list of examples (and need not be understood); it's not part of the explanation of memoization.ShreevatsaR– ShreevatsaR2014-04-04 06:12:03 +00:00Commented Apr 4, 2014 at 6:12
-
3New link to pdf file, since pycogsci.info is down: people.ucsc.edu/~abrsvn/NLTK_parsing_demos.pdfStefan Gruenwald– Stefan Gruenwald2014-12-05 20:08:43 +00:00Commented Dec 5, 2014 at 20:08
-
1seeing how many people answered and are still answering this question makes be a believer in the "BIKE SHED EFFECT" en.wikipedia.org/wiki/Law_of_trivialityA_P– A_P2019-01-05 19:41:37 +00:00Commented Jan 5, 2019 at 19:41
-
1@A_P actually, at the time you wrote that, all but one of the 13 answers were 5 years old (2014), and the most recent one 3 years old (2016). Not sure that counts as "are still answering". The answer I posted just now adds speed considerations that I didn't see in other answers yet and does not implement a new method or anything. There are certainly examples of the phenomenon you're describing but I'm not sure this is it.Luc– Luc2022-03-23 11:25:10 +00:00Commented Mar 23, 2022 at 11:25
15 Answers
Memoization effectively refers to remembering ("memoization" → "memorandum" → to be remembered) results of method calls based on the method inputs and then returning the remembered result rather than computing the result again. You can think of it as a cache for method results. For further details, see page 387 for the definition in Introduction To Algorithms (3e), Cormen et al.
A simple example for computing factorials using memoization in Python would be something like this:
factorial_memo = {}
def factorial(k):
if k < 2: return 1
if k not in factorial_memo:
factorial_memo[k] = k * factorial(k-1)
return factorial_memo[k]
You can get more complicated and encapsulate the memoization process into a class:
class Memoize:
def __init__(self, f):
self.f = f
self.memo = {}
def __call__(self, *args):
if not args in self.memo:
self.memo[args] = self.f(*args)
#Warning: You may wish to do a deepcopy here if returning objects
return self.memo[args]
Then:
def factorial(k):
if k < 2: return 1
return k * factorial(k - 1)
factorial = Memoize(factorial)
A feature known as "decorators" was added in Python 2.4 which allow you to now simply write the following to accomplish the same thing:
@Memoize
def factorial(k):
if k < 2: return 1
return k * factorial(k - 1)
The Python Decorator Library has a similar decorator called memoized that is slightly more robust than the Memoize class shown here.
19 Comments
factorial_memo, because the factorial inside def factorial still calls the old unmemoize factorial.if k not in factorial_memo:, which reads better than if not k in factorial_memo:.args is a tuple. def some_function(*args) makes args a tuple.functools.cache decorator:
Python 3.9 released a new function functools.cache. It caches in memory the result of a function called with a particular set of arguments, which is memoization. It's easy to use:
import functools
import time
@functools.cache
def calculate_double(num):
time.sleep(1) # sleep for 1 second to simulate a slow calculation
return num * 2
The first time you call caculate_double(5), it will take a second and return 10. The second time you call the function with the same argument calculate_double(5), it will return 10 instantly.
Adding the cache decorator ensures that if the function has been called recently for a particular value, it will not recompute that value, but use a cached previous result. In this case, it leads to a tremendous speed improvement, while the code is not cluttered with the details of caching.
(Edit: the previous example calculated a fibonacci number using recursion, but I changed the example to prevent confusion, hence the old comments.)
functools.lru_cache decorator:
If you need to support older versions of Python, functools.lru_cache works in Python 3.2+. By default, it only caches the 128 most recently used calls, but you can set the maxsize to None to indicate that the cache should never expire:
@functools.lru_cache(maxsize=None)
def calculate_double(num):
# etc
9 Comments
fib is called, it will need to recur down to the base case before memoization can happen. So, your behavior is just about expected.functools.cache which is (in cpython at least) a wrapper for lru_cache(maxsize=None) but with a shorter name.The other answers cover what it is quite well. I'm not repeating that. Just some points that might be useful to you.
Usually, memoisation is an operation you can apply on any function that computes something (expensive) and returns a value. Because of this, it's often implemented as a decorator. The implementation is straightforward and it would be something like this
memoised_function = memoise(actual_function)
or expressed as a decorator
@memoise
def actual_function(arg1, arg2):
#body
Comments
I've found this extremely useful
from functools import wraps
def memoize(function):
memo = {}
@wraps(function)
def wrapper(*args):
# add the new key to dict if it doesn't exist already
if args not in memo:
memo[args] = function(*args)
return memo[args]
return wrapper
@memoize
def fibonacci(n):
if n < 2: return n
return fibonacci(n - 1) + fibonacci(n - 2)
fibonacci(25)
4 Comments
functools.wraps.memo so that memory is freed?Let's not forget the built-in hasattr function, for those who want to hand-craft. That way you can keep the mem cache inside the function definition (as opposed to a global).
def fact(n):
if not hasattr(fact, 'mem'):
fact.mem = {1: 1}
if not n in fact.mem:
fact.mem[n] = n * fact(n - 1)
return fact.mem[n]
1 Comment
Memoization is keeping the results of expensive calculations and returning the cached result rather than continuously recalculating it.
Here's an example:
def doSomeExpensiveCalculation(self, input):
if input not in self.cache:
<do expensive calculation>
self.cache[input] = result
return self.cache[input]
A more complete description can be found in the wikipedia entry on memoization.
Comments
Memoization is basically saving the results of past operations done with recursive algorithms in order to reduce the need to traverse the recursion tree if the same calculation is required at a later stage.
see http://scriptbucket.wordpress.com/2012/12/11/introduction-to-memoization/
Fibonacci Memoization example in Python:
fibcache = {}
def fib(num):
if num in fibcache:
return fibcache[num]
else:
fibcache[num] = num if num < 2 else fib(num-1) + fib(num-2)
return fibcache[num]
1 Comment
Memoization is the conversion of functions into data structures. Usually one wants the conversion to occur incrementally and lazily (on demand of a given domain element--or "key"). In lazy functional languages, this lazy conversion can happen automatically, and thus memoization can be implemented without (explicit) side-effects.
Comments
Well I should answer the first part first: what's memoization?
It's just a method to trade memory for time. Think of Multiplication Table.
Using mutable object as default value in Python is usually considered bad. But if use it wisely, it can actually be useful to implement a memoization.
Here's an example adapted from http://docs.python.org/2/faq/design.html#why-are-default-values-shared-between-objects
Using a mutable dict in the function definition, the intermediate computed results can be cached (e.g. when calculating factorial(10) after calculate factorial(9), we can reuse all the intermediate results)
def factorial(n, _cache={1:1}):
try:
return _cache[n]
except IndexError:
_cache[n] = factorial(n-1)*n
return _cache[n]
Comments
Here is a solution that will work with list or dict type arguments without whining:
def memoize(fn):
"""returns a memoized version of any function that can be called
with the same list of arguments.
Usage: foo = memoize(foo)"""
def handle_item(x):
if isinstance(x, dict):
return make_tuple(sorted(x.items()))
elif hasattr(x, '__iter__'):
return make_tuple(x)
else:
return x
def make_tuple(L):
return tuple(handle_item(x) for x in L)
def foo(*args, **kwargs):
items_cache = make_tuple(sorted(kwargs.items()))
args_cache = make_tuple(args)
if (args_cache, items_cache) not in foo.past_calls:
foo.past_calls[(args_cache, items_cache)] = fn(*args,**kwargs)
return foo.past_calls[(args_cache, items_cache)]
foo.past_calls = {}
foo.__name__ = 'memoized_' + fn.__name__
return foo
Note that this approach can be naturally extended to any object by implementing your own hash function as a special case in handle_item. For example, to make this approach work for a function that takes a set as an input argument, you could add to handle_item:
if is_instance(x, set):
return make_tuple(sorted(list(x)))
5 Comments
list argument of [1, 2, 3] can mistakenly be considered the same as a different set argument with a value of {1, 2, 3}. In addition, sets are unordered like dictionaries, so they would also need to be sorted(). Also note that a recursive data structure argument would cause an infinite loop.lists and sets are "tupleized" into the same thing and therefore become indistinguishable from one another. The example code for adding support for sets described in your latest update doesn't avoid that I'm afraid. This can easily be seen by separately passing [1,2,3] and {1,2,3} as an argument to a "memoize"d test function and seeing whether it's called twice, as it should be, or not.lists and dicts because it's possible for a list to have exactly the same thing in it that resulted from calling make_tuple(sorted(x.items())) for a dictionary. A simple solution for both cases would be to include the type() of value in the tuple generated. I can think of an even simpler way specifically to handle sets, but it doesn't generalize.Solution that works with both positional and keyword arguments independently of order in which keyword args were passed (using inspect.getargspec):
import inspect
import functools
def memoize(fn):
cache = fn.cache = {}
@functools.wraps(fn)
def memoizer(*args, **kwargs):
kwargs.update(dict(zip(inspect.getargspec(fn).args, args)))
key = tuple(kwargs.get(k, None) for k in inspect.getargspec(fn).args)
if key not in cache:
cache[key] = fn(**kwargs)
return cache[key]
return memoizer
Similar question: Identifying equivalent varargs function calls for memoization in Python
Comments
Just wanted to add to the answers already provided, the Python decorator library has some simple yet useful implementations that can also memoize "unhashable types", unlike functools.lru_cache.
1 Comment
If speed is a consideration:
@functools.cacheand@functools.lru_cache(maxsize=None)are equally fast, taking 0.122 seconds (best of 15 runs) to loop a million times on my system- a global cache variable is quite a lot slower, taking 0.180 seconds (best of 15 runs) to loop a million times on my system
- a
self.cacheclass variable is a bit slower still, taking 0.214 seconds (best of 15 runs) to loop a million times on my system
The latter two are implemented similar to how it is described in the currently top-voted answer.
This is without memory exhaustion prevention, i.e. I did not add code in the class or global methods to limit that cache's size, this is really the barebones implementation. The lru_cache method has that for free, if you need this.
One open question for me would be how to unit test something that has a functools decorator. Is it possible to empty the cache somehow? Unit tests seem like they would be cleanest using the class method (where you can instantiate a new class for each test) or, secondarily, the global variable method (since you can do yourimportedmodule.cachevariable = {} to empty it).
1 Comment
Suppose you know you have a recursive function and it has multiple recursive paths. And suppose the parameters are i and j. And suppose function name is func.
mem = [][]
def func(i,j):
#basic return value code without function calling itself
if something1:
return func(i+1,j) or func(i,j+2)
elif something2:
return func(i+1,j+1)
else:
return func(i,j+2)
Recursive paths.
func(i+1,j)
func(i+1,j+1)
func(i,j+2)
You can see that if you go through 1 st path twice and 3rd path once you come to a situation where current i and j values equals to previous respective values +2. (i+2,j+2).
Then you can also see that if you go through 2nd path twice it leads to the same situation as i+2 and j+2.
You can get the basic idea that this scenario will lead to calculation of function for same parameters twice.
And we can't guarantee that which takes first place.
So you can make a list or some sort of storing mechanism to store that calculation. like mem[i,j] = calculated value if mem[i,j] = none or false or something. You get the idea.
This is my general idea of memoization. May be you can see some sort of patterns in you code other than this. And there may be situations you can guarantee that which paths takes place first according to your logic. So you can guess that there will be situations where this will not occur even for multiple recursive calls. And also there may be different ways ,recursive paths to have same parameters like the above situation(in this case integer addition)