The persist decorator

The most important feature of pypersist is the persist decorator. As shown in the Examples section, you can use it by simply writing @persist above any function you wish to memoise.

persist can be used without any arguments, in which case it uses sane, conservative defaults. However, it can be customised in various ways using optional arguments, as follows.

pypersist.persist(func=None, cache='file://persist/', funcname=None, key=None, storekey=False, pickle=<function pickle>, unpickle=<function unpickle>, hash=<function hash>, unhash=None, metadata=None, verbosity=1)

Function decorator for persistent memoisation

Store the output of a function permanently, and use previously stored results instead of recomputing them.

To use this, decorate the desired function with @persist. To customise the way this memoisation is done, decorate with @persist(<args>) instead and specify custom parameters.

You can even use this decorator for methods in a class. However, since it may be difficult to pickle a class instance, you may wish to specify a custom key function.

Parameters:
  • cache (str, optional) – The address of the cache in which the outputs of this function should be stored. If it starts with “file://”, then the remainder of the string should be a path to the directory on the local file system in which the results will be stored; this may be a relative path, and the directory will be created if it does not exist. If it starts with “mongodb://” then the remainder of the string should be the URL of the pypersist MongoDB server in which the results will be stored. If it does not contain “://” then “file://” will be added at the beginning. Default is “file://persist/”.
  • funcname (str, optional) – A string that uniquely describes this function. If the same cache is used for several memoised functions, they should all have different funcname values. Default is the name of the function.
  • key (function(args -> object), optional) – Function that takes the arguments given to the memoised function, and returns a key that uniquely identifies those arguments. Two sets of arguments should have the same key only if they produce the same output when passed into the memoised function. Default returns a sorted tuple describing the arguments along with their names.
  • storekey (bool, optional) – Whether to store the key along with the output when a result is stored. If True, the stored key will be compared with the requested key when recalling a previously computed value, in order to detect hash collisions. If False, two keys will produce the same output whenever their hash values are the same. Default is False.
  • pickle (function(object -> str), optional) – Function that converts the output of the function to a string for storage. Should be the inverse of unpickle. If storekey is true, this will also be used to store the key, and should do so without newline characters. Default uses the pickle module and base 64 encoding.
  • unpickle (function(str -> object), optional) – Function that converts a string back to an object when retrieving a computed value from storage. Should be the inverse of pickle. If storekey is true, this will also be used to retrieve the key. Default uses the pickle module and base 64 encoding.
  • hash (function(object -> str), optional) – Function that takes a key and produces a string that will be used to identify that key. If this function is not injective, then storekey can be set to True to check for hash collisions. The string should only contain characters safe for filenames. Default uses SHA-256 and base 64 encoding, which has an extremely small chance of collision.
  • unhash (function(str -> object), optional) – Function that, if specified, should be the inverse of hash. If this is specified, it may be used whenever the keys of cache are requested. Default is None.
  • metadata (function( -> str), optional) – Function that takes no arguments and returns a string containing metadata to be stored with the result currently being written. This might include the current time, or some data identifying the user or system that ran the computation. Default is None.
  • verbosity (int, optional) – What level of verbosity to output when running. If 0, nothing will be printed. If 1, prints only when something goes wrong. If 2, also prints when writing to files or clearing the cache. If 3, also prints when reading from the cache. If 4, it includes all the above with more details. Defaults to 1.
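
The roles of key, pickle, unpickle and hash described above can be illustrated with a small standard-library sketch. This is not pypersist's own code: the helper names default_style_key, to_string, from_string and hash_key are hypothetical, and the details (for example, hashing the pickled key and using URL-safe base 64) are assumptions chosen only to match the descriptions in the parameter list.

```python
import base64
import hashlib
import inspect
import pickle


def default_style_key(func, *args, **kwargs):
    """Return a sorted tuple of (name, value) pairs for a call to func."""
    bound = inspect.signature(func).bind(*args, **kwargs)
    bound.apply_defaults()
    # Argument names are unique, so sorting only ever compares the names.
    return tuple(sorted(bound.arguments.items()))


def to_string(obj):
    """Pickle obj and encode it as base 64 text without newlines."""
    return base64.b64encode(pickle.dumps(obj)).decode("ascii")


def from_string(text):
    """Inverse of to_string."""
    return pickle.loads(base64.b64decode(text.encode("ascii")))


def hash_key(key):
    """SHA-256 digest of the pickled key, as filename-safe base 64 text."""
    digest = hashlib.sha256(pickle.dumps(key)).digest()
    return base64.urlsafe_b64encode(digest).decode("ascii")


def double(x):
    return 2 * x


key = default_style_key(double, 3)
print(key)                         # (('x', 3),)
print(from_string(to_string(6)))   # 6
print(len(hash_key(key)))          # 44
```

Positional and keyword calls give the same key here, so double(3) and double(x=3) would share one stored result under this scheme.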
Variables:
  • cache (diskcache.Cache or mongodb.Cache) – Dictionary-like object that allows keys to be looked up and, if present, gives the previously computed value. Values can be added and removed using the syntax func.cache[key] = val and del func.cache[key]. If storekey is True or unhash is specified, this implements the collections.abc.MutableMapping abstract base class and we can iterate over its keys using for key in func.cache.
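
To see why iterating over keys needs either stored keys or an unhash function, consider the following toy directory-backed mapping. It is only a sketch under stated assumptions: TinyFileCache, hash_func and unhash_func are hypothetical names, values are restricted to integers, and nothing here reflects how pypersist.diskcache is actually written. The point is that filenames are hashes, so keys can only be listed if each hash can be turned back into a key.

```python
import os
import tempfile
from collections.abc import MutableMapping


class TinyFileCache(MutableMapping):
    """Hypothetical cache storing one integer per file, named by hashed key."""

    def __init__(self, directory, hash_func, unhash_func):
        self._dir = directory
        self._hash = hash_func
        self._unhash = unhash_func
        os.makedirs(directory, exist_ok=True)

    def _path(self, key):
        return os.path.join(self._dir, self._hash(key))

    def __getitem__(self, key):
        try:
            with open(self._path(key)) as f:
                return int(f.read())
        except FileNotFoundError:
            raise KeyError(key) from None

    def __setitem__(self, key, value):
        with open(self._path(key), "w") as f:
            f.write(str(value))

    def __delitem__(self, key):
        try:
            os.remove(self._path(key))
        except FileNotFoundError:
            raise KeyError(key) from None

    def __iter__(self):
        # Keys are recovered from filenames, which requires unhash_func.
        return (self._unhash(name) for name in os.listdir(self._dir))

    def __len__(self):
        return len(os.listdir(self._dir))


cache = TinyFileCache(
    tempfile.mkdtemp(),
    hash_func=lambda k: "%d_%d" % k,
    unhash_func=lambda s: tuple(int(part) for part in s.split("_")),
)
cache[(2, 4)] = 16
cache[(10, 3)] = 1000
print(sorted(cache))   # [(2, 4), (10, 3)]
del cache[(2, 4)]
print(list(cache))     # [(10, 3)]
```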

Examples

Simple persistence using default settings:

>>> @persist
... def double(x):
...     return 2 * x
>>> double(3)
6
>>> double(3)
6
>>> double.cache[(("x", 3),)]
6

Custom persistence using a simpler key, a descriptive filename, and writing human-readable files:

>>> @persist(key=lambda x,y: (x,y),
...          hash=lambda k: "%s_to_the_power_of_%s" % k,
...          pickle=str,
...          unpickle=int)
... def power(x, y):
...     return x ** y
>>> power(2,4)
16
>>> power(10,3)
1000
>>> power.cache[(2, 4)]
16
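
When supplying custom pickle and unpickle functions like these, it is worth checking that they really are inverses for every value the function can return. The helper below is a hypothetical check, not part of pypersist; it shows that str and int round-trip integer results but not the float produced by a negative exponent.

```python
def round_trips(value, pickle_func=str, unpickle_func=int):
    """Return True if unpickle_func(pickle_func(value)) reproduces value."""
    try:
        return unpickle_func(pickle_func(value)) == value
    except ValueError:
        return False


print(round_trips(2 ** 4))    # True
print(round_trips(2 ** -1))   # False, because int("0.5") raises ValueError
```

If power might be called with negative exponents, a pair such as pickle=repr with a matching float-aware parser would be a safer choice.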

Persistence of a method inside a class. We specify a key function that characterises the relevant parts of the A object, since it can be difficult to pickle class instances:

>>> class A:
...     def __init__(self, x):
...         self.x = x
...     @persist(key=lambda self, a: (self.x, a))
...     def this_plus_number(self, a):
...         return self.x + a
>>> a = A(5)
>>> a.this_plus_number(10)
15
>>> a.this_plus_number.cache[(5, 10)]
15
>>> A.this_plus_number.cache[(5, 10)]
15
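
A metadata function, as described in the parameter list, takes no arguments and returns a string. The following is a minimal sketch of one; the name run_metadata and the choice of fields are assumptions for illustration, and how the string is stored alongside the result is up to pypersist.

```python
import datetime
import platform


def run_metadata():
    """Return a one-line note on when and where a result was computed."""
    now = datetime.datetime.now(datetime.timezone.utc).isoformat()
    return "computed %s on %s (Python %s)" % (
        now,
        platform.node(),
        platform.python_version(),
    )


print(run_metadata())

# Intended use, assuming pypersist is installed:
#
#     @persist(metadata=run_metadata)
#     def double(x):
#         return 2 * x
```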

The default arguments used by persist rely on code in the following modules:

Argument            Module
cache               pypersist.diskcache
key                 pypersist.preprocessing
pickle, unpickle    pypersist.pickling
hash                pypersist.hashing