py2store.filesys

Forwards to dol.filesys:

File system access

py2store.misc

Functions to read from and write to misc sources

class py2store.misc.MiscGetter(store=<py2store.persisters.local_files.PathFormatPersister object>, incoming_val_trans_for_key={'.bin': <function identity_method>, '.cnf': <function <lambda>>, '.conf': <function <lambda>>, '.config': <function <lambda>>, '.csv': <function <lambda>>, '.gz': <function decompress>, '.gzip': <function decompress>, '.ini': <function <lambda>>, '.json': <function <lambda>>, '.pickle': <function <lambda>>, '.pkl': <function <lambda>>, '.txt': <function <lambda>>, '.zip': <class 'py2store.slib.s_zipfile.FilesOfZip'>}, dflt_incoming_val_trans=<function identity_method>, func_key=<function MiscGetter.<lambda>>)[source]

An object to read (and only read) from a store (default local files) with automatic deserialization according to a property of the key (default: file extension).

>>> from py2store.misc import get_obj, misc_objs_get
>>> import os
>>> import json
>>>
>>> pjoin = lambda *p: os.path.join(os.path.expanduser('~'), *p)
>>> path = pjoin('tmp.json')
>>> d = {'a': {'b': {'c': [1, 2, 3]}}}
>>> json.dump(d, open(path, 'w'))  # putting a json file there, the normal way, so we can use it later
>>>
>>> k = path
>>> t = get_obj(k)  # if you'd like to use a function
>>> assert t == d
>>> tt = misc_objs_get[k]  # if you'd like to use an object (note: can get, but nothing else (no list, set, del, etc))
>>> assert tt == d
>>> t
{'a': {'b': {'c': [1, 2, 3]}}}
class py2store.misc.MiscGetterAndSetter(store=<py2store.persisters.local_files.PathFormatPersister object>, incoming_val_trans_for_key={'.bin': <function identity_method>, '.cnf': <function <lambda>>, '.conf': <function <lambda>>, '.config': <function <lambda>>, '.csv': <function <lambda>>, '.gz': <function decompress>, '.gzip': <function decompress>, '.ini': <function <lambda>>, '.json': <function <lambda>>, '.pickle': <function <lambda>>, '.pkl': <function <lambda>>, '.txt': <function <lambda>>, '.zip': <class 'py2store.slib.s_zipfile.FilesOfZip'>}, outgoing_val_trans_for_key={'.bin': <function identity_method>, '.cnf': <function <lambda>>, '.conf': <function <lambda>>, '.config': <function <lambda>>, '.csv': <function csv_fileobj>, '.gz': <function compress>, '.gzip': <function compress>, '.ini': <function <lambda>>, '.json': <function <lambda>>, '.pickle': <function <lambda>>, '.pkl': <function <lambda>>, '.txt': <function <lambda>>}, dflt_incoming_val_trans=<function identity_method>, func_key=<function MiscGetterAndSetter.<lambda>>)[source]

An object to read and write (and nothing else) to a store (default local) with automatic (de)serialization according to a property of the key (default: file extension).

>>> from py2store.misc import set_obj, misc_objs  # the function and the object
>>> import json
>>> import os
>>>
>>> pjoin = lambda *p: os.path.join(os.path.expanduser('~'), *p)
>>>
>>> d = {'a': {'b': {'c': [1, 2, 3]}}}
>>> misc_objs[pjoin('tmp.json')] = d
>>> filepath = os.path.expanduser('~/tmp.json')
>>> assert misc_objs[filepath] == d  # yep, it's there, and can be retrieved
>>> assert json.load(open(filepath)) == d  # in case you don't believe it's an actual json file
>>>
>>> # using pickle
>>> misc_objs[pjoin('tmp.pkl')] = d
>>> assert misc_objs[pjoin('tmp.pkl')] == d
>>>
>>> # using txt
>>> misc_objs[pjoin('tmp.txt')] = 'hello world!'
>>> assert misc_objs[pjoin('tmp.txt')] == 'hello world!'
>>>
>>> # using csv
>>> misc_objs[pjoin('tmp.csv')] = [[1,2,3], ['a','b','c']]
>>> assert misc_objs[pjoin('tmp.csv')] == [['1','2','3'], ['a','b','c']]  # yeah, well, not numbers, but you deal with it
>>>
>>> # using bin
... misc_objs[pjoin('tmp.bin')] = b'let us pretend these are bytes of an audio waveform'
>>> assert misc_objs[pjoin('tmp.bin')] == b'let us pretend these are bytes of an audio waveform'
class py2store.misc.MiscReaderMixin(incoming_val_trans_for_key=None, dflt_incoming_val_trans=None, func_key=None)[source]

Mixin to transform incoming vals according to the key they're under. Warning: If used in a subclass, this mixin should (in general) be placed before the store in the list of bases.

>>> # make a reader that will wrap a dict
>>> class MiscReader(MiscReaderMixin, dict):
...     def __init__(self, d,
...                         incoming_val_trans_for_key=None,
...                         dflt_incoming_val_trans=None,
...                         func_key=None):
...         dict.__init__(self, d)
...         MiscReaderMixin.__init__(self, incoming_val_trans_for_key, dflt_incoming_val_trans, func_key)
...
>>>
>>> incoming_val_trans_for_key = dict(
...     MiscReaderMixin._incoming_val_trans_for_key,  # take the existing defaults...
...     **{'.bin': lambda v: [ord(x) for x in v.decode()], # ... override how to handle the .bin extension
...      '.reverse_this': lambda v: v[::-1]  # add a new extension (and how to handle it)
...     })
>>>
>>> import pickle
>>> d = {
...     'a.bin': b'abc123',
...     'a.reverse_this': b'abc123',
...     'a.csv': b'event,year\n Magna Carta,1215\n Guido,1956',
...     'a.txt': b'this is not a text',
...     'a.pkl': pickle.dumps(['text', [str, map], {'a list': [1, 2, 3]}]),
...     'a.json': '{"str": "field", "int": 42, "float": 3.14, "array": [1, 2], "nested": {"a": 1, "b": 2}}',
... }
>>>
>>> s = MiscReader(d=d, incoming_val_trans_for_key=incoming_val_trans_for_key)
>>> list(s)
['a.bin', 'a.reverse_this', 'a.csv', 'a.txt', 'a.pkl', 'a.json']
>>> s['a.bin']
[97, 98, 99, 49, 50, 51]
>>> s['a.reverse_this']
b'321cba'
>>> s['a.csv']
[['event', 'year'], [' Magna Carta', '1215'], [' Guido', '1956']]
>>> s['a.pkl']
['text', [<class 'str'>, <class 'map'>], {'a list': [1, 2, 3]}]
>>> s['a.json']
{'str': 'field', 'int': 42, 'float': 3.14, 'array': [1, 2], 'nested': {'a': 1, 'b': 2}}
class py2store.misc.MiscStoreMixin(incoming_val_trans_for_key=None, outgoing_val_trans_for_key=None, dflt_incoming_val_trans=None, dflt_outgoing_val_trans=None, func_key=None)[source]

Mixin to transform incoming and outgoing vals according to the key they're under. Warning: If used in a subclass, this mixin should (in general) be placed before the store in the list of bases.

See also: preset and postget args from wrap_kvs decorator from py2store.trans.

>>> # Make a class to wrap a dict with a layer that transforms written and read values
>>> class MiscStore(MiscStoreMixin, dict):
...     def __init__(self, d,
...                         incoming_val_trans_for_key=None, outgoing_val_trans_for_key=None,
...                         dflt_incoming_val_trans=None, dflt_outgoing_val_trans=None,
...                         func_key=None):
...         dict.__init__(self, d)
...         MiscStoreMixin.__init__(self, incoming_val_trans_for_key, outgoing_val_trans_for_key,
...                                 dflt_incoming_val_trans, dflt_outgoing_val_trans, func_key)
...
>>>
>>> outgoing_val_trans_for_key = dict(
...     MiscStoreMixin._outgoing_val_trans_for_key,  # take the existing defaults...
...     **{'.bin': lambda v: ''.join([chr(x) for x in v]).encode(), # ... override how to handle the .bin extension
...        '.reverse_this': lambda v: v[::-1]  # add a new extension (and how to handle it)
...     })
>>> ss = MiscStore(d={},  # store starts empty
...                incoming_val_trans_for_key={},  # overriding incoming trans so we can see the raw data later
...                outgoing_val_trans_for_key=outgoing_val_trans_for_key)
...
>>> # here's what we're going to write in the store
>>> data_to_write = {
...      'a.bin': [97, 98, 99, 49, 50, 51],
...      'a.reverse_this': b'321cba',
...      'a.csv': [['event', 'year'], [' Magna Carta', '1215'], [' Guido', '1956']],
...      'a.txt': 'this is not a text',
...      'a.pkl': ['text', [str, map], {'a list': [1, 2, 3]}],
...      'a.json': {'str': 'field', 'int': 42, 'float': 3.14, 'array': [1, 2], 'nested': {'a': 1, 'b': 2}}}
>>> # write this data in our store
>>> for k, v in data_to_write.items():
...     ss[k] = v
>>> list(ss)
['a.bin', 'a.reverse_this', 'a.csv', 'a.txt', 'a.pkl', 'a.json']
>>> # Looking at the contents (what was actually stored/written)
>>> for k, v in ss.items():
...     if k != 'a.pkl':
...         print(f"{k}: {v}")
...     else:  # need to verify pickle data differently, since printing contents is problematic in doctest
...         assert pickle.loads(v) == data_to_write['a.pkl']
a.bin: b'abc123'
a.reverse_this: b'abc123'
a.csv: b'event,year\r\n Magna Carta,1215\r\n Guido,1956\r\n'
a.txt: b'this is not a text'
a.json: b'{"str": "field", "int": 42, "float": 3.14, "array": [1, 2], "nested": {"a": 1, "b": 2}}'
py2store.misc.get_obj(k, store=<py2store.persisters.local_files.PathFormatPersister object>, incoming_val_trans_for_key={'.bin': <function identity_method>, '.cnf': <function <lambda>>, '.conf': <function <lambda>>, '.config': <function <lambda>>, '.csv': <function <lambda>>, '.gz': <function decompress>, '.gzip': <function decompress>, '.ini': <function <lambda>>, '.json': <function <lambda>>, '.pickle': <function <lambda>>, '.pkl': <function <lambda>>, '.txt': <function <lambda>>, '.zip': <class 'py2store.slib.s_zipfile.FilesOfZip'>}, dflt_incoming_val_trans=<function identity_method>, func_key=<function <lambda>>)[source]

A quick way to get an object, with default… everything (but the key, you know, a clue of what you want)

py2store.misc.set_obj(k, v, store=<py2store.persisters.local_files.PathFormatPersister object>, outgoing_val_trans_for_key={'.bin': <function identity_method>, '.cnf': <function <lambda>>, '.conf': <function <lambda>>, '.config': <function <lambda>>, '.csv': <function csv_fileobj>, '.gz': <function compress>, '.gzip': <function compress>, '.ini': <function <lambda>>, '.json': <function <lambda>>, '.pickle': <function <lambda>>, '.pkl': <function <lambda>>, '.txt': <function <lambda>>}, func_key=<function <lambda>>)[source]

A quick way to set (store) an object, with default… everything (but the key and value, you know, what you want to store and where)

py2store.mixins

Forwards to dol.mixins:

Mixins

py2store.test.util

utils for testing

py2store.test.util.random_dict_gen(fields=('a', 'b', 'c'), word_size_range=(1, 10), alphabet='abcdefghijklmnopqrstuvwxyz', n: int = 100)[source]

Random dict (of strings) generator

Parameters
  • fields – Field names for the random dicts

  • word_size_range – An int, 2-tuple of ints, or list-like object that defines the choices of word sizes

  • alphabet – A string or iterable defining the alphabet to draw from

  • n – The number of elements the generator will yield

Returns

Random dict (of strings) generator
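
A hedged usage sketch (outputs are random, so only the shape is checked; that each yielded dict has exactly the given fields is implied by the parameter docs above):

>>> from py2store.test.util import random_dict_gen
>>> for d in random_dict_gen(fields=('a', 'b'), n=2):
...     assert set(d) == {'a', 'b'}  # each yielded dict has exactly these fields
...     assert all(isinstance(v, str) for v in d.values())  # values are random words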

py2store.test.util.random_formatted_str_gen(format_string='root/{}/{}_{}.test', word_size_range=(1, 10), alphabet='abcdefghijklmnopqrstuvwxyz', n=100)[source]

Random formatted string generator

Parameters
  • format_string – A format string

  • word_size_range – An int, 2-tuple of ints, or list-like object that defines the choices of word sizes

  • alphabet – A string or iterable defining the alphabet to draw from

  • n – The number of elements the generator will yield

Returns

Yields random strings of the format defined by format_string

Examples

# >>> list(random_formatted_str_gen('root/{}/{}_{}.test', (2, 5), 'abc', n=5))
# [('root/acba/bb_abc.test',), ('root/abcb/cbbc_ca.test',), ('root/ac/ac_cc.test',),
#  ('root/aacc/ccbb_ab.test',), ('root/aab/abb_cbab.test',)]

>>> # The following will be made not random (by restricting the constraints to "no choice")
>>> # ... this is so that we get consistent outputs to assert for the doc test.
>>>
>>> # Example with automatic specification
>>> list(random_formatted_str_gen('root/{}/{}_{}.test', (3, 4), 'a', n=2))
[('root/aaa/aaa_aaa.test',), ('root/aaa/aaa_aaa.test',)]
>>>
>>> # Example with manual specification
>>> list(random_formatted_str_gen('indexed field: {0}: named field: {name}', (2, 3), 'z', n=1))
[('indexed field: zz: named field: zz',)]
py2store.test.util.random_string(length=7, alphabet='abcdefghijklmnopqrstuvwxyz')[source]

Same as random_word, but optimized for strings (5-10% faster for words of length 7, 25-30% faster for words of length 1000)

py2store.test.util.random_tuple_gen(tuple_length=3, word_size_range=(1, 10), alphabet='abcdefghijklmnopqrstuvwxyz', n: int = 100)[source]

Random tuple (of strings) generator

Parameters
  • tuple_length – The length of the tuples generated

  • word_size_range – An int, 2-tuple of ints, or list-like object that defines the choices of word sizes

  • alphabet – A string or iterable defining the alphabet to draw from

  • n – The number of elements the generator will yield

Returns

Random tuple (of strings) generator

py2store.test.util.random_word(length, alphabet, concat_func=<built-in function add>)[source]

Make a random word by concatenating randomly drawn elements from alphabet together.

Parameters
  • length – Length of the word

  • alphabet – Alphabet to draw from

  • concat_func – The concatenation function (e.g. + for strings and lists)

Note: Repeated elements in alphabet will have more chances of being drawn.

Returns

A word (whose type depends on what concatenating elements from alphabet produces).

Not making this a proper doctest because I don't know how to seed the global random temporarily.

>>> t = random_word(4, 'abcde');  # e.g. 'acae'
>>> t = random_word(5, ['a', 'b', 'c']);  # e.g. 'cabba'
>>> t = random_word(4, [[1, 2, 3], [40, 50], [600], [7000]]);  # e.g. [40, 50, 7000, 7000, 1, 2, 3]
>>> t = random_word(4, [1, 2, 3, 4]);  # e.g. 13 (because adding numbers…)
>>> # … sometimes it's what you want:
>>> t = random_word(4, [2 ** x for x in range(8)]);  # e.g. 105 (binary combination)
>>> t = random_word(4, [1, 2, 3, 4], concat_func=lambda x, y: str(x) + str(y));  # e.g. '4213'
>>> t = random_word(4, [1, 2, 3, 4], concat_func=lambda x, y: int(str(x) + str(y)));  # e.g. 3432

py2store.test.util.random_word_gen(word_size_range=(1, 10), alphabet='abcdefghijklmnopqrstuvwxyz', n=100)[source]

Random string generator

Parameters
  • word_size_range – An int, 2-tuple of ints, or list-like object that defines the choices of word sizes

  • alphabet – A string or iterable defining the alphabet to draw from

  • n – The number of elements the generator will yield

Returns

Random string generator

py2store.test.quick

py2store.test

test files

py2store.test.simple

py2store.test.scrap

scrap code

py2store.util

Forwards to dol.util:

General util objects

py2store.ext.docx

Simple access to docx (Word Doc) elements.

py2store.ext.gitlab

Stores to talk to gitlab, using requests.

Example:

```
ogl = GitLabAccessor(base_url="http://…", project_name=None)
print(ogl.get_project_names())  # prints all project names
ogl.set_project("PROJECT_NAME")  # sets the project to "PROJECT_NAME"
print(ogl.get_branch_names())  # gets the branch names of the current project (as set previously)
print(ogl.get_branch("master"))  # gets a json of information about the master branch of the current project
```

py2store.ext.hdf

a data object layer for HDF files

py2store.ext

py2store Extensions, Add-ons, etc. We kept py2store purely dependency-less, using only built-ins for everything but storage system connectors.

That said, in order to provide the user with more power, and show him/her how py2store tools can be used to build powerful data accessors, we provide specialized modules that do require more than builtins. These dependencies are not listed in the setup.py module, but we wrap their imports with informative ImportError handlers.
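For illustration, here's a minimal sketch of that import-wrapping pattern (the dependency and message are hypothetical, not py2store's actual code):

>>> try:
...     import pandas as pd  # a third-party dependency an extension module might need
... except ImportError as e:
...     raise ImportError(
...         'py2store.ext.dataframes needs pandas. You can install it with: pip install pandas'
...     ) from e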

py2store.ext.matlab

a data object layer for matlab

py2store.ext.kaggle

py2store.ext.module_imports

py2store.ext.audio

py2store.ext.github

a data object layer for github

py2store.ext.dataframes

Data as pandas.DataFrame from various sources

py2store.access

Utils to load stores from store specifications. Includes the logic to allow configurations (and defaults) to be parametrized by external environmental variables and files.

Every data-sourced problem has its problem-relevant stores. Once you get your stores right, along with the right access credentials, indexing, serialization, caching, filtering, etc., you'd like to be able to name, save and/or share this specification, and easily get access to it later on.

Here are tools to help you out.

There are two main key-value stores: One for configurations the user wants to reuse, and the other for the user’s desired defaults. Both have the same structure:

  • first level key: Name of the resource (should be a valid python variable name)

  • The remainder is more or less free-form (until the day we lay out some schemas for this)

The system will look for the specification of user_configs and user_defaults in a json file. The filepath to this json file can be specified in the environment variables

PY2STORE_CONFIGS_JSON_FILEPATH and PY2STORE_DEFAULTS_JSON_FILEPATH

respectively. By default, they are:

~/.py2store_configs.json and ~/.py2store_defaults.json

respectively.
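
As an illustration, here's what a hypothetical ~/.py2store_configs.json could contain (the resource name and its fields are made up, since beyond the first-level key the format is free-form):

>>> example_configs = {  # contents of the json file, shown as a python dict
...     "my_local_store": {  # first-level key: name of the resource
...         "description": "a local pickle store",
...         "rootdir": "~/data/"
...     }
... }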

py2store.access.compose(*functions)[source]

Make a function that is the composition of the input functions

py2store.access.dflt_func_loader(f) → callable[source]

Loads and returns the function referenced by f, which could be a callable or a DOTPATH_TO_MODULE.FUNC_NAME dotpath string to one, or a pipeline of these

py2store.access.dotpath_to_func(f: (<class 'str'>, <built-in function callable>)) → callable[source]

Loads and returns the function referenced by f, which could be a callable or a DOTPATH_TO_MODULE.FUNC_NAME dotpath string to one.

py2store.access.dotpath_to_obj(dotpath)[source]

Loads and returns the object referenced by the string DOTPATH_TO_MODULE.OBJ_NAME
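
A small usage sketch of the dotpath idea (json.dumps is just an illustrative target object):

>>> from py2store.access import dotpath_to_obj
>>> func = dotpath_to_obj('json.dumps')  # DOTPATH_TO_MODULE.OBJ_NAME
>>> assert callable(func)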

py2store.access.fakit(fak, func_loader=<function dflt_func_loader>)[source]

Execute a fak with given f, a, k and function loader.

Essentially returns func_loader(f)(*a, **k)

Parameters
  • fak – A (f, a, k) specification. Could be a tuple or a dict (with ‘f’, ‘a’, ‘k’ keys). All but f are optional.

  • func_loader – A function returning a function. This is where you specify any validation of func specification f, and/or how to get a callable from it.

Returns: A python object.
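
A minimal sketch of a fak specification, following the documented func_loader(f)(*a, **k) semantics (the dotpath 'builtins.sum' is an illustrative choice):

>>> from py2store.access import fakit
>>> fakit({'f': 'builtins.sum', 'a': [[1, 2, 3]]})  # i.e. func_loader('builtins.sum')(*[[1, 2, 3]])
6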

py2store.access.getenv(name, default=None)[source]

Like os.getenv, but removes a trailing carriage return ('\r') character if present (a problem with some env var systems)
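
A sketch of the described behavior (not py2store's actual implementation; that the stray suffix is a carriage return is an assumption):

>>> import os
>>> def getenv_sketch(name, default=None):
...     val = os.getenv(name, default)
...     if isinstance(val, str) and val.endswith('\r'):
...         val = val[:-1]  # strip the stray trailing carriage return
...     return val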

py2store.__init__

Your portal to many Data Object Layer goodies

py2store.__init__.ihead(store, n=1)[source]

Get the first item of an iterable, or a list of the first n items

py2store.__init__.kvhead(store, n=1)[source]

Get the first item of a kv store, or a list of the first n items
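
A hedged usage sketch for both helpers (assumes items come back in iteration order):

>>> ihead([10, 20, 30])        # first element
10
>>> ihead([10, 20, 30], n=2)   # first two, as a list
[10, 20]
>>> kvhead({'a': 1, 'b': 2})   # first (key, value) pair
('a', 1)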

py2store.stores.s3_store

Forwards to s3dol.s3_store

py2store.stores.delegation_stores

py2store.stores.sql_w_sqlalchemy

Forwards to sqldol

py2store.stores.arangodb_store

py2store.stores.dropbox_store

Forwards to dropboxdol

py2store.stores.local_store

stores to operate on local files

class py2store.stores.local_store.AutoMkDirsOnSetitemMixin[source]

A mixin that will automatically create directories on setitem, when missing.

class py2store.stores.local_store.AutoMkPathformatMixin(path_format=None, max_levels=None)[source]

A mixin that will choose a path_format if none given

class py2store.stores.local_store.DirStore(rootdir)[source]

A store for local directories. Keys are directory names and values are subdirectory DirStores.

>>> from py2store import __file__
>>> import os
>>> root = os.path.dirname(__file__)
>>> s = DirStore(root)
>>> assert set(s).issuperset({'stores', 'persisters', 'serializers', 'key_mappers'})
class py2store.stores.local_store.LocalBinaryStore(path_format, max_levels=None)[source]

Local files store for binary data

class py2store.stores.local_store.LocalJsonStore(path_format, max_levels=None)[source]

Local files store for text data. Data is assumed to be a JSON string, and is loaded with json.loads and dumped with json.dumps

class py2store.stores.local_store.LocalPickleStore(path_format, max_levels=None, fix_imports=True, protocol=None, pickle_encoding='ASCII', pickle_errors='strict', **open_kwargs)[source]

Local files store with pickle serialization

py2store.stores.local_store.LocalStore

alias of py2store.stores.local_store.QuickPickleStore

class py2store.stores.local_store.LocalTextStore(path_format, max_levels=None)[source]

Local files store for text data

class py2store.stores.local_store.MakeMissingDirsStoreMixin[source]

Will make a local file store automatically create the directories needed to create a file. Should be placed before the concrete persister in the MRO, but in such a manner that it receives full paths.

class py2store.stores.local_store.PathFormatStore(path_format, max_levels: int = inf, mode='', **open_kwargs)[source]

Local file store using templated relative paths.

>>> from tempfile import gettempdir
>>> import os
>>>
>>> def write_to_key(fullpath_of_relative_path, relative_path, content):  # a function to write content in files
...    with open(fullpath_of_relative_path(relative_path), 'w') as fp:
...        fp.write(content)
>>>
>>> # Preparation: Make a temporary rootdir and write two files in it
>>> rootdir = os.path.join(gettempdir(), 'path_format_store_test' + os.sep)
>>> if not os.path.isdir(rootdir):
...     os.mkdir(rootdir)
>>> # recreate directory (remove existing files, delete directory, and re-create it)
>>> for f in os.listdir(rootdir):
...     fullpath = os.path.join(rootdir, f)
...     if os.path.isfile(fullpath):
...         os.remove(os.path.join(rootdir, f))
>>> if os.path.isdir(rootdir):
...     os.rmdir(rootdir)
>>> if not os.path.isdir(rootdir):
...    os.mkdir(rootdir)
>>>
>>> filepath_of = lambda p: os.path.join(rootdir, p)  # a function to get a fullpath from a relative one
>>> # and make two files in this new dir, with some content
>>> write_to_key(filepath_of, 'a', 'foo')
>>> write_to_key(filepath_of, 'b', 'bar')
>>>
>>> # point the obj source to the rootdir
>>> s = PathFormatStore(path_format=rootdir)
>>>
>>> # assert things...
>>> assert s._prefix == rootdir  # the _prefix is the rootdir given in the constructor
>>> assert s[filepath_of('a')] == 'foo'  # (the filepath for) 'a' contains 'foo'
>>>
>>> # two files under rootdir (as long as the OS didn't create its own under the hood)
>>> len(s)
2
>>> assert list(s) == [filepath_of('a'), filepath_of('b')]  # there are two files in s
>>> filepath_of('a') in s  # rootdir/a is in s
True
>>> filepath_of('not_there') in s  # rootdir/not_there is not in s
False
>>> filepath_of('not_there') not in s  # rootdir/not_there is not in s
True
>>> assert list(s.keys()) == [filepath_of('a'), filepath_of('b')]  # the keys (filepaths) of s
>>> sorted(list(s.values())) # the values of s (contents of files)
['bar', 'foo']
>>> assert list(s.items()) == [(filepath_of('a'), 'foo'), (filepath_of('b'), 'bar')]  # the (path, content) items
>>> assert s.get('this key is not there', None) is None  # trying to get the val of a non-existing key returns None
>>> s.get('this key is not there', 'some default value')  # ... or whatever you say
'some default value'
>>>
>>> # add more files to the same folder
>>> write_to_key(filepath_of, 'this.txt', 'this')
>>> write_to_key(filepath_of, 'that.txt', 'blah')
>>> write_to_key(filepath_of, 'the_other.txt', 'bloo')
>>> # see that you now have 5 files
>>> len(s)
5
>>> # and these files contain values:
>>> sorted(s.values())
['bar', 'blah', 'bloo', 'foo', 'this']
>>>
>>> # but if we make an obj source to only take files whose extension is '.txt'...
>>> s = PathFormatStore(path_format=rootdir + '{}.txt')
>>>
>>> rootdir_2 = os.path.join(gettempdir(), 'obj_source_test_2') # get another rootdir
>>> if not os.path.isdir(rootdir_2):
...    os.mkdir(rootdir_2)
>>> filepath_of_2 = lambda p: os.path.join(rootdir_2, p)
>>> # and make a few files in this new dir, with some content
>>> write_to_key(filepath_of_2, 'this.txt', 'this')
>>> write_to_key(filepath_of_2, 'that.txt', 'blah')
>>> write_to_key(filepath_of_2, 'the_other.txt', 'bloo')
>>>
>>> ss = PathFormatStore(path_format=rootdir_2 + '{}.txt')
>>>
>>> assert s != ss  # though pointing to identical content, s and ss are not equal since the paths are not equal!
class py2store.stores.local_store.PathFormatStoreWithPrefix(*args, **kwargs)[source]
py2store.stores.local_store.PickleStore

alias of py2store.stores.local_store.LocalPickleStore

class py2store.stores.local_store.QuickBinaryStore(path_format=None, max_levels=None)[source]

Local files store for binary data with default temp root and auto dir generation on write.

class py2store.stores.local_store.QuickJsonStore(path_format=None, max_levels=None)[source]

Local files store for text data with default temp root and auto dir generation on write. Data is assumed to be a JSON string, and is loaded with json.loads and dumped with json.dumps
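
A hypothetical usage sketch (the default temp rootdir and the exact key format are assumptions here):

>>> from py2store.stores.local_store import QuickJsonStore
>>> s = QuickJsonStore()  # no path_format given: a default temp root is chosen
>>> s['example_key'] = {'hello': 'world'}  # dumped with json.dumps; missing dirs made on write
>>> assert s['example_key'] == {'hello': 'world'}  # loaded back with json.loads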

class py2store.stores.local_store.QuickLocalStoreMixin(path_format=None, max_levels=None)[source]

A mixin that will choose a path_format if none given, and will automatically create directories on setitem, when missing.

class py2store.stores.local_store.QuickPickleStore(path_format=None, max_levels=None)[source]

Local files store with pickle serialization with default temp root and auto dir generation on write.

py2store.stores.local_store.QuickStore

alias of py2store.stores.local_store.QuickPickleStore

class py2store.stores.local_store.QuickTextStore(path_format=None, max_levels=None)[source]

Local files store for text data with default temp root and auto dir generation on write.

class py2store.stores.local_store.RelativeDirPathFormatKeys(*args, **kwargs)[source]
class py2store.stores.local_store.RelativePathFormatStore2(*args, **kwargs)[source]

py2store.stores

a package of various stores

py2store.stores.couchdb_store

py2store.stores.mongo_store

py2store.core

Forwards to dol.core:

Core tools

py2store.utils.uri_utils

utils to work with URIs

py2store.utils.uri_utils.build_uri(scheme, database='', username=None, password=None, host='localhost', port=None)[source]

Reverse of parse_uri function. Builds a URI string from provided params.

py2store.utils.uri_utils.parse_uri(uri)[source]

Parses a DB URI string into a dict of params.

Parameters
  • uri – string formatted as: “scheme://username:password@host:port/database”

Returns

A dict with these params parsed.
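
A hedged round-trip sketch (that the parsed dict's keys match build_uri's parameters is an assumption, suggested by the "reverse of parse_uri" note above):

>>> from py2store.utils.uri_utils import parse_uri, build_uri
>>> uri = 'scheme://user:pw@localhost:1234/db'
>>> params = parse_uri(uri)  # e.g. {'scheme': 'scheme', 'username': 'user', ...}
>>> assert build_uri(**params) == uri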

py2store.utils.explicit

utils to make stores based on the input data itself

class py2store.utils.explicit.ExplicitKeymapReader(store, key_of_id=None, id_of_key=None)[source]

Wrap a store (instance) so that it gets its keys from an explicit iterable of keys.

>>> s = {'a': 1, 'b': 2, 'c': 3, 'd': 4}
>>> id_of_key = {'A': 'a', 'C': 'c'}
>>> ss = ExplicitKeymapReader(s, id_of_key=id_of_key)
>>> list(ss)
['A', 'C']
>>> ss['C']  # will look up 'C', find 'c', and call the store on that.
3
class py2store.utils.explicit.ExplicitKeys(key_collection: Collection)[source]

py2store.base.Keys implementation that gets its keys explicitly from a collection given at initialization time. The key_collection must be a collections.abc.Collection (such as list, tuple, set, etc.)

>>> keys = ExplicitKeys(key_collection=['foo', 'bar', 'alice'])
>>> 'foo' in keys
True
>>> 'not there' in keys
False
>>> list(keys)
['foo', 'bar', 'alice']
class py2store.utils.explicit.ExplicitKeysSource(key_collection: Collection, _obj_of_key: Callable)[source]

An object source that uses an explicit keys collection and a specified function to read contents for a key.

class py2store.utils.explicit.ExplicitKeysStore(store, key_collection)[source]

Wrap a store (instance) so that it gets its keys from an explicit iterable of keys.

>>> s = {'a': 1, 'b': 2, 'c': 3, 'd': 4}
>>> list(s)
['a', 'b', 'c', 'd']
>>> ss = ExplicitKeysStore(s, ['d', 'a'])
>>> len(ss)
2
>>> list(ss)
['d', 'a']
>>> list(ss.values())
[4, 1]
>>> ss.head()
('d', 4)
class py2store.utils.explicit.ExplicitKeysWithPrefixRelativization(key_collection, _prefix=None)[source]

py2store.base.Keys implementation that gets its keys explicitly from a collection given at initialization time. The key_collection must be a collections.abc.Collection (such as list, tuple, set, etc.)

>>> from py2store.base import Store
>>> s = ExplicitKeysWithPrefixRelativization(key_collection=['/root/of/foo', '/root/of/bar', '/root/for/alice'])
>>> keys = Store(store=s)
>>> 'of/foo' in keys
True
>>> 'not there' in keys
False
>>> list(keys)
['of/foo', 'of/bar', 'for/alice']
class py2store.utils.explicit.ObjReader(_obj_of_key: Callable)[source]

A reader that uses a specified function to get the contents for a given key.

>>> # define a contents_of_key that reads stuff from a dict
>>> data = {'foo': 'bar', 42: "everything"}
>>> def read_dict(k):
...     return data[k]
>>> pr = ObjReader(_obj_of_key=read_dict)
>>> pr['foo']
'bar'
>>> pr[42]
'everything'
>>>
>>> # define contents_of_key that reads stuff from a file given its path
>>> def read_file(path):
...     with open(path) as fp:
...         return fp.read()
>>> pr = ObjReader(_obj_of_key=read_file)
>>> file_where_this_code_is = __file__  # it should be THIS file you're reading right now!
>>> print(pr[file_where_this_code_is][62:155])  # print some characters of this file
from collections.abc import Mapping
from typing import Callable, Collection as CollectionType
py2store.utils.explicit.invertible_maps(mapping=None, inv_mapping=None)[source]

Returns two maps that are inverse of each other. Raises a ValueError if both maps are None, and an AssertionError if the maps are not inverse of each other

Get a pair of invertible maps:

>>> invertible_maps({1: 11, 2: 22})
({1: 11, 2: 22}, {11: 1, 22: 2})
>>> invertible_maps(None, {11: 1, 22: 2})
({1: 11, 2: 22}, {11: 1, 22: 2})

If two maps are given and invertible, you just get them back:

>>> invertible_maps({1: 11, 2: 22}, {11: 1, 22: 2})
({1: 11, 2: 22}, {11: 1, 22: 2})

Or if they're not invertible:

>>> invertible_maps({1: 11, 2: 22}, {11: 1, 22: 'ha, not what you expected!'})
Traceback (most recent call last):
  ...
AssertionError: mapping and inv_mapping are not inverse of each other!

>>> invertible_maps(None, None)
Traceback (most recent call last):
  ...
ValueError: You need to specify one or both maps

py2store.utils.timeseries_caching

Tools to cache time-series data.

class py2store.utils.timeseries_caching.RegularTimeseriesCache(data_rate=1, time_rate=1, maxlen=None)[source]

A type that pretends to be a (possibly very large) list, but where contents of the list are populated as they are needed. Further, the indexing of the list can be overwritten for the convenience of the user.

The canonical application is where we have segments of continuous waveform indexed by utc microseconds timestamps.

It is convenient to be able to read segments of this waveform as if it was one big waveform (handling the discontinuities gracefully), and have the choice of using (relative or absolute) integer indices or utc indices.

py2store.utils.attr_dict.py.attr_dict

py2store.utils.attr_dict.py

py2store.utils.cumul_aggreg_write

utils for bulk writing – accumulate, aggregate and write when some condition is met
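
To make the accumulate-aggregate-flush idea concrete before the class listings below, here is a plain-Python analogue (a sketch, not the actual CumulAggregWrite API):

>>> class TinyCumulWriter:
...     def __init__(self, store, flush_condition=lambda cache: len(cache) >= 3):
...         self.store, self.cache, self.flush_condition = store, [], flush_condition
...     def append(self, item):
...         self.cache.append(item)
...         if self.flush_condition(self.cache):  # condition met: aggregate and write
...             self.store[len(self.store)] = list(self.cache)
...             self.cache.clear()
>>> store = {}
>>> w = TinyCumulWriter(store)
>>> for x in [1, 2, 3, 4]:
...     w.append(x)
>>> store  # first three items were flushed as one aggregate; 4 is still cached
{0: [1, 2, 3]}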

class py2store.utils.cumul_aggreg_write.CumulAggregWrite(store, cache_to_kv=<function mk_kv_from_keygen.<locals>.aggregate>, mk_cache=<class 'list'>)[source]
class py2store.utils.cumul_aggreg_write.CumulAggregWriteKvItems(store)[source]
class py2store.utils.cumul_aggreg_write.CumulAggregWriteWithAutoFlush(store, cache_to_kv=<function mk_kv_from_keygen.<locals>.aggregate>, mk_cache=<class 'list'>, flush_cache_condition=<function condition_flush_on_every_write>)[source]
py2store.utils.cumul_aggreg_write.condition_flush_on_every_write(cache)[source]

Boolean function used as a flush_cache_condition to flush anytime the cache is non-empty

py2store.utils.cumul_aggreg_write.mk_group_aggregator(item_to_kv, aggregator_op=<built-in function add>, initial=<py2store.utils.cumul_aggreg_write.NoInitial object>)[source]

Make a generator transforming function that will (a) make a key for each given item, (b) group all items according to the key

Parameters
  • item_to_kv

  • aggregator_op

  • initial

Returns:

>>> # Collect words (as a csv string), grouped by the lower case of the first letter
>>> ag = mk_group_aggregator(lambda item: (item[0].lower(), item),
...                          aggregator_op=lambda x, y: ', '.join([x, y]))
>>> list(ag(['apple', 'bananna', 'Airplane']))
[('a', 'apple, Airplane'), ('b', 'bananna')]
>>> # Collect 'thing' values into lists, grouped by 'age'
>>> ag = mk_group_aggregator(lambda item: (item['age'], item['thing']),
...                          aggregator_op=lambda x, y: x + [y],
...                          initial=[])
>>> list(ag([{'age': 0, 'thing': 'new'}, {'age': 42, 'thing': 'every'}, {'age': 0, 'thing': 'just born'}]))
[(0, ['new', 'just born']), (42, ['every'])]
py2store.utils.cumul_aggreg_write.mk_group_aggregator_with_key_func(item_to_key, aggregator_op=<built-in function add>, initial=<py2store.utils.cumul_aggreg_write.NoInitial object>)[source]

Make a generator transforming function that will (a) make a key for each given item, (b) group all items according to the key

Parameters
  • item_to_key – Function that takes an item of the generator and outputs the key that should be used to group items

  • aggregator_op – The aggregation binary function that is used to aggregate two items together. The function is used as is by the functools.reduce, applied to the sequence of items that were collected for a given group

  • initial – The “empty” element to start the reduce (aggregation) with, if necessary.

Returns:

>>> # Collect words (as a csv string), grouped by the lower case of the first letter
>>> ag = mk_group_aggregator_with_key_func(lambda item: item[0].lower(),
...                          aggregator_op=lambda x, y: ', '.join([x, y]))
>>> list(ag(['apple', 'bananna', 'Airplane']))
[('a', 'apple, Airplane'), ('b', 'bananna')]
>>>
>>> # Collect (and concatenate) characters according to their ascii value modulo 3
... ag = mk_group_aggregator_with_key_func(lambda item: (ord(item) % 3))
>>> list(ag('abcdefghijklmnop'))
[(1, 'adgjmp'), (2, 'behkn'), (0, 'cfilo')]
>>>
>>> # sum all even and odd number separately
... ag = mk_group_aggregator_with_key_func(lambda item: (item % 2))
>>> list(ag([1, 2, 3, 4, 5]))  # sum of evens is 6, and sum of odds is 9
[(1, 9), (0, 6)]
>>>
>>> # if we wanted to collect all odds and evens, we'd need a different aggregator and initial
... ag = mk_group_aggregator_with_key_func(lambda item: (item % 2), aggregator_op=lambda x, y: x + [y], initial=[])
>>> list(ag([1, 2, 3, 4, 5]))
[(1, [1, 3, 5]), (0, [2, 4])]

py2store.utils

general utils

py2store.utils.cache_descriptors

descriptors to cache data

py2store.utils.cache_descriptors.CachedProperty(*args)[source]

CachedProperties. This is usable directly as a decorator when given names, or when not. Any of these patterns will work:

  • @CachedProperty

  • @CachedProperty()

  • @CachedProperty('n', 'n2')

  • def thing(self): …; thing = CachedProperty(thing)

  • def thing(self): …; thing = CachedProperty(thing, 'n')
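
A minimal usage sketch, assuming the usual cached-property semantics (computed on first access, then stored on the instance):

>>> class Thing:
...     @CachedProperty
...     def answer(self):
...         print('computing...')
...         return 42
>>> t = Thing()
>>> t.answer  # first access computes and caches
computing...
42
>>> t.answer  # later accesses hit the cache
42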

class py2store.utils.cache_descriptors.Lazy(func, name=None)[source]

Lazy Attributes.

class py2store.utils.cache_descriptors.cachedIn(attribute_name)[source]

Cached property with given cache attribute.

py2store.utils.appendable

utils to add append and extend functionality to KV stores

py2store.utils.affine_conversion

utils to carry out affine transformations (of indices)

class py2store.utils.affine_conversion.AffineConverter(scale=1.0, offset=0.0)[source]

Getting a callable that will perform an affine conversion. Note, it does it as

(val - offset) * scale

(Note: this is not slope-intercept style, though there is a .from_slope_and_intercept constructor method for that.)

Inverse is available through the inv method, performing:

val / scale + offset

>>> convert = AffineConverter(scale=0.5, offset=1)
>>> convert(0)
-0.5
>>> convert(10)
4.5
>>> convert.inv(4)
9.0
>>> convert.inv(4.5)
10.0
py2store.utils.affine_conversion.get_affine_converter_and_inverse(scale=1, offset=0, source_type_cast=None, target_type_cast=None)[source]
Getting two affine functions with given scale and offset, that are inverse of each other. Namely (for input val):

(val - offset) * scale and val / scale + offset

Note: this is not “slope-intercept” style!

The optional source_type_cast and target_type_cast allow the user to specify whether these transformations need to be further cast to a given type.

Parameters
  • scale – the scale factor

  • offset – the offset

  • source_type_cast – function to apply to input

  • target_type_cast – function to apply to output

Returns

Two single-val functions: affine_converter, inverse_affine_converter

Note: Code is a lot more complex than the basic operations it performs. The reason was a worry of efficiency since the functions that are returned are intended to be used in long loops.

See also: ocore.utils.conversion.AffineConverter

>>> affine_converter, inverse_affine_converter = get_affine_converter_and_inverse(scale=0.5,offset=1)
>>> affine_converter(0)
-0.5
>>> affine_converter(10)
4.5
>>> inverse_affine_converter(4)
9.0
>>> inverse_affine_converter(4.5)
10.0
>>> affine_converter, inverse_affine_converter = get_affine_converter_and_inverse(scale=0.5,offset=1,target_type_cast=int)
>>> affine_converter(10)
4

py2store.utils.signatures

Deprecated: Forwards to py2store.signatures

py2store.utils.sliceable

utils to add sliceable functionality to stores

class py2store.utils.sliceable.iSliceStore(store)[source]

Wraps a store to make a reader that acts as if the store were a list (with integer keys, and that can be sliced). I say “list”, but it should be noted that the behavior is more that of range, which outputs an element of the list when keyed with an integer, but returns an iterable object (a range) if sliced.

Here, a map object is returned when the sliceable store is sliced.

>>> s = {'foo': 'bar', 'hello': 'world', 'alice': 'bob'}
>>> sliceable_s = iSliceStore(s)
>>> sliceable_s[1]
'world'
>>> list(sliceable_s[0:2])
['bar', 'world']
>>> list(sliceable_s[-2:])
['world', 'bob']
>>> list(sliceable_s[:-1])
['bar', 'world']

py2store.utils.mappify

Utils to wrap any object into a mapping interface

class py2store.utils.mappify.LeafMappify(target, node_types=(<class 'dict'>, ), key_concat=<function Mappify.<lambda>>, names_of_literals=(), **kwargs)[source]

A dict-like interface to glom. Here, only leaf keys are taken into account.

>>> d = {
...     'a': 'simple',
...     'b': {'is': 'nested'},
...     'c': {'is': 'nested', 'and': 'has', 'a': [1, 2, 3]}
... }
>>> g = LeafMappify(d)
>>>
>>> assert list(g) == ['a', 'b.is', 'c.is', 'c.and', 'c.a']
>>> assert g['a'] == 'simple'
>>> assert g['b.is'] == 'nested'
>>> assert g['c.a'] == [1, 2, 3]
>>>
>>> for k, v in g.items():
...     print(f"{k}: {v}")
...
a: simple
b.is: nested
c.is: nested
c.and: has
c.a: [1, 2, 3]
class py2store.utils.mappify.Mappify(target, node_types=(<class 'dict'>, ), key_concat=<function Mappify.<lambda>>, names_of_literals=(), **kwargs)[source]
>>> d = {
...     'a': 'simple',
...     'b': {'is': 'nested'},
...     'c': {'is': 'nested', 'and': 'has', 'a': [1, 2, 3]}
... }
>>> g = Mappify(d)
>>>
>>> assert list(g) == ['a', 'b.is', 'b', 'c.is', 'c.and', 'c.a', 'c']
>>> assert g['a'] == 'simple'
>>> assert g['b.is'] == 'nested'
>>> assert g['c.a'] == [1, 2, 3]
>>>
>>> for k, v in g.items():
...     print(f"{k}: {v}")
...
a: simple
b.is: nested
b: {'is': 'nested'}
c.is: nested
c.and: has
c.a: [1, 2, 3]
c: {'is': 'nested', 'and': 'has', 'a': [1, 2, 3]}

py2store.utils.glom

glom is a util to extract stuff from nested structures. It’s one of those excellent utils that I’ve written many times, but never got quite right. Mahmoud Hashemi got it right.

BEGIN LICENSE

Copyright (c) 2018, Mahmoud Hashemi

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

  • Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

  • Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

  • The names of the contributors may not be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS “AS IS” AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

END LICENSE

Now, at the time of writing this, I’ve already transformed it to bend it to my liking. At some point it may become something else, but I wanted there to be a trace of what my seed was. Though I can’t promise I’ll maintain the same functionality as I transform this module, here’s a tutorial on how to use it in its original form:

I only took the main (core) module from the glom project. Here are the original docs of this glom module.

If there was ever a Python example of “big things come in small packages”, glom might be it.

The glom package has one central entrypoint, glom.glom(). Everything else in the package revolves around that one function.

A couple of conventional terms you’ll see repeated many times below:

  • target - glom is built to work on any data, so we simply refer to the object being accessed as the “target”

  • spec - (aka “glomspec”, short for specification) The accompanying template used to specify the structure of the return value.

Now that you know the terms, let’s take a look around glom’s powerful semantics.

class py2store.utils.glom.Auto(spec=None)[source]

Switch to Auto mode (the default)

TODO: this seems like it should be a sub-class of class Spec() – if Spec() could help define the interface for new “modes” or dialects that would also help make match mode feel less duct-taped on

class py2store.utils.glom.Call(func=None, args=None, kwargs=None)[source]

Call specifies when a target should be passed to a function, func.

Call is similar to partial() in that it is no more powerful than lambda or other functions, but it is designed to be more readable, with a better repr.

Parameters

func (callable) – a function or other callable to be called with the target

Call combines well with T to construct objects. For instance, to generate a dict and then pass it to a constructor:

>>> class ExampleClass(object):
...    def __init__(self, attr):
...        self.attr = attr
...
>>> target = {'attr': 3.14}
>>> glom(target, Call(ExampleClass, kwargs=T)).attr
3.14

This does the same as glom(target, lambda target: ExampleClass(**target)), but it’s easy to see which one reads better.

Note

Call is mostly for functions. Use a T object if you need to call a method.

Warning

Call has a successor with a fuller-featured API, new in 19.3.0: the Invoke specifier type.

glomit(target, scope)[source]

run against the current target

class py2store.utils.glom.Check(spec=T, **kwargs)[source]

Check objects are used to make assertions about the target data, and either pass through the data or raise exceptions if there is a problem.

If any check condition fails, a CheckError is raised.

Parameters
  • spec – a sub-spec to extract the data to which other assertions will be checked (defaults to applying checks to the target itself)

  • type – a type or sequence of types to be checked for exact match

  • equal_to – a value to be checked for equality match (“==”)

  • validate – a callable or list of callables, each representing a check condition. If one or more return False or raise an exception, the Check will fail.

  • instance_of – a type or sequence of types to be checked with isinstance()

  • one_of – an iterable of values, any of which can match the target (“in”)

  • default – an optional default value to replace the value when the check fails (if default is not specified, GlomCheckError will be raised)

Aside from spec, all arguments are keyword arguments. Each argument, except for default, represents a check condition. Multiple checks can be passed, and if all check conditions are left unset, Check defaults to performing a basic truthy check on the value.

exception py2store.utils.glom.CheckError(msgs, check, path)[source]

This GlomError subtype is raised when target data fails to pass a Check’s specified validation.

An uncaught CheckError looks like this:

>>> target = {'a': {'b': 'c'}}
>>> glom(target, {'b': ('a.b', Check(type=int))})  
Traceback (most recent call last):
...
glom.CheckError: target at path ['a.b'] failed check, got error: "expected type to be 'int', found type 'str'"

If the Check contains more than one condition, there may be more than one error message. The string rendition of the CheckError will include all messages.

You can also catch the CheckError and programmatically access messages through the msgs attribute on the CheckError instance.

Note

As of 2018-07-05 (glom v18.2.0), the validation subsystem is still very new. Exact error message formatting may be enhanced in future releases.

class py2store.utils.glom.Coalesce(*subspecs, **kwargs)[source]

Coalesce objects specify fallback behavior for a list of subspecs.

Subspecs are passed as positional arguments, and keyword arguments control defaults. Each subspec is evaluated in turn, and if none match, a CoalesceError is raised, or a default is returned, depending on the options used.

Note

This operation may seem very familiar if you have experience with SQL or even C# and others.

In practice, this fallback behavior’s simplicity is only surpassed by its utility:

>>> target = {'c': 'd'}
>>> glom(target, Coalesce('a', 'b', 'c'))
'd'

glom tries to get 'a' from target, but gets a KeyError. Rather than raise a PathAccessError as usual, glom coalesces into the next subspec, 'b'. The process repeats until it gets to 'c', which returns our value, 'd'. If our value weren’t present, we’d see:

>>> target = {}
>>> glom(target, Coalesce('a', 'b'))  
Traceback (most recent call last):
...
glom.CoalesceError: no valid values found. Tried ('a', 'b') and got (PathAccessError, PathAccessError) (at path [])

Same process, but because target is empty, we get a CoalesceError. If we want to avoid an exception, and we know which value we want by default, we can set default:

>>> target = {}
>>> glom(target, Coalesce('a', 'b', 'c'), default='d-fault')
'd-fault'

'a', 'b', and 'c' weren’t present so we got 'd-fault'.

Parameters
  • subspecs – One or more glommable subspecs

  • default – A value to return if no subspec results in a valid value

  • default_factory – A callable whose result will be returned as a default

  • skip – A value, tuple of values, or predicate function representing values to ignore

  • skip_exc – An exception or tuple of exception types to catch and move on to the next subspec. Defaults to GlomError, the parent type of all glom runtime exceptions.

If all subspecs produce skipped values or exceptions, a CoalesceError will be raised. For more examples, check out the tutorial, which makes extensive use of Coalesce.

exception py2store.utils.glom.CoalesceError(coal_obj, skipped, path)[source]

This GlomError subtype is raised from within a Coalesce spec’s processing, when none of the subspecs match and no default is provided.

The exception object itself keeps track of several values which may be useful for processing:

Parameters
  • coal_obj (Coalesce) – The original failing spec, see Coalesce’s docs for details.

  • skipped (list) – A list of ignored values and exceptions, in the order that their respective subspecs appear in the original coal_obj.

  • path – Like many GlomErrors, this exception knows the path at which it occurred.

>>> target = {}
>>> glom(target, Coalesce('a', 'b'))  
Traceback (most recent call last):
...
glom.CoalesceError: no valid values found. Tried ('a', 'b') and got (PathAccessError, PathAccessError) ...
class py2store.utils.glom.Fill(spec=None)[source]

A specifier type which switches glom into “fill-mode”. For the spec contained within the Fill, glom will only interpret explicit specifier types (including T objects). Whereas the default mode has special interpretations for each of these builtins, fill-mode takes a lighter touch, making Fill great for “filling out” Python literals, like tuples, dicts, sets, and lists.

>>> target = {'data': [0, 2, 4]}
>>> spec = Fill((T['data'][2], T['data'][0]))
>>> glom(target, spec)
(4, 0)

As you can see, glom’s usual built-in tuple item chaining behavior has switched into a simple tuple constructor.

(Sidenote for Lisp fans: Fill is like glom’s quasi-quoting.)

exception py2store.utils.glom.GlomError[source]

The base exception for all the errors that might be raised from glom() processing logic.

By default, exceptions raised from within functions passed to glom (e.g., len, sum, any lambda) will not be wrapped in a GlomError.

class py2store.utils.glom.Glommer(**kwargs)[source]

All the wholesome goodness that it takes to make glom work. This type mostly serves to encapsulate the type registration context so that advanced uses of glom don’t need to worry about stepping on each other’s toes.

Glommer objects are lightweight and, once instantiated, provide the glom() method we know and love:

>>> glommer = Glommer()
>>> glommer.glom({}, 'a.b.c', default='d')
'd'
>>> Glommer().glom({'vals': list(range(3))}, ('vals', len))
3

Instances also provide register() method for localized control over type handling.

Parameters

register_default_types (bool) – Whether or not to enable the handling behaviors of the default glom(). These default actions include dict access, list and iterable iteration, and generic object attribute access. Defaults to True.

register(target_type, **kwargs)[source]

Register target_type so glom() will know how to handle instances of that type as targets.

Parameters
  • target_type (type) – A type expected to appear in a glom() call target

  • get (callable) – A function which takes a target object and a name, acting as a default accessor. Defaults to getattr().

  • iterate (callable) – A function which takes a target object and returns an iterator. Defaults to iter() if target_type appears to be iterable.

  • exact (bool) – Whether or not to match instances of subtypes of target_type.

Note

The module-level register() function affects the module-level glom() function’s behavior. If this global effect is undesirable for your application, or you’re implementing a library, consider instantiating a Glommer instance, and using the register() and Glommer.glom() methods instead.

class py2store.utils.glom.Inspect(*a, **kw)[source]

The Inspect specifier type provides a way to get visibility into glom’s evaluation of a specification, enabling debugging of those tricky problems that may arise with unexpected data.

Inspect can be inserted into an existing spec in one of two ways. First, as a wrapper around the spec in question, or second, as an argument-less placeholder wherever a spec could be.

Inspect supports several modes, controlled by keyword arguments. Its default, no-argument mode, simply echos the state of the glom at the point where it appears:

>>> target = {'a': {'b': {}}}
>>> val = glom(target, Inspect('a.b'))  # wrapping a spec
---
path:   ['a.b']
target: {'a': {'b': {}}}
output: {}
---

Debugging behavior aside, Inspect has no effect on values in the target, spec, or result.

Parameters
  • echo (bool) – Whether to print the path, target, and output of each inspected glom. Defaults to True.

  • recursive (bool) – Whether or not the Inspect should be applied at every level, at or below the spec that it wraps. Defaults to False.

  • breakpoint (bool) – This flag controls whether a debugging prompt should appear before evaluating each inspected spec. Can also take a callable. Defaults to False.

  • post_mortem (bool) – This flag controls whether exceptions should be caught and interactively debugged with pdb on inspected specs.

All arguments above are keyword-only to avoid overlap with a wrapped spec.

Note

Just like pdb.set_trace(), be careful about leaving stray Inspect() instances in production glom specs.

class py2store.utils.glom.Invoke(func)[source]

Specifier type designed for easy invocation of callables from glom.

Parameters

func (callable) – A function or other callable object.

Invoke is similar to functools.partial(), but with the ability to set up a “templated” call which interleaves constants and glom specs.

For example, the following creates a spec which can be used to check if targets are integers:

>>> is_int = Invoke(isinstance).specs(T).constants(int)
>>> glom(5, is_int)
True

And this composes like any other glom spec:

>>> target = [7, object(), 9]
>>> glom(target, [is_int])
[True, False, True]

Another example, mixing positional and keyword arguments:

>>> spec = Invoke(sorted).specs(T).constants(key=int, reverse=True)
>>> target = ['10', '5', '20', '1']
>>> glom(target, spec)
['20', '10', '5', '1']

Invoke also helps with evaluating zero-argument functions:

>>> glom(target={}, spec=Invoke(int))
0

(A trivial example, but from timestamps to UUIDs, zero-arg calls do come up!)

Note

Invoke is mostly for functions, object construction, and callable objects. For calling methods, consider the T object.

constants(*a, **kw)[source]

Returns a new Invoke spec, with the provided positional and keyword argument values stored for passing to the underlying function.

>>> spec = Invoke(T).constants(5)
>>> glom(range, (spec, list))
[0, 1, 2, 3, 4]

Subsequent positional arguments are appended:

>>> spec = Invoke(T).constants(2).constants(10, 2)
>>> glom(range, (spec, list))
[2, 4, 6, 8]

Keyword arguments also work as one might expect:

>>> round_2 = Invoke(round).constants(ndigits=2).specs(T)
>>> glom(3.14159, round_2)
3.14

constants() and other Invoke methods may be called multiple times, just remember that every call returns a new spec.

classmethod specfunc(spec)[source]

Creates an Invoke instance where the function is indicated by a spec.

>>> spec = Invoke.specfunc('func').constants(5)
>>> glom({'func': range}, (spec, list))
[0, 1, 2, 3, 4]
specs(*a, **kw)[source]

Returns a new Invoke spec, with the provided positional and keyword arguments stored to be interpreted as specs, with the results passed to the underlying function.

>>> spec = Invoke(range).specs('value')
>>> glom({'value': 5}, (spec, list))
[0, 1, 2, 3, 4]

Subsequent positional arguments are appended:

>>> spec = Invoke(range).specs('start').specs('end', 'step')
>>> target = {'start': 2, 'end': 10, 'step': 2}
>>> glom(target, (spec, list))
[2, 4, 6, 8]

Keyword arguments also work as one might expect:

>>> multiply = lambda x, y: x * y
>>> times_3 = Invoke(multiply).constants(y=3).specs(x='value')
>>> glom({'value': 5}, times_3)
15

specs() and other Invoke methods may be called multiple times, just remember that every call returns a new spec.

star(args=None, kwargs=None)[source]

Returns a new Invoke spec, with args and/or kwargs specs set to be “starred” or “star-starred” (respectively)

>>> import os.path
>>> spec = Invoke(os.path.join).star(args='path')
>>> target = {'path': ['path', 'to', 'dir']}
>>> glom(target, spec)
'path/to/dir'
Parameters
  • args (spec) – A spec to be evaluated and “starred” into the underlying function.

  • kwargs (spec) – A spec to be evaluated and “star-starred” into the underlying function.

One or both of the above arguments should be set.

The star(), like other Invoke methods, may be called multiple times. The args and kwargs will be stacked in the order in which they are provided.

class py2store.utils.glom.Let(**kw)[source]

This specifier type assigns variables to the scope.

>>> target = {'data': {'val': 9}}
>>> spec = (Let(value=T['data']['val']), {'val': S['value']})
>>> glom(target, spec)
{'val': 9}
class py2store.utils.glom.Literal(value)[source]

Literal objects specify literal values in rare cases when part of the spec should not be interpreted as a glommable subspec. Wherever a Literal object is encountered in a spec, it is replaced with its wrapped value in the output.

>>> target = {'a': {'b': 'c'}}
>>> spec = {'a': 'a.b', 'readability': Literal('counts')}
>>> pprint(glom(target, spec))
{'a': 'c', 'readability': 'counts'}

Instead of accessing 'counts' as a key like it did with 'a.b', glom() just unwrapped the literal and included the value.

Literal takes one argument, the literal value that should appear in the glom output.

This could also be achieved with a callable, e.g., lambda x: 'literal_string' in the spec, but using a Literal object adds explicitness, code clarity, and a clean repr().

class py2store.utils.glom.Path(*path_parts)[source]

Path objects specify explicit paths when the default 'a.b.c'-style general access syntax won’t work or isn’t desirable. Use this to wrap ints, datetimes, and other valid keys, as well as strings with dots that shouldn’t be expanded.

>>> target = {'a': {'b': 'c', 'd.e': 'f', 2: 3}}
>>> glom(target, Path('a', 2))
3
>>> glom(target, Path('a', 'd.e'))
'f'

Paths can be used to join together other Path objects, as well as T objects:

>>> Path(T['a'], T['b'])
T['a']['b']
>>> Path(Path('a', 'b'), Path('c', 'd'))
Path('a', 'b', 'c', 'd')

Paths also support indexing and slicing, with each access returning a new Path object:

>>> path = Path('a', 'b', 1, 2)
>>> path[0]
Path('a')
>>> path[-2:]
Path(1, 2)
from_t()[source]

return the same path but starting from T

classmethod from_text(text)[source]

Make a Path from .-delimited text:

>>> Path.from_text('a.b.c')
Path('a', 'b', 'c')
items()[source]

Returns a tuple of (operation, value) pairs.

>>> Path(T.a.b, 'c', T['d']).items()
(('.', 'a'), ('.', 'b'), ('P', 'c'), ('[', 'd'))
values()[source]

Returns a tuple of values referenced in this path.

>>> Path(T.a.b, 'c', T['d']).values()
('a', 'b', 'c', 'd')
exception py2store.utils.glom.PathAccessError(exc, path, part_idx)[source]

This GlomError subtype represents a failure to access an attribute as dictated by the spec. The most commonly-seen error when using glom, it maintains a copy of the original exception and produces a readable error message for easy debugging.

If you see this error, you may want to:

  • Check the target data is accurate using Inspect

  • Catch the exception and return a semantically meaningful error message

  • Use glom.Coalesce to specify a default

  • Use the top-level default kwarg on glom()

In any case, be glad you got this error and not the one it was wrapping!

Parameters
  • exc (Exception) – The error that arose when we tried to access path. Typically an instance of KeyError, AttributeError, IndexError, or TypeError, and sometimes others.

  • path (Path) – The full Path glom was in the middle of accessing when the error occurred.

  • part_idx (int) – The index of the part of the path that caused the error.

>>> target = {'a': {'b': None}}
>>> glom(target, 'a.b.c')  
Traceback (most recent call last):
...
glom.PathAccessError: could not access 'c', part 2 of Path('a', 'b', 'c'), got error: ...
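
For instance, the default keyword route mentioned above (a minimal sketch):

>>> glom({'a': {'b': None}}, 'a.b.c', default='n/a')
'n/a'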
class py2store.utils.glom.Spec(spec, scope=None)[source]

Spec objects serve three purposes; here they are, roughly ordered by utility:

  1. As a form of compiled or “curried” glom call, similar to Python’s built-in re.compile().

  2. As a marker, indicating that an object represents a spec rather than a literal value, in cases where that might be ambiguous.

  3. A way to update the scope within another Spec.

In the second usage, Spec objects are the complement to Literal, wrapping a value and marking that it should be interpreted as a glom spec, rather than a literal value. This is useful in places where it would be interpreted as a value by default. (Such as T[key], Call(func) where key and func are assumed to be literal values and not specs.)

Parameters
  • spec – The glom spec.

  • scope (dict) – additional values to add to the scope when evaluating this Spec
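
For instance, the “compiled” usage (a minimal sketch):

>>> compiled = Spec('a.b')
>>> glom({'a': {'b': 1}}, compiled)
1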

class py2store.utils.glom.TType[source]

T, short for “target”. A singleton object that enables object-oriented expression of a glom specification.

Note

T is a singleton, and does not need to be constructed.

Basically, think of T as your data’s stunt double. Everything that you do to T will be recorded and executed during the glom() call. Take this example:

>>> spec = T['a']['b']['c']
>>> target = {'a': {'b': {'c': 'd'}}}
>>> glom(target, spec)
'd'

So far, we’ve relied on the 'a.b.c'-style shorthand for access, or used the Path objects, but if you want to explicitly do attribute and key lookups, look no further than T.

But T doesn’t stop with unambiguous access. You can also call methods and perform almost any action you would with a normal object:

>>> spec = ('a', (T['b'].items(), list))  # reviewed below
>>> glom(target, spec)
[('c', 'd')]

A T object can go anywhere in the spec. As seen in the example above, we access 'a', use a T to get 'b' and iterate over its items, turning them into a list.

You can even use T with Call to construct objects:

>>> class ExampleClass(object):
...    def __init__(self, attr):
...        self.attr = attr
...
>>> target = {'attr': 3.14}
>>> glom(target, Call(ExampleClass, kwargs=T)).attr
3.14

On a further note, while lambda works great in glom specs, and can be very handy at times, T and Call eliminate the need for the vast majority of lambda usage with glom.

Unlike lambda and other functions, T roundtrips beautifully and transparently:

>>> T['a'].b['c']('success')
T['a'].b['c']('success')

T-related access errors raise a PathAccessError during the glom() call.

Note

While T is clearly useful, powerful, and here to stay, its semantics are still being refined. Currently, operations beyond method calls and attribute/item access are considered experimental and should not be relied upon.

class py2store.utils.glom.TargetRegistry(register_default_types=True)[source]

responsible for registration of target types for iteration and attribute walking

get_handler(op, obj, path=None, raise_exc=True)[source]

for an operation and object instance obj, return the closest-matching handler function, raising UnregisteredTarget if no handler can be found for obj (or returning False if raise_exc=False)

register_op(op_name, auto_func=None, exact=False)[source]

add operations beyond the builtins (‘get’ and ‘iterate’ at the time of writing).

auto_func is a function that when passed a type, returns a handler associated with op_name if it’s supported, or False if it’s not.

See glom.core.register_op() for the global version used by extensions.

exception py2store.utils.glom.UnregisteredTarget(op, target_type, type_map, path)[source]

This GlomError subtype is raised when a spec calls for an unsupported action on a target type. For instance, trying to iterate on a non-iterable target:

>>> glom(object(), ['a.b.c'])  
Traceback (most recent call last):
...
glom.UnregisteredTarget: target type 'object' not registered for 'iterate', expected one of registered types: (...)

It should be noted that this is a pretty uncommon occurrence in production glom usage. See the setup-and-registration section for details on how to avoid this error.

An UnregisteredTarget takes and tracks a few values:

Parameters
  • op (str) – The name of the operation being performed (‘get’ or ‘iterate’)

  • target_type (type) – The type of the target being processed.

  • type_map (dict) – A mapping of target types that do support this operation

  • path – The path at which the error occurred.

py2store.utils.glom.glom(target, spec, **kwargs)[source]

Access or construct a value from a given target based on the specification declared by spec.

Accessing nested data, aka deep-get:

>>> target = {'a': {'b': 'c'}}
>>> glom(target, 'a.b')
'c'

Here the spec was just a string denoting a path, 'a.b'. As simple as it should be. The next example shows how to use nested data to access many fields at once, and make a new nested structure.

Constructing, or restructuring more-complicated nested data:

>>> target = {'a': {'b': 'c', 'd': 'e'}, 'f': 'g', 'h': [0, 1, 2]}
>>> spec = {'a': 'a.b', 'd': 'a.d', 'h': ('h', [lambda x: x * 2])}
>>> output = glom(target, spec)
>>> pprint(output)
{'a': 'c', 'd': 'e', 'h': [0, 2, 4]}

glom also takes a keyword-argument, default. When set, if a glom operation fails with a GlomError, the default will be returned, very much like dict.get():

>>> glom(target, 'a.xx', default='nada')
'nada'

The skip_exc keyword argument controls which errors should be ignored.

>>> glom({}, lambda x: 100.0 / len(x), default=0.0, skip_exc=ZeroDivisionError)
0.0
Parameters
  • target (object) – the object on which the glom will operate.

  • spec (object) – Specification of the output object in the form of a dict, list, tuple, string, other glom construct, or any composition of these.

  • default (object) – An optional default to return in the case an exception, specified by skip_exc, is raised.

  • skip_exc (Exception) – An optional exception or tuple of exceptions to ignore and return default (None if omitted). If skip_exc and default are both not set, glom raises errors through.

  • scope (dict) – Additional data that can be accessed via S inside the glom-spec.

It’s a small API with big functionality, and glom’s power is only surpassed by its intuitiveness. Give it a whirl!
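
A minimal sketch of the scope parameter (accessed via S inside the spec):

>>> glom({'a': 1}, {'a': 'a', 'x': S['x']}, scope={'x': 42})
{'a': 1, 'x': 42}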

py2store.utils.glom.is_iterable(x)[source]

Similar in nature to callable(), is_iterable returns True if an object is iterable, False if not.

>>> is_iterable([])
True
>>> is_iterable(1)
False

py2store.utils.glom.make_sentinel(name='_MISSING', var_name=None)[source]

Creates and returns a new instance of a new class, suitable for usage as a “sentinel”, a kind of singleton often used to indicate a value is missing when None is a valid input.

Parameters
  • name (str) – Name of the Sentinel

  • var_name (str) – Set this name to the name of the variable in its respective module to enable pickleability.

>>> make_sentinel(var_name='_MISSING')
_MISSING

The most common use cases here in boltons are as default values for optional function arguments, partly because of its less-confusing appearance in automatically generated documentation. Sentinels also function well as placeholders in queues and linked lists.

Note

By design, additional calls to make_sentinel with the same values will not produce equivalent objects.

>>> make_sentinel('TEST') == make_sentinel('TEST')
False
>>> type(make_sentinel('TEST')) == type(make_sentinel('TEST'))
False
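
A typical default-argument usage might look like this (a minimal sketch; lookup is a hypothetical helper):

>>> _MISSING = make_sentinel(var_name='_MISSING')
>>> def lookup(mapping, key, default=_MISSING):
...     val = mapping.get(key, _MISSING)
...     if val is _MISSING and default is _MISSING:
...         raise KeyError(key)
...     return default if val is _MISSING else val
>>> lookup({'a': 1}, 'a')
1
>>> lookup({'a': 1}, 'b', default=0)
0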
py2store.utils.glom.register(target_type, **kwargs)[source]

Register target_type so glom() will know how to handle instances of that type as targets.

Parameters
  • target_type (type) – A type expected to appear in a glom() call target

  • get (callable) – A function which takes a target object and a name, acting as a default accessor. Defaults to getattr().

  • iterate (callable) – A function which takes a target object and returns an iterator. Defaults to iter() if target_type appears to be iterable.

  • exact (bool) – Whether or not to match instances of subtypes of target_type.

Note

The module-level register() function affects the module-level glom() function’s behavior. If this global effect is undesirable for your application, or you’re implementing a library, consider instantiating a Glommer instance, and using the register() and Glommer.glom() methods instead.
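
A minimal sketch of such a registration (Env is a hypothetical type; note the global effect mentioned above):

>>> class Env:
...     def __init__(self, data):
...         self.data = data
>>> register(Env, get=lambda env, name: env.data[name])
>>> glom(Env({'user': 'alice'}), 'user')
'alice'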

py2store.utils.glom.register_op(op_name, **kwargs)[source]

For extension authors needing to add operations beyond the builtin ‘get’ and ‘iterate’ to the default scope. See TargetRegistry for more details.

py2store.persisters.sql_w_odbc

py2store.persisters.dynamodb_w_boto3

py2store.persisters.couchdb_w_couchdb

py2store.persisters.ftp_persister

py2store.persisters.dropbox_w_urllib

py2store.persisters._google_drive_in_progress

py2store.persisters.dropbox_w_dropbox

Forwards to dropboxdol

py2store.persisters.redis_w_redis

Forwards to redisdol

py2store.persisters.sql_w_sqlalchemy

Forwards to sqldol

py2store.persisters.new_s3

Forwards to s3dol.new_s3

py2store.persisters

base persisters – now all forwarding to separate libraries

py2store.persisters.dropbox_w_requests

py2store.persisters.w_aiofile

Forwards to aiofiledol

py2store.persisters.local_files

base classes to work with local files

class py2store.persisters.local_files.DirReader(rootdir)[source]

KV Reader whose keys (AND VALUES) are directory full paths of the subdirectories of rootdir.

class py2store.persisters.local_files.DirpathFormatKeys(path_format: str, max_levels: int = inf)[source]
class py2store.persisters.local_files.FileReader(rootdir)[source]

KV Reader whose keys are paths and values are:

  • Another FileReader if a path points to a directory

  • The bytes of the file if the path points to a file

class py2store.persisters.local_files.FilepathFormatKeys(path_format: str, max_levels: int = inf)[source]
exception py2store.persisters.local_files.FolderNotFoundError[source]
class py2store.persisters.local_files.LocalFileRWD(mode='', **open_kwargs)[source]

A class providing get, set and delete functionality using local files as the storage backend.

class py2store.persisters.local_files.LocalFileStreamGetter(**open_kwargs)[source]

A class to get stream objects of local open files. The class only supports getting items (no listing, setting, or deleting); the streams it returns can be opened for reading or for writing (destructive or append).

>>> from tempfile import mkdtemp
>>> import os
>>> rootdir = mkdtemp()
>>>
>>> appendable_stream = LocalFileStreamGetter(mode='a+')
>>> reader = PathFormatPersister(rootdir)
>>> filepath = os.path.join(rootdir, 'tmp.txt')
>>>
>>> with appendable_stream[filepath] as fp:
...     fp.write('hello')
5
>>> print(reader[filepath])
hello
>>> with appendable_stream[filepath] as fp:
...     fp.write(' world')
6
>>>
>>> print(reader[filepath])
hello world
class py2store.persisters.local_files.PathFormatPersister(path_format, max_levels: int = inf, mode='', **open_kwargs)[source]
class py2store.persisters.local_files.PrefixedDirpathsRecursive[source]

Keys collection for local files, where the keys are full filepaths RECURSIVELY under a given root dir _prefix. This mixin adds iteration (__iter__), length (__len__), and containment (__contains__(k)).

class py2store.persisters.local_files.PrefixedFilepaths[source]

Keys collection for local files, where the keys are full filepaths DIRECTLY under a given root dir _prefix. This mixin adds iteration (__iter__), length (__len__), and containment (__contains__(k)).

class py2store.persisters.local_files.PrefixedFilepathsRecursive[source]

Keys collection for local files, where the keys are full filepaths RECURSIVELY under a given root dir _prefix. This mixin adds iteration (__iter__), length (__len__), and containment (__contains__(k)).

py2store.persisters.local_files.ensure_slash_suffix(path: str)[source]

Add a file separator (/ or \) at the end of the path str, if not already present.
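
For instance (a minimal sketch, assuming posix-style paths):

>>> ensure_slash_suffix('/tmp/data')
'/tmp/data/'
>>> ensure_slash_suffix('/tmp/data/')
'/tmp/data/'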

py2store.persisters.arangodb_w_pyarango

py2store.persisters._cassandra_in_progress

py2store.persisters._couchdb_in_progress

py2store.persisters.s3_w_boto3

Forwards to s3dol.s3_w_boto3

py2store.persisters._postgres_w_psycopg2_in_progress

py2store.persisters.ssh_persister

py2store.persisters.mongo_w_pymongo

py2store.persisters.googledrive_w_pydrive

Forwards to pydrivedol

py2store.sources

Forwards to dol.sources:

This module contains key-value views of disparate sources.

py2store.dig

Forwards to dol.dig:

Layers introspection

py2store.serializers.pickled

functions to pickle objects

py2store.serializers.pickled.mk_marshal_rw_funcs(**kwargs)[source]

Generates a reader and writer using marshal. That is, a pair of parametrized loads and dumps

>>> read, write = mk_marshal_rw_funcs()
>>> d = {'a': 'simple', 'and': {'a': b'more', 'complex': [1, 2.2]}}
>>> serialized_d = write(d)
>>> deserialized_d = read(serialized_d)
>>> assert d == deserialized_d
py2store.serializers.pickled.mk_pickle_rw_funcs(fix_imports=True, protocol=None, pickle_encoding='ASCII', pickle_errors='strict')[source]

Generates a reader and writer using pickle. That is, a pair of parametrized loads and dumps

>>> read, write = mk_pickle_rw_funcs()
>>> d = {'a': 'simple', 'and': {'a': b'more', 'complex': [1, 2.2, dict]}}
>>> serialized_d = write(d)
>>> deserialized_d = read(serialized_d)
>>> assert d == deserialized_d

py2store.serializers.jsonization

py2store.serializers

a package of serializers

py2store.serializers.sequential

py2store.serializers.regular_panel_data

py2store.serializers.audio

py2store.caching

Forwards to dol.caching:

Tools to add caching layers to stores.

py2store.scrap

py2store.scrap.new_gen_local

py2store.examples.write_caches

stores that implement various write caching algorithms

py2store.examples.write_caches.timestamp_on_cache_and_concatenate_all_values()[source]

The cache timestamps (with system clock) every item on insertion (append) and uses the min timestamp as a key for storage.

py2store.examples

modules demoing various uses of py2store

py2store.examples.python_code_stats

Note: Moved to umpyre (pip install umpyre)

Get stats about packages. Your own, or others’. Things like…

# >>> import collections
# >>> modules_info_df(collections)
#                       lines  empty_lines  ...  num_of_functions  num_of_classes
# collections.__init__   1273          189  ...                 1               9
# collections.abc           3            1  ...                 0              25
# <BLANKLINE>
# [2 rows x 7 columns]
# >>> modules_info_df_stats(collections.abc)
# lines                      1276.000000
# empty_lines                 190.000000
# comment_lines                73.000000
# docs_lines                  133.000000
# function_lines              138.000000
# num_of_functions              1.000000
# num_of_classes               34.000000
# empty_lines_ratio             0.148903
# comment_lines_ratio           0.057210
# function_lines_ratio          0.108150
# mean_lines_per_function     138.000000
# dtype: float64
# >>> stats_of(['urllib', 'json', 'collections'])
#                               urllib         json  collections
# empty_lines_ratio           0.157034     0.136818     0.148903
# comment_lines_ratio         0.074142     0.038432     0.057210
# function_lines_ratio        0.213907     0.449654     0.108150
# mean_lines_per_function    13.463768    41.785714   138.000000
# lines                    4343.000000  1301.000000  1276.000000
# empty_lines               682.000000   178.000000   190.000000
# comment_lines             322.000000    50.000000    73.000000
# docs_lines                425.000000   218.000000   133.000000
# function_lines            929.000000   585.000000   138.000000
# num_of_functions           69.000000    14.000000     1.000000
# num_of_classes             55.000000     3.000000    34.000000

py2store.examples.kv_walking

walking through kv stores

class py2store.examples.kv_walking.SrcReader(src, src_to_keys, key_to_obj)[source]
update_keys_cache(keys)

Updates the _keys_cache by calling its {} method

py2store.examples.kv_walking.conjunction(*args, **kwargs)[source]

The conjunction of functions func_1, ..., func_n: the result will be equal to func_1(*args, **kwargs) & ... & func_n(*args, **kwargs) for all args, kwargs.

py2store.examples.kv_walking.kv_walk(v: collections.abc.Mapping, yield_func=<function asis>, walk_filt=<function val_is_mapping>, pkv_to_pv=<function tuple_keypath_and_val>, p=())[source]
Parameters
  • v

  • yield_func – (pp, k, vv) -> whatever you want the generator to yield

  • walk_filt – (p, k, vv) -> (bool) whether to explore the nested structure v further

  • pkv_to_pv – (p, k, v) -> (pp, vv) where pp is a form of p + k (update of the path with the new node k) and vv is the value that will be used by both walk_filt and yield_func

  • p – The path to v

>>> d = {'a': 1, 'b': {'c': 2, 'd': 3}}
>>> list(kv_walk(d))
[(('a',), 'a', 1), (('b',), 'b', {'c': 2, 'd': 3}), (('b', 'c'), 'c', 2), (('b', 'd'), 'd', 3)]
>>> list(kv_walk(d, lambda p, k, v: '.'.join(p)))
['a', 'b', 'b.c', 'b.d']
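
Since the walk yields a node before (possibly) descending into it, walk_filt can be used to stop the descent; a minimal sketch:

>>> list(kv_walk(d, walk_filt=lambda p, k, v: False))  # yield top-level items, but don't descend
[(('a',), 'a', 1), (('b',), 'b', {'c': 2, 'd': 3})]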

py2store.my

functionalities meant to be configurable

py2store.my.grabbers

define stores (and functions) so they give you data as you want it, depending on the extension

py2store.trans

Forwards to dol.trans:

Transformation/wrapping tools

py2store.key_mappers.str_utils

utils from strings

py2store.key_mappers.str_utils.args_and_kwargs_indices(format_string)[source]

Get the sets of indices and names used in manual specification of format strings, or None, None if auto spec.

Parameters

format_string – A format string (i.e. a string with {…} to mark parameter placement and formatting)

Returns

None, None if format_string is an automatic specification; set_of_indices_used, set_of_fields_used if it is a manual specification

>>> format_string = '{0} (no 1) {2}, {see} this, {0} is a duplicate (appeared before) and {name} is string-named'
>>> assert args_and_kwargs_indices(format_string) == ({0, 2}, {'name', 'see'})
>>> format_string = 'This is a format string with only automatic field specification: {}, {}, {} etc.'
>>> assert args_and_kwargs_indices(format_string) == (set(), set())
py2store.key_mappers.str_utils.auto_field_format_str(format_str)[source]

Get an auto field version of the format_str

Parameters

format_str – A format string

Returns

A transformed format_str that has no names {inside} {formatting} {braces}.

>>> auto_field_format_str('R/{0}/{one}/{}/{two}/T')
'R/{}/{}/{}/{}/T'
py2store.key_mappers.str_utils.compile_str_from_parsed(parsed)[source]

The (quasi-)inverse of string.Formatter.parse.

Parameters
  • parsed – iterator of (literal_text, field_name, format_spec, conversion) tuples, as yielded by string.Formatter.parse

Returns

A format string that would produce such a parsed input.

>>> import string
>>> s = "ROOT/{}/{0!r}/{1!i:format}/hello{:0.02f}TAIL"
>>> assert compile_str_from_parsed(string.Formatter().parse(s)) == s
>>>
>>> # Or, if you want to see more details...
>>> parsed = list(string.Formatter().parse(s))
>>> for p in parsed:
...     print(p)
('ROOT/', '', '', None)
('/', '0', '', 'r')
('/', '1', 'format', 'i')
('/hello', '', '0.02f', None)
('TAIL', None, None, None)
>>> compile_str_from_parsed(parsed)
'ROOT/{}/{0!r}/{1!i:format}/hello{:0.02f}TAIL'
py2store.key_mappers.str_utils.format_params_in_str_format(format_string)[source]

Get the “parameter” indices/names of the format_string

Parameters

format_string – A format string (i.e. a string with {…} to mark parameter placement and formatting)

Returns

A list of parameter indices used in the format string, in the order they appear, with repetition. Parameter indices could be integers, strings, or None (to denote “automatic field numbering”).

>>> format_string = '{0} (no 1) {2}, and {0} is a duplicate, {} is unnamed and {name} is string-named'
>>> format_params_in_str_format(format_string)
[0, 2, 0, None, 'name']
py2store.key_mappers.str_utils.get_explicit_positions(parsed_str_format)[source]
>>> parsed = parse_str_format("all/{}/is/{2}/position/{except}{this}{0}")
>>> get_explicit_positions(parsed)
{0, 2}
py2store.key_mappers.str_utils.is_automatic_format_params(format_params)[source]

Says if the format_params is from an automatic specification. See Also: is_manual_format_params and is_hybrid_format_params

py2store.key_mappers.str_utils.is_automatic_format_string(format_string)[source]

Says if the format_string uses automatic specification. See Also: is_manual_format_string

>>> is_automatic_format_string('Manual: indices: {1} {2}, named: {named} {fields}')
False
>>> is_automatic_format_string('Auto: only un-indexed and un-named: {} {}...')
True
>>> is_automatic_format_string('Hybrid: at least a {}, and a {0} or a {name}')
False
>>> is_manual_format_string('No formatting is both manual and automatic formatting!')
True

py2store.key_mappers.str_utils.is_hybrid_format_params(format_params)[source]

Says if the format_params is from a hybrid of auto and manual. Note: Hybrid specifications are considered non-valid and can’t be formatted with format_string.format(…). Yet, it can be useful for flexibility of expression (but will need to be resolved to be used). See Also: is_manual_format_params and is_automatic_format_params

py2store.key_mappers.str_utils.is_hybrid_format_string(format_string)[source]

Says if the format_string is a hybrid of auto and manual specification. Note: Hybrid specifications are considered non-valid and can’t be formatted with format_string.format(…). Yet, they can be useful for flexibility of expression (but will need to be resolved to be used).

>>> is_hybrid_format_string('Manual: indices: {1} {2}, named: {named} {fields}')
False
>>> is_hybrid_format_string('Auto: only un-indexed and un-named: {} {}...')
False
>>> is_hybrid_format_string('Hybrid: at least a {}, and a {0} or a {name}')
True
>>> is_manual_format_string('No formatting is both manual and automatic formatting (so hybrid is both)!')
True
py2store.key_mappers.str_utils.is_manual_format_params(format_params)[source]

Says if the format_params is from a manual specification See Also: is_automatic_format_params

py2store.key_mappers.str_utils.is_manual_format_string(format_string)[source]

Says if the format_string uses a manual specification. See Also: is_automatic_format_string

>>> is_manual_format_string('Manual: indices: {1} {2}, named: {named} {fields}')
True
>>> is_manual_format_string('Auto: only un-indexed and un-named: {} {}...')
False
>>> is_manual_format_string('Hybrid: at least a {}, and a {0} or a {name}')
False
>>> is_manual_format_string('No formatting is both manual and automatic formatting!')
True

py2store.key_mappers.str_utils.manual_field_format_str(format_str)[source]

Get a manual field version of the format_str

Parameters

format_str – A format string

Returns

A transformed format_str where every field is explicitly indexed or named (i.e. no automatic {} fields).

>>> auto_field_format_str('R/{0}/{one}/{}/{two}/T')
'R/{}/{}/{}/{}/T'
py2store.key_mappers.str_utils.n_format_params_in_str_format(format_string)[source]

The number of parameters
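
A minimal sketch (three distinct parameters here: 0, 'name', and one automatic field):

>>> n_format_params_in_str_format('{0} {name} {}')
3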

py2store.key_mappers.str_utils.name_fields_in_format_str(format_str, field_names=None)[source]

Get a manual field version of the format_str

Parameters
  • format_str – A format string

  • field_names – An iterable that produces enough strings to fill all of format_str's fields

Returns

A transformed format_str

>>> name_fields_in_format_str('R/{0}/{one}/{}/{two}/T')
'R/{0}/{1}/{2}/{3}/T'
>>> # Note here that we use the field name to inject a field format as well
>>> name_fields_in_format_str('R/{foo}/{0}/{}/T', ['42', 'hi:03.0f', 'world'])
'R/{42}/{hi:03.0f}/{world}/T'

py2store.key_mappers.tuples

Tools to map tuple-structured keys. That is, converting from any of the following kinds of keys:

  • tuples (or list-like)

  • dicts

  • formatted/templated strings

  • dsv (Delimiter-Separated Values)

py2store.key_mappers.tuples.dsv_of_list(d, sep=',')[source]

Converting a list of strings to a dsv (delimiter-separated values) string.

Note that unlike most key mappers, there is no schema imposing size here. If you wish to impose a size validation, do so externally (we suggest using a decorator for that).

Parameters
  • d – A list of component strings

  • sep – The delimiter text used to separate a string into a list of component strings

Returns

The delimiter-separated values (dsv) string for the input tuple

>>> dsv_of_list(['a', 'brown', 'fox'], sep=' ')
'a brown fox'
>>> dsv_of_list(('jumps', 'over'), sep='/')  # for filepaths (and see that tuple inputs work too!)
'jumps/over'
>>> dsv_of_list(['Sat', 'Jan', '1', '1983'], sep=',')  # csv: the usual delimiter-separated values format
'Sat,Jan,1,1983'
>>> dsv_of_list(['First', 'Last'], sep=':::')  # a longer delimiter
'First:::Last'
>>> dsv_of_list(['singleton'], sep='@')  # when the list has only one element
'singleton'
>>> dsv_of_list([], sep='@')  # when the list is empty
''
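
As suggested above, size validation can be imposed externally with a decorator; a minimal sketch (ensure_size is hypothetical, not part of py2store):

>>> from functools import wraps
>>> def ensure_size(n):
...     def deco(func):
...         @wraps(func)
...         def wrapper(d, *args, **kwargs):
...             if len(d) != n:
...                 raise ValueError(f'Expected {n} components, got {len(d)}')
...             return func(d, *args, **kwargs)
...         return wrapper
...     return deco
>>> dsv_of_pair = ensure_size(2)(dsv_of_list)
>>> dsv_of_pair(['a', 'b'], sep='/')
'a/b'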
py2store.key_mappers.tuples.list_of_dsv(d, sep=',')[source]

Converting a dsv (delimiter-separated values) string to the list of its components.

Parameters
  • d – A (delimiter-separated values) string

  • sep – The delimiter text used to separate the string into a list of component strings

Returns

A list of component strings corresponding to the input delimiter-separated values (dsv) string

>>> list_of_dsv('a brown fox', sep=' ')
['a', 'brown', 'fox']
>>> tuple(list_of_dsv('jumps/over', sep='/'))  # for filepaths
('jumps', 'over')
>>> list_of_dsv('Sat,Jan,1,1983', sep=',')  # csv: the usual delimiter-separated values format
['Sat', 'Jan', '1', '1983']
>>> list_of_dsv('First:::Last', sep=':::')  # a longer delimiter
['First', 'Last']
>>> list_of_dsv('singleton', sep='@')  # when the list has only one element
['singleton']
>>> list_of_dsv('', sep='@')  # when the string is empty
[]
py2store.key_mappers.tuples.mk_obj_of_str(constructor)[source]

Make a function that transforms a string to an object. The factory making inverses of what mk_str_of_obj makes.

Parameters

constructor – The function (or class) that will be used to make objects from the **kwargs parsed out of the string.

Returns

A function factory.
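
A minimal sketch of the intended round-trip, mirroring the mk_str_of_obj example below (the (string, str_format) call signature of the returned function is an assumption, not a documented contract):

>>> from dataclasses import dataclass
>>> @dataclass
... class A:
...     foo: str
...     bar: str
>>> obj_of_str = mk_obj_of_str(A)
>>> obj_of_str('ST0/rin/G', 'ST{foo}/{bar}/G')  # hypothetical usage
A(foo='0', bar='rin')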

py2store.key_mappers.tuples.mk_str_of_obj(attrs)[source]

Make a function that transforms objects to strings, using specific attributes of object.

Parameters

attrs – Attributes that should be read off of the object to make the parameters of the string

Returns

A transformation function

>>> from dataclasses import dataclass
>>> @dataclass
... class A:
...     foo: int
...     bar: str
>>> a = A(foo=0, bar='rin')
>>> a
A(foo=0, bar='rin')
>>>
>>> str_from_obj = mk_str_of_obj(['foo', 'bar'])
>>> str_from_obj(a, 'ST{foo}/{bar}/G')
'ST0/rin/G'
py2store.key_mappers.tuples.str_of_tuple(d, str_format)[source]

Convert tuple to str. It’s just str_format.format(*d). Why even write such a function? (1) To have a consistent interface for key conversions, and (2) we want a KeyValidationError to occur here.

Parameters
  • d – tuple of params to str_format

  • str_format – Auto fields format string. If you have manual fields, consider auto_field_format_str to convert.

Returns

parametrized string

>>> str_of_tuple(('hello', 'world'), "Well, {} dear {}!")
'Well, hello dear world!'

py2store.key_mappers.paths

Module that forwards to py2store.paths, kept for backward compatibility

py2store.key_mappers.naming

This module only forwards to py2store.naming, and is deprecated.

py2store.key_mappers

key mapping

py2store.errors

Forwards to dol.errors:

Error objects and utils

py2store.slib.s_configparser

Data Object Layer for configparser standard lib.

py2store.slib

modules for standard libs

py2store.slib.s_zipfile

a data object layer for zipfile

exception py2store.slib.s_zipfile.EmptyZipError[source]
class py2store.slib.s_zipfile.FileStreamsOfZip(zip_file, prefix='', open_kws=None)[source]

Like FilesOfZip, but the objects returned are file streams instead. So you use it like this:

z = FileStreamsOfZip(rootdir)
with z[relpath] as fp:
    ...  # do stuff with fp, like fp.readlines() or such...

class py2store.slib.s_zipfile.FilesOfZip(zip_file, prefix='', open_kws=None)[source]
class py2store.slib.s_zipfile.FlatZipFilesReader(rootdir, subpath='.+\\.zip', pattern_for_field=None, max_levels=0, zip_reader=<class 'py2store.slib.s_zipfile.ZipReader'>, **zip_reader_kwargs)[source]

Read the union of the contents of multiple zip files. A local file reader whose keys are the zip filepaths of the rootdir and values are corresponding ZipReaders.

exception py2store.slib.s_zipfile.OverwriteNotAllowed[source]
py2store.slib.s_zipfile.ZipFileReader

alias of py2store.slib.s_zipfile.ZipFilesReader

class py2store.slib.s_zipfile.ZipFileStreamsReader(rootdir, subpath='.+\\.zip', pattern_for_field=None, max_levels=0, *, zip_reader=<class 'py2store.slib.s_zipfile.FileStreamsOfZip'>, **zip_reader_kwargs)

Like ZipFilesReader, but objects returned are file streams instead.

class py2store.slib.s_zipfile.ZipFilesReader(rootdir, subpath='.+\\.zip', pattern_for_field=None, max_levels=0, zip_reader=<class 'py2store.slib.s_zipfile.ZipReader'>, **zip_reader_kwargs)[source]

A local file reader whose keys are the zip filepaths of the rootdir and values are corresponding ZipReaders.

class py2store.slib.s_zipfile.ZipFilesReaderAndBytesWriter(rootdir, subpath='.+\\.zip', pattern_for_field=None, max_levels=0, zip_reader=<class 'py2store.slib.s_zipfile.ZipReader'>, **zip_reader_kwargs)[source]

Like ZipFilesReader, but with the ability to write bytes (assumed to be valid bytes of the zip format) to a key

class py2store.slib.s_zipfile.ZipReader(zip_file, prefix='', open_kws=None, file_info_filt=None)[source]

A KvReader to read the contents of a zip file. Provides a KV perspective of https://docs.python.org/3/library/zipfile.html

ZipReader has two value categories: Directories and Files. Both categories are distinguishable by the keys, through the “ends with slash” convention.

When the key points to a file, the value returned is bytes, as usual.

When it points to a directory, the value returned is a ZipReader itself, with all params the same, except for the prefix, which serves to specify the subfolder (that is, prefix acts as a filter).

Note: If you get data zipped by a mac, you might get some junk along with it: namely, __MACOSX folders and .DS_Store files. I won’t rant about it, since others have. But you might find it useful to remove them from view. One choice is to use py2store.trans.filt_iter to get a filtered view of the zip’s contents. In most cases, this should do the job:

# applied to store instance or class:
store = filt_iter(filt=lambda x: not x.startswith('__MACOSX') and '.DS_Store' not in x)(store)

Another option is just to remove these from the zip file once and for all. In unix-like systems:

zip -d filename.zip __MACOSX/\*
zip -d filename.zip \*/.DS_Store

Examples

# >>> s = ZipReader('/path/to/some_zip_file.zip')
# >>> len(s)
# 53432
# >>> list(s)[:3]  # the first 3 elements (well... their keys)
# ['odir/', 'odir/app/', 'odir/app/data/']
# >>> list(s)[-3:]  # the last 3 elements (well... their keys)
# ['odir/app/data/audio/d/1574287049078391/m/Ctor.json',
#  'odir/app/data/audio/d/1574287049078391/m/intensity.json',
#  'odir/app/data/run/status.json']
# >>> # getting a file (note that by default, you get bytes, so need to decode)
# >>> s['odir/app/data/run/status.json'].decode()
# '{"test_phase_number": 9, "test_phase": "TestActions.IGNORE_TEST", "session_id": 0}'
# >>> # when you ask for the contents for a key that's a directory,
# >>> # you get a ZipReader filtered for that prefix:
# >>> s['odir/app/data/audio/']
# ZipReader('/path/to/some_zip_file.zip', 'odir/app/data/audio/', {}, <function take_everything at 0x1538999e0>)
# >>> # Often, you only want files (not directories)
# >>> # You can filter directories out using the file_info_filt argument
# >>> s = ZipReader('/path/to/some_zip_file.zip', file_info_filt=ZipReader.FILES_ONLY)
# >>> len(s)  # compare to the 53432 above, that contained dirs too
# 53280
# >>> list(s)[:3]  # first 3 keys are all files now
# ['odir/app/data/plc/d/1574304926795633/d/1574305026895702',
#  'odir/app/data/plc/d/1574304926795633/d/1574305276853053',
#  'odir/app/data/plc/d/1574304926795633/d/1574305159343326']
# >>>
# >>> # ZipReader.FILES_ONLY and ZipReader.DIRS_ONLY are just convenience filt functions
# >>> # Really, you can provide any custom one yourself.
# >>> # This filter function should take a ZipInfo object, and return True or False.
# >>> # (https://docs.python.org/3/library/zipfile.html#zipfile.ZipInfo)
# >>>
# >>> import re
# >>> p = re.compile('audio.*.json$')
# >>> my_filt_func = lambda fileinfo: bool(p.search(fileinfo.filename))
# >>> s = ZipReader('/Users/twhalen/Downloads/2019_11_21.zip', file_info_filt=my_filt_func)
# >>> len(s)
# 48
# >>> list(s)[:3]
# ['odir/app/data/audio/d/1574333557263758/m/Ctor.json',
#  'odir/app/data/audio/d/1574333557263758/m/intensity.json',
#  'odir/app/data/audio/d/1574288084739961/m/Ctor.json']

class py2store.slib.s_zipfile.ZipStore(zip_filepath, compression=8, allow_overwrites=True, pwd=None)[source]

Zip reading and writing. When you want to read zips, there’s the FilesOfZip, ZipReader, or ZipFilesReader we know and love.

Sometimes though, you want to write to zips too. For this, we have ZipStore.

Since ZipStore can write to a zip, its read functionality is not going to assume static data and cache things, as your favorite zip readers do. This, and the acrobatics needed to disguise the weird zipfile interface into something more key-value natural, makes for a not-so-efficient store, out of the box.

I advise using one of the zip readers if all you need to do is read, or subclassing or wrapping ZipStore with caching layers if that is appropriate for you.

py2store.slib.s_zipfile.func_conjunction(func1, func2)[source]

Returns a function that is equivalent to lambda x: func1(x) and func2(x)
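
A minimal sketch:

>>> is_positive = lambda x: x > 0
>>> is_even = lambda x: x % 2 == 0
>>> both = func_conjunction(is_positive, is_even)
>>> both(4), both(3), both(-2)
(True, False, False)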

py2store.slib.s_zipfile.mk_flatzips_store(dir_of_zips, zip_pair_path_preproc=<built-in function sorted>, mk_store=<class 'py2store.slib.s_zipfile.FlatZipFilesReader'>, **extra_mk_store_kwargs)[source]

A store so that you can work with a folder that has a bunch of zip files, as if they’ve all been extracted in the same folder. Note that zip_pair_path_preproc can be used to control how to resolve key conflicts (i.e. when you get two different zip files that have a same path in their contents). The last path encountered by zip_pair_path_preproc(zip_path_pairs) is the one that will be used, so one should make zip_pair_path_preproc act accordingly.

py2store.base

Forwards to dol.base:

Base classes for making stores. In the language of the collections.abc module, a store is a MutableMapping that is configured to work with a specific representation of keys, serialization of objects (python values), and persistence of the serialized data.

That is, stores offer the same interface as a dict, but where the actual implementation of writes, reads, and listing are configurable.

Consider the following example. Your store is meant to store waveforms as wav files on a remote server. Say waveforms are represented in python as a (wf, sr) tuple, where wf is a list of numbers and sr is the sample rate (an int). The __setitem__ method will specify how to store bytes on a remote server, but you’ll need to specify how to SERIALIZE (wf, sr) to the bytes that constitute that wav file: _data_of_obj specifies that. You might also want to read those wav files back into a python (wf, sr) tuple. The __getitem__ method will get you those bytes from the server, but the store will need to know how to DESERIALIZE those bytes back into a python object: _obj_of_data specifies that.

Further, say you’re storing these .wav files in /some/folder/on/the/server/, but you don’t want the store to use these as the keys. For one, it’s annoying to type and harder to read. But more importantly, it’s an irrelevant implementation detail that shouldn’t be exposed. The _id_of_key and _key_of_id pair are what allow you to add this key interface layer.

These key converters and object serialization methods default to the identity (i.e. they return the input as is). This means that you don’t have to implement them at all, and can choose to implement these concerns within the storage methods themselves.
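
A minimal sketch of these hooks, wrapping an in-memory dict and JSON-(de)serializing values (the hook names are as described above; the rest is illustrative):

>>> import json
>>> from py2store.base import Store
>>> class JsonStore(Store):
...     def __init__(self):
...         super().__init__(store=dict())
...     def _data_of_obj(self, obj):   # SERIALIZE on write
...         return json.dumps(obj)
...     def _obj_of_data(self, data):  # DESERIALIZE on read
...         return json.loads(data)
>>> s = JsonStore()
>>> s['k'] = {'a': 1}
>>> s['k']
{'a': 1}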

py2store.selectors.mg_selectors

py2store.selectors.mongoquery

py2store.selectors

py2store.parse_format

Modified from https://github.com/r1chardj0n3s/parse

Parse strings using a specification based on the Python format() syntax.

parse() is the opposite of format()

From there it’s a simple thing to parse a string:

>>> parse("It's {}, I love it!", "It's spam, I love it!")
<Result ('spam',) {}>
>>> _[0]
'spam'

Or to search a string for some pattern:

>>> search('Age: {:d}\n', 'Name: Rufus\nAge: 42\nColor: red\n')
<Result (42,) {}>

Or find all the occurrences of some pattern in a string:

>>> ''.join(r.fixed[0] for r in findall(">{}<", "<p>the <b>bold</b> text</p>"))
'the bold text'

If you’re going to use the same pattern to match lots of strings you can compile it once:

>>> p = compile("It's {}, I love it!")
>>> print(p)
<Parser "It's {}, I love it!">
>>> p.parse("It's spam, I love it!")
<Result ('spam',) {}>

(“compile” is not exported for import * usage as it would override the built-in compile() function)

The default behaviour is to match strings case insensitively. You may match with case by specifying case_sensitive=True:

>>> parse('SPAM', 'spam', case_sensitive=True) is None
True

Format Syntax

A basic version of the Format String Syntax is supported with anonymous (fixed-position), named and formatted fields:

{[field name]:[format spec]}

Field names must be valid Python identifiers, including dotted names; element indexes imply dictionaries (see below for an example).

Numbered fields are also not supported: the result of parsing will include the parsed fields in the order they are parsed.

The conversion of fields to types other than strings is done based on the type in the format specification, which mirrors the format() behaviour. There are no “!” field conversions like format() has.

Some simple parse() format string examples:

>>> parse("Bring me a {}", "Bring me a shrubbery")
<Result ('shrubbery',) {}>
>>> r = parse("The {} who say {}", "The knights who say Ni!")
>>> print(r)
<Result ('knights', 'Ni!') {}>
>>> print(r.fixed)
('knights', 'Ni!')
>>> r = parse("Bring out the holy {item}", "Bring out the holy hand grenade")
>>> print(r)
<Result () {'item': 'hand grenade'}>
>>> print(r.named)
{'item': 'hand grenade'}
>>> print(r['item'])
hand grenade

Dotted names and indexes are possible though the application must make additional sense of the result:

>>> r = parse("Mmm, {food.type}, I love it!", "Mmm, spam, I love it!")
>>> print(r)
<Result () {'food.type': 'spam'}>
>>> print(r.named)
{'food.type': 'spam'}
>>> print(r['food.type'])
spam
>>> r = parse("My quest is {quest[name]}", "My quest is to seek the holy grail!")
>>> print(r)
<Result () {'quest': {'name': 'to seek the holy grail!'}}>
>>> print(r['quest'])
{'name': 'to seek the holy grail!'}
>>> print(r['quest']['name'])
to seek the holy grail!

If the text you’re matching has braces in it you can match those by including a double-brace {{ or }} in your format string, just like format() does.
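
For instance (a small sketch):

>>> parse('{{{}}}', '{spam}')
<Result ('spam',) {}>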

Format Specification

Most often a straight format-less {} will suffice where a more complex format specification might have been used.

Most of format()’s Format Specification Mini-Language is supported:

[[fill]align][0][width][.precision][type]

The differences between parse() and format() are:

  • The align operators will cause spaces (or specified fill character) to be stripped from the parsed value. The width is not enforced; it just indicates there may be whitespace or “0”s to strip.

  • Numeric parsing will automatically handle a “0b”, “0o” or “0x” prefix. That is, the “#” format character is handled automatically by d, b, o and x formats. For “d” any will be accepted, but for the others the correct prefix must be present if at all.

  • Numeric sign is handled automatically.

  • The thousands separator is handled automatically if the “n” type is used.

  • The types supported are a slightly different mix to the format() types. Some format() types come directly over: “d”, “n”, “%”, “f”, “e”, “b”, “o” and “x”. In addition some regular expression character group types “D”, “w”, “W”, “s” and “S” are also available.

  • The “e” and “g” types are case-insensitive so there is no need for the “E” or “G” types.

Type | Characters Matched                                                            | Output
---- | ----------------------------------------------------------------------------- | --------
w    | Letters and underscore                                                        | str
W    | Non-letter and underscore                                                     | str
s    | Whitespace                                                                    | str
S    | Non-whitespace                                                                | str
d    | Digits (effectively integer numbers)                                          | int
D    | Non-digit                                                                     | str
n    | Numbers with thousands separators (, or .)                                    | int
%    | Percentage (converted to value/100.0)                                         | float
f    | Fixed-point numbers                                                           | float
F    | Decimal numbers                                                               | Decimal
e    | Floating-point numbers with exponent e.g. 1.1e-10, NAN (all case insensitive) | float
g    | General number format (either d, f or e)                                      | float
b    | Binary numbers                                                                | int
o    | Octal numbers                                                                 | int
x    | Hexadecimal numbers (lower and upper case)                                    | int
ti   | ISO 8601 format date/time e.g. 1972-01-20T10:21:36Z (“T” and “Z” optional)    | datetime
te   | RFC2822 e-mail format date/time e.g. Mon, 20 Jan 1972 10:21:36 +1000          | datetime
tg   | Global (day/month) format date/time e.g. 20/1/1972 10:21:36 AM +1:00          | datetime
ta   | US (month/day) format date/time e.g. 1/20/1972 10:21:36 PM +10:30             | datetime
tc   | ctime() format date/time e.g. Sun Sep 16 01:03:52 1973                        | datetime
th   | HTTP log format date/time e.g. 21/Nov/2011:00:07:11 +0000                     | datetime
ts   | Linux system log format date/time e.g. Nov 9 03:37:44                         | datetime
tt   | Time e.g. 10:21:36 PM -5:30                                                   | time

Some examples of typed parsing with None returned if the typing does not match:

>>> parse('Our {:d} {:w} are...', 'Our 3 weapons are...')
<Result (3, 'weapons') {}>
>>> parse('Our {:d} {:w} are...', 'Our three weapons are...')
>>> parse('Meet at {:tg}', 'Meet at 1/2/2011 11:00 PM')
<Result (datetime.datetime(2011, 2, 1, 23, 0),) {}>

And messing about with alignment:

>>> parse('with {:>} herring', 'with     a herring')
<Result ('a',) {}>
>>> parse('spam {:^} spam', 'spam    lovely     spam')
<Result ('lovely',) {}>

Note that the “center” alignment does not test to make sure the value is centered - it just strips leading and trailing whitespace.

Width and precision may be used to restrict the size of matched text from the input. Width specifies a minimum size and precision specifies a maximum. For example:

>>> parse('{:.2}{:.2}', 'look')           # specifying precision
<Result ('lo', 'ok') {}>
>>> parse('{:4}{:4}', 'look at that')     # specifying width
<Result ('look', 'at that') {}>
>>> parse('{:4}{:.4}', 'look at that')    # specifying both
<Result ('look at ', 'that') {}>
>>> parse('{:2d}{:2d}', '0440')           # parsing two contiguous numbers
<Result (4, 40) {}>

Some notes for the date and time types:

  • the presence of the time part is optional (including ISO 8601, starting at the “T”). A full datetime object will always be returned; the time will be set to 00:00:00. You may also specify a time without seconds.

  • when a seconds amount is present in the input fractions will be parsed to give microseconds.

  • except in ISO 8601 the day and month digits may be 0-padded.

  • the date separator for the tg and ta formats may be “-” or “/”.

  • named months (abbreviations or full names) may be used in the ta and tg formats in place of numeric months.

  • as per RFC 2822 the e-mail format may omit the day (and comma), and the seconds but nothing else.

  • hours greater than 12 will be happily accepted.

  • the AM/PM are optional, and if PM is found then 12 hours will be added to the datetime object’s hours amount - even if the hour is greater than 12 (for consistency.)

  • in ISO 8601 the “Z” (UTC) timezone part may be a numeric offset

  • timezones are specified as “+HH:MM” or “-HH:MM”. The hour may be one or two digits (0-padded is OK.) Also, the “:” is optional.

  • the timezone is optional in all except the e-mail format (it defaults to UTC.)

  • named timezones are not handled yet.

Note: attempting to match too many datetime fields in a single parse() will currently result in a resource allocation issue. A TooManyFields exception will be raised in this instance. The current limit is about 15. It is hoped that this limit will be removed one day.

Result and Match Objects

The result of a parse() and search() operation is either None (no match), a Result instance, or a Match instance (if evaluate_result is False).

The Result instance has three attributes:

fixed

A tuple of the fixed-position, anonymous fields extracted from the input.

named

A dictionary of the named fields extracted from the input.

spans

A dictionary mapping the names and fixed position indices matched to a 2-tuple slice range of where the match occurred in the input. The span does not include any stripped padding (alignment or width).

The Match instance has one method:

evaluate_result()

Generates and returns a Result instance for this Match object.

Custom Type Conversions

If you wish to have matched fields automatically converted to your own type you may pass in a dictionary of type conversion information to parse() and compile().

The converter will be passed the field string matched. Whatever it returns will be substituted in the Result instance for that field.

Your custom type conversions may override the builtin types if you supply one with the same identifier.

>>> def shouty(string):
...    return string.upper()
...
>>> parse('{:shouty} world', 'hello world', dict(shouty=shouty))
<Result ('HELLO',) {}>

If the type converter has the optional pattern attribute, it is used as regular expression for better pattern matching (instead of the default one).

>>> def parse_number(text):
...    return int(text)
>>> parse_number.pattern = r'\d+'
>>> parse('Answer: {number:Number}', 'Answer: 42', dict(Number=parse_number))
<Result () {'number': 42}>
>>> _ = parse('Answer: {:Number}', 'Answer: Alice', dict(Number=parse_number))
>>> assert _ is None, "MISMATCH"

You can also use the with_pattern(pattern) decorator to add this information to a type converter function:

>>> @with_pattern(r'\d+')
... def parse_number(text):
...    return int(text)
>>> parse('Answer: {number:Number}', 'Answer: 42', dict(Number=parse_number))
<Result () {'number': 42}>

A more complete example of a custom type might be:

>>> yesno_mapping = {
...     "yes":  True,   "no":    False,
...     "on":   True,   "off":   False,
...     "true": True,   "false": False,
... }
>>> @with_pattern(r"|".join(yesno_mapping))
... def parse_yesno(text):
...     return yesno_mapping[text.lower()]
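
Which might then be used like so (a sketch; the YesNo type name is illustrative):

>>> parse('Lights: {:YesNo}', 'Lights: on', dict(YesNo=parse_yesno))
<Result (True,) {}>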

If the type converter pattern uses regex-grouping (with parenthesis), you should indicate this by using the optional regex_group_count parameter in the with_pattern() decorator:

>>> @with_pattern(r'((\d+))', regex_group_count=2)
... def parse_number2(text):
...    return int(text)
>>> parse('Answer: {:Number2} {:Number2}', 'Answer: 42 43', dict(Number2=parse_number2))
<Result (42, 43) {}>

Otherwise, this may cause parsing problems with unnamed/fixed parameters.

Potential Gotchas

parse() will always match the shortest text necessary (from left to right) to fulfil the parse pattern, so for example:

>>> pattern = '{dir1}/{dir2}'
>>> data = 'root/parent/subdir'
>>> sorted(parse(pattern, data).named.items())
[('dir1', 'root'), ('dir2', 'parent/subdir')]

So, even though {'dir1': 'root/parent', 'dir2': 'subdir'} would also fit the pattern, the actual match represents the shortest successful match for dir1.


Version history (in brief):

  • 1.9.0 We now honor precision and width specifiers when parsing numbers and strings, allowing parsing of concatenated elements of fixed width (thanks Julia Signell)

  • 1.8.4 Add LICENSE file at request of packagers. Correct handling of AM/PM to follow most common interpretation. Correct parsing of hexadecimal that looks like a binary prefix. Add ability to parse case sensitively. Add parsing of numbers to Decimal with “F” (thanks John Vandenberg)

  • 1.8.3 Add regex_group_count to with_pattern() decorator to support user-defined types that contain brackets/parenthesis (thanks Jens Engel)

  • 1.8.2 add documentation for including braces in format string

  • 1.8.1 ensure bare hexadecimal digits are not matched

  • 1.8.0 support manual control over result evaluation (thanks Timo Furrer)

  • 1.7.0 parse dict fields (thanks Mark Visser) and adapted to allow more than 100 re groups in Python 3.5+ (thanks David King)

  • 1.6.6 parse Linux system log dates (thanks Alex Cowan)

  • 1.6.5 handle precision in float format (thanks Levi Kilcher)

  • 1.6.4 handle pipe “|” characters in parse string (thanks Martijn Pieters)

  • 1.6.3 handle repeated instances of named fields, fix bug in PM time overflow

  • 1.6.2 fix logging to use local, not root logger (thanks Necku)

  • 1.6.1 be more flexible regarding matched ISO datetimes and timezones in general, fix bug in timezones without “:” and improve docs

  • 1.6.0 add support for optional pattern attribute in user-defined types (thanks Jens Engel)

  • 1.5.3 fix handling of question marks

  • 1.5.2 fix type conversion error with dotted names (thanks Sebastian Thiel)

  • 1.5.1 implement handling of named datetime fields

  • 1.5 add handling of dotted field names (thanks Sebastian Thiel)

  • 1.4.1 fix parsing of “0” in int conversion (thanks James Rowe)

  • 1.4 add __getitem__ convenience access on Result.

  • 1.3.3 fix Python 2.5 setup.py issue.

  • 1.3.2 fix Python 3.2 setup.py issue.

  • 1.3.1 fix a couple of Python 3.2 compatibility issues.

  • 1.3 added search() and findall(); removed compile() from import * export as it overwrites builtin.

  • 1.2 added ability for custom and override type conversions to be provided; some cleanup

  • 1.1.9 to keep things simpler number sign is handled automatically; significant robustification in the face of edge-case input.

  • 1.1.8 allow “d” fields to have number base “0x” etc. prefixes; fix up some field type interactions after stress-testing the parser; implement “%” type.

  • 1.1.7 Python 3 compatibility tweaks (2.5 to 2.7 and 3.2 are supported).

  • 1.1.6 add “e” and “g” field types; removed redundant “h” and “X”; removed need for explicit “#”.

  • 1.1.5 accept textual dates in more places; Result now holds match span positions.

  • 1.1.4 fixes to some int type conversion; implemented “=” alignment; added date/time parsing with a variety of formats handled.

  • 1.1.3 type conversion is automatic based on specified field types. Also added “f” and “n” types.

  • 1.1.2 refactored, added compile() and limited from parse import *

  • 1.1.1 documentation improvements

  • 1.1.0 implemented more of the Format Specification Mini-Language and removed the restriction on mixing fixed-position and named fields

  • 1.0.0 initial release

This code is copyright 2012-2017 Richard Jones <richard@python.org>. See the end of the source file for the license of use.

py2store.parse_format.findall(format, string, pos=0, endpos=None, extra_types=None, evaluate_result=True, case_sensitive=False)[source]

Search “string” for all occurrences of “format”.

You will be returned an iterator that holds Result instances for each format match found.

Optionally start the search at “pos” character index and limit the search to a maximum index of endpos - equivalent to search(string[:endpos]).

If evaluate_result is True each returned Result instance has two attributes:

.fixed - tuple of fixed-position values from the string
.named - dict of named values from the string

If evaluate_result is False each returned value is a Match instance with one method:

.evaluate_result() - This will return a Result instance like you would get with evaluate_result set to True

The default behaviour is to match strings case insensitively. You may match with case by specifying case_sensitive=True.

If the format is invalid a ValueError will be raised.

See the module documentation for the use of “extra_types”.

py2store.parse_format.parse(format, string, extra_types=None, evaluate_result=True, case_sensitive=False)[source]

Using “format” attempt to pull values from “string”.

The format must match the string contents exactly. If the value you’re looking for is instead just a part of the string use search().

If evaluate_result is True the return value will be a Result instance with two attributes:

.fixed - tuple of fixed-position values from the string
.named - dict of named values from the string

If evaluate_result is False the return value will be a Match instance with one method:

.evaluate_result() - This will return a Result instance like you would get with evaluate_result set to True

The default behaviour is to match strings case insensitively. You may match with case by specifying case_sensitive=True.

If the format is invalid a ValueError will be raised.

See the module documentation for the use of “extra_types”.

In the case there is no match, parse() will return None.

py2store.parse_format.search(format, string, pos=0, endpos=None, extra_types=None, evaluate_result=True, case_sensitive=False)[source]

Search “string” for the first occurrence of “format”.

The format may occur anywhere within the string. If instead you wish for the format to exactly match the string use parse().

Optionally start the search at “pos” character index and limit the search to a maximum index of endpos - equivalent to search(string[:endpos]).

If evaluate_result is True the return value will be a Result instance with two attributes:

.fixed - tuple of fixed-position values from the string
.named - dict of named values from the string

If evaluate_result is False the return value will be a Match instance with one method:

.evaluate_result() - This will return a Result instance like you would get with evaluate_result set to True

The default behaviour is to match strings case insensitively. You may match with case by specifying case_sensitive=True.

If the format is invalid a ValueError will be raised.

See the module documentation for the use of “extra_types”.

In the case there is no match, search() will return None.

py2store.parse_format.with_pattern(pattern, regex_group_count=None)[source]

Attach a regular expression pattern matcher to a custom type converter function.

This annotates the type converter with the pattern attribute.

Example

>>> @with_pattern(r"\d+")
... def parse_number(text):
...     return int(text)

is equivalent to:

>>> def parse_number(text):
...     return int(text)
>>> parse_number.pattern = r"\d+"
Parameters
  • pattern – regular expression pattern (as text)

  • regex_group_count – Indicates how many regex-groups are in pattern.

Returns

wrapped function