py2store.filesys¶
Forwards to dol.filesys:
File system access
py2store.misc¶
Functions to read from and write to misc sources
-
class
py2store.misc.
MiscGetter
(store=<py2store.persisters.local_files.PathFormatPersister object>, incoming_val_trans_for_key={'.bin': <function identity_method>, '.cnf': <function <lambda>>, '.conf': <function <lambda>>, '.config': <function <lambda>>, '.csv': <function <lambda>>, '.gz': <function decompress>, '.gzip': <function decompress>, '.ini': <function <lambda>>, '.json': <function <lambda>>, '.pickle': <function <lambda>>, '.pkl': <function <lambda>>, '.txt': <function <lambda>>, '.zip': <class 'py2store.slib.s_zipfile.FilesOfZip'>}, dflt_incoming_val_trans=<function identity_method>, func_key=<function MiscGetter.<lambda>>)[source]¶ An object to read (and only read) from a store (default local files) with automatic deserialization according to a property of the key (default: file extension).
>>> from py2store.misc import get_obj, misc_objs_get
>>> import os
>>> import json
>>>
>>> pjoin = lambda *p: os.path.join(os.path.expanduser('~'), *p)
>>> path = pjoin('tmp.json')
>>> d = {'a': {'b': {'c': [1, 2, 3]}}}
>>> json.dump(d, open(path, 'w'))  # putting a json file there, the normal way, so we can use it later
>>>
>>> k = path
>>> t = get_obj(k)  # if you'd like to use a function
>>> assert t == d
>>> tt = misc_objs_get[k]  # if you'd like to use an object (note: can get, but nothing else (no list, set, del, etc.))
>>> assert tt == d
>>> t
{'a': {'b': {'c': [1, 2, 3]}}}
-
class
py2store.misc.
MiscGetterAndSetter
(store=<py2store.persisters.local_files.PathFormatPersister object>, incoming_val_trans_for_key={'.bin': <function identity_method>, '.cnf': <function <lambda>>, '.conf': <function <lambda>>, '.config': <function <lambda>>, '.csv': <function <lambda>>, '.gz': <function decompress>, '.gzip': <function decompress>, '.ini': <function <lambda>>, '.json': <function <lambda>>, '.pickle': <function <lambda>>, '.pkl': <function <lambda>>, '.txt': <function <lambda>>, '.zip': <class 'py2store.slib.s_zipfile.FilesOfZip'>}, outgoing_val_trans_for_key={'.bin': <function identity_method>, '.cnf': <function <lambda>>, '.conf': <function <lambda>>, '.config': <function <lambda>>, '.csv': <function csv_fileobj>, '.gz': <function compress>, '.gzip': <function compress>, '.ini': <function <lambda>>, '.json': <function <lambda>>, '.pickle': <function <lambda>>, '.pkl': <function <lambda>>, '.txt': <function <lambda>>}, dflt_incoming_val_trans=<function identity_method>, func_key=<function MiscGetterAndSetter.<lambda>>)[source]¶ An object to read and write (and nothing else) to a store (default local) with automatic (de)serialization according to a property of the key (default: file extension).
>>> from py2store.misc import set_obj, misc_objs  # the function and the object
>>> import json
>>> import os
>>>
>>> pjoin = lambda *p: os.path.join(os.path.expanduser('~'), *p)
>>>
>>> d = {'a': {'b': {'c': [1, 2, 3]}}}
>>> misc_objs[pjoin('tmp.json')] = d
>>> filepath = os.path.expanduser('~/tmp.json')
>>> assert misc_objs[filepath] == d  # yep, it's there, and can be retrieved
>>> assert json.load(open(filepath)) == d  # in case you don't believe it's an actual json file
>>>
>>> # using pickle
>>> misc_objs[pjoin('tmp.pkl')] = d
>>> assert misc_objs[pjoin('tmp.pkl')] == d
>>>
>>> # using txt
>>> misc_objs[pjoin('tmp.txt')] = 'hello world!'
>>> assert misc_objs[pjoin('tmp.txt')] == 'hello world!'
>>>
>>> # using csv
>>> misc_objs[pjoin('tmp.csv')] = [[1, 2, 3], ['a', 'b', 'c']]
>>> assert misc_objs[pjoin('tmp.csv')] == [['1', '2', '3'], ['a', 'b', 'c']]  # yeah, well, not numbers, but you deal with it
>>>
>>> # using bin
>>> misc_objs[pjoin('tmp.bin')] = b'let us pretend these are bytes of an audio waveform'
>>> assert misc_objs[pjoin('tmp.bin')] == b'let us pretend these are bytes of an audio waveform'
-
class
py2store.misc.
MiscReaderMixin
(incoming_val_trans_for_key=None, dflt_incoming_val_trans=None, func_key=None)[source]¶ Mixin to transform incoming vals according to the key they're under. Warning: If used as a subclass, this mixin should (in general) be placed before the store
>>> # make a reader that will wrap a dict
>>> class MiscReader(MiscReaderMixin, dict):
...     def __init__(self, d,
...                  incoming_val_trans_for_key=None,
...                  dflt_incoming_val_trans=None,
...                  func_key=None):
...         dict.__init__(self, d)
...         MiscReaderMixin.__init__(self, incoming_val_trans_for_key, dflt_incoming_val_trans, func_key)
...
>>> incoming_val_trans_for_key = dict(
...     MiscReaderMixin._incoming_val_trans_for_key,  # take the existing defaults...
...     **{'.bin': lambda v: [ord(x) for x in v.decode()],  # ... override how to handle the .bin extension
...        '.reverse_this': lambda v: v[::-1]  # add a new extension (and how to handle it)
...        })
>>>
>>> import pickle
>>> d = {
...     'a.bin': b'abc123',
...     'a.reverse_this': b'abc123',
...     'a.csv': b'event,year\n Magna Carta,1215\n Guido,1956',
...     'a.txt': b'this is not a text',
...     'a.pkl': pickle.dumps(['text', [str, map], {'a list': [1, 2, 3]}]),
...     'a.json': '{"str": "field", "int": 42, "float": 3.14, "array": [1, 2], "nested": {"a": 1, "b": 2}}',
... }
>>>
>>> s = MiscReader(d=d, incoming_val_trans_for_key=incoming_val_trans_for_key)
>>> list(s)
['a.bin', 'a.reverse_this', 'a.csv', 'a.txt', 'a.pkl', 'a.json']
>>> s['a.bin']
[97, 98, 99, 49, 50, 51]
>>> s['a.reverse_this']
b'321cba'
>>> s['a.csv']
[['event', 'year'], [' Magna Carta', '1215'], [' Guido', '1956']]
>>> s['a.pkl']
['text', [<class 'str'>, <class 'map'>], {'a list': [1, 2, 3]}]
>>> s['a.json']
{'str': 'field', 'int': 42, 'float': 3.14, 'array': [1, 2], 'nested': {'a': 1, 'b': 2}}
-
class
py2store.misc.
MiscStoreMixin
(incoming_val_trans_for_key=None, outgoing_val_trans_for_key=None, dflt_incoming_val_trans=None, dflt_outgoing_val_trans=None, func_key=None)[source]¶ Mixin to transform incoming and outgoing vals according to the key they're under. Warning: If used as a subclass, this mixin should (in general) be placed before the store
See also: preset and postget args from wrap_kvs decorator from py2store.trans.
>>> # Make a class to wrap a dict with a layer that transforms written and read values
>>> class MiscStore(MiscStoreMixin, dict):
...     def __init__(self, d,
...                  incoming_val_trans_for_key=None, outgoing_val_trans_for_key=None,
...                  dflt_incoming_val_trans=None, dflt_outgoing_val_trans=None,
...                  func_key=None):
...         dict.__init__(self, d)
...         MiscStoreMixin.__init__(self, incoming_val_trans_for_key, outgoing_val_trans_for_key,
...                                 dflt_incoming_val_trans, dflt_outgoing_val_trans, func_key)
...
>>> outgoing_val_trans_for_key = dict(
...     MiscStoreMixin._outgoing_val_trans_for_key,  # take the existing defaults...
...     **{'.bin': lambda v: ''.join([chr(x) for x in v]).encode(),  # ... override how to handle the .bin extension
...        '.reverse_this': lambda v: v[::-1]  # add a new extension (and how to handle it)
...        })
>>> ss = MiscStore(d={},  # store starts empty
...                incoming_val_trans_for_key={},  # overriding incoming trans so we can see the raw data later
...                outgoing_val_trans_for_key=outgoing_val_trans_for_key)
>>>
>>> # here's what we're going to write in the store
>>> data_to_write = {
...     'a.bin': [97, 98, 99, 49, 50, 51],
...     'a.reverse_this': b'321cba',
...     'a.csv': [['event', 'year'], [' Magna Carta', '1215'], [' Guido', '1956']],
...     'a.txt': 'this is not a text',
...     'a.pkl': ['text', [str, map], {'a list': [1, 2, 3]}],
...     'a.json': {'str': 'field', 'int': 42, 'float': 3.14, 'array': [1, 2], 'nested': {'a': 1, 'b': 2}}}
>>> # write this data in our store
>>> for k, v in data_to_write.items():
...     ss[k] = v
>>> list(ss)
['a.bin', 'a.reverse_this', 'a.csv', 'a.txt', 'a.pkl', 'a.json']
>>> # Looking at the contents (what was actually stored/written)
>>> import pickle
>>> for k, v in ss.items():
...     if k != 'a.pkl':
...         print(f"{k}: {v}")
...     else:  # need to verify pickle data differently, since printing contents is problematic in doctest
...         assert pickle.loads(v) == data_to_write['a.pkl']
a.bin: b'abc123'
a.reverse_this: b'abc123'
a.csv: b'event,year\r\n Magna Carta,1215\r\n Guido,1956\r\n'
a.txt: b'this is not a text'
a.json: b'{"str": "field", "int": 42, "float": 3.14, "array": [1, 2], "nested": {"a": 1, "b": 2}}'
-
py2store.misc.
get_obj
(k, store=<py2store.persisters.local_files.PathFormatPersister object>, incoming_val_trans_for_key={'.bin': <function identity_method>, '.cnf': <function <lambda>>, '.conf': <function <lambda>>, '.config': <function <lambda>>, '.csv': <function <lambda>>, '.gz': <function decompress>, '.gzip': <function decompress>, '.ini': <function <lambda>>, '.json': <function <lambda>>, '.pickle': <function <lambda>>, '.pkl': <function <lambda>>, '.txt': <function <lambda>>, '.zip': <class 'py2store.slib.s_zipfile.FilesOfZip'>}, dflt_incoming_val_trans=<function identity_method>, func_key=<function <lambda>>)[source]¶ A quick way to get an object, with default… everything (but the key, you know, a clue of what you want)
-
py2store.misc.
set_obj
(k, v, store=<py2store.persisters.local_files.PathFormatPersister object>, outgoing_val_trans_for_key={'.bin': <function identity_method>, '.cnf': <function <lambda>>, '.conf': <function <lambda>>, '.config': <function <lambda>>, '.csv': <function csv_fileobj>, '.gz': <function compress>, '.gzip': <function compress>, '.ini': <function <lambda>>, '.json': <function <lambda>>, '.pickle': <function <lambda>>, '.pkl': <function <lambda>>, '.txt': <function <lambda>>}, func_key=<function <lambda>>)[source]¶ A quick way to set an object, with default… everything (but the key and value, of course)
py2store.mixins¶
Forwards to dol.mixins:
Mixins
py2store.test.util¶
utils for testing
-
py2store.test.util.
random_dict_gen
(fields=('a', 'b', 'c'), word_size_range=(1, 10), alphabet='abcdefghijklmnopqrstuvwxyz', n: int = 100)[source]¶ Random dict (of strings) generator
- Parameters
fields – Field names for the random dicts
word_size_range – An int, 2-tuple of ints, or list-like object that defines the choices of word sizes
alphabet – A string or iterable defining the alphabet to draw from
n – The number of elements the generator will yield
- Returns
Random dict (of strings) generator
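For illustration, here is a minimal sketch of what such a generator could look like. This is a hypothetical reimplementation (not the library's actual code), and it only handles the 2-tuple form of word_size_range:

```python
import random

def random_dict_gen_sketch(fields=('a', 'b', 'c'), word_size_range=(1, 10),
                           alphabet='abcdefghijklmnopqrstuvwxyz', n=100):
    """Yield n dicts whose keys are `fields` and whose values are random words."""
    min_size, max_size = word_size_range  # only the 2-tuple form is handled in this sketch
    for _ in range(n):
        yield {field: ''.join(random.choice(alphabet)
                              for _ in range(random.randint(min_size, max_size)))
               for field in fields}

d = next(random_dict_gen_sketch(n=1))
assert set(d) == {'a', 'b', 'c'}  # every yielded dict has the requested fields
```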
-
py2store.test.util.
random_formatted_str_gen
(format_string='root/{}/{}_{}.test', word_size_range=(1, 10), alphabet='abcdefghijklmnopqrstuvwxyz', n=100)[source]¶ Random formatted string generator
- Parameters
format_string – A format string
word_size_range – An int, 2-tuple of ints, or list-like object that defines the choices of word sizes
alphabet – A string or iterable defining the alphabet to draw from
n – The number of elements the generator will yield
- Returns
Yields random strings of the format defined by format_string
Examples
# >>> list(random_formatted_str_gen('root/{}/{}_{}.test', (2, 5), 'abc', n=5))
# [('root/acba/bb_abc.test',), ('root/abcb/cbbc_ca.test',), ('root/ac/ac_cc.test',),
#  ('root/aacc/ccbb_ab.test',), ('root/aab/abb_cbab.test',)]
>>> # The following will be made not random (by restricting the constraints to "no choice")
>>> # ... this is so that we get consistent outputs to assert for the doc test.
>>>
>>> # Example with automatic specification
>>> list(random_formatted_str_gen('root/{}/{}_{}.test', (3, 4), 'a', n=2))
[('root/aaa/aaa_aaa.test',), ('root/aaa/aaa_aaa.test',)]
>>>
>>> # Example with manual specification
>>> list(random_formatted_str_gen('indexed field: {0}: named field: {name}', (2, 3), 'z', n=1))
[('indexed field: zz: named field: zz',)]
-
py2store.test.util.
random_string
(length=7, alphabet='abcdefghijklmnopqrstuvwxyz')[source]¶ Same as random_word, but optimized for strings (5-10% faster for words of length 7, 25-30% faster for words of length 1000)
-
py2store.test.util.
random_tuple_gen
(tuple_length=3, word_size_range=(1, 10), alphabet='abcdefghijklmnopqrstuvwxyz', n: int = 100)[source]¶ Random tuple (of strings) generator
- Parameters
tuple_length – The length of the tuples generated
word_size_range – An int, 2-tuple of ints, or list-like object that defines the choices of word sizes
alphabet – A string or iterable defining the alphabet to draw from
n – The number of elements the generator will yield
- Returns
Random tuple (of strings) generator
-
py2store.test.util.
random_word
(length, alphabet, concat_func=<built-in function add>)[source]¶ Make a random word by concatenating randomly drawn elements from alphabet together.
- Parameters
length – Length of the word
alphabet – Alphabet to draw from
concat_func – The concatenation function (e.g. + for strings and lists)
Note: Repeated elements in alphabet will have more chances of being drawn.
- Returns
A word (whose type depends on what concatenating elements from alphabet produces).
Not making this a proper doctest because I don't know how to seed the global random temporarily
>>> t = random_word(4, 'abcde');  # e.g. 'acae'
>>> t = random_word(5, ['a', 'b', 'c']);  # e.g. 'cabba'
>>> t = random_word(4, [[1, 2, 3], [40, 50], [600], [7000]]);  # e.g. [40, 50, 7000, 7000, 1, 2, 3]
>>> t = random_word(4, [1, 2, 3, 4]);  # e.g. 13 (because adding numbers...)
>>> # ... sometimes it's what you want:
>>> t = random_word(4, [2 ** x for x in range(8)]);  # e.g. 105 (binary combination)
>>> t = random_word(4, [1, 2, 3, 4], concat_func=lambda x, y: str(x) + str(y));  # e.g. '4213'
>>> t = random_word(4, [1, 2, 3, 4], concat_func=lambda x, y: int(str(x) + str(y)));  # e.g. 3432
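A minimal sketch of the idea behind random_word (a hypothetical reimplementation, not the actual source; the real function may draw elements differently):

```python
import random
from functools import reduce
from operator import add

def random_word_sketch(length, alphabet, concat_func=add):
    """Concatenate `length` randomly drawn elements of `alphabet` with `concat_func`."""
    return reduce(concat_func, (random.choice(alphabet) for _ in range(length)))

w = random_word_sketch(4, 'abcde')
assert len(w) == 4 and all(c in 'abcde' for c in w)
assert random_word_sketch(3, ['x']) == 'xxx'  # repeated draws from a one-element alphabet
assert random_word_sketch(4, [1]) == 4        # numbers get added, as the examples above note
```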
-
py2store.test.util.
random_word_gen
(word_size_range=(1, 10), alphabet='abcdefghijklmnopqrstuvwxyz', n=100)[source]¶ Random string generator
- Parameters
word_size_range – An int, 2-tuple of ints, or list-like object that defines the choices of word sizes
alphabet – A string or iterable defining the alphabet to draw from
n – The number of elements the generator will yield
- Returns
Random string generator
py2store.test.quick¶
py2store.test¶
test files
py2store.test.simple¶
py2store.test.scrap¶
scrap code
py2store.util¶
Forwards to dol.util:
General util objects
py2store.ext.docx¶
Simple access to docx (Word Doc) elements.
py2store.ext.gitlab¶
Stores to talk to gitlab, using requests.
Example:

    ogl = GitLabAccessor(base_url="http://...", project_name=None)
    print(ogl.get_project_names())  # prints all project names
    ogl.set_project("PROJECT_NAME")  # sets the project to "PROJECT_NAME"
    print(ogl.get_branch_names())  # gets the branch names of the current project (as set previously)
    print(ogl.get_branch("master"))  # gets a json of information about the master branch of the current project
py2store.ext.hdf¶
a data object layer for HDF files
py2store.ext¶
py2store Extensions, Add-ons, etc. We kept py2store purely dependency-less, using only built-ins for everything but storage system connectors.
That said, in order to provide the user with more power, and show them how py2store tools can be used to build powerful data accessors, we provide specialized modules that do require more than builtins. These dependencies are not listed in the setup.py module, but we wrap their imports with informative ImportError handlers.
py2store.ext.matlab¶
a data object layer for matlab
py2store.ext.kaggle¶
py2store.ext.module_imports¶
py2store.ext.audio¶
py2store.ext.github¶
a data object layer for github
py2store.ext.dataframes¶
Data as pandas.DataFrame from various sources
py2store.access¶
Utils to load stores from store specifications. Includes the logic to allow configurations (and defaults) to be parametrized by external environmental variables and files.
Every data-sourced problem has its problem-relevant stores. Once you get your stores right, along with the right access credentials, indexing, serialization, caching, filtering etc., you'd like to be able to name, save and/or share this specification, and easily get access to it later on.
Here are tools to help you out.
There are two main key-value stores: One for configurations the user wants to reuse, and the other for the user’s desired defaults. Both have the same structure:
first level key: Name of the resource (should be a valid python variable name)
The remainder is more or less free form (until the day we lay out some schemas for this)
The system will look for the specification of user_configs and user_defaults in a json file. The filepath to this json file can be specified in environment variables
PY2STORE_CONFIGS_JSON_FILEPATH and PY2STORE_DEFAULTS_JSON_FILEPATH
respectively. By default, they are:
~/.py2store_configs.json and ~/.py2store_defaults.json
respectively.
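A minimal sketch of how such a lookup could work, using the environment variable and default path documented above (the function name is hypothetical, not part of the py2store API):

```python
import json
import os

def load_user_configs_sketch():
    """Locate and load the user_configs json file (hypothetical sketch)."""
    # env var takes precedence; otherwise fall back to the documented default path
    filepath = os.environ.get(
        'PY2STORE_CONFIGS_JSON_FILEPATH',
        os.path.expanduser('~/.py2store_configs.json'))
    if os.path.isfile(filepath):
        with open(filepath) as fp:
            return json.load(fp)  # first-level keys name the resources
    return {}  # no config file: nothing configured yet
```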
-
py2store.access.
compose
(*functions)[source]¶ Make a function that is the composition of the input functions
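A minimal sketch of such a composition helper (hypothetical; the order in which py2store.access.compose applies its functions is an assumption here, shown right-to-left as in mathematical convention):

```python
from functools import reduce

def compose_sketch(*functions):
    """compose_sketch(f, g, h)(x) == f(g(h(x))) -- rightmost function applied first."""
    return reduce(lambda f, g: (lambda *a, **k: f(g(*a, **k))), functions)

add1 = lambda x: x + 1
double = lambda x: x * 2
assert compose_sketch(add1, double)(10) == 21  # add1(double(10))
assert compose_sketch(double, add1)(10) == 22  # double(add1(10))
```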
-
py2store.access.
dflt_func_loader
(f) → callable[source]¶ Loads and returns the function referenced by f, which could be a callable or a DOTPATH_TO_MODULE.FUNC_NAME dotpath string to one, or a pipeline of these
-
py2store.access.
dotpath_to_func
(f: (<class 'str'>, <built-in function callable>)) → callable[source]¶ Loads and returns the function referenced by f, which could be a callable or a DOTPATH_TO_MODULE.FUNC_NAME dotpath string to one.
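The dotpath resolution can be sketched as follows (a hypothetical reimplementation using importlib, not the actual source):

```python
from importlib import import_module

def dotpath_to_func_sketch(f):
    """Resolve a 'DOTPATH_TO_MODULE.FUNC_NAME' string to the callable; pass callables through."""
    if callable(f):
        return f
    # split off the last dotpath component as the function name
    module_path, _, func_name = f.rpartition('.')
    return getattr(import_module(module_path), func_name)

import os.path
assert dotpath_to_func_sketch('os.path.join') is os.path.join
assert dotpath_to_func_sketch(len) is len  # callables are returned unchanged
```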
-
py2store.access.
dotpath_to_obj
(dotpath)[source]¶ Loads and returns the object referenced by the string DOTPATH_TO_MODULE.OBJ_NAME
-
py2store.access.
fakit
(fak, func_loader=<function dflt_func_loader>)[source]¶ Execute a fak with given f, a, k and function loader.
Essentially returns func_loader(f)(*a, **k)
- Parameters
fak – A (f, a, k) specification. Could be a tuple or a dict (with 'f', 'a', 'k' keys). All but f are optional.
func_loader – A function returning a function. This is where you specify any validation of func specification f, and/or how to get a callable from it.
Returns: A python object.
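Since fakit essentially returns func_loader(f)(*a, **k), a minimal sketch could look like this (hypothetical; validation and the real default func_loader are omitted):

```python
def fakit_sketch(fak, func_loader=lambda f: f):
    """Execute an (f, a, k) specification: returns func_loader(f)(*a, **k)."""
    if isinstance(fak, dict):  # dict form with 'f', 'a', 'k' keys (a and k optional)
        f, a, k = fak['f'], fak.get('a', ()), fak.get('k', {})
    else:  # tuple-like form: f is required, a and k optional
        f, *rest = fak
        a = rest[0] if len(rest) >= 1 else ()
        k = rest[1] if len(rest) >= 2 else {}
    return func_loader(f)(*a, **k)

assert fakit_sketch({'f': max, 'a': ([3, 1, 2],)}) == 3
assert fakit_sketch((sorted, ([3, 1, 2],), {'reverse': True})) == [3, 2, 1]
```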
py2store.__init__¶
Your portal to many Data Object Layer goodies
py2store.stores.s3_store¶
Forwards to s3dol.s3_store
py2store.stores.delegation_stores¶
py2store.stores.sql_w_sqlalchemy¶
Forwards to sqldol
py2store.stores.arangodb_store¶
py2store.stores.dropbox_store¶
Forwards to dropboxdol
py2store.stores.local_store¶
stores to operate on local files
-
class
py2store.stores.local_store.
AutoMkDirsOnSetitemMixin
[source]¶ A mixin that will automatically create directories on setitem, when missing.
-
class
py2store.stores.local_store.
AutoMkPathformatMixin
(path_format=None, max_levels=None)[source]¶ A mixin that will choose a path_format if none given
-
class
py2store.stores.local_store.
DirStore
(rootdir)[source]¶ A store for local directories. Keys are directory names and values are subdirectory DirStores.
>>> from py2store import __file__
>>> import os
>>> root = os.path.dirname(__file__)
>>> s = DirStore(root)
>>> assert set(s).issuperset({'stores', 'persisters', 'serializers', 'key_mappers'})
-
class
py2store.stores.local_store.
LocalBinaryStore
(path_format, max_levels=None)[source]¶ Local files store for binary data
-
class
py2store.stores.local_store.
LocalJsonStore
(path_format, max_levels=None)[source]¶ Local files store for text data. Data is assumed to be a JSON string, and is loaded with json.loads and dumped with json.dumps
-
class
py2store.stores.local_store.
LocalPickleStore
(path_format, max_levels=None, fix_imports=True, protocol=None, pickle_encoding='ASCII', pickle_errors='strict', **open_kwargs)[source]¶ Local files store with pickle serialization
-
py2store.stores.local_store.
LocalStore
¶
-
class
py2store.stores.local_store.
LocalTextStore
(path_format, max_levels=None)[source]¶ Local files store for text data
-
class
py2store.stores.local_store.
MakeMissingDirsStoreMixin
[source]¶ Will make a local file store automatically create the directories needed to create a file. Should be placed before the concrete persister in the MRO, in such a manner that it receives full paths.
-
class
py2store.stores.local_store.
PathFormatStore
(path_format, max_levels: int = inf, mode='', **open_kwargs)[source]¶ Local file store using templated relative paths.
>>> from tempfile import gettempdir
>>> import os
>>>
>>> def write_to_key(fullpath_of_relative_path, relative_path, content):  # a function to write content in files
...     with open(fullpath_of_relative_path(relative_path), 'w') as fp:
...         fp.write(content)
>>>
>>> # Preparation: Make a temporary rootdir and write two files in it
>>> rootdir = os.path.join(gettempdir(), 'path_format_store_test' + os.sep)
>>> if not os.path.isdir(rootdir):
...     os.mkdir(rootdir)
>>> # recreate directory (remove existing files, delete directory, and re-create it)
>>> for f in os.listdir(rootdir):
...     fullpath = os.path.join(rootdir, f)
...     if os.path.isfile(fullpath):
...         os.remove(os.path.join(rootdir, f))
>>> if os.path.isdir(rootdir):
...     os.rmdir(rootdir)
>>> if not os.path.isdir(rootdir):
...     os.mkdir(rootdir)
>>>
>>> filepath_of = lambda p: os.path.join(rootdir, p)  # a function to get a fullpath from a relative one
>>> # and make two files in this new dir, with some content
>>> write_to_key(filepath_of, 'a', 'foo')
>>> write_to_key(filepath_of, 'b', 'bar')
>>>
>>> # point the obj source to the rootdir
>>> s = PathFormatStore(path_format=rootdir)
>>>
>>> # assert things...
>>> assert s._prefix == rootdir  # the _rootdir is the one given in constructor
>>> assert s[filepath_of('a')] == 'foo'  # (the filepath for) 'a' contains 'foo'
>>>
>>> # two files under rootdir (as long as the OS didn't create its own under the hood)
>>> len(s)
2
>>> assert list(s) == [filepath_of('a'), filepath_of('b')]  # there's two files in s
>>> filepath_of('a') in s  # rootdir/a is in s
True
>>> filepath_of('not_there') in s  # rootdir/not_there is not in s
False
>>> filepath_of('not_there') not in s  # rootdir/not_there is not in s
True
>>> assert list(s.keys()) == [filepath_of('a'), filepath_of('b')]  # the keys (filepaths) of s
>>> sorted(list(s.values()))  # the values of s (contents of files)
['bar', 'foo']
>>> assert list(s.items()) == [(filepath_of('a'), 'foo'), (filepath_of('b'), 'bar')]  # the (path, content) items
>>> assert s.get('this key is not there', None) is None  # trying to get the val of a non-existing key returns None
>>> s.get('this key is not there', 'some default value')  # ... or whatever you say
'some default value'
>>>
>>> # add more files to the same folder
>>> write_to_key(filepath_of, 'this.txt', 'this')
>>> write_to_key(filepath_of, 'that.txt', 'blah')
>>> write_to_key(filepath_of, 'the_other.txt', 'bloo')
>>> # see that you now have 5 files
>>> len(s)
5
>>> # and these files contain values:
>>> sorted(s.values())
['bar', 'blah', 'bloo', 'foo', 'this']
>>>
>>> # but if we make an obj source to only take files whose extension is '.txt'...
>>> s = PathFormatStore(path_format=rootdir + '{}.txt')
>>>
>>> rootdir_2 = os.path.join(gettempdir(), 'obj_source_test_2')  # get another rootdir
>>> if not os.path.isdir(rootdir_2):
...     os.mkdir(rootdir_2)
>>> filepath_of_2 = lambda p: os.path.join(rootdir_2, p)
>>> # and make two files in this new dir, with some content
>>> write_to_key(filepath_of, 'this.txt', 'this')
>>> write_to_key(filepath_of, 'that.txt', 'blah')
>>> write_to_key(filepath_of, 'the_other.txt', 'bloo')
>>>
>>> ss = PathFormatStore(path_format=rootdir_2 + '{}.txt')
>>>
>>> assert s != ss  # though pointing to identical content, s and ss are not equal since the paths are not equal!
-
py2store.stores.local_store.
PickleStore
¶
-
class
py2store.stores.local_store.
QuickBinaryStore
(path_format=None, max_levels=None)[source]¶ Local files store for binary data with default temp root and auto dir generation on write.
-
class
py2store.stores.local_store.
QuickJsonStore
(path_format=None, max_levels=None)[source]¶ Local files store for text data with default temp root and auto dir generation on write. Data is assumed to be a JSON string, and is loaded with json.loads and dumped with json.dumps
-
class
py2store.stores.local_store.
QuickLocalStoreMixin
(path_format=None, max_levels=None)[source]¶ A mixin that will choose a path_format if none given, and will automatically create directories on setitem, when missing.
-
class
py2store.stores.local_store.
QuickPickleStore
(path_format=None, max_levels=None)[source]¶ Local files store with pickle serialization with default temp root and auto dir generation on write.
-
py2store.stores.local_store.
QuickStore
¶
py2store.stores¶
a package of various stores
py2store.stores.couchdb_store¶
py2store.stores.mongo_store¶
py2store.core¶
Forwards to dol.core:
Core tools
py2store.utils.uri_utils¶
utils to work with URIs
py2store.utils.explicit¶
utils to make stores based on the input data itself
-
class
py2store.utils.explicit.
ExplicitKeymapReader
(store, key_of_id=None, id_of_key=None)[source]¶ Wrap a store (instance) so that it gets its keys from an explicit iterable of keys.
>>> s = {'a': 1, 'b': 2, 'c': 3, 'd': 4}
>>> id_of_key = {'A': 'a', 'C': 'c'}
>>> ss = ExplicitKeymapReader(s, id_of_key=id_of_key)
>>> list(ss)
['A', 'C']
>>> ss['C']  # will look up 'C', find 'c', and call the store on that.
3
-
class
py2store.utils.explicit.
ExplicitKeys
(key_collection: Collection)[source]¶ py2store.base.Keys implementation that gets its keys explicitly from a collection given at initialization time. The key_collection must be a collections.abc.Collection (such as list, tuple, set, etc.)
>>> keys = ExplicitKeys(key_collection=['foo', 'bar', 'alice'])
>>> 'foo' in keys
True
>>> 'not there' in keys
False
>>> list(keys)
['foo', 'bar', 'alice']
-
class
py2store.utils.explicit.
ExplicitKeysSource
(key_collection: Collection, _obj_of_key: Callable)[source]¶ An object source that uses an explicit keys collection and a specified function to read contents for a key.
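The idea can be sketched as follows (a hypothetical simplified version, not the actual class): an explicit key collection plus a reader function yields a read-only Mapping.

```python
from collections.abc import Mapping

class ExplicitKeysSourceSketch(Mapping):
    """Read-only mapping over an explicit key collection, reading values via a function."""
    def __init__(self, key_collection, obj_of_key):
        self._keys = key_collection          # the explicit collection of keys
        self._obj_of_key = obj_of_key        # how to get the contents for a key
    def __iter__(self):
        return iter(self._keys)
    def __len__(self):
        return len(self._keys)
    def __getitem__(self, k):
        return self._obj_of_key(k)

data = {'foo': 1, 'bar': 2, 'baz': 3}
s = ExplicitKeysSourceSketch(['foo', 'bar'], data.__getitem__)
assert list(s) == ['foo', 'bar'] and s['foo'] == 1  # only the listed keys are exposed
```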
-
class
py2store.utils.explicit.
ExplicitKeysStore
(store, key_collection)[source]¶ Wrap a store (instance) so that it gets its keys from an explicit iterable of keys.
>>> s = {'a': 1, 'b': 2, 'c': 3, 'd': 4}
>>> list(s)
['a', 'b', 'c', 'd']
>>> ss = ExplicitKeysStore(s, ['d', 'a'])
>>> len(ss)
2
>>> list(ss)
['d', 'a']
>>> list(ss.values())
[4, 1]
>>> ss.head()
('d', 4)
-
class
py2store.utils.explicit.
ExplicitKeysWithPrefixRelativization
(key_collection, _prefix=None)[source]¶ py2store.base.Keys implementation that gets its keys explicitly from a collection given at initialization time. The key_collection must be a collections.abc.Collection (such as list, tuple, set, etc.)
>>> from py2store.base import Store
>>> s = ExplicitKeysWithPrefixRelativization(key_collection=['/root/of/foo', '/root/of/bar', '/root/for/alice'])
>>> keys = Store(store=s)
>>> 'of/foo' in keys
True
>>> 'not there' in keys
False
>>> list(keys)
['of/foo', 'of/bar', 'for/alice']
-
class
py2store.utils.explicit.
ObjReader
(_obj_of_key: Callable)[source]¶ A reader that uses a specified function to get the contents for a given key.
>>> # define a contents_of_key that reads stuff from a dict
>>> data = {'foo': 'bar', 42: "everything"}
>>> def read_dict(k):
...     return data[k]
>>> pr = ObjReader(_obj_of_key=read_dict)
>>> pr['foo']
'bar'
>>> pr[42]
'everything'
>>>
>>> # define a contents_of_key that reads stuff from a file given its path
>>> def read_file(path):
...     with open(path) as fp:
...         return fp.read()
>>> pr = ObjReader(_obj_of_key=read_file)
>>> file_where_this_code_is = __file__  # it should be THIS file you're reading right now!
>>> print(pr[file_where_this_code_is][62:155])  # print some characters of this file
from collections.abc import Mapping
from typing import Callable, Collection as CollectionType
-
py2store.utils.explicit.
invertible_maps
(mapping=None, inv_mapping=None)[source]¶ Returns two maps that are inverse of each other. Raises a ValueError if both maps are None, and an AssertionError if the maps are not inverse of each other
Get a pair of invertible maps
>>> invertible_maps({1: 11, 2: 22})
({1: 11, 2: 22}, {11: 1, 22: 2})
>>> invertible_maps(None, {11: 1, 22: 2})
({1: 11, 2: 22}, {11: 1, 22: 2})

If two maps are given and invertible, you just get them back
>>> invertible_maps({1: 11, 2: 22}, {11: 1, 22: 2})
({1: 11, 2: 22}, {11: 1, 22: 2})

Or if they're not invertible
>>> invertible_maps({1: 11, 2: 22}, {11: 1, 22: 'ha, not what you expected!'})
Traceback (most recent call last):
...
AssertionError: mapping and inv_mapping are not inverse of each other!

>>> invertible_maps(None, None)
Traceback (most recent call last):
...
ValueError: You need to specify one or both maps
py2store.utils.timeseries_caching¶
Tools to cache time-series data.
-
class
py2store.utils.timeseries_caching.
RegularTimeseriesCache
(data_rate=1, time_rate=1, maxlen=None)[source]¶ A type that pretends to be a (possibly very large) list, but where contents of the list are populated as they are needed. Further, the indexing of the list can be overwritten for the convenience of the user.
The canonical application is where we have segments of continuous waveform indexed by utc microseconds timestamps.
It is convenient to be able to read segments of this waveform as if it was one big waveform (handling the discontinuities gracefully), and have the choice of using (relative or absolute) integer indices or utc indices.
py2store.utils.attr_dict.py.attr_dict¶
py2store.utils.attr_dict.py¶
py2store.utils.cumul_aggreg_write¶
utils for bulk writing – accumulate, aggregate and write when some condition is met
-
class
py2store.utils.cumul_aggreg_write.
CumulAggregWrite
(store, cache_to_kv=<function mk_kv_from_keygen.<locals>.aggregate>, mk_cache=<class 'list'>)[source]¶
-
class
py2store.utils.cumul_aggreg_write.
CumulAggregWriteWithAutoFlush
(store, cache_to_kv=<function mk_kv_from_keygen.<locals>.aggregate>, mk_cache=<class 'list'>, flush_cache_condition=<function condition_flush_on_every_write>)[source]¶
-
py2store.utils.cumul_aggreg_write.
condition_flush_on_every_write
(cache)[source]¶ Boolean function used as a flush_cache_condition to flush anytime the cache is non-empty
-
py2store.utils.cumul_aggreg_write.
mk_group_aggregator
(item_to_kv, aggregator_op=<built-in function add>, initial=<py2store.utils.cumul_aggreg_write.NoInitial object>)[source]¶ Make a generator transforming function that will (a) make a key for each given item, (b) group all items according to the key
- Parameters
item_to_kv –
aggregator_op –
initial –
Returns:
>>> # Collect words (as a csv string), grouped by the lower case of the first letter
>>> ag = mk_group_aggregator(lambda item: (item[0].lower(), item),
...                          aggregator_op=lambda x, y: ', '.join([x, y]))
>>> list(ag(['apple', 'bananna', 'Airplane']))
[('a', 'apple, Airplane'), ('b', 'bananna')]
>>> # Collect 'thing' values into lists, grouped by 'age'
>>> ag = mk_group_aggregator(lambda item: (item['age'], item['thing']),
...                          aggregator_op=lambda x, y: x + [y],
...                          initial=[])
>>> list(ag([{'age': 0, 'thing': 'new'}, {'age': 42, 'thing': 'every'}, {'age': 0, 'thing': 'just born'}]))
[(0, ['new', 'just born']), (42, ['every'])]
-
py2store.utils.cumul_aggreg_write.
mk_group_aggregator_with_key_func
(item_to_key, aggregator_op=<built-in function add>, initial=<py2store.utils.cumul_aggreg_write.NoInitial object>)[source]¶ Make a generator transforming function that will (a) make a key for each given item, (b) group all items according to the key
- Parameters
item_to_key – Function that takes an item of the generator and outputs the key that should be used to group items
aggregator_op – The aggregation binary function that is used to aggregate two items together. The function is used as is by the functools.reduce, applied to the sequence of items that were collected for a given group
initial – The “empty” element to start the reduce (aggregation) with, if necessary.
Returns:
>>> # Collect words (as a csv string), grouped by the lower case of the first letter
>>> ag = mk_group_aggregator_with_key_func(lambda item: item[0].lower(),
...                                        aggregator_op=lambda x, y: ', '.join([x, y]))
>>> list(ag(['apple', 'bananna', 'Airplane']))
[('a', 'apple, Airplane'), ('b', 'bananna')]
>>>
>>> # Collect (and concatenate) characters according to their ascii value modulo 3
>>> ag = mk_group_aggregator_with_key_func(lambda item: (ord(item) % 3))
>>> list(ag('abcdefghijklmnop'))
[(1, 'adgjmp'), (2, 'behkn'), (0, 'cfilo')]
>>>
>>> # sum all even and odd numbers separately
>>> ag = mk_group_aggregator_with_key_func(lambda item: (item % 2))
>>> list(ag([1, 2, 3, 4, 5]))  # sum of evens is 6, and sum of odds is 9
[(1, 9), (0, 6)]
>>>
>>> # if we wanted to collect all odds and evens, we'd need a different aggregator and initial
>>> ag = mk_group_aggregator_with_key_func(lambda item: (item % 2), aggregator_op=lambda x, y: x + [y], initial=[])
>>> list(ag([1, 2, 3, 4, 5]))
[(1, [1, 3, 5]), (0, [2, 4])]
py2store.utils¶
general utils
py2store.utils.cache_descriptors¶
descriptors to cache data
-
py2store.utils.cache_descriptors.
CachedProperty
(*args)[source]¶ CachedProperty. This is usable directly as a decorator, with or without names given. Any of these patterns will work:
* @CachedProperty
* @CachedProperty()
* @CachedProperty('n', 'n2')
* def thing(self): ...; thing = CachedProperty(thing)
* def thing(self): ...; thing = CachedProperty(thing, 'n')
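The cache-on-first-access idea behind such descriptors can be sketched as a minimal non-data descriptor (SimpleCachedProperty is a hypothetical name; the real CachedProperty also supports invalidation names, which this sketch omits):

```python
class SimpleCachedProperty:
    """Compute a property once per instance, then cache it in the instance dict."""
    def __init__(self, func):
        self.func = func
        self.__doc__ = getattr(func, '__doc__', None)

    def __get__(self, instance, owner=None):
        if instance is None:
            return self  # accessed on the class, not an instance
        value = self.func(instance)
        # Shadow the descriptor: later lookups hit the instance attribute directly,
        # which works because this is a non-data descriptor (no __set__).
        instance.__dict__[self.func.__name__] = value
        return value

class Example:
    def __init__(self):
        self.calls = 0

    @SimpleCachedProperty
    def answer(self):
        self.calls += 1
        return 42
```

After the first access, `Example().answer` no longer goes through the descriptor at all.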
py2store.utils.appendable¶
utils to add append and extend functionality to KV stores
py2store.utils.affine_conversion¶
utils to carry out affine transformations (of indices)
-
class
py2store.utils.affine_conversion.
AffineConverter
(scale=1.0, offset=0.0)[source]¶ Get a callable that will perform an affine conversion. Note that it does so as
(val - offset) * scale
(i.e. not slope-intercept style, though there is a .from_slope_and_intercept constructor method for that).
- Inverse is available through the inv method, performing:
val / scale + offset
>>> convert = AffineConverter(scale=0.5, offset=1)
>>> convert(0)
-0.5
>>> convert(10)
4.5
>>> convert.inv(4)
9.0
>>> convert.inv(4.5)
10.0
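The forward/inverse pair can be sketched in a few lines (SimpleAffine is an illustrative stand-in, not the library class, which also has the from_slope_and_intercept constructor):

```python
class SimpleAffine:
    """Forward: (val - offset) * scale; inverse: val / scale + offset."""
    def __init__(self, scale=1.0, offset=0.0):
        self.scale = scale
        self.offset = offset

    def __call__(self, val):
        return (val - self.offset) * self.scale

    def inv(self, val):
        return val / self.scale + self.offset

convert = SimpleAffine(scale=0.5, offset=1)
```

The round trip `convert.inv(convert(x)) == x` holds for any x (up to float precision).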
-
py2store.utils.affine_conversion.
get_affine_converter_and_inverse
(scale=1, offset=0, source_type_cast=None, target_type_cast=None)[source]¶ - Getting two affine functions with given scale and offset, that are inverse of each other. Namely (for input val):
(val - offset) * scale and val / scale + offset
Note this is not “slope intercept” style!!
The source_type_cast and target_type_cast (optional) allow the user to specify whether these transformations need to be further cast to a given type. :param scale: :param offset: :param source_type_cast: function to apply to input :param target_type_cast: function to apply to output :return: Two single-value functions: affine_converter, inverse_affine_converter
Note: the code is a lot more complex than the basic operations it performs. The reason is efficiency: the returned functions are intended to be used in long loops.
See also: ocore.utils.conversion.AffineConverter
>>> affine_converter, inverse_affine_converter = get_affine_converter_and_inverse(scale=0.5, offset=1)
>>> affine_converter(0)
-0.5
>>> affine_converter(10)
4.5
>>> inverse_affine_converter(4)
9.0
>>> inverse_affine_converter(4.5)
10.0
>>> affine_converter, inverse_affine_converter = get_affine_converter_and_inverse(scale=0.5, offset=1, target_type_cast=int)
>>> affine_converter(10)
4
py2store.utils.signatures¶
Deprecated: Forwards to py2store.signatures
py2store.utils.sliceable¶
utils to add sliceable functionality to stores
-
class
py2store.utils.sliceable.
iSliceStore
(store)[source]¶ Wraps a store to make a reader that acts as if the store was a list (with integer keys, and that can be sliced). I say “list”, but it should be noted that the behavior is more that of range, that outputs an element of the list when keying with an integer, but returns an iterable object (a range) if sliced.
Here, a map object is returned when the sliceable store is sliced.
>>> s = {'foo': 'bar', 'hello': 'world', 'alice': 'bob'}
>>> sliceable_s = iSliceStore(s)
>>> sliceable_s[1]
'world'
>>> list(sliceable_s[0:2])
['bar', 'world']
>>> list(sliceable_s[-2:])
['world', 'bob']
>>> list(sliceable_s[:-1])
['bar', 'world']
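The integer/slice indexing over a mapping's values can be sketched as follows (SliceableValues is an illustrative stand-in; the real iSliceStore iterates keys lazily instead of materializing them as this sketch does):

```python
class SliceableValues:
    """Read a mapping's values by integer position or slice, in key order."""
    def __init__(self, store):
        self.store = store

    def __getitem__(self, k):
        keys = list(self.store)  # the store's key order defines the integer index
        if isinstance(k, slice):
            # return a map object, mirroring the lazy behavior described above
            return map(self.store.__getitem__, keys[k])
        return self.store[keys[k]]

s = {'foo': 'bar', 'hello': 'world', 'alice': 'bob'}
sliceable = SliceableValues(s)
```

As with range, a slice gives you an iterable (here a map object), not a list; wrap it in `list(...)` to materialize.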
py2store.utils.mappify¶
Utils to wrap any object into a mapping interface
-
class
py2store.utils.mappify.
LeafMappify
(target, node_types=(<class 'dict'>, ), key_concat=<function Mappify.<lambda>>, names_of_literals=(), **kwargs)[source]¶ A dict-like interface to glom. Here, only leaf keys are taken into account.
>>> d = {
...     'a': 'simple',
...     'b': {'is': 'nested'},
...     'c': {'is': 'nested', 'and': 'has', 'a': [1, 2, 3]}
... }
>>> g = LeafMappify(d)
>>>
>>> assert list(g) == ['a', 'b.is', 'c.is', 'c.and', 'c.a']
>>> assert g['a'] == 'simple'
>>> assert g['b.is'] == 'nested'
>>> assert g['c.a'] == [1, 2, 3]
>>>
>>> for k, v in g.items():
...     print(f"{k}: {v}")
...
a: simple
b.is: nested
c.is: nested
c.and: has
c.a: [1, 2, 3]
-
class
py2store.utils.mappify.
Mappify
(target, node_types=(<class 'dict'>, ), key_concat=<function Mappify.<lambda>>, names_of_literals=(), **kwargs)[source]¶ A dict-like interface to glom. Here, both leaf and intermediate node keys are taken into account.
>>> d = {
...     'a': 'simple',
...     'b': {'is': 'nested'},
...     'c': {'is': 'nested', 'and': 'has', 'a': [1, 2, 3]}
... }
>>> g = Mappify(d)
>>>
>>> assert list(g) == ['a', 'b.is', 'b', 'c.is', 'c.and', 'c.a', 'c']
>>> assert g['a'] == 'simple'
>>> assert g['b.is'] == 'nested'
>>> assert g['c.a'] == [1, 2, 3]
>>>
>>> for k, v in g.items():
...     print(f"{k}: {v}")
...
a: simple
b.is: nested
b: {'is': 'nested'}
c.is: nested
c.and: has
c.a: [1, 2, 3]
c: {'is': 'nested', 'and': 'has', 'a': [1, 2, 3]}
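The leaf-key flattening these classes expose can be sketched as a recursive walk over nested dicts (leaf_items is a hypothetical helper, assuming the default '.'-concatenation of keys):

```python
def leaf_items(d, prefix='', sep='.'):
    """Yield (dotted_key, value) pairs for every non-dict leaf of a nested dict."""
    for k, v in d.items():
        key = f"{prefix}{sep}{k}" if prefix else k
        if isinstance(v, dict):
            yield from leaf_items(v, key, sep)  # recurse into node types
        else:
            yield key, v

d = {
    'a': 'simple',
    'b': {'is': 'nested'},
    'c': {'is': 'nested', 'and': 'has', 'a': [1, 2, 3]},
}
flat = dict(leaf_items(d))
```

The keys of `flat` match what LeafMappify lists; Mappify additionally interleaves the intermediate node keys ('b', 'c').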
py2store.utils.glom¶
glom is a util to extract stuff from nested structures. It’s one of those excellent utils that I’ve written many times, but never got quite right. Mahmoud Hashemi got it right.
- BEGIN LICENSE
Copyright (c) 2018, Mahmoud Hashemi
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
The names of the contributors may not be used to endorse or promote products derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS “AS IS” AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
- END LICENSE
Now, at the time of writing this, I've already transformed it to bend it to my liking. At some point it may become something else, but I wanted there to be a trace of what my seed was. Though I can't promise I'll maintain the same functionality as I transform this module, here's a tutorial on how to use it in its original form:
I only took the main (core) module from the glom project. Here’s the original docs of this glom module.
If there was ever a Python example of "big things come in small packages", glom might be it.
The glom package has one central entrypoint, glom.glom(). Everything else in the package revolves around that one function.
A couple of conventional terms you’ll see repeated many times below:
target - glom is built to work on any data, so we simply refer to the object being accessed as the “target”
spec - (aka “glomspec”, short for specification) The accompanying template used to specify the structure of the return value.
Now that you know the terms, let’s take a look around glom’s powerful semantics.
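As a warm-up, the deep-get idea behind the simplest kind of spec, a dotted path string applied to a nested target, can be sketched in a few lines (a toy stand-in, far from the real glom(), which also handles restructuring specs, tuples, and much more):

```python
def deep_get(target, path, default=None, sep='.'):
    """Walk a dotted path through nested dicts/objects, returning default on a miss."""
    cur = target
    for part in path.split(sep):
        try:
            # dict-style access for mappings, attribute access otherwise
            cur = cur[part] if isinstance(cur, dict) else getattr(cur, part)
        except (KeyError, AttributeError, TypeError):
            return default
    return cur

target = {'a': {'b': 'c'}}
```

With a spec like `'a.b'`, `deep_get` mimics `glom(target, 'a.b')`; everything beyond this (Coalesce, T, Invoke, ...) builds richer specs around the same walk.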
-
class
py2store.utils.glom.
Auto
(spec=None)[source]¶ Switch to Auto mode (the default)
TODO: this seems like it should be a sub-class of class Spec() – if Spec() could help define the interface for new “modes” or dialects that would also help make match mode feel less duct-taped on
-
class
py2store.utils.glom.
Call
(func=None, args=None, kwargs=None)[source]¶ Call specifies when a target should be passed to a function, func. Call is similar to partial() in that it is no more powerful than lambda or other functions, but it is designed to be more readable, with a better repr.
- Parameters
func (callable) – a function or other callable to be called with the target
Call combines well with T to construct objects. For instance, to generate a dict and then pass it to a constructor:
>>> class ExampleClass(object):
...     def __init__(self, attr):
...         self.attr = attr
...
>>> target = {'attr': 3.14}
>>> glom(target, Call(ExampleClass, kwargs=T)).attr
3.14
This does the same as glom(target, lambda target: ExampleClass(**target)), but it's easy to see which one reads better.
Note
Call is mostly for functions. Use a T object if you need to call a method.
-
class
py2store.utils.glom.
Check
(spec=T, **kwargs)[source]¶ Check objects are used to make assertions about the target data, and either pass through the data or raise exceptions if there is a problem.
If any check condition fails, a CheckError is raised.
- Parameters
spec – a sub-spec to extract the data to which other assertions will be checked (defaults to applying checks to the target itself)
type – a type or sequence of types to be checked for exact match
equal_to – a value to be checked for equality match (“==”)
validate – a callable or list of callables, each representing a check condition. If one or more return False or raise an exception, the Check will fail.
instance_of – a type or sequence of types to be checked with isinstance()
one_of – an iterable of values, any of which can match the target (“in”)
default – an optional default value to replace the value when the check fails (if default is not specified, GlomCheckError will be raised)
Aside from spec, all arguments are keyword arguments. Each argument, except for default, represent a check condition. Multiple checks can be passed, and if all check conditions are left unset, Check defaults to performing a basic truthy check on the value.
-
exception
py2store.utils.glom.
CheckError
(msgs, check, path)[source]¶ This GlomError subtype is raised when target data fails to pass a Check's specified validation.
An uncaught CheckError looks like this:
>>> target = {'a': {'b': 'c'}}
>>> glom(target, {'b': ('a.b', Check(type=int))})
Traceback (most recent call last):
...
glom.CheckError: target at path ['a.b'] failed check, got error: "expected type to be 'int', found type 'str'"
If the Check contains more than one condition, there may be more than one error message. The string rendition of the CheckError will include all messages.
You can also catch the CheckError and programmatically access messages through the msgs attribute on the CheckError instance.
Note
As of 2018-07-05 (glom v18.2.0), the validation subsystem is still very new. Exact error message formatting may be enhanced in future releases.
-
class
py2store.utils.glom.
Coalesce
(*subspecs, **kwargs)[source]¶ Coalesce objects specify fallback behavior for a list of subspecs.
Subspecs are passed as positional arguments, and keyword arguments control defaults. Each subspec is evaluated in turn, and if none match, a CoalesceError is raised, or a default is returned, depending on the options used.
Note
This operation may seem very familiar if you have experience with SQL or even C# and others.
In practice, this fallback behavior’s simplicity is only surpassed by its utility:
>>> target = {'c': 'd'}
>>> glom(target, Coalesce('a', 'b', 'c'))
'd'
glom tries to get 'a' from target, but gets a KeyError. Rather than raise a PathAccessError as usual, glom coalesces into the next subspec, 'b'. The process repeats until it gets to 'c', which returns our value, 'd'. If our value weren't present, we'd see:
>>> target = {}
>>> glom(target, Coalesce('a', 'b'))
Traceback (most recent call last):
...
glom.CoalesceError: no valid values found. Tried ('a', 'b') and got (PathAccessError, PathAccessError) (at path [])
Same process, but because target is empty, we get a CoalesceError. If we want to avoid an exception, and we know which value we want by default, we can set default:
>>> target = {}
>>> glom(target, Coalesce('a', 'b', 'c'), default='d-fault')
'd-fault'
'a', 'b', and 'c' weren't present so we got 'd-fault'.
- Parameters
subspecs – One or more glommable subspecs
default – A value to return if no subspec results in a valid value
default_factory – A callable whose result will be returned as a default
skip – A value, tuple of values, or predicate function representing values to ignore
skip_exc – An exception or tuple of exception types to catch and move on to the next subspec. Defaults to GlomError, the parent type of all glom runtime exceptions.
If all subspecs produce skipped values or exceptions, a CoalesceError will be raised. For more examples, check out the tutorial, which makes extensive use of Coalesce.
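The fallback logic Coalesce implements can be sketched in plain Python (coalesce_get and _MISSING are illustrative names; the real Coalesce also supports skip values and default_factory, which this sketch omits):

```python
_MISSING = object()  # sentinel meaning "no default was given"

def coalesce_get(target, *subspecs, default=_MISSING, skip_exc=KeyError):
    """Try each key in turn; the first lookup that doesn't raise skip_exc wins."""
    for key in subspecs:
        try:
            return target[key]
        except skip_exc:
            continue  # coalesce into the next subspec
    if default is not _MISSING:
        return default
    raise LookupError('no valid values found. Tried %r' % (subspecs,))

target = {'c': 'd'}
```

This mirrors the two doctests above: the first matching key's value is returned, and `default` suppresses the error when nothing matches.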
-
exception
py2store.utils.glom.
CoalesceError
(coal_obj, skipped, path)[source]¶ This GlomError subtype is raised from within a Coalesce spec's processing, when none of the subspecs match and no default is provided.
The exception object itself keeps track of several values which may be useful for processing:
- Parameters
>>> target = {}
>>> glom(target, Coalesce('a', 'b'))
Traceback (most recent call last):
...
glom.CoalesceError: no valid values found. Tried ('a', 'b') and got (PathAccessError, PathAccessError)
...
-
class
py2store.utils.glom.
Fill
(spec=None)[source]¶ A specifier type which switches to glom into “fill-mode”. For the spec contained within the Fill, glom will only interpret explicit specifier types (including T objects). Whereas the default mode has special interpretations for each of these builtins, fill-mode takes a lighter touch, making Fill great for “filling out” Python literals, like tuples, dicts, sets, and lists.
>>> target = {'data': [0, 2, 4]}
>>> spec = Fill((T['data'][2], T['data'][0]))
>>> glom(target, spec)
(4, 0)
As you can see, glom’s usual built-in tuple item chaining behavior has switched into a simple tuple constructor.
(Sidenote for Lisp fans: Fill is like glom’s quasi-quoting.)
-
exception
py2store.utils.glom.
GlomError
[source]¶ The base exception for all the errors that might be raised from glom() processing logic.
By default, exceptions raised from within functions passed to glom (e.g., len, sum, any lambda) will not be wrapped in a GlomError.
-
class
py2store.utils.glom.
Glommer
(**kwargs)[source]¶ All the wholesome goodness that it takes to make glom work. This type mostly serves to encapsulate the type registration context so that advanced uses of glom don’t need to worry about stepping on each other’s toes.
Glommer objects are lightweight and, once instantiated, provide the glom() method we know and love:
>>> glommer = Glommer()
>>> glommer.glom({}, 'a.b.c', default='d')
'd'
>>> Glommer().glom({'vals': list(range(3))}, ('vals', len))
3
Instances also provide a register() method for localized control over type handling.
- Parameters
register_default_types (bool) – Whether or not to enable the handling behaviors of the default glom(). These default actions include dict access, list and iterable iteration, and generic object attribute access. Defaults to True.
-
register
(target_type, **kwargs)[source]¶ Register target_type so glom() will know how to handle instances of that type as targets.
- Parameters
target_type (type) – A type expected to appear in a glom() call target
get (callable) – A function which takes a target object and a name, acting as a default accessor. Defaults to getattr().
iterate (callable) – A function which takes a target object and returns an iterator. Defaults to iter() if target_type appears to be iterable.
exact (bool) – Whether or not to match instances of subtypes of target_type.
Note
The module-level register() function affects the module-level glom() function's behavior. If this global effect is undesirable for your application, or you're implementing a library, consider instantiating a Glommer instance, and using the register() and Glommer.glom() methods instead.
-
class
py2store.utils.glom.
Inspect
(*a, **kw)[source]¶ The Inspect specifier type provides a way to get visibility into glom's evaluation of a specification, enabling debugging of those tricky problems that may arise with unexpected data.
Inspect can be inserted into an existing spec in one of two ways. First, as a wrapper around the spec in question, or second, as an argument-less placeholder wherever a spec could be.
Inspect supports several modes, controlled by keyword arguments. Its default, no-argument mode, simply echoes the state of the glom at the point where it appears:
>>> target = {'a': {'b': {}}}
>>> val = glom(target, Inspect('a.b'))  # wrapping a spec
---
path:   ['a.b']
target: {'a': {'b': {}}}
output: {}
---
Debugging behavior aside, Inspect has no effect on values in the target, spec, or result.
- Parameters
echo (bool) – Whether to print the path, target, and output of each inspected glom. Defaults to True.
recursive (bool) – Whether or not the Inspect should be applied at every level, at or below the spec that it wraps. Defaults to False.
breakpoint (bool) – This flag controls whether a debugging prompt should appear before evaluating each inspected spec. Can also take a callable. Defaults to False.
post_mortem (bool) – This flag controls whether exceptions should be caught and interactively debugged with pdb on inspected specs.
All arguments above are keyword-only to avoid overlap with a wrapped spec.
Note
Just like pdb.set_trace(), be careful about leaving stray Inspect() instances in production glom specs.
-
class
py2store.utils.glom.
Invoke
(func)[source]¶ Specifier type designed for easy invocation of callables from glom.
- Parameters
func (callable) – A function or other callable object.
Invoke is similar to functools.partial(), but with the ability to set up a "templated" call which interleaves constants and glom specs.
For example, the following creates a spec which can be used to check if targets are integers:
>>> is_int = Invoke(isinstance).specs(T).constants(int)
>>> glom(5, is_int)
True
And this composes like any other glom spec:
>>> target = [7, object(), 9]
>>> glom(target, [is_int])
[True, False, True]
Another example, mixing positional and keyword arguments:
>>> spec = Invoke(sorted).specs(T).constants(key=int, reverse=True)
>>> target = ['10', '5', '20', '1']
>>> glom(target, spec)
['20', '10', '5', '1']
Invoke also helps with evaluating zero-argument functions:
>>> glom(target={}, spec=Invoke(int))
0
(A trivial example, but from timestamps to UUIDs, zero-arg calls do come up!)
Note
Invoke is mostly for functions, object construction, and callable objects. For calling methods, consider the T object.
-
constants
(*a, **kw)[source]¶ Returns a new Invoke spec, with the provided positional and keyword argument values stored for passing to the underlying function.
>>> spec = Invoke(T).constants(5)
>>> glom(range, (spec, list))
[0, 1, 2, 3, 4]
Subsequent positional arguments are appended:
>>> spec = Invoke(T).constants(2).constants(10, 2)
>>> glom(range, (spec, list))
[2, 4, 6, 8]
Keyword arguments also work as one might expect:
>>> round_2 = Invoke(round).constants(ndigits=2).specs(T)
>>> glom(3.14159, round_2)
3.14
constants() and other Invoke methods may be called multiple times; just remember that every call returns a new spec.
-
classmethod
specfunc
(spec)[source]¶ Creates an Invoke instance where the function is indicated by a spec.
>>> spec = Invoke.specfunc('func').constants(5)
>>> glom({'func': range}, (spec, list))
[0, 1, 2, 3, 4]
-
specs
(*a, **kw)[source]¶ Returns a new Invoke spec, with the provided positional and keyword arguments stored to be interpreted as specs, with the results passed to the underlying function.
>>> spec = Invoke(range).specs('value')
>>> glom({'value': 5}, (spec, list))
[0, 1, 2, 3, 4]
Subsequent positional arguments are appended:
>>> spec = Invoke(range).specs('start').specs('end', 'step')
>>> target = {'start': 2, 'end': 10, 'step': 2}
>>> glom(target, (spec, list))
[2, 4, 6, 8]
Keyword arguments also work as one might expect:
>>> multiply = lambda x, y: x * y
>>> times_3 = Invoke(multiply).constants(y=3).specs(x='value')
>>> glom({'value': 5}, times_3)
15
specs() and other Invoke methods may be called multiple times; just remember that every call returns a new spec.
-
star
(args=None, kwargs=None)[source]¶ Returns a new Invoke spec, with args and/or kwargs specs set to be "starred" or "star-starred" (respectively).
>>> import os.path
>>> spec = Invoke(os.path.join).star(args='path')
>>> target = {'path': ['path', 'to', 'dir']}
>>> glom(target, spec)
'path/to/dir'
- Parameters
args (spec) – A spec to be evaluated and “starred” into the underlying function.
kwargs (spec) – A spec to be evaluated and “star-starred” into the underlying function.
One or both of the above arguments should be set.
star(), like other Invoke methods, may be called multiple times. The args and kwargs will be stacked in the order in which they are provided.
-
class
py2store.utils.glom.
Let
(**kw)[source]¶ This specifier type assigns variables to the scope.
>>> target = {'data': {'val': 9}}
>>> spec = (Let(value=T['data']['val']), {'val': S['value']})
>>> glom(target, spec)
{'val': 9}
-
class
py2store.utils.glom.
Literal
(value)[source]¶ Literal objects specify literal values in rare cases when part of the spec should not be interpreted as a glommable subspec. Wherever a Literal object is encountered in a spec, it is replaced with its wrapped value in the output.
>>> target = {'a': {'b': 'c'}}
>>> spec = {'a': 'a.b', 'readability': Literal('counts')}
>>> pprint(glom(target, spec))
{'a': 'c', 'readability': 'counts'}
Instead of accessing 'counts' as a key like it did with 'a.b', glom() just unwrapped the literal and included the value.
Literal takes one argument, the literal value that should appear in the glom output.
This could also be achieved with a callable, e.g., lambda x: 'literal_string' in the spec, but using a Literal object adds explicitness, code clarity, and a clean repr().
-
class
py2store.utils.glom.
Path
(*path_parts)[source]¶ Path objects specify explicit paths when the default 'a.b.c'-style general access syntax won't work or isn't desirable. Use this to wrap ints, datetimes, and other valid keys, as well as strings with dots that shouldn't be expanded.
>>> target = {'a': {'b': 'c', 'd.e': 'f', 2: 3}}
>>> glom(target, Path('a', 2))
3
>>> glom(target, Path('a', 'd.e'))
'f'
Paths can be used to join together other Path objects, as well as T objects:
>>> Path(T['a'], T['b'])
T['a']['b']
>>> Path(Path('a', 'b'), Path('c', 'd'))
Path('a', 'b', 'c', 'd')
Paths also support indexing and slicing, with each access returning a new Path object:
>>> path = Path('a', 'b', 1, 2)
>>> path[0]
Path('a')
>>> path[-2:]
Path(1, 2)
-
classmethod
from_text
(text)[source]¶ Make a Path from .-delimited text:
>>> Path.from_text('a.b.c')
Path('a', 'b', 'c')
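The '.'-splitting behind from_text, and the walk it enables, can be sketched as follows (path_from_text and follow are hypothetical helpers, not the actual Path internals):

```python
def path_from_text(text):
    """Split '.'-delimited text into a tuple of path parts, as from_text does."""
    return tuple(text.split('.'))

def follow(target, parts):
    """Apply each part as a key lookup, in order."""
    for part in parts:
        target = target[part]
    return target

parts = path_from_text('a.b.c')
```

Because the parts stay separate, a key like 'd.e' can be passed whole, which is exactly what Path('a', 'd.e') buys you over the 'a.d.e' shorthand.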
-
exception
py2store.utils.glom.
PathAccessError
(exc, path, part_idx)[source]¶ This GlomError subtype represents a failure to access an attribute as dictated by the spec. The most commonly-seen error when using glom, it maintains a copy of the original exception and produces a readable error message for easy debugging.
If you see this error, you may want to:
Check the target data is accurate using Inspect
Catch the exception and return a semantically meaningful error message
Use glom.Coalesce to specify a default
Use the top-level default kwarg on glom()
In any case, be glad you got this error and not the one it was wrapping!
- Parameters
exc (Exception) – The error that arose when we tried to access path. Typically an instance of KeyError, AttributeError, IndexError, or TypeError, and sometimes others.
path (Path) – The full Path glom was in the middle of accessing when the error occurred.
part_idx (int) – The index of the part of the path that caused the error.
>>> target = {'a': {'b': None}}
>>> glom(target, 'a.b.c')
Traceback (most recent call last):
...
glom.PathAccessError: could not access 'c', part 2 of Path('a', 'b', 'c'), got error: ...
-
class
py2store.utils.glom.
Spec
(spec, scope=None)[source]¶ Spec objects serve three purposes, here they are, roughly ordered by utility:
As a form of compiled or "curried" glom call, similar to Python's built-in re.compile().
As a marker object representing a spec rather than a literal value, in certain cases where that might be ambiguous.
As a way to update the scope within another Spec.
In the second usage, Spec objects are the complement to Literal, wrapping a value and marking that it should be interpreted as a glom spec, rather than a literal value. This is useful in places where it would be interpreted as a value by default. (Such as T[key], Call(func) where key and func are assumed to be literal values and not specs.)
- Parameters
spec – The glom spec.
scope (dict) – additional values to add to the scope when evaluating this Spec
-
class
py2store.utils.glom.
TType
[source]¶ T, short for "target". A singleton object that enables object-oriented expression of a glom specification.
Note
T is a singleton, and does not need to be constructed.
Basically, think of T as your data's stunt double. Everything that you do to T will be recorded and executed during the glom() call. Take this example:
>>> spec = T['a']['b']['c']
>>> target = {'a': {'b': {'c': 'd'}}}
>>> glom(target, spec)
'd'
So far, we've relied on the 'a.b.c'-style shorthand for access, or used Path objects, but if you want to explicitly do attribute and key lookups, look no further than T.
But T doesn't stop with unambiguous access. You can also call methods and perform almost any action you would with a normal object:
>>> spec = ('a', (T['b'].items(), list))  # reviewed below
>>> glom(target, spec)
[('c', 'd')]
A T object can go anywhere in the spec. As seen in the example above, we access 'a', use a T to get 'b' and iterate over its items, turning them into a list.
You can even use T with Call to construct objects:
>>> class ExampleClass(object):
...     def __init__(self, attr):
...         self.attr = attr
...
>>> target = {'attr': 3.14}
>>> glom(target, Call(ExampleClass, kwargs=T)).attr
3.14
On a further note, while lambda works great in glom specs, and can be very handy at times, T and Call eliminate the need for the vast majority of lambda usage with glom.
Unlike lambda and other functions, T roundtrips beautifully and transparently:
>>> T['a'].b['c']('success')
T['a'].b['c']('success')
T-related access errors raise a PathAccessError during the glom() call.
Note
While T is clearly useful, powerful, and here to stay, its semantics are still being refined. Currently, operations beyond method calls and attribute/item access are considered experimental and should not be relied upon.
-
class
py2store.utils.glom.
TargetRegistry
(register_default_types=True)[source]¶ responsible for registration of target types for iteration and attribute walking
-
get_handler
(op, obj, path=None, raise_exc=True)[source]¶ for an operation and object instance, obj, return the closest-matching handler function, raising UnregisteredTarget if no handler can be found for obj (or False if raise_exc=False)
-
register_op
(op_name, auto_func=None, exact=False)[source]¶ add operations beyond the builtins (‘get’ and ‘iterate’ at the time of writing).
auto_func is a function that when passed a type, returns a handler associated with op_name if it’s supported, or False if it’s not.
See glom.core.register_op() for the global version used by extensions.
-
-
exception
py2store.utils.glom.
UnregisteredTarget
(op, target_type, type_map, path)[source]¶ This GlomError subtype is raised when a spec calls for an unsupported action on a target type. For instance, trying to iterate on a non-iterable target:
>>> glom(object(), ['a.b.c'])
Traceback (most recent call last):
...
glom.UnregisteredTarget: target type 'object' not registered for 'iterate', expected one of registered types: (...)
It should be noted that this is a pretty uncommon occurrence in production glom usage. See the setup-and-registration section for details on how to avoid this error.
An UnregisteredTarget takes and tracks a few values:
- Parameters
op (str) – The name of the operation being performed (‘get’ or ‘iterate’)
target_type (type) – The type of the target being processed.
type_map (dict) – A mapping of target types that do support this operation
path – The path at which the error occurred.
-
py2store.utils.glom.
glom
(target, spec, **kwargs)[source]¶ Access or construct a value from a given target based on the specification declared by spec.
Accessing nested data, aka deep-get:
>>> target = {'a': {'b': 'c'}}
>>> glom(target, 'a.b')
'c'
Here the spec was just a string denoting a path, 'a.b'. As simple as it should be. The next example shows how to use nested data to access many fields at once, and make a new nested structure.
Constructing, or restructuring more-complicated nested data:
>>> target = {'a': {'b': 'c', 'd': 'e'}, 'f': 'g', 'h': [0, 1, 2]}
>>> spec = {'a': 'a.b', 'd': 'a.d', 'h': ('h', [lambda x: x * 2])}
>>> output = glom(target, spec)
>>> pprint(output)
{'a': 'c', 'd': 'e', 'h': [0, 2, 4]}
glom also takes a keyword argument, default. When set, if a glom operation fails with a GlomError, the default will be returned, very much like dict.get():
>>> glom(target, 'a.xx', default='nada')
'nada'
The skip_exc keyword argument controls which errors should be ignored.
>>> glom({}, lambda x: 100.0 / len(x), default=0.0, skip_exc=ZeroDivisionError)
0.0
- Parameters
target (object) – the object on which the glom will operate.
spec (object) – Specification of the output object in the form of a dict, list, tuple, string, other glom construct, or any composition of these.
default (object) – An optional default to return in the case an exception, specified by skip_exc, is raised.
skip_exc (Exception) – An optional exception or tuple of exceptions to ignore and return default (None if omitted). If skip_exc and default are both not set, glom raises errors through.
scope (dict) – Additional data that can be accessed via S inside the glom-spec.
It’s a small API with big functionality, and glom’s power is only surpassed by its intuitiveness. Give it a whirl!
-
py2store.utils.glom.
is_iterable
(x)[source]¶ Similar in nature to callable(), is_iterable returns True if an object is iterable, False if not.
>>> is_iterable([])
True
>>> is_iterable(1)
False
-
py2store.utils.glom.
make_sentinel
(name='_MISSING', var_name=None)[source]¶ Creates and returns a new instance of a new class, suitable for usage as a "sentinel", a kind of singleton often used to indicate a value is missing when None is a valid input.
- Parameters
name (str) – Name of the Sentinel
var_name (str) – Set this to the name of the variable in its respective module to enable pickleability.
>>> make_sentinel(var_name='_MISSING')
_MISSING
The most common use cases here in boltons are as default values for optional function arguments, partly because of its less-confusing appearance in automatically generated documentation. Sentinels also function well as placeholders in queues and linked lists.
Note
By design, additional calls to make_sentinel with the same values will not produce equivalent objects.
>>> make_sentinel('TEST') == make_sentinel('TEST')
False
>>> type(make_sentinel('TEST')) == type(make_sentinel('TEST'))
False
-
py2store.utils.glom.
register
(target_type, **kwargs)[source]¶ Register target_type so
glom()
will know how to handle instances of that type as targets.- Parameters
target_type (type) – A type expected to appear in a glom() call target
get (callable) – A function which takes a target object and a name, acting as a default accessor. Defaults to
getattr()
.iterate (callable) – A function which takes a target object and returns an iterator. Defaults to
iter()
if target_type appears to be iterable.exact (bool) – Whether or not to match instances of subtypes of target_type.
Note
The module-level
register()
function affects the module-levelglom()
function’s behavior. If this global effect is undesirable for your application, or you’re implementing a library, consider instantiating aGlommer
instance, and using theregister()
andGlommer.glom()
methods instead.
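The registration mechanism can be pictured with a toy dispatch table. This is only a sketch of the idea (the real Glommer keeps per-operation registries and honors the exact flag); `register_get` and `access` are hypothetical names:

```python
_get_handlers = {}

def register_get(target_type, get=getattr):
    # associate a default accessor with a target type
    _get_handlers[target_type] = get

def access(target, name):
    # pick the accessor registered for the target's type
    # (subtypes included, i.e. the non-exact behavior)
    for t, get in _get_handlers.items():
        if isinstance(target, t):
            return get(target, name)
    return getattr(target, name)  # fall back to attribute access

# teach the dispatcher that dicts are accessed by key, not attribute
register_get(dict, get=lambda d, k: d[k])
```

With this in place, `access({'a': 1}, 'a')` goes through the dict handler while plain objects still use `getattr`.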
py2store.persisters.sql_w_odbc¶
py2store.persisters.dynamodb_w_boto3¶
py2store.persisters.couchdb_w_couchdb¶
py2store.persisters.ftp_persister¶
py2store.persisters.dropbox_w_urllib¶
py2store.persisters._google_drive_in_progress¶
py2store.persisters.dropbox_w_dropbox¶
Forwards to dropboxdol
py2store.persisters.redis_w_redis¶
Forwards to redisdol
py2store.persisters.sql_w_sqlalchemy¶
Forwards to sqldol
py2store.persisters.new_s3¶
Forwards to s3dol.new_s3
py2store.persisters¶
base persisters – now all forwarding to separate libraries
py2store.persisters.dropbox_w_requests¶
py2store.persisters.w_aiofile¶
Forwards to aiofiledol
py2store.persisters.local_files¶
base classes to work with local files
-
class
py2store.persisters.local_files.
DirReader
(rootdir)[source]¶ KV Reader whose keys (AND VALUES) are directory full paths of the subdirectories of rootdir.
-
class
py2store.persisters.local_files.
DirpathFormatKeys
(path_format: str, max_levels: int = inf)[source]¶
-
class
py2store.persisters.local_files.
FileReader
(rootdir)[source]¶ KV Reader whose keys are paths and values are: - Another FileReader if a path points to a directory - The bytes of the file if the path points to a file.
-
class
py2store.persisters.local_files.
FilepathFormatKeys
(path_format: str, max_levels: int = inf)[source]¶
-
class
py2store.persisters.local_files.
LocalFileRWD
(mode='', **open_kwargs)[source]¶ A class providing get, set and delete functionality using local files as the storage backend.
-
class
py2store.persisters.local_files.
LocalFileStreamGetter
(**open_kwargs)[source]¶ A class to get stream objects of local open files. The class can only get keys (no list, set, del, etc.), and the streams obtained can be used to read, or to write (destructive or append).
>>> from tempfile import mkdtemp >>> import os >>> rootdir = mkdtemp() >>> >>> appendable_stream = LocalFileStreamGetter(mode='a+') >>> reader = PathFormatPersister(rootdir) >>> filepath = os.path.join(rootdir, 'tmp.txt') >>> >>> with appendable_stream[filepath] as fp: ... fp.write('hello') 5 >>> print(reader[filepath]) hello >>> with appendable_stream[filepath] as fp: ... fp.write(' world') 6 >>> >>> print(reader[filepath]) hello world
-
class
py2store.persisters.local_files.
PathFormatPersister
(path_format, max_levels: int = inf, mode='', **open_kwargs)[source]¶
-
class
py2store.persisters.local_files.
PrefixedDirpathsRecursive
[source]¶ Keys collection for local files, where the keys are full filepaths RECURSIVELY under a given root dir _prefix. This mixin adds iteration (__iter__), length (__len__), and containment (__contains__(k)).
-
class
py2store.persisters.local_files.
PrefixedFilepaths
[source]¶ Keys collection for local files, where the keys are full filepaths DIRECTLY under a given root dir _prefix. This mixin adds iteration (__iter__), length (__len__), and containment (__contains__(k)).
py2store.persisters.arangodb_w_pyarango¶
py2store.persisters._cassandra_in_progress¶
py2store.persisters._couchdb_in_progress¶
py2store.persisters.s3_w_boto3¶
Forwards to s3dol.s3_w_boto3
py2store.persisters._postgres_w_psycopg2_in_progress¶
py2store.persisters.ssh_persister¶
py2store.persisters.mongo_w_pymongo¶
py2store.persisters.googledrive_w_pydrive¶
Forwards to pydrivedol
py2store.sources¶
Forwards to dol.sources:
This module contains key-value views of disparate sources.
py2store.dig¶
Forwards to dol.dig:
Layers introspection
py2store.serializers.pickled¶
functions to pickle objects
-
py2store.serializers.pickled.
mk_marshal_rw_funcs
(**kwargs)[source]¶ Generates a reader and writer using marshal. That is, a pair of parametrized loads and dumps
>>> read, write = mk_marshal_rw_funcs() >>> d = {'a': 'simple', 'and': {'a': b'more', 'complex': [1, 2.2]}} >>> serialized_d = write(d) >>> deserialized_d = read(serialized_d) >>> assert d == deserialized_d
-
py2store.serializers.pickled.
mk_pickle_rw_funcs
(fix_imports=True, protocol=None, pickle_encoding='ASCII', pickle_errors='strict')[source]¶ Generates a reader and writer using pickle. That is, a pair of parametrized loads and dumps
>>> read, write = mk_pickle_rw_funcs() >>> d = {'a': 'simple', 'and': {'a': b'more', 'complex': [1, 2.2, dict]}} >>> serialized_d = write(d) >>> deserialized_d = read(serialized_d) >>> assert d == deserialized_d
py2store.serializers.jsonization¶
py2store.serializers¶
a package of serializers
py2store.serializers.sequential¶
py2store.serializers.regular_panel_data¶
py2store.serializers.audio¶
py2store.caching¶
Forwards to dol.caching:
Tools to add caching layers to stores.
py2store.scrap¶
py2store.scrap.new_gen_local¶
py2store.examples.write_caches¶
stores that implement various write caching algorithms
py2store.examples¶
modules demoing various uses of py2store
py2store.examples.python_code_stats¶
Note: Moved to umpyre (pip install umpyre)
Get stats about packages. Your own, or others’. Things like…
# >>> import collections
# >>> modules_info_df(collections)
#                       lines  empty_lines  ...  num_of_functions  num_of_classes
# collections.__init__   1273          189  ...                 1               9
# collections.abc           3            1  ...                 0              25
# <BLANKLINE>
# [2 rows x 7 columns]
# >>> modules_info_df_stats(collections.abc)
# lines                      1276.000000
# empty_lines                 190.000000
# comment_lines                73.000000
# docs_lines                  133.000000
# function_lines              138.000000
# num_of_functions              1.000000
# num_of_classes               34.000000
# empty_lines_ratio             0.148903
# comment_lines_ratio           0.057210
# function_lines_ratio          0.108150
# mean_lines_per_function     138.000000
# dtype: float64
# >>> stats_of(['urllib', 'json', 'collections'])
#                               urllib         json  collections
# empty_lines_ratio           0.157034     0.136818     0.148903
# comment_lines_ratio         0.074142     0.038432     0.057210
# function_lines_ratio        0.213907     0.449654     0.108150
# mean_lines_per_function    13.463768    41.785714   138.000000
# lines                    4343.000000  1301.000000  1276.000000
# empty_lines               682.000000   178.000000   190.000000
# comment_lines             322.000000    50.000000    73.000000
# docs_lines                425.000000   218.000000   133.000000
# function_lines            929.000000   585.000000   138.000000
# num_of_functions           69.000000    14.000000     1.000000
# num_of_classes             55.000000     3.000000    34.000000
py2store.examples.kv_walking¶
walking through kv stores
-
class
py2store.examples.kv_walking.
SrcReader
(src, src_to_keys, key_to_obj)[source]¶ -
update_keys_cache
(keys)¶ Updates the _keys_cache by calling its {} method
-
-
py2store.examples.kv_walking.
conjunction
(*args, **kwargs)[source]¶ conjunction(*args, **kwargs) will be equal to func_1(*args, **kwargs) & … & func_n(*args, **kwargs) for all args, kwargs.
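The contract described above suggests a factory that builds such a conjunction from any number of predicates. A minimal sketch (the name `conjunction_of` is hypothetical, not the library's API):

```python
from functools import reduce

def conjunction_of(*funcs):
    # returns a function whose result is
    # func_1(*args, **kwargs) & ... & func_n(*args, **kwargs)
    def conjunction(*args, **kwargs):
        return reduce(lambda a, b: a & b,
                      (f(*args, **kwargs) for f in funcs))
    return conjunction
```

Since `&` (not `and`) is used, this also composes set-valued or mask-valued functions, not just booleans.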
-
py2store.examples.kv_walking.
kv_walk
(v: collections.abc.Mapping, yield_func=<function asis>, walk_filt=<function val_is_mapping>, pkv_to_pv=<function tuple_keypath_and_val>, p=())[source]¶ - Parameters
v –
yield_func – (pp, k, vv) -> what ever you want the gen to yield
walk_filt – (p, k, vv) -> (bool) whether to explore the nested structure v further
pkv_to_pv – (p, k, v) -> (pp, vv) where pp is a form of p + k (update of the path with the new node k) and vv is the value that will be used by both walk_filt and yield_func
p – The path to v
>>> d = {'a': 1, 'b': {'c': 2, 'd': 3}} >>> list(kv_walk(d)) [(('a',), 'a', 1), (('b',), 'b', {'c': 2, 'd': 3}), (('b', 'c'), 'c', 2), (('b', 'd'), 'd', 3)] >>> list(kv_walk(d, lambda p, k, v: '.'.join(p))) ['a', 'b', 'b.c', 'b.d']
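The recursion behind such a walk is compact enough to sketch. This is a simplified stand-in for the parameters described above (the default pkv_to_pv is assumed to extend the path tuple with the key), not the library code itself:

```python
from collections.abc import Mapping

def kv_walk(v, yield_func=lambda p, k, vv: (p, k, vv),
            walk_filt=lambda p, k, vv: isinstance(vv, Mapping),
            p=()):
    # yield each node, then recurse into values that walk_filt approves
    for k, vv in v.items():
        pp = p + (k,)                 # path extended with the new node k
        yield yield_func(pp, k, vv)
        if walk_filt(p, k, vv):
            yield from kv_walk(vv, yield_func, walk_filt, pp)
```

Note that a mapping node is yielded itself before its children, which is what produces the `(('b',), 'b', {...})` entry in the doctest above.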
py2store.my¶
functionalities meant to be configurable
py2store.my.grabbers¶
define stores (and functions) so they give you data as you want it, depending on the extension
py2store.trans¶
Forwards to dol.trans:
Transformation/wrapping tools
py2store.key_mappers.str_utils¶
utils for working with strings
-
py2store.key_mappers.str_utils.
args_and_kwargs_indices
(format_string)[source]¶ Get the sets of indices and names used in manual specification of format strings, or None, None if auto spec. :param format_string: A format string (i.e. a string with {…} to mark parameter placement and formatting)
- Returns
None, None if format_string is an automatic specification set_of_indices_used, set_of_fields_used if it is a manual specification
>>> format_string = '{0} (no 1) {2}, {see} this, {0} is a duplicate (appeared before) and {name} is string-named' >>> assert args_and_kwargs_indices(format_string) == ({0, 2}, {'name', 'see'}) >>> format_string = 'This is a format string with only automatic field specification: {}, {}, {} etc.' >>> assert args_and_kwargs_indices(format_string) == (set(), set())
-
py2store.key_mappers.str_utils.
auto_field_format_str
(format_str)[source]¶ Get an auto field version of the format_str
- Parameters
format_str – A format string
- Returns
A transformed format_str that has no names {inside} {formatting} {braces}.
>>> auto_field_format_str('R/{0}/{one}/{}/{two}/T') 'R/{}/{}/{}/{}/T'
-
py2store.key_mappers.str_utils.
compile_str_from_parsed
(parsed)[source]¶ The (quasi-)inverse of string.Formatter.parse.
- Parameters
parsed – iterator of (literal_text, field_name, format_spec, conversion) tuples,
as yielded by string.Formatter.parse
- Returns
A format string that would produce such a parsed input.
>>> s = "ROOT/{}/{0!r}/{1!i:format}/hello{:0.02f}TAIL" >>> assert compile_str_from_parsed(string.Formatter().parse(s)) == s >>> >>> # Or, if you want to see more details... >>> parsed = list(string.Formatter().parse(s)) >>> for p in parsed: ... print(p) ('ROOT/', '', '', None) ('/', '0', '', 'r') ('/', '1', 'format', 'i') ('/hello', '', '0.02f', None) ('TAIL', None, None, None) >>> compile_str_from_parsed(parsed) 'ROOT/{}/{0!r}/{1!i:format}/hello{:0.02f}TAIL'
-
py2store.key_mappers.str_utils.
format_params_in_str_format
(format_string)[source]¶ Get the “parameter” indices/names of the format_string
- Parameters
format_string – A format string (i.e. a string with {…} to mark parameter placement and formatting)
- Returns
A list of parameter indices used in the format string, in the order they appear, with repetition. Parameter indices could be integers, strings, or None (to denote “automatic field numbering”).
>>> format_string = '{0} (no 1) {2}, and {0} is a duplicate, {} is unnamed and {name} is string-named' >>> format_params_in_str_format(format_string) [0, 2, 0, None, 'name']
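The stdlib already exposes the machinery this relies on, so the behavior can be sketched directly with `string.Formatter` (a plausible reimplementation for illustration, not the library's source):

```python
import string

def format_params_in_str_format(format_string):
    # Formatter.parse yields (literal_text, field_name, format_spec,
    # conversion); field_name is '' for automatic fields and None when a
    # chunk has no field at all. Map '' -> None and digit names -> ints.
    return [int(f) if f.isdigit() else (f or None)
            for _, f, _, _ in string.Formatter().parse(format_string)
            if f is not None]
```

This reproduces the doctest above: indices come back as ints, names as strings, and anonymous `{}` fields as None.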
-
py2store.key_mappers.str_utils.
get_explicit_positions
(parsed_str_format)[source]¶ >>> parsed = parse_str_format("all/{}/is/{2}/position/{except}{this}{0}") >>> get_explicit_positions(parsed) {0, 2}
-
py2store.key_mappers.str_utils.
is_automatic_format_params
(format_params)[source]¶ Says if the format_params is from an automatic specification See Also: is_manual_format_params and is_hybrid_format_params
-
py2store.key_mappers.str_utils.
is_automatic_format_string
(format_string)[source]¶ Says if the format_string uses automatic specification See Also: is_manual_format_params >>> is_automatic_format_string('Manual: indices: {1} {2}, named: {named} {fields}') False >>> is_automatic_format_string('Auto: only un-indexed and un-named: {} {}...') True >>> is_automatic_format_string('Hybrid: at least a {}, and a {0} or a {name}') False >>> is_manual_format_string('No formatting is both manual and automatic formatting!') True
-
py2store.key_mappers.str_utils.
is_hybrid_format_params
(format_params)[source]¶ Says if the format_params is from a hybrid of auto and manual. Note: Hybrid specifications are considered non-valid and can’t be formatted with format_string.format(…). Yet, it can be useful for flexibility of expression (but will need to be resolved to be used). See Also: is_manual_format_params and is_automatic_format_params
-
py2store.key_mappers.str_utils.
is_hybrid_format_string
(format_string)[source]¶ Says if the format_params is from a hybrid of auto and manual. Note: Hybrid specifications are considered non-valid and can’t be formatted with format_string.format(…). Yet, it can be useful for flexibility of expression (but will need to be resolved to be used).
>>> is_hybrid_format_string('Manual: indices: {1} {2}, named: {named} {fields}') False >>> is_hybrid_format_string('Auto: only un-indexed and un-named: {} {}...') False >>> is_hybrid_format_string('Hybrid: at least a {}, and a {0} or a {name}') True >>> is_manual_format_string('No formatting is both manual and automatic formatting (so hybrid is both)!') True
-
py2store.key_mappers.str_utils.
is_manual_format_params
(format_params)[source]¶ Says if the format_params is from a manual specification See Also: is_automatic_format_params
-
py2store.key_mappers.str_utils.
is_manual_format_string
(format_string)[source]¶ Says if the format_string uses a manual specification See Also: is_automatic_format_string >>> is_manual_format_string('Manual: indices: {1} {2}, named: {named} {fields}') True >>> is_manual_format_string('Auto: only un-indexed and un-named: {} {}...') False >>> is_manual_format_string('Hybrid: at least a {}, and a {0} or a {name}') False >>> is_manual_format_string('No formatting is both manual and automatic formatting!') True
-
py2store.key_mappers.str_utils.
manual_field_format_str
(format_str)[source]¶ Get a manual field version of the format_str
- Parameters
format_str – A format string
- Returns
A transformed format_str with explicit (manual) field specification.
-
py2store.key_mappers.str_utils.
n_format_params_in_str_format
(format_string)[source]¶ The number of parameters
-
py2store.key_mappers.str_utils.
name_fields_in_format_str
(format_str, field_names=None)[source]¶ Get a manual field version of the format_str
- Parameters
format_str – A format string
field_names – An iterable that produces enough strings to fill all of format_str fields
- Returns
A transformed format_str
>>> name_fields_in_format_str('R/{0}/{one}/{}/{two}/T') 'R/{0}/{1}/{2}/{3}/T' >>> # Note here that we use the field name to inject a field format as well >>> name_fields_in_format_str('R/{foo}/{0}/{}/T', ['42', 'hi:03.0f', 'world']) 'R/{42}/{hi:03.0f}/{world}/T'
py2store.key_mappers.tuples¶
Tools to map tuple-structured keys. That is, converting from any of the following kinds of keys:
tuples (or list-like)
dicts
formatted/templated strings
dsv (Delimiter-Separated Values)
-
py2store.key_mappers.tuples.
dsv_of_list
(d, sep=',')[source]¶ Converting a list of strings to a dsv (delimiter-separated values) string.
Note that unlike most key mappers, there is no schema imposing size here. If you wish to impose a size validation, do so externally (we suggest using a decorator for that).
- Parameters
d – A list of component strings
sep – The delimiter text used to join the component strings into a single string
- Returns
The delimiter-separated values (dsv) string for the input tuple
>>> dsv_of_list(['a', 'brown', 'fox'], sep=' ') 'a brown fox' >>> dsv_of_list(('jumps', 'over'), sep='/') # for filepaths (and see that tuple inputs work too!) 'jumps/over' >>> dsv_of_list(['Sat', 'Jan', '1', '1983'], sep=',') # csv: the usual delimiter-separated values format 'Sat,Jan,1,1983' >>> dsv_of_list(['First', 'Last'], sep=':::') # a longer delimiter 'First:::Last' >>> dsv_of_list(['singleton'], sep='@') # when the list has only one element 'singleton' >>> dsv_of_list([], sep='@') # when the list is empty ''
-
py2store.key_mappers.tuples.
list_of_dsv
(d, sep=',')[source]¶ Converting a dsv (delimiter-separated values) string to the list of its components.
- Parameters
d – A (delimiter-separated values) string
sep – The delimiter text used to separate the string into a list of component strings
- Returns
A list of component strings corresponding to the input delimiter-separated values (dsv) string
>>> list_of_dsv('a brown fox', sep=' ') ['a', 'brown', 'fox'] >>> tuple(list_of_dsv('jumps/over', sep='/')) # for filepaths ('jumps', 'over') >>> list_of_dsv('Sat,Jan,1,1983', sep=',') # csv: the usual delimiter-separated values format ['Sat', 'Jan', '1', '1983'] >>> list_of_dsv('First:::Last', sep=':::') # a longer delimiter ['First', 'Last'] >>> list_of_dsv('singleton', sep='@') # when the list has only one element ['singleton'] >>> list_of_dsv('', sep='@') # when the string is empty []
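The pair of mappings above is almost just `str.split`/`str.join`, with one subtlety worth calling out: `''.split(sep)` returns `['']`, not `[]`, so the empty string needs special handling to keep the round trip exact. A minimal sketch of both directions (illustrative, not the library source):

```python
def list_of_dsv(d, sep=','):
    # empty string maps to empty list, not [''] as plain split would give
    return d.split(sep) if d else []

def dsv_of_list(components, sep=','):
    # the inverse: join components with the delimiter
    return sep.join(components)
```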
-
py2store.key_mappers.tuples.
mk_obj_of_str
(constructor)[source]¶ Make a function that transforms a string to an object. The factory making inverses of what mk_str_of_obj makes.
- Parameters
constructor – The function (or class) that will be used to make objects from the **kwargs parsed out of the string.
- Returns
A function factory.
-
py2store.key_mappers.tuples.
mk_str_of_obj
(attrs)[source]¶ Make a function that transforms objects to strings, using specific attributes of object.
- Parameters
attrs – Attributes that should be read off of the object to make the parameters of the string
- Returns
A transformation function
>>> from dataclasses import dataclass >>> @dataclass ... class A: ... foo: int ... bar: str >>> a = A(foo=0, bar='rin') >>> a A(foo=0, bar='rin') >>> >>> str_from_obj = mk_str_of_obj(['foo', 'bar']) >>> str_from_obj(a, 'ST{foo}/{bar}/G') 'ST0/rin/G'
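Given the doctest above, the factory's job is to read the listed attributes off the object and feed them to `str.format`. A plausible sketch consistent with that behavior (an assumption about the implementation, not the library's source):

```python
def mk_str_of_obj(attrs):
    # build a function that pulls the named attributes off the object
    # and uses them as keyword arguments to the format string
    def str_of_obj(obj, str_format):
        return str_format.format(**{a: getattr(obj, a) for a in attrs})
    return str_of_obj
```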
-
py2store.key_mappers.tuples.
str_of_tuple
(d, str_format)[source]¶ Convert tuple to str. It’s just str_format.format(*d). Why even write such a function? (1) To have a consistent interface for key conversions (2) We want a KeyValidationError to occur here :param d: tuple of params to str_format :param str_format: Auto fields format string. If you have manual fields, consider auto_field_format_str to convert.
- Returns
parametrized string
>>> str_of_tuple(('hello', 'world'), "Well, {} dear {}!") 'Well, hello dear world!'
py2store.key_mappers.paths¶
Module that forwards to py2store.paths, kept for back-compatibility
py2store.key_mappers.naming¶
This module only forwards to py2store.naming, and is deprecated.
py2store.key_mappers¶
key mapping
py2store.errors¶
Forwards to dol.errors:
Error objects and utils
py2store.slib.s_configparser¶
Data Object Layer for configparser standard lib.
py2store.slib¶
modules for standard libs
py2store.slib.s_zipfile¶
a data object layer for zipfile
-
class
py2store.slib.s_zipfile.
FileStreamsOfZip
(zip_file, prefix='', open_kws=None)[source]¶ Like FilesOfZip, but the objects returned are file streams instead. So you use it like this:
```
z = FileStreamsOfZip(rootdir)
with z[relpath] as fp:
    ...  # do stuff with fp, like fp.readlines() or such...
```
-
class
py2store.slib.s_zipfile.
FlatZipFilesReader
(rootdir, subpath='.+\\.zip', pattern_for_field=None, max_levels=0, zip_reader=<class 'py2store.slib.s_zipfile.ZipReader'>, **zip_reader_kwargs)[source]¶ Read the union of the contents of multiple zip files. A local file reader whose keys are the zip filepaths of the rootdir and values are corresponding ZipReaders.
-
py2store.slib.s_zipfile.
ZipFileReader
¶
-
class
py2store.slib.s_zipfile.
ZipFileStreamsReader
(rootdir, subpath='.+\\.zip', pattern_for_field=None, max_levels=0, *, zip_reader=<class 'py2store.slib.s_zipfile.FileStreamsOfZip'>, **zip_reader_kwargs)¶ Like ZipFilesReader, but objects returned are file streams instead.
-
class
py2store.slib.s_zipfile.
ZipFilesReader
(rootdir, subpath='.+\\.zip', pattern_for_field=None, max_levels=0, zip_reader=<class 'py2store.slib.s_zipfile.ZipReader'>, **zip_reader_kwargs)[source]¶ A local file reader whose keys are the zip filepaths of the rootdir and values are corresponding ZipReaders.
-
class
py2store.slib.s_zipfile.
ZipFilesReaderAndBytesWriter
(rootdir, subpath='.+\\.zip', pattern_for_field=None, max_levels=0, zip_reader=<class 'py2store.slib.s_zipfile.ZipReader'>, **zip_reader_kwargs)[source]¶ Like ZipFilesReader, but with the ability to write bytes (assumed to be valid bytes of the zip format) to a key
-
class
py2store.slib.s_zipfile.
ZipReader
(zip_file, prefix='', open_kws=None, file_info_filt=None)[source]¶ A KvReader to read the contents of a zip file. Provides a KV perspective of https://docs.python.org/3/library/zipfile.html
ZipReader
has two value categories: Directories and Files. Both categories are distinguishable by the keys, through the “ends with slash” convention. When a file, the value returned is bytes, as usual.
- When a directory, the value returned is a
ZipReader
itself, with all params the same, except for the prefix
which serves to specify the subfolder (that is, prefix acts as a filter).
Note: If you get data zipped by a mac, you might get some junk along with it: namely, __MACOSX folders and .DS_Store files. I won’t rant about it, since others have. But you might find it useful to remove them from view. One choice is to use py2store.trans.filt_iter to get a filtered view of the zip’s contents. In most cases, this should do the job:
` # applied to store instance or class: store = filt_iter(filt=lambda x: not x.startswith('__MACOSX') and '.DS_Store' not in x)(store) `
Another option is just to remove these from the zip file once and for all. In unix-like systems:
` zip -d filename.zip __MACOSX/\* zip -d filename.zip \*/.DS_Store `
Examples
# >>> s = ZipReader('/path/to/some_zip_file.zip')
# >>> len(s)
# 53432
# >>> list(s)[:3]  # the first 3 elements (well... their keys)
# ['odir/', 'odir/app/', 'odir/app/data/']
# >>> list(s)[-3:]  # the last 3 elements (well... their keys)
# ['odir/app/data/audio/d/1574287049078391/m/Ctor.json',
#  'odir/app/data/audio/d/1574287049078391/m/intensity.json',
#  'odir/app/data/run/status.json']
# >>> # getting a file (note that by default, you get bytes, so need to decode)
# >>> s['odir/app/data/run/status.json'].decode()
# '{"test_phase_number": 9, "test_phase": "TestActions.IGNORE_TEST", "session_id": 0}'
# >>> # when you ask for the contents for a key that's a directory,
# >>> # you get a ZipReader filtered for that prefix:
# >>> s['odir/app/data/audio/']
# ZipReader('/path/to/some_zip_file.zip', 'odir/app/data/audio/', {}, <function take_everything at 0x1538999e0>)
# >>> # Often, you only want files (not directories)
# >>> # You can filter directories out using the file_info_filt argument
# >>> s = ZipReader('/path/to/some_zip_file.zip', file_info_filt=ZipReader.FILES_ONLY)
# >>> len(s)  # compare to the 53432 above, that contained dirs too
# 53280
# >>> list(s)[:3]  # first 3 keys are all files now
# ['odir/app/data/plc/d/1574304926795633/d/1574305026895702',
#  'odir/app/data/plc/d/1574304926795633/d/1574305276853053',
#  'odir/app/data/plc/d/1574304926795633/d/1574305159343326']
# >>>
# >>> # ZipReader.FILES_ONLY and ZipReader.DIRS_ONLY are just convenience filt functions
# >>> # Really, you can provide any custom one yourself.
# >>> # This filter function should take a ZipInfo object, and return True or False.
# >>> # (https://docs.python.org/3/library/zipfile.html#zipfile.ZipInfo)
# >>>
# >>> import re
# >>> p = re.compile('audio.*.json$')
# >>> my_filt_func = lambda fileinfo: bool(p.search(fileinfo.filename))
# >>> s = ZipReader('/Users/twhalen/Downloads/2019_11_21.zip', file_info_filt=my_filt_func)
# >>> len(s)
# 48
# >>> list(s)[:3]
# ['odir/app/data/audio/d/1574333557263758/m/Ctor.json',
#  'odir/app/data/audio/d/1574333557263758/m/intensity.json',
#  'odir/app/data/audio/d/1574288084739961/m/Ctor.json']
-
class
py2store.slib.s_zipfile.
ZipStore
(zip_filepath, compression=8, allow_overwrites=True, pwd=None)[source]¶ Zip reading and writing. When you want to read zips, there’s the FilesOfZip, ZipReader, or ZipFilesReader we know and love.
Sometimes though, you want to write to zips too. For this, we have ZipStore.
Since ZipStore can write to a zip, its read functionality is not going to assume static data, and cache things, as your favorite zip readers did. This, and the acrobatics needed to disguise the weird zipfile into something more… key-value natural, makes for a not-so-efficient store, out of the box.
- I advise using one of the zip readers if all you need to do is read, or subclassing or wrapping ZipStore with caching layers if that is appropriate for your use.
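The caching-layer suggestion above can be sketched generically: wrap any mapping-like store so repeated reads hit the backing store only once. `CachedReads` is a hypothetical name for illustration, not a py2store class:

```python
class CachedReads:
    # reads go to the backing store at most once per key; writes go
    # through to both the store and the cache so reads stay consistent
    def __init__(self, store):
        self._store = store
        self._cache = {}

    def __getitem__(self, k):
        if k not in self._cache:
            self._cache[k] = self._store[k]
        return self._cache[k]

    def __setitem__(self, k, v):
        self._store[k] = v
        self._cache[k] = v
```

Note this assumes no other writer mutates the underlying zip; for static-read workloads that is exactly the bargain the zip readers already make.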
-
py2store.slib.s_zipfile.
func_conjunction
(func1, func2)[source]¶ Returns a function that is equivalent to lambda x: func1(x) and func2(x)
-
py2store.slib.s_zipfile.
mk_flatzips_store
(dir_of_zips, zip_pair_path_preproc=<built-in function sorted>, mk_store=<class 'py2store.slib.s_zipfile.FlatZipFilesReader'>, **extra_mk_store_kwargs)[source]¶ A store so that you can work with a folder that has a bunch of zip files, as if they’ve all been extracted in the same folder. Note that zip_pair_path_preproc can be used to control how to resolve key conflicts (i.e. when you get two different zip files that have a same path in their contents). The last path encountered by zip_pair_path_preproc(zip_path_pairs) is the one that will be used, so one should make zip_pair_path_preproc act accordingly.
py2store.base¶
Forwards to dol.base:
Base classes for making stores. In the language of the collections.abc module, a store is a MutableMapping that is configured to work with a specific representation of keys, serialization of objects (python values), and persistence of the serialized data.
That is, stores offer the same interface as a dict, but where the actual implementation of writes, reads, and listing are configurable.
Consider the following example. Your store is meant to store waveforms as wav files on a remote server. Say waveforms are represented in python as a tuple (wf, sr), where wf is a list of numbers and sr is the sample rate (an int). The __setitem__ method will specify how to store bytes on a remote server, but you’ll need to specify how to SERIALIZE (wf, sr) to the bytes that constitute that wav file: _data_of_obj specifies that. You might also want to read those wav files back into a python (wf, sr) tuple. The __getitem__ method will get you those bytes from the server, but the store will need to know how to DESERIALIZE those bytes back into a python object: _obj_of_data specifies that
Further, say you’re storing these .wav files in /some/folder/on/the/server/, but you don’t want the store to use these as the keys. For one, it’s annoying to type and harder to read. But more importantly, it’s an irrelevant implementation detail that shouldn’t be exposed. The _id_of_key and _key_of_id pair are what allow you to add this key interface layer.
These key converters and object serialization methods default to the identity (i.e. they return the input as is). This means that you don’t have to implement them at all, and can choose to implement these concerns within the storage methods themselves.
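The four hooks described above fit in a short sketch. This is a toy version of the layering (the real dol.base.Store has considerably more machinery); the subclass and its path scheme are purely illustrative:

```python
class Store:
    # the four hooks, each defaulting to identity, wrapped around a
    # dict standing in for the persistence layer
    def __init__(self, store=None):
        self.store = {} if store is None else store
    def _id_of_key(self, k): return k          # key -> storage id
    def _key_of_id(self, _id): return _id      # storage id -> key
    def _data_of_obj(self, obj): return obj    # serialize
    def _obj_of_data(self, data): return data  # deserialize
    def __setitem__(self, k, obj):
        self.store[self._id_of_key(k)] = self._data_of_obj(obj)
    def __getitem__(self, k):
        return self._obj_of_data(self.store[self._id_of_key(k)])
    def __iter__(self):
        return (self._key_of_id(_id) for _id in self.store)

class PrefixedTupleStore(Store):
    # hypothetical example: hide a path prefix (key layer) and store
    # tuples as lists (a toy stand-in for wav-byte serialization)
    def _id_of_key(self, k): return '/root/' + k
    def _key_of_id(self, _id): return _id[len('/root/'):]
    def _data_of_obj(self, obj): return list(obj)
    def _obj_of_data(self, data): return tuple(data)
```

The caller only ever sees short keys and tuples; the prefixing and list conversion live entirely in the layers, which is the separation of concerns the paragraph above describes.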
py2store.selectors.mg_selectors¶
py2store.selectors.mongoquery¶
py2store.selectors¶
py2store.parse_format¶
Modified from https://github.com/r1chardj0n3s/parse
Parse strings using a specification based on the Python format() syntax.
parse()
is the opposite of format()
From there it’s a simple thing to parse a string:
>>> parse("It's {}, I love it!", "It's spam, I love it!")
<Result ('spam',) {}>
>>> _[0]
'spam'
Or to search a string for some pattern:
>>> search('Age: {:d}\n', 'Name: Rufus\nAge: 42\nColor: red\n')
<Result (42,) {}>
Or find all the occurrences of some pattern in a string:
>>> ''.join(r.fixed[0] for r in findall(">{}<", "<p>the <b>bold</b> text</p>"))
'the bold text'
If you’re going to use the same pattern to match lots of strings you can compile it once:
>>> p = compile("It's {}, I love it!")
>>> print(p)
<Parser "It's {}, I love it!">
>>> p.parse("It's spam, I love it!")
<Result ('spam',) {}>
(“compile” is not exported for import *
usage as it would override the
built-in compile()
function)
The default behaviour is to match strings case insensitively. You may match with case by specifying case_sensitive=True:
>>> parse('SPAM', 'spam', case_sensitive=True) is None
True
Format Syntax¶
A basic version of the Format String Syntax is supported with anonymous (fixed-position), named and formatted fields:
{[field name]:[format spec]}
Field names must be valid Python identifiers, including dotted names; element indexes imply dictionaries (see below for example).
Numbered fields are also not supported: the result of parsing will include the parsed fields in the order they are parsed.
The conversion of fields to types other than strings is done based on the
type in the format specification, which mirrors the format()
behaviour.
There are no “!” field conversions like format()
has.
Some simple parse() format string examples:
>>> parse("Bring me a {}", "Bring me a shrubbery")
<Result ('shrubbery',) {}>
>>> r = parse("The {} who say {}", "The knights who say Ni!")
>>> print(r)
<Result ('knights', 'Ni!') {}>
>>> print(r.fixed)
('knights', 'Ni!')
>>> r = parse("Bring out the holy {item}", "Bring out the holy hand grenade")
>>> print(r)
<Result () {'item': 'hand grenade'}>
>>> print(r.named)
{'item': 'hand grenade'}
>>> print(r['item'])
hand grenade
Dotted names and indexes are possible though the application must make additional sense of the result:
>>> r = parse("Mmm, {food.type}, I love it!", "Mmm, spam, I love it!")
>>> print(r)
<Result () {'food.type': 'spam'}>
>>> print(r.named)
{'food.type': 'spam'}
>>> print(r['food.type'])
spam
>>> r = parse("My quest is {quest[name]}", "My quest is to seek the holy grail!")
>>> print(r)
<Result () {'quest': {'name': 'to seek the holy grail!'}}>
>>> print(r['quest'])
{'name': 'to seek the holy grail!'}
>>> print(r['quest']['name'])
to seek the holy grail!
If the text you’re matching has braces in it you can match those by including
a double-brace {{
or }}
in your format string, just like format() does.
Format Specification¶
Most often a straight format-less {}
will suffice where a more complex
format specification might have been used.
Most of format()’s Format Specification Mini-Language is supported:
[[fill]align][0][width][.precision][type]
The differences between parse() and format() are:
The align operators will cause spaces (or specified fill character) to be stripped from the parsed value. The width is not enforced; it just indicates there may be whitespace or “0”s to strip.
Numeric parsing will automatically handle a “0b”, “0o” or “0x” prefix. That is, the “#” format character is handled automatically by d, b, o and x formats. For “d” any will be accepted, but for the others the correct prefix must be present if at all.
Numeric sign is handled automatically.
The thousands separator is handled automatically if the “n” type is used.
The types supported are a slightly different mix to the format() types. Some format() types come directly over: “d”, “n”, “%”, “f”, “e”, “b”, “o” and “x”. In addition some regular expression character group types “D”, “w”, “W”, “s” and “S” are also available.
The “e” and “g” types are case-insensitive so there is no need for the “E” or “G” types.
| Type | Characters Matched | Output |
|---|---|---|
| w | Letters and underscore | str |
| W | Non-letter and underscore | str |
| s | Whitespace | str |
| S | Non-whitespace | str |
| d | Digits (effectively integer numbers) | int |
| D | Non-digit | str |
| n | Numbers with thousands separators (, or .) | int |
| % | Percentage (converted to value/100.0) | float |
| f | Fixed-point numbers | float |
| F | Decimal numbers | Decimal |
| e | Floating-point numbers with exponent e.g. 1.1e-10, NAN (all case insensitive) | float |
| g | General number format (either d, f or e) | float |
| b | Binary numbers | int |
| o | Octal numbers | int |
| x | Hexadecimal numbers (lower and upper case) | int |
| ti | ISO 8601 format date/time e.g. 1972-01-20T10:21:36Z (“T” and “Z” optional) | datetime |
| te | RFC2822 e-mail format date/time e.g. Mon, 20 Jan 1972 10:21:36 +1000 | datetime |
| tg | Global (day/month) format date/time e.g. 20/1/1972 10:21:36 AM +1:00 | datetime |
| ta | US (month/day) format date/time e.g. 1/20/1972 10:21:36 PM +10:30 | datetime |
| tc | ctime() format date/time e.g. Sun Sep 16 01:03:52 1973 | datetime |
| th | HTTP log format date/time e.g. 21/Nov/2011:00:07:11 +0000 | datetime |
| ts | Linux system log format date/time e.g. Nov 9 03:37:44 | datetime |
| tt | Time e.g. 10:21:36 PM -5:30 | time |
Some examples of typed parsing with None
returned if the typing
does not match:
>>> parse('Our {:d} {:w} are...', 'Our 3 weapons are...')
<Result (3, 'weapons') {}>
>>> parse('Our {:d} {:w} are...', 'Our three weapons are...')
>>> parse('Meet at {:tg}', 'Meet at 1/2/2011 11:00 PM')
<Result (datetime.datetime(2011, 2, 1, 23, 0),) {}>
And messing about with alignment:
>>> parse('with {:>} herring', 'with a herring')
<Result ('a',) {}>
>>> parse('spam {:^} spam', 'spam lovely spam')
<Result ('lovely',) {}>
Note that the “center” alignment does not test to make sure the value is centered - it just strips leading and trailing whitespace.
Width and precision may be used to restrict the size of matched text from the input. Width specifies a minimum size and precision specifies a maximum. For example:
>>> parse('{:.2}{:.2}', 'look') # specifying precision
<Result ('lo', 'ok') {}>
>>> parse('{:4}{:4}', 'look at that') # specifying width
<Result ('look', 'at that') {}>
>>> parse('{:4}{:.4}', 'look at that') # specifying both
<Result ('look at ', 'that') {}>
>>> parse('{:2d}{:2d}', '0440') # parsing two contiguous numbers
<Result (4, 40) {}>
Some notes for the date and time types:
- the presence of the time part is optional (including ISO 8601, starting at the “T”). A full datetime object will always be returned; the time will be set to 00:00:00. You may also specify a time without seconds.
- when a seconds amount is present in the input, fractions will be parsed to give microseconds.
- except in ISO 8601, the day and month digits may be 0-padded.
- the date separator for the tg and ta formats may be “-” or “/”.
- named months (abbreviations or full names) may be used in the ta and tg formats in place of numeric months.
- as per RFC 2822, the e-mail format may omit the day (and comma) and the seconds, but nothing else.
- hours greater than 12 will be happily accepted.
- the AM/PM are optional, and if PM is found then 12 hours will be added to the datetime object’s hours amount - even if the hour is greater than 12 (for consistency.)
- in ISO 8601 the “Z” (UTC) timezone part may be a numeric offset.
- timezones are specified as “+HH:MM” or “-HH:MM”. The hour may be one or two digits (0-padded is OK.) Also, the “:” is optional.
- the timezone is optional in all except the e-mail format (it defaults to UTC.)
- named timezones are not handled yet.
Note: attempting to match too many datetime fields in a single parse() will currently result in a resource allocation issue. A TooManyFields exception will be raised in this instance. The current limit is about 15. It is hoped that this limit will be removed one day.
Result and Match Objects¶
The result of a parse() and search() operation is either None (no match), a Result instance, or a Match instance if evaluate_result is False.
The Result instance has three attributes:
- fixed
A tuple of the fixed-position, anonymous fields extracted from the input.
- named
A dictionary of the named fields extracted from the input.
- spans
A dictionary mapping the names and fixed position indices matched to a 2-tuple slice range of where the match occurred in the input. The span does not include any stripped padding (alignment or width).
The Match instance has one method:
- evaluate_result()
Generates and returns a Result instance for this Match object.
Custom Type Conversions¶
If you wish to have matched fields automatically converted to your own type, you may pass in a dictionary of type conversion information to parse() and compile(). The converter will be passed the field string matched; whatever it returns will be substituted in the Result instance for that field.
Your custom type conversions may override the builtin types if you supply one with the same identifier.
>>> def shouty(string):
... return string.upper()
...
>>> parse('{:shouty} world', 'hello world', dict(shouty=shouty))
<Result ('HELLO',) {}>
If the type converter has the optional pattern attribute, it is used as the regular expression for matching that field (instead of the default one).
>>> def parse_number(text):
... return int(text)
>>> parse_number.pattern = r'\d+'
>>> parse('Answer: {number:Number}', 'Answer: 42', dict(Number=parse_number))
<Result () {'number': 42}>
>>> _ = parse('Answer: {:Number}', 'Answer: Alice', dict(Number=parse_number))
>>> assert _ is None, "MISMATCH"
You can also use the with_pattern(pattern) decorator to add this information to a type converter function:
>>> @with_pattern(r'\d+')
... def parse_number(text):
... return int(text)
>>> parse('Answer: {number:Number}', 'Answer: 42', dict(Number=parse_number))
<Result () {'number': 42}>
A more complete example of a custom type might be:
>>> yesno_mapping = {
... "yes": True, "no": False,
... "on": True, "off": False,
... "true": True, "false": False,
... }
>>> @with_pattern(r"|".join(yesno_mapping))
... def parse_yesno(text):
... return yesno_mapping[text.lower()]
If the type converter pattern uses regex grouping (with parentheses), you should indicate this by using the optional regex_group_count parameter in the with_pattern() decorator:
>>> @with_pattern(r'((\d+))', regex_group_count=2)
... def parse_number2(text):
... return int(text)
>>> parse('Answer: {:Number2} {:Number2}', 'Answer: 42 43', dict(Number2=parse_number2))
<Result (42, 43) {}>
Otherwise, this may cause parsing problems with unnamed/fixed parameters.
Potential Gotchas¶
parse() will always match the shortest text necessary (from left to right) to fulfil the parse pattern, so for example:
>>> pattern = '{dir1}/{dir2}'
>>> data = 'root/parent/subdir'
>>> sorted(parse(pattern, data).named.items())
[('dir1', 'root'), ('dir2', 'parent/subdir')]
So, even though {'dir1': 'root/parent', 'dir2': 'subdir'} would also fit the pattern, the actual match represents the shortest successful match for dir1.
Version history (in brief):
1.9.0 We now honor precision and width specifiers when parsing numbers and strings, allowing parsing of concatenated elements of fixed width (thanks Julia Signell)
1.8.4 Add LICENSE file at request of packagers. Correct handling of AM/PM to follow most common interpretation. Correct parsing of hexadecimal that looks like a binary prefix. Add ability to parse case sensitively. Add parsing of numbers to Decimal with “F” (thanks John Vandenberg)
1.8.3 Add regex_group_count to with_pattern() decorator to support user-defined types that contain brackets/parenthesis (thanks Jens Engel)
1.8.2 add documentation for including braces in format string
1.8.1 ensure bare hexadecimal digits are not matched
1.8.0 support manual control over result evaluation (thanks Timo Furrer)
1.7.0 parse dict fields (thanks Mark Visser) and adapted to allow more than 100 re groups in Python 3.5+ (thanks David King)
1.6.6 parse Linux system log dates (thanks Alex Cowan)
1.6.5 handle precision in float format (thanks Levi Kilcher)
1.6.4 handle pipe “|” characters in parse string (thanks Martijn Pieters)
1.6.3 handle repeated instances of named fields, fix bug in PM time overflow
1.6.2 fix logging to use local, not root logger (thanks Necku)
1.6.1 be more flexible regarding matched ISO datetimes and timezones in general, fix bug in timezones without “:” and improve docs
1.6.0 add support for optional pattern attribute in user-defined types (thanks Jens Engel)
1.5.3 fix handling of question marks
1.5.2 fix type conversion error with dotted names (thanks Sebastian Thiel)
1.5.1 implement handling of named datetime fields
1.5 add handling of dotted field names (thanks Sebastian Thiel)
1.4.1 fix parsing of “0” in int conversion (thanks James Rowe)
1.4 add __getitem__ convenience access on Result.
1.3.3 fix Python 2.5 setup.py issue.
1.3.2 fix Python 3.2 setup.py issue.
1.3.1 fix a couple of Python 3.2 compatibility issues.
1.3 added search() and findall(); removed compile() from import * export as it overwrites builtin
1.2 added ability for custom and override type conversions to be provided; some cleanup
1.1.9 to keep things simpler number sign is handled automatically; significant robustification in the face of edge-case input.
1.1.8 allow “d” fields to have number base “0x” etc. prefixes; fix up some field type interactions after stress-testing the parser; implement “%” type.
1.1.7 Python 3 compatibility tweaks (2.5 to 2.7 and 3.2 are supported).
1.1.6 add “e” and “g” field types; removed redundant “h” and “X”; removed need for explicit “#”.
1.1.5 accept textual dates in more places; Result now holds match span positions.
1.1.4 fixes to some int type conversion; implemented “=” alignment; added date/time parsing with a variety of formats handled.
1.1.3 type conversion is automatic based on specified field types. Also added “f” and “n” types.
1.1.2 refactored, added compile() and limited from parse import *
1.1.1 documentation improvements
1.1.0 implemented more of the Format Specification Mini-Language and removed the restriction on mixing fixed-position and named fields
1.0.0 initial release
This code is copyright 2012-2017 Richard Jones <richard@python.org> See the end of the source file for the license of use.
-
py2store.parse_format.findall(format, string, pos=0, endpos=None, extra_types=None, evaluate_result=True, case_sensitive=False)[source]¶
Search “string” for all occurrences of “format”.
You will be returned an iterator that holds Result instances for each format match found.
Optionally start the search at “pos” character index and limit the search to a maximum index of endpos - equivalent to search(string[:endpos]).
If evaluate_result is True, each returned Result instance has two attributes:
.fixed - tuple of fixed-position values from the string
.named - dict of named values from the string
If evaluate_result is False, each returned value is a Match instance with one method:
.evaluate_result() - returns a Result instance like you would get with evaluate_result set to True
The default behaviour is to match strings case insensitively. You may match with case by specifying case_sensitive=True.
If the format is invalid a ValueError will be raised.
See the module documentation for the use of “extra_types”.
-
py2store.parse_format.parse(format, string, extra_types=None, evaluate_result=True, case_sensitive=False)[source]¶
Using “format” attempt to pull values from “string”.
The format must match the string contents exactly. If the value you’re looking for is instead just a part of the string use search().
If evaluate_result is True, the return value will be a Result instance with two attributes:
.fixed - tuple of fixed-position values from the string
.named - dict of named values from the string
If evaluate_result is False, the return value will be a Match instance with one method:
.evaluate_result() - returns a Result instance like you would get with evaluate_result set to True
The default behaviour is to match strings case insensitively. You may match with case by specifying case_sensitive=True.
If the format is invalid a ValueError will be raised.
See the module documentation for the use of “extra_types”.
If there is no match, parse() will return None.
-
py2store.parse_format.search(format, string, pos=0, endpos=None, extra_types=None, evaluate_result=True, case_sensitive=False)[source]¶
Search “string” for the first occurrence of “format”.
The format may occur anywhere within the string. If instead you wish for the format to exactly match the string use parse().
Optionally start the search at “pos” character index and limit the search to a maximum index of endpos - equivalent to search(string[:endpos]).
If evaluate_result is True, the return value will be a Result instance with two attributes:
.fixed - tuple of fixed-position values from the string
.named - dict of named values from the string
If evaluate_result is False, the return value will be a Match instance with one method:
.evaluate_result() - returns a Result instance like you would get with evaluate_result set to True
The default behaviour is to match strings case insensitively. You may match with case by specifying case_sensitive=True.
If the format is invalid a ValueError will be raised.
See the module documentation for the use of “extra_types”.
If there is no match, search() will return None.
-
py2store.parse_format.with_pattern(pattern, regex_group_count=None)[source]¶
Attach a regular expression pattern matcher to a custom type converter function.
This annotates the type converter with the pattern attribute.
Example
>>> @with_pattern(r"\d+")
... def parse_number(text):
...     return int(text)
is equivalent to:
>>> def parse_number(text):
...     return int(text)
>>> parse_number.pattern = r"\d+"
- Parameters
pattern – regular expression pattern (as text)
regex_group_count – Indicates how many regex-groups are in pattern.
- Returns
wrapped function