dol.kv_codecs

Tools to make Key-Value Codecs (encoder-decoder pairs) from standard library tools.

class dol.kv_codecs.CodecCollection(*args, **kwargs)[source]: The base class for collections of codecs. Makes sure that the class cannot be instantiated, but only used as a collection. Also provides an _iter_codecs method that iterates over the codec names.

class dol.kv_codecs.KeyCodecs(*args, **kwargs)[source]

A collection of key codecs

mapped_keys(decoder: Mapping | Callable | None = None)[source]

A factory that creates a key codec that uses “explicit” mappings to encode and decode keys.

The encoder and decoder can each be an explicit mapping or a function. If the encoder is a mapping, the decoder is the inverse of that mapping. If the decoder is given explicitly, this is asserted. If not, the decoder is computed by swapping the keys and values of the encoder and asserting that no values were lost in the process (that is, that the mapping is invertible). The same holds if you swap “encoder” and “decoder”.

>>> km = KeyCodecs.mapped_keys({'a': 1, 'b': 2})
>>> km.encoder('a')
1
>>> km.decoder(1)
'a'
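
The inversion step described above can be sketched in plain Python. This is only an illustration of the idea, not dol's implementation: the helper name `invert_mapping` is hypothetical, and it raises a `ValueError` where the text speaks of an assertion.

```python
def invert_mapping(mapping):
    """Swap keys and values, failing if two keys share the same value."""
    inverse = {value: key for key, value in mapping.items()}
    # If the inverse is shorter, at least two keys mapped to the same value,
    # so some entries were lost and the mapping is not invertible.
    if len(inverse) != len(mapping):
        raise ValueError("mapping is not invertible: some values are repeated")
    return inverse


encoder_map = {'a': 1, 'b': 2}
decoder_map = invert_mapping(encoder_map)  # {1: 'a', 2: 'b'}
```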

If the encoder is a function, the decoder must be an iterable of keys that will be passed to the function to get the encoded keys, and the decoder will be the inverse of the resulting mapping. The same holds if you swap “encoder” and “decoder”. The example below shows the swapped case: the encoder is a list of keys and the decoder is a function.

>>> km = KeyCodecs.mapped_keys(['a', 'b'], str.upper)
>>> km.encoder('A')
'a'
>>> km.decoder('a')
'A'
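
To see where the two mappings in this example come from, here is a minimal sketch under the assumption that the function is simply applied to each listed key and the result is then inverted. The variable names are illustrative and not part of dol.

```python
keys = ['a', 'b']
decoder_func = str.upper

# Apply the function to every listed key to get an explicit decoder mapping.
decoder_map = {key: decoder_func(key) for key in keys}

# The encoder is the inverse of that mapping.
encoder_map = {value: key for key, value in decoder_map.items()}

# The inversion is only valid if no two keys produced the same value.
assert len(encoder_map) == len(decoder_map)
```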

class dol.kv_codecs.KeyValueCodecs(*args, **kwargs)[source]

A collection of key-value codecs that can be used with postget and preset kv_wraps.

extension_based(*, default: Callable | None = None)[source]: A factory that creates a key-value codec that uses the file extension to determine the value codec to use.

key_based(key_func: Callable = identity_func, *, default: Callable | None = None)[source]: A factory that creates a key-value codec that uses the key to determine the value codec to use.
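
The idea behind both factories, choosing a value codec from the key, can be illustrated with standard library tools only. The sketch below is an assumption about the general pattern, not dol's code: `decoders_by_extension` and `decode_value` are hypothetical names, and only the decoding direction is shown.

```python
import json
import os

# Hypothetical table: file extension -> function that decodes raw bytes.
decoders_by_extension = {'.json': json.loads, '.txt': bytes.decode}


def decode_value(key, raw, default=lambda x: x):
    """Pick a decoder from the key's extension; fall back to `default`."""
    extension = os.path.splitext(key)[1]
    decoder = decoders_by_extension.get(extension, default)
    return decoder(raw)
```

Here the extension plays the role of `key_func`, and `default` handles keys whose extension has no registered decoder.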

class dol.kv_codecs.NotGiven[source]: A singleton to indicate that a value was not given

class dol.kv_codecs.ValueCodecs(*args, **kwargs)[source]

A collection of value codec factories using standard lib tools.

>>> json_codec = ValueCodecs.json()  # call the json codec factory
>>> encoder, decoder = json_codec
>>> encoder({'b': 2})
'{"b": 2}'
>>> decoder('{"b": 2}')
{'b': 2}

The json_codec object is also a Mapping value wrapper:

>>> backend = dict()
>>> interface = json_codec(backend)
>>> interface['a'] = {'b': 2}  # we write a dict
>>> assert backend == {'a': '{"b": 2}'}  # json was written in backend
>>> interface['a']  # but this json is decoded to a dict when read from interface
{'b': 2}
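
What "Mapping value wrapper" means can be sketched with a small class built on `collections.abc.MutableMapping`. This is a simplified illustration, not how dol implements its wrappers; the class name `ValueCodecView` is hypothetical.

```python
import json
from collections.abc import MutableMapping


class ValueCodecView(MutableMapping):
    """A view of `backend` that encodes values on write and decodes on read."""

    def __init__(self, backend, encoder, decoder):
        self.backend = backend
        self.encoder = encoder
        self.decoder = decoder

    def __getitem__(self, key):
        return self.decoder(self.backend[key])

    def __setitem__(self, key, value):
        self.backend[key] = self.encoder(value)

    def __delitem__(self, key):
        del self.backend[key]

    def __iter__(self):
        return iter(self.backend)

    def __len__(self):
        return len(self.backend)


backend = {}
view = ValueCodecView(backend, json.dumps, json.loads)
view['a'] = {'b': 2}  # stored in backend as the string '{"b": 2}'
```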

In order not to have to call the codec factory when you just want the default, we’ve made a default attribute that contains all the default codecs:

>>> backend = dict()
>>> interface = ValueCodecs.default.json(backend)
>>> interface['a'] = {'b': 2}  # we write a dict
>>> assert backend == {'a': '{"b": 2}'}  # json was written in backend

For times when you want to parametrize your code though, know that you can also pass arguments to the encoder and decoder when you make your codec. For example, to make a json codec that indents the json, you can do:

>>> json_codec = ValueCodecs.json(indent=2)
>>> backend = dict()
>>> interface = json_codec(backend)
>>> interface['a'] = {'b': 2}  # we write a dict
>>> print(backend['a'])  # written in backend with indent
{
  "b": 2
}
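
One way to picture such a parametrized factory is with `functools.partial`. The sketch assumes the keyword arguments go to the encoder only; how dol routes arguments between encoder and decoder is not shown here, and `make_json_codec` is a hypothetical name.

```python
import json
from functools import partial


def make_json_codec(**dumps_kwargs):
    """Return an (encoder, decoder) pair; kwargs are fixed on json.dumps."""
    return partial(json.dumps, **dumps_kwargs), json.loads


encoder, decoder = make_json_codec(indent=2)
encoded = encoder({'b': 2})  # indented JSON text
```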


class default[source]: To contain default codecs. Is populated by @_add_default_codecs


class methodcaller

methodcaller(name, ...) -> methodcaller object

Return a callable object that calls the given method on its operand. After f = methodcaller('name'), the call f(r) returns r.name(). After g = methodcaller('name', 'date', foo=1), the call g(r) returns r.name('date', foo=1).
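
As a small illustration of why `methodcaller` is handy for building codecs, the sketch below makes an encoder/decoder pair from `str.encode` and `bytes.decode`. It uses only `operator.methodcaller` from the standard library; the pairing itself is an example, not a codec defined by dol.

```python
from operator import methodcaller

# encode(s) is equivalent to s.encode('utf-8')
encode = methodcaller('encode', 'utf-8')
# decode(b) is equivalent to b.decode('utf-8')
decode = methodcaller('decode', 'utf-8')

raw = encode('hello')
```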

single_nested_value()[source]

>>> d = {
...     1: {'en': 'one', 'fr': 'un', 'sp': 'uno'},
...     2: {'en': 'two', 'fr': 'deux', 'sp': 'dos'},
... }
>>> en = ValueCodecs.single_nested_value('en')(d)
>>> en[1]
'one'
>>> en[1] = 'ONE'
>>> d[1]  # note that here d[1] is completely replaced (not updated)
{'en': 'ONE'}
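
The replacement behaviour noted above follows naturally if the encoder builds a fresh one-item dict. The sketch below is an assumed, simplified model of such a codec, with the hypothetical name `single_nested_value_codec`; it is not taken from dol's source.

```python
def single_nested_value_codec(key):
    """Return (encoder, decoder) for reading and writing one nested field."""

    def encoder(value):
        # Builds a new dict containing only `key`: other fields are not kept.
        return {key: value}

    def decoder(nested):
        return nested[key]

    return encoder, decoder


encoder, decoder = single_nested_value_codec('en')
d = {1: {'en': 'one', 'fr': 'un', 'sp': 'uno'}}
read_value = decoder(d[1])  # 'one'
d[1] = encoder('ONE')       # replaces the whole nested dict
```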

tuple_of_dict()[source]

Get a tuple-view of dict values.

>>> d = {
...     1: {'en': 'one', 'fr': 'un', 'sp': 'uno'},
...     2: {'en': 'two', 'fr': 'deux', 'sp': 'dos'},
... }
>>> codec = ValueCodecs.tuple_of_dict(['fr', 'sp'])
>>> codec.encoder(['deux', 'tre'])
{'fr': 'deux', 'sp': 'tre'}
>>> codec.decoder({'en': 'one', 'fr': 'un', 'sp': 'uno'})
('un', 'uno')
>>> frsp = codec(d)
>>> frsp[2]
('deux', 'dos')
>>> ('deux', 'dos')
('deux', 'dos')
>>> frsp[2] = ('DEUX', 'DOS')
>>> frsp[2]
('DEUX', 'DOS')

Note that a write completely replaces the value in the backend dict rather than updating it:

>>> d[2]
{'fr': 'DEUX', 'sp': 'DOS'}
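
A minimal sketch of an encoder/decoder pair with this behaviour, assuming the encoder zips the field names with the tuple and the decoder picks the fields out in order. The function name `tuple_of_dict_codec` is hypothetical and the code is not dol's implementation.

```python
def tuple_of_dict_codec(fields):
    """Return (encoder, decoder) between tuples and dicts keyed by `fields`."""
    fields = tuple(fields)

    def encoder(values):
        # Only the listed fields end up in the dict that gets written.
        return dict(zip(fields, values))

    def decoder(mapping):
        # Fields not listed (such as 'en') are ignored on read.
        return tuple(mapping[field] for field in fields)

    return encoder, decoder


encoder, decoder = tuple_of_dict_codec(['fr', 'sp'])
```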

See also dol.KeyTemplate for more general key-based views.

zip_compress(filename='some_bytes', *, compression=8, allowZip64=True, compresslevel=None, strict_timestamps=True, encoding='utf-8') → bytes

Compress input bytes, returning the compressed bytes

>>> b = b'x' * 1000 + b'y' * 1000  # 2000 (quite compressible) bytes
>>> len(b)
2000
>>>
>>> zipped_bytes = zip_compress(b)
>>> # Note: Compression details will be system dependent
>>> len(zipped_bytes)  
137
>>> unzipped_bytes = zip_decompress(zipped_bytes)
>>> unzipped_bytes == b  # verify that unzipped bytes are the same as the original
True
>>>
>>> from dol.zipfiledol import compression_methods
>>>
>>> zipped_bytes = zip_compress(b, compression=compression_methods['bzip2'])
>>> # Note: Compression details will be system dependent
>>> len(zipped_bytes)  
221
>>> unzipped_bytes = zip_decompress(zipped_bytes)
>>> unzipped_bytes == b  # verify that unzipped bytes are the same as the original
True

zip_decompress(*, allowZip64=True, compresslevel=None, strict_timestamps=True) → bytes

Decompress input bytes of a single file zip, returning the uncompressed bytes

See zip_compress for usage examples.
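
For readers who want to see what a single-file zip round trip involves, here is a sketch using only `zipfile` and `io` from the standard library. The names `zip_bytes` and `unzip_bytes` are hypothetical stand-ins, not the functions documented above, and only a subset of the parameters is modeled.

```python
import io
import zipfile


def zip_bytes(data, filename='some_bytes', compression=zipfile.ZIP_DEFLATED):
    """Write `data` as a single member of an in-memory zip archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, 'w', compression=compression) as zf:
        zf.writestr(filename, data)
    return buffer.getvalue()


def unzip_bytes(zipped):
    """Read back the only member of a single-file zip archive."""
    with zipfile.ZipFile(io.BytesIO(zipped)) as zf:
        names = zf.namelist()
        if len(names) != 1:
            raise ValueError(f"expected exactly one file, found {len(names)}")
        return zf.read(names[0])


b = b'x' * 1000 + b'y' * 1000
zipped = zip_bytes(b)
```

`zipfile.ZIP_DEFLATED` has the integer value 8, which matches the `compression=8` default in the signature above.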

dol.kv_codecs.add_invertible_key_decoder(store: Mapping, *, decoder: Callable)[source]: Add a key decoder to a store (instance)

dol.kv_codecs.common_prefix_keys_wrap(s: Mapping)[source]: Transforms keys of mapping to omit the longest prefix they have in common
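
The key transformation can be illustrated with `os.path.commonprefix`, which compares strings character by character. The sketch below returns a plain dict copy rather than a wrapped store, so it only models the key mapping; `strip_common_prefix` is a hypothetical name.

```python
import os


def strip_common_prefix(mapping):
    """Return a copy of `mapping` whose keys omit their longest common prefix."""
    prefix = os.path.commonprefix(list(mapping))
    return {key[len(prefix):]: value for key, value in mapping.items()}


store = {'/data/a.txt': 1, '/data/b/c.txt': 2}
short = strip_common_prefix(store)
```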

dol.kv_codecs.csv_dict_decode(string: str, fieldnames, dialect: str = 'excel', delimiter: str = ',', quotechar: str | None = '"', escapechar: str | None = None, doublequote: bool = True, skipinitialspace: bool = False, lineterminator: str = '\r\n', quoting=0, strict: bool = False, restkey=None, restval='', extrasaction='raise', fieldcasts=None)[source]

Decode a csv string into a list of dicts.

Parameters:

string – The csv string to decode
fieldcasts – A function that takes a row and returns a row with the same keys but with values cast to the desired type. If a dict, it should be a mapping from fieldnames to cast functions. If an iterable, it should be an iterable of cast functions, in which case each cast function will be applied to each element of the row, element wise.

>>> data = [{'a': 1, 'b': 2}, {'a': 3, 'b': 4}]
>>> encoded = csv_dict_encode(data, fieldnames=['a', 'b'])
>>> encoded
'a,b\r\n1,2\r\n3,4\r\n'
>>> csv_dict_decode(encoded)
[{'a': '1', 'b': '2'}, {'a': '3', 'b': '4'}]

Notice that you don’t get back what you started with: the ints aren’t ints anymore! You can resolve this with the fieldcasts argument (our addition, not present in the builtin csv module). It should be a function (that transforms a dict into the one you want), a dict mapping field names to cast functions, or a list or tuple of the same size as the row (specifying the cast function for each field).

>>> csv_dict_decode(encoded, fieldnames=['a', 'b'], fieldcasts=[int] * 2)
[{'a': 1, 'b': 2}, {'a': 3, 'b': 4}]
>>> csv_dict_decode(encoded, fieldnames=['a', 'b'], fieldcasts={'b': float})
[{'a': '1', 'b': 2.0}, {'a': '3', 'b': 4.0}]
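
The three forms of fieldcasts can be modeled on top of `csv.DictReader`. This is a sketch under the assumption that the header row supplies the field names; `decode_csv` is a hypothetical helper and does not reproduce the full signature above.

```python
import csv
import io


def decode_csv(string, fieldcasts=None):
    """Decode CSV text into a list of dicts, optionally casting the values."""
    rows = list(csv.DictReader(io.StringIO(string, newline='')))
    if fieldcasts is None:
        return rows
    if callable(fieldcasts):
        # A function transforms each whole row.
        return [fieldcasts(row) for row in rows]
    if isinstance(fieldcasts, dict):
        # A dict casts only the fields it names; others stay strings.
        return [
            {k: fieldcasts.get(k, lambda x: x)(v) for k, v in row.items()}
            for row in rows
        ]
    # Otherwise: one cast function per column, applied position by position.
    casts = list(fieldcasts)
    return [
        {k: cast(v) for (k, v), cast in zip(row.items(), casts)}
        for row in rows
    ]


encoded = 'a,b\r\n1,2\r\n3,4\r\n'
```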

dol.kv_codecs.csv_dict_encode(string: str, fieldnames, dialect: str = 'excel', delimiter: str = ',', quotechar: str | None = '"', escapechar: str | None = None, doublequote: bool = True, skipinitialspace: bool = False, lineterminator: str = '\r\n', quoting=0, strict: bool = False, restkey=None, restval='', extrasaction='raise', fieldcasts=None)[source]

Encode a list of dicts into a csv string.

>>> data = [{'a': 1, 'b': 2}, {'a': 3, 'b': 4}]
>>> encoded = csv_dict_encode(data, fieldnames=['a', 'b'])
>>> encoded
'a,b\r\n1,2\r\n3,4\r\n'
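
The same output can be reproduced with `csv.DictWriter` writing into an in-memory buffer, which shows where the header row and the '\r\n' line terminators come from. `encode_csv` is a hypothetical helper for illustration only.

```python
import csv
import io


def encode_csv(rows, fieldnames):
    """Encode a list of dicts as CSV text with a header row."""
    buffer = io.StringIO(newline='')  # leave csv's own '\r\n' untouched
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


data = [{'a': 1, 'b': 2}, {'a': 3, 'b': 4}]
text = encode_csv(data, fieldnames=['a', 'b'])
```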

dol.kv_codecs.key_based_codec_factory(key_mapping: dict, key_func: Callable = identity_func)[source]: A factory that creates a key codec that uses the key to determine the codec to use.

dol.kv_codecs.key_based_value_trans(key_func: Callable[[KT], KT], value_trans_mapping, default_factory: Callable[[], Callable], k=NotGiven)[source]

A factory that creates a value codec that uses the key to determine the codec to use.

>>> import json
>>> import os
>>> from functools import partial
>>> # a key_func that gets the extension of a file path
>>> key_func = lambda k: os.path.splitext(k)[1]
>>> value_trans_mapping = {'.json': json.loads, '.txt': bytes.decode}
>>> default_factory = partial(ValueError, "No codec for this extension")
>>> trans = key_based_value_trans(
...     key_func, value_trans_mapping, default_factory=lambda: identity_func
... )