meshed.util

util functions

class meshed.util.ConditionalIterize(func: Callable, iterize_type: type = typing.Iterator, iterize_condition: Callable[[Any], bool] | None = None)[source]

A decorator that “iterizes” a function call if input satisfies a condition. That is, apply map(func, input) (iterize) or func(input) according to some condition on input.

>>> def foo(x, y=2):
...     return x * y

The function does this:

>>> foo(3)
6
>>> foo('string')
'stringstring'

The iterized version of the function does this:

>>> iterized_foo = iterize(foo)
>>> list(iterized_foo([1, 2, 3]))
[2, 4, 6]

>>> from typing import Iterable
>>> new_foo = ConditionalIterize(foo, Iterable)
>>> new_foo(3)
6
>>> list(new_foo([1, 2, 3]))
[2, 4, 6]

See what happens if we do this:

>>> list(new_foo('string'))
['ss', 'tt', 'rr', 'ii', 'nn', 'gg']

Maybe you expected ‘stringstring’ because you are thinking of string as a valid, single input. But the condition of iterization is to be an Iterable, which a string is, thus the (perhaps) unexpected result.

In fact, this problem is a general one: if your base function doesn’t process iterables, then isinstance(x, Iterable) is good enough. But if it is supposed to process an iterable in the first place, how can you distinguish whether to use the iterized version or not? The solution depends on the situation and the interface you want. You choose.

Since the situation where you’ll want to iterize functions in the first place is when you’re building streaming pipelines, a good fallback choice is to iterize if and only if the input is an iterator. This condition will trigger the iterization when the input has a __next__ method, so it applies to things like generators, but not to lists, tuples, sets, etc.
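
The difference between the two conditions can be checked with the standard library alone. The snippet below is only an illustration (it uses collections.abc, not meshed): lists and strings are iterables but not iterators, whereas generators and the result of iter(...) are both.

```python
from collections.abc import Iterable, Iterator

samples = {
    "list": [1, 2, 3],
    "string": "abc",
    "generator": (x for x in [1, 2, 3]),
    "iter(list)": iter([1, 2, 3]),
}

# For each sample, record (is it an Iterable?, is it an Iterator?)
checks = {
    name: (isinstance(obj, Iterable), isinstance(obj, Iterator))
    for name, obj in samples.items()
}

for name, (is_iterable, is_iterator) in checks.items():
    print(f"{name:12} iterable={is_iterable!s:5} iterator={is_iterator}")
```

With an Iterator condition, only the last two samples would be iterized.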

See in the following that ConditionalIterize also has a wrap class method that can be used to wrap a function at definition time.

>>> from typing import Iterator
>>> @ConditionalIterize.wrap(Iterator)  # Iterator is the default, so no need here
... def foo(x, y=2):
...     return x * y
>>> foo(3)
6
>>> foo('string')
'stringstring'

If you want to process a “stream” of numbers 1, 2, 3, don’t do it this way:

>>> foo([1, 2, 3])
[1, 2, 3, 1, 2, 3]

Instead, you should explicitly wrap that iterable in an iterator, to trigger the iterization:

>>> list(foo(iter([1, 2, 3])))
[2, 4, 6]

So far, the only way we controlled the iterize condition is through a type. Really, the condition that is used behind the scenes is isinstance(obj, self.iterize_type). If you need more complex conditions though, you can specify them through the iterize_condition argument. The iterize_type is also used to annotate the resulting wrapped function if its first argument is annotated. As a consequence, iterize_type needs to be a “generic” type.

>>> @ConditionalIterize.wrap(Iterable, lambda x: isinstance(x, (list, tuple)))
... def foo(x: int, y=2):
...     return x * y
>>> foo(3)
6
>>> list(foo([1, 2, 3]))
[2, 4, 6]
>>> from inspect import signature

We annotated x as int, so see now the annotation of the wrapped function:

>>> str(signature(foo))
'(x: Union[int, Iterable[int]], y=2)'
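
To make the dispatch logic concrete, here is a minimal stand-alone sketch of the idea. It is not the library's implementation: conditional_iterize is a hypothetical helper that ignores the signature annotation and simply chooses between a lazy equivalent of map(func, x) and a plain func(x).

```python
from collections.abc import Iterator
from functools import wraps


def conditional_iterize(func, iterize_type=Iterator, iterize_condition=None):
    """Hypothetical helper: apply func item by item only if the condition holds."""
    if iterize_condition is None:
        # Default condition: a plain isinstance check against iterize_type
        def iterize_condition(x):
            return isinstance(x, iterize_type)

    @wraps(func)
    def wrapper(x, *args, **kwargs):
        if iterize_condition(x):
            # Iterized call: lazily apply func to each item of x
            return (func(item, *args, **kwargs) for item in x)
        # Plain call: x is treated as a single input
        return func(x, *args, **kwargs)

    return wrapper


def foo(x, y=2):
    return x * y


new_foo = conditional_iterize(foo)
single = new_foo(3)                         # not an iterator: plain call
as_list = new_foo([1, 2, 3])                # a list is not an iterator: plain call
streamed = list(new_foo(iter([1, 2, 3])))   # an iterator: iterized call
```

Under these assumptions, single is 6, as_list is the list repeated twice, and streamed is [2, 4, 6].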

exception meshed.util.InvalidFunctionParameters[source]: To be used when a function’s parameters are not compliant with some rule about them.

exception meshed.util.NameValidationError[source]: Used to indicate that there’s a problem with a name or with generating a valid name

exception meshed.util.NotFound[source]: To be raised when something is expected to exist, but doesn’t

exception meshed.util.NotUniqueError[source]: Error to be raised when uniqueness is expected, but violated

exception meshed.util.ValidationError[source]: Error that is raised when an object’s validation failed

meshed.util.args_funcnames(funcs: Iterable[Callable], name_of_func: Callable[[Callable], str] | None = <function func_name>)[source]: Generates (arg_name, func_id) pairs from the iterable of functions

meshed.util.conditional_trans(obj: T, condition: Callable[[T], bool], trans: Callable[[T], Any])[source]

Conditionally transform an object unless it is marked as a literal.

>>> from functools import partial
>>> trans = partial(
...     conditional_trans, condition=str.isnumeric, trans=float
... )
>>> trans('not a number')
'not a number'
>>> trans('10')
10.0

To use this function but tell it not to transform a specific input no matter what, wrap the input with LiteralVal:

>>> # from meshed import Literal
>>> conditional_trans(LiteralVal('10'), str.isnumeric, float)
'10'
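
The behavior above can be sketched without the library. In the following illustration, Lit and conditional_trans_sketch are hypothetical names standing in for the literal marker and the function. The sketch assumes the marker is simply unwrapped and returned untouched.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Lit:
    """Hypothetical marker wrapping a value that must not be transformed."""
    val: Any


def conditional_trans_sketch(obj, condition, trans):
    if isinstance(obj, Lit):
        # Marked as literal: unwrap and return the value untouched
        return obj.val
    # Transform only when the condition holds
    return trans(obj) if condition(obj) else obj


results = [
    conditional_trans_sketch("not a number", str.isnumeric, float),
    conditional_trans_sketch("10", str.isnumeric, float),
    conditional_trans_sketch(Lit("10"), str.isnumeric, float),
]
```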

meshed.util.conservative_parameter_merge(*params, same_name=True, same_kind=True, same_default=True, same_annotation=True)

Validates that all the params are exactly the same, returning the first if so.

This is used when hooking up functions that use the same parameters (i.e. arg names). When the name of an argument is used more than once, which kind, default, and annotation should be used in the interface of the DAG?

If they’re all the same, there’s no problem.

But if they’re not the same, we need to provide control on which to ignore.

>>> from inspect import Parameter as P
>>> PK = P.POSITIONAL_OR_KEYWORD
>>> KO = P.KEYWORD_ONLY
>>> parameter_merger(P('a', PK), P('a', PK))
<Parameter "a">
>>> parameter_merger(P('a', PK), P('different_name', PK), same_name=False)
<Parameter "a">
>>> parameter_merger(P('a', PK), P('a', KO), same_kind=False)
<Parameter "a">
>>> parameter_merger(P('a', PK), P('a', PK,  default=42), same_default=False)
<Parameter "a">
>>> parameter_merger(P('a', PK, default=42), P('a', PK), same_default=False)
<Parameter "a=42">
>>> parameter_merger(P('a', PK, annotation=int), P('a', PK), same_annotation=False)
<Parameter "a: int">

meshed.util.dot_to_ascii(dot: str, fancy: bool = True)[source]

Convert a dot string to an ascii rendering of the diagram.

Needs a connection to the internet to work.

>>> graph_dot = '''
...     graph {
...         rankdir=LR
...         0 -- {1 2}
...         1 -- {2}
...         2 -> {0 1 3}
...         3
...     }
... '''
>>>
>>> graph_ascii = dot_to_ascii(graph_dot)  
>>>
>>> print(graph_ascii)  

                 ┌─────────┐
                 ▼         │
     ┌───┐     ┌───┐     ┌───┐     ┌───┐
  ┌▶ │ 0 │ ─── │ 1 │ ─── │   │ ──▶ │ 3 │
  │  └───┘     └───┘     │   │     └───┘
  │    │                 │   │
  │    └──────────────── │ 2 │
  │                      │   │
  │                      │   │
  └───────────────────── │   │
                         └───┘

meshed.util.extract_dict(d: dict, keys: Iterable)[source]

Extract items from dict d, returning them as a dict.

>>> extract_dict({'a': 1, 'b': 2, 'c': 3}, ['a', 'c'])
{'a': 1, 'c': 3}

Order matters!

>>> extract_dict({'a': 1, 'b': 2, 'c': 3}, ['c', 'a'])
{'c': 3, 'a': 1}

meshed.util.extract_items(d: dict, keys: Iterable)[source]

Generator of (k, v) pairs extracted from d for keys.

>>> list(extract_items({'a': 1, 'b': 2, 'c': 3}, ['a', 'c']))
[('a', 1), ('c', 3)]

meshed.util.extract_values(d: dict, keys: Iterable)[source]

Extract values from dict d, returning them:

as a tuple if len(keys) > 1
a single value if len(keys) == 1
None otherwise

This is used as the default extractor in DAG

>>> extract_values({'a': 1, 'b': 2, 'c': 3}, ['a', 'c'])
(1, 3)

Order matters!

>>> extract_values({'a': 1, 'b': 2, 'c': 3}, ['c', 'a'])
(3, 1)
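
The three extractors relate to each other roughly as follows. This is a minimal sketch with hypothetical *_sketch names, assuming every requested key is present in d; how the library handles missing keys is not covered here.

```python
def extract_items_sketch(d, keys):
    # Yield (key, value) pairs in the order given by keys, not the order of d.
    # Assumption: every key is present in d.
    for k in keys:
        yield k, d[k]


def extract_dict_sketch(d, keys):
    return dict(extract_items_sketch(d, keys))


def extract_values_sketch(d, keys):
    values = tuple(v for _, v in extract_items_sketch(d, keys))
    if len(values) > 1:
        return values
    if len(values) == 1:
        return values[0]  # a single value is returned bare, not as a 1-tuple
    return None


d = {'a': 1, 'b': 2, 'c': 3}
several = extract_values_sketch(d, ['c', 'a'])
one = extract_values_sketch(d, ['b'])
none = extract_values_sketch(d, [])
```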

meshed.util.filepath_to_module(file_path: str)[source]

A context manager to import a Python file as a module.

Parameters: file_path – The file path of the Python file to import.
Yields: The module object.
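
A context manager of this kind can be built from the standard importlib machinery. The sketch below is an assumption about the general technique, not the library's code: import_file_as_module is a hypothetical name, and any setup or cleanup the real function performs on entering or leaving the context is not reproduced.

```python
import importlib.util
import os
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def import_file_as_module(file_path):
    """Hypothetical sketch: load a .py file as a module and yield it."""
    module_name = Path(file_path).stem
    spec = importlib.util.spec_from_file_location(module_name, file_path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # run the file's code in the module namespace
    yield module


# Usage: write a throwaway module to a temporary directory and import it
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "example_mod.py")
    Path(path).write_text("ANSWER = 42\n")
    with import_file_as_module(path) as mod:
        loaded_value = mod.ANSWER
        loaded_name = mod.__name__
```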

meshed.util.func_name(func) → str[source]: Returns the func.__name__ of a callable func, or makes and returns one if that fails. To make one, it calls unamed_func_name, which produces incremental names to reduce the chances of clashing.

meshed.util.funcs_conjunction(*funcs)[source]

Makes a conjunction of functions. That is, func1(x) and func2(x) and ...

>>> f = funcs_conjunction(lambda x: isinstance(x, str), lambda x: len(x) >= 5)
>>> f('app')  # because length is less than 5...
False
>>> f('apple')  # length at least 5 so...
True

Note that in:

>>> f(42)
False

it is False because it is not a string. This shows that the second function is not applied to the input at all, since it doesn’t need to be, and if it were, we’d get an error (length of a number?!).

meshed.util.funcs_disjunction(*funcs)[source]

Makes a disjunction of functions. That is, func1(x) or func2(x) or ...

>>> f = funcs_disjunction(lambda x: x > 10, lambda x: x < -5)
>>> f(7)
False
>>> f(-7)
True
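
The short-circuiting described for funcs_conjunction is what the builtins all and any give when fed a generator. The following sketch uses hypothetical names (conjunction_sketch, disjunction_sketch) and is not necessarily how the library implements them.

```python
def conjunction_sketch(*funcs):
    # all() stops at the first falsy result, so later funcs are never called
    return lambda x: all(f(x) for f in funcs)


def disjunction_sketch(*funcs):
    # any() stops at the first truthy result
    return lambda x: any(f(x) for f in funcs)


is_long_str = conjunction_sketch(lambda x: isinstance(x, str), lambda x: len(x) >= 5)
is_extreme = disjunction_sketch(lambda x: x > 10, lambda x: x < -5)

outputs = [
    is_long_str("app"),
    is_long_str("apple"),
    is_long_str(42),   # len(42) is never evaluated
    is_extreme(7),
    is_extreme(-7),
]
```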

meshed.util.if_then_else(if_func, then_func, else_func, *args, **kwargs)[source]

Tool to “functionalize” the if-then-else logic.

>>> from functools import partial
>>> f = partial(if_then_else, str.isnumeric, int, str)
>>> f('a string')
'a string'
>>> f('42')
42
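
One way to read the example above is that the same arguments are handed to all three functions. That reading is an assumption; the sketch below (with the hypothetical name if_then_else_sketch) implements it.

```python
from functools import partial


def if_then_else_sketch(if_func, then_func, else_func, *args, **kwargs):
    # Assumption: the same arguments are passed to all three functions
    if if_func(*args, **kwargs):
        return then_func(*args, **kwargs)
    return else_func(*args, **kwargs)


parse = partial(if_then_else_sketch, str.isnumeric, int, str)
parsed = [parse(s) for s in ["a string", "42", "3.5"]]
```

Note that "3.5" is left as a string, since str.isnumeric is False for strings containing a decimal point.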

meshed.util.incremental_str_maker(str_format='{:03.f}')[source]: Make a function that will produce an (incrementally) new string at every call.
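
The general technique can be sketched with a counter captured in a closure. Everything specific here is an assumption made for illustration: the name incremental_str_maker_sketch, the '{:03d}' format, and the numbering starting at 1.

```python
from itertools import count


def incremental_str_maker_sketch(str_format="{:03d}"):
    counter = count(1)  # assumption: numbering starts at 1

    def make_str():
        # Each call consumes the next integer and formats it
        return str_format.format(next(counter))

    return make_str


mk_name = incremental_str_maker_sketch("unnamed_func_{:03d}")
names = [mk_name(), mk_name(), mk_name()]
```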

meshed.util.instance_checker(class_or_tuple)[source]

Makes a boolean function that checks the instance of an object

>>> isinstance_of_str = instance_checker(str)
>>> isinstance_of_str('asdf')
True
>>> isinstance_of_str(3)
False

meshed.util.iterize(func, name=None)[source]

From an Input->Output function, makes an Iterator[Input]->Iterator[Output] function. Some call this “vectorization”, but it’s not really a vector, but an iterable, thus the name.

iterize is a partial of map.

>>> f = lambda x: x * 10
>>> f(2)
20
>>> iterized_f = iterize(f)
>>> list(iterized_f(iter([1,2,3])))
[10, 20, 30]
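
The statement that iterize is a partial of map can be illustrated with functools.partial directly. This sketch (iterize_sketch is a hypothetical name) leaves out the name argument and shows that the result is lazy: items are computed only as they are consumed.

```python
from functools import partial


def iterize_sketch(func):
    # map(func, iterable) with func fixed in advance
    return partial(map, func)


times_ten = iterize_sketch(lambda x: x * 10)
lazy = times_ten([1, 2, 3])  # nothing is computed yet
first = next(lazy)           # computes only the first item
rest = list(lazy)            # consumes the remaining items
```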

Consider the following pipeline:

>>> from i2 import Pipe
>>> pipe = Pipe(lambda x: x * 2, lambda x: f"hello {x}")
>>> pipe(1)
'hello 2'

But what if you wanted to use the pipeline on a “stream” of data? The following wouldn’t work:

>>> try:
...     pipe(iter([1,2,3]))
... except TypeError as e:
...     print(f"{type(e).__name__}: {e}")
...
TypeError: unsupported operand type(s) for *: 'list_iterator' and 'int'

Remember that error: You’ll surely encounter it at some point.

The solution to it is (often): iterize, which transforms a function that is meant to be applied to a single object, into a function that is meant to be applied to an array, or any iterable of such objects. (You might be familiar (if you use numpy for example) with the related concept of “vectorization”, or [array programming](https://en.wikipedia.org/wiki/Array_programming).)

>>> from i2 import Pipe
>>> from meshed.util import iterize
>>> from typing import Iterable
>>>
>>> pipe = Pipe(
...     iterize(lambda x: x * 2),
...     iterize(lambda x: f"hello {x}")
... )
>>> iterable = pipe([1, 2, 3])
>>> # see that the result is an iterable
>>> assert isinstance(iterable, Iterable)
>>> list(iterable)  # consume the iterable and gather its items
['hello 2', 'hello 4', 'hello 6']

meshed.util.mk_func_name(func, exclude_names=())[source]: Makes a function name that doesn’t clash with the exclude_names iterable. Tries its best to not be lazy, but instead extract a name from the function itself.

meshed.util.mk_place_holder_func(arg_names_or_sig, name=None, defaults=(), annotations=())[source]

Make (working and picklable) function with a specific signature.

This is useful for testing as well as injecting compliant functions in DAG templates.

Parameters:

arg_names_or_sig – Anything that i2.Sig can accept as its first input. (Such as a string of argument(s), function, signature, etc.)
name – The __name__ to give the function.
defaults – If you want to add/change defaults
annotations – If you want to add/change annotations

Returns:

A (working and picklable) function with a specific signature

>>> f = mk_place_holder_func('a b', 'my_func')
>>> f(1,2)
'my_func(a=1, b=2)'

The first argument can be any expression of a signature that i2.Sig can understand. For instance, it could be a function itself. See how the function takes on mk_place_holder_func’s signature and name in the following example:

>>> g = mk_place_holder_func(mk_place_holder_func)
>>> from inspect import signature
>>> str(signature(g))  # should give the same signature as mk_place_holder_func
'(arg_names_or_sig, name=None, defaults=(), annotations=())'
>>> g(1,2,defaults=3, annotations=4)
'mk_place_holder_func(arg_names_or_sig=1, name=2, defaults=3, annotations=4)'

meshed.util.my_isinstance(obj, class_or_tuple)[source]

Same as the builtin isinstance, but without the positional-only constraint. Therefore, we can partialize class_or_tuple, which we couldn’t do otherwise:

>>> from functools import partial
>>> isinstance_of_str = partial(my_isinstance, class_or_tuple=str)
>>> isinstance_of_str('asdf')
True
>>> isinstance_of_str(3)
False
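
The constraint being worked around can be seen directly: the builtin isinstance rejects keyword arguments, so fixing class_or_tuple by name with partial fails at call time. The wrapper below (isinstance_with_keywords, a hypothetical name) is a sketch of the workaround.

```python
from functools import partial

# The builtin takes positional-only arguments, so this call raises TypeError
try:
    partial(isinstance, class_or_tuple=str)("asdf")
    builtin_accepts_keyword = True
except TypeError:
    builtin_accepts_keyword = False


def isinstance_with_keywords(obj, class_or_tuple):
    # A plain Python function accepts its parameters by keyword
    return isinstance(obj, class_or_tuple)


is_str = partial(isinstance_with_keywords, class_or_tuple=str)
```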

meshed.util.named_partial(func, *args, __name__=None, **keywords)[source]

functools.partial, but with a __name__

>>> f = named_partial(print, sep='\n')
>>> f.__name__
'print'

>>> f = named_partial(print, sep='\n', __name__='now_partial_has_a_name')
>>> f.__name__
'now_partial_has_a_name'
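
A functools.partial object accepts new attributes, which is enough to sketch the idea. The following is an illustration under that assumption (named_partial_sketch is a hypothetical name), not the library's code.

```python
from functools import partial


def named_partial_sketch(func, *args, __name__=None, **keywords):
    f = partial(func, *args, **keywords)
    # Fall back on the wrapped function's name when none is given
    f.__name__ = __name__ or func.__name__
    return f


parse_binary = named_partial_sketch(int, base=2, __name__="parse_binary")
default_named = named_partial_sketch(int, base=2)
```

Here parse_binary("101") returns 5, and default_named keeps the name 'int'.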

meshed.util.numbered_suffix_renamer(name, sep='_')[source]

>>> numbered_suffix_renamer('item')
'item_1'
>>> numbered_suffix_renamer('item_1')
'item_2'
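
The two examples above are consistent with the following rule: if the name already ends with the separator and a number, increment the number; otherwise append the separator and 1. The sketch below implements that reading with a regular expression. The rule and the name numbered_suffix_renamer_sketch are inferred from the examples, not taken from the library.

```python
import re


def numbered_suffix_renamer_sketch(name, sep="_"):
    # Look for "<base><sep><digits>" covering the whole name
    match = re.fullmatch(rf"(.*){re.escape(sep)}(\d+)", name)
    if match:
        base, number = match.group(1), int(match.group(2))
        return f"{base}{sep}{number + 1}"
    # No numeric suffix yet: start numbering at 1
    return f"{name}{sep}1"


renamed = [
    numbered_suffix_renamer_sketch("item"),
    numbered_suffix_renamer_sketch("item_1"),
    numbered_suffix_renamer_sketch("item_9"),
    numbered_suffix_renamer_sketch("a_b"),
]
```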

meshed.util.objects_defined_in_module(module: str | module, *, name_filt: Callable | None = None, obj_filt: Callable | None = None)[source]

Get a dictionary of objects defined in a Python module, optionally filtered by their names and values.

Parameters:

module (Union[str, ModuleType]) – The module to look up. Can be the module object itself, a string specifying the module’s fully qualified name (e.g., ‘os.path’), or a .py filepath to the module.
name_filt (Optional[Callable], default=None) – An optional function used to filter the names of objects in the module. This function should take a single argument (the object name as a string) and return a boolean. Only objects whose names pass the filter (i.e., for which the function returns True) are included. If None, no name filtering is applied.
obj_filt (Optional[Callable], default=None) – An optional function used to filter the objects in the module. This function should take a single argument (the object itself) and return a boolean. Only objects that pass the filter (i.e., for which the function returns True) are included. If None, no object filtering is applied.

Returns:

A dictionary where keys are names of objects defined in the module (filtered by name_filt and obj_filt) and values are the corresponding objects.

Return type:

dict

Examples

>>> import os
>>> all_os_objects = objects_defined_in_module(os)
>>> 'removedirs' in all_os_objects
True
>>> all_os_objects['removedirs'] == os.removedirs
True

See that you can specify the module via a string too, and filter to get only callables that don’t start with an underscore:

>>> this_modules_funcs = objects_defined_in_module(
...     'meshed.util',
...     name_filt=lambda name: not name.startswith('_'),
...     obj_filt=callable,
... )
>>> callable(this_modules_funcs['objects_defined_in_module'])
True

meshed.util.ordered_set_operations(a: Iterable, b: Iterable) → Tuple[List, List, List][source]

Returns a triple (a-b, a&b, b-a) for two iterables a and b. The operations are performed as if a and b were sets, but the order in a is conserved.

>>> ordered_set_operations([1, 2, 3, 4], [3, 4, 5, 6])
([1, 2], [3, 4], [5, 6])

>>> ordered_set_operations("abcde", "cdefg")
(['a', 'b'], ['c', 'd', 'e'], ['f', 'g'])

>>> ordered_set_operations([1, 2, 2, 3], [2, 3, 3, 4])
([1], [2, 3], [4])
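
The documented results, including the removal of duplicates, can be reproduced with order-preserving deduplication followed by set membership tests. This is a sketch under the assumption that items are hashable; ordered_set_operations_sketch is a hypothetical name.

```python
def ordered_set_operations_sketch(a, b):
    # dict.fromkeys removes duplicates while keeping first-seen order
    a = list(dict.fromkeys(a))
    b = list(dict.fromkeys(b))
    a_set, b_set = set(a), set(b)
    a_minus_b = [x for x in a if x not in b_set]
    a_and_b = [x for x in a if x in b_set]
    b_minus_a = [x for x in b if x not in a_set]
    return a_minus_b, a_and_b, b_minus_a


numbers = ordered_set_operations_sketch([1, 2, 3, 4], [3, 4, 5, 6])
letters = ordered_set_operations_sketch("abcde", "cdefg")
with_duplicates = ordered_set_operations_sketch([1, 2, 2, 3], [2, 3, 3, 4])
```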

meshed.util.parameter_merger(*params, same_name=True, same_kind=True, same_default=True, same_annotation=True)[source]

Validates that all the params are exactly the same, returning the first if so.

This is used when hooking up functions that use the same parameters (i.e. arg names). When the name of an argument is used more than once, which kind, default, and annotation should be used in the interface of the DAG?

If they’re all the same, there’s no problem.

But if they’re not the same, we need to provide control on which to ignore.

>>> from inspect import Parameter as P
>>> PK = P.POSITIONAL_OR_KEYWORD
>>> KO = P.KEYWORD_ONLY
>>> parameter_merger(P('a', PK), P('a', PK))
<Parameter "a">
>>> parameter_merger(P('a', PK), P('different_name', PK), same_name=False)
<Parameter "a">
>>> parameter_merger(P('a', PK), P('a', KO), same_kind=False)
<Parameter "a">
>>> parameter_merger(P('a', PK), P('a', PK,  default=42), same_default=False)
<Parameter "a">
>>> parameter_merger(P('a', PK, default=42), P('a', PK), same_default=False)
<Parameter "a=42">
>>> parameter_merger(P('a', PK, annotation=int), P('a', PK), same_annotation=False)
<Parameter "a: int">
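
The validation logic can be sketched as a comparison of the selected attributes against the first parameter. This is an illustration only: merge_params_sketch is a hypothetical name, and raising ValueError on a mismatch is an assumption, since the error type is not stated here.

```python
from inspect import Parameter


def merge_params_sketch(*params, same_name=True, same_kind=True,
                        same_default=True, same_annotation=True):
    """Hypothetical sketch: return the first param if the checked attributes agree."""
    first, *others = params
    checks = {
        "name": same_name,
        "kind": same_kind,
        "default": same_default,
        "annotation": same_annotation,
    }
    for attr, enabled in checks.items():
        if not enabled:
            continue  # the caller asked to ignore differences in this attribute
        for other in others:
            if getattr(other, attr) != getattr(first, attr):
                # ValueError is an assumption; the library may raise another type
                raise ValueError(f"Parameters differ in {attr}: {first} vs {other}")
    return first


PK = Parameter.POSITIONAL_OR_KEYWORD
merged = merge_params_sketch(
    Parameter("a", PK, default=42), Parameter("a", PK), same_default=False
)

# With all checks enabled, the differing defaults are reported as a conflict
try:
    merge_params_sketch(Parameter("a", PK, default=42), Parameter("a", PK))
    conflict_detected = False
except ValueError:
    conflict_detected = True
```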

meshed.util.provides(*var_names: str) → Callable[[Callable], Callable][source]

Decorator to assign var_names to a _provides attribute of function.

This is meant to be used to indicate to a mesh what var nodes a function can source values for.

>>> @provides('a', 'b')
... def f(x):
...     return x + 1
>>> f._provides
('a', 'b')

If no var_names are given, then the function name is used as the var name:

>>> @provides()
... def g(x):
...     return x + 1
>>> g._provides
('g',)

If var_names contains '_', then the function name is used as the var name for that position:

>>> @provides('b', '_')
... def h(x):
...     return x + 1
>>> h._provides
('b', 'h')
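
The three behaviors shown above fit in a few lines. The decorator below is a sketch with a hypothetical name (provides_sketch) that reproduces them, without claiming to match the library's validation or edge-case handling.

```python
def provides_sketch(*var_names):
    def decorator(func):
        # No names given: fall back on the function's own name
        names = var_names or (func.__name__,)
        # '_' is a placeholder for the function's name at that position
        func._provides = tuple(func.__name__ if n == "_" else n for n in names)
        return func

    return decorator


@provides_sketch("a", "b")
def f(x):
    return x + 1


@provides_sketch()
def g(x):
    return x + 1


@provides_sketch("b", "_")
def h(x):
    return x + 1
```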

meshed.util.replace_item_in_iterable(iterable, condition, replacement, *, egress=None)[source]

Returns a list where all items satisfying condition(item) were replaced with replacement(item).

If condition is not a callable, it will be considered as a value to check against using ==.

If replacement is not a callable, it will be considered as the actual value to replace by.

Parameters:

iterable – Input iterable of items
condition – Condition to apply to item to see if it should be replaced
replacement – (Conditional) replacement value or function
egress – The function to apply to transformed iterable

>>> replace_item_in_iterable([1,2,3,4,5], condition=2, replacement = 'two')
[1, 'two', 3, 4, 5]
>>> is_even = lambda x: x % 2 == 0
>>> replace_item_in_iterable([1,2,3,4,5], condition=is_even, replacement = 'even')
[1, 'even', 3, 'even', 5]
>>> replace_item_in_iterable([1,2,3,4,5], is_even, replacement=lambda x: x * 10)
[1, 20, 3, 40, 5]

Note that if the input iterable is not a list, tuple, or set, your output will be an iterator that you’ll have to iterate through to gather transformed items.

>>> from typing import Iterator
>>> g = replace_item_in_iterable(iter([1,2,3,4,5]), condition=2, replacement = 'two')
>>> isinstance(g, Iterator)
True

Unless you specify an egress of your choice:

>>> replace_item_in_iterable(
... iter([1,2,3,4,5]), is_even, lambda x: x * 10, egress=sorted
... )
[1, 3, 5, 20, 40]
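
The rules above (non-callable condition means equality test, non-callable replacement means constant value, egress shapes the output) can be sketched as follows. The name replace_items_sketch is hypothetical, and the default egress used here, rebuilding the input's own type for list, tuple, and set inputs and staying lazy otherwise, is an assumption.

```python
def replace_items_sketch(iterable, condition, replacement, *, egress=None):
    if not callable(condition):
        target = condition
        condition = lambda item: item == target  # compare by equality
    if not callable(replacement):
        value = replacement
        replacement = lambda item: value  # constant replacement
    if egress is None:
        # Assumption: keep the container type for list/tuple/set, else stay lazy
        egress = type(iterable) if isinstance(iterable, (list, tuple, set)) else iter
    return egress(replacement(x) if condition(x) else x for x in iterable)


is_even = lambda x: x % 2 == 0
by_value = replace_items_sketch([1, 2, 3, 4, 5], 2, "two")
lazy = replace_items_sketch(iter([1, 2, 3]), 2, "two")
collected = list(lazy)
sorted_out = replace_items_sketch(
    iter([1, 2, 3, 4, 5]), is_even, lambda x: x * 10, egress=sorted
)
```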