creek.labeling

Tools to label/annotate stream elements

The motivating example is the case of an incoming stream that we need to segment, according to the detection of an event.

For example, take a stream of integers and detect the event “multiple of 5”:

1->2->3->4->'multiple of 5'->6->7->...

When the stream is “live”, we don’t want to process it immediately, but instead we prefer to annotate it on the fly, by adding some metadata to it.

The simplest addition of metadata information could look like:

3->4->('multiple of 5', 5) -> 6 -> ...

One critique of using tuples to contain both annotated (here, 5) and annotation (here, 'multiple of 5') is that the semantics aren’t explicit. The fact that the original element was annotated the distinction of annotation and annotated is based on an implicit convention. This is not too much of a problem here, but becomes unwieldy in more complex situations, for example, if we want to accommodate for multiple labels.

A LabelledElement x has an attribute x.element, and a container of labels x.labels (list, set or dict).

Multilabels can be used to segment streams into overlapping segments.

(group0)->(group0)->(group0, group1)->(group0, group1)-> (group1)->(group1)->...
class creek.labeling.DictLabeledElement(element: NewType.<locals>.new_type)[source]

A LabeledElement that uses a dict as the labels container. Use this when you need to keep labels classified and have quick access to the a specific class of labels. Note that when adding a label, you need to specify it as a {key: val, …} dict, the keys being the (hashable) label kinds, and the vals being the values for those kinds.

>>> x = DictLabeledElement(42).add_label({'string': 'forty-two'})
>>> x.element
42
>>> x.labels
{'string': 'forty-two'}
>>> x.add_label({'type': 'number', 'prime': False})
DictLabeledElement(42)
>>> x.element
42
>>> assert x.labels == {'string': 'forty-two', 'type': 'number', 'prime': False}
mk_new_labels_container

alias of builtins.dict

class creek.labeling.LabeledElement(element: NewType.<locals>.new_type)[source]

Abstract class to label elements – that is, associate some metadata to an element.

To make a concrete LabeledElement, one must subclass LabeledElement and provide

  • a mk_new_labels_container, a LabelFactory, which is a callable that takes no input and returns a new empty labels container

  • a add_new_label, an AddLabel, a (Labels, Label) callable that adds a single label to the labels container.

class creek.labeling.ListLabeledElement(element: NewType.<locals>.new_type)[source]

A LabeledElement that uses a list as the labels container. Use this when you need to use unhashable labels, or label insertion order matters, or don’t need fast label in labels checks or label deduplication.

>>> x = ListLabeledElement(42).add_label('forty-two')
>>> x.element
42
>>> x.labels
['forty-two']
>>> x.add_label('number')
ListLabeledElement(42)
>>> x.element
42
>>> assert x.labels == ['forty-two', 'number']
static add_new_label(self, object, /)

Append object to the end of the list.

mk_new_labels_container

alias of builtins.list

class creek.labeling.SetLabeledElement(element: NewType.<locals>.new_type)[source]

A LabeledElement that uses a set as the labels container. Use this when you want to get fast label in labels check and/or maintain the labels unduplicated. Note that since set is the container, the labels will have to be hashable.

>>> x = SetLabeledElement(42).add_label('forty-two')
>>> x.element
42
>>> x.labels
{'forty-two'}
>>> x.add_label('number')
SetLabeledElement(42)
>>> x.element
42
>>> assert x.labels == {'forty-two', 'number'}
static add_new_label()

Add an element to a set.

This has no effect if the element is already present.

mk_new_labels_container

alias of builtins.set

creek.labeling.label_element(elem: Union[NewType.<locals>.new_type, creek.labeling.LabeledElement], label: NewType.<locals>.new_type, labeled_element_cls)creek.labeling.LabeledElement[source]

Label element with label (or add this label to the existing labels).

The labeled_element_cls, the LabeledElement class to use to label the element, is meant to be “partialized out”, like this:

>>> from functools import partial
>>> from creek.labeling import DictLabeledElement
>>> my_label_element = partial(label_element, labeled_element_cls=DictLabeledElement)
>>> # and then just use my_label_element(elem, label) to label elem

You’ll probably often want to use DictLabeledElement, because, for example:

` {'n_channels': 2, 'phase', 2, 'session': 16987485} `

is a lot easier (and less dangerous) to use then, say:

` [2, 2, 16987485] `

But there are cases where, say:

>>> from creek.labeling import SetLabeledElement
>>> my_label_element = partial(label_element, labeled_element_cls=SetLabeledElement)
>>> x = my_label_element(42, 'divisible_by_seven')
>>> _ = my_label_element(x, 'is_a_number')
>>> 'divisible_by_seven' in x  # equivalent to 'divisible_by_seven' in x.labels
True
>>> x.labels.issuperset({'is_a_number', 'divisible_by_seven'})
True

is more convenient to use then using a dict with boolean values to do the same

Parameters
  • elem – The element that is being labeled

  • label – The label to add to the element

  • labeled_element_cls – The LabeledElement class to use to label the element

Returns