meshed.ext.gk

This module explores a different representation of a computation graph and a different way of executing it. It is based on Yahoo’s graphkit library, which hasn’t been maintained since 2018 and is therefore vendored and modified here. One of the main differences is the removal of the networkx dependency, which graphkit used to represent the computation graph. Instead, this module uses meshed’s itools module to represent the computation graph.

Yahoo’s graphkit library is under Apache License 2.0: Copyright 2016, Yahoo Inc. Licensed under the terms of the Apache License, Version 2.0. See the LICENSE file associated with the project for terms.

NOTE: This module is only meant to be an exploratory “extension”. It is not planned to be maintained.

class meshed.ext.gk.Data(**kwargs)[source]: This wraps any data that is consumed or produced by an Operation. This data should also know how to serialize itself appropriately. This class is an “abstract” class that should be extended by any class working with data in the HiC framework.

class meshed.ext.gk.DataPlaceholderNode[source]: A node for the Network graph that describes the name of a Data instance produced or required by a layer.

class meshed.ext.gk.DeleteInstruction[source]: An instruction for the compiled list of evaluation steps to free or delete a Data instance from the Network’s cache after it is no longer needed.

class meshed.ext.gk.FunctionalOperation(**kwargs)[source]

class meshed.ext.gk.Network(**kwargs)[source]

This is the main network implementation. The class contains all of the code necessary to weave together operations into a directed-acyclic-graph (DAG) and pass data through.

add_op(operation)[source]

Adds the given operation and its data requirements to the network graph based on the name of the operation, the names of the operation’s needs, and the names of the data it provides.

Parameters:: operation (Operation) – Operation object to add.

compile()[source]: Create a set of steps for evaluating layers and freeing memory as necessary
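The idea of compiling a graph into evaluation steps interleaved with memory-freeing steps can be sketched as follows. This is a minimal illustration using only the standard library, not the module's actual implementation: the function compile_steps, the ("run", ...) and ("delete", ...) tuples, and the ops mapping are all hypothetical, and the sketch frees every consumed data item without regard to which outputs were requested.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def compile_steps(ops):
    """ops maps an operation name to a (needs, provides) pair of name lists.

    Returns a list of ("run", op_name) and ("delete", data_name) steps.
    Hypothetical helper for illustration only.
    """
    # Map each data name to the operation that produces it.
    producer = {d: name for name, (_, provides) in ops.items() for d in provides}
    # An operation depends on the producers of the data it needs.
    deps = {
        name: {producer[d] for d in needs if d in producer}
        for name, (needs, _) in ops.items()
    }
    order = list(TopologicalSorter(deps).static_order())

    # Record the last operation (in execution order) that reads each data name.
    last_use = {}
    for name in order:
        for d in ops[name][0]:
            last_use[d] = name

    steps = []
    for name in order:
        steps.append(("run", name))
        # Free every data item whose final consumer has just run.
        steps.extend(("delete", d) for d in sorted(last_use) if last_use[d] == name)
    return steps


ops = {
    "mul": (["a", "b"], ["ab"]),
    "sub": (["a", "ab"], ["a_minus_ab"]),
}
steps = compile_steps(ops)
```

Here "b" can be freed right after "mul" runs, while "a" and "ab" must survive until "sub" has run.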

compute(outputs, named_inputs, method=None)[source]

Run the graph. Any inputs to the network must be passed in by name.

Parameters:

outputs (list) – The names of the data nodes you’d like to have returned once all necessary computations are complete. If you set this argument to None, all data nodes will be kept and returned at runtime.
named_inputs (dict) – A dict of key/value pairs where the keys represent the data nodes you want to populate, and the values are the concrete values you want to set for the data node.

Returns:

a dictionary of output data objects, keyed by name.
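The contract described above (inputs passed by name, outputs selected by name, None meaning "return everything") can be illustrated with a small stand-alone sketch. It is not the Network.compute implementation: the free function compute and the (fn, needs, provides) tuples are assumptions made for illustration, and the operations are assumed to be pre-sorted.

```python
def compute(ops, outputs, named_inputs):
    """Toy stand-in for a graph run (hypothetical, for illustration).

    ops is a list of (fn, needs, provides) tuples, already ordered so that
    every operation appears after the ones it depends on.
    """
    cache = dict(named_inputs)  # data nodes populated by name
    for fn, needs, provides in ops:
        result = fn(*(cache[n] for n in needs))
        # A single provided name receives the bare return value.
        results = [result] if len(provides) == 1 else result
        cache.update(zip(provides, results))
    if outputs is None:
        return cache  # keep and return every data node
    return {name: cache[name] for name in outputs}


ops = [
    (lambda a, b: a * b, ["a", "b"], ["ab"]),
    (lambda a, ab: a - ab, ["a", "ab"], ["a_minus_ab"]),
]
result = compute(ops, ["a_minus_ab"], {"a": 2, "b": 5})
everything = compute(ops, None, {"a": 2, "b": 5})
```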

plot(filename=None, show=False)[source]

Plot the graph.

Parameters:

filename (str) – Write the output to a png, pdf, or graphviz dot file. The extension controls the output format.
show (boolean) – If this is set to True, use matplotlib to show the graph diagram (Default: False)

Returns:

An instance of the pydot graph

show_layers()[source]: Shows info (name, needs, and provides) about all layers in this network.

class meshed.ext.gk.NetworkOperation(**kwargs)[source]

set_execution_method(method)[source]

Determine how the network will be executed.

Parameters:

method (str) – If “parallel”, execute graph operations concurrently using a threadpool.
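One way to picture the “parallel” method is a loop that submits every currently runnable operation to a thread pool and waits for the batch before looking for the next one. The sketch below assumes this batch-style scheduling; run_parallel and its ops mapping are hypothetical names, and each operation is assumed to provide exactly one data name.

```python
from concurrent.futures import ThreadPoolExecutor


def run_parallel(ops, named_inputs, max_workers=4):
    """ops maps an operation name to a (fn, needs, provides) tuple.

    Hypothetical helper; assumes every operation provides one data name.
    """
    cache = dict(named_inputs)
    pending = dict(ops)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while pending:
            # Operations whose inputs are all in the cache can run now.
            ready = [
                name
                for name, (_, needs, _) in pending.items()
                if all(d in cache for d in needs)
            ]
            if not ready:
                raise ValueError("unsatisfiable needs: %s" % sorted(pending))
            futures = {
                name: pool.submit(pending[name][0], *(cache[d] for d in pending[name][1]))
                for name in ready
            }
            # Wait for the whole batch, then publish its results.
            for name, future in futures.items():
                _, _, provides = pending.pop(name)
                cache[provides[0]] = future.result()
    return cache


ops = {
    "double": (lambda a: a * 2, ["a"], ["a2"]),
    "square": (lambda a: a * a, ["a"], ["aa"]),
    "total": (lambda a2, aa: a2 + aa, ["a2", "aa"], ["sum"]),
}
result = run_parallel(ops, {"a": 3})
```

In this example "double" and "square" are independent and are submitted together, while "total" waits for both.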

class meshed.ext.gk.Operation(name: str = 'None', needs: list | None = None, provides: list | None = None, params: dict = <factory>)[source]

This is an abstract class representing a data transformation. To use this, please inherit from this class and customize the .compute method to your specific application.

Names may be given to this layer and its inputs and outputs. This is important when connecting layers and data in a Network object, as the names are used to construct the graph.

Parameters:

name (str) – The name of the operation (e.g. conv1, conv2, etc.).
needs (list) – Names of input data objects this layer requires.
provides (list) – Names of output data objects this layer provides.
params (dict) – A dict of key/value pairs representing parameters associated with your operation. These values will be accessible using the .params attribute of your object. NOTE: It’s important that any values stored in this argument be picklable.

compute(inputs)[source]

This method must be implemented to perform this layer’s feed-forward computation on a given set of inputs.

Parameters:

inputs (list) – A list of Data objects on which to run the layer’s feed-forward computation.

Returns:

A list of Data objects representing the results of running the feed-forward computation on inputs.

class meshed.ext.gk.compose(name=None, merge=False)[source]

This is a simple class that’s used to compose operation instances into a computation graph.

Parameters:

name (str) – A name for the graph being composed by this object.
merge (bool) – If True, this compose object will attempt to merge together operation instances that represent entire computation graphs. Specifically, if one of the operation instances passed to this compose object is itself a graph operation created by an earlier use of compose, the sub-operations in that graph are compared against other operations passed to this compose instance (as well as the sub-operations of other graphs passed to this compose instance). If any two operations are the same (based on name), then that operation is computed only once, instead of multiple times (once for each time the operation appears).
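The name-based deduplication that merge describes can be pictured with a small stand-alone sketch. It only illustrates the idea and is not how compose is implemented: merge_by_name and the (name, fn) pairs are hypothetical stand-ins for operation instances.

```python
def merge_by_name(*graphs):
    """Each graph is a list of (name, fn) pairs standing in for operations.

    Hypothetical helper for illustration only.
    """
    merged = {}
    for graph in graphs:
        for name, fn in graph:
            # Keep the first operation seen under each name; skip repeats.
            merged.setdefault(name, fn)
    return list(merged.items())


def load():
    return [3, 1, 2]


graph_one = [("load", load), ("clean", sorted)]
graph_two = [("load", load), ("longest", max)]

merged = merge_by_name(graph_one, graph_two)
merged_names = [name for name, _ in merged]
```

Because both graphs contain an operation named "load", it appears once in the merged result and would be computed once.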

meshed.ext.gk.get_data_node(name, graph)[source]: Gets a data node from a graph using its name

class meshed.ext.gk.operation(fn=None, **kwargs)[source]

This object represents an operation in a computation graph. Its relationship to other operations in the graph is specified via its needs and provides arguments.

Parameters:

fn (function) – The function used by this operation. This does not need to be specified when the operation object is instantiated and can instead be set via __call__ later.
name (str) – The name of the operation in the computation graph.
needs (list) – Names of input data objects this operation requires. These should correspond to the args of fn.
provides (list) – Names of output data objects this operation provides.
params (dict) – A dict of key/value pairs representing constant parameters associated with your operation. These can correspond to either args or kwargs of fn.

class meshed.ext.gk.optional[source]

Input values in needs may be designated as optional using this modifier. If this modifier is applied to an input value, that value will be input to the operation if it is available. The function underlying the operation should have a parameter with the same name as the input value in needs, and the input value will be passed as a keyword argument if it is available.

Here is an example of an operation that uses an optional argument:

from meshed.ext.gk import operation, compose, optional

# Function that adds either two or three numbers.
def myadd(a, b, c=0):
    return a + b + c

# Designate c as an optional argument.
graph = compose('mygraph')(
    operation(name='myadd', needs=['a', 'b', optional('c')], provides='sum')(myadd)
)

# The graph works with and without 'c' provided as input.
assert graph({'a': 5, 'b': 2, 'c': 4})['sum'] == 11
assert graph({'a': 5, 'b': 2})['sum'] == 7
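
The rule described above, that an optional input is passed as a keyword argument only when it is available, can be sketched without the library. The sketch assumes that the marker is a plain str subclass; the optional class defined here is a local stand-in, and call_with_needs is a hypothetical helper, not part of the module.

```python
class optional(str):
    """Local stand-in marker: a need name that may be absent from the inputs."""


def call_with_needs(fn, needs, inputs):
    """Hypothetical helper showing how optional needs could be dispatched."""
    # Required needs are passed positionally, in order.
    args = [inputs[n] for n in needs if not isinstance(n, optional)]
    # Optional needs are passed by keyword, and only when available.
    kwargs = {
        str(n): inputs[n]
        for n in needs
        if isinstance(n, optional) and n in inputs
    }
    return fn(*args, **kwargs)


def myadd(a, b, c=0):
    return a + b + c


needs = ["a", "b", optional("c")]
with_c = call_with_needs(myadd, needs, {"a": 5, "b": 2, "c": 4})
without_c = call_with_needs(myadd, needs, {"a": 5, "b": 2})
```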

meshed.ext.gk.ready_to_delete_data_node(name, has_executed, graph)[source]

Determines if a DataPlaceholderNode is ready to be deleted from the cache.

Parameters:

name – The name of the data node to check
has_executed (set) – A set containing all operations that have been executed so far
graph – The graph containing the operations and data nodes

Returns:

A boolean indicating whether the data node can be deleted or not.
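
A plausible reading of this check is that a data node may be dropped once every operation that consumes it has executed. The sketch below assumes that reading and replaces the graph argument with a hypothetical consumers mapping, so it is an illustration rather than the function's real body.

```python
def ready_to_delete(name, has_executed, consumers):
    """consumers maps a data name to the set of operations that read it.

    Hypothetical stand-in for the graph lookup.
    """
    # Safe to drop once every reader has already run.
    return consumers.get(name, set()) <= has_executed


consumers = {
    "a": {"mul", "sub"},
    "b": {"mul"},
    "ab": {"sub"},
}

b_after_mul = ready_to_delete("b", {"mul"}, consumers)
a_after_mul = ready_to_delete("a", {"mul"}, consumers)
a_after_all = ready_to_delete("a", {"mul", "sub"}, consumers)
```

Note that under this rule a data node with no consumers is immediately deletable, so a caller would need to protect requested outputs separately.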

meshed.ext.gk.ready_to_schedule_operation(op, has_executed, graph)[source]

Determines if an Operation is ready to be scheduled for execution based on what has already been executed.

Parameters:

op – The Operation object to check
has_executed (set) – A set containing all operations that have been executed so far
graph – The graph containing the operations and data nodes

Returns:

A boolean indicating whether the operation may be scheduled for execution based on what has already been executed.
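
One way to implement such a check is to collect every operation upstream of op and test whether all of them are in has_executed. The sketch below assumes that approach; the producers and needs mappings are hypothetical replacements for the graph argument, and both function names are illustrative.

```python
def upstream_operations(op, producers, needs):
    """Collect every operation that must run before `op`.

    producers maps a data name to the operation that provides it;
    needs maps an operation name to the data names it requires.
    """
    seen = set()
    stack = [op]
    while stack:
        current = stack.pop()
        for data_name in needs.get(current, ()):
            parent = producers.get(data_name)  # None for external inputs
            if parent is not None and parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def ready_to_schedule(op, has_executed, producers, needs):
    # Ready when every upstream operation has already executed.
    return upstream_operations(op, producers, needs) <= has_executed


needs = {"mul": ["a", "b"], "sub": ["a", "ab"], "abs": ["a_minus_ab"]}
producers = {"ab": "mul", "a_minus_ab": "sub"}

mul_ready = ready_to_schedule("mul", set(), producers, needs)
abs_too_early = ready_to_schedule("abs", {"mul"}, producers, needs)
abs_ready = ready_to_schedule("abs", {"mul", "sub"}, producers, needs)
```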