antimatter.handlers#

Package Contents#

Classes#

Datatype

Datatype is an enumeration of the compatible datatypes supported by antimatter, plus the ‘Unknown’ default placeholder.

DataHandler

Abstract base DataHandler defining the supporting methods a handler for a Datatype must implement.

DictList

The DictList DataHandler supports a list of dictionaries.

Dictionary

The Dictionary DataHandler supports a single dictionary value with string keys.

PandasDataFrame

The PandasDataFrame DataHandler supports a pandas DataFrame. There are some restrictions on the underlying dataset.

PytorchDataLoader

The PytorchDataLoader DataHandler supports a pytorch DataLoader. There are some restrictions on the underlying dataset.

ScalarHandler

The Scalar DataHandler supports a scalar value.

Functions#

factory(datatype) → base.DataHandler

Factory returns an instance of a DataHandler matching the provided Datatype.

class antimatter.handlers.Datatype#

Bases: str, enum.Enum

Datatype is an enumeration of the compatible datatypes supported by antimatter, plus the ‘Unknown’ default placeholder.

Unknown#
Scalar#
Dict#
DictList#
PandasDataframe#
PytorchDataLoader#
LangchainRetriever#

exception antimatter.handlers.HandlerFactoryError#

Bases: HandlerError

Error when creating a handler in the handler factory.

class antimatter.handlers.DataHandler#

Bases: abc.ABC

Abstract base DataHandler defining the supporting methods a handler for a Datatype must implement. A Datatype must support converting from its native type to the generic internal format and back. This conversion should be lossless so that the data added to a Capsule will behave the same when loaded back out.

abstract from_generic(cols: List[str], generic_data: List[List[bytes]], extra: Dict[str, Any]) → Any#

from_generic takes data in its generic form, with a list of column names and a list of data rows, and converts it into the handler’s specific data type.

Parameters:
  • cols – list of column names for the data

  • generic_data – list of data rows, each a list of bytes values

  • extra – extra information for the handler to use when processing

Returns:

the data in the handler’s specific data format

abstract to_generic(data: Any) → Tuple[List[str], List[List[bytes]], Dict[str, Any]]#

to_generic converts data from the handler’s specific data type into a generic form of a list of column names (if applicable), a list of data rows, and a dictionary containing any extra processing info.

Parameters:

data – the data in the handler’s specific data format

Returns:

the data in its generic form
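
As an illustration of the contract described above, here is a minimal sketch. It is not the library's implementation. `SketchDataHandler` is a stand-in that only mirrors the two documented abstract signatures, and `PairListHandler` is a hypothetical handler invented for this example. The UTF-8 encoding of values is also an assumption.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Tuple


class SketchDataHandler(ABC):
    """Stand-in for the abstract DataHandler (assumption: same two signatures)."""

    @abstractmethod
    def from_generic(
        self, cols: List[str], generic_data: List[List[bytes]], extra: Dict[str, Any]
    ) -> Any:
        ...

    @abstractmethod
    def to_generic(
        self, data: Any
    ) -> Tuple[List[str], List[List[bytes]], Dict[str, Any]]:
        ...


class PairListHandler(SketchDataHandler):
    """Hypothetical handler whose native type is a list of (key, value) string pairs."""

    def to_generic(self, data):
        cols = ["key", "value"]
        # One generic row per pair; every cell is stored as bytes.
        rows = [[k.encode("utf-8"), v.encode("utf-8")] for k, v in data]
        return cols, rows, {}

    def from_generic(self, cols, generic_data, extra):
        # Decode each row back into a tuple of strings.
        return [(row[0].decode("utf-8"), row[1].decode("utf-8")) for row in generic_data]


handler = PairListHandler()
original = [("region", "eu-west"), ("tier", "gold")]
cols, rows, extra = handler.to_generic(original)
restored = handler.from_generic(cols, rows, extra)
```

In this sketch `restored` equals `original`, which is the lossless round trip the base class asks every handler to provide.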

field_converter_from_generic(ft: antimatter.fieldtype.fieldtypes.FieldType) → Callable[[bytes], Any]#

field_converter_from_generic gets a field converter function for the given field type that can be used to convert fields from their generic bytes form to their specific type.

Note that these statements should hold for all implementations, given a FieldType ft:

from_gen = field_converter_from_generic(ft)
to_gen = field_converter_to_generic(ft)

generic_value == to_gen(from_gen(generic_value))
field_value == from_gen(to_gen(field_value))

Parameters:

ft – the FieldType to get the converter function for

Returns:

a function that can convert field values from generic form

field_converter_to_generic(ft: antimatter.fieldtype.fieldtypes.FieldType) → Callable[[Any], bytes]#

field_converter_to_generic gets a field converter function for the given field type that can be used to convert fields from their specific type to their generic type.

Note that these statements should hold for all implementations, given a FieldType ft:

from_gen = field_converter_from_generic(ft)
to_gen = field_converter_to_generic(ft)

generic_value == to_gen(from_gen(generic_value))
field_value == from_gen(to_gen(field_value))

Parameters:

ft – the FieldType to get the converter function for

Returns:

a function that can convert field values to generic form
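
The round-trip property of the two field converters can be sketched as follows. This is an illustration only: the helper names `int_from_generic` and `int_to_generic` are hypothetical, and the choice of a decimal UTF-8 encoding for an integer field is an assumption, not the library's documented encoding.

```python
from typing import Any, Callable


def int_from_generic() -> Callable[[bytes], Any]:
    """Hypothetical converter: generic bytes -> int."""
    return lambda raw: int(raw.decode("utf-8"))


def int_to_generic() -> Callable[[Any], bytes]:
    """Hypothetical converter: int -> generic bytes."""
    return lambda value: str(value).encode("utf-8")


from_gen = int_from_generic()
to_gen = int_to_generic()

generic_value = b"42"
field_value = 42

# Both directions of the round trip hold for this pair of values.
generic_round_trip = to_gen(from_gen(generic_value))
field_round_trip = from_gen(to_gen(field_value))
```

With this particular encoding the generic round trip only holds for canonical inputs: `b"042"` would come back as `b"42"`. A converter pair that must satisfy both statements needs a single canonical generic form per value.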

class antimatter.handlers.DictList#

Bases: antimatter.handlers.base.DataHandler

The DictList DataHandler supports a list of dictionaries.

from_generic(cols: List[str], generic_data: List[List[bytes]], extra: dict) → List[Dict[str, Any]]#

from_generic takes the generic data and passes it on as a list of dictionaries.

Parameters:
  • cols – the column names

  • generic_data – the capsule’s generic data format holding the row values

  • extra – extra data for the DataHandler

Returns:

the data in a dictionary list format

to_generic(data: List[Dict[str, Any]]) → Tuple[List[str], List[List[bytes]], Dict[str, Any]]#

to_generic converts a list of dictionaries into the generic data format, which is essentially a no-op as DictList has the same format as generic.

Parameters:

data – the list of dictionaries to pass across as generic format

Returns:

the data in its generic form
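
To make the shape of the conversion concrete, here is a minimal sketch of mapping a list of dictionaries onto column names plus rows and back. The function names are hypothetical, and the string/UTF-8 encoding of cell values is an assumption. In this sketch all values come back as strings, whereas the library's field converters are what restore specific types.

```python
from typing import Any, Dict, List, Tuple


def dict_list_to_generic(
    data: List[Dict[str, Any]],
) -> Tuple[List[str], List[List[bytes]], Dict[str, Any]]:
    # Collect column names in first-seen order across all records.
    cols: List[str] = []
    for record in data:
        for key in record:
            if key not in cols:
                cols.append(key)
    # One row per record; a missing key becomes an empty cell (assumption).
    rows = [
        [str(record.get(col, "")).encode("utf-8") for col in cols]
        for record in data
    ]
    return cols, rows, {}


def dict_list_from_generic(
    cols: List[str], generic_data: List[List[bytes]], extra: Dict[str, Any]
) -> List[Dict[str, Any]]:
    # Pair each decoded cell with its column name.
    return [
        {col: cell.decode("utf-8") for col, cell in zip(cols, row)}
        for row in generic_data
    ]


records = [{"name": "Ada", "lang": "en"}, {"name": "Lin", "lang": "zh"}]
cols, rows, extra = dict_list_to_generic(records)
restored = dict_list_from_generic(cols, rows, extra)
```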

class antimatter.handlers.Dictionary#

Bases: antimatter.handlers.base.DataHandler

The Dictionary DataHandler supports a single dictionary value with string keys.

from_generic(cols: List[str], generic_data: List[List[bytes]], extra: dict) → Dict[str, Any]#

from_generic expects at most one dictionary in the generic data list, and extracts and flattens this dictionary if it can be found.

Parameters:
  • cols – the column names; should be the string key values in the dictionary

  • generic_data – the capsule’s generic data format holding the values of the single row

  • extra – extra data for the DataHandler

Returns:

the dictionary value held in the generic data format

to_generic(data: Dict[str, Any]) → Tuple[List[str], List[List[bytes]], Dict[str, Any]]#

to_generic converts a single dictionary value into the generic data format, flattening the dictionary into a list and extracting the keys in the key:value pairs as the column names.

Parameters:

data – the dictionary value to wrap into a generic format

Returns:

the data in its generic form
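
The single-row layout described above can be sketched like this. The function names are hypothetical. Returning an empty dictionary when no row is present and raising `ValueError` when more than one row is present are both assumptions about how "at most one" might be enforced, and values are encoded as UTF-8 strings for illustration only.

```python
from typing import Any, Dict, List, Tuple


def dictionary_to_generic(
    data: Dict[str, Any],
) -> Tuple[List[str], List[List[bytes]], Dict[str, Any]]:
    # Keys become the column names; values become the cells of one row.
    cols = list(data.keys())
    row = [str(data[col]).encode("utf-8") for col in cols]
    return cols, [row], {}


def dictionary_from_generic(
    cols: List[str], generic_data: List[List[bytes]], extra: Dict[str, Any]
) -> Dict[str, Any]:
    if not generic_data:
        return {}  # assumption: no row means an empty dictionary
    if len(generic_data) > 1:
        raise ValueError("expected at most one row for a single dictionary")
    return {col: cell.decode("utf-8") for col, cell in zip(cols, generic_data[0])}


record = {"user": "ada", "role": "admin"}
cols, rows, extra = dictionary_to_generic(record)
restored = dictionary_from_generic(cols, rows, extra)
```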

class antimatter.handlers.PandasDataFrame#

Bases: antimatter.handlers.base.DataHandler

The PandasDataFrame DataHandler supports a pandas DataFrame. There are some restrictions on the underlying dataset, which must be a two-dimensional data set or a list of two-dimensional data sets.

from_generic(cols: List[str], generic_data: List[List[bytes]], extra: Dict[str, Any]) → Any#

from_generic loads the generic data into a pandas DataFrame, passing any extra parameters transparently to the DataFrame constructor.

Parameters:
  • cols – the column names for the underlying data

  • generic_data – the data rows that are loaded into a pandas DataFrame

  • extra – extra data for the DataHandler, passed into the pandas DataFrame

Returns:

the pandas DataFrame built with the dataset

to_generic(df: Any) → Tuple[List[str], List[List[bytes]], Dict[str, Any]]#

to_generic converts a pandas DataFrame into the generic data format, formatting the data depending on whether the underlying data set is a list of two-dimensional records or a single two-dimensional record.

Parameters:

df – the DataFrame whose underlying data set is converted to generic format

Returns:

the data in its generic form

class antimatter.handlers.PytorchDataLoader#

Bases: antimatter.handlers.base.DataHandler

The PytorchDataLoader DataHandler supports a pytorch DataLoader. There are some restrictions on the underlying dataset, which must be iterable, producing two-dimensional dictionaries.

from_generic(cols: List[str], generic_data: List[List[bytes]], extra: Dict[str, Any]) → Any#

from_generic loads the generic data as a dataset into the pytorch DataLoader, passing any extra parameters transparently to the DataLoader constructor.

Parameters:
  • cols – the column names for the underlying data

  • generic_data – the capsule’s generic data format that is loaded into a pytorch DataLoader

  • extra – extra data for the DataHandler, passed into the pytorch DataLoader constructor

Returns:

the pytorch DataLoader built with the dataset

to_generic(dl: Any) → Tuple[List[str], List[List[bytes]], Dict[str, Any]]#

to_generic converts a pytorch DataLoader into the generic data format, iterating through the DataLoader’s data set and expecting each iterated item to be a two-dimensional dictionary.

Parameters:

dl – the DataLoader to extract generic format data from

Returns:

the data in its generic form

class antimatter.handlers.ScalarHandler#

Bases: antimatter.handlers.base.DataHandler

The Scalar DataHandler supports a scalar value.

from_generic(cols: List[str], generic_data: List[List[bytes]], extra: Dict[str, Any]) → Any#

from_generic expects a single value in a list of lists and extracts this value if it can be found.

Parameters:
  • cols – ignored when converting from generic as the column is a static name.

  • generic_data – the generic data holder wrapping a single value.

  • extra – extra data for the DataHandler. Ignored when converting.

Returns:

the value held in the generic data format

to_generic(data: Any) → Tuple[list, List[List[bytes]], Dict[str, Any]]#

to_generic converts a scalar value into the generic data format.

Parameters:

data – the scalar value to wrap into a generic format

Returns:

the data in its generic form
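
A scalar wrapped "in a list of lists" might look like the following sketch. The function names, the static column name `"value"`, the UTF-8 string encoding, and the `None` result for an empty holder are all assumptions made for illustration. The documentation above only states that the column name is static and that a single value is wrapped.

```python
from typing import Any, Dict, List, Optional, Tuple

STATIC_COLUMN = "value"  # hypothetical static column name


def scalar_to_generic(value: Any) -> Tuple[list, List[List[bytes]], Dict[str, Any]]:
    # A single row holding a single cell.
    return [STATIC_COLUMN], [[str(value).encode("utf-8")]], {}


def scalar_from_generic(
    cols: List[str], generic_data: List[List[bytes]], extra: Dict[str, Any]
) -> Optional[str]:
    # cols and extra are ignored, matching the parameter notes above.
    if generic_data and generic_data[0]:
        return generic_data[0][0].decode("utf-8")
    return None  # assumption: nothing to extract


cols, rows, extra = scalar_to_generic("hello")
restored = scalar_from_generic(cols, rows, extra)
```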

antimatter.handlers.factory(datatype: antimatter.datatype.datatypes.Datatype) → base.DataHandler#

Factory returns an instance of a DataHandler matching the provided Datatype.

Parameters:

datatype – The Datatype to get a handler for.

Returns:

An implementation of the abstract DataHandler for handling data of the given type.
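
The factory pattern described here can be sketched with stand-in types. Nothing below is the library's code: `SketchDatatype`, its string values, the two handler classes, and the decision to raise `HandlerFactoryError` for a datatype with no registered handler are assumptions chosen to illustrate a lookup from an enum member to a handler instance.

```python
import enum


class SketchDatatype(str, enum.Enum):
    """Stand-in for Datatype; the string values are hypothetical."""
    Unknown = "unknown"
    Scalar = "scalar"
    Dict = "dict"


class HandlerFactoryError(Exception):
    """Stand-in for the error raised when a handler cannot be created."""


class ScalarSketchHandler:
    """Placeholder handler class for the Scalar member."""


class DictSketchHandler:
    """Placeholder handler class for the Dict member."""


# Registry mapping each supported datatype to its handler class.
_HANDLERS = {
    SketchDatatype.Scalar: ScalarSketchHandler,
    SketchDatatype.Dict: DictSketchHandler,
}


def sketch_factory(datatype: SketchDatatype):
    """Return a new handler instance for the given datatype."""
    try:
        handler_cls = _HANDLERS[datatype]
    except KeyError:
        raise HandlerFactoryError(f"no handler for datatype {datatype!r}") from None
    return handler_cls()


scalar_handler = sketch_factory(SketchDatatype.Scalar)
# Because the enum mixes in str, a member can be built from its string value.
dict_handler = sketch_factory(SketchDatatype("dict"))
```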