antimatter.handlers
Package Contents#
Classes#
- Datatype: Datatype is an enumeration of the compatible datatypes supported by antimatter, plus the ‘Unknown’ default placeholder.
- DataHandler: Abstract base DataHandler defining the supporting methods a handler for a Datatype must implement.
- DictList: The DictList DataHandler supports a list of dictionaries.
- Dictionary: The Dictionary DataHandler supports a single dictionary value with string keys.
- PandasDataFrame: The PandasDataFrame DataHandler supports a pandas DataFrame.
- PytorchDataLoader: The PytorchDataLoader DataHandler supports a pytorch DataLoader.
- ScalarHandler: The Scalar DataHandler supports a scalar value.
Functions#
- factory: Factory returns an instance of a DataHandler matching the provided Datatype.
- class antimatter.handlers.Datatype#
Bases:
str, enum.Enum
Datatype is an enumeration of the compatible datatypes supported by antimatter, plus the ‘Unknown’ default placeholder.
- Unknown#
- Scalar#
- Dict#
- DictList#
- PandasDataframe#
- PytorchDataLoader#
- LangchainRetriever#
- exception antimatter.handlers.HandlerFactoryError#
Bases:
HandlerError
Error when creating a handler in the handler factory.
- class antimatter.handlers.DataHandler#
Bases:
abc.ABC
Abstract base DataHandler defining the supporting methods a handler for a Datatype must implement. A Datatype must support converting from its native type to the generic internal format and back. This conversion should be lossless so that the data added to a Capsule will behave the same when loaded back out.
- abstract from_generic(cols: List[str], generic_data: List[List[bytes]], extra: Dict[str, Any]) -> Any #
from_generic takes data in its generic form, with a list of column names and a list of data rows, and converts it into the handler’s specific data type.
- Parameters:
cols – list of column names for the data
generic_data – list of data rows, each a list of field values as bytes
extra – extra information for the handler to use when processing
- Returns:
the data in the handler’s specific data format
- abstract to_generic(data: Any) -> Tuple[List[str], List[List[bytes]], Dict[str, Any]] #
to_generic converts data from the handler’s specific data type into a generic form of a list of column names (if applicable), a list of data rows, and a dictionary containing any extra processing info.
- Parameters:
data – the data in the handler’s specific data format
- Returns:
the data in its generic form
- field_converter_from_generic(ft: antimatter.fieldtype.fieldtypes.FieldType) -> Callable[[bytes], Any] #
field_converter_from_generic gets a field converter function for the given field type that can be used to convert fields from their generic type to their specific type.
Note that these statements should hold for all implementations, given FieldType ft:
from_gen = field_converter_from_generic(ft)
to_gen = field_converter_to_generic(ft)
generic_value == to_gen(from_gen(generic_value))
field_value == from_gen(to_gen(field_value))
- Parameters:
ft – the FieldType to get the converter function for
- Returns:
a function that can convert field values from generic form
- field_converter_to_generic(ft: antimatter.fieldtype.fieldtypes.FieldType) -> Callable[[Any], bytes] #
field_converter_to_generic gets a field converter function for the given field type that can be used to convert fields from their specific type to their generic type.
Note that these statements should hold for all implementations, given FieldType ft:
from_gen = field_converter_from_generic(ft)
to_gen = field_converter_to_generic(ft)
generic_value == to_gen(from_gen(generic_value))
field_value == from_gen(to_gen(field_value))
- Parameters:
ft – the FieldType to get the converter function for
- Returns:
a function that can convert field values to generic form
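The round-trip statements above can be sketched for a single hypothetical integer field. The two converter functions below are illustrative stand-ins written for this example, not antimatter's implementations, and the UTF-8 decimal encoding is an assumption.

```python
from typing import Any, Callable


def int_converter_from_generic() -> Callable[[bytes], Any]:
    """Hypothetical converter: generic bytes -> int (assumes UTF-8 decimal text)."""
    return lambda raw: int(raw.decode("utf-8"))


def int_converter_to_generic() -> Callable[[Any], bytes]:
    """Hypothetical converter: int -> generic bytes (assumes UTF-8 decimal text)."""
    return lambda value: str(value).encode("utf-8")


from_gen = int_converter_from_generic()
to_gen = int_converter_to_generic()

generic_value = b"42"
field_value = 42

# Both directions of the round trip should be lossless.
assert generic_value == to_gen(from_gen(generic_value))
assert field_value == from_gen(to_gen(field_value))
```

In this sketch the first identity only holds for canonical encodings: a generic value such as b"042" would come back as b"42", so a converter pair needs one agreed encoding per field type.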
- class antimatter.handlers.DictList#
Bases:
antimatter.handlers.base.DataHandler
The DictList DataHandler supports a list of dictionaries.
- from_generic(cols: List[str], generic_data: List[List[bytes]], extra: dict) -> List[Dict[str, Any]] #
from_generic takes the generic data and passes it on as a list of dictionaries.
- Parameters:
cols – the column names
generic_data – the capsule’s generic data format holding the row values
extra – extra data for the DataHandler
- Returns:
the data in a dictionary list format
- to_generic(data: List[Dict[str, Any]]) -> Tuple[List[str], List[List[bytes]], Dict[str, Any]] #
to_generic converts a list of dictionaries into the generic data format. This is essentially a no-op, as DictList has the same format as the generic form.
- Parameters:
data – the list of dictionaries to pass across as generic format
- Returns:
the data in its generic form
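As a rough illustration of the shapes involved, the following sketch converts a list of dictionaries to column names plus byte rows and back. The function names are hypothetical, and the sketch assumes that every row shares the same keys and that all values are strings encoded as UTF-8; the actual handler may encode fields differently.

```python
from typing import Any, Dict, List, Tuple


def dict_list_to_generic(
    data: List[Dict[str, Any]],
) -> Tuple[List[str], List[List[bytes]], Dict[str, Any]]:
    """Illustrative only: assumes every row has the same keys as the first row."""
    cols = list(data[0].keys()) if data else []
    rows = [[str(row[c]).encode("utf-8") for c in cols] for row in data]
    return cols, rows, {}


def dict_list_from_generic(
    cols: List[str], generic_data: List[List[bytes]], extra: Dict[str, Any]
) -> List[Dict[str, Any]]:
    """Illustrative only: decodes every field to str; `extra` is unused here."""
    return [
        {c: raw.decode("utf-8") for c, raw in zip(cols, row)}
        for row in generic_data
    ]


records = [
    {"name": "Ada", "role": "engineer"},
    {"name": "Grace", "role": "admiral"},
]
cols, rows, extra = dict_list_to_generic(records)
restored = dict_list_from_generic(cols, rows, extra)
```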
- class antimatter.handlers.Dictionary#
Bases:
antimatter.handlers.base.DataHandler
The Dictionary DataHandler supports a single dictionary value with string keys.
- from_generic(cols: List[str], generic_data: List[List[bytes]], extra: dict) -> Dict[str, Any] #
from_generic expects at most one dictionary in the generic data list, and extracts and flattens this dictionary if it can be found.
- Parameters:
cols – the column names; should be the string key values in the dictionary
generic_data – the capsule’s generic data format holding the values of the single row
extra – extra data for the DataHandler
- Returns:
the dictionary value held in the generic data format
- to_generic(data: Dict[str, Any]) -> Tuple[List[str], List[List[bytes]], Dict[str, Any]] #
to_generic converts a single dictionary value into the generic data format, flattening the dictionary into a list and extracting the keys in the key:value pairs as the column names.
- Parameters:
data – the dictionary value to wrap into a generic format
- Returns:
the data in its generic form
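The flattening described above might look like the following sketch, in which a single dictionary becomes one row and its keys become the column names. The function names are hypothetical, values are assumed to be strings encoded as UTF-8, and raising ValueError when more than one row is present is an assumption, since the reference does not say how that case is handled.

```python
from typing import Any, Dict, List, Tuple


def dictionary_to_generic(
    data: Dict[str, Any],
) -> Tuple[List[str], List[List[bytes]], Dict[str, Any]]:
    """Illustrative only: keys become columns, values become a single row."""
    cols = list(data.keys())
    row = [str(value).encode("utf-8") for value in data.values()]
    return cols, [row], {}


def dictionary_from_generic(
    cols: List[str], generic_data: List[List[bytes]], extra: Dict[str, Any]
) -> Dict[str, Any]:
    """Illustrative only: rebuilds the dictionary from at most one row."""
    if not generic_data:
        return {}
    if len(generic_data) > 1:
        # Assumption: the behaviour for multiple rows is not documented.
        raise ValueError("expected at most one row")
    return {c: raw.decode("utf-8") for c, raw in zip(cols, generic_data[0])}


profile = {"name": "Ada", "lang": "Python"}
cols, rows, extra = dictionary_to_generic(profile)
restored = dictionary_from_generic(cols, rows, extra)
```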
- class antimatter.handlers.PandasDataFrame#
Bases:
antimatter.handlers.base.DataHandler
The PandasDataFrame DataHandler supports a pandas DataFrame. The underlying dataset must be a two-dimensional data set or a list of two-dimensional data sets.
- from_generic(cols: List[str], generic_data: List[List[bytes]], extra: Dict[str, Any]) -> Any #
from_generic loads the generic data into a pandas DataFrame, passing any extra parameters transparently to the DataFrame constructor.
- Parameters:
cols – the column names for the underlying data
generic_data – the data rows that are loaded into a pandas DataFrame
extra – extra data for the DataHandler, passed into the pandas DataFrame
- Returns:
the pandas DataFrame built with the dataset
- to_generic(df: Any) -> Tuple[List[str], List[List[bytes]], Dict[str, Any]] #
to_generic converts a pandas DataFrame into the generic data format, formatting the data depending on whether the underlying data set is a list of two-dimensional records or a single two-dimensional record.
- Parameters:
df – the DataFrame whose underlying data set is extracted into generic format
- Returns:
the data in its generic form
- class antimatter.handlers.PytorchDataLoader#
Bases:
antimatter.handlers.base.DataHandler
The PytorchDataLoader DataHandler supports a pytorch DataLoader. The underlying dataset must be iterable and must produce two-dimensional dictionaries.
- from_generic(cols: List[str], generic_data: List[List[bytes]], extra: Dict[str, Any]) -> Any #
from_generic loads the generic data as a dataset into the pytorch DataLoader, passing any extra parameters transparently to the DataLoader constructor.
- Parameters:
cols – the column names for the underlying data
generic_data – the capsule’s generic data format that is loaded into a pytorch DataLoader
extra – extra data for the DataHandler, passed into the pytorch DataLoader constructor
- Returns:
the pytorch DataLoader built with the dataset
- to_generic(dl: Any) -> Tuple[List[str], List[List[bytes]], Dict[str, Any]] #
to_generic converts a pytorch DataLoader into the generic data format, iterating through the DataLoader’s data set and expecting each iterated item to be a two-dimensional dictionary.
- Parameters:
dl – the DataLoader to extract generic format data from
- Returns:
the data in its generic form
- class antimatter.handlers.ScalarHandler#
Bases:
antimatter.handlers.base.DataHandler
The Scalar DataHandler supports a scalar value.
- from_generic(cols: List[str], generic_data: List[List[bytes]], extra: Dict[str, Any]) -> Any #
from_generic expects a single value in a list of lists and extracts this value if it can be found.
- Parameters:
cols – ignored when converting from generic as the column is a static name.
generic_data – the generic data holder wrapping a single value.
extra – extra data for the DataHandler. Ignored when converting.
- Returns:
the value held in the generic data format
- to_generic(data: Any) -> Tuple[list, List[List[bytes]], Dict[str, Any]] #
to_generic converts a scalar value into the generic data format.
- Parameters:
data – the scalar value to wrap into a generic format
- Returns:
the data in its generic form
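A scalar fits the generic format as a single value inside a list of lists. The sketch below shows that wrapping under stated assumptions: the function names are hypothetical, the static column name "value" is a placeholder (the real name is not given in this reference), the scalar is a string encoded as UTF-8, and returning None when no value is found is a guess at the "if it can be found" behaviour.

```python
from typing import Any, Dict, List, Tuple

# Placeholder: the actual static column name is not given in the reference.
STATIC_COLUMN = "value"


def scalar_to_generic(
    data: Any,
) -> Tuple[list, List[List[bytes]], Dict[str, Any]]:
    """Illustrative only: wraps one scalar as a one-row, one-column data set."""
    return [STATIC_COLUMN], [[str(data).encode("utf-8")]], {}


def scalar_from_generic(
    cols: List[str], generic_data: List[List[bytes]], extra: Dict[str, Any]
) -> Any:
    """Illustrative only: `cols` and `extra` are ignored, as described above."""
    if generic_data and generic_data[0]:
        return generic_data[0][0].decode("utf-8")
    return None  # Assumption: nothing to extract.


cols, rows, extra = scalar_to_generic("hello")
restored = scalar_from_generic(cols, rows, extra)
```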
- antimatter.handlers.factory(datatype: antimatter.datatype.datatypes.Datatype) -> base.DataHandler #
Factory returns an instance of a DataHandler matching the provided Datatype.
- Parameters:
datatype – The Datatype to get a handler for.
- Returns:
An implementation of the abstract DataHandler for handling data of the given type.
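One common way to build such a factory is a lookup table from enum member to handler class. The sketch below uses local stand-ins for Datatype, DataHandler, HandlerFactoryError and ScalarHandler so that it runs without antimatter installed. The enum values, the registry, and raising HandlerFactoryError for an unregistered Datatype are all assumptions, not the library's confirmed behaviour.

```python
import abc
from enum import Enum
from typing import Any, Dict, List, Tuple, Type


class Datatype(str, Enum):
    """Stand-in enum; the string values are hypothetical."""
    Unknown = "unknown"
    Scalar = "scalar"


class HandlerFactoryError(Exception):
    """Stand-in for the factory error type."""


class DataHandler(abc.ABC):
    """Stand-in abstract base with the two required conversions."""

    @abc.abstractmethod
    def from_generic(
        self, cols: List[str], generic_data: List[List[bytes]], extra: Dict[str, Any]
    ) -> Any: ...

    @abc.abstractmethod
    def to_generic(
        self, data: Any
    ) -> Tuple[List[str], List[List[bytes]], Dict[str, Any]]: ...


class ScalarHandler(DataHandler):
    """Minimal concrete handler used only to exercise the factory."""

    def from_generic(self, cols, generic_data, extra):
        return generic_data[0][0].decode("utf-8")

    def to_generic(self, data):
        return ["value"], [[str(data).encode("utf-8")]], {}


# Hypothetical registry mapping each supported Datatype to its handler class.
_HANDLERS: Dict[Datatype, Type[DataHandler]] = {
    Datatype.Scalar: ScalarHandler,
}


def factory(datatype: Datatype) -> DataHandler:
    """Return a new handler instance for the given Datatype."""
    try:
        return _HANDLERS[datatype]()
    except KeyError:
        raise HandlerFactoryError(f"no handler for {datatype!r}") from None


handler = factory(Datatype.Scalar)
cols, rows, extra = handler.to_generic("hello")
```

A table keeps the dispatch in one place, so supporting a new Datatype means adding one registry entry.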