antimatter.filetype.extract

Module Contents

Functions

extract_from_file(→ Any)

Extracts data from a file based on the provided hint about the file's format.

extract_from_csv(→ List[Dict])

Extracts data from a CSV file.

extract_from_json(→ Any)

Extracts data from a JSON file.

extract_from_ndjson(→ List)

Extracts data from an NDJSON (Newline Delimited JSON) file.

extract_from_parquet(path)

Extracts data from a Parquet file.

extract_from_txt(→ str)

Extracts data from a text file.

antimatter.filetype.extract.extract_from_file(path: str, hint: str) → Any

Extracts data from a file based on the provided hint about the file’s format.

Parameters:
  • path – The path to the file.

  • hint – A hint about the file format. Supported hints are ‘csv’, ‘json’, ‘parquet’, and ‘txt’.

Returns:

The extracted data.

Raises:

errors.DataFormatError – If the hinted file format is not supported.
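
Hint-based dispatch of this kind can be sketched with a lookup table. The example below is a minimal illustration, not the library's implementation: `extract_by_hint`, `READERS`, `read_json`, `read_text`, and the local `DataFormatError` class are hypothetical stand-ins, and only two hints are wired up to keep it short.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Callable, Dict


class DataFormatError(Exception):
    """Hypothetical stand-in for errors.DataFormatError."""


def read_json(path: str) -> Any:
    # Parse the whole file as a single JSON document.
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def read_text(path: str) -> str:
    # Return the file content unchanged as a string.
    return Path(path).read_text(encoding="utf-8")


# Hypothetical registry mapping a format hint to a reader function.
READERS: Dict[str, Callable[[str], Any]] = {
    "json": read_json,
    "txt": read_text,
}


def extract_by_hint(path: str, hint: str) -> Any:
    # Look up the reader for the hint; unknown hints raise DataFormatError.
    try:
        reader = READERS[hint]
    except KeyError:
        raise DataFormatError(f"unsupported file format hint: {hint!r}") from None
    return reader(path)


# Usage: write a small JSON file and read it back through the dispatcher.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.json"
    sample.write_text('{"id": 1}', encoding="utf-8")
    parsed = extract_by_hint(str(sample), "json")
    as_text = extract_by_hint(str(sample), "txt")
```

Keeping the mapping in a dictionary means supporting a new format only requires adding one entry, and an unsupported hint fails before the file is opened.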

antimatter.filetype.extract.extract_from_csv(path: str) → List[Dict]

Extracts data from a CSV file.

This function sniffs the dialect of the CSV file and reads it into a list of dictionaries, where each dictionary represents a row with column headers as keys.

Parameters:

path – The path to the CSV file.

Returns:

A list of dictionaries representing the rows of the CSV file.
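
The sniff-then-read approach described above can be approximated with the standard library's `csv.Sniffer` and `csv.DictReader`. This is a sketch under assumptions, not the library's code: `read_csv_rows` and its `sample_size` parameter are hypothetical names, and the sample file is semicolon-delimited to show that the delimiter is detected rather than assumed.

```python
import csv
import tempfile
from pathlib import Path
from typing import Dict, List


def read_csv_rows(path: str, sample_size: int = 4096) -> List[Dict[str, str]]:
    # newline="" lets the csv module handle line endings itself.
    with open(path, newline="", encoding="utf-8") as fh:
        # Guess the dialect (delimiter, quoting) from the start of the file.
        dialect = csv.Sniffer().sniff(fh.read(sample_size))
        # Rewind so the reader also sees the header row.
        fh.seek(0)
        # The first row supplies the keys for every row dictionary.
        return list(csv.DictReader(fh, dialect=dialect))


# Usage: a semicolon-delimited file is read without naming the delimiter.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "people.csv"
    sample.write_text("name;age\nAda;36\nLinus;54\n", encoding="utf-8")
    rows = read_csv_rows(str(sample))
```

All values come back as strings here; any type conversion would be a separate step.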

antimatter.filetype.extract.extract_from_json(path: str) → Any

Extracts data from a JSON file.

Parameters:

path – The path to the JSON file.

Returns:

The data parsed from the JSON file.

antimatter.filetype.extract.extract_from_ndjson(path: str) → List

Extracts data from an NDJSON (Newline Delimited JSON) file.

Each line of the file is a JSON object. The function reads each line and parses it as JSON.

Parameters:

path – The path to the NDJSON file.

Returns:

A list where each element is the data parsed from one line of the NDJSON file.
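
Line-by-line parsing of this kind can be sketched with the standard `json` module. The helper name `read_ndjson` is hypothetical, and skipping blank lines is an assumption of this sketch; the documentation above does not say how the library treats them.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, List


def read_ndjson(path: str) -> List[Any]:
    records: List[Any] = []
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            # Assumption: blank lines are ignored rather than treated as errors.
            if line:
                records.append(json.loads(line))
    return records


# Usage: two JSON objects, one per line, with a trailing blank line.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "events.ndjson"
    sample.write_text('{"id": 1}\n{"id": 2}\n\n', encoding="utf-8")
    records = read_ndjson(str(sample))
```

Iterating over the file handle reads one line at a time, so only the parsed records, not the raw file, are held in memory at once.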

antimatter.filetype.extract.extract_from_parquet(path: str)

Extracts data from a Parquet file.

Parameters:

path – The path to the Parquet file.

Returns:

The data parsed from the Parquet file as a pandas DataFrame.

antimatter.filetype.extract.extract_from_txt(path: str) → str

Extracts data from a text file.

Parameters:

path – The path to the text file.

Returns:

The content of the text file as a string.