antimatter.filetype.extract
Module Contents
Functions
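| extract_from_file(path, hint) | Extracts data from a file based on the provided hint about the file's format. |
| extract_from_csv(path) | Extracts data from a CSV file. |
| extract_from_json(path) | Extracts data from a JSON file. |
| extract_from_ndjson(path) | Extracts data from an NDJSON (Newline Delimited JSON) file. |
| extract_from_parquet(path) | Extracts data from a Parquet file. |
| extract_from_txt(path) | Extracts data from a text file. |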
- antimatter.filetype.extract.extract_from_file(path: str, hint: str) → Any
Extracts data from a file based on the provided hint about the file’s format.
- Parameters:
path – The path to the file.
hint – A hint about the file format. Supported hints are ‘csv’, ‘json’, ‘parquet’, and ‘txt’.
- Returns:
The extracted data.
- Raises:
errors.DataFormatError – If the hinted file format is not supported.
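The hint-based dispatch described above can be sketched with the standard library alone. This is a minimal illustration and not the library's implementation: `extract_by_hint`, the `_READERS` registry, and the local `DataFormatError` class are hypothetical stand-ins, and only two of the documented hints are wired up.

```python
import json
from typing import Any, Callable, Dict


class DataFormatError(Exception):
    """Local stand-in for the library's errors.DataFormatError (assumption)."""


def _read_json(path: str) -> Any:
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def _read_txt(path: str) -> str:
    with open(path, encoding="utf-8") as f:
        return f.read()


# Hypothetical registry mapping a format hint to a reader function.
# Only 'json' and 'txt' are included to keep the sketch short.
_READERS: Dict[str, Callable[[str], Any]] = {
    "json": _read_json,
    "txt": _read_txt,
}


def extract_by_hint(path: str, hint: str) -> Any:
    """Look up the reader for `hint` and apply it to `path`."""
    try:
        reader = _READERS[hint]
    except KeyError:
        # Unknown hints are rejected instead of guessing a format.
        raise DataFormatError(f"unsupported format hint: {hint!r}") from None
    return reader(path)
```

Keeping the readers in a dictionary means that supporting another format is a one-line registration, and an unsupported hint fails with a single, predictable error type.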
- antimatter.filetype.extract.extract_from_csv(path: str) → List[Dict]
Extracts data from a CSV file.
This function sniffs the dialect of the CSV file and reads it into a list of dictionaries, where each dictionary represents a row with column headers as keys.
- Parameters:
path – The path to the CSV file.
- Returns:
A list of dictionaries representing the rows of the CSV file.
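Dialect sniffing followed by a dictionary-per-row read can be approximated with Python's `csv` module. The sketch below is an assumption about the general technique, not the library's actual code; `read_csv_rows` and the 4096-character sample size are illustrative choices.

```python
import csv
from typing import Dict, List


def read_csv_rows(path: str) -> List[Dict[str, str]]:
    """Sniff the CSV dialect from a sample, then read rows as dictionaries."""
    # newline="" lets the csv module handle line endings itself.
    with open(path, newline="", encoding="utf-8") as f:
        sample = f.read(4096)  # illustrative sample size
        f.seek(0)
        # Sniffer.sniff raises csv.Error if it cannot determine a delimiter.
        dialect = csv.Sniffer().sniff(sample)
        # The first row supplies the keys for every following row.
        return list(csv.DictReader(f, dialect=dialect))
```

Every value comes back as a string, because the `csv` module does no type conversion; numeric columns need an explicit cast afterwards.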
- antimatter.filetype.extract.extract_from_json(path: str) → Any
Extracts data from a JSON file.
- Parameters:
path – The path to the JSON file.
- Returns:
The data parsed from the JSON file.
- antimatter.filetype.extract.extract_from_ndjson(path: str) → List
Extracts data from an NDJSON (Newline Delimited JSON) file.
Each line of the file is a JSON object. The function reads each line and parses it as JSON.
- Parameters:
path – The path to the NDJSON file.
- Returns:
A list where each element is the data parsed from one line of the NDJSON file.
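The line-by-line parsing described above can be sketched as follows. `read_ndjson` is a hypothetical name, and skipping blank lines is an assumption of this sketch; the library may treat them differently.

```python
import json
from typing import Any, List


def read_ndjson(path: str) -> List[Any]:
    """Parse each non-empty line of the file as one JSON document."""
    records: List[Any] = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                # Assumption: blank lines are skipped, not treated as errors.
                continue
            records.append(json.loads(line))
    return records
```

Reading line by line means a malformed record raises `json.JSONDecodeError` at the line where it occurs, which is easier to diagnose than a failure while parsing the whole file at once.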
- antimatter.filetype.extract.extract_from_parquet(path: str)
Extracts data from a Parquet file.
- Parameters:
path – The path to the Parquet file.
- Returns:
The data parsed from the Parquet file as a pandas DataFrame.
- antimatter.filetype.extract.extract_from_txt(path: str) → str
Extracts data from a text file.
- Parameters:
path – The path to the text file.
- Returns:
The content of the text file as a string.