Data formats supported#

This notebook demonstrates how to write data to a capsule and read it back in the formats the library supports, including pandas DataFrames, scalars, lists of dictionaries, single dictionaries, and PyTorch DataLoaders. Each format is encapsulated, loaded again, and converted to every other format.

[ ]:
!pip install "antimatter[all]"
[17]:
import os
# Session is needed for the existing-domain branch in the next cell
from antimatter import new_domain, Session
from antimatter.builders import WriteContextBuilder, ReadContextBuilder, WriteContextHookMode
from antimatter.datatype.datatypes import Datatype
from antimatter.datatype.infer import infer_datatype

Register a domain and create a read/write context#

[18]:
# Either create a new domain or reuse an existing one
create_new_domain = True
if create_new_domain:
    sess = new_domain("test@antimatter.io")
    print(f"domain: {sess.domain_id}")
    # print(f"sess = Session(domain='{sess.domain_id}', api_key='{sess.api_key}')")
else:
    # Fill in the ID and API key of a domain you already own
    sess = Session(domain='<domain_id>', api_key='<api_key>')

file_name = "/tmp/testdata.capsule"
domain: dm-dXMzmre9wcx

Load the data#

Pandas DataFrame#

[19]:
# Load dataset
import pandas as pd

data = [
    {"id":1,"first_name":"Amanda","last_name":"Jordan","email":"ajordan0@com.com","gender":"Female","ip_address":"1.197.201.2","cc":"6759521864920116","country":"Indonesia","birthdate":"3\\/8\\/1971","salary":49756.53,"title":"Internal Auditor","comments":"Hello friends, my name is Alice Johnson and I just turned 29 years old! \\ud83c\\udf89 I am looking forward to connecting with all of you. Feel free to drop me a line at alice.johnson@gmail.com or call me at 415-123-4567."},
    {"id":2,"first_name":"Albert","last_name":"Freeman","email":"afreeman1@is.gd","gender":"Male","ip_address":"218.111.175.34","cc":"","country":"Canada","birthdate":"1\\/16\\/1968","salary":150280.17,"title":"Accountant IV","comments":"Customer feedback: I recently visited your store at 5678 Pine Avenue, Dallas, TX 75201. My name is Jane Doe, age 43. I had a wonderful experience and the staff was very friendly. You can reach out to me at janedoe@yahoo.com for any further details."},
    {"id":3,"first_name":"Evelyn","last_name":"Morgan","email":"emorgan2@altervista.org","gender":"Female","ip_address":"7.161.136.94","cc":"6767119071901597","country":"Russia","birthdate":"2\\/1\\/1960","salary":144972.51,"title":"Structural Engineer","comments":"Booking Confirmation: Thank you, David Smith (DOB: 01\\/12\\/1978) for booking with us. We have received your payment through the credit card ending with 1234. Your booking ID is #67890. Please save this email for your records. For any queries, contact us at david.smith@hotmail.com."},
    {"id":4,"first_name":"Denise","last_name":"Riley","email":"driley3@gmpg.org","gender":"Female","ip_address":"140.35.109.83","cc":"3576031598965625","country":"China","birthdate":"4\\/8\\/1997","salary":90263.05,"title":"Senior Cost Accountant","comments":"Hi, I am Emily Brown, aged 33, and I recently moved to 123 Harmony Lane, Los Angeles, CA 90001. I am looking to make new friends in the neighborhood. Feel free to call me at 323-987-6543 or email me at emilybrown@aol.com."},
    {"id":5,"first_name":"Carlos","last_name":"Burns","email":"cburns4@miitbeian.gov.cn","gender":"","ip_address":"169.113.235.40","cc":"5602256255204850","country":"South Africa","birthdate":"","salary":123.0,"title":"","comments":"Urgent: My name is Sarah Lee, my SSN is 512-34-6789. I noticed some unauthorized transactions on my credit card number ending in 5678. I am 39 years old, and I urgently need assistance with this. Please contact me at 213-123-9876 or sarahlee@gmail.com."},
    {"id":6,"first_name":"Kathryn","last_name":"White","email":"kwhite5@google.com","gender":"Female","ip_address":"195.131.81.179","cc":"3583136326049310","country":"Indonesia","birthdate":"2\\/25\\/1983","salary":69227.11,"title":"Account Executive","comments":"Hello, I\'m Mark Thompson. I\\u2019m 36 years old, residing at 3456 Elm Street, Austin, TX 78701. If anyone nearby wants to connect, feel free to email me at mark.thompson@yahoo.com or call 512-345-6789."},
    {"id":7,"first_name":"Samuel","last_name":"Holmes","email":"sholmes6@foxnews.com","gender":"Male","ip_address":"232.234.81.197","cc":"3582641366974690","country":"Portugal","birthdate":"12\\/18\\/1987","salary":14247.62,"title":"Senior Financial Analyst","comments":"Hi, my name is Michael Martinez, I am 40 years old, and my SSN is 543-21-6789. Please contact me regarding my account details at 415-234-5678 or michael.martinez@hotmail.com."},
    {"id":8,"first_name":"Harry","last_name":"Howell","email":"hhowell7@eepurl.com","gender":"Male","ip_address":"91.235.51.73","cc":"","country":"Bosnia and Herzegovina","birthdate":"3\\/1\\/1962","salary":186469.43,"title":"Web Developer IV","comments":"Customer Feedback: I\'m Linda White, 32 years old. I had a great experience shopping online at your store. Reach me at 456 Elm Street, Phoenix, AZ 85001 or linda.white@gmail.com for further feedback."},
    {"id":9,"first_name":"Jose","last_name":"Foster","email":"jfoster8@yelp.com","gender":"Male","ip_address":"132.31.53.61","cc":"","country":"South Korea","birthdate":"3\\/27\\/1992","salary":231067.84,"title":"Software Test Engineer I","comments":"Hey, it\\u2019s Lisa Davis, I am 28 years old. I noticed a discrepancy in my latest bill. My address is 789 Pine Street, Miami, FL 33101. Please, get in touch at lisa.davis@aol.com or 305-123-4567."},
    {"id":10,"first_name":"Emily","last_name":"Stewart","email":"estewart9@opensource.org","gender":"Female","ip_address":"143.28.251.245","cc":"3574254110301671","country":"Nigeria","birthdate":"1\\/28\\/1997","salary":27234.28,"title":"Health Coach IV","comments":"Support Request: My name is Joseph Johnson. I am facing issues with my recent purchase. Reach me at 123-45-6789 or at joseph.johnson@hotmail.com for order number #56789 details."}
]

df = pd.DataFrame(data)

df.head()
[19]:
   id  first_name  last_name  email                     gender  ip_address      cc                country       birthdate    salary     title                   comments
0  1   Amanda      Jordan     ajordan0@com.com          Female  1.197.201.2     6759521864920116  Indonesia     3\/8\/1971   49756.53   Internal Auditor        Hello friends, my name is Alice Johnson and I ...
1  2   Albert      Freeman    afreeman1@is.gd           Male    218.111.175.34                    Canada        1\/16\/1968  150280.17  Accountant IV           Customer feedback: I recently visited your sto...
2  3   Evelyn      Morgan     emorgan2@altervista.org   Female  7.161.136.94    6767119071901597  Russia        2\/1\/1960   144972.51  Structural Engineer     Booking Confirmation: Thank you, David Smith (...
3  4   Denise      Riley      driley3@gmpg.org          Female  140.35.109.83   3576031598965625  China         4\/8\/1997   90263.05   Senior Cost Accountant  Hi, I am Emily Brown, aged 33, and I recently ...
4  5   Carlos      Burns      cburns4@miitbeian.gov.cn          169.113.235.40  5602256255204850  South Africa               123.00                             Urgent: My name is Sarah Lee, my SSN is 512-34...

Scalar data#

[20]:
scalar_data = df['comments'].iloc[0]
scalar_data
[20]:
'Hello friends, my name is Alice Johnson and I just turned 29 years old! \\ud83c\\udf89 I am looking forward to connecting with all of you. Feel free to drop me a line at alice.johnson@gmail.com or call me at 415-123-4567.'
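The scalar above still carries JSON-style escape text: `\ud83c\udf89` is the escaped surrogate pair for an emoji, and the `birthdate` values contain `\/`. The notebook stores these strings as they are. If decoded text is preferred before encapsulating, one option is the standard `json` module. This is a minimal sketch and not part of the antimatter API; `unescape_json_text` is a hypothetical helper name, and it assumes the text contains no bare double quotes or raw control characters.

```python
import json


def unescape_json_text(text: str) -> str:
    """Decode JSON string escapes such as \\/ and \\uXXXX left in plain text."""
    # Wrapping the text in quotes turns it into a JSON string literal.
    # Assumption: the text has no bare double quotes or raw control characters.
    return json.loads(f'"{text}"')


birthdate = unescape_json_text(r"3\/8\/1971")
greeting = unescape_json_text(r"29 years old! \ud83c\udf89")

print(birthdate)                                   # 3/8/1971
print(greeting == "29 years old! \U0001F389")      # True
```

The surrogate pair is combined into a single code point by the JSON decoder, so the decoded string is one character shorter than a naive per-escape replacement would suggest.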

List of Dicts#

[21]:
list_dict_data = df.to_dict('records')
list_dict_data[:2]
[21]:
[{'id': 1,
  'first_name': 'Amanda',
  'last_name': 'Jordan',
  'email': 'ajordan0@com.com',
  'gender': 'Female',
  'ip_address': '1.197.201.2',
  'cc': '6759521864920116',
  'country': 'Indonesia',
  'birthdate': '3\\/8\\/1971',
  'salary': 49756.53,
  'title': 'Internal Auditor',
  'comments': 'Hello friends, my name is Alice Johnson and I just turned 29 years old! \\ud83c\\udf89 I am looking forward to connecting with all of you. Feel free to drop me a line at alice.johnson@gmail.com or call me at 415-123-4567.'},
 {'id': 2,
  'first_name': 'Albert',
  'last_name': 'Freeman',
  'email': 'afreeman1@is.gd',
  'gender': 'Male',
  'ip_address': '218.111.175.34',
  'cc': '',
  'country': 'Canada',
  'birthdate': '1\\/16\\/1968',
  'salary': 150280.17,
  'title': 'Accountant IV',
  'comments': 'Customer feedback: I recently visited your store at 5678 Pine Avenue, Dallas, TX 75201. My name is Jane Doe, age 43. I had a wonderful experience and the staff was very friendly. You can reach out to me at janedoe@yahoo.com for any further details.'}]

Dict#

[22]:
dict_data = list_dict_data[0]
dict_data
[22]:
{'id': 1,
 'first_name': 'Amanda',
 'last_name': 'Jordan',
 'email': 'ajordan0@com.com',
 'gender': 'Female',
 'ip_address': '1.197.201.2',
 'cc': '6759521864920116',
 'country': 'Indonesia',
 'birthdate': '3\\/8\\/1971',
 'salary': 49756.53,
 'title': 'Internal Auditor',
 'comments': 'Hello friends, my name is Alice Johnson and I just turned 29 years old! \\ud83c\\udf89 I am looking forward to connecting with all of you. Feel free to drop me a line at alice.johnson@gmail.com or call me at 415-123-4567.'}
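The DataFrame, list of dictionaries, dictionary, and scalar used so far are all views of the same records. The following sketch uses plain pandas only (no antimatter calls) and a shortened, made-up subset of the columns to show how the four shapes relate to each other.

```python
import pandas as pd

# A shortened version of the sample records (three columns only).
records = [
    {"id": 1, "first_name": "Amanda", "salary": 49756.53},
    {"id": 2, "first_name": "Albert", "salary": 150280.17},
]

frame = pd.DataFrame(records)        # list of dicts -> DataFrame
back = frame.to_dict("records")      # DataFrame -> list of dicts
first_row = back[0]                  # one record -> dict
one_value = first_row["first_name"]  # one field -> scalar

# The round trip through the DataFrame preserves the records.
assert back == records
```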

PyTorch DataLoader#

[23]:
from torch.utils.data import DataLoader
dl = DataLoader(list_dict_data)
dl
[23]:
<torch.utils.data.dataloader.DataLoader at 0x2823282f0>

Convert all the formats#

The cells below encapsulate each of the formats above in turn, read the capsule back, and convert the result to every supported datatype.

[24]:
# Every source format is converted to each of these target datatypes.
frm = to = [Datatype.PandasDataframe, Datatype.Scalar, Datatype.DictList, Datatype.PytorchDataLoader, Datatype.Dict]
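The conversion cells in this section all follow the same pattern: encapsulate, load, check the type, then convert to each target. As a sketch only, that pattern could be factored into one helper. The name `round_trip` and its parameters are hypothetical; it relies solely on the `encapsulate`, `load_capsule`, `data`, and `data_as` calls already used in this notebook, and it takes the inference function as an argument (expected to be `infer_datatype`).

```python
def round_trip(sess, data, targets, path, infer, context="default"):
    """Write `data` to a capsule, read it back, and convert it to each target.

    `sess` is the session created earlier and `infer` is expected to be
    `infer_datatype`. Returns the converted objects keyed by target datatype.
    """
    sess.encapsulate(data=data, write_context=context, path=path)
    capsule = sess.load_capsule(path=path, read_context=context)

    # The capsule should hand back the same Python type that was written.
    read_data = capsule.data()
    assert type(read_data) is type(data), (
        f"expected {type(data)}, got {type(read_data)}"
    )

    converted = {}
    for target in targets:
        result = capsule.data_as(target)
        # Each conversion should be recognised as the requested datatype.
        assert infer(result) == target, f"conversion to {target} failed"
        converted[target] = result
    return converted
```

With the names defined in this notebook, a call would look like `round_trip(sess, df, to, file_name, infer_datatype)`, and the same call works for `list_dict_data`, `scalar_data`, `dict_data`, and `dl`.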

Convert DataFrame to all the other formats#

[25]:
sess.encapsulate(data=df, write_context="default", path=file_name)

capsule = sess.load_capsule(path=file_name, read_context="default")
read_data = capsule.data()

# The data read back should have the same type as what was written.
assert type(df) is type(read_data)

for t in to:
    print(f"Converting from {type(df)} to {t}")
    c = capsule.data_as(t)

    # The converted object should be recognised as the requested datatype.
    assert infer_datatype(c) == t
Converting from <class 'pandas.core.frame.DataFrame'> to Datatype.PandasDataframe
Converting from <class 'pandas.core.frame.DataFrame'> to Datatype.Scalar
Converting from <class 'pandas.core.frame.DataFrame'> to Datatype.DictList
Converting from <class 'pandas.core.frame.DataFrame'> to Datatype.PytorchDataLoader
Converting from <class 'pandas.core.frame.DataFrame'> to Datatype.Dict

Convert list of dicts to all the other formats#

[26]:
sess.encapsulate(data=list_dict_data, write_context="default", path=file_name)

capsule = sess.load_capsule(path=file_name, read_context="default")
read_data = capsule.data()

# The data read back should have the same type as what was written.
assert type(list_dict_data) is type(read_data)

for t in to:
    print(f"Converting from {type(list_dict_data)} to {t}")
    c = capsule.data_as(t)

    # The converted object should be recognised as the requested datatype.
    assert infer_datatype(c) == t
Converting from <class 'list'> to Datatype.PandasDataframe
Converting from <class 'list'> to Datatype.Scalar
Converting from <class 'list'> to Datatype.DictList
Converting from <class 'list'> to Datatype.PytorchDataLoader
Converting from <class 'list'> to Datatype.Dict

Convert scalar to all the other formats#

[27]:
sess.encapsulate(data=scalar_data, write_context="default", path=file_name)

capsule = sess.load_capsule(path=file_name, read_context="default")
read_data = capsule.data()

# The data read back should have the same type as what was written.
assert type(scalar_data) is type(read_data)

for t in to:
    print(f"Converting from {type(scalar_data)} to {t}")
    c = capsule.data_as(t)

    # The converted object should be recognised as the requested datatype.
    assert infer_datatype(c) == t
Converting from <class 'str'> to Datatype.PandasDataframe
Converting from <class 'str'> to Datatype.Scalar
Converting from <class 'str'> to Datatype.DictList
Converting from <class 'str'> to Datatype.PytorchDataLoader
Converting from <class 'str'> to Datatype.Dict

Convert dictionary to all the other formats#

[28]:
sess.encapsulate(data=dict_data, write_context="default", path=file_name)

capsule = sess.load_capsule(path=file_name, read_context="default")
read_data = capsule.data()

# The data read back should have the same type as what was written.
assert type(dict_data) is type(read_data)

for t in to:
    print(f"Converting from {type(dict_data)} to {t}")
    c = capsule.data_as(t)

    # The converted object should be recognised as the requested datatype.
    assert infer_datatype(c) == t
Converting from <class 'dict'> to Datatype.PandasDataframe
Converting from <class 'dict'> to Datatype.Scalar
Converting from <class 'dict'> to Datatype.DictList
Converting from <class 'dict'> to Datatype.PytorchDataLoader
Converting from <class 'dict'> to Datatype.Dict

Convert PyTorch DataLoader to all the other formats#

[29]:
sess.encapsulate(data=dl, write_context="default", path=file_name)

capsule = sess.load_capsule(path=file_name, read_context="default")
read_data = capsule.data()

# The data read back should have the same type as what was written.
assert type(dl) is type(read_data)

for t in to:
    print(f"Converting from {type(dl)} to {t}")
    c = capsule.data_as(t)

    # The converted object should be recognised as the requested datatype.
    assert infer_datatype(c) == t
Converting from <class 'torch.utils.data.dataloader.DataLoader'> to Datatype.PandasDataframe
Converting from <class 'torch.utils.data.dataloader.DataLoader'> to Datatype.Scalar
Converting from <class 'torch.utils.data.dataloader.DataLoader'> to Datatype.DictList
Converting from <class 'torch.utils.data.dataloader.DataLoader'> to Datatype.PytorchDataLoader
Converting from <class 'torch.utils.data.dataloader.DataLoader'> to Datatype.Dict