Getting started#
This page should contain a short guide on what the plugin does and a short example on how to use the plugin.
Installation#
Use the following commands to install the plugin:
git clone https://github.com/janssenhenning/aiida-dataframe .
cd aiida-dataframe
pip install -e . # also installs aiida, if missing (but not postgres)
#pip install -e .[pre-commit,testing] # install extras for more features
verdi quicksetup # better to set up a new profile
verdi plugin list aiida.data # should now show your data plugins
Usage#
The plugin provides a Data plugin PandasFrameData
that is able to serialize and deserialize DataFrame
objects for the AiiDA
database (stored in HDF5 format in the File repository)
Example for storing a DataFrame:
import pandas as pd
import numpy as np
from aiida.plugins import DataFactory
PandasFrameData = DataFactory('dataframe.frame')
df = pd.DataFrame(
{
"A": 1.0,
"B": pd.Timestamp("20130102"),
"C": pd.Series(1, index=list(range(4)), dtype="float32"),
"D": np.array([3] * 4, dtype="int32"),
"E": pd.Categorical(["test", "train", "test", "train"]),
"F": "foo",
}
)
df_node = PandasFrameData(df)
df_node.store()
The underlying DataFrame is accessible using the df property of the Data node:
print(df_node.df.head())
Warning
Note on Mutability of DataFrame objects
Methods on pandas.DataFrame
objects return a new instance of the
object and do not mutate the original instance. This means that as soon as the
PandasFrameData
is initialized the associated
DataFrame essentially is fixed. Any operation on the dataframe on the
PandasFrameData
class will completely overwrite
and recreate the associated HDF5 file in it’s repository.
Some methods of pandas have an in_place option to mutate the original. This is
explicitly not supported if the pandas.DataFrame
is already associated
with a node the changes will be ignored if you load it from the database