aiida_dataframe.data package
Submodules
aiida_dataframe.data.dataframe module
This module defines an AiiDA Data plugin that stores pandas DataFrames as HDF5 files in the file repository.
- class aiida_dataframe.data.dataframe.PandasFrameData(df: DataFrame, filename: str | None = None, **kwargs: Any)
Bases: SinglefileData
Data plugin for pandas DataFrame objects. DataFrames are serialized to HDF5 using the to_hdf() method, stored in the file repository, and deserialized using read_hdf(). The whole DataFrame can be retrieved through the df property. The names of columns and indices are stored in node attributes so that they can be queried through the database.
- Parameters:
df – pandas DataFrame
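Column and index names are kept in node attributes so the database can query them. Attributes must be JSON-serializable, so the names have to be plain Python values. Below is a minimal sketch of that conversion, assuming only pandas. The helper `dataframe_name_attributes` and the keys `columns` and `index` are hypothetical and are not the plugin's documented API.

```python
import json

import pandas as pd


def dataframe_name_attributes(df: pd.DataFrame) -> dict:
    """Hypothetical helper: collect column and index labels as JSON-friendly lists.

    The keys "columns" and "index" are assumptions for illustration only.
    """
    return {
        # .tolist() converts NumPy scalars to plain Python objects,
        # so the lists can be stored as JSON attributes.
        "columns": df.columns.tolist(),
        "index": df.index.tolist(),
    }


df = pd.DataFrame({"energy": [1.5, 2.0], "volume": [10.0, 11.0]}, index=["a", "b"])
attrs = dataframe_name_attributes(df)
print(json.dumps(attrs))  # {"columns": ["energy", "volume"], "index": ["a", "b"]}
```

Only these labels would be queryable. The cell values themselves stay in the HDF5 file in the repository.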
- DEFAULT_FILENAME = 'dataframe.h5'
- __abstractmethods__ = frozenset({})
- __annotations__ = {'_CLS_COLLECTION': 'Type[CollectionType]', '_export_format_replacements': 'Dict[str, str]', '_hash_ignored_attributes': 'Tuple[str, ...]', '_logger': 'Optional[Logger]', '_plugin_type_string': 'ClassVar[str]', '_query_type_string': 'ClassVar[str]', '_updatable_attributes': 'Tuple[str, ...]'}
- __init__(df: DataFrame, filename: str | None = None, **kwargs: Any) → None
Construct a new instance whose contents are the given DataFrame, serialized to HDF5.
- Parameters:
df – the pandas DataFrame to store.
filename – optional filename for the HDF5 file in the file repository.
- __module__ = 'aiida_dataframe.data.dataframe'
- __parameters__ = ()
- _abc_impl = <_abc._abc_data object>
- _get_dataframe_from_repo() → DataFrame
Get dataframe associated with this node from the file repository.
- static _hash_dataframe(df)
Return a hash of the data contained in the DataFrame. Column names are not included in the hash.
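Below is a minimal sketch of a data-only hash in the spirit of `_hash_dataframe`. It assumes `pandas.util.hash_pandas_object` combined with SHA-256. That pairing is swapped in for illustration and is not necessarily what the plugin uses. `hash_pandas_object` hashes each column's values and, with `index=True`, the index, but never the column labels. As a result, renaming a column leaves the hash unchanged, while editing a value changes it.

```python
import hashlib

import pandas as pd


def hash_dataframe(df: pd.DataFrame) -> str:
    """Hash cell values and index labels; column names do not enter the hash."""
    # One uint64 per row, combining the hashes of every column's values and the index.
    row_hashes = pd.util.hash_pandas_object(df, index=True)
    # Reduce the per-row hashes to a single hex digest.
    return hashlib.sha256(row_hashes.to_numpy().tobytes()).hexdigest()


df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
same_data = hash_dataframe(df) == hash_dataframe(df.rename(columns={"A": "X"}))
changed = df.copy()
changed.loc[0, "A"] = 10  # edit one cell
print(same_data, hash_dataframe(df) == hash_dataframe(changed))  # True False
```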
- _update_dataframe(df: DataFrame, filename: str | None = None) → None
Update the stored HDF5 file. Raises an exception if the node is already stored.
- store(*args, **kwargs) → PandasFrameData
Store the node. Before storing, the HDF5 file is synchronized with the node's _df attribute. This catches in-place changes made to the DataFrame through item assignment (__setitem__), e.g. df["A"] = new_value. The file is only rewritten if the hash of the in-memory data no longer matches the hash of the stored data.
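Below is a minimal sketch of the store-time synchronization described above. It uses a hypothetical `FrameNodeSketch` class in which an in-memory snapshot stands in for both the AiiDA node and the HDF5 file. The hashing technique and the `RuntimeError` raised by `_update_dataframe` are assumptions; the real plugin may use a different exception type.

```python
import hashlib

import pandas as pd


def hash_dataframe(df: pd.DataFrame) -> str:
    """Hash cell values and index labels (column names are not hashed)."""
    row_hashes = pd.util.hash_pandas_object(df, index=True)
    return hashlib.sha256(row_hashes.to_numpy().tobytes()).hexdigest()


class FrameNodeSketch:
    """Hypothetical stand-in for PandasFrameData: no AiiDA node, no HDF5 file."""

    def __init__(self, df: pd.DataFrame) -> None:
        self.is_stored = False
        self.writes = 0  # how often the stand-in "repository file" was rewritten
        self._df = df
        self._update_dataframe(df)

    @property
    def df(self) -> pd.DataFrame:
        # Returns the live object, so item assignment on it mutates self._df.
        return self._df

    def _update_dataframe(self, df: pd.DataFrame) -> None:
        if self.is_stored:
            # RuntimeError is an assumption; the plugin's exception type may differ.
            raise RuntimeError("cannot update the file of a stored node")
        # Stand-in for df.to_hdf(...): keep a snapshot and the hash of its data.
        self._repo_snapshot = df.copy()
        self._repo_hash = hash_dataframe(df)
        self.writes += 1

    def store(self) -> "FrameNodeSketch":
        # Rewrite the "file" only if the in-memory data no longer matches it.
        if hash_dataframe(self._df) != self._repo_hash:
            self._update_dataframe(self._df)
        self.is_stored = True
        return self


node = FrameNodeSketch(pd.DataFrame({"A": [1, 2], "B": [3, 4]}))
node.df["A"] = [10, 20]  # in-place setitem, as in df["A"] = new_value
node.store()
print(node.writes)  # 2: the initial write plus the re-sync inside store()
```

Comparing hashes rather than always rewriting avoids needless serialization when the DataFrame was not touched after construction.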
Module contents
Data types provided by the plugin.
Register data types via the “aiida.data” entry point in pyproject.toml.
- class aiida_dataframe.data.PandasFrameData(df: DataFrame, filename: str | None = None, **kwargs: Any)
Alias of aiida_dataframe.data.dataframe.PandasFrameData, re-exported at package level. Its members are documented under the aiida_dataframe.data.dataframe module.