aiida_dataframe.data package

Submodules

aiida_dataframe.data.dataframe module

This module defines an AiiDA Data plugin that stores pandas DataFrames in the file repository as HDF5 files.

class aiida_dataframe.data.dataframe.PandasFrameData(df: DataFrame, filename: str | None = None, **kwargs: Any)

Bases: SinglefileData

Data plugin for pandas DataFrame objects. DataFrames are serialized to HDF5 with the to_hdf() method and stored in the file repository; they are deserialized with read_hdf().

The whole DataFrame can be retrieved through the df property. The names of columns and indices are stored in attributes so that they can be queried through the database.

Parameters:

df – the pandas DataFrame to store
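The serialization round trip described above can be sketched with plain pandas, outside of AiiDA. This is a minimal illustration, not the plugin's code: the key name "df" is an assumption, the file name is borrowed from DEFAULT_FILENAME, and to_hdf() requires the optional PyTables dependency.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [0.1, 0.2, 0.3]})

with tempfile.TemporaryDirectory() as tmpdir:
    # File name borrowed from DEFAULT_FILENAME; the key "df" is an assumption.
    path = os.path.join(tmpdir, "dataframe.h5")

    # Serialize to HDF5 (needs the optional PyTables package).
    df.to_hdf(path, key="df", mode="w")

    # Deserialize again while the temporary file still exists.
    restored = pd.read_hdf(path, key="df")

# Column names survive the round trip; the plugin additionally records them
# in node attributes so they can be queried.
columns = list(restored.columns)
```

Numeric columns are used here so that dtypes are preserved exactly across the round trip.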

DEFAULT_FILENAME = 'dataframe.h5'
__abstractmethods__ = frozenset({})
__annotations__ = {'_CLS_COLLECTION': 'Type[CollectionType]', '_export_format_replacements': 'Dict[str, str]', '_hash_ignored_attributes': 'Tuple[str, ...]', '_logger': 'Optional[Logger]', '_plugin_type_string': 'ClassVar[str]', '_query_type_string': 'ClassVar[str]', '_updatable_attributes': 'Tuple[str, ...]'}
__init__(df: DataFrame, filename: str | None = None, **kwargs: Any) -> None

Construct a new instance and set its contents to the given DataFrame, serialized as an HDF5 file.

Parameters:
  • df – the pandas DataFrame to store.

  • filename – name of the file in the repository; if None, a default name is used (see DEFAULT_FILENAME).

__module__ = 'aiida_dataframe.data.dataframe'
__parameters__ = ()
_abc_impl = <_abc._abc_data object>
_get_dataframe() -> DataFrame

Get dataframe associated with this node.

_get_dataframe_from_repo() -> DataFrame

Get dataframe associated with this node from the file repository.

static _hash_dataframe(df)

Return a hash of the data values in the DataFrame. Column names are not included in the hash.
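One way to obtain a hash that depends on the values but not on the column names is to combine pandas' per-row hashes into a single digest. The sketch below is an assumption about how such a function could look, not necessarily what _hash_dataframe does; content_hash is a hypothetical name and the choice of SHA-256 is arbitrary.

```python
import hashlib

import pandas as pd


def content_hash(df: pd.DataFrame) -> str:
    """Hash the values (and index) of a DataFrame, ignoring column names."""
    # hash_pandas_object returns one uint64 per row, computed from the
    # values in that row and, with index=True, from the index entry.
    row_hashes = pd.util.hash_pandas_object(df, index=True)
    return hashlib.sha256(row_hashes.to_numpy().tobytes()).hexdigest()


original = pd.DataFrame({"A": [1, 2, 3], "B": [4.0, 5.0, 6.0]})
renamed = original.rename(columns={"A": "X"})  # same values, new column name
modified = original.copy()
modified.loc[0, "A"] = 100                     # one value changed

same_after_rename = content_hash(original) == content_hash(renamed)
differs_after_edit = content_hash(original) != content_hash(modified)
```

Because only values enter the hash, renaming a column alone would go unnoticed by a comparison based on it.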

_logger: Logger | None = <Logger aiida_dataframe.data.dataframe.PandasFrameData (WARNING)>
_plugin_type_string: ClassVar[str] = 'data.dataframe.frame.PandasFrameData.'
_query_type_string: ClassVar[str] = 'data.dataframe.frame.'
_update_dataframe(df: DataFrame, filename: str | None = None) -> None

Update the stored HDF5 file. Raises an exception if the node is already stored.

property df: DataFrame

Return the pandas DataFrame instance associated with this node.

store(*args, **kwargs) -> PandasFrameData

Store the node. Before the node is stored, the HDF5 file is synced with the _df attribute on the node. This catches in-place changes made through setitem on the DataFrame, e.g. df["A"] = new_value. The sync is performed only if the hashes of the data do not match.
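The sync-before-store behaviour can be illustrated with a small stand-in class. Everything here is hypothetical: FrameHolder, content_hash, and the JSON serialization are placeholders for the node, its hash function, and the HDF5 file, and are not part of aiida-dataframe.

```python
import hashlib

import pandas as pd


def content_hash(df: pd.DataFrame) -> str:
    """Hypothetical value-based hash (column names are ignored)."""
    row_hashes = pd.util.hash_pandas_object(df, index=True)
    return hashlib.sha256(row_hashes.to_numpy().tobytes()).hexdigest()


class FrameHolder:
    """Hypothetical stand-in for a node holding a DataFrame."""

    def __init__(self, df: pd.DataFrame) -> None:
        self._df = df
        self.write_count = 0
        self._write(df)

    def _write(self, df: pd.DataFrame) -> None:
        # Stand-in for rewriting the HDF5 file in the repository.
        self.serialized = df.to_json()
        self.stored_hash = content_hash(df)
        self.write_count += 1

    def store(self) -> "FrameHolder":
        # Re-serialize only if the in-memory data no longer matches
        # what was written last.
        if content_hash(self._df) != self.stored_hash:
            self._write(self._df)
        return self


holder = FrameHolder(pd.DataFrame({"A": [1, 2], "B": [3, 4]}))
holder._df["A"] = [10, 20]          # in-place change through setitem
holder.store()                       # hashes differ, so the data is rewritten
writes_after_first_store = holder.write_count
holder.store()                       # nothing changed, so no rewrite
writes_after_second_store = holder.write_count
```

Comparing hashes avoids rewriting the file on every store call when the data has not changed.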

Module contents

Data types provided by the plugin.

Register data types via the “aiida.data” entry point in pyproject.toml.

class aiida_dataframe.data.PandasFrameData(df: DataFrame, filename: str | None = None, **kwargs: Any)

Bases: SinglefileData

This is the same class as aiida_dataframe.data.dataframe.PandasFrameData, re-exported at the package level. Its members are documented under that module above.