aiida_dataframe.data package

Submodules

aiida_dataframe.data.dataframe module

This module defines an AiiDA Data plugin for pandas DataFrames, which are stored in the file repository as HDF5 files.

class aiida_dataframe.data.dataframe.PandasFrameData(df, filename=None, **kwargs)[source]

Bases: SinglefileData

Data plugin for pandas DataFrame objects. DataFrames are serialized to HDF5 using the to_hdf() method, stored in the file repository, and deserialized using read_hdf().

The whole DataFrame can be retrieved through the df property. The names of columns and indices are stored in attributes so that they can be queried through the database.

Parameters:

df (DataFrame) – the pandas DataFrame to store
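
As a rough illustration of what "names of columns and indices stored in attributes" could look like, the sketch below collects the labels of a DataFrame into plain, JSON-serializable values. The helper name describe_frame and the keys "columns" and "index" are assumptions made for this example; the attribute names actually used by the plugin may differ.

```python
import json

import pandas as pd


def describe_frame(df: pd.DataFrame) -> dict:
    """Collect column and index labels as plain, JSON-serializable values."""
    return {
        # Hypothetical attribute keys; the plugin's real key names may differ.
        "columns": [str(label) for label in df.columns],
        "index": [str(label) for label in df.index],
    }


df = pd.DataFrame(
    {"energy": [1.0, 2.5], "volume": [10.0, 12.0]},
    index=["a", "b"],
)
attributes = describe_frame(df)

# Round-trip through JSON to confirm that only plain types are present,
# which is what a database attribute store would typically require.
assert json.loads(json.dumps(attributes)) == attributes
print(attributes)
```

Keeping only the labels in the database, and the full table in the repository file, is what lets a query filter nodes by column name without opening each HDF5 file.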

DEFAULT_FILENAME = 'dataframe.h5'
__abstractmethods__ = frozenset({})
__annotations__ = {'_CLS_COLLECTION': 'Type[CollectionType]', '__plugin_type_string': 'ClassVar[str]', '__query_type_string': 'ClassVar[str]', '_export_format_replacements': 'Dict[str, str]', '_hash_ignored_attributes': 'Tuple[str, ...]', '_logger': 'AiidaLoggerType', '_updatable_attributes': 'Tuple[str, ...]'}
__init__(df, filename=None, **kwargs)[source]

Construct a new instance and set the contents to that of the DataFrame.

Parameters:
  • df (DataFrame) – the pandas DataFrame whose contents to store.

  • filename (Optional[str]) – name to use for the HDF5 file in the repository.

__module__ = 'aiida_dataframe.data.dataframe'
__parameters__ = ()
_abc_impl = <_abc._abc_data object>
_get_dataframe()[source]

Get dataframe associated with this node.

Return type:

DataFrame

_get_dataframe_from_repo()[source]

Get dataframe associated with this node from the file repository.

Return type:

DataFrame

static _hash_dataframe(df)[source]

Return a hash of the data inside the dataframe. Column names are not included in the hash.
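
One way to obtain a hash that depends on the data but not on the column names is sketched below. It is an illustration under assumptions, not the plugin's actual implementation: the function name hash_frame_data and the choice of SHA-256 are made up for this example, while pd.util.hash_pandas_object is a regular pandas utility that hashes cell values (and, with index=True, the index labels) row by row.

```python
import hashlib

import pandas as pd


def hash_frame_data(df: pd.DataFrame) -> str:
    """Hash the values and index of a DataFrame, ignoring column names."""
    # One uint64 per row, computed from the cell values and the index labels.
    row_hashes = pd.util.hash_pandas_object(df, index=True)
    # Condense the per-row hashes into a single hex digest.
    return hashlib.sha256(row_hashes.to_numpy().tobytes()).hexdigest()


original = pd.DataFrame({"A": [1, 2, 3], "B": [4.0, 5.0, 6.0]})
renamed = original.rename(columns={"A": "X", "B": "Y"})
changed = original.copy()
changed.loc[0, "A"] = 99

# Renaming columns leaves the hash untouched; changing a value does not.
same_hash = hash_frame_data(original) == hash_frame_data(renamed)
different_hash = hash_frame_data(original) != hash_frame_data(changed)
print(same_hash, different_hash)
```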

_logger: AiidaLoggerType = <Logger aiida_dataframe.data.dataframe.PandasFrameData (WARNING)>
_update_dataframe(df, filename=None)[source]

Update the stored HDF5 file. Raises an exception if the node is already stored.

Return type:

None

property df: DataFrame

Return the pandas DataFrame instance associated with this node.

If the node is already stored in the database, each access of this property returns a deep copy of the dataframe. This prevents the df property from going out of sync with the underlying HDF5 file, e.g. through in-place modification methods within pandas.
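
The copy-on-access behaviour described above can be pictured with a small stand-in class. StoredFrameSketch and its is_stored flag are hypothetical names for this illustration and do not involve AiiDA at all; returning the original object for an unstored node is also an assumption here.

```python
import pandas as pd


class StoredFrameSketch:
    """Hypothetical stand-in for a node; not the plugin's implementation."""

    def __init__(self, df: pd.DataFrame, is_stored: bool = False) -> None:
        self._df = df
        self.is_stored = is_stored

    @property
    def df(self) -> pd.DataFrame:
        if self.is_stored:
            # Hand out an independent copy so callers cannot alter stored data.
            return self._df.copy(deep=True)
        # Assumption: an unstored node exposes the working DataFrame directly.
        return self._df


node = StoredFrameSketch(pd.DataFrame({"A": [1, 2]}), is_stored=True)

copy_of_frame = node.df
copy_of_frame["A"] = [10, 20]  # modifies only the returned copy

print(node.df["A"].tolist())
```

The cost of this design is one full copy per access on stored nodes, so code that reads the frame repeatedly may prefer to bind it to a local variable once.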

fields = {'attributes': 'QbDictField(attributes.*) -> typing.Optional[typing.Dict[str, typing.Any]]', 'computer': 'QbNumericField(computer) -> typing.Optional[int]', 'content': "QbField(attributes.content) -> <class 'bytes'>", 'ctime': 'QbNumericField(ctime) -> typing.Optional[datetime.datetime]', 'description': 'QbStrField(description) -> typing.Optional[str]', 'extras': 'QbDictField(extras.*) -> typing.Optional[typing.Dict[str, typing.Any]]', 'filename': 'QbStrField(attributes.filename) -> typing.Optional[str]', 'label': 'QbStrField(label) -> typing.Optional[str]', 'mtime': 'QbNumericField(mtime) -> typing.Optional[datetime.datetime]', 'node_type': 'QbStrField(node_type) -> typing.Optional[str]', 'pk': 'QbNumericField(pk) -> typing.Optional[int]', 'process_type': 'QbStrField(process_type) -> typing.Optional[str]', 'repository_content': 'QbDictField(repository_content) -> typing.Optional[dict[str, bytes]]', 'repository_metadata': 'QbDictField(repository_metadata) -> typing.Optional[typing.Dict[str, typing.Any]]', 'source': 'QbDictField(attributes.source.*) -> typing.Optional[dict]', 'user': 'QbNumericField(user) -> typing.Optional[int]', 'uuid': 'QbStrField(uuid) -> typing.Optional[str]'}
store(*args, **kwargs)[source]

Store the node. Before the node is stored, the HDF5 storage is synced with the _df attribute on the node. This catches changes made through __setitem__ on the dataframe, e.g. df["A"] = new_value. The sync is only performed if the hashes of the data inside the dataframe do not match.

Return type:

PandasFrameData
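
The sync-before-store logic described for store() can be sketched without AiiDA or HDF5. Everything in the block below is an assumption made for illustration: the class SyncOnStoreSketch, the writes counter, the use of RuntimeError, and the in-memory snapshot that stands in for the real to_hdf() serialization.

```python
import hashlib

import pandas as pd


def hash_frame_data(df: pd.DataFrame) -> str:
    """Hash values and index of a DataFrame (column names are ignored)."""
    row_hashes = pd.util.hash_pandas_object(df, index=True)
    return hashlib.sha256(row_hashes.to_numpy().tobytes()).hexdigest()


class SyncOnStoreSketch:
    """Hypothetical stand-in for a node; not the plugin's implementation."""

    def __init__(self, df: pd.DataFrame) -> None:
        self._df = df
        self.is_stored = False
        self.writes = 0
        self._write(df)

    def _write(self, df: pd.DataFrame) -> None:
        if self.is_stored:
            # Assumed exception type; the real node may raise something else.
            raise RuntimeError("cannot modify a stored node")
        # Placeholder for serializing with to_hdf(): keep a snapshot instead.
        self._persisted = df.copy(deep=True)
        self._persisted_hash = hash_frame_data(df)
        self.writes += 1

    def store(self) -> "SyncOnStoreSketch":
        # Rewrite the persisted data only if the in-memory data has changed.
        if hash_frame_data(self._df) != self._persisted_hash:
            self._write(self._df)
        self.is_stored = True
        return self


modified = SyncOnStoreSketch(pd.DataFrame({"A": [1, 2]}))
modified._df["A"] = [5, 6]  # in-place change after construction
modified.store()

untouched = SyncOnStoreSketch(pd.DataFrame({"A": [1, 2]}))
untouched.store()

print(modified.writes, untouched.writes)
```

Comparing hashes first avoids rewriting the file when nothing has changed, which is the case for most nodes that are created and stored immediately.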

Module contents

Data types provided by plugin

Register data types via the “aiida.data” entry point in pyproject.toml.

class aiida_dataframe.data.PandasFrameData(df, filename=None, **kwargs)[source]

Bases: SinglefileData

Data plugin for pandas DataFrame objects. DataFrames are serialized to HDF5 using the to_hdf() method, stored in the file repository, and deserialized using read_hdf().

The whole DataFrame can be retrieved through the df property. The names of columns and indices are stored in attributes so that they can be queried through the database.

Parameters:

df (DataFrame) – the pandas DataFrame to store

DEFAULT_FILENAME = 'dataframe.h5'
__abstractmethods__ = frozenset({})
__annotations__ = {'_CLS_COLLECTION': 'Type[CollectionType]', '__plugin_type_string': 'ClassVar[str]', '__query_type_string': 'ClassVar[str]', '_export_format_replacements': 'Dict[str, str]', '_hash_ignored_attributes': 'Tuple[str, ...]', '_logger': 'AiidaLoggerType', '_updatable_attributes': 'Tuple[str, ...]'}
__init__(df, filename=None, **kwargs)[source]

Construct a new instance and set the contents to that of the DataFrame.

Parameters:
  • df (DataFrame) – the pandas DataFrame whose contents to store.

  • filename (Optional[str]) – name to use for the HDF5 file in the repository.

__module__ = 'aiida_dataframe.data.dataframe'
__parameters__ = ()
__plugin_type_string: ClassVar[str]
__query_type_string: ClassVar[str]
_abc_impl = <_abc._abc_data object>
_get_dataframe()[source]

Get dataframe associated with this node.

Return type:

DataFrame

_get_dataframe_from_repo()[source]

Get dataframe associated with this node from the file repository.

Return type:

DataFrame

static _hash_dataframe(df)[source]

Return a hash of the data inside the dataframe. Column names are not included in the hash.

_logger: AiidaLoggerType = <Logger aiida_dataframe.data.dataframe.PandasFrameData (WARNING)>
_update_dataframe(df, filename=None)[source]

Update the stored HDF5 file. Raises an exception if the node is already stored.

Return type:

None

property df: DataFrame

Return the pandas DataFrame instance associated with this node.

If the node is already stored in the database, each access of this property returns a deep copy of the dataframe. This prevents the df property from going out of sync with the underlying HDF5 file, e.g. through in-place modification methods within pandas.

fields = {'attributes': 'QbDictField(attributes.*) -> typing.Optional[typing.Dict[str, typing.Any]]', 'computer': 'QbNumericField(computer) -> typing.Optional[int]', 'content': "QbField(attributes.content) -> <class 'bytes'>", 'ctime': 'QbNumericField(ctime) -> typing.Optional[datetime.datetime]', 'description': 'QbStrField(description) -> typing.Optional[str]', 'extras': 'QbDictField(extras.*) -> typing.Optional[typing.Dict[str, typing.Any]]', 'filename': 'QbStrField(attributes.filename) -> typing.Optional[str]', 'label': 'QbStrField(label) -> typing.Optional[str]', 'mtime': 'QbNumericField(mtime) -> typing.Optional[datetime.datetime]', 'node_type': 'QbStrField(node_type) -> typing.Optional[str]', 'pk': 'QbNumericField(pk) -> typing.Optional[int]', 'process_type': 'QbStrField(process_type) -> typing.Optional[str]', 'repository_content': 'QbDictField(repository_content) -> typing.Optional[dict[str, bytes]]', 'repository_metadata': 'QbDictField(repository_metadata) -> typing.Optional[typing.Dict[str, typing.Any]]', 'source': 'QbDictField(attributes.source.*) -> typing.Optional[dict]', 'user': 'QbNumericField(user) -> typing.Optional[int]', 'uuid': 'QbStrField(uuid) -> typing.Optional[str]'}
store(*args, **kwargs)[source]

Store the node. Before the node is stored, the HDF5 storage is synced with the _df attribute on the node. This catches changes made through __setitem__ on the dataframe, e.g. df["A"] = new_value. The sync is only performed if the hashes of the data inside the dataframe do not match.

Return type:

PandasFrameData