aiida_dataframe.data package¶
Submodules¶
aiida_dataframe.data.dataframe module¶
This module defines an AiiDA Data plugin for pandas DataFrames, which are stored in the file repository as HDF5 files.
- class aiida_dataframe.data.dataframe.PandasFrameData(df, filename=None, **kwargs)[source]¶
Bases: SinglefileData
Data plugin for pandas DataFrame objects. DataFrames are serialized to HDF5 using the to_hdf() method, stored in the file repository, and deserialized using read_hdf().
The whole DataFrame can be retrieved through the df property. The names of columns and indices are stored in attributes so that they can be queried through the database.
- Parameters:
df (DataFrame) – pandas DataFrame
- DEFAULT_FILENAME = 'dataframe.h5'¶
- __abstractmethods__ = frozenset({})¶
- __annotations__ = {'_CLS_COLLECTION': 'Type[CollectionType]', '__plugin_type_string': 'ClassVar[str]', '__query_type_string': 'ClassVar[str]', '_export_format_replacements': 'Dict[str, str]', '_hash_ignored_attributes': 'Tuple[str, ...]', '_logger': 'AiidaLoggerType', '_updatable_attributes': 'Tuple[str, ...]'}¶
- __init__(df, filename=None, **kwargs)[source]¶
Construct a new instance and set the contents to that of the file.
- __module__ = 'aiida_dataframe.data.dataframe'¶
- __parameters__ = ()¶
- _abc_impl = <_abc._abc_data object>¶
- _get_dataframe_from_repo()[source]¶
Get dataframe associated with this node from the file repository.
- Return type:
- static _hash_dataframe(df)[source]¶
Return a hash corresponding to the data inside the dataframe (column names are not included).
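The idea of hashing only the data can be illustrated with a minimal sketch. This is not the plugin's implementation: the helper name `hash_dataframe_values` is hypothetical, and combining `pandas.util.hash_pandas_object` with SHA-256 is an assumption chosen for illustration.

```python
import hashlib

import pandas as pd


def hash_dataframe_values(df: pd.DataFrame) -> str:
    """Hypothetical helper: hash cell values and index, ignoring column names."""
    # hash_pandas_object returns one uint64 hash per row, computed from the
    # row's values (and the index when index=True); column labels are not used.
    row_hashes = pd.util.hash_pandas_object(df, index=True)
    return hashlib.sha256(row_hashes.to_numpy().tobytes()).hexdigest()


original = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
renamed = original.rename(columns={"A": "X"})  # same data, different column name
changed = original.copy()
changed["A"] = [9, 9]  # different data, same column names

same_after_rename = hash_dataframe_values(original) == hash_dataframe_values(renamed)
differs_after_edit = hash_dataframe_values(original) != hash_dataframe_values(changed)
```

In this sketch, renaming a column leaves the hash unchanged while editing a value changes it. Column order still matters, because the per-row hashes are combined column by column.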
- _logger: AiidaLoggerType = <Logger aiida_dataframe.data.dataframe.PandasFrameData (INFO)>¶
- _update_dataframe(df, filename=None)[source]¶
Update the stored HDF5 file. Raises if the node is already stored.
- Return type:
- property df: DataFrame¶
Return the pandas DataFrame instance associated with this node.
If the node is already stored in the database, each access of this property returns a deep copy of the dataframe. This prevents the df property from going out of sync with the underlying HDF5 file, e.g. through pandas methods that modify the dataframe in place.
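The copy-on-access behaviour described above can be sketched with a small stand-in class. `FrameHolder` and its `is_stored` flag are hypothetical and only mimic the documented behaviour; returning the held frame itself while unstored is an assumption inferred from the description of store().

```python
import pandas as pd


class FrameHolder:
    """Hypothetical stand-in for a node; not the plugin's class."""

    def __init__(self, df: pd.DataFrame) -> None:
        self._df = df
        self.is_stored = False

    @property
    def df(self) -> pd.DataFrame:
        if self.is_stored:
            # Stored: hand out a deep copy so callers cannot alter the held frame.
            return self._df.copy(deep=True)
        # Unstored (assumed): return the held frame itself.
        return self._df


holder = FrameHolder(pd.DataFrame({"A": [1, 2, 3]}))
holder.df["B"] = [4, 5, 6]  # unstored: this modifies the held frame

holder.is_stored = True
copy_of_frame = holder.df
copy_of_frame["C"] = 0           # only the copy gains a column
copy_of_frame.loc[0, "A"] = 100  # only the copy changes
```

After the node is marked as stored, modifications made to the returned frame no longer reach the frame held by the object.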
- fields = {'attributes': 'QbDictField(attributes.*) -> typing.Optional[typing.Dict[str, ' 'typing.Any]]', 'computer': 'QbNumericField(computer) -> typing.Optional[int]', 'content': "QbField(attributes.content) -> <class 'bytes'>", 'ctime': 'QbNumericField(ctime) -> typing.Optional[datetime.datetime]', 'description': 'QbStrField(description) -> typing.Optional[str]', 'extras': 'QbDictField(extras.*) -> typing.Optional[typing.Dict[str, ' 'typing.Any]]', 'filename': 'QbStrField(attributes.filename) -> typing.Optional[str]', 'label': 'QbStrField(label) -> typing.Optional[str]', 'mtime': 'QbNumericField(mtime) -> typing.Optional[datetime.datetime]', 'node_type': 'QbStrField(node_type) -> typing.Optional[str]', 'pk': 'QbNumericField(pk) -> typing.Optional[int]', 'process_type': 'QbStrField(process_type) -> typing.Optional[str]', 'repository_content': 'QbDictField(repository_content) -> ' 'typing.Optional[dict[str, bytes]]', 'repository_metadata': 'QbDictField(repository_metadata) -> ' 'typing.Optional[typing.Dict[str, typing.Any]]', 'source': 'QbDictField(attributes.source.*) -> typing.Optional[dict]', 'user': 'QbNumericField(user) -> typing.Optional[int]', 'uuid': 'QbStrField(uuid) -> typing.Optional[str]'}¶
- store(*args, **kwargs)[source]¶
Store the node. Before the node is stored, the HDF5 storage is synced with the _df attribute on the node. This catches changes made through __setitem__ on the dataframe, e.g. df["A"] = new_value. The sync is only performed if the hashes of the data inside the dataframe do not match.
- Return type:
Module contents¶
Data types provided by the plugin.
Register data types via the “aiida.data” entry point in pyproject.toml.
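Whether a data type is registered can be checked from Python with the standard library. The sketch below lists the entry point names found in a group; the helper name `list_entry_point_names` is hypothetical, and the result depends on which packages are installed in the current environment.

```python
from importlib.metadata import entry_points


def list_entry_point_names(group: str = "aiida.data") -> list:
    """Hypothetical helper: return sorted entry point names for a group."""
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+: entry points are selectable by group.
        selected = eps.select(group=group)
    else:
        # Python 3.8/3.9: entry_points() returns a dict keyed by group.
        selected = eps.get(group, [])
    return sorted(ep.name for ep in selected)


registered_names = list_entry_point_names()
```

If the plugin is installed, its data entry point should appear in `registered_names`; in an environment without AiiDA plugins the list is simply empty.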
- class aiida_dataframe.data.PandasFrameData(df, filename=None, **kwargs)[source]¶
Bases: SinglefileData
Data plugin for pandas DataFrame objects. DataFrames are serialized to HDF5 using the to_hdf() method, stored in the file repository, and deserialized using read_hdf().
The whole DataFrame can be retrieved through the df property. The names of columns and indices are stored in attributes so that they can be queried through the database.
- Parameters:
df (DataFrame) – pandas DataFrame
- DEFAULT_FILENAME = 'dataframe.h5'¶
- __abstractmethods__ = frozenset({})¶
- __annotations__ = {'_CLS_COLLECTION': 'Type[CollectionType]', '__plugin_type_string': 'ClassVar[str]', '__query_type_string': 'ClassVar[str]', '_export_format_replacements': 'Dict[str, str]', '_hash_ignored_attributes': 'Tuple[str, ...]', '_logger': 'AiidaLoggerType', '_updatable_attributes': 'Tuple[str, ...]'}¶
- __init__(df, filename=None, **kwargs)[source]¶
Construct a new instance and set the contents to that of the file.
- __module__ = 'aiida_dataframe.data.dataframe'¶
- __parameters__ = ()¶
- __plugin_type_string: ClassVar[str]¶
- __query_type_string: ClassVar[str]¶
- _abc_impl = <_abc._abc_data object>¶
- _get_dataframe_from_repo()[source]¶
Get dataframe associated with this node from the file repository.
- Return type:
- static _hash_dataframe(df)[source]¶
Return a hash corresponding to the data inside the dataframe (column names are not included).
- _logger: AiidaLoggerType = <Logger aiida_dataframe.data.dataframe.PandasFrameData (INFO)>¶
- _update_dataframe(df, filename=None)[source]¶
Update the stored HDF5 file. Raises if the node is already stored.
- Return type:
- property df: DataFrame¶
Return the pandas DataFrame instance associated with this node.
If the node is already stored in the database, each access of this property returns a deep copy of the dataframe. This prevents the df property from going out of sync with the underlying HDF5 file, e.g. through pandas methods that modify the dataframe in place.
- fields = {'attributes': 'QbDictField(attributes.*) -> typing.Optional[typing.Dict[str, ' 'typing.Any]]', 'computer': 'QbNumericField(computer) -> typing.Optional[int]', 'content': "QbField(attributes.content) -> <class 'bytes'>", 'ctime': 'QbNumericField(ctime) -> typing.Optional[datetime.datetime]', 'description': 'QbStrField(description) -> typing.Optional[str]', 'extras': 'QbDictField(extras.*) -> typing.Optional[typing.Dict[str, ' 'typing.Any]]', 'filename': 'QbStrField(attributes.filename) -> typing.Optional[str]', 'label': 'QbStrField(label) -> typing.Optional[str]', 'mtime': 'QbNumericField(mtime) -> typing.Optional[datetime.datetime]', 'node_type': 'QbStrField(node_type) -> typing.Optional[str]', 'pk': 'QbNumericField(pk) -> typing.Optional[int]', 'process_type': 'QbStrField(process_type) -> typing.Optional[str]', 'repository_content': 'QbDictField(repository_content) -> ' 'typing.Optional[dict[str, bytes]]', 'repository_metadata': 'QbDictField(repository_metadata) -> ' 'typing.Optional[typing.Dict[str, typing.Any]]', 'source': 'QbDictField(attributes.source.*) -> typing.Optional[dict]', 'user': 'QbNumericField(user) -> typing.Optional[int]', 'uuid': 'QbStrField(uuid) -> typing.Optional[str]'}¶
- store(*args, **kwargs)[source]¶
Store the node. Before the node is stored, the HDF5 storage is synced with the _df attribute on the node. This catches changes made through __setitem__ on the dataframe, e.g. df["A"] = new_value. The sync is only performed if the hashes of the data inside the dataframe do not match.
- Return type: