aiida_dataframe.data package¶

Submodules¶

aiida_dataframe.data.dataframe module¶

This module defines an AiiDA Data plugin for pandas DataFrames, which are stored in the file repository as HDF5 files.

class aiida_dataframe.data.dataframe.PandasFrameData(df, filename=None, **kwargs)[source]¶

Bases: SinglefileData

Data plugin for pandas DataFrame objects. DataFrames are serialized to HDF5 using the to_hdf() method, stored in the file repository, and deserialized using read_hdf().

The whole DataFrame can be retrieved through the df property. The names of columns and indices are stored in attributes so that they can be queried through the database.

Parameters: df (DataFrame) – pandas DataFrame

DEFAULT_FILENAME = 'dataframe.h5'¶

__abstractmethods__ = frozenset({})¶

__annotations__ = {'_CLS_COLLECTION': 'Type[CollectionType]', '__plugin_type_string': 'ClassVar[str]', '__query_type_string': 'ClassVar[str]', '_export_format_replacements': 'Dict[str, str]', '_hash_ignored_attributes': 'Tuple[str, ...]', '_logger': 'AiidaLoggerType', '_updatable_attributes': 'Tuple[str, ...]'}¶

__init__(df, filename=None, **kwargs)[source]¶

Construct a new instance and set its contents to those of the given DataFrame.

Parameters:

df (DataFrame) – the pandas DataFrame to store.
filename (Optional[str]) – filename to use for the HDF5 file in the file repository.

__module__ = 'aiida_dataframe.data.dataframe'¶

__parameters__ = ()¶

_abc_impl = <_abc._abc_data object>¶

_get_dataframe()[source]¶

Get dataframe associated with this node.

Return type: DataFrame

_get_dataframe_from_repo()[source]¶

Get dataframe associated with this node from the file repository.

Return type: DataFrame

static _hash_dataframe(df)[source]¶

Return a hash corresponding to the data inside the dataframe (column names are not included).
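As a rough illustration of hashing the data but not the column names, the sketch below uses pandas.util.hash_pandas_object, which hashes row values (and optionally the index) without looking at column labels. The helper name hash_dataframe_values and the choice of SHA-256 are assumptions made for this example; the plugin's actual implementation may differ.

```python
import hashlib

import pandas as pd


def hash_dataframe_values(df: pd.DataFrame) -> str:
    """Hypothetical helper: hash the values and index of a DataFrame.

    Column labels do not enter the hash. This is an illustrative sketch,
    not necessarily what PandasFrameData._hash_dataframe does.
    """
    # One uint64 hash per row, computed from the row's values and index label.
    row_hashes = pd.util.hash_pandas_object(df, index=True)
    return hashlib.sha256(row_hashes.values.tobytes()).hexdigest()


df = pd.DataFrame({"A": [1, 2, 3], "B": [4.0, 5.0, 6.0]})
renamed = df.rename(columns={"A": "X"})  # same data, different column name
changed = df.copy()
changed.loc[0, "A"] = 99  # different data

same_after_rename = hash_dataframe_values(df) == hash_dataframe_values(renamed)
same_after_change = hash_dataframe_values(df) == hash_dataframe_values(changed)
```

With a hash of this kind, renaming a column leaves the hash unchanged, while editing a value changes it.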

_logger: AiidaLoggerType = <Logger aiida_dataframe.data.dataframe.PandasFrameData (INFO)>¶

_update_dataframe(df, filename=None)[source]¶

Update the stored HDF5 file. Raises an exception if the node is already stored.

Return type: None

property df: DataFrame¶

Return the pandas DataFrame instance associated with this node.

If the node is already stored in the database, each access of this property returns a deep copy of the dataframe. This prevents the df property from going out of sync with the underlying HDF5 file, e.g. through in-place modification methods within pandas.
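The copy-on-access behaviour can be mimicked with plain pandas. The class StoredFrame below is a hypothetical stand-in, not the plugin's code: it returns DataFrame.copy(deep=True) from its df property, so in-place edits made by the caller never reach the frame it holds.

```python
import pandas as pd


class StoredFrame:
    """Hypothetical stand-in for a stored node holding a DataFrame."""

    def __init__(self, df: pd.DataFrame) -> None:
        # Keep a private copy so the caller's original object is decoupled too.
        self._df = df.copy(deep=True)

    @property
    def df(self) -> pd.DataFrame:
        # Every access hands out a fresh deep copy.
        return self._df.copy(deep=True)


node = StoredFrame(pd.DataFrame({"A": [1, 2, 3]}))

view = node.df
view["A"] = [10, 20, 30]          # modify the returned copy via __setitem__
view.drop(index=0, inplace=True)  # another in-place modification

unchanged = node.df["A"].tolist()
distinct_objects = node.df is not node.df
```

Because each access copies the data, keeping a local reference (df = node.df) avoids repeated copies when a large frame is read many times.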

fields = {
    'attributes': 'QbDictField(attributes.*) -> typing.Optional[typing.Dict[str, typing.Any]]',
    'computer': 'QbNumericField(computer) -> typing.Optional[int]',
    'content': "QbField(attributes.content) -> <class 'bytes'>",
    'ctime': 'QbNumericField(ctime) -> typing.Optional[datetime.datetime]',
    'description': 'QbStrField(description) -> typing.Optional[str]',
    'extras': 'QbDictField(extras.*) -> typing.Optional[typing.Dict[str, typing.Any]]',
    'filename': 'QbStrField(attributes.filename) -> typing.Optional[str]',
    'label': 'QbStrField(label) -> typing.Optional[str]',
    'mtime': 'QbNumericField(mtime) -> typing.Optional[datetime.datetime]',
    'node_type': 'QbStrField(node_type) -> typing.Optional[str]',
    'pk': 'QbNumericField(pk) -> typing.Optional[int]',
    'process_type': 'QbStrField(process_type) -> typing.Optional[str]',
    'repository_content': 'QbDictField(repository_content) -> typing.Optional[dict[str, bytes]]',
    'repository_metadata': 'QbDictField(repository_metadata) -> typing.Optional[typing.Dict[str, typing.Any]]',
    'source': 'QbDictField(attributes.source.*) -> typing.Optional[dict]',
    'user': 'QbNumericField(user) -> typing.Optional[int]',
    'uuid': 'QbStrField(uuid) -> typing.Optional[str]',
}¶

store(*args, **kwargs)[source]¶

Store the node. Before the node is stored, the HDF5 storage is synced with the _df attribute on the node. This catches changes made by using __setitem__ on the dataframe, e.g. df["A"] = new_value. The sync is only performed if the hashes of the data inside the dataframe do not match.

Return type: PandasFrameData
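The sync-on-store idea can be sketched with a toy model. Everything here is hypothetical and only mirrors the description above: ToyFrameNode, _data_hash and the writes counter are invented names, and CSV text stands in for the HDF5 file so that the example runs without PyTables.

```python
import hashlib

import pandas as pd


def _data_hash(df: pd.DataFrame) -> str:
    # Hash of values and index only; column labels are not included.
    rows = pd.util.hash_pandas_object(df, index=True)
    return hashlib.sha256(rows.values.tobytes()).hexdigest()


class ToyFrameNode:
    """Hypothetical model of the sync-on-store logic (not the plugin's code)."""

    def __init__(self, df: pd.DataFrame) -> None:
        self._df = df
        self._write(df)
        self.is_stored = False
        self.writes = 1  # number of times the payload was serialized

    def _write(self, df: pd.DataFrame) -> None:
        # CSV text stands in for the HDF5 file in the repository.
        self._payload = df.to_csv()
        self._payload_hash = _data_hash(df)

    def store(self) -> "ToyFrameNode":
        # Re-serialize only if the in-memory data diverged from the payload.
        if _data_hash(self._df) != self._payload_hash:
            self._write(self._df)
            self.writes += 1
        self.is_stored = True
        return self


frame = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
node = ToyFrameNode(frame)
frame["A"] = [7, 8]  # in-place change after construction
node.store()         # hash mismatch, so the payload is rewritten

untouched = ToyFrameNode(pd.DataFrame({"A": [1, 2]})).store()
```

In this model the modified node is serialized twice (once at construction, once at store time), whereas the untouched node is serialized only once, which is the saving the hash comparison is meant to provide.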

Module contents¶

Data types provided by plugin

class aiida_dataframe.data.PandasFrameData(df, filename=None, **kwargs)[source]¶