id_translation.dio.integration.dask

Integration for Dask types.

Module Attributes

DaskT

Supported dask types.

PartitionT

Supported dask partition types.

PartitionIO

A dask partition IO implementation.

Functions

translate_partition(part, names, tmap, part_io)

Translate a single Dask partition.

Classes

DaskIO(*[, missing_as_nan, as_category])

Optional IO implementation for dask types.

class DaskIO(*, missing_as_nan=None, as_category=False)

Bases: DataStructureIO[DaskT, str, SourceType, IdType]

Optional IO implementation for dask types.

Parameters:
  • missing_as_nan – If set, unknown IDs will be NaN. Forwarded to PandasIO.

  • as_category – Set dtype='category' in the result. Forwarded to PandasIO.

Notes

Combining missing_as_nan=False with as_category=True can be unpredictable in distributed contexts.

classmethod extract(translatable, names)

Extract IDs from translatable.

Parameters:
  • translatable – Data to extract IDs from.

  • names – List of names in translatable to extract IDs for.

Returns:

A dict {name: ids}.
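The return shape can be illustrated with a minimal sketch. It assumes a plain dict of columns stands in for a Dask collection; the helper extract_sketch and the column names are hypothetical and not part of id_translation.

```python
def extract_sketch(translatable, names):
    """Collect the IDs found under each requested name.

    Hypothetical stand-in: `translatable` is a plain dict of columns,
    not a Dask collection.
    """
    return {name: list(translatable[name]) for name in names}


columns = {
    "animal_id": [1, 2, 2],
    "owner_id": [10, 11, 10],
    "weight": [4.2, 3.1, 5.0],  # not requested, so not extracted
}
ids = extract_sketch(columns, names=["animal_id", "owner_id"])
```

Only the requested names appear as keys in the result; other columns are ignored.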

classmethod handles_type(arg)

Return True if the implementation handles data for the type of arg.

insert(translatable, names, tmap, copy)

Insert translations into translatable.

Parameters:
  • translatable – Data to translate. Modified iff copy=False.

  • names – Names in translatable to translate.

  • tmap – Translations for IDs in translatable.

  • copy – If False, modify contents of the original translatable. Otherwise, return a copy.

Returns:

A copy of translatable if copy=True, None otherwise.

Raises:

NotInplaceTranslatableError – If copy=False for a type which is not translatable in-place.
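The copy contract described above (a copy is returned when copy=True, None after an in-place update when copy=False) can be sketched as follows. This is an illustration only: insert_sketch is a hypothetical helper, a dict of lists stands in for the translatable, and tmap is assumed to map each name to an {id: translation} dict. It does not model NotInplaceTranslatableError or unknown IDs.

```python
import copy as copy_module


def insert_sketch(translatable, names, tmap, copy):
    """Replace IDs with translations, mirroring the documented copy contract.

    Hypothetical stand-in: `translatable` is a dict of lists and `tmap`
    maps each name to an {id: translation} dict.
    """
    target = copy_module.deepcopy(translatable) if copy else translatable
    for name in names:
        lookup = tmap[name]
        target[name] = [lookup[i] for i in target[name]]
    # Return the translated copy, or None after an in-place update.
    return target if copy else None


data = {"animal_id": [1, 2]}
tmap = {"animal_id": {1: "1:Fido", 2: "2:Rex"}}

translated = insert_sketch(data, ["animal_id"], tmap, copy=True)
untouched = data == {"animal_id": [1, 2]}  # the original is left as-is

result = insert_sketch(data, ["animal_id"], tmap, copy=False)  # modifies data
```

After the second call, result is None and data itself holds the translations.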

classmethod names(translatable)

Extract names from translatable.

Parameters:

translatable – Data to extract names from.

Returns:

A list of names to translate. Returns None if names cannot be extracted.

property partition_io

The PartitionIO implementation used by this instance.

priority = 1980

Determines order in which IOs are considered (higher = earlier).

Set priority < 0 to disable.
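The ordering rule (higher = earlier, negative = disabled) can be sketched with stand-in classes. IoSketch and resolution_order are hypothetical names used only for illustration; the actual resolution logic in id_translation may differ in its details.

```python
from dataclasses import dataclass


@dataclass
class IoSketch:
    """Hypothetical stand-in for an IO implementation."""

    name: str
    priority: int


def resolution_order(ios):
    """Drop disabled entries (priority < 0), then sort highest priority first."""
    enabled = [io for io in ios if io.priority >= 0]
    return sorted(enabled, key=lambda io: io.priority, reverse=True)


candidates = [
    IoSketch("low", 10),
    IoSketch("dask-like", 1980),
    IoSketch("disabled", -1),
]
order = [io.name for io in resolution_order(candidates)]
```

In this sketch the entry with priority 1980 is considered first, and the entry with a negative priority is never considered.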

class DaskT

Supported dask types.

alias of TypeVar('DaskT', ~dask.dataframe.dask_expr._collection.DataFrame, ~dask.dataframe.dask_expr._collection.Series)

PartitionIO

A dask partition IO implementation.

alias of PandasIO[PartitionT, str, SourceType, IdType]

class PartitionT

Supported dask partition types.

alias of TypeVar('PartitionT', ~pandas.DataFrame, ~pandas.Series)

translate_partition(part, names, tmap, part_io)

Translate a single Dask partition.
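One plausible reading of partition-wise translation is sketched below. It is an assumption about the general approach, not the library's implementation: PartitionIoSketch and translate_partition_sketch are hypothetical, dicts of lists stand in for pandas partitions, and the partition IO is assumed to expose only the documented insert() contract.

```python
class PartitionIoSketch:
    """Hypothetical partition IO; only mimics the documented insert() contract."""

    def insert(self, translatable, names, tmap, copy):
        if copy:
            target = {key: list(values) for key, values in translatable.items()}
        else:
            target = translatable
        for name in names:
            target[name] = [tmap[name][i] for i in target[name]]
        return target if copy else None


def translate_partition_sketch(part, names, tmap, part_io):
    """Translate one partition and return the translated copy."""
    return part_io.insert(part, names, tmap, copy=True)


# Each element stands in for one partition of a larger collection.
partitions = [{"animal_id": [1, 2]}, {"animal_id": [2, 3]}]
tmap = {"animal_id": {1: "1:Fido", 2: "2:Rex", 3: "3:Luna"}}
part_io = PartitionIoSketch()

translated = [
    translate_partition_sketch(part, ["animal_id"], tmap, part_io)
    for part in partitions
]
```

Each partition is translated independently against the same tmap, which is what makes the operation suitable for running once per partition.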