Translation IO

The id_translation.dio module defines how IDs are read from and written to various data structures.

User-defined integrations

The purpose of creating new integrations is typically to enable translation of a new data type. To get started, inherit from DataStructureIO or copy an existing integration. Don’t forget to register the implementation, or the Translator won’t be able to find it.
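The subclass-and-register pattern described above can be sketched without the library installed. This is not the real id_translation API: `StandInIO`, its `_registry` list, `FrozensetIO` and `find_io()` are hypothetical stand-ins, and only the method names `handles_type()` and `register()` are taken from this page. The real signatures may differ.

```python
from __future__ import annotations

from typing import Any, ClassVar


class StandInIO:
    """Hypothetical stand-in for DataStructureIO; not the real base class."""

    _registry: ClassVar[list[type[StandInIO]]] = []

    @classmethod
    def register(cls) -> None:
        # Make this implementation discoverable. Unregistered classes are never found.
        if cls not in StandInIO._registry:
            StandInIO._registry.append(cls)

    @classmethod
    def handles_type(cls, arg: Any) -> bool:
        raise NotImplementedError


class FrozensetIO(StandInIO):
    """Hypothetical integration that adds support for frozenset."""

    @classmethod
    def handles_type(cls, arg: Any) -> bool:
        return isinstance(arg, frozenset)


def find_io(arg: Any) -> type[StandInIO] | None:
    """Return the first registered implementation that accepts arg, if any."""
    return next((io for io in StandInIO._registry if io.handles_type(arg)), None)


before = find_io(frozenset({1, 2}))  # nothing registered yet
FrozensetIO.register()
after = find_io(frozenset({1, 2}))  # found once registered
```

Defining the subclass is not enough in this sketch: the lookup only succeeds after `FrozensetIO.register()` has been called, which mirrors the reminder above.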

Automatic integration discovery

You may add an entrypoint in the 'id_translation.dio' entrypoint group to automatically register custom implementations (as opposed to calling DataStructureIO.register() manually). The snippet below shows how the bundled integrations are registered using project entrypoints.

Entrypoints in pyproject.toml in the rsundqvist/id-translation project.
[project.entry-points."id_translation.dio"]
# The name (e.g. 'pandas_io') is not important, but should be unique.
pandas_io = "id_translation.dio.integration.pandas:PandasIO"
dask_io = "id_translation.dio.integration.dask:DaskIO"
polars_io = "id_translation.dio.integration.polars:PolarsIO"

The loader will skip the integration if calling EntryPoint.load() raises an ImportError.
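The skip-on-ImportError behaviour can be illustrated with the standard library alone. The function `load_available()` below is a hypothetical sketch, not the library's actual loader, and the two entry points are made up for the demonstration: one points at a module that exists (`json`), the other at a module that is assumed not to be installed.

```python
from __future__ import annotations

from importlib.metadata import EntryPoint
from typing import Any, Iterable


def load_available(entry_points: Iterable[EntryPoint]) -> dict[str, Any]:
    """Load every entry point whose target can be imported; skip the rest."""
    loaded: dict[str, Any] = {}
    for ep in entry_points:
        try:
            loaded[ep.name] = ep.load()
        except ImportError:
            # Typically an optional dependency that is not installed.
            continue
    return loaded


# Demonstration entry points (hypothetical; not part of id_translation).
candidates = [
    EntryPoint(name="decoder", value="json:JSONDecoder", group="id_translation.dio"),
    EntryPoint(name="missing", value="no_such_package_for_demo:SomeIO", group="id_translation.dio"),
]
available = load_available(candidates)
```

Here `available` contains only the `decoder` entry, since loading `missing` raises `ModuleNotFoundError`, a subclass of `ImportError`. On Python 3.10 and later, the installed entry points of a group can be obtained with `importlib.metadata.entry_points(group="id_translation.dio")`.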

Selection process

The Translator will call resolve_io() once per task. Implementations are considered in descending order of their priority attribute, and the first one whose DataStructureIO.handles_type() method returns True is used.

Bundled implementations have priorities in the range 1000 to 1999 (inclusive), listed as Weight in the table below.

Ranking of built-in DataStructureIO implementations.

Rank  Weight  Class          Comment
----  ------  -------------  --------------------------------------------------------
1     1999    PandasIO       Optional IO implementation for pandas types.
2     1990    PolarsIO       Optional IO implementation for polars types.
3     1980    DaskIO         Optional IO implementation for dask types.
4     1500    SingleValueIO  IO implementation for int, str and UUID types.
5     1100    SequenceIO     IO implementation for list, tuple and numpy.array types.
6     1010    SetIO          IO implementation for set types.
7     1000    DictIO         IO implementation for dict types.

New implementations default to priority=10_000 and are therefore considered before all bundled implementations.
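The selection rule can be sketched as a sort on priority followed by a first-match search. The code below is a minimal illustration under stated assumptions, not the library's resolve_io(): `StandInIO`, the three subclasses and `resolve()` are hypothetical, and raising `TypeError` when nothing matches is an assumption made for the sketch.

```python
from __future__ import annotations

from typing import Any, Iterable


class StandInIO:
    """Hypothetical stand-in for DataStructureIO."""

    priority = 10_000  # default for new implementations, as stated on this page

    @classmethod
    def handles_type(cls, arg: Any) -> bool:
        raise NotImplementedError


class DictLikeIO(StandInIO):
    priority = 1000  # same weight as the bundled DictIO

    @classmethod
    def handles_type(cls, arg: Any) -> bool:
        return isinstance(arg, dict)


class SequenceLikeIO(StandInIO):
    priority = 1100  # same weight as the bundled SequenceIO

    @classmethod
    def handles_type(cls, arg: Any) -> bool:
        return isinstance(arg, (list, tuple))


class CustomMappingIO(StandInIO):
    # No priority override, so the default of 10_000 applies.
    @classmethod
    def handles_type(cls, arg: Any) -> bool:
        return isinstance(arg, dict)


def resolve(arg: Any, candidates: Iterable[type[StandInIO]]) -> type[StandInIO]:
    """Return the highest-priority candidate that accepts arg."""
    for io in sorted(candidates, key=lambda c: c.priority, reverse=True):
        if io.handles_type(arg):
            return io
    raise TypeError(f"no implementation handles {type(arg).__name__}")  # assumption


all_ios = [DictLikeIO, SequenceLikeIO, CustomMappingIO]
chosen_for_dict = resolve({"a": 1}, all_ios)  # CustomMappingIO outranks DictLikeIO
chosen_for_list = resolve([1, 2], all_ios)  # SequenceLikeIO
```

Because both `DictLikeIO` and `CustomMappingIO` accept a dict, the priority alone decides which one is used. A custom implementation that should only act as a fallback would need a priority below 1000.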