Translation IO#

The id_translation.dio module defines how IDs are read from and written to various data structures.

Runtime arguments#

Relevant methods (e.g. Translator.translate()) accept an io_kwargs argument, which may be used to customize the behavior of the DataStructureIO implementation. Exceptions raised due to invalid io_kwargs arguments are logged and suppressed.

Arguments are implementation-specific. See PandasIO for an example.

User-defined integrations#

The purpose of creating new integrations is typically to enable translation of a new data type. To get started, inherit from DataStructureIO or copy an existing integration. Don’t forget to register the implementation, or the Translator won’t be able to find it.

Integrations may take initialization arguments (see Runtime arguments), but should not require them.

Automatic integration discovery#

You may add an entrypoint in the 'id_translation.dio' entrypoint group to automatically register custom implementations (as opposed to calling DataStructureIO.register() manually). The snippet below shows how the bundled integrations are registered using project entrypoints.

Entrypoints in pyproject.toml in the rsundqvist/id-translation project.#
[project.entry-points."id_translation.dio"]
# The name (e.g. 'pandas_io') is not important, but should be unique.
pandas_io = "id_translation.dio.integration.pandas:PandasIO"
dask_io = "id_translation.dio.integration.dask:DaskIO"
polars_io = "id_translation.dio.integration.polars:PolarsIO"

The loader will skip the integration if calling EntryPoint.load() raises an ImportError, or if the priority is negative.
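The skip rules above can be sketched as follows. This is a minimal illustration, not the library's actual loader: the function name load_integrations, the FakeEntryPoint helper and the Available/Disabled classes are all hypothetical. Only importlib.metadata.entry_points() and the 'id_translation.dio' group name are real.

```python
from importlib.metadata import entry_points


def load_integrations(eps=None, group="id_translation.dio"):
    """Return loaded classes, applying the two skip rules described above.

    Hypothetical helper; the real loader in id_translation may differ.
    """
    if eps is None:
        # Selecting by keyword requires Python 3.10 or later.
        eps = entry_points(group=group)

    loaded = []
    for ep in eps:
        try:
            cls = ep.load()
        except ImportError:
            continue  # Rule 1: e.g. an optional dependency is not installed.
        if cls.priority < 0:
            continue  # Rule 2: a negative priority disables the integration.
        loaded.append(cls)
    return loaded


class FakeEntryPoint:
    """Hypothetical stand-in for importlib.metadata.EntryPoint (demo only)."""

    def __init__(self, name, target):
        self.name = name
        self._target = target

    def load(self):
        # Raise if the target is an exception instance, else return the class.
        if isinstance(self._target, Exception):
            raise self._target
        return self._target


class Available:
    priority = 1999


class Disabled:
    priority = -1


demo = [
    FakeEntryPoint("available_io", Available),
    FakeEntryPoint("missing_dep_io", ImportError("No module named 'polars'")),
    FakeEntryPoint("disabled_io", Disabled),
]
print([cls.__name__ for cls in load_integrations(demo)])  # ['Available']
```

Passing the entry points in explicitly keeps the sketch testable without installing any packages; calling load_integrations() with no arguments would query the installed distributions instead.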

Selection process#

The Translator will call resolve_io() once per task. Implementations are considered in descending order of their priority attribute, and the first one whose DataStructureIO.handles_type() method returns True is used.

Bundled implementations have priorities in the 1000 - 1999 range (inclusive); see the table below.

Ranking of built-in DataStructureIO implementations.#

| Rank | Weight | Class | Comment |
|------|--------|-------|---------|
| 1 | 1999 | PandasIO [1] | Optional IO implementation for pandas types. |
| 2 | 1990 | PolarsIO [1] | Optional IO implementation for polars types. |
| 3 | 1980 | DaskIO [1] | Optional IO implementation for dask types. |
| 4 | 1900 | ArrowIO [2] | Optional IO implementation for pyarrow types. |
| 5 | 1500 | SingleValueIO | IO implementation for int, str and UUID types. |
| 6 | 1100 | SequenceIO | IO implementation for list, tuple and numpy.array types. |
| 7 | 1010 | SetIO | IO implementation for set types. |
| 8 | 1000 | DictIO | IO implementation for dict types. |

New implementations default to priority=10_000, and are therefore considered first.
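
The selection rule can be illustrated with a toy model. Everything below is a stand-in written for this example: ToyIO, toy_resolve_io and the subclasses are hypothetical and do not reproduce the real DataStructureIO interface. Only the priority values (1100, 1000 and the 10_000 default) are taken from the text above.

```python
from typing import Any, ClassVar


class ToyIO:
    """Toy stand-in for DataStructureIO; NOT the library's actual class."""

    priority: ClassVar[int] = 10_000  # Default for new implementations.
    _registry: ClassVar[list] = []

    @classmethod
    def register(cls) -> None:
        # Unregistered implementations are never considered.
        if cls not in ToyIO._registry:
            ToyIO._registry.append(cls)

    @classmethod
    def handles_type(cls, arg: Any) -> bool:
        raise NotImplementedError


class ToySequenceIO(ToyIO):
    priority = 1100  # Same value as the bundled SequenceIO.

    @classmethod
    def handles_type(cls, arg: Any) -> bool:
        return isinstance(arg, (list, tuple))


class ToyDictIO(ToyIO):
    priority = 1000  # Same value as the bundled DictIO.

    @classmethod
    def handles_type(cls, arg: Any) -> bool:
        return isinstance(arg, dict)


class MyListIO(ToyIO):
    # Inherits the default priority of 10_000.
    @classmethod
    def handles_type(cls, arg: Any) -> bool:
        return isinstance(arg, list)


def toy_resolve_io(arg: Any) -> type:
    """Return the highest-priority registered class that handles arg."""
    ranked = sorted(ToyIO._registry, key=lambda c: c.priority, reverse=True)
    for io_cls in ranked:
        if io_cls.handles_type(arg):
            return io_cls
    raise TypeError(f"no registered implementation handles {type(arg).__name__}")


for io_cls in (ToySequenceIO, ToyDictIO, MyListIO):
    io_cls.register()

print(toy_resolve_io([1, 2]).__name__)    # MyListIO
print(toy_resolve_io((1, 2)).__name__)    # ToySequenceIO
print(toy_resolve_io({"a": 1}).__name__)  # ToyDictIO
```

Both MyListIO and ToySequenceIO accept lists, but MyListIO is checked first because it keeps the default priority. A custom implementation that should only act as a fallback would need an explicitly lower priority.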

Footnotes