Translation IO

The id_translation.dio module defines how IDs are read from and written to various data structures.

User-defined integrations

The purpose of creating new integrations is typically to enable translation of a new data type. To get started, inherit from DataStructureIO or copy an existing integration. Don’t forget to register the implementation, or the Translator won’t be able to find it.
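The following is a minimal, self-contained sketch of what such an integration could look like. It does not import the library: `StandInIO` is a hypothetical stand-in for DataStructureIO, and `Order`, `OrderIO`, `extract` and `insert` are illustrative names that may not match the real interface. Only `handles_type()`, `register()` and the default priority are taken from this page.

```python
from __future__ import annotations

from dataclasses import dataclass, replace
from typing import Any, ClassVar


class StandInIO:
    """Hypothetical stand-in for DataStructureIO, so the sketch runs on its own."""

    priority: ClassVar[int] = 10_000  # Default priority stated on this page.
    _registry: ClassVar[list[type[StandInIO]]] = []

    @classmethod
    def register(cls) -> None:
        """Make the implementation discoverable (mimics DataStructureIO.register())."""
        if cls not in StandInIO._registry:
            StandInIO._registry.append(cls)

    @classmethod
    def handles_type(cls, arg: Any) -> bool:
        raise NotImplementedError


@dataclass(frozen=True)
class Order:
    """A new data type that we would like to translate (illustrative only)."""

    customer_id: int | str


class OrderIO(StandInIO):
    """Reads IDs from and writes translations to Order instances."""

    @classmethod
    def handles_type(cls, arg: Any) -> bool:
        return isinstance(arg, Order)

    # 'extract' and 'insert' are assumed method names for this sketch.
    @staticmethod
    def extract(order: Order) -> list[int | str]:
        return [order.customer_id]

    @staticmethod
    def insert(order: Order, translations: dict[int, str]) -> Order:
        # Return a translated copy instead of mutating the input.
        return replace(order, customer_id=translations[order.customer_id])


# Without this call, nothing would be able to find OrderIO.
OrderIO.register()
```

When writing a real integration, subclass DataStructureIO itself and follow the abstract methods it declares. The stand-in above only mirrors the overall shape.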

Automatic integration discovery

You may add an entrypoint in the 'id_translation.dio' entrypoint group to automatically register custom implementations (as opposed to calling DataStructureIO.register() manually). The snippet below shows how the bundled integrations are registered using project entrypoints.

Entrypoints in pyproject.toml in the rsundqvist/id-translation project.

[project.entry-points."id_translation.dio"]
# The name (e.g. 'pandas_io') is not important, but should be unique.
pandas_io = "id_translation.dio.integration.pandas:PandasIO"
dask_io = "id_translation.dio.integration.dask:DaskIO"
polars_io = "id_translation.dio.integration.polars:PolarsIO"

The loader skips an integration if calling EntryPoint.load() raises an ImportError (for example, because an optional dependency is not installed), or if the priority of the loaded implementation is negative.
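The two skip rules can be sketched as follows. This is an approximation under stated assumptions, not the library's actual loader: `load_integrations`, `discover`, `FakeEntryPoint`, `EnabledIO` and `DisabledIO` are hypothetical names introduced here. Only `importlib.metadata.entry_points` and `EntryPoint.load()` are real standard-library calls.

```python
from importlib.metadata import entry_points


def load_integrations(candidates):
    """Return the loaded classes that survive the two skip rules."""
    loaded = []
    for entry_point in candidates:
        try:
            cls = entry_point.load()
        except ImportError:
            # For example, an optional dependency is not installed.
            continue
        if cls.priority < 0:
            # A negative priority means the integration is skipped.
            continue
        loaded.append(cls)
    return loaded


def discover(group="id_translation.dio"):
    """Load real entry points; the 'group' keyword requires Python 3.10+."""
    return load_integrations(entry_points(group=group))


# Hypothetical stand-ins so the rules can be exercised without any installs.
class FakeEntryPoint:
    def __init__(self, name, result):
        self.name = name
        self._result = result

    def load(self):
        if isinstance(self._result, Exception):
            raise self._result
        return self._result


class EnabledIO:
    priority = 10_000


class DisabledIO:
    priority = -1


candidates = [
    FakeEntryPoint("enabled_io", EnabledIO),
    FakeEntryPoint("disabled_io", DisabledIO),
    FakeEntryPoint("missing_dep_io", ModuleNotFoundError("No module named 'polars'")),
]
loaded = load_integrations(candidates)
print([cls.__name__ for cls in loaded])  # prints ['EnabledIO']
```

Since `ModuleNotFoundError` is a subclass of `ImportError`, a missing optional package is handled by the same `except` clause.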

Selection process

The Translator calls resolve_io() once per task. Implementations are considered in order of their priority attribute, highest first, and the first one whose DataStructureIO.handles_type() method returns True is used.

Bundled implementations have priorities in the 1000 - 1999 range (inclusive); see the table below.

Ranking of built-in `DataStructureIO` implementations.
Rank	Priority	Class	Comment
1	1999	`PandasIO`	Optional IO implementation for `pandas` types.
2	1990	`PolarsIO`	Optional IO implementation for `polars` types.
3	1980	`DaskIO`	Optional IO implementation for `dask` types.
4	1500	`SingleValueIO`	IO implementation for `int`, `str` and `UUID` types.
5	1100	`SequenceIO`	IO implementation for `list`, `tuple` and `numpy.ndarray` types.
6	1010	`SetIO`	IO implementation for `set` types.
7	1000	`DictIO`	IO implementation for `dict` types.

New implementations default to priority=10_000 and are therefore considered before all bundled implementations.
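The selection process described in this section can be approximated with the sketch below. It is not the library's resolve_io(): `resolve_io_sketch` and the three classes are hypothetical stand-ins. The sketch assumes that a higher priority is considered first, as the ranking table suggests, and the `TypeError` raised when nothing matches is also an assumption.

```python
from typing import Any


class DictLikeIO:
    priority = 1000  # Same value as DictIO in the table.

    @classmethod
    def handles_type(cls, arg: Any) -> bool:
        return isinstance(arg, dict)


class SequenceLikeIO:
    priority = 1100  # Same value as SequenceIO in the table.

    @classmethod
    def handles_type(cls, arg: Any) -> bool:
        return isinstance(arg, (list, tuple))


class CustomListIO:
    priority = 10_000  # Default priority for new implementations.

    @classmethod
    def handles_type(cls, arg: Any) -> bool:
        return isinstance(arg, list)


def resolve_io_sketch(arg: Any, implementations: list) -> type:
    """Return the highest-priority implementation that accepts 'arg'."""
    ranked = sorted(implementations, key=lambda cls: cls.priority, reverse=True)
    for io in ranked:
        if io.handles_type(arg):
            return io
    # Assumption: the real library may raise a different exception type.
    raise TypeError(f"no implementation handles {type(arg).__name__!r}")


implementations = [DictLikeIO, SequenceLikeIO, CustomListIO]

# The custom class outranks the bundled-style classes for lists ...
print(resolve_io_sketch([1, 2], implementations).__name__)  # prints CustomListIO
# ... but tuples still fall through to the lower-priority sequence handler.
print(resolve_io_sketch((1, 2), implementations).__name__)  # prints SequenceLikeIO
```

In this sketch, a custom implementation meant only as a fallback would need a priority below 1000, so that every bundled implementation is tried first.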