Primer: API implementation#

This notebook reconstructs the Translator showcased in the Translation primer using the API.

[1]:
import sys

import rics

import id_translation

# Print relevant versions
print(f"{id_translation.__version__=}")
print(f"{sys.version=}")
id_translation.__version__='1.0.1.dev1'
sys.version='3.14.0 (main, Oct  7 2025, 16:05:28) [GCC 13.3.0]'
[2]:
rics.configure_stuff()
👻 Configured some stuff just the way I like it!

Translatable data#

[3]:
import pandas as pd

bite_report = pd.read_csv("biting-victims-2019-05-11.csv")
bite_report
[3]:
   human_id  bitten_by
0      1904          1
1      1991          0
2      1991          2
3      1999          0

Name-to-source mapping#

[4]:
from id_translation.mapping import HeuristicScore, Mapper

score_function = HeuristicScore("equality", heuristics=["like_database_table"])
mapper = Mapper(score_function, overrides={"bitten_by": "animals"})

Translation format#

[5]:
translation_format = "[{title}. ]{name} (id={id})[ the {species}]"
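Square brackets in the format mark optional blocks. Judging by the translated output at the end of this notebook, a block is dropped when one of its placeholders is unavailable for a source: humans have no species, and animals have no title. The sketch below imitates that behaviour in plain Python. The render function and the sample values are hypothetical illustrations, not part of id_translation, and the library's own parser may differ in details such as nesting or escaping.

```python
import re


def render(fmt, values):
    """Render fmt, dropping [optional] blocks that use a missing placeholder.

    Hypothetical helper for illustration only; not the id_translation parser.
    """
    # With one capture group, re.split alternates between required text
    # (even indices) and the contents of optional blocks (odd indices).
    parts = re.split(r"\[([^\[\]]*)\]", fmt)
    rendered = []
    for index, part in enumerate(parts):
        is_optional = index % 2 == 1
        try:
            rendered.append(part.format(**values))
        except KeyError:
            if not is_optional:
                raise  # Required placeholders must always be present.
    return "".join(rendered)


fmt = "[{title}. ]{name} (id={id})[ the {species}]"
human = {"id": 1904, "name": "Fred", "title": "Mr"}  # sample values
animal = {"id": 1, "name": "Morris", "species": "dog"}  # sample values

print(render(fmt, human))   # Mr. Fred (id=1904)
print(render(fmt, animal))  # Morris (id=1) the dog
```

A single format can therefore serve both sources, which is why only one translation_format is defined here.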

Mapping#

Define heuristic function#

The heuristic maps id to animal_id when context="animals".

It also rewrites the correctly named id column in humans.csv (to human_id), but this is harmless because the best-scoring match is used.

Create HeuristicScore instance#

This class evaluates the original score function both with and without the given heuristics (just one here), and picks the best score for each candidate.

[6]:
def smurf_column_heuristic(value, candidates, context):
    """Heuristic for matching columns that use the "smurf" convention."""
    return (
        # Drop a trailing plural "s" from the context: "animals" gives "animal_<value>".
        f"{context[:-1]}_{value}" if context[-1] == "s" else f"{context}_{value}",
        candidates,  # unchanged
    )


smurf_score = HeuristicScore("equality", heuristics=[smurf_column_heuristic])
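To see why rewriting id in every source is harmless, the following is a simplified, library-free model of the "with and without heuristics, keep the best score" idea. The equality and best_scores helpers are hypothetical stand-ins written for this illustration; the real HeuristicScore may combine heuristics and scores differently. The column lists are assumptions based on the text above and the fetcher log shown later.

```python
def smurf_column_heuristic(value, candidates, context):
    """Copy of the heuristic above, repeated so this sketch runs on its own."""
    new_value = f"{context[:-1]}_{value}" if context[-1] == "s" else f"{context}_{value}"
    return new_value, candidates


def equality(value, candidates):
    """Toy score function: 1.0 for an exact match, otherwise 0.0."""
    return [1.0 if value == candidate else 0.0 for candidate in candidates]


def best_scores(value, candidates, context, heuristics):
    """Score with and without each heuristic; keep the best per candidate."""
    scores = equality(value, candidates)  # Baseline: no heuristic applied.
    for heuristic in heuristics:
        h_value, h_candidates = heuristic(value, candidates, context)
        h_scores = equality(h_value, h_candidates)
        scores = [max(pair) for pair in zip(scores, h_scores)]
    return scores


heuristics = [smurf_column_heuristic]

# Assumed columns: animals.csv uses the "smurf" convention, humans.csv does not.
animal_columns = ["animal_id", "name", "species"]
human_columns = ["id", "name", "title"]

print(best_scores("id", animal_columns, "animals", heuristics))  # [1.0, 0.0, 0.0]
print(best_scores("id", human_columns, "humans", heuristics))    # [1.0, 0.0, 0.0]
```

For animals, the match comes from the heuristic (id becomes animal_id). For humans, the heuristic produces human_id, which matches nothing, so the unmodified id wins.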

Create fetcher#

[7]:
from id_translation.fetching import PandasFetcher

fetcher = PandasFetcher(
    read_path_format="./sources/{}.csv",
    mapper=Mapper(smurf_score),
)

Moment of truth#

[8]:
from id_translation import Translator

translator = Translator(fetcher, fmt=translation_format, mapper=mapper)
translated_bite_report = translator.translate(bite_report)
translated_bite_report
2025-12-03T23:24:17.709 [id_translation.fetching:INFO] Finished initialization of 'PandasFetcher' in 6 ms: PandasFetcher(sources=['animals', 'humans'])
2025-12-03T23:24:17.711 [id_translation.Translator.map:INFO] Finished mapping of 2/2 names in 'DataFrame' in 1 ms: {'bitten_by': 'animals', 'human_id': 'humans'}.
2025-12-03T23:24:17.716 [id_translation.fetching:INFO] Finished fetching from 2 sources in 4 ms: ['humans' x ('id', 'name', 'title') x 3/3 IDs], ['animals' x ('id', 'name', 'species') x 3/3 IDs].
2025-12-03T23:24:17.719 [id_translation.Translator:INFO] Finished translation of 6 unique IDs (2 names) in 'DataFrame' in 9 ms.
[8]:
   human_id               bitten_by
0  Mr. Fred (id=1904)     Morris (id=1) the dog
1  Mr. Richard (id=1991)  Tarzan (id=0) the cat
2  Mr. Richard (id=1991)  Simba (id=2) the lion
3  Dr. Sofia (id=1999)    Tarzan (id=0) the cat
[9]:
expected = pd.read_csv("biting-victims-2019-05-11-translated.csv")
pd.testing.assert_frame_equal(translated_bite_report, expected)