Fetching data using SqlFetcher#
Translating using a SQL database. This notebook assumes that the Prepare for ``SqlFetcher`` demo-step from the PickleFetcher demo notebook has been completed.
[1]:
import sys
import rics
import id_translation
# Print relevant versions
print(f"{id_translation.__version__=}")
print(f"{sys.version=}")
rics.configure_stuff(id_translation_level="DEBUG")
id_translation.__version__='1.0.1.dev1'
sys.version='3.14.0 (main, Oct 7 2025, 16:05:28) [GCC 13.3.0]'
đź‘» Configured some stuff just the way I like it!
Create translator from config#
Click here to see the file.
[2]:
from id_translation import Translator
translator = Translator.from_config("config.toml")
translator
[2]:
Translator(online=True: fetcher=SqlFetcher('sqlite:///demo.db'))
[3]:
ENGINE = translator.fetcher.engine
Load database#
Using the SqlFetcher engine.
[4]:
from data import load_imdb
for source in ["name.basics", "title.basics"]:
df = load_imdb(source)[0]
df.to_sql(source.replace(".", "_"), ENGINE, if_exists="replace")
2025-12-03T23:27:41.196 [rics.utility.misc.get_local_or_remote:INFO] Local processed file path: '/home/dev/.id-translation/notebooks/cache/clean_and_fix_ids/name.basics.tsv.pkl'.
2025-12-03T23:27:42.476 [rics.utility.misc.get_local_or_remote:INFO] Local processed file path: '/home/dev/.id-translation/notebooks/cache/clean_and_fix_ids/title.basics.tsv.pkl'.
Make some data to translate#
[5]:
import pandas as pd
def first_title(seed=5, n=1000):
df = pd.read_sql("SELECT * FROM name_basics;", ENGINE).sample(n, random_state=seed)
df["firstTitle"] = df["knownForTitles"].str.split(",").str[0]
return df[["nconst", "firstTitle"]]
[6]:
translator.go_offline()
2025-12-03T23:27:42.939 [id_translation.fetching:DEBUG] Metadata for 'sqlite:///demo.db' created in 3 ms.
2025-12-03T23:27:42.940 [id_translation.fetching:INFO] Finished initialization of 'SqlFetcher' in 4 ms: SqlFetcher('sqlite:///demo.db', sources=['name_basics', 'title_basics'])
2025-12-03T23:27:42.941 [id_translation.Translator:DEBUG] Begin going offline with 2 sources provided by: SqlFetcher('sqlite:///demo.db', sources=['name_basics', 'title_basics'])
2025-12-03T23:27:42.942 [id_translation.fetching:DEBUG] Begin fetching all IDs for placeholders=('id', 'name', 'original_name', 'from', 'to') for 2/2: ['name_basics', 'title_basics'].
2025-12-03T23:27:42.943 [id_translation.fetching:DEBUG] Begin mapping of wanted placeholders={'name', 'id', 'from', 'to'} to actual placeholders={'knownForTitles', 'int_id_nconst', 'nconst', 'primaryProfession', 'birthYear', 'primaryName', 'deathYear', 'index'} for source='name_basics'.
2025-12-03T23:27:42.946 [id_translation.fetching.map:DEBUG] Computed 4x8 match scores in context='name_basics' in 51 ÎĽs:
candidates knownForTitles int_id_nconst nconst primaryProfession birthYear primaryName deathYear index
values
name -inf -inf -inf -inf -inf inf -inf -inf
id -inf -inf inf -inf -inf -inf -inf -inf
from -inf -inf -inf -inf inf -inf -inf -inf
to -inf -inf -inf -inf -inf -inf inf -inf
2025-12-03T23:27:42.947 [id_translation.fetching:DEBUG] Finished placeholder mapping for source='name_basics': {'name': 'primaryName', 'id': 'nconst', 'from': 'birthYear', 'to': 'deathYear'}.
2025-12-03T23:27:42.948 [id_translation.fetching:DEBUG] Begin mapping of wanted placeholders={'name', 'id', 'from', 'to'} to actual placeholders={'genres', 'titleType', 'tconst', 'startYear', 'int_id_tconst', 'endYear', 'runtimeMinutes', 'primaryTitle', 'originalTitle', 'isAdult', 'index'} for source='title_basics'.
2025-12-03T23:27:42.950 [id_translation.fetching.map:DEBUG] Computed 4x11 match scores in context='title_basics' in 41 ÎĽs:
candidates genres titleType tconst startYear int_id_tconst endYear runtimeMinutes primaryTitle originalTitle isAdult index
values
name -inf -inf -inf -inf -inf -inf -inf inf -inf -inf -inf
id -inf -inf inf -inf -inf -inf -inf -inf -inf -inf -inf
from -inf -inf -inf inf -inf -inf -inf -inf -inf -inf -inf
to -inf -inf -inf -inf -inf inf -inf -inf -inf -inf -inf
2025-12-03T23:27:42.952 [id_translation.fetching:DEBUG] Finished placeholder mapping for source='title_basics': {'name': 'primaryTitle', 'id': 'tconst', 'from': 'startYear', 'to': 'endYear'}.
2025-12-03T23:27:42.953 [id_translation.fetching:DEBUG] Begin mapping of wanted placeholders={'name', 'to', 'original_name', 'id', 'from'} to actual placeholders={'knownForTitles', 'int_id_nconst', 'nconst', 'primaryProfession', 'birthYear', 'primaryName', 'deathYear', 'index'} for source='name_basics'.
2025-12-03T23:27:42.956 [id_translation.fetching.map:DEBUG] Computed 5x8 match scores in context='name_basics' in 169 ÎĽs:
candidates knownForTitles int_id_nconst nconst primaryProfession birthYear primaryName deathYear index
values
name -inf -inf -inf -inf -inf inf -inf -inf
to -inf -inf -inf -inf -inf -inf inf -inf
original_name 0.00 0.08 -0.00 0.01 -0.00 0.18 0.02 0.02
id -inf -inf inf -inf -inf -inf -inf -inf
from -inf -inf -inf -inf inf -inf -inf -inf
2025-12-03T23:27:42.957 [id_translation.fetching:DEBUG] Finished placeholder mapping for source='name_basics': {'name': 'primaryName', 'from': 'birthYear', 'id': 'nconst', 'to': 'deathYear', 'original_name': None}.
2025-12-03T23:27:42.957 [id_translation.fetching:DEBUG] Begin fetching all IDs from source='name_basics'. Placeholders: ('nconst', 'primaryName', 'birthYear', 'deathYear').
2025-12-03T23:27:43.242 [id_translation.fetching:DEBUG] Finished fetching 202609 IDs from source='name_basics' in 285 ms. Placeholders: ('nconst', 'primaryName', 'birthYear', 'deathYear').
2025-12-03T23:27:43.242 [id_translation.fetching:DEBUG] Begin mapping of wanted placeholders={'name', 'to', 'original_name', 'id', 'from'} to actual placeholders={'genres', 'titleType', 'tconst', 'startYear', 'int_id_tconst', 'endYear', 'runtimeMinutes', 'primaryTitle', 'originalTitle', 'isAdult', 'index'} for source='title_basics'.
2025-12-03T23:27:43.244 [id_translation.fetching.map:DEBUG] Computed 5x11 match scores in context='title_basics' in 32 ÎĽs:
candidates genres titleType tconst startYear int_id_tconst endYear runtimeMinutes primaryTitle originalTitle isAdult index
values
name -inf -inf -inf -inf -inf -inf -inf inf -inf -inf -inf
to -inf -inf -inf -inf -inf inf -inf -inf -inf -inf -inf
original_name -inf -inf -inf -inf -inf -inf -inf -inf inf -inf -inf
id -inf -inf inf -inf -inf -inf -inf -inf -inf -inf -inf
from -inf -inf -inf inf -inf -inf -inf -inf -inf -inf -inf
2025-12-03T23:27:43.245 [id_translation.fetching:DEBUG] Finished placeholder mapping for source='title_basics': {'name': 'primaryTitle', 'to': 'endYear', 'original_name': 'originalTitle', 'id': 'tconst', 'from': 'startYear'}.
2025-12-03T23:27:43.245 [id_translation.fetching:DEBUG] Begin fetching all IDs from source='title_basics'. Placeholders: ('tconst', 'primaryTitle', 'originalTitle', 'startYear', 'endYear').
2025-12-03T23:27:43.378 [id_translation.fetching:DEBUG] Finished fetching 67334 IDs from source='title_basics' in 132 ms. Placeholders: ('tconst', 'primaryTitle', 'originalTitle', 'startYear', 'endYear').
2025-12-03T23:27:43.378 [id_translation.fetching:INFO] Finished fetching all IDs from 2/2 sources in 437 ms: ['name_basics' x ('id', 'name', 'from', 'to') x 202609 IDs], ['title_basics' x ('id', 'name', 'original_name', 'from', 'to') x 67334 IDs].
2025-12-03T23:27:43.379 [id_translation.fetching:DEBUG] Dispose 'sqlite:///demo.db'.
2025-12-03T23:27:43.380 [id_translation.Translator:INFO] Went offline with 2 sources in 444 ms: TranslationMap('name_basics': 202609 IDs, 'title_basics': 67334 IDs).
[6]:
Translator(online=False: cache=TranslationMap('name_basics': 202609 IDs, 'title_basics': 67334 IDs))
Get the name and the “first” appearance for actors#
In the IMDb list anyway. I have no idea how they’re ordered in “knownForTitles”.
[7]:
df = first_title()
df.head()
[7]:
| nconst | firstTitle | |
|---|---|---|
| 119545 | nm0852487 | tt0020182 |
| 162954 | nm1707684 | tt1369730 |
| 15831 | nm0102912 | tt0076843 |
| 99772 | nm0706681 | tt0072992 |
| 29547 | nm0201371 | tt0025323 |
Translate#
[8]:
translator.translate(df).head(5)
2025-12-03T23:27:43.892 [id_translation.dio:DEBUG] Using rank-0 (priority=1999) implementation 'id_translation.dio.integration.pandas.PandasIO' for translatable of type='pandas.DataFrame'.
2025-12-03T23:27:43.893 [id_translation.Translator:DEBUG] Begin translation of 'DataFrame'-type data. Names to translate: Derive based on type.
2025-12-03T23:27:43.894 [id_translation.Translator:DEBUG] Name extraction complete. Found names=['nconst', 'firstTitle'] for 'DataFrame'-type data.
2025-12-03T23:27:43.895 [id_translation.Translator.map:DEBUG] Begin name-to-source mapping of names=['nconst', 'firstTitle'] in 'DataFrame' against sources=['name_basics', 'title_basics'].
2025-12-03T23:27:43.896 [id_translation.Translator.map:DEBUG] Computed 2x2 match scores in context=None in 28 ÎĽs:
candidates name_basics title_basics
values
nconst inf -inf
firstTitle -inf inf
2025-12-03T23:27:43.897 [id_translation.Translator.map:INFO] Finished mapping of 2/2 names in 'DataFrame' in 3 ms: {'nconst': 'name_basics', 'firstTitle': 'title_basics'}.
2025-12-03T23:27:44.002 [id_translation.Translator:DEBUG] Failed to translate 888/1000 (88.80% <= max_fails=100.00%) of IDs for name='firstTitle' using source='title_basics'. Sample IDs: ['tt0020182', 'tt1369730', 'tt0076843', 'tt0072992', 'tt0025323', 'tt0055403', 'tt0433937', 'tt0232531', 'tt0010680', 'tt0497057'].
2025-12-03T23:27:44.111 [id_translation.Translator:DEBUG] Performance counters [ms]: {'map': 3.122, 'verify': 105.437, 'insert': 101.649, 'overhead': 8.921}
2025-12-03T23:27:44.112 [id_translation.Translator:INFO] Finished translation of 1979 unique IDs (2 names) in 'DataFrame' in 219 ms.
[8]:
| nconst | firstTitle | |
|---|---|---|
| 119545 | nm0852487:Jack Taylor *1896†1932 | tt0020182 not translated; default name=Title unknown |
| 162954 | nm1707684:Blaze Aleksoski *1933†2015 | tt1369730 not translated; default name=Title unknown |
| 15831 | nm0102912:Saax Bradbury *1943†1976 | tt0076843 not translated; default name=Title unknown |
| 99772 | nm0706681:Poul Rahbek *1911†1987 | tt0072992 not translated; default name=Title unknown |
| 29547 | nm0201371:Jean Darling *1922†2015 | tt0025323 not translated; default name=Title unknown |
[ ]: