Sakila DVD Rental Database#
This example translates a query against the DVD Rental Sample Database. Click here
to download.
Start the database#
Using Docker, start the database by running:
docker run -p 5002:5432 --rm rsundqvist/sakila-preload:postgres
For details about this image, see https://hub.docker.com/r/rsundqvist/sakila-preload.
# Credentials
CONNECTION_STRING = "postgresql+pg8000://postgres:{password}@localhost:5002/sakila"
PASSWORD = "Sofia123!"
Query#
1SELECT customer_id,
2 inventory.film_id,
3 film_category.category_id,
4 staff_id,
5 rental_date,
6 return_date
7FROM rental
8 LEFT JOIN inventory ON rental.inventory_id = inventory.inventory_id
9 LEFT JOIN film_category ON film_category.film_id = inventory.film_id
10ORDER BY rental.rental_id DESC;
The query above will tell us who rented what and when, what store they rented from and from whom.
# Download data to translate
url = CONNECTION_STRING.format(password=PASSWORD)
with (
open("query.sql") as query,
sqlalchemy.create_engine(url).connect() as con,
):
df = pandas.read_sql(query.read(), con)
sample = df.sample(10)
print(sample)
customer_id |
film_id |
category_id |
staff_id |
rental_date |
return_date |
|
|---|---|---|---|---|---|---|
708 |
313 |
797 |
12 |
1 |
2005-08-22 20:55:56 |
2005-08-27 18:52:56 |
1486 |
107 |
767 |
12 |
1 |
2005-08-21 16:22:59 |
2005-08-22 21:15:59 |
1833 |
54 |
7 |
5 |
2 |
2005-08-21 04:34:11 |
2005-08-27 10:30:11 |
4649 |
565 |
454 |
12 |
1 |
2005-08-02 18:56:28 |
2005-08-11 23:19:28 |
5023 |
419 |
487 |
4 |
2 |
2005-08-02 05:39:12 |
2005-08-08 00:09:12 |
7359 |
381 |
489 |
2 |
1 |
2005-07-29 16:31:32 |
2005-07-30 12:39:32 |
10609 |
26 |
773 |
15 |
2 |
2005-07-09 21:32:29 |
2005-07-15 00:27:29 |
12460 |
493 |
134 |
2 |
1 |
2005-07-06 04:24:42 |
2005-07-09 02:37:42 |
13654 |
261 |
479 |
8 |
1 |
2005-06-18 15:34:18 |
2005-06-19 16:22:18 |
13703 |
20 |
994 |
13 |
2 |
2005-06-18 11:46:26 |
2005-06-22 11:37:26 |
Configuration files#
The database has a few quirks, which are managed by configuration. See the Configuration page to learn more about config files.
################################################################################
# For help, see https://id-translation.readthedocs.io #
################################################################################
[translator]
fmt = "{id}:{name}[ {last_name}]"
# ------------------------------------------------------------------------------
# Name-to-source mapping configuration. Binds names to source, eg 'cute_animals'
# -> 'my_database.animals'. Overrides take precedence over scoring logic.
[translator.mapping]
on_unmapped = "raise"
score_function = "modified_hamming"
[[translator.mapping.score_function_heuristics]]
function = "like_database_table"
[[translator.mapping.filter_functions]]
function = "filter_names"
regex = ".*_id$"
remove = false # This is the default (like the built-in filter).
################################################################################
# Parallel fetching configuration.
################################################################################
[fetching.MultiFetcher]
max_workers = 2
################################################################################
# Fetching configuration.
################################################################################
[fetching.MemoryFetcher.data.category]
id = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]
name = [
"Action", "Animation",
"Children", "Classics", "Comedy",
"Documentary", "Drama",
"Family", "Foreign",
"Games", "Horror", "Music", "New",
"Sci-Fi", "Sports", "Travel"
]
[fetching.SqlFetcher]
connection_string = "${DVDRENTAL_CONNECTION_STRING}"
password = "${DVDRENTAL_PASSWORD}"
blacklist_tables = [
"category", # Let's pretend this table is off-limits
"store", "rental", "inventory", "payment", # Tables without name columns
]
# ------------------------------------------------------------------------------
# Placeholder mapping configuration. Binds actual names in sources (such as
# column names in an SQL table) to wanted names, eg id = 'animal_id'.
[fetching.mapping]
on_unmapped = "ignore"
[[fetching.mapping.filter_functions]]
function = "filter_sources"
regex = ".*p2007.*"
remove = true
[[fetching.mapping.score_function_heuristics]]
function = "value_fstring_alias"
fstring = "first_{value}"
[[fetching.mapping.score_function_heuristics]]
function = "value_fstring_alias"
fstring = "{context}_{value}"
[[fetching.mapping.score_function_heuristics]]
function = "value_fstring_alias"
fstring = "{context}"
for_value = "name"
[fetching.mapping.overrides.address]
code = "postal_code"
[fetching.mapping.overrides.film]
name = "title"
To create a Translator, pass the configuration files to Translator.from_config().
# Create a Translator
os.environ.update(
DVDRENTAL_CONNECTION_STRING=CONNECTION_STRING,
DVDRENTAL_PASSWORD=PASSWORD,
)
translator = Translator.from_config(
"translation.toml",
extra_fetchers=["sql-fetcher.toml"],
)
print(translator)
Translator.#Translator(online=True: fetcher=MultiFetcher(max_workers=2, fetchers=[
MemoryFetcher(sources=['category'])
SqlFetcher(
Engine(postgresql+pg8000://postgres:***@localhost:5002/sakila),
blacklist={'category', 'store', 'inventory', 'rental', 'payment'}
)
]))
Translating#
Date columns should not be translated, so let’s make sure.
print(translator.map(df))
Translator.map().#{
'customer_id': 'customer',
'film_id': 'film',
'category_id': 'category',
'staff_id': 'staff',
}
Result#
All that’s left now is to translate the data.
translator.translate(df, copy=False)
print(df.loc[sample.index])
customer_id |
film_id |
category_id |
staff_id |
rental_date |
return_date |
|
|---|---|---|---|---|---|---|
708 |
313:DONALD MAHON |
797:SILENCE KANE |
12:Music |
1:Mike Hillyer |
2005-08-22 20:55:56 |
2005-08-27 18:52:56 |
1486 |
107:FLORENCE WOODS |
767:SCALAWAG DUCK |
12:Music |
1:Mike Hillyer |
2005-08-21 16:22:59 |
2005-08-22 21:15:59 |
1833 |
54:TERESA ROGERS |
7:AIRPLANE SIERRA |
5:Comedy |
2:Jon Stephens |
2005-08-21 04:34:11 |
2005-08-27 10:30:11 |
4649 |
565:JAIME NETTLES |
454:IMPACT ALADDIN |
12:Music |
1:Mike Hillyer |
2005-08-02 18:56:28 |
2005-08-11 23:19:28 |
5023 |
419:CHAD CARBONE |
487:JINGLE SAGEBRUSH |
4:Classics |
2:Jon Stephens |
2005-08-02 05:39:12 |
2005-08-08 00:09:12 |
7359 |
381:BOBBY BOUDREAU |
489:JUGGLER HARDLY |
2:Animation |
1:Mike Hillyer |
2005-07-29 16:31:32 |
2005-07-30 12:39:32 |
10609 |
26:JESSICA HALL |
773:SEABISCUIT PUNK |
15:Sports |
2:Jon Stephens |
2005-07-09 21:32:29 |
2005-07-15 00:27:29 |
12460 |
493:BRENT HARKINS |
134:CHAMPION FLATLINERS |
2:Animation |
1:Mike Hillyer |
2005-07-06 04:24:42 |
2005-07-09 02:37:42 |
13654 |
261:DEANNA BYRD |
479:JEDI BENEATH |
8:Family |
1:Mike Hillyer |
2005-06-18 15:34:18 |
2005-06-19 16:22:18 |
13703 |
20:SHARON ROBINSON |
994:WYOMING STORM |
13:New |
2:Jon Stephens |
2005-06-18 11:46:26 |
2005-06-22 11:37:26 |