Sakila DVD Rental Database#

This example translates a query against the DVD Rental Sample Database. Click here to download.

Start the database#

Using Docker, start the database by running:

docker run -p 5002:5432 --rm rsundqvist/sakila-preload:postgres

For details about this image, see https://hub.docker.com/r/rsundqvist/sakila-preload.

# Credentials
CONNECTION_STRING = "postgresql+pg8000://postgres:{password}@localhost:5002/sakila"
PASSWORD = "Sofia123!"

Query#

 1SELECT customer_id,
 2       inventory.film_id,
 3       film_category.category_id,
 4       staff_id,
 5       rental_date,
 6       return_date
 7FROM rental
 8         LEFT JOIN inventory ON rental.inventory_id = inventory.inventory_id
 9         LEFT JOIN film_category ON film_category.film_id = inventory.film_id
10ORDER BY rental.rental_id DESC;

The query above will tell us who rented what and when, what store they rented from and from whom.

# Download data to translate
url = CONNECTION_STRING.format(password=PASSWORD)
with (
    open("query.sql") as query,
    sqlalchemy.create_engine(url).connect() as con,
):
    df = pandas.read_sql(query.read(), con)

sample = df.sample(10)
print(sample)
Randomly sampled rows from the query. The first column is the record index in the query.#

customer_id

film_id

category_id

staff_id

rental_date

return_date

708

313

797

12

1

2005-08-22 20:55:56

2005-08-27 18:52:56

1486

107

767

12

1

2005-08-21 16:22:59

2005-08-22 21:15:59

1833

54

7

5

2

2005-08-21 04:34:11

2005-08-27 10:30:11

4649

565

454

12

1

2005-08-02 18:56:28

2005-08-11 23:19:28

5023

419

487

4

2

2005-08-02 05:39:12

2005-08-08 00:09:12

7359

381

489

2

1

2005-07-29 16:31:32

2005-07-30 12:39:32

10609

26

773

15

2

2005-07-09 21:32:29

2005-07-15 00:27:29

12460

493

134

2

1

2005-07-06 04:24:42

2005-07-09 02:37:42

13654

261

479

8

1

2005-06-18 15:34:18

2005-06-19 16:22:18

13703

20

994

13

2

2005-06-18 11:46:26

2005-06-22 11:37:26

Configuration files#

The database has a few quirks, which are managed by configuration. See the Configuration page to learn more about config files.

Translation configuration, mapping, and definition of the categories.#
################################################################################
# For help, see https://id-translation.readthedocs.io                          #
################################################################################
[translator]
fmt = "{id}:{name}[ {last_name}]"

# ------------------------------------------------------------------------------
# Name-to-source mapping configuration. Binds names to source, eg 'cute_animals'
# -> 'my_database.animals'. Overrides take precedence over scoring logic.
[translator.mapping]
on_unmapped = "raise"
score_function = "modified_hamming"

[[translator.mapping.score_function_heuristics]]
function = "like_database_table"

[[translator.mapping.filter_functions]]
function = "filter_names"
regex = ".*_id$"
remove = false  # This is the default (like the built-in filter).

################################################################################
# Parallel fetching configuration.
################################################################################
[fetching.MultiFetcher]
max_workers = 2

################################################################################
# Fetching configuration.
################################################################################
[fetching.MemoryFetcher.data.category]
id = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]
name = [
    "Action", "Animation",
    "Children", "Classics", "Comedy",
    "Documentary", "Drama",
    "Family", "Foreign",
    "Games", "Horror", "Music", "New",
    "Sci-Fi", "Sports", "Travel"
]
Configuration for fetching SQL data.#
[fetching.SqlFetcher]
connection_string = "${DVDRENTAL_CONNECTION_STRING}"
password = "${DVDRENTAL_PASSWORD}"
blacklist_tables = [
    "category",  # Let's pretend this table is off-limits
    "store", "rental", "inventory", "payment",  # Tables without name columns
]

# ------------------------------------------------------------------------------
# Placeholder mapping configuration. Binds actual names in sources (such as
# column names in an SQL table) to wanted names, eg id = 'animal_id'.
[fetching.mapping]
on_unmapped = "ignore"

[[fetching.mapping.filter_functions]]
function = "filter_sources"
regex = ".*p2007.*"
remove = true

[[fetching.mapping.score_function_heuristics]]
function = "value_fstring_alias"
fstring = "first_{value}"
[[fetching.mapping.score_function_heuristics]]
function = "value_fstring_alias"
fstring = "{context}_{value}"
[[fetching.mapping.score_function_heuristics]]
function = "value_fstring_alias"
fstring = "{context}"
for_value = "name"

[fetching.mapping.overrides.address]
code = "postal_code"
[fetching.mapping.overrides.film]
name = "title"

To create a Translator, pass the configuration files to Translator.from_config().

# Create a Translator
os.environ.update(
    DVDRENTAL_CONNECTION_STRING=CONNECTION_STRING,
    DVDRENTAL_PASSWORD=PASSWORD,
)
translator = Translator.from_config(
    "translation.toml",
    extra_fetchers=["sql-fetcher.toml"],
)
print(translator)
String representation of the Translator.#
Translator(online=True: fetcher=MultiFetcher(max_workers=2, fetchers=[
  MemoryFetcher(sources=['category'])
  SqlFetcher(
    Engine(postgresql+pg8000://postgres:***@localhost:5002/sakila),
    blacklist={'category', 'store', 'inventory', 'rental', 'payment'}
  )
]))

Translating#

Date columns should not be translated, so let’s make sure.

print(translator.map(df))
{
    'customer_id': 'customer',
    'film_id': 'film',
    'category_id': 'category',
    'staff_id': 'staff',
}

Result#

All that’s left now is to translate the data.

translator.translate(df, copy=False)
print(df.loc[sample.index])
Translated data.#

customer_id

film_id

category_id

staff_id

rental_date

return_date

708

313:DONALD MAHON

797:SILENCE KANE

12:Music

1:Mike Hillyer

2005-08-22 20:55:56

2005-08-27 18:52:56

1486

107:FLORENCE WOODS

767:SCALAWAG DUCK

12:Music

1:Mike Hillyer

2005-08-21 16:22:59

2005-08-22 21:15:59

1833

54:TERESA ROGERS

7:AIRPLANE SIERRA

5:Comedy

2:Jon Stephens

2005-08-21 04:34:11

2005-08-27 10:30:11

4649

565:JAIME NETTLES

454:IMPACT ALADDIN

12:Music

1:Mike Hillyer

2005-08-02 18:56:28

2005-08-11 23:19:28

5023

419:CHAD CARBONE

487:JINGLE SAGEBRUSH

4:Classics

2:Jon Stephens

2005-08-02 05:39:12

2005-08-08 00:09:12

7359

381:BOBBY BOUDREAU

489:JUGGLER HARDLY

2:Animation

1:Mike Hillyer

2005-07-29 16:31:32

2005-07-30 12:39:32

10609

26:JESSICA HALL

773:SEABISCUIT PUNK

15:Sports

2:Jon Stephens

2005-07-09 21:32:29

2005-07-15 00:27:29

12460

493:BRENT HARKINS

134:CHAMPION FLATLINERS

2:Animation

1:Mike Hillyer

2005-07-06 04:24:42

2005-07-09 02:37:42

13654

261:DEANNA BYRD

479:JEDI BENEATH

8:Family

1:Mike Hillyer

2005-06-18 15:34:18

2005-06-19 16:22:18

13703

20:SHARON ROBINSON

994:WYOMING STORM

13:New

2:Jon Stephens

2005-06-18 11:46:26

2005-06-22 11:37:26