id_translation.mapping.filter_functions

Functions that return a subset of candidates with which to continue the matching procedure.

Mapping of the current value is aborted if an empty set is returned. Functions such as filter_names() and filter_sources() use this to allow (or disallow) names and sources that match a given regex pattern.
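
The abort-on-empty contract described above can be sketched as follows. This is a minimal illustration and not the library's implementation: apply_filters and only_ids are hypothetical names, and the real matching procedure involves more steps than shown here.

```python
import re
from typing import Callable, Iterable, Optional, Set

# Hypothetical alias: a filter takes (value, candidates, context) and
# returns the subset of candidates with which to continue.
FilterFn = Callable[[str, Set[str], Optional[str]], Set[str]]


def only_ids(value: str, candidates: Set[str], context: Optional[str]) -> Set[str]:
    """Hypothetical filter: keep all candidates only if value ends with '_id'."""
    return set(candidates) if re.search(r"_id$", value) else set()


def apply_filters(
    value: str,
    candidates: Set[str],
    context: Optional[str],
    filters: Iterable[FilterFn],
) -> Set[str]:
    """Run filters in order and stop as soon as no candidates remain."""
    remaining = set(candidates)
    for fn in filters:
        remaining = fn(value, remaining, context)
        if not remaining:
            break  # Empty set: mapping of this value is aborted.
    return remaining


sources = {"employees", "countries", "orders"}
kept = apply_filters("employee_id", sources, None, [only_ids])
aborted = apply_filters("comment", sources, None, [only_ids])
```

Here kept contains every source, while aborted is empty because "comment" does not look like an ID column.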

Functions

filter_names(value, candidates, context, regex)

Filter names to translate based on regex.

filter_placeholders(value, candidates, ...)

Filter placeholders, as they appear in the source given by context, based on regex.

filter_sources(value, candidates, context, regex)

Filter sources based on regex.

filter_names(value, candidates, context, regex, remove=False, *, task_id=None)

Filter names to translate based on regex.

Analogous to the built-in filter() function, filter_names keeps only the names (value) that match the given regex. Setting the remove flag to True reverses this behavior, so that matching names are rejected instead.

Parameters:
  • value – A name that should be mapped to one of the sources in candidates.

  • candidates – Candidate sources.

  • context – Should be None. Always ignored, exists for compatibility.

  • regex – A regex pattern. Will be matched against the value.

  • remove – If True, remove matching values.

  • task_id – Used for logging.

Returns:

The original candidates if value matches the given regex (or, when remove is True, if it does not match). An empty set otherwise.

Examples

Ensuring that untranslatable IDs are left as-is.

>>> sources = {"employees", "countries", "orders"}
>>> name = "employee_id"
>>> allowed = filter_names(
...     name,
...     candidates=sources,
...     context=None,
...     regex=".*_id$",
... )
>>> sorted(allowed)
['countries', 'employees', 'orders']

The call above kept the ‘employee_id’ name (by returning all candidate sources).
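
The effect of the remove flag on this all-or-nothing result can be sketched with a stand-in function. keep_or_drop_all is a hypothetical helper, not part of id_translation, and the use of re.match is an assumption; the library's actual matching mode and flags may differ.

```python
import re
from typing import Set


def keep_or_drop_all(
    text: str, candidates: Set[str], regex: str, remove: bool = False
) -> Set[str]:
    """Hypothetical stand-in: return all candidates or none, based on text."""
    # Assumption: matching from the start of the string via re.match.
    matched = re.match(regex, text) is not None
    keep = matched != remove  # remove=True inverts the decision.
    return set(candidates) if keep else set()


sources = {"employees", "countries", "orders"}

# Default: a matching name keeps every candidate source.
kept = keep_or_drop_all("employee_id", sources, ".*_id$")

# remove=True: a matching name is rejected, so nothing is returned.
rejected = keep_or_drop_all("employee_id", sources, ".*_id$", remove=True)

# remove=True: a non-matching name passes through untouched.
passed = keep_or_drop_all("comment", sources, ".*_id$", remove=True)
```

Under these assumptions, kept and passed both equal the full set of sources, and rejected is empty.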

filter_placeholders(value, candidates, context, regex, remove=False, task_id=None)

Filter placeholders, as they appear in the source given by context, based on regex.

Parameters:
  • value – Target placeholder. Always ignored, exists for compatibility.

  • candidates – Available placeholders in the source named by context.

  • context – The source to which the candidates belong.

  • regex – A regex pattern. Will be matched against elements of the candidates.

  • remove – If True, remove matching values.

  • task_id – Used for logging.

Returns:

Placeholders that may be used.

Examples

Removing irrelevant but possibly confusing columns.

>>> actual_placeholders = {"id", "name", "old_id", "previous_id"}
>>> allowed = filter_placeholders(
...     value="ignored",
...     candidates=actual_placeholders,
...     context="ignored",
...     regex="^(old|previous).*",
...     remove=True,
... )
>>> sorted(allowed)
['id', 'name']
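
Unlike the all-or-nothing filters, this function tests each candidate individually. A minimal sketch of that element-wise behavior is shown below. filter_elements is a hypothetical helper written for illustration, and matching with re.match is an assumption about how the pattern is applied.

```python
import re
from typing import Iterable, Set


def filter_elements(
    candidates: Iterable[str], regex: str, remove: bool = False
) -> Set[str]:
    """Hypothetical stand-in: keep (or remove) each candidate that matches regex."""
    pattern = re.compile(regex)
    # A candidate survives when its match status differs from the remove flag.
    return {c for c in candidates if (pattern.match(c) is not None) != remove}


placeholders = {"id", "name", "old_id", "previous_id"}

# remove=False keeps only the matching placeholders.
legacy_only = filter_elements(placeholders, "^(old|previous).*")

# remove=True drops them instead, mirroring the documented example.
current_only = filter_elements(placeholders, "^(old|previous).*", remove=True)
```

With these inputs, legacy_only holds "old_id" and "previous_id", and current_only holds "id" and "name".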

filter_sources(value, candidates, context, regex, remove=False, *, task_id=None)

Filter sources based on regex.

Analogous to the built-in filter() function, filter_sources keeps only the sources (context) that match the given regex. Setting the remove flag to True reverses this behavior, so that matching sources are rejected instead.

Parameters:
  • value – Target placeholder.

  • candidates – Available placeholders in the source named by context. Always ignored, exists for compatibility.

  • context – The source to which the candidates belong.

  • regex – A regex pattern. Will be matched against the context.

  • remove – If True, remove matching values.

  • task_id – Used for logging.

Returns:

The original candidates if context matches the given regex (or, when remove is True, if it does not match). An empty set otherwise.

Examples

Avoiding uninteresting sources (for ID translation purposes).

>>> source = "some_metadata_table"
>>> allowed = filter_sources(
...     "id",
...     candidates={"id", "name", "some_other_column"},
...     context=source,
...     regex=".*metadata.*",
...     remove=True,
... )
>>> len(allowed)
0

The call above filtered out the ‘some_metadata_table’ source (by removing all candidates).

Notes

Returns immediately if value != ‘id’, to avoid unnecessary work. The