id_translation.mapping.filter_functions

Functions that return a subset of candidates with which to continue the matching procedure.

Functions

filter_names(value, candidates, context, regex)

Filter names to translate based on regex.

filter_placeholders(value, candidates, ...)

Filter placeholders, as they appear in the source given by context, based on regex.

filter_sources(value, candidates, context, regex)

Filter sources based on regex.

filter_names(value, candidates, context, regex, remove=False)

Filter names to translate based on regex.

Analogous to the built-in filter() function, filter_names keeps only the names that match the given regex. Setting the remove flag to True reverses this behavior.

Parameters:
  • value – A name that should be mapped to one of the sources in candidates.

  • candidates – Candidate sources.

  • context – Should be None. Always ignored, exists for compatibility.

  • regex – A regex pattern. Will be matched against the value.

  • remove – If True, remove matching values.

Returns:

The original candidates if value matches the given regex. An empty set, otherwise.

Examples

Ensuring that untranslatable IDs are left as-is.

>>> candidates, context = {"id", "name", "birth_date"}, None
>>> value = "employee_id"
>>> allowed = filter_names(
...     value,
...     candidates,
...     context,
...     regex=".*_id$",
...     remove=False,  # This is the default (like the built-in filter).
... )
>>> sorted(allowed)
['birth_date', 'id', 'name']

The expression used selects names that end with ‘_id’.
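
To make the effect of the remove flag concrete, the following is a minimal sketch rather than the library's implementation. sketch_filter_names is a hypothetical stand-in written only from the behavior documented above, and the use of re.search is an assumption.

```python
import re


def sketch_filter_names(value, candidates, context, regex, remove=False):
    """Hypothetical stand-in that mirrors only the documented behavior."""
    # Assumption: the pattern is applied to the value with re.search.
    matched = re.search(regex, value) is not None
    # remove=False keeps candidates on a match; remove=True reverses this.
    keep = matched != remove
    return set(candidates) if keep else set()


candidates = {"id", "name", "birth_date"}
kept = sketch_filter_names("employee_id", candidates, None, regex=".*_id$")
dropped = sketch_filter_names(
    "employee_id", candidates, None, regex=".*_id$", remove=True
)
print(sorted(kept))     # ['birth_date', 'id', 'name']
print(sorted(dropped))  # []
```

With remove=True the same name and pattern yield an empty set, so a name ending in '_id' would have no sources left to map to.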

filter_placeholders(value, candidates, context, regex, remove=False)

Filter placeholders, as they appear in the source given by context, based on regex.

Parameters:
  • value – Target placeholder. Always ignored, exists for compatibility.

  • candidates – Available placeholders in the source named by context.

  • context – The source to which the candidates belong.

  • regex – A regex pattern. Will be matched against elements of the candidates.

  • remove – If True, remove matching values.

Returns:

Placeholders that may be used.

Examples

Removing irrelevant but possibly confusing columns.

>>> value, context = "ignored", "ignored"
>>> candidates = {"id", "name", "old_id", "previous_id"}
>>> allowed = filter_placeholders(
...     value,
...     candidates,
...     context,
...     regex="^(old|previous).*",
...     remove=True,
... )
>>> sorted(allowed)
['id', 'name']
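
For contrast, the default remove=False should keep only the placeholders that match. The sketch below illustrates this with sketch_filter_placeholders, a hypothetical stand-in derived from the parameter descriptions above and not the library function itself. Matching each candidate with re.search is an assumption.

```python
import re


def sketch_filter_placeholders(value, candidates, context, regex, remove=False):
    """Hypothetical stand-in; value and context are not used for matching."""
    pattern = re.compile(regex)
    # Assumption: each candidate is tested individually with pattern.search.
    return {c for c in candidates if (pattern.search(c) is not None) != remove}


candidates = {"id", "name", "old_id", "previous_id"}
only_legacy = sketch_filter_placeholders(
    "ignored", candidates, "ignored", regex="^(old|previous).*"
)
without_legacy = sketch_filter_placeholders(
    "ignored", candidates, "ignored", regex="^(old|previous).*", remove=True
)
print(sorted(only_legacy))     # ['old_id', 'previous_id']
print(sorted(without_legacy))  # ['id', 'name']
```
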

filter_sources(value, candidates, context, regex, remove=False)

Filter sources based on regex.

Parameters:
  • value – Target placeholder. Returns immediately if value != 'id' to avoid unnecessary work.

  • candidates – Available placeholders in the source named by context. Always ignored, exists for compatibility.

  • context – The source to which the candidates belong.

  • regex – A regex pattern. Will be matched against the context.

  • remove – If True, remove matching values.

Returns:

The original candidates if context does NOT match the given regex. An empty set, otherwise.

Examples

Avoiding uninteresting sources (for ID translation purposes).

>>> value, candidates = "id", {"ignored"}
>>> context = "some_metadata_table"
>>> allowed = filter_sources(
...     "id",
...     candidates,
...     context,
...     regex=".*metadata.*",
...     remove=True,
... )
>>> len(allowed) == 0
True

The expression used filters out sources that contain the word ‘metadata’.
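
The same idea can be applied across several sources. This is a rough sketch under stated assumptions: sketch_filter_sources is a hypothetical stand-in, not the library function. The documentation does not say what the early exit for value != 'id' returns, so returning the candidates unchanged is an assumption, as is the use of re.search.

```python
import re


def sketch_filter_sources(value, candidates, context, regex, remove=False):
    """Hypothetical stand-in that mirrors only the documented behavior."""
    if value != "id":
        # Assumption: the early exit hands back the candidates unchanged.
        return set(candidates)
    # Assumption: the pattern is applied to the source name with re.search.
    matched = re.search(regex, context) is not None
    return set(candidates) if matched != remove else set()


sources = ["customers", "orders", "some_metadata_table"]
usable = [
    source
    for source in sources
    if sketch_filter_sources(
        "id", {"id", "name"}, source, regex=".*metadata.*", remove=True
    )
]
print(usable)  # ['customers', 'orders']
```

A source is treated as usable here when the returned set is non-empty, which matches the len(allowed) == 0 check in the example above.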