id_translation.mapping.filter_functions
Functions that return a subset of candidates with which to continue the matching procedure.
Mapping of the current value is aborted if an empty set is returned. Functions such as filter_names() and
filter_sources() use this to allow (or disallow) names and sources that match a given regex pattern.
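The contract described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the library's implementation: `apply_filters`, `only_known_sources`, and `reject_everything` are hypothetical names, and the real matching procedure may differ.

```python
from typing import Callable, Iterable, Optional, Set

# Hypothetical filter signature: (value, candidates, context) -> subset of candidates.
FilterFn = Callable[[str, Set[str], Optional[str]], Set[str]]


def apply_filters(
    value: str,
    candidates: Iterable[str],
    context: Optional[str],
    filters: Iterable[FilterFn],
) -> Set[str]:
    """Run the filters in order and stop as soon as no candidates remain."""
    remaining = set(candidates)
    for filter_fn in filters:
        remaining = filter_fn(value, remaining, context)
        if not remaining:
            # An empty set means that mapping of this value is aborted.
            return set()
    return remaining


def only_known_sources(value, candidates, context):
    # Illustrative filter: keep candidates that appear in a fixed allow-list.
    return candidates & {"employees", "orders"}


def reject_everything(value, candidates, context):
    # Illustrative filter: always returns an empty set.
    return set()


kept = apply_filters("employee_id", {"employees", "countries"}, None, [only_known_sources])
aborted = apply_filters(
    "employee_id", {"employees"}, None, [only_known_sources, reject_everything]
)
```

Here `kept` contains only `"employees"`, while `aborted` is empty because the second filter returned no candidates.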
Functions
filter_names() | Filter names to translate based on regex.
filter_placeholders() | Filter placeholders, as they appear in the source given by context, based on regex.
filter_sources() | Filter sources based on regex.
- filter_names(value, candidates, context, regex, remove=False, *, task_id=None)
Filter names to translate based on regex.
Analogous to the built-in filter() function, filter_names keeps only the names (value) that match the given regex. This behavior may be reversed by setting the remove flag to True.
- Parameters:
value – A name that should be mapped to one of the sources in candidates.
candidates – Candidate sources.
context – Should be None. Always ignored, exists for compatibility.
regex – A regex pattern. Will be matched against the value.
remove – If True, remove matching values.
task_id – Used for logging.
- Returns:
The original candidates if value matches the given regex. An empty set, otherwise.
Examples
Ensuring that untranslatable IDs are left as-is.
>>> sources = {"employees", "countries", "orders"}
>>> name = "employee_id"
>>> allowed = filter_names(
...     name,
...     candidates=sources,
...     context=None,
...     regex=".*_id$",
... )
>>> sorted(allowed)
['countries', 'employees', 'orders']
The call above kept the ‘employee_id’ name (by returning all candidate sources).
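The effect of the remove flag on a name filter can be sketched as below. `sketch_filter_names` is a hypothetical stand-in written for illustration. It assumes the pattern is applied with `re.match` (anchored at the start of the value), which may not be how the library matches.

```python
import re
from typing import Iterable, Set


def sketch_filter_names(
    value: str, candidates: Iterable[str], regex: str, remove: bool = False
) -> Set[str]:
    # Assumption: match from the start of the value, as re.match does.
    matched = re.match(regex, value) is not None
    # remove=True inverts the decision: matching names are dropped instead of kept.
    keep = matched != remove
    return set(candidates) if keep else set()


sources = {"employees", "countries", "orders"}
kept = sketch_filter_names("employee_id", sources, regex=".*_id$")
dropped = sketch_filter_names("employee_id", sources, regex=".*_id$", remove=True)
```

With the default `remove=False` all candidate sources come back for `"employee_id"`. With `remove=True` the same name yields an empty set, so it would not be translated.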
- filter_placeholders(value, candidates, context, regex, remove=False, task_id=None)
Filter placeholders, as they appear in the source given by context, based on regex.
- Parameters:
value – Target placeholder. Always ignored, exists for compatibility.
candidates – Available placeholders in the source named by context.
context – The source to which the candidates belong.
regex – A regex pattern. Will be matched against elements of the candidates.
remove – If True, remove matching values.
task_id – Used for logging.
- Returns:
Placeholders that may be used.
Examples
Removing irrelevant but possibly confusing columns.
>>> actual_placeholders = {"id", "name", "old_id", "previous_id"}
>>> allowed = filter_placeholders(
...     value="ignored",
...     candidates=actual_placeholders,
...     context="ignored",
...     regex="^(old|previous).*",
...     remove=True,
... )
>>> sorted(allowed)
['id', 'name']
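Unlike the name and source filters, this filter works element by element on the candidates. A minimal sketch of that behavior, assuming `re.match` semantics, follows. `sketch_filter_placeholders` is a hypothetical helper, not the library function.

```python
import re
from typing import Iterable, Set


def sketch_filter_placeholders(
    candidates: Iterable[str], regex: str, remove: bool = False
) -> Set[str]:
    # Assumption: each candidate is tested individually with re.match.
    pattern = re.compile(regex)
    # Keep a candidate when "it matched" differs from "remove matching values".
    return {c for c in candidates if (pattern.match(c) is not None) != remove}


placeholders = {"id", "name", "old_id", "previous_id"}
without_legacy = sketch_filter_placeholders(placeholders, "^(old|previous).*", remove=True)
only_legacy = sketch_filter_placeholders(placeholders, "^(old|previous).*")
```

The same pattern gives complementary subsets depending on the flag: `without_legacy` holds `id` and `name`, and `only_legacy` holds `old_id` and `previous_id`.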
- filter_sources(value, candidates, context, regex, remove=False, *, task_id=None)
Filter sources based on regex.
Analogous to the built-in filter() function, filter_sources keeps only the sources (context) that match the given regex. This behavior may be reversed by setting the remove flag to True.
- Parameters:
value – Target placeholder.
candidates – Available placeholders in the source named by context. Always ignored, exists for compatibility.
context – The source to which the candidates belong.
regex – A regex pattern. Will be matched against the context.
remove – If True, remove matching values.
task_id – Used for logging.
- Returns:
The original candidates if context matches the given regex. An empty set, otherwise.
Examples
Avoiding uninteresting sources (for ID translation purposes).
>>> source = "some_metadata_table"
>>> allowed = filter_sources(
...     "id",
...     candidates={"id", "name", "some_other_column"},
...     context=source,
...     regex=".*metadata.*",
...     remove=True,
... )
>>> len(allowed)
0
The call above filtered out the ‘some_metadata_table’ source (by removing all candidates).
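To show how a source filter might be used across several sources, the sketch below applies a hypothetical `sketch_filter_sources` to each entry of an invented `placeholders_by_source` mapping. Both names, the sample data, and the use of `re.match` are assumptions made for illustration.

```python
import re
from typing import Dict, Iterable, Set


def sketch_filter_sources(
    candidates: Iterable[str], context: str, regex: str, remove: bool = False
) -> Set[str]:
    # Assumption: the pattern is tested against the source name with re.match.
    matched = re.match(regex, context) is not None
    # All candidates survive or none do; the decision depends only on the source.
    return set(candidates) if matched != remove else set()


# Invented sample data: source name -> placeholders available in that source.
placeholders_by_source: Dict[str, Set[str]] = {
    "employees": {"id", "name"},
    "some_metadata_table": {"id", "name", "some_other_column"},
}

usable: Dict[str, Set[str]] = {}
for source, placeholders in placeholders_by_source.items():
    allowed = sketch_filter_sources(placeholders, source, ".*metadata.*", remove=True)
    if allowed:
        # Sources whose candidates were all removed are skipped entirely.
        usable[source] = allowed
```

Only `employees` remains in `usable`, since every candidate of the metadata table was removed.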
Notes
Returns immediately if value != ‘id’, to avoid unnecessary work. The