id_translation.mapping.heuristic_functions

Functions which perform heuristics for score functions.

See also

The HeuristicScore class.

Functions

candidate_fstring_alias(value, candidates, ...)

Return candidates formatted by fstring.

force_lower_case(value, candidates, context)

Force lower-case in value and candidates.

like_database_table(name, tables, context, *)

Normalize name and tables to appear as base-form nouns.

short_circuit(value, candidates, context, *, ...)

Short-circuit value to the target candidate if the target and regex conditions are met.

smurf_columns(placeholder, columns, table, *)

Short-circuit placeholder to a matching smurf column.

value_fstring_alias(value, candidates, ...)

Return a value formatted by fstring.

Classes

NounTransformer([custom])

Naive utility class for transforming nouns to singular form.

class NounTransformer(custom=None)

Bases: object

Naive utility class for transforming nouns to singular form.

This class performs simple heuristics to convert nouns commonly used as database table names. It will quickly break if given nouns that are already in singular form, or that are not trivially convertible to singular form (see PLURAL_TO_SINGULAR_SUFFIXES).

Note

For more complex use cases, consider using a language-processing framework such as inflect (PyPI) instead.

Pass plural_to_singular=<fully-qualified-name> to use your own implementation in any function that accepts a plural_to_singular argument.

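import inflect

_engine = inflect.engine()  # Create the engine once instead of on every call.


def my_transform(plural: str) -> str:
    # singular_noun() returns False when the word is not recognized as a
    # plural, so fall back to returning the input unchanged.
    return _engine.singular_noun(plural) or plural


smurf_columns(..., plural_to_singular="__main__.my_transform")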

Examples

>>> nt = NounTransformer(custom={"geese": "goose"})
>>> nt("city"), nt("cities")
('city', 'city')
>>> nt("country"), nt("countries")
('country', 'country')
>>> nt("geese"), nt("goose"), nt("species")
('goose', 'goose', 'species')

May break when given a noun that is already singular.

>>> nt("bus"), nt("news")
('bu', 'new')

See PLURAL_TO_SINGULAR_SUFFIXES for affected suffixes.

Notes

This is not a Transformer implementation, in spite of the name.

IRREGULARS = {'exercises': 'exercise', 'phases': 'phase', 'species': 'species'}

Known irregular plural-to-singular transformations.

PLURAL_TO_SINGULAR_SUFFIXES = (('ies', 'y'), ('ives', 'ife'), ('ves', 'f'), ('oes', 'o'), ('hes', 'h'), ('ses', 's'), ('xes', 'x'), ('s', ''))

Plural-to-singular suffix mappings.
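
The following is a minimal sketch of how the two tables above could be applied. It is not the library's implementation: to_singular is a hypothetical helper, and both the lookup priority (exact matches before suffix rules) and the first-match-wins suffix order are assumptions. They were chosen because they reproduce the documented NounTransformer examples.

```python
from typing import Mapping, Optional

# Copied from the class attributes documented above.
IRREGULARS = {"exercises": "exercise", "phases": "phase", "species": "species"}
PLURAL_TO_SINGULAR_SUFFIXES = (
    ("ies", "y"), ("ives", "ife"), ("ves", "f"), ("oes", "o"),
    ("hes", "h"), ("ses", "s"), ("xes", "x"), ("s", ""),
)


def to_singular(noun: str, custom: Optional[Mapping[str, str]] = None) -> str:
    """Hypothetical helper; not part of id_translation."""
    known = {**IRREGULARS, **(custom or {})}
    if noun in known:  # Assumption: exact lookups take priority over suffixes.
        return known[noun]
    for plural_suffix, singular_suffix in PLURAL_TO_SINGULAR_SUFFIXES:
        if noun.endswith(plural_suffix):  # Assumption: first match wins.
            return noun[: -len(plural_suffix)] + singular_suffix
    return noun  # No rule applies; return the noun unchanged.
```

Under these assumptions, the catch-all ('s', '') rule is what turns already-singular nouns such as "bus" into "bu", which is the failure mode described in the class documentation.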

candidate_fstring_alias(value, candidates, context, *, fstring, **kwargs)

Return candidates formatted by fstring.

Note

This function modifies the candidates. The value is always returned as-is.

Parameters:
  • value – An element to find matches for. Not used (returned as given).

  • candidates – Potential matches for value.

  • context – Context in which the function is being called.

  • fstring – The format string to use. Can use value, context, and elements of candidates as placeholders.

  • **kwargs – Additional keyword placeholders in fstring.

Returns:

A tuple (value, formatted_candidates).

Raises:

ValueError – If fstring does not contain a placeholder ‘candidate’.
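
As an illustration of the contract described above, here is a small stand-alone sketch. The name candidate_alias_sketch is hypothetical, and the placeholder check is a simplification (a plain substring test for "{candidate}"); the real function may validate the format string differently.

```python
from typing import Any, Iterable, List, Tuple


def candidate_alias_sketch(
    value: str,
    candidates: Iterable[str],
    context: Any,
    *,
    fstring: str,
    **kwargs: Any,
) -> Tuple[str, List[str]]:
    """Hypothetical stand-in that mirrors the documented behavior."""
    # Simplified check; does not handle forms such as "{candidate!r}".
    if "{candidate}" not in fstring:
        raise ValueError("fstring must contain the placeholder 'candidate'")
    formatted = [
        fstring.format(value=value, context=context, candidate=c, **kwargs)
        for c in candidates
    ]
    return value, formatted  # The value is returned as given.
```

For instance, calling the sketch with value="id", candidates ["dog", "cat"] and fstring="{candidate}_{value}" leaves the value untouched and formats each candidate with the "_id" suffix.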

force_lower_case(value, candidates, context)

Force lower-case in value and candidates.

like_database_table(name, tables, context, *, plural_to_singular=True)

Normalize name and tables to appear as base-form nouns.

Parameters:
  • name – A name to find a translation source for.

  • tables – Database tables used as possible translation sources.

  • context – Ignored.

  • plural_to_singular

    Convert plural-form to singular form. Pass a dict to specify custom transformations, backed by the default transformer. See NounTransformer for details. Set to False to disable.

    To use a custom transformer, pass a callable (str) -> str, or the fully qualified name of such a callable. The callable will be resolved using rics.misc.get_by_full_name(), then cached.

Returns:

A tuple (normalized_name, normalized_table_names).

Examples

Remove ID suffixes and convert a variety of plural forms to singular forms.

>>> like_database_table("dog_id", ["dog", "dogs"], None)
('dog', ['dog', 'dog'])
>>> like_database_table("city_ids", ["city", "cities"], None)
('city', ['city', 'city'])
>>> like_database_table("CountryBitmask", ["Country", "COUNTRIES"], None)
('country', ['country', 'country'])

Inputs are coerced to lower case.
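
The plural_to_singular parameter accepts the fully qualified name of a callable. The sketch below shows one way such a name could be resolved and cached using only the standard library. It is a stand-in for illustration, not the rics.misc.get_by_full_name() implementation; resolve_by_full_name is a hypothetical name, and it assumes the callable is a module-level attribute.

```python
import importlib
from functools import lru_cache
from typing import Callable


@lru_cache(maxsize=None)  # Cache so each name is resolved only once.
def resolve_by_full_name(full_name: str) -> Callable[[str], str]:
    """Hypothetical resolver for names such as 'package.module.function'."""
    # Split 'package.module.function' into module path and attribute name.
    module_name, _, attribute = full_name.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, attribute)
```

Any importable (str) -> str callable can be looked up this way; for example, resolve_by_full_name("os.path.basename") returns the standard library function of that name.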

short_circuit(value, candidates, context, *, value_regex, target_candidate, task_id=None)

Short-circuit value to the target candidate if the target and regex conditions are met.

If target_candidate is in candidates and value matches the given value_regex, a single-element set {target_candidate} is returned, which triggers short-circuiting in the calling Mapper. If either condition fails, an empty set is returned and the mapping procedure continues.

Parameters:
  • value – A value to map.

  • candidates – Candidates for value.

  • context – Always ignored, exists for compatibility.

  • value_regex – A pattern to match against value. Case-insensitive by default.

  • target_candidate – The candidate to short-circuit to.

  • task_id – Used for logging.

Returns:

A single-element set {target_candidate}, iff both conditions are met. An empty set otherwise.

Examples

Always match any bite victim-columns to the humans table (see the Translation primer page).

>>> short_circuit(
...     "first_bite_victim",
...     {"humans", "animals"},
...     None,
...     value_regex=".*_bite_victim$",
...     target_candidate="humans",
... )
{'humans'}

Short-circuiting will only trigger if the value_regex matches, and the target_candidate is present.
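
The two conditions can be sketched as follows. This is an assumption-laden illustration rather than the library's code: short_circuit_sketch is a hypothetical name, re.match (anchored at the start of the value) is an assumed matching mode, and the context and task_id arguments are omitted because they do not affect the result.

```python
import re
from typing import Collection, Set


def short_circuit_sketch(
    value: str,
    candidates: Collection[str],
    *,
    value_regex: str,
    target_candidate: str,
) -> Set[str]:
    """Hypothetical stand-in for the documented short-circuit logic."""
    # Case-insensitive matching, as stated for value_regex.
    pattern = re.compile(value_regex, flags=re.IGNORECASE)
    # Assumption: re.match is used; the real function may use another mode.
    if target_candidate in candidates and pattern.match(value):
        return {target_candidate}
    return set()  # Either condition failed; mapping continues as usual.
```

With the inputs from the example above, the sketch returns {'humans'}. Removing "humans" from the candidates, or passing a value that does not end in "_bite_victim", yields an empty set.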

smurf_columns(placeholder, columns, table, *, plural_to_singular=False)

Short-circuit placeholder to a matching smurf column.

The smurf naming convention (or anti-pattern, depending on who you ask) refers to the practice of including the name of the table in the column name, especially for the primary key ID column.

Typical columns one might encounter are country.country_id and cities.city_name. Note that, for the latter match to be made, you must pass plural_to_singular=True | dict.

Special handling is implemented for placeholder="name", which will match when the singular-form table name is found in columns.

Parameters:
  • placeholder – A Format placeholder.

  • columns – The columns of a database table.

  • table – A Translator source table to which the columns (or placeholders) belong.

  • plural_to_singular

    Convert plural-form to singular form. Pass a dict to specify custom transformations, backed by the default transformer. See NounTransformer for details. Set to False to disable.

    To use a custom transformer, pass a callable (str) -> str, or the fully qualified name of such a callable. The callable will be resolved using rics.misc.get_by_full_name(), then cached.

Returns:

A single-element set {column}, iff a match is found. An empty set otherwise.

Examples

Default translation Format ({id}:{name}) placeholders.

>>> smurf_columns("id", ["city_id", "city_name", "city"], "city")
{'city_id'}

Both plural and singular form table names are supported, but the plural-to-singular transformation must be explicitly enabled with plural_to_singular=True.

>>> smurf_columns(
...     "name", ["city_id", "city_name"], "cities", plural_to_singular=True
... )
{'city_name'}

Special handling for placeholder="name" when the table name is also a column.

>>> smurf_columns("name", ["city_id", "city_name", "city"], "city")
{'city'}

As with any short-circuiting function, an empty set is returned when no match is found.

>>> smurf_columns("id", ["dog_id", "bestie_name"], "bad_dogs")
set()

You may add custom mappings for irregular nouns.

>>> smurf_columns(
...     "id", ["goose_id"], "geese", plural_to_singular={"geese": "goose"}
... )
{'goose_id'}

Notes

This function acts similarly to chained calls to value_fstring_alias(), one using fstring="{context}" with for_value="name" and one using fstring="{context}_{value}", but is more powerful since it is able to preprocess the inputs.
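
The matching rules described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the actual implementation: smurf_sketch and its singularize parameter are hypothetical (the real function takes plural_to_singular), and the order of the two checks is inferred from the documented examples.

```python
from typing import Callable, Collection, Optional, Set


def smurf_sketch(
    placeholder: str,
    columns: Collection[str],
    table: str,
    *,
    singularize: Optional[Callable[[str], str]] = None,
) -> Set[str]:
    """Hypothetical stand-in for the documented smurf matching rules."""
    # Optional preprocessing step; disabled by default, like plural_to_singular.
    singular = singularize(table) if singularize else table
    # Special case: placeholder "name" matches a column named after the table.
    if placeholder == "name" and singular in columns:
        return {singular}
    # General case: look for a column named "<table>_<placeholder>".
    smurf_column = f"{singular}_{placeholder}"
    if smurf_column in columns:
        return {smurf_column}
    return set()
```

Passing singularize={"cities": "city"}.get to the sketch reproduces the plural-table example, while leaving it unset reproduces the "bad_dogs" example, where no match is found.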

value_fstring_alias(value, candidates, context, *, fstring, for_value=None, **kwargs)

Return a value formatted by fstring.

Note

This function modifies the value. Candidates are always returned as-is.

Parameters:
  • value – An element to find matches for.

  • candidates – Potential matches for value. Not used (returned as given).

  • context – Context in which the function is being called.

  • fstring – The format string to use. Can use value and context as placeholders.

  • for_value – If given, apply only if value == for_value. When for_value is given, fstring arguments which do not use the value as a placeholder key are permitted.

  • **kwargs – Additional keyword placeholders in fstring.

Returns:

A tuple (formatted_value, candidates).

Raises:

ValueError – If fstring does not contain a placeholder ‘value’ and for_value is not given.

Examples

Keys {value} and {context} are always available.

>>> value_fstring_alias("id", ["dog_id"], "dog", fstring="{context}_{value}")
('dog_id', ['dog_id'])

In cases such as these, consider using smurf_columns() instead, which works both for table="dog" (as above) and, with plural_to_singular enabled, for table="dogs".
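
To make the role of for_value concrete, here is a minimal sketch of the documented behavior. The name value_alias_sketch is hypothetical, the placeholder check is a plain substring test, and returning the value unchanged when value != for_value is an assumption based on the parameter description.

```python
from typing import Any, List, Optional, Tuple


def value_alias_sketch(
    value: str,
    candidates: List[str],
    context: Any,
    *,
    fstring: str,
    for_value: Optional[str] = None,
    **kwargs: Any,
) -> Tuple[str, List[str]]:
    """Hypothetical stand-in that mirrors the documented behavior."""
    # Without for_value, the format string must make use of the value.
    if for_value is None and "{value}" not in fstring:
        raise ValueError("fstring must contain 'value' unless for_value is given")
    # Assumption: values other than for_value pass through unchanged.
    if for_value is not None and value != for_value:
        return value, candidates
    formatted = fstring.format(value=value, context=context, **kwargs)
    return formatted, candidates  # Candidates are returned as given.
```

With fstring="{context}" and for_value="name", the sketch maps the value "name" to the context (for example "city") and leaves every other value alone.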