id_translation.mapping.heuristic_functions
Functions which perform heuristics for score functions.
See also
The HeuristicScore class.
Functions
Return candidates formatted by fstring.
Force lower-case in value and candidates.
Normalize name and tables to appear as base-form nouns.
Short-circuit value to the target candidate if the target and regex conditions are met.
Short-circuit placeholder to a matching smurf column.
Return a value formatted by fstring.
Classes
Naive utility class for transforming nouns to singular form.
- class NounTransformer(custom=None)
Bases: object
Naive utility class for transforming nouns to singular form.
This class performs simple heuristics to convert nouns commonly used as database table names. It will quickly break if given nouns that are either already in singular form or not trivially convertible to singular form (see PLURAL_TO_SINGULAR_SUFFIXES).
Note
For more complex use cases, consider using a language-processing framework such as inflect (PyPI) instead.
Pass plural_to_singular=<fully-qualified-name> to use your implementation in any function that accepts a plural_to_singular argument.
def my_transform(plural: str) -> str:
    import inflect

    p = inflect.engine()
    # singular_noun() returns False if the word is not a plural; fall back to the input.
    return p.singular_noun(plural) or plural

smurf_columns(..., plural_to_singular="__main__.my_transform")
Examples
>>> nt = NounTransformer(custom={"geese": "goose"})
>>> nt("city"), nt("cities")
('city', 'city')
>>> nt("country"), nt("countries")
('country', 'country')
>>> nt("geese"), nt("goose"), nt("species")
('goose', 'goose', 'species')
May break when given a noun that is already singular.
>>> nt("bus"), nt("news")
('bu', 'new')
See PLURAL_TO_SINGULAR_SUFFIXES for affected suffixes.
Notes
This is not a Transformer implementation, in spite of the name.
- IRREGULARS = {'exercises': 'exercise', 'phases': 'phase', 'species': 'species'}
Known irregular plural-to-singular transformations.
- PLURAL_TO_SINGULAR_SUFFIXES = (('ies', 'y'), ('ives', 'ife'), ('ves', 'f'), ('oes', 'o'), ('hes', 'h'), ('ses', 's'), ('xes', 'x'), ('s', ''))
Plural-to-singular suffix mappings.
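The suffix table and the irregular mapping above suggest a simple lookup-then-replace procedure. The following is a minimal sketch of that idea, not the library's actual implementation: the helper name naive_singular is hypothetical, and the first-match-wins ordering is an assumption chosen because it reproduces the documented examples.

```python
from typing import Mapping, Optional

# Values copied from the class attributes documented above.
IRREGULARS = {"exercises": "exercise", "phases": "phase", "species": "species"}
PLURAL_TO_SINGULAR_SUFFIXES = (
    ("ies", "y"), ("ives", "ife"), ("ves", "f"), ("oes", "o"),
    ("hes", "h"), ("ses", "s"), ("xes", "x"), ("s", ""),
)


def naive_singular(noun: str, custom: Optional[Mapping[str, str]] = None) -> str:
    """Hypothetical helper: check known words first, then swap the first matching suffix."""
    known = {**IRREGULARS, **(custom or {})}
    if noun in known:
        return known[noun]
    for plural_suffix, singular_suffix in PLURAL_TO_SINGULAR_SUFFIXES:
        if noun.endswith(plural_suffix):
            # Assumption: the first matching suffix wins, so order matters.
            return noun[: len(noun) - len(plural_suffix)] + singular_suffix
    return noun  # No suffix matched; return the noun unchanged.


print(naive_singular("cities"), naive_singular("knives"), naive_singular("bus"))
```

The last call illustrates the documented failure mode: a noun that is already singular but ends in "s" is truncated.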
- candidate_fstring_alias(value, candidates, context, *, fstring, **kwargs)
Return candidates formatted by fstring.
Note
This function modifies the candidates. The value is always returned as-is.
- Parameters:
value – An element to find matches for. Not used (returned as given).
candidates – Potential matches for value.
context – Context in which the function is being called.
fstring – The format string to use. Can use value, context, and elements of candidates as placeholders.
**kwargs – Additional keyword placeholders in fstring.
- Returns:
A tuple (value, formatted_candidates).
- Raises:
ValueError – If fstring does not contain a placeholder ‘candidate’.
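To make the documented contract concrete, here is a small standalone sketch of the behaviour described above. It is an illustration only: format_candidates is a hypothetical name, and the plain substring check for the placeholder is a simplification that may differ from how the library validates fstring.

```python
from typing import Any, Iterable, List, Tuple


def format_candidates(
    value: str, candidates: Iterable[str], context: Any, *, fstring: str, **kwargs: Any
) -> Tuple[str, List[str]]:
    """Hypothetical stand-in: format every candidate, return the value untouched."""
    if "{candidate}" not in fstring:
        # Simplified check; it does not recognize forms such as "{candidate!r}".
        raise ValueError("fstring must contain the placeholder 'candidate'")
    formatted = [
        fstring.format(value=value, context=context, candidate=candidate, **kwargs)
        for candidate in candidates
    ]
    return value, formatted


print(format_candidates("dog_id", ["id", "name"], "dog", fstring="{context}_{candidate}"))
```

Here the candidates "id" and "name" are rewritten with the context as a prefix, while the value "dog_id" is passed through unchanged.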
- like_database_table(name, tables, context, *, plural_to_singular=True)
Normalize name and tables to appear as base-form nouns.
- Parameters:
name – A name to find a translation source for.
tables – Database tables used as possible translation sources.
context – Ignored.
plural_to_singular – Convert plural form to singular form. Pass a dict to specify custom transformations, backed by the default transformer. See NounTransformer for details. Set to False to disable.
To use a custom transformer, pass a callable (str) -> str, or the fully qualified name of such a callable. The callable will be resolved using rics.misc.get_by_full_name(), then cached.
- Returns:
A tuple (normalized_name, normalized_table_names).
Examples
Remove ID suffixes and convert a variety of plural forms to singular forms.
>>> like_database_table("dog_id", ["dog", "dogs"], None)
('dog', ['dog', 'dog'])
>>> like_database_table("city_ids", ["city", "cities"], None)
('city', ['city', 'city'])
>>> like_database_table("CountryBitmask", ["Country", "COUNTRIES"], None)
('country', ['country', 'country'])
Inputs are coerced to lower case.
- short_circuit(value, candidates, context, *, value_regex, target_candidate, task_id=None)
Short-circuit value to the target candidate if the target and regex conditions are met.
If target_candidate is in candidates and value matches the given value_regex, a single-element set {target_candidate} is returned, which will trigger short-circuiting in the calling Mapper. If either of these conditions fails, an empty set is returned and the mapping procedure will continue.
- Parameters:
value – A value to map.
candidates – Candidates for value.
context – Always ignored, exists for compatibility.
value_regex – A pattern to match against value. Case-insensitive by default.
target_candidate – The candidate to short-circuit to.
task_id – Used for logging.
- Returns:
A single-element set {target_candidate}, iff both conditions are met. An empty set otherwise.
Examples
Always match any bite victim-columns to the humans table (see the Translation primer page).
>>> short_circuit(
...     "first_bite_victim",
...     {"humans", "animals"},
...     None,
...     value_regex=".*_bite_victim$",
...     target_candidate="humans",
... )
{'humans'}
Short-circuiting will only trigger if the value_regex matches, and the target_candidate is present.
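The two conditions can be illustrated with a standalone sketch. This is not the library code: short_circuit_sketch is a hypothetical name, and anchoring the pattern at the start of the value with re.match and the re.IGNORECASE flag is an assumption based on the parameter description above.

```python
import re
from typing import Any, Collection, Set


def short_circuit_sketch(
    value: str,
    candidates: Collection[str],
    context: Any = None,  # Ignored, mirroring the documented signature.
    *,
    value_regex: str,
    target_candidate: str,
) -> Set[str]:
    """Hypothetical stand-in: return {target_candidate} only if both conditions hold."""
    pattern = re.compile(value_regex, flags=re.IGNORECASE)
    if target_candidate in candidates and pattern.match(value):
        return {target_candidate}
    return set()  # Empty set: the regular mapping procedure would continue.


kwargs = {"value_regex": ".*_bite_victim$", "target_candidate": "humans"}
print(short_circuit_sketch("first_bite_victim", {"humans", "animals"}, **kwargs))
print(short_circuit_sketch("first_bite_victim", {"animals"}, **kwargs))
print(short_circuit_sketch("bite_victim_count", {"humans", "animals"}, **kwargs))
```

Only the first call returns a non-empty set. The second fails because the target candidate is absent, and the third because the value does not match the pattern.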
- smurf_columns(placeholder, columns, table, *, plural_to_singular=False)
Short-circuit placeholder to a matching smurf column.
The smurf naming convention (or anti-pattern, depending on who you ask) refers to the practice of including the name of the table in the column name, especially for the primary key ID column.
Typical columns one might encounter are country.country_id and cities.city_name. Note that, for the latter match to be made, you must pass plural_to_singular=True | dict.
Special handling is implemented for placeholder="name", which will match when the singular-form table name is found in columns.
- Parameters:
placeholder – A Format placeholder.
columns – The columns of a database table.
table – A Translator source table to which the columns (or placeholders) belong.
plural_to_singular – Convert plural form to singular form. Pass a dict to specify custom transformations, backed by the default transformer. See NounTransformer for details. Set to False to disable.
To use a custom transformer, pass a callable (str) -> str, or the fully qualified name of such a callable. The callable will be resolved using rics.misc.get_by_full_name(), then cached.
- Returns:
A single-element set {column}, iff a match is found. An empty set otherwise.
Examples
Default translation Format ({id}:{name}) placeholders.
>>> smurf_columns("id", ["city_id", "city_name", "city"], "city")
{'city_id'}
Both plural and singular form table names are supported, but the plural-to-singular transformation must be explicitly enabled with plural_to_singular=True.
>>> smurf_columns(
...     "name", ["city_id", "city_name"], "cities", plural_to_singular=True
... )
{'city_name'}
Special handling for placeholder="name" when the table name is also a column.
>>> smurf_columns("name", ["city_id", "city_name", "city"], "city")
{'city'}
As with any short-circuiting function, an empty set is returned when no match is found.
>>> smurf_columns("id", ["dog_id", "bestie_name"], "bad_dogs")
set()
You may add custom mappings for irregular nouns.
>>> smurf_columns(
...     "id", ["goose_id"], "geese", plural_to_singular={"geese": "goose"}
... )
{'goose_id'}
Notes
This function acts similarly to chained calls to value_fstring_alias(), using fstring="{context}", for_value="name" and fstring="{context}_{value}", but is more powerful since it is able to preprocess the inputs.
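The matching rule described in this entry can be sketched in a few lines. The code below is an illustrative approximation rather than the library's implementation: smurf_sketch and its singular parameter are hypothetical names, and checking the bare table name before the "{table}_{placeholder}" form is an assumption that agrees with the documented examples.

```python
from typing import Callable, Iterable, Optional, Set


def smurf_sketch(
    placeholder: str,
    columns: Iterable[str],
    table: str,
    *,
    singular: Optional[Callable[[str], str]] = None,  # Hypothetical stand-in for plural_to_singular.
) -> Set[str]:
    """Hypothetical stand-in: find a column named after the (singular) table."""
    columns = list(columns)
    base = singular(table) if singular else table
    # Special case: the table name itself may serve as the "name" column.
    if placeholder == "name" and base in columns:
        return {base}
    smurf = f"{base}_{placeholder}"
    return {smurf} if smurf in columns else set()


def to_singular(noun: str) -> str:
    # Toy transformer for the example; unknown nouns are returned unchanged.
    return {"cities": "city"}.get(noun, noun)


print(smurf_sketch("id", ["city_id", "city_name", "city"], "city"))
print(smurf_sketch("name", ["city_id", "city_name"], "cities", singular=to_singular))
print(smurf_sketch("name", ["city_id", "city_name"], "cities"))
```

The last call returns an empty set because no singular transformation is supplied, which mirrors the requirement to enable plural_to_singular explicitly.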
- value_fstring_alias(value, candidates, context, *, fstring, for_value=None, **kwargs)
Return a value formatted by fstring.
Note
This function modifies the value. Candidates are always returned as-is.
- Parameters:
value – An element to find matches for.
candidates – Potential matches for value. Not used (returned as given).
context – Context in which the function is being called.
fstring – The format string to use. Can use value and context as placeholders.
for_value – If given, apply only if value == for_value. When for_value is given, fstring arguments which do not use the value as a placeholder key are permitted.
**kwargs – Additional keyword placeholders in fstring.
- Returns:
A tuple (formatted_value, candidates).
- Raises:
ValueError – If fstring does not contain a placeholder ‘value’ and for_value is not given.
Examples
Keys {value} and {context} are always available.
>>> value_fstring_alias("id", ["dog_id"], "dog", fstring="{context}_{value}")
('dog_id', ['dog_id'])
In cases such as these, consider using smurf_columns() instead, which will work both for table="dog" (as above) and with table="dogs".
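The interaction between fstring and for_value is easiest to see in a standalone sketch. This is an approximation under stated assumptions, not the library code: format_value is a hypothetical name, the placeholder check is a plain substring test, and returning the value unchanged when it differs from for_value is inferred from the parameter description.

```python
from typing import Any, List, Optional, Tuple


def format_value(
    value: str,
    candidates: List[str],
    context: Any,
    *,
    fstring: str,
    for_value: Optional[str] = None,
    **kwargs: Any,
) -> Tuple[str, List[str]]:
    """Hypothetical stand-in: format the value, return the candidates untouched."""
    if for_value is None and "{value}" not in fstring:
        raise ValueError("fstring must contain the placeholder 'value' unless for_value is given")
    if for_value is not None and value != for_value:
        return value, candidates  # Assumption: other values pass through unchanged.
    return fstring.format(value=value, context=context, **kwargs), candidates


print(format_value("id", ["dog_id"], "dog", fstring="{context}_{value}"))
print(format_value("name", ["dog"], "dog", fstring="{context}", for_value="name"))
print(format_value("id", ["dog"], "dog", fstring="{context}", for_value="name"))
```

The second call shows why for_value permits an fstring without the value placeholder: the replacement applies to exactly one known value, so the result does not need to depend on it.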