id_translation.mapping#

Shared API of the mapping processes (e.g. names to sources).

Classes

Cardinality(*values)

Enumeration type for cardinality relationships.

DirectionalMapping([cardinality, ...])

A two-way mapping between hashable elements.

HeuristicScore(score_function, heuristics)

Callable wrapper for computing heuristic scores.

Mapper([score_function, ...])

Optimal value-candidate matching.

class Cardinality(*values)[source]#

Bases: Enum

Enumeration type for cardinality relationships.

Cardinalities are comparable using numerical operators, and can be thought of as comparing “preciseness”. The less ambiguity there is for a given cardinality, the smaller it is in comparison to the others. The hierarchy is given by 1:1 < 1:N = N:1 < M:N. Note that 1:N and N:1 are considered equally precise.

Examples

Comparing cardinalities

>>> from id_translation.mapping import Cardinality
>>> Cardinality.ManyToOne
<Cardinality.ManyToOne: 'N:1'>
>>> Cardinality.OneToOne
<Cardinality.OneToOne: '1:1'>
>>> Cardinality.ManyToOne < Cardinality.OneToOne
False
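
Ordering sketch

The hierarchy 1:1 < 1:N = N:1 < M:N can be pictured as a rank per cardinality. The block below is a standalone illustration and not the library's implementation: the class name SketchCardinality, the rank property and the _RANK table are assumptions made here purely to mirror the documented ordering.

```python
from enum import Enum

# Assumed ranks: lower means less ambiguity. 1:N and N:1 share a rank.
_RANK = {"1:1": 0, "1:N": 1, "N:1": 1, "M:N": 2}


class SketchCardinality(Enum):
    """Hypothetical stand-in for Cardinality; only the ordering is modelled."""

    OneToOne = "1:1"
    OneToMany = "1:N"
    ManyToOne = "N:1"
    ManyToMany = "M:N"

    @property
    def rank(self) -> int:
        return _RANK[self.value]

    def __lt__(self, other: "SketchCardinality") -> bool:
        return self.rank < other.rank

    def __le__(self, other: "SketchCardinality") -> bool:
        return self.rank <= other.rank


# Mirrors the doctest above: N:1 is not more precise than 1:1.
print(SketchCardinality.ManyToOne < SketchCardinality.OneToOne)
```

In this sketch, 1:N and N:1 are equally precise in the sense that neither is strictly smaller than the other.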
ManyToMany = 'M:N'#

Many-to-many relationship.

ManyToOne = 'N:1'#

Many-to-one relationship.

OneToMany = '1:N'#

One-to-many relationship.

OneToOne = '1:1'#

One-to-one relationship.

classmethod from_counts(left_count, right_count)[source]#

Derive a Cardinality from counts.

Parameters:
  • left_count – Number of elements on the left-hand side.

  • right_count – Number of elements on the right-hand side.

Returns:

A Cardinality.

Raises:

ValueError – For counts < 1.
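
A minimal sketch of how counts could translate into a cardinality label. It assumes that a count of exactly 1 means "one" on that side and anything larger means "many"; the helper name cardinality_from_counts is hypothetical and the function returns plain strings rather than Cardinality members.

```python
def cardinality_from_counts(left_count: int, right_count: int) -> str:
    """Hypothetical helper: return a cardinality label for the given counts."""
    if left_count < 1 or right_count < 1:
        raise ValueError(
            "Counts must be >= 1, got left_count={} and right_count={}.".format(
                left_count, right_count
            )
        )
    # Assumption: exactly one element means "1", more than one means "many".
    left = "1" if left_count == 1 else "N"
    right = "1" if right_count == 1 else "N"
    label = left + ":" + right
    # Many on both sides is written M:N in this documentation.
    return "M:N" if label == "N:N" else label


print(cardinality_from_counts(3, 1))
```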

property inverse#

Inverse cardinality. For symmetric cardinalities, self.inverse == self.

Returns:

Inverse cardinality.

See also

symmetric

property many_left#

Many-relationship on the left, True for N:1 and M:N.

property many_right#

Many-relationship on the right, True for 1:N and M:N.

property one_left#

One-relationship on the left, True for 1:1 and 1:N.

property one_right#

One-relationship on the right, True for 1:1 and N:1.

classmethod parse(arg, strict=False)[source]#

Convert to cardinality.

Parameters:
  • arg – Argument to parse.

  • strict – If True, arg must match exactly when it is given as a string.

Returns:

A Cardinality.

Raises:

ValueError – If the argument could not be converted.

property symmetric#

Symmetry flag. For symmetric cardinalities, self.inverse == self.

Returns:

Symmetry flag.

See also

inverse

class DirectionalMapping(cardinality=None, left_to_right=None, right_to_left=None, _verify=True)[source]#

Bases: Generic[HL, HR]

A two-way mapping between hashable elements.

Parameters:
  • cardinality – Explicit cardinality. Derive if None.

  • left_to_right – A left-to-right mapping of elements.

  • right_to_left – A right-to-left mapping of elements.

  • _verify – If False, input checks are disabled. Intended for internal use.

Raises:
property cardinality#

Cardinality with which this mapping was created.

Returns:

Cardinality with which this mapping was created.

flatten()[source]#

Return a flattened version of self as a dict.

Returns:

A dict {left: right}.

Raises:

CardinalityError – If cardinality is not OneToOne or ManyToOne.
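
A rough sketch of what flattening means. It assumes the left-to-right data is a dict that maps each left element to a tuple of right elements, which is not confirmed by this page; SketchCardinalityError and flatten_mapping are hypothetical names.

```python
class SketchCardinalityError(ValueError):
    """Hypothetical stand-in for CardinalityError."""


def flatten_mapping(left_to_right: dict) -> dict:
    """Turn {left: (right,)} into {left: right}; fail on multiple rights."""
    flat = {}
    for left, rights in left_to_right.items():
        if len(rights) != 1:
            # More than one right element per left element cannot be
            # represented as a plain {left: right} dict.
            raise SketchCardinalityError(
                "Cannot flatten {!r}: maps to {} elements.".format(left, len(rights))
            )
        flat[left] = rights[0]
    return flat


print(flatten_mapping({"a": ("x",), "b": ("x",)}))
```

Several left elements may share a right element (N:1), which is why flattening works for both OneToOne and ManyToOne.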

property left#

Left-side elements in the mapping.

property left_to_right#

Left-to-right element mappings.

property reverse#

Reverse the mapping by swapping the sides.

Returns:

A copy with data identical to the calling instance, but with the left and right sides swapped.

property right#

Right-side elements in the mapping.

property right_to_left#

Right-to-left element mappings.

select_left(elements, exclude=False)[source]#

Perform a selection on left-side elements.

Parameters:
  • elements – Elements to select.

  • exclude – If True, return everything except the given elements.

Returns:

A new DirectionalMapping for the selection.

Raises:

KeyError – If any of the chosen elements do not exist and exclude=False.

select_right(elements, exclude=False)[source]#

Perform a selection on right-side elements.

Parameters:
  • elements – Elements to select.

  • exclude – If True, return everything except the given elements.

Returns:

A new instance for the selection.

Raises:

KeyError – If any of the chosen elements do not exist and exclude=False.
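
The selection semantics described for select_left and select_right can be sketched on a plain dict. This is an illustration only: it assumes one side of the mapping is stored as a dict from elements to tuples, and select_elements is a hypothetical helper.

```python
def select_elements(side: dict, elements, exclude: bool = False) -> dict:
    """Return a new dict with only (or all except) the given elements."""
    chosen = set(elements)
    if exclude:
        # Unknown elements are simply not present, so nothing can fail here.
        return {key: value for key, value in side.items() if key not in chosen}
    missing = chosen.difference(side)
    if missing:
        raise KeyError("Unknown elements: {}".format(sorted(missing, key=repr)))
    return {key: value for key, value in side.items() if key in chosen}


side = {"a": ("x",), "b": ("y",), "c": ("y",)}
print(select_elements(side, ["a"]))
print(select_elements(side, ["a"], exclude=True))
```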

class HeuristicScore(score_function, heuristics)[source]#

Bases: Generic[ValueType, CandidateType, ContextType]

Callable wrapper for computing heuristic scores.

Instances are callable. Signature is given by ScoreFunction.

Short-circuiting:

A mechanism for forced matching. Score is set to +∞ for short-circuited candidates, and -∞ for the rest. No further matching will be performed after this point, so ensure that all desired candidates are returned by chosen filters.

Procedure:
  1. Trigger short-circuiting if there is an exact value-candidate match.

  2. All heuristics are applied and scores are computed.

  3. If no short-circuiting is triggered in step 2, yield max score for each candidate.

Parameters:
  • score_function – A ScoreFunction to wrap.

  • heuristics – Iterable of heuristics or tuples (heuristic, kwargs) to apply to the (value, candidates) inputs of score_function.

Heuristic types:
  • An AliasFunction, which accepts and returns a tuple (value, candidates) to be evaluated.

  • A FilterFunction, which accepts a tuple (value, candidates) and returns a subset of candidates. If any candidates are returned, short-circuiting is triggered.

Notes

  • Heuristic function input order = application order.

  • You may add mutate=True to a heuristic's kwargs to forward the modifications made by that function.
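
The procedure above can be sketched in a few lines. This is a simplified illustration, not the HeuristicScore implementation: heuristics are passed as hypothetical (kind, function) pairs so the sketch can tell alias-style from filter-style functions, kwargs and mutate are left out, and all function names are assumptions.

```python
import math


def equality_score(value, candidates):
    """Score 1.0 for candidates equal to the value, 0.0 otherwise."""
    return [float(value == candidate) for candidate in candidates]


def lowercase_alias(value, candidates):
    """Alias-style heuristic: return a transformed (value, candidates) tuple."""
    return value.lower(), [candidate.lower() for candidate in candidates]


def id_suffix_filter(value, candidates):
    """Filter-style heuristic: keep candidates named '<value>_id'."""
    return [candidate for candidate in candidates if candidate == value + "_id"]


def heuristic_scores(value, candidates, score_function, heuristics):
    """Apply the three-step procedure to one value."""
    candidates = list(candidates)

    def short_circuit(chosen):
        return [math.inf if c in chosen else -math.inf for c in candidates]

    # Step 1: an exact value-candidate match forces the result.
    if value in candidates:
        return short_circuit({value})

    # Step 2: apply heuristics in input order, keeping the best score seen.
    best = list(score_function(value, candidates))
    for kind, heuristic in heuristics:
        if kind == "filter":
            kept = set(heuristic(value, candidates))
            if kept:  # any returned candidate triggers short-circuiting
                return short_circuit(kept)
        else:
            alias_value, alias_candidates = heuristic(value, candidates)
            scores = score_function(alias_value, alias_candidates)
            best = [max(old, new) for old, new in zip(best, scores)]

    # Step 3: no short-circuit, so report the max score per candidate.
    return best


print(heuristic_scores("Name", ["name", "id"], equality_score, [("alias", lowercase_alias)]))
```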

add_heuristic(heuristic, kwargs=None)[source]#

Add a new heuristic.

property score_function#

Return the underlying likeness score function.

class Mapper(score_function='disabled', score_function_kwargs=None, filter_functions=(), min_score=0.9, overrides=None, on_unmapped='ignore', on_unknown_user_override='raise', cardinality=Cardinality.ManyToOne)[source]#

Bases: Generic[ValueType, CandidateType, ContextType]

Optimal value-candidate matching.

For an introduction to mapping, see the Mapping primer page.

Parameters:
  • score_function – A callable which accepts a value k and an ordered collection of candidates c, returning a score s_i for each candidate c_i in c. Default: s_i = float(k == c_i). Higher scores indicate better matches.

  • score_function_kwargs – Keyword arguments for score_function.

  • filter_functions – Function-kwargs pairs of filters to apply before scoring.

  • min_score – Minimum score s_i, as given by score(k, c_i), to consider k a match for c_i.

  • overrides – If a dict, assumed to be 1:1 mappings (value to candidate) which override the scoring logic. If rics.collections.dicts.InheritedKeysDict, the context passed to apply() is used to retrieve specific overrides.

  • on_unmapped – Action to take if mapping fails for any values.

  • on_unknown_user_override – Action to take if a UserOverrideFunction returns an unknown candidate. Unknown candidates, i.e. candidates not in the input candidates collection, will not be used unless ‘allow’ is chosen.

  • cardinality – Desired cardinality for mapped values. Derive for each matching if None.
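
To make the score_function and min_score contract concrete, here is a small sketch using only the standard library. It is not how Mapper works internally: similarity_score and matches_for are hypothetical names, difflib is a swapped-in similarity measure chosen for illustration, and treating min_score as an inclusive lower bound is an assumption.

```python
from difflib import SequenceMatcher


def equality_score(value, candidates):
    """The documented default: s_i = float(k == c_i)."""
    return [float(value == candidate) for candidate in candidates]


def similarity_score(value, candidates):
    """Illustrative alternative: one similarity ratio in [0, 1] per candidate."""
    return [SequenceMatcher(None, value, candidate).ratio() for candidate in candidates]


def matches_for(value, candidates, score_function, min_score=0.9):
    """Return the candidates whose score reaches min_score (assumed inclusive)."""
    scores = score_function(value, candidates)
    return [c for c, score in zip(candidates, scores) if score >= min_score]


candidates = ["customers", "order"]
print(matches_for("customer", candidates, equality_score))
print(matches_for("customer", candidates, similarity_score))
```

With exact equality nothing matches "customer", while the similarity-based function accepts "customers" because its ratio lies above the default threshold of 0.9.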

apply(values, candidates, context=None, override_function=None, *, task_id=None, **kwargs)[source]#

Map values to candidates.

Parameters:
  • values – Iterable of elements to match to candidates.

  • candidates – Iterable of candidates to match with value. Duplicate elements will be discarded.

  • context – Context in which mapping is being done.

  • override_function – A callable that takes inputs (value, candidates, context) and returns either None (let the regular mapping logic decide) or one of the candidates. How returned non-candidates are handled is determined by the on_unknown_user_override property.

  • task_id – Used for logging.

  • **kwargs – Runtime keyword arguments for score and filter functions. May be used to add information which is not known when the Mapper is initialized.

Returns:

A DirectionalMapping of the form {value: [matched_candidates..]}. May be turned into a plain dict {value: candidate} by using the DirectionalMapping.flatten() function (only if Cardinality.one_right is True for DirectionalMapping.cardinality).

Raises:
property cardinality#

Return upper cardinality bound during mapping.

compute_scores(values, candidates, context=None, override_function=None, task_id=None, **kwargs)[source]#

Compute likeness scores.

Parameters:
  • values – Iterable of elements to match to candidates.

  • candidates – Iterable of candidates to match with value. Duplicate elements will be discarded.

  • context – Context in which mapping is being done.

  • override_function – A callable that takes inputs (value, candidates, context) and returns either None (let the regular mapping logic decide) or one of the candidates. How returned non-candidates are handled is determined by the on_unknown_user_override property.

  • task_id – Used for logging.

  • **kwargs – Runtime keyword arguments for score and filter functions. May be used to add information which is not known when the Mapper is initialized.

Returns:

A DataFrame of value-candidate match scores, with DataFrame.index=values and DataFrame.columns=candidates.

Raises:
  • BadFilterError – If a filter returns candidates that are not a subset of the original candidates.

  • UserMappingError – If override_function returns an unknown candidate and on_unknown_user_override != 'allow'.
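
A sketch of the override contract described above, under stated assumptions: SketchUserMappingError stands in for UserMappingError, resolve_override and prefer_customer_id are hypothetical, and only the documented distinction between 'allow' and everything else is modelled.

```python
class SketchUserMappingError(ValueError):
    """Hypothetical stand-in for UserMappingError."""


def resolve_override(value, candidates, context, override_function, on_unknown="raise"):
    """Return the override choice, or None to defer to regular scoring."""
    candidates = list(candidates)
    choice = override_function(value, candidates, context)
    if choice is None:
        return None  # let the regular mapping logic decide
    if choice in candidates or on_unknown == "allow":
        return choice
    raise SketchUserMappingError(
        "Override returned unknown candidate {!r} for value {!r}.".format(choice, value)
    )


def prefer_customer_id(value, candidates, context):
    """Example override function with the (value, candidates, context) signature."""
    if context == "orders" and value == "buyer":
        return "customer_id"
    return None


print(resolve_override("buyer", ["customer_id", "order_id"], "orders", prefer_customer_id))
```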

copy(**overrides)[source]#

Make a copy of this Mapper.

Parameters:

overrides – Keyword arguments to use when instantiating the copy. Options that aren’t given will be taken from the current instance. See the Mapper class documentation for possible choices.

Returns:

A copy of this Mapper with overrides applied.
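
The copy-with-overrides pattern can be sketched with a frozen dataclass. SketchMapper is a hypothetical class that carries only two of the documented options; how Mapper stores and forwards its own options is not shown on this page.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SketchMapper:
    """Hypothetical stand-in carrying two of the documented options."""

    min_score: float = 0.9
    on_unmapped: str = "ignore"

    def copy(self, **overrides):
        # Options that are not given are taken from the current instance.
        return replace(self, **overrides)


original = SketchMapper()
relaxed = original.copy(min_score=0.5)
print(original, relaxed)
```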

property logger#

Return the Logger that is used by this instance.

property on_unknown_user_override#

Return the action to take if an override function returns an unknown candidate.

Returns:

Action to take if a user-defined override function returns an unknown candidate.

property on_unmapped#

Return the action to take if mapping fails for any values.

to_directional_mapping(scores, *, task_id=None)[source]#

Create a DirectionalMapping from match scores.

Parameters:
  • scores – A score matrix, where scores.index are values and scores.columns are treated as the candidates.

  • task_id – Used for logging.

Returns:

A DirectionalMapping.
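
As a rough picture of turning scores into a mapping, the sketch below handles only the simplest case and is not the algorithm used by Mapper. It assumes a nested dict scores[value][candidate] in place of a DataFrame, a desired cardinality of N:1 (each value keeps at most one candidate, candidates may be shared), and an inclusive min_score; best_matches is a hypothetical name.

```python
def best_matches(scores: dict, min_score: float = 0.9) -> dict:
    """Map each value to its best-scoring candidate, if the score is high enough."""
    mapping = {}
    for value, row in scores.items():
        if not row:
            continue  # no candidates to choose from
        candidate, score = max(row.items(), key=lambda item: item[1])
        if score >= min_score:
            mapping[value] = (candidate,)
    return mapping


scores = {
    "buyer": {"customer_id": 0.95, "order_id": 0.10},
    "total": {"customer_id": 0.20, "order_id": 0.30},
}
print(best_matches(scores))
```

Values whose best score falls below the threshold stay unmapped; what happens to them afterwards is governed by on_unmapped.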

Modules

exceptions

Mapping errors.

filter_functions

Functions that return a subset of candidates with which to continue the matching procedure.

heuristic_functions

Functions which perform heuristics for score functions.

matrix

Functions and classes used by the Mapper for handling score matrices.

score_functions

Functions which return a likeness score.

types

Types used for mapping.