id_translation.mapping
Shared API of the mapping processes (e.g. names to sources).
Classes
- Cardinality: Enumeration type for cardinality relationships.
- DirectionalMapping: A two-way mapping between hashable elements.
- HeuristicScore: Callable wrapper for computing heuristic scores.
- Mapper: Optimal value-candidate matching.
- class Cardinality(*values)
Bases: Enum
Enumeration type for cardinality relationships.
Cardinalities are comparable using comparison operators, and comparing them can be thought of as comparing "preciseness". The less ambiguity there is for a given cardinality, the smaller it is in comparison to the others. The hierarchy is given by 1:1 < 1:N = N:1 < M:N. Note that 1:N and N:1 are considered equally precise.
Examples
Comparing cardinalities
>>> from id_translation.mapping import Cardinality
>>> Cardinality.ManyToOne
<Cardinality.ManyToOne: 'N:1'>
>>> Cardinality.OneToOne
<Cardinality.OneToOne: '1:1'>
>>> Cardinality.ManyToOne < Cardinality.OneToOne
False
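The "preciseness" ordering can be modeled with integer ranks. The sketch below is a hypothetical stand-in that works on the member values as plain strings; `RANK` and `compare` are illustrative names, not part of id_translation.

```python
# Hypothetical sketch: the documented hierarchy 1:1 < 1:N = N:1 < M:N encoded
# as integer ranks. A lower rank means less ambiguity (more precise).
# The rank values are illustrative, not taken from the id_translation source.
RANK = {"1:1": 0, "1:N": 1, "N:1": 1, "M:N": 2}


def compare(a: str, b: str) -> int:
    """Return -1, 0 or 1 if `a` is more, equally or less precise than `b`."""
    return (RANK[a] > RANK[b]) - (RANK[a] < RANK[b])


print(RANK["N:1"] < RANK["1:1"])  # False, as in the doctest above
print(compare("1:N", "N:1"))  # 0: equally precise
print(sorted(["M:N", "N:1", "1:1"], key=RANK.__getitem__))  # ['1:1', 'N:1', 'M:N']
```

Because 1:N and N:1 share a rank, neither is "less than" the other. This matches the note that they are equally precise.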
- ManyToMany = 'M:N'
Many-to-many relationship.
- ManyToOne = 'N:1'
Many-to-one relationship.
- OneToMany = '1:N'
One-to-many relationship.
- OneToOne = '1:1'
One-to-one relationship.
- classmethod from_counts(left_count, right_count)
Derive a Cardinality from counts.
- Parameters:
left_count – Number of elements on the left-hand side.
right_count – Number of elements on the right-hand side.
- Returns:
A Cardinality.
- Raises:
ValueError – For counts < 1.
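One plausible reading of `from_counts` is that a side is "many" when its count exceeds one. The function below is a hypothetical re-implementation of that reading that returns the member values as strings. It is not the library's code.

```python
def cardinality_from_counts(left_count: int, right_count: int) -> str:
    """Hypothetical sketch of Cardinality.from_counts semantics."""
    if left_count < 1 or right_count < 1:
        # Mirrors the documented ValueError for counts < 1.
        raise ValueError(f"Counts must be >= 1, got {left_count=}, {right_count=}")
    many_left = left_count > 1
    many_right = right_count > 1
    if many_left and many_right:
        return "M:N"
    if many_left:
        return "N:1"
    if many_right:
        return "1:N"
    return "1:1"


print(cardinality_from_counts(1, 1))  # 1:1
print(cardinality_from_counts(3, 1))  # N:1
print(cardinality_from_counts(1, 2))  # 1:N
print(cardinality_from_counts(2, 5))  # M:N
```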
- property inverse
Inverse cardinality. For symmetric cardinalities, self.inverse == self.
- Returns:
Inverse cardinality.
See also
- property many_left
Many-relationship on the left, True for N:1 and M:N.
- property many_right
Many-relationship on the right, True for 1:N and M:N.
- property one_left
One-relationship on the left, True for 1:1 and 1:N.
- property one_right
One-relationship on the right, True for 1:1 and N:1.
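The relationship between `inverse` and the one/many flags can be checked with a few hypothetical helpers that operate on the member values ('1:1', '1:N', 'N:1', 'M:N'). The helper names mirror the documented properties, but they are plain functions written for illustration, not the library's implementation.

```python
# Hypothetical helpers operating on the enum *values*; they mirror the
# documented properties, not the library's implementation.
def inverse(card: str) -> str:
    """Swap sides: 1:N <-> N:1. The symmetric 1:1 and M:N map to themselves."""
    if card == "M:N":
        return card  # symmetric; there is no separate 'N:M' member
    left, right = card.split(":")
    return f"{right}:{left}"


def many_left(card: str) -> bool:
    return card in {"N:1", "M:N"}


def many_right(card: str) -> bool:
    return card in {"1:N", "M:N"}


def one_left(card: str) -> bool:
    return not many_left(card)  # True for 1:1 and 1:N


def one_right(card: str) -> bool:
    return not many_right(card)  # True for 1:1 and N:1


for card in ("1:1", "1:N", "N:1", "M:N"):
    # Inverting swaps the left/right flags.
    assert one_left(inverse(card)) == one_right(card)
    print(card, "->", inverse(card))
```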
- classmethod parse(arg, strict=False)
Convert to cardinality.
- Parameters:
arg – Argument to parse.
strict – If True, arg must match exactly when it is given as a string.
- Returns:
A Cardinality.
- Raises:
ValueError – If the argument could not be converted.
- class DirectionalMapping(cardinality=None, left_to_right=None, right_to_left=None, _verify=True)
A two-way mapping between hashable elements.
- Parameters:
cardinality – Explicit cardinality. Derive if None.
left_to_right – A left-to-right mapping of elements.
right_to_left – A right-to-left mapping of elements.
_verify – If False, input checks are disabled. Intended for internal use.
- Raises:
ValueError – If both left_to_right and right_to_left are None.
ValueError – If verification of two-sided input fails, and _verify=True.
CardinalityError – If the explicit cardinality is less than the cardinality derived from the data, and _verify=True.
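When only one side is given, the other side and the cardinality can be derived from the data. The sketch below assumes a `{left: (right, ...)}` layout and plain dicts. `invert` and `derive_cardinality` are hypothetical helpers, not DirectionalMapping internals. In this sketch, a side counts as "many" when some element on the opposite side links to several of its elements.

```python
from collections import defaultdict


def invert(left_to_right: dict) -> dict:
    """Build a right-to-left mapping from a left-to-right one."""
    right_to_left = defaultdict(list)
    for left, rights in left_to_right.items():
        for right in rights:
            right_to_left[right].append(left)
    return {right: tuple(lefts) for right, lefts in right_to_left.items()}


def derive_cardinality(left_to_right: dict) -> str:
    """Derive the cardinality of a {left: (right, ...)} mapping."""
    right_to_left = invert(left_to_right)
    # "Many" on a side if some element on the other side links to several of them.
    many_left = any(len(lefts) > 1 for lefts in right_to_left.values())
    many_right = any(len(rights) > 1 for rights in left_to_right.values())
    if many_left and many_right:
        return "M:N"
    if many_left:
        return "N:1"
    if many_right:
        return "1:N"
    return "1:1"


ltr = {"a": ("x",), "b": ("x",)}
print(invert(ltr))  # {'x': ('a', 'b')}
print(derive_cardinality(ltr))  # N:1
```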
- property cardinality
Cardinality with which this mapping was created.
- Returns:
Cardinality with which this mapping was created.
- flatten()
Return a flattened version of self as a dict.
- Returns:
A dict {left: right}.
- Raises:
CardinalityError – If cardinality is not OneToOne or ManyToOne.
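Flattening only works when every left element maps to exactly one right element, which is the one_right case (1:1 or N:1). The following is a minimal sketch on a plain `{left: (right, ...)}` dict. `flatten` is a hypothetical stand-in, and `ValueError` is used in place of the library's CardinalityError.

```python
def flatten(left_to_right: dict) -> dict:
    """Hypothetical sketch of DirectionalMapping.flatten() semantics."""
    flat = {}
    for left, rights in left_to_right.items():
        if len(rights) != 1:
            # Stand-in for CardinalityError: only 1:1 and N:1 flatten cleanly.
            raise ValueError(f"{left!r} maps to {len(rights)} elements")
        flat[left] = rights[0]
    return flat


print(flatten({"a": ("x",), "b": ("x",)}))  # {'a': 'x', 'b': 'x'}
```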
- property left
Left-side elements in the mapping.
- property left_to_right
Left-to-right element mappings.
- property reverse
Reverse the mapping by swapping the sides.
- Returns:
A copy with data identical to the calling instance, but with the left and right sides swapped.
- property right
Right-side elements in the mapping.
- property right_to_left
Right-to-left element mappings.
- select_left(elements, exclude=False)
Perform a selection on left-side elements.
- Parameters:
elements – Elements to select.
exclude – If True, return everything except the given elements.
- Returns:
A new DirectionalMapping for the selection.
- Raises:
KeyError – If any of the chosen elements do not exist and exclude=False.
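The `exclude` flag and the `KeyError` condition can be illustrated on a plain dict. The function below is a hypothetical sketch, not the library method. It returns a dict rather than a new DirectionalMapping, and `select_right` would presumably behave the same way on the right-to-left side.

```python
def select_left(left_to_right: dict, elements, exclude: bool = False) -> dict:
    """Hypothetical sketch of the selection semantics on a plain dict."""
    elements = set(elements)
    if exclude:
        # Unknown elements are simply ignored when excluding.
        return {k: v for k, v in left_to_right.items() if k not in elements}
    missing = elements.difference(left_to_right)
    if missing:
        raise KeyError(f"Unknown left-side elements: {missing!r}")
    return {k: v for k, v in left_to_right.items() if k in elements}


ltr = {"a": ("x",), "b": ("y",), "c": ("z",)}
print(select_left(ltr, ["a", "c"]))  # {'a': ('x',), 'c': ('z',)}
print(select_left(ltr, ["a", "q"], exclude=True))  # {'b': ('y',), 'c': ('z',)}
```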
- select_right(elements, exclude=False)
Perform a selection on right-side elements.
- Parameters:
elements – Elements to select.
exclude – If True, return everything except the given elements.
- Returns:
A new DirectionalMapping for the selection.
- Raises:
KeyError – If any of the chosen elements do not exist and exclude=False.
- class HeuristicScore(score_function, heuristics)
Bases: Generic[ValueType, CandidateType, ContextType]
Callable wrapper for computing heuristic scores.
Instances are callable. The signature is given by ScoreFunction.
- Short-circuiting:
A mechanism for forced matching. The score is set to +∞ for short-circuited candidates and -∞ for the rest. No further matching is performed after this point, so ensure that all desired candidates are returned by the chosen filters.
- Procedure:
1. Trigger short-circuiting if there is an exact value-candidate match.
2. Apply all heuristics and compute scores.
3. If no short-circuiting is triggered in step 2, yield the maximum score for each candidate.
- Parameters:
score_function – A ScoreFunction to wrap.
heuristics – Iterable of heuristics or tuples (heuristic, kwargs) to apply to the (value, candidates) inputs of score_function.
- Heuristic types:
An AliasFunction, which accepts and returns a tuple (value, candidates) to be evaluated.
A FilterFunction, which accepts a tuple (value, candidates) and returns a subset of candidates. If any candidates are returned, short-circuiting is triggered.
Notes
Heuristics are applied in the order in which they are given.
You may add mutate=True to the kwargs of a heuristic to forward the modifications made by that function.
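The three-step procedure can be sketched with simplified, hypothetical signatures. Here score, alias, and filter functions take only `(value, candidates)`, while the documented types may also take context and kwargs. Heuristics are given as `("alias" | "filter", function)` pairs, and each alias is applied to the original input. Chaining via `mutate=True` is not modeled. This is an illustration of the documented behavior, not the HeuristicScore implementation.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Hypothetical simplified signature; the documented ScoreFunction may take more arguments.
ScoreFn = Callable[[str, Sequence[str]], List[float]]


def heuristic_scores(value: str, candidates: Sequence[str], score_fn: ScoreFn,
                     heuristics: Sequence[Tuple[str, Callable]] = ()) -> List[float]:
    candidates = list(candidates)

    def short_circuit(keep) -> List[float]:
        # Forced match: +inf for kept candidates, -inf for everything else.
        return [math.inf if c in keep else -math.inf for c in candidates]

    # Step 1: exact value-candidate match.
    if value in candidates:
        return short_circuit({value})

    # Step 2: apply heuristics in the order given.
    best = list(score_fn(value, candidates))
    for kind, fn in heuristics:
        if kind == "filter":
            keep = fn(value, candidates)
            if keep:
                return short_circuit(set(keep))
        else:  # "alias": score the transformed (value, candidates) pair
            alias_value, alias_candidates = fn(value, candidates)
            scores = score_fn(alias_value, alias_candidates)
            best = [max(a, b) for a, b in zip(best, scores)]

    # Step 3: no short-circuit, so keep the max score per candidate.
    return best


def exact(k, cands):
    return [float(k == c) for c in cands]


lower = ("alias", lambda v, cs: (v.lower(), [c.lower() for c in cs]))
prefix = ("filter", lambda v, cs: {c for c in cs if c.startswith(v)})
print(heuristic_scores("A", ["a", "b"], exact, [lower]))  # [1.0, 0.0]
print(heuristic_scores("b", ["a", "bb", "bc"], exact, [prefix]))  # [-inf, inf, inf]
```

The alias function must return candidates in the same order as its input so that the element-wise maximum lines up. The sketch assumes this rather than checking it.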
- property score_function
Return the underlying likeness score function.
- class Mapper(score_function='disabled', score_function_kwargs=None, filter_functions=(), min_score=0.9, overrides=None, on_unmapped='ignore', on_unknown_user_override='raise', cardinality=Cardinality.ManyToOne)
Bases: Generic[ValueType, CandidateType, ContextType]
Optimal value-candidate matching.
For an introduction to mapping, see the Mapping primer page.
- Parameters:
score_function – A callable which accepts a value k and an ordered collection of candidates c, returning a score s_i for each candidate c_i in c. Default: s_i = float(k == c_i). Higher is better.
score_function_kwargs – Keyword arguments for score_function.
filter_functions – Function-kwargs pairs of filters to apply before scoring.
min_score – Minimum score s_i, as given by score(k, c_i), to consider k a match for c_i.
overrides – If a dict, assumed to be 1:1 mappings (value to candidate) which override the scoring logic. If a rics.collections.dicts.InheritedKeysDict, the context passed to apply() is used to retrieve specific overrides.
on_unmapped – Action to take if mapping fails for any values.
on_unknown_user_override – Action to take if a UserOverrideFunction returns an unknown candidate. Unknown candidates, i.e. candidates not in the input candidates collection, will not be used unless 'allow' is chosen.
cardinality – Desired cardinality for mapped values. Derive for each matching if None.
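To show how `min_score`, dict `overrides`, and an upper cardinality bound could interact, here is a greedy matching sketch over a precomputed `{value: {candidate: score}}` table. The greedy strategy and the `match` and `one_to_one` names are assumptions made for illustration. The Mapper's actual selection algorithm may differ. Values without an acceptable candidate are simply left out, which corresponds to the "unmapped" case.

```python
from typing import Optional


def match(scores: dict, min_score: float = 0.9, one_to_one: bool = False,
          overrides: Optional[dict] = None) -> dict:
    """Greedy sketch: scores is {value: {candidate: score}}."""
    # Overrides bypass scoring entirely (dict form only; no context lookup).
    result = {v: c for v, c in (overrides or {}).items() if v in scores}
    used = set(result.values())
    pairs = sorted(
        ((s, v, c) for v, row in scores.items() for c, s in row.items() if s >= min_score),
        key=lambda t: t[0],
        reverse=True,  # best scores first; ties keep input order
    )
    for _, value, candidate in pairs:
        if value in result:
            continue  # N:1 and 1:1: each value gets at most one candidate
        if one_to_one and candidate in used:
            continue  # 1:1: a candidate may not be shared
        result[value] = candidate
        used.add(candidate)
    return result  # unmatched values are absent


scores = {"a": {"x": 0.95, "y": 0.92}, "b": {"x": 0.97, "y": 0.5}}
print(match(scores))  # {'b': 'x', 'a': 'x'}  (N:1, the Mapper default bound)
print(match(scores, one_to_one=True))  # {'b': 'x', 'a': 'y'}
```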
- apply(values, candidates, context=None, override_function=None, *, task_id=None, **kwargs)
Map values to candidates.
- Parameters:
values – Iterable of elements to match to candidates.
candidates – Iterable of candidates to match with values. Duplicate elements will be discarded.
context – Context in which mapping is being done.
override_function – A callable that takes inputs (value, candidates, context) and returns either None (let the regular mapping logic decide) or one of the candidates. How returned non-candidates are handled is determined by the on_unknown_user_override property.
task_id – Used for logging.
**kwargs – Runtime keyword arguments for score and filter functions. May be used to add information which is not known when the Mapper is initialized.
- Returns:
A DirectionalMapping of the form {value: [matched_candidates...]}. It may be turned into a plain dict {value: candidate} using DirectionalMapping.flatten(), but only if DirectionalMapping.cardinality.one_right is True (that is, for OneToOne or ManyToOne).
- Raises:
MappingError – If any values failed to match and on_unmapped='raise'.
BadFilterError – If a filter returns candidates that are not a subset of the original candidates.
UserMappingError – If override_function returns an unknown candidate and on_unknown_user_override != 'allow'.
MappingError – If passing context=None (the default) when using context-sensitive overrides (type rics.collections.dicts.InheritedKeysDict).
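The `override_function` contract (return None to defer, or a candidate to force a match) combined with `on_unknown_user_override` could be handled as in the sketch below. `resolve_override` is a hypothetical helper. `LookupError` stands in for UserMappingError, and treating any action other than 'raise' or 'allow' as "discard the override" is an assumption.

```python
from typing import Callable, Collection, Optional


def resolve_override(value, candidates: Collection, context,
                     override_function: Optional[Callable] = None,
                     on_unknown_user_override: str = "raise"):
    """Return an override candidate, or None to fall back to regular mapping."""
    if override_function is None:
        return None
    chosen = override_function(value, candidates, context)
    if chosen is None or chosen in candidates:
        return chosen
    # The override returned something that is not among the candidates.
    if on_unknown_user_override == "allow":
        return chosen
    if on_unknown_user_override == "raise":
        # Stand-in for the library's UserMappingError.
        raise LookupError(f"Override for {value!r} returned unknown candidate {chosen!r}")
    return None  # assumption: any other action discards the unknown candidate


def my_override(value, candidates, context):
    return {"a": "x", "b": "zzz"}.get(value)  # hypothetical user function


print(resolve_override("a", {"x", "y"}, None, my_override))  # x
print(resolve_override("b", {"x", "y"}, None, my_override, "allow"))  # zzz
```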
- property cardinality
Return upper cardinality bound during mapping.
- compute_scores(values, candidates, context=None, override_function=None, task_id=None, **kwargs)
Compute likeness scores.
- Parameters:
values – Iterable of elements to match to candidates.
candidates – Iterable of candidates to match with values. Duplicate elements will be discarded.
context – Context in which mapping is being done.
override_function – A callable that takes inputs (value, candidates, context) and returns either None (let the regular mapping logic decide) or one of the candidates. How returned non-candidates are handled is determined by the on_unknown_user_override property.
task_id – Used for logging.
**kwargs – Runtime keyword arguments for score and filter functions. May be used to add information which is not known when the Mapper is initialized.
- Returns:
A DataFrame of value-candidate match scores, with DataFrame.index=values and DataFrame.columns=candidates.
- Raises:
BadFilterError – If a filter returns candidates that are not a subset of the original candidates.
UserMappingError – If override_function returns an unknown candidate and on_unknown_user_override != 'allow'.
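The layout of the score table (values as rows, deduplicated candidates as columns) can be mimicked without pandas. The sketch below uses the documented default score `s_i = float(k == c_i)` and a nested dict. `default_score` and `score_matrix` are hypothetical names, and overrides, filters, and heuristics are left out.

```python
def default_score(k, candidates):
    """Default from the Mapper docs: s_i = float(k == c_i)."""
    return [float(k == c) for c in candidates]


def score_matrix(values, candidates, score_function=default_score):
    # Duplicate candidates are discarded; dict.fromkeys keeps first-seen order.
    columns = list(dict.fromkeys(candidates))
    return {v: dict(zip(columns, score_function(v, columns))) for v in values}


m = score_matrix(["a", "b"], ["a", "c", "a"])
print(m)  # {'a': {'a': 1.0, 'c': 0.0}, 'b': {'a': 0.0, 'c': 0.0}}
```

With pandas installed, `pandas.DataFrame.from_dict(m, orient="index")` turns this nested dict into a frame with values as the index and candidates as the columns.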
- copy(**overrides)
Make a copy of this Mapper.
- Parameters:
overrides – Keyword arguments to use when instantiating the copy. Options that aren’t given will be taken from the current instance. See the Mapper class documentation for possible choices.
- Returns:
A copy of this Mapper with overrides applied.
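The "copy with overrides" pattern, where any option that is not given is taken from the current instance, is the same idea as `dataclasses.replace` in the standard library. The frozen dataclass below is a hypothetical subset of the Mapper options, used only as an analogy. It is not how Mapper.copy is implemented.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class MapperOptions:
    """Hypothetical subset of Mapper's constructor options."""
    score_function: str = "disabled"
    min_score: float = 0.9
    on_unmapped: str = "ignore"
    cardinality: str = "N:1"


base = MapperOptions(min_score=0.8)
strict = replace(base, on_unmapped="raise")  # other fields are copied from base
print(strict)  # MapperOptions(score_function='disabled', min_score=0.8, on_unmapped='raise', cardinality='N:1')
```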
- property logger
Return the Logger that is used by this instance.
- property on_unknown_user_override
Return the action to take if an override function returns an unknown candidate.
- Returns:
Action to take if a user-defined override function returns an unknown candidate.
- property on_unmapped
Return the action to take if mapping fails for any values.
Modules
- Mapping errors.
- Functions that return a subset of candidates with which to continue the matching procedure.
- Functions which perform heuristics for score functions.
- Functions and classes used by the
- Functions which return a likeness score.
- Types used for mapping.