id_translation.fetching
Fetching of translation data.
- Composite:
  - MultiFetcher: Solution for using multiple simple fetchers, e.g. multiple databases or file-system locations. Or a combination thereof!
- Simple fetchers:
  - SqlFetcher: Fetching from a single SQL database or schema.
  - PandasFetcher: File-system fetching based on pandas read-functions. Valid URL schemes include http, ftp, s3, gs, and file.
  - MemoryFetcher: In-memory solution, used primarily for testing.
- Base fetchers:
  - Fetcher: Top-level interface definition. Base for all fetching implementations.
  - AbstractFetcher: Implements high-level operations such as placeholder mapping.
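Fetchers may have additional dependencies. For example, SqlFetcher requires SQLAlchemy, and PandasFetcher requires pandas (plus fsspec for remote file systems).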
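Classes
| Class | Description |
|---|---|
| AbstractFetcher | Base class for retrieving translations from an external source. |
| CacheAccess | Interface for user-managed caching. |
| Fetcher | Interface for fetching translations from an external source. |
| MemoryFetcher | Fetch from memory. |
| MultiFetcher | Fetcher which combines the results of other fetchers. |
| PandasFetcher | Fetcher implementation using pandas.DataFrame as the data format. |
| SqlFetcher | Fetch data from a SQL source. |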
- class AbstractFetcher(*, mapper=None, allow_fetch_all=True, selective_fetch_all=True, identifiers=None, optional=False, cache_access=None)
Bases: Fetcher[SourceType, IdType]
Base class for retrieving translations from an external source.
- Parameters:
mapper – A Mapper instance used to adapt placeholder names in sources to wanted names, i.e. the names of the placeholders that are in the translation Format being used.
allow_fetch_all – If False, an error will be raised when fetch_all() is called.
selective_fetch_all – If True, fetch only from those sources that contain the required placeholders (after mapping). May reduce the number of sources retrieved.
identifiers – A collection of hierarchical identifiers. If given, element zero of the identifiers is added to the logger name for the fetcher.
optional – If True, this fetcher may be discarded if source/placeholder enumeration fails in multi-fetcher mode.
cache_access – A CacheAccess instance. Defaults to a no-op implementation (i.e. always fetch new data).
- property allow_fetch_all
Flag indicating whether the fetch_all() operation is permitted.
- assert_online()
Raise an error if offline.
- Raises:
ConnectionStatusError – If not online.
- property cache_access
Return the CacheAccess for this fetcher.
- classmethod default_mapper_kwargs()
Return default Mapper arguments for AbstractFetcher implementations.
- classmethod default_score_function(value, candidates, context)
Compute score for candidates.
- fetch(ids_to_fetch, placeholders=(), required=(), task_id=None, enable_uuid_heuristics=False)
Retrieve placeholder translations from the source.
- Parameters:
ids_to_fetch – An iterable of IdsToFetch.
placeholders – All desired placeholders in preferred order.
required – Placeholders that must be included in the response.
task_id – Used for logging.
enable_uuid_heuristics – Improves matching when UUID-like IDs are in use.
- Returns:
A mapping {source: PlaceholderTranslations} of translation elements.
- Raises:
UnknownPlaceholderError – For placeholder(s) that are unknown to the Fetcher.
UnknownSourceError – For source(s) that are unknown to the Fetcher.
ForbiddenOperationError – If trying to fetch all IDs when not possible or permitted.
ImplementationError – For errors made by the inheriting implementation.
See also
🔑 This is a key event method. See Key Event Records for details.
Notes
Placeholders are usually columns in relational database applications. These are the components which are combined to create ID translations. See the Format documentation for details.
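To make the interplay between placeholders and required concrete, here is a minimal, library-free sketch for a single source. It assumes that wanted placeholders missing from the source are simply dropped, while missing required ones are an error; resolve_placeholders and the local UnknownPlaceholderError class are hypothetical stand-ins, not the id_translation API.

```python
from __future__ import annotations

from collections.abc import Iterable, Sequence


class UnknownPlaceholderError(KeyError):
    """Local stand-in for the library exception of the same name."""


def resolve_placeholders(
    available: Sequence[str],
    placeholders: Iterable[str] = (),
    required: Iterable[str] = (),
) -> list[str]:
    """Return the placeholders to fetch from one source (hypothetical helper).

    Required placeholders must exist in the source; other wanted placeholders
    are kept only if present. Preferred order is kept and duplicates dropped.
    """
    required = list(required)
    missing = [p for p in required if p not in available]
    if missing:
        raise UnknownPlaceholderError(missing)

    # Preferred placeholders first, then any required ones not already listed.
    wanted = dict.fromkeys([*placeholders, *required])
    return [p for p in wanted if p in available]


# "missing" is silently dropped; the required "id" is appended.
print(resolve_placeholders(["id", "name", "code"], ["name", "missing"], required=["id"]))
# ['name', 'id']
```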
- fetch_all(placeholders=(), *, required=(), sources=None, task_id=None, enable_uuid_heuristics=False)
Fetch as much data as possible.
- Parameters:
placeholders – All desired placeholders in preferred order.
required – Placeholders that must be included in the response.
sources – A subset of sources to fetch. Unknown sources are ignored. Set to None to fetch all sources.
task_id – Used for logging.
enable_uuid_heuristics – Improves matching when UUID-like IDs are in use.
- Returns:
A mapping {source: PlaceholderTranslations} of translation elements.
See also
🔑 This is a key event method. See Key Event Records for details.
- Raises:
ForbiddenOperationError – If fetching all IDs is not possible or permitted.
UnknownPlaceholderError – For placeholder(s) that are unknown to the Fetcher.
ImplementationError – For errors made by the inheriting implementation.
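The two guard rails of fetch_all(), the allow_fetch_all check and the sources subset (where unknown names are ignored), can be sketched as below. The helper sources_for_fetch_all and the local ForbiddenOperationError class are hypothetical illustrations, not the library's implementation.

```python
from __future__ import annotations

from collections.abc import Iterable, Sequence


class ForbiddenOperationError(RuntimeError):
    """Local stand-in for the library exception of the same name."""


def sources_for_fetch_all(
    known_sources: Sequence[str],
    allow_fetch_all: bool,
    sources: Iterable[str] | None = None,
) -> list[str]:
    """Pick the sources a fetch_all() call would read (hypothetical helper)."""
    if not allow_fetch_all:
        raise ForbiddenOperationError("fetch_all() is not permitted for this fetcher.")
    if sources is None:
        return list(known_sources)  # None means "all known sources"
    requested = set(sources)
    # Unknown names in `sources` are ignored rather than raising.
    return [s for s in known_sources if s in requested]


print(sources_for_fetch_all(["cities", "languages"], True, ["languages", "planets"]))
# ['languages']
```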
- abstract fetch_translations(instr)
Retrieve placeholder translations from the source.
- Parameters:
instr – A single FetchInstruction for IDs to fetch. If IDs is None, the fetcher should retrieve data for as many IDs as possible.
- Returns:
Placeholder translation elements.
- Raises:
UnknownPlaceholderError – If the placeholder is unknown to the fetcher.
See also
🔑 This is a key event method. See Key Event Records for details.
- property identifiers
A collection of hierarchical identifiers for this fetcher.
- final initialize_sources(task_id=None, *, force=False)
Perform source discovery.
- Parameters:
task_id – Used for logging.
force – If True, perform full discovery even if sources are already known.
See also
🔑 This is a key event method. See Key Event Records for details.
Notes
This function is called implicitly before every translation task. Result should be cached.
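Because initialize_sources() runs before every translation task, its result is expected to be cached. A minimal sketch of that pattern, assuming a caller-supplied discovery callable; CachedDiscovery is a hypothetical name, and real fetchers do more (logging, placeholder mapping).

```python
from __future__ import annotations

from collections.abc import Callable


class CachedDiscovery:
    """Run an expensive source discovery once; re-run only when forced (hypothetical)."""

    def __init__(self, discover: Callable[[], dict[str, list[str]]]) -> None:
        self._discover = discover
        self._placeholders: dict[str, list[str]] | None = None
        self.calls = 0  # how many times discovery actually ran

    def initialize_sources(self, task_id: int | None = None, *, force: bool = False) -> None:
        # task_id is accepted for signature parity only; nothing is logged here.
        if self._placeholders is not None and not force:
            return  # cached result is still valid
        self.calls += 1
        self._placeholders = self._discover()

    @property
    def sources(self) -> list[str]:
        if self._placeholders is None:
            raise RuntimeError("Call initialize_sources() first.")
        return list(self._placeholders)


discovery = CachedDiscovery(lambda: {"cities": ["id", "name"]})
discovery.initialize_sources()
discovery.initialize_sources()            # cached, no new discovery
discovery.initialize_sources(force=True)  # forced re-discovery
print(discovery.calls, discovery.sources)  # 2 ['cities']
```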
- property logger
Return the Logger that is used by this instance.
- map_placeholders(source, placeholders, *, candidates=None, task_id=None)
Map placeholder names to the actual names seen in source.
This method calls Mapper.apply(values=placeholders, candidates=candidates, context=source) using the local AbstractFetcher.mapper instance.
- Parameters:
source – The source to map placeholders for.
placeholders – Desired placeholders.
candidates – A subset of candidates (placeholder names) in source to map with placeholders.
task_id – Used for logging.
- Returns:
A dict {wanted_placeholder_name: actual_placeholder_name_in_source}, where actual_placeholder_name_in_source will be None if the wanted placeholder could not be mapped to any of the candidates available for the source.
- Raises:
UnknownSourceError – If source is not in sources.
See also
🔑 This is a key event method. See Key Event Records for details.
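The shape of the map_placeholders() result can be illustrated with a deliberately simple matcher. This sketch assumes case-insensitive name matching plus an optional alias table; the real Mapper is configurable and uses score functions (see default_score_function()), so treat map_placeholders below as a hypothetical stand-in that only reproduces the documented return format.

```python
from __future__ import annotations

from collections.abc import Iterable, Mapping


def map_placeholders(
    placeholders: Iterable[str],
    candidates: Iterable[str],
    aliases: Mapping[str, str] | None = None,
) -> dict[str, str | None]:
    """Map wanted placeholder names to names found in a source (hypothetical).

    Each wanted name is first replaced by its alias, if any, and then matched
    case-insensitively against the candidates. Unmatched names map to None.
    """
    aliases = aliases or {}
    # If candidates differ only by case, the last one wins in this sketch.
    by_lower = {c.lower(): c for c in candidates}
    result: dict[str, str | None] = {}
    for wanted in placeholders:
        target = aliases.get(wanted, wanted)
        result[wanted] = by_lower.get(target.lower())
    return result


print(map_placeholders(["id", "name", "code"], ["ID", "Name", "iso"], aliases={"code": "iso_code"}))
# {'id': 'ID', 'name': 'Name', 'code': None}
```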
- property mapper
Return the Mapper instance used for placeholder name mapping.
- property online
Return connectivity status. If False, no new translations may be fetched.
- property optional
Return True if this fetcher has been marked as optional.
In multi-fetcher mode, optional fetchers may be discarded if resolving their sources raises an exception. Default value is False.
- Returns:
Optionality status.
- property placeholders
Placeholders, such as id or name, for all known sources.
These are the (possibly unmapped) placeholders that may be used for translation.
- Returns:
A dict {source: [placeholders..]}.
- property selective_fetch_all
If set, reduce the amount of data fetched by fetch_all().
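With selective_fetch_all enabled, only sources containing the required placeholders (after mapping) are fetched. A minimal sketch of that filtering step, assuming the placeholder names have already been mapped; select_sources is a hypothetical helper.

```python
from __future__ import annotations

from collections.abc import Iterable, Mapping


def select_sources(
    placeholders_by_source: Mapping[str, Iterable[str]],
    required: Iterable[str],
) -> list[str]:
    """Keep only sources that have every required placeholder (hypothetical).

    placeholders_by_source is assumed to hold already-mapped names.
    """
    required_set = set(required)
    return [
        source
        for source, placeholders in placeholders_by_source.items()
        if required_set.issubset(placeholders)
    ]


known = {"cities": ["id", "name"], "languages": ["id", "code"]}
print(select_sources(known, required=["name"]))  # ['cities']
```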
- property sources
A list of known Source names, such as cities or languages.
- class CacheAccess
Bases: ABC, Generic[SourceType, IdType]
Interface for user-managed caching.
To enable caching, implement the abstract methods of the CacheAccess interface and pass it to the fetcher. See the 🚀 examples page to get started.
- property enabled
Return the enabled status for this CacheAccess.
Returns True by default. If this property is False, no other methods will be called.
- abstract load(instr)
Load cached translations.
If this method returns None, the AbstractFetcher will use fetch_translations() instead. The fetcher will then call store() using instr and the newly fetched translations.
- Parameters:
instr – A FetchInstruction.
- Returns:
Cached PlaceholderTranslations or None.
- property parent
Parent Fetcher instance.
The owner, typically an AbstractFetcher, should call set_parent() during initialization.
- Returns:
The fetcher that owns this CacheAccess.
- Raises:
RuntimeError – If called before the parent is set.
- set_parent(parent)
Set parent instance.
- Parameters:
parent – A Fetcher.
- Raises:
RuntimeError – If a parent is already set.
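The documented parent contract (reading before setting raises, setting twice raises) amounts to a set-once reference. A hypothetical sketch; ParentSlot is not part of the library.

```python
from __future__ import annotations

from typing import Generic, TypeVar

ParentT = TypeVar("ParentT")


class ParentSlot(Generic[ParentT]):
    """Set-once parent reference following the documented contract (hypothetical)."""

    def __init__(self) -> None:
        self._parent: ParentT | None = None

    @property
    def parent(self) -> ParentT:
        if self._parent is None:
            raise RuntimeError("Parent has not been set.")
        return self._parent

    def set_parent(self, parent: ParentT) -> None:
        if self._parent is not None:
            raise RuntimeError("Parent is already set.")
        self._parent = parent
```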
- abstract store(instr, translations)
Store fetched translations.
Note
This method will never be called with translations that were returned by load(). In other words, this method will only be called if CacheAccess.load(instr) returns None.
Hint
The CacheAccess is under no obligation to actually store translations. For example, implementations may choose only to cache data when the FetchInstruction.fetch_all property of the given instr is True.
- Parameters:
instr – The FetchInstruction which produced the translations.
translations – A PlaceholderTranslations produced by fetch_translations().
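The load/store protocol can be sketched as follows. Everything here is a hypothetical, library-free illustration: Instruction stands in for FetchInstruction, and FetchAllOnlyCache caches only fetch-all results, as the Hint suggests an implementation may. fetch_with_cache mirrors the documented call order (load first, then fetch_translations() and store() on a miss) rather than reproducing AbstractFetcher internals.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Instruction:
    """Hypothetical stand-in for FetchInstruction with only the fields used here."""

    source: str
    ids: frozenset | None = None  # None means "fetch all"

    @property
    def fetch_all(self) -> bool:
        return self.ids is None


class FetchAllOnlyCache:
    """In-memory cache that only keeps fetch-all results (hypothetical)."""

    def __init__(self) -> None:
        # Keyed by source only for brevity; a real cache would also key on placeholders.
        self._data: dict[str, Any] = {}

    def load(self, instr: Instruction) -> Any | None:
        return self._data.get(instr.source) if instr.fetch_all else None

    def store(self, instr: Instruction, translations: Any) -> None:
        if instr.fetch_all:  # free to skip storing anything else
            self._data[instr.source] = translations


def fetch_with_cache(
    cache: FetchAllOnlyCache,
    instr: Instruction,
    fetch_translations: Callable[[Instruction], Any],
) -> Any:
    """Documented call order: load(); on a miss, fetch_translations() then store()."""
    cached = cache.load(instr)
    if cached is not None:
        return cached  # store() is never called with data returned by load()
    translations = fetch_translations(instr)
    cache.store(instr, translations)
    return translations
```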
- class Fetcher
Bases: Generic[SourceType, IdType], HasSources[SourceType]
Interface for fetching translations from an external source.
- abstract property allow_fetch_all
Flag indicating whether the fetch_all() operation is permitted.
- abstract fetch(ids_to_fetch, placeholders=(), *, required=(), task_id=None, enable_uuid_heuristics=False)
Retrieve placeholder translations from the source.
- Parameters:
ids_to_fetch – An iterable of IdsToFetch.
placeholders – All desired placeholders in preferred order.
required – Placeholders that must be included in the response.
task_id – Used for logging.
enable_uuid_heuristics – Improves matching when UUID-like IDs are in use.
- Returns:
A mapping {source: PlaceholderTranslations} of translation elements.
- Raises:
UnknownPlaceholderError – For placeholder(s) that are unknown to the Fetcher.
UnknownSourceError – For source(s) that are unknown to the Fetcher.
ForbiddenOperationError – If trying to fetch all IDs when not possible or permitted.
ImplementationError – For errors made by the inheriting implementation.
See also
🔑 This is a key event method. See Key Event Records for details.
Notes
Placeholders are usually columns in relational database applications. These are the components which are combined to create ID translations. See the Format documentation for details.
- abstract fetch_all(placeholders=(), *, required=(), sources=None, task_id=None, enable_uuid_heuristics=False)
Fetch as much data as possible.
- Parameters:
placeholders – All desired placeholders in preferred order.
required – Placeholders that must be included in the response.
sources – A subset of sources to fetch. Unknown sources are ignored. Set to None to fetch all sources.
task_id – Used for logging.
enable_uuid_heuristics – Improves matching when UUID-like IDs are in use.
- Returns:
A mapping {source: PlaceholderTranslations} of translation elements.
See also
🔑 This is a key event method. See Key Event Records for details.
- Raises:
ForbiddenOperationError – If fetching all IDs is not possible or permitted.
UnknownPlaceholderError – For placeholder(s) that are unknown to the Fetcher.
ImplementationError – For errors made by the inheriting implementation.
- abstract initialize_sources(task_id=None, *, force=False)
Perform source discovery.
- Parameters:
task_id – Used for logging.
force – If True, perform full discovery even if sources are already known.
See also
🔑 This is a key event method. See Key Event Records for details.
Notes
This function is called implicitly before every translation task. Result should be cached.
- abstract property online
Return connectivity status. If False, no new translations may be fetched.
- class MemoryFetcher(data, return_all=True, **kwargs)
Bases: AbstractFetcher[SourceType, IdType]
Fetch from memory.
This is essentially a thin wrapper for the PlaceholderTranslations class.
- Parameters:
data – A dict {source: PlaceholderTranslations} to fetch from.
return_all – If False, return only the requested IDs and placeholders.
**kwargs – See AbstractFetcher.
- fetch_translations(instr)
Retrieve placeholder translations from the source.
- Parameters:
instr – A single FetchInstruction for IDs to fetch. If IDs is None, the fetcher should retrieve data for as many IDs as possible.
- Returns:
Placeholder translation elements.
- Raises:
UnknownPlaceholderError – If the placeholder is unknown to the fetcher.
See also
🔑 This is a key event method. See Key Event Records for details.
- property return_all
If False, fetch_translations() will filter by ID.
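A sketch of the filtering that return_all=False implies, applied to a plain (placeholders, records) table instead of a PlaceholderTranslations object. The helper memory_fetch, the record layout, and the assumption that IDs live in an "id" column are all hypothetical.

```python
from __future__ import annotations

from collections.abc import Iterable, Sequence


def memory_fetch(
    placeholders: Sequence[str],
    records: Sequence[Sequence[object]],
    wanted: Iterable[str],
    ids: Iterable[object] | None,
    *,
    return_all: bool = True,
    id_placeholder: str = "id",
) -> tuple[list[str], list[tuple[object, ...]]]:
    """Filter an in-memory table as return_all=False is documented to (hypothetical).

    The ID column is always kept; columns stay in source order.
    """
    if return_all:
        return list(placeholders), [tuple(r) for r in records]

    wanted_set = set(wanted)
    id_pos = list(placeholders).index(id_placeholder)
    keep = [i for i, p in enumerate(placeholders) if p in wanted_set or p == id_placeholder]
    wanted_ids = None if ids is None else set(ids)
    rows = [
        tuple(r[i] for i in keep)
        for r in records
        if wanted_ids is None or r[id_pos] in wanted_ids
    ]
    return [placeholders[i] for i in keep], rows


table = (["id", "name", "code"], [(1, "Sweden", "se"), (2, "Norway", "no")])
print(memory_fetch(*table, wanted=["name"], ids=[2], return_all=False))
# (['id', 'name'], [(2, 'Norway')])
```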
- class MultiFetcher(*children, max_workers=1, on_source_conflict='raise', fetcher_discarded_log_level='DEBUG')
Bases: Fetcher[SourceType, IdType]
Fetcher which combines the results of other fetchers.
- Parameters:
*children – Fetchers to wrap.
max_workers – Number of threads to use for fetching. Fetch instructions will be dispatched using a ThreadPoolExecutor. Individual fetchers will be called at most once per fetch() or fetch_all() call made with the MultiFetcher.
on_source_conflict – Action to take when multiple fetchers claim the same source.
fetcher_discarded_log_level – Level used when discarding optional fetchers.
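With on_source_conflict='raise', two children claiming the same source is treated as an error. A minimal sketch of that ownership check; assign_sources and SourceConflictError are hypothetical names, and other on_source_conflict values are not modelled.

```python
from __future__ import annotations

from collections.abc import Mapping, Sequence


class SourceConflictError(ValueError):
    """Hypothetical error type for this sketch."""


def assign_sources(children: Mapping[str, Sequence[str]]) -> dict[str, str]:
    """Map each source to the one child that claims it (hypothetical).

    `children` maps a child fetcher name to the sources it reports. Mirrors
    on_source_conflict='raise': a source claimed twice is an error.
    """
    owners: dict[str, str] = {}
    for child, sources in children.items():
        for source in sources:
            if source in owners:
                raise SourceConflictError(
                    f"Source {source!r} claimed by both {owners[source]!r} and {child!r}."
                )
            owners[source] = child
    return owners


print(assign_sources({"sql": ["cities"], "files": ["languages"]}))
# {'cities': 'sql', 'languages': 'files'}
```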
- property allow_fetch_all
Flag indicating whether the fetch_all() operation is permitted.
- property children
Return child fetchers sorted by rank.
- fetch(ids_to_fetch, placeholders=(), *, required=(), task_id=None, enable_uuid_heuristics=False)
Retrieve placeholder translations from the source.
- Parameters:
ids_to_fetch – An iterable of IdsToFetch.
placeholders – All desired placeholders in preferred order.
required – Placeholders that must be included in the response.
task_id – Used for logging.
enable_uuid_heuristics – Improves matching when UUID-like IDs are in use.
- Returns:
A mapping {source: PlaceholderTranslations} of translation elements.
- Raises:
UnknownPlaceholderError – For placeholder(s) that are unknown to the Fetcher.
UnknownSourceError – For source(s) that are unknown to the Fetcher.
ForbiddenOperationError – If trying to fetch all IDs when not possible or permitted.
ImplementationError – For errors made by the inheriting implementation.
See also
🔑 This is a key event method. See Key Event Records for details.
Notes
Placeholders are usually columns in relational database applications. These are the components which are combined to create ID translations. See the Format documentation for details.
- fetch_all(placeholders=(), *, required=(), sources=None, task_id=None, enable_uuid_heuristics=False)
Fetch as much data as possible.
- Parameters:
placeholders – All desired placeholders in preferred order.
required – Placeholders that must be included in the response.
sources – A subset of sources to fetch. Unknown sources are ignored. Set to None to fetch all sources.
task_id – Used for logging.
enable_uuid_heuristics – Improves matching when UUID-like IDs are in use.
- Returns:
A mapping {source: PlaceholderTranslations} of translation elements.
See also
🔑 This is a key event method. See Key Event Records for details.
- Raises:
ForbiddenOperationError – If fetching all IDs is not possible or permitted.
UnknownPlaceholderError – For placeholder(s) that are unknown to the Fetcher.
ImplementationError – For errors made by the inheriting implementation.
- initialize_sources(task_id=None, *, force=False)
Perform source discovery.
Perform source discovery for all children, discarding optional children that raise or do not return any sources when their respective Fetcher.initialize_sources() methods are called.
- Parameters:
task_id – Used for logging.
force – If True, perform full discovery even if sources are already known.
See also
🔑 This is a key event method. See Key Event Records for details.
Notes
Calling this method multiple times will not recover previously discarded optional child fetchers.
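The discard rule for optional children can be sketched with plain callables standing in for child fetchers. Child, initialize_children and the logger name are hypothetical; log_level plays the role of fetcher_discarded_log_level but takes an integer logging level here.

```python
from __future__ import annotations

import logging
from collections.abc import Callable
from dataclasses import dataclass


@dataclass
class Child:
    """Hypothetical stand-in for a child fetcher."""

    name: str
    discover: Callable[[], list[str]]  # returns the child's sources, or raises
    optional: bool = False


def initialize_children(
    children: list[Child], log_level: int = logging.DEBUG
) -> dict[str, list[str]]:
    """Discover sources per child, discarding optional children that fail or find nothing."""
    logger = logging.getLogger("multi_fetcher_sketch")
    kept: dict[str, list[str]] = {}
    for child in children:
        try:
            sources = child.discover()
        except Exception:
            if not child.optional:
                raise  # errors from non-optional children propagate
            logger.log(log_level, "Discarding optional fetcher %s: discovery failed.", child.name)
            continue
        if not sources and child.optional:
            logger.log(log_level, "Discarding optional fetcher %s: no sources.", child.name)
            continue
        kept[child.name] = sources
    return kept
```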
- property online
Return connectivity status. If False, no new translations may be fetched.
- property placeholders
Placeholders, such as id or name, for all known sources.
These are the (possibly unmapped) placeholders that may be used for translation.
- Returns:
A dict {source: [placeholders..]}.
- class PandasFetcher(read_function=None, read_path_format='data/{}.csv', read_function_kwargs=None, **kwargs)
Bases: AbstractFetcher[str, IdType]
Fetcher implementation using pandas.DataFrame as the data format.
Fetch data from serialized frames. How this is done is determined by the read_function. This is typically a Pandas function such as pandas.read_csv() or pandas.read_parquet(), but any function that accepts a string source as the first argument and returns a pandas.DataFrame can be used.
Hint
When using remote file systems, sources are resolved using AbstractFileSystem.glob(). If resolution fails, consider overriding the find_sources() method.
- Parameters:
read_function – A function (str) -> DataFrame. If None, it is derived from read_path_format. Strings are resolved by get_by_full_name() (with default_module=pandas).
read_path_format – A string of the form protocol://path/to/sources/{}.<ext>, or a callable to apply to a source before passing it to read_function.
read_function_kwargs – Additional keyword arguments for read_function.
**kwargs – See AbstractFetcher.
See also
The official Pandas IO documentation
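String values for read_function are resolved to callables by name. A minimal sketch of that kind of lookup using importlib, assuming a bare name is looked up in default_module and a dotted name is imported as-is; get_by_name is a hypothetical stand-in for get_by_full_name(), whose exact rules may differ.

```python
from __future__ import annotations

import importlib
import json
from typing import Any


def get_by_name(name: str, default_module: str) -> Any:
    """Resolve 'package.module.attr', or a bare 'attr' from default_module (hypothetical)."""
    module_name, _, attr = name.rpartition(".")
    module = importlib.import_module(module_name or default_module)
    return getattr(module, attr)


print(get_by_name("loads", default_module="json") is json.loads)  # True
```

With pandas installed, get_by_name("read_csv", default_module="pandas") would return pandas.read_csv in the same way.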
- fetch_translations(instr)
Retrieve placeholder translations from the source.
- Parameters:
instr – A single FetchInstruction for IDs to fetch. If IDs is None, the fetcher should retrieve data for as many IDs as possible.
- Returns:
Placeholder translation elements.
- Raises:
UnknownPlaceholderError – If the placeholder is unknown to the fetcher.
See also
🔑 This is a key event method. See Key Event Records for details.
- find_sources(task_id=None)
Resolve sources and their associated paths.
- Parameters:
task_id – Used for logging.
Sources are resolved in three steps:
1. Create a glob pattern by calling format_source() with source='*'.
2. Glob files using AbstractFileSystem.glob() (requires fsspec) or Path.glob().
3. Strip the directory and file suffix from the globbed paths to create source names.
- Returns:
A dict {source: path}.
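For local files, the three resolution steps map directly onto pathlib. This sketch assumes a plain path template whose {} placeholder sits in the file name (no fsspec, no remote protocols, no callable read_path_format); find_local_sources is a hypothetical name, not the library's find_sources().

```python
from __future__ import annotations

from pathlib import Path


def find_local_sources(read_path_format: str) -> dict[str, str]:
    """Resolve local sources using the three documented steps (hypothetical).

    Assumes a plain local path template whose '{}' placeholder sits in the file
    name, e.g. 'data/{}.csv'. No fsspec, remote protocols or callables.
    """
    # 1. Create a glob pattern by formatting the template with source='*'.
    pattern = Path(read_path_format.format("*"))
    # 2. Glob files; Path.glob() takes a pattern relative to a base directory.
    paths = sorted(pattern.parent.glob(pattern.name))
    # 3. Strip directory and suffix to get source names.
    return {path.stem: str(path) for path in paths}
```

Called with 'data/{}.csv' in a directory holding cities.csv and languages.csv, this returns the keys 'cities' and 'languages' mapped to their paths.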
- class SqlFetcher(connection_string, password=None, whitelist_tables=None, blacklist_tables=(), schema=None, include_views=False, engine_kwargs=None, **kwargs)
Bases: AbstractFetcher[str, IdType]
Fetch data from a SQL source.
- Parameters:
connection_string – A SQLAlchemy connection string.
password – Password to insert into the connection string. Will be escaped to allow for special characters. If given, the connection string must contain a password key, e.g. dialect://user:{password}@host:port.
whitelist_tables – The only tables the fetcher may access.
blacklist_tables – Tables the fetcher may not access.
schema – Database schema to use. Typically needed only if the schema is not the default schema for the user specified in the connection string.
include_views – If True, the fetcher will discover and query views as well.
engine_kwargs – A dict of keyword arguments for sqlalchemy.create_engine().
**kwargs – See AbstractFetcher.
- Raises:
ValueError – If both whitelist_tables and blacklist_tables are given.
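Two of the documented constructor rules, password escaping and the whitelist/blacklist exclusivity, can be sketched without SQLAlchemy. Escaping with urllib.parse.quote_plus is an assumption about how "escaped to allow for special characters" might be done; build_connection_string and allowed_tables are hypothetical helpers.

```python
from __future__ import annotations

from collections.abc import Iterable
from urllib.parse import quote_plus


def build_connection_string(connection_string: str, password: str | None = None) -> str:
    """Insert an escaped password into a '{password}' template (hypothetical).

    Assumption: escaping with urllib.parse.quote_plus.
    """
    if password is None:
        return connection_string
    return connection_string.format(password=quote_plus(password))


def allowed_tables(
    tables: Iterable[str],
    whitelist_tables: Iterable[str] | None = None,
    blacklist_tables: Iterable[str] = (),
) -> list[str]:
    """Apply the documented whitelist/blacklist rules to discovered table names."""
    blacklist = set(blacklist_tables)
    if whitelist_tables is not None and blacklist:
        raise ValueError("Give whitelist_tables or blacklist_tables, not both.")
    if whitelist_tables is not None:
        whitelist = set(whitelist_tables)
        return [t for t in tables if t in whitelist]
    return [t for t in tables if t not in blacklist]


print(build_connection_string("postgresql://user:{password}@localhost:5432/db", "p@ss/w:rd"))
# postgresql://user:p%40ss%2Fw%3Ard@localhost:5432/db
```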
Notes
Inheriting classes may override one or more of the following methods to further customize operation.
- create_engine(): initializes the SQLAlchemy engine. Calls parse_connection_string.
- parse_connection_string(): does basic URL encoding. Called by create_engine.
- select_where(): filters values on the id_column of the current table.
- make_table_summary(): creates TableSummary instances.
- uuid_like(): determines whether casting is needed.
- cast_id_column_to_uuid(): attempts to cast the id_column to UUID.
Overriding should be done with care, as methods may call each other internally.
- class TableSummary(name, columns, fetch_all_permitted, id_column)
Brief description of a known table.
- columns
The columns of the table.
- fetch_all_permitted
A flag indicating that the FETCH_ALL operation is permitted for this table.
- id_column
The ID column of the table.
- name
Name of the table.
- property allow_fetch_all
Flag indicating whether the fetch_all() operation is permitted.
- cast_id_column_to_uuid(id_column, *, ids_are_uuid_like)
Apply UUID heuristics to the ID column.
This function attempts to cast the id_column to a suitable type by looking at the type of the column and the ids_are_uuid_like flag.
If the column is already UUID-like (as determined by get_metadata()), the column is always returned as-is.
- Parameters:
id_column – The ID sqlalchemy.sql.Column of the table.
ids_are_uuid_like – One of True and 'unknown' (never False). The latter typically means that fetch_all() was called, but could also be a normal “translation” call without IDs.
- Returns:
The id_column with or without a cast applied.
- classmethod create_engine(connection_string, password, engine_kwargs)
Factory method used by __init__.
For a more detailed description of the arguments and the behaviour of this function, see the class docstring.
- Parameters:
connection_string – A SQLAlchemy connection string.
password – Password to insert into the connection string.
engine_kwargs – A dict of keyword arguments for sqlalchemy.create_engine().
- Returns:
A new Engine.
- fetch_translations(instr)
Retrieve placeholder translations from the source.
- Parameters:
instr – A single FetchInstruction for IDs to fetch. If IDs is None, the fetcher should retrieve data for as many IDs as possible.
- Returns:
Placeholder translation elements.
- Raises:
UnknownPlaceholderError – If the placeholder is unknown to the fetcher.
See also
🔑 This is a key event method. See Key Event Records for details.
- make_table_summary(table, id_column)
Create a table summary.
This function is called as part of the fetcher initialization process.
- Parameters:
table – The table (source) which is currently being processed.
id_column – The ID column of table.
- Returns:
A summary object for table.
- property online
Return connectivity status. If False, no new translations may be fetched.
- classmethod parse_connection_string(connection_string, password)
Parse a connection string.
- select_where(select, *, ids, id_column, table)
User method for modifying SELECT statements.
The default implementation returns select as-is. Selection based on IDs is done before this method is called. Users may override this method to change which data is returned, e.g. by adding WHERE clauses.
- Parameters:
select – A sqlalchemy.sql.Select element. If returned as-is, all IDs in the table will be fetched.
ids – Set of IDs to fetch. Will be None if fetch_all() was called.
id_column – The ID sqlalchemy.sql.Column of the table, from which ids are fetched.
table – Table to select from.
- Returns:
The final statement object to use.
- uuid_like(id_column, ids)
Determine whether id_column should be passed to cast_id_column_to_uuid().
Note
Will not be called unless Translator.enable_uuid_heuristics is True. Only False will bypass calling cast_id_column_to_uuid().
- Return values:
- True: Attempt to cast using cast_id_column_to_uuid() with ids_are_uuid_like=True.
- False: Do not cast; cast_id_column_to_uuid() will not be called.
- None: Attempt to cast using cast_id_column_to_uuid() with ids_are_uuid_like='unknown'.
- Parameters:
id_column – The ID sqlalchemy.sql.Column of the table.
ids – Set of IDs to fetch. Will be None if fetch_all() was called.
- Returns:
One of True, False and None. See above for explanation.
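The documented return values of uuid_like() translate into cast_id_column_to_uuid() calls as sketched below. The heuristic in ids_look_like_uuids (every ID parses as uuid.UUID) is an assumption made for illustration, not the library's rule; both helpers are hypothetical.

```python
from __future__ import annotations

import uuid
from collections.abc import Iterable
from typing import Literal


def ids_look_like_uuids(ids: Iterable[object] | None) -> bool | None:
    """Hypothetical heuristic standing in for uuid_like()."""
    if ids is None:
        return None  # e.g. fetch_all(): no IDs to inspect
    ids = list(ids)
    if not ids:
        return None
    for value in ids:
        if isinstance(value, uuid.UUID):
            continue
        try:
            uuid.UUID(str(value))
        except ValueError:
            return False  # at least one ID is clearly not a UUID
    return True


def ids_are_uuid_like_arg(uuid_like: bool | None) -> bool | Literal["unknown"] | None:
    """Translate a uuid_like() result into the ids_are_uuid_like argument.

    None means cast_id_column_to_uuid() should not be called at all.
    """
    if uuid_like is False:
        return None  # only False bypasses the cast
    return True if uuid_like else "unknown"
```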
Modules
- Errors and warnings related to fetching.
- Types related to translation fetching.