A CacheAccess implementation#
A CacheAccess solution that stores data locally on disk. Click here to download the
full script.
Design goals#
We’ve arbitrarily decided on the following requirements:
1. Data should only be cached if the fetcher is performing a fetch_all operation.
2. Cached data should be stored on disk using the Feather format.
3. Cached data should have a timeout (TTL), measured in seconds.
We’ll create a new class, MyCacheAccess, to meet these requirements.
Implementation#
The new class needs to know where to store data and how long to keep it.
__init__ method.#
def __init__(self, root: str, ttl: int) -> None:
    super().__init__()
    self._root = Path(root)
    self._ttl = ttl  # In seconds
    self._root.mkdir(parents=True, exist_ok=True)
We can now start implementing the abstract methods in CacheAccess. We’ll start with
CacheAccess.store():
MyCacheAccess.store() method.#
def store(
    self,
    instr: FetchInstruction[SourceType, IdType],
    translations: PlaceholderTranslations[SourceType],
) -> None:
    if not instr.fetch_all:
        print(
            f"Refuse caching of source={instr.source!r}"
            " since FetchInstruction.fetch_all=False."
        )
        return

    df = translations.to_pandas()
    path = self._root / f"{translations.source}.ftr"
    print(f"Store cache at path='{path}'.")
    df.to_feather(path)
Requirement 1: If FetchInstruction.fetch_all is False, the data is not stored.
Otherwise, we convert the translations to a DataFrame using PlaceholderTranslations.to_pandas()
and use the source as the file name. Requirement 2: The frame is written to disk using
pandas.DataFrame.to_feather().
We’re now ready to implement CacheAccess.load(), which will read, verify, and convert the stored data.
MyCacheAccess.load() method.#
def load(
    self,
    instr: FetchInstruction[SourceType, IdType],
) -> PlaceholderTranslations[SourceType] | None:
    path = self._root / f"{instr.source}.ftr"
    if not path.exists():
        print(f"Cache at path='{path}' does not exist.")
        return None

    age = self.age_in_seconds(path)
    if age > self._ttl:
        print(f"Reject cache ({age=} > ttl={self._ttl}) at path='{path}'.")
        return None

    print(f"Load cache (age={age} <= {self._ttl}=ttl) at path='{path}'.")
    df = pd.read_feather(path)
    return PlaceholderTranslations.from_dataframe(instr.source, df)
As per Requirement 3, we should only return data that is at most ttl seconds old. We measure the age using the
modification time of the serialized file, as reported by the operating system.
MyCacheAccess.age_in_seconds() method.#
@staticmethod
def age_in_seconds(path: Path) -> int:
    timestamp = path.stat().st_mtime
    modified = datetime.fromtimestamp(timestamp)
    seconds = (datetime.now() - modified).total_seconds()
    return round(seconds)
If the data is stale, we return None.
Hint
Returning None signals to the caller that data should be retrieved some other way; typically by using
AbstractFetcher.fetch_translations() instead.
The data is read using pandas.read_feather(), then converted using PlaceholderTranslations.from_dataframe().
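The load-then-fall-back flow described in the hint can be sketched without the library. In this minimal illustration, load_or_fetch, cache, and fetch_from_source are hypothetical names that are not part of the library, and a plain dict stands in for the files on disk. It only shows the control flow a caller might follow when load() returns None.

```python
from typing import Callable, Optional

# Hypothetical stand-in for translation data; not a library type.
Translations = dict[int, str]


def load_or_fetch(
    load: Callable[[str], Optional[Translations]],
    fetch: Callable[[str], Translations],
    source: str,
) -> tuple[Translations, str]:
    """Return cached data if `load` yields any, otherwise call `fetch`."""
    cached = load(source)
    if cached is not None:
        return cached, "cache"
    return fetch(source), "source"


cache: dict[str, Translations] = {}  # Plays the role of the files on disk.


def fetch_from_source(source: str) -> Translations:
    """Pretend to query the real data source."""
    return {1904: "Fred"}


# First call: nothing cached, so dict.get returns None and we fall back.
first = load_or_fetch(cache.get, fetch_from_source, "people")
cache["people"] = first[0]  # Simulate a successful store().

# Second call: the cached entry is returned and the source is not queried.
second = load_or_fetch(cache.get, fetch_from_source, "people")
print(first[1], second[1])
```

The first lookup falls back to the source and the second is served from the cache, which mirrors the missing-file and fresh-file branches of load().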
Creating a cached fetcher#
All AbstractFetcher implementations accept an optional cache_access keyword argument.
Translator with a cached fetcher.#
def create() -> Translator[str, str, int]:
    cache_access = MyCacheAccess(root="./cache/", ttl=3600)
    fetcher = MemoryFetcher(
        data={"people": {1904: "Fred"}},
        cache_access=cache_access,
    )
    return Translator(fetcher)
Using a CacheAccess with a MemoryFetcher doesn’t make much sense, but the caching procedure works
just the same as it would for e.g. a SqlFetcher.
Hint
To configure caching using TOML, add a [fetching.cache]-section.
The type key is required. Other keys are determined by the implementation.
[fetching.cache]
type = "__main__.MyCacheAccess"
root = "./cache/"
ttl = 3600
See the Configuration page for more information.
Caching in action#
We’ll use the create() function defined above to initialize new Translator instances.
translator = create()
print("person=", translator.translate(1904, "people"))
Initial creation. Data is retrieved from the source. There’s only one ID in the fetcher, but the cache implementation doesn’t know that. It refuses to store the data as per Requirement 1.
Cache at path='cache/people.ftr' does not exist.
Refuse caching of source='people' since FetchInstruction.fetch_all=False.
person= 1904:Fred
Using Translator.go_offline() without any explicit IDs will call fetch_all.
translator.go_offline()
print("person=", translator.translate(1904, "people"))
When going offline, the Translator will store translation data in-memory as a TranslationMap.
Cache at path='cache/people.ftr' does not exist.
Store cache at path='cache/people.ftr'.
person= 1904:Fred
By definition, a translator that is offline does not have a fetcher attached. The effects of this can be seen above: The
cache was updated, but it wasn’t loaded again for the translate() call. There is no way to reconnect
an offline Translator, so this instance will be limited to using its in-memory cache until it is destroyed.
Of course, deleting the MyCacheAccess instance doesn’t remove the files on disk.
print("person=", create().translate(1904, "people"))
If we create a new Translator and use it right away (or within ttl = 3600 seconds = 1 hour), the cached data will
be used.
Load cache (age=0 <= 3600=ttl) at path='cache/people.ftr'.
person= 1904:Fred
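To observe the rejection branch of load() without waiting an hour, the modification time of a file can be backdated with os.utime(). The sketch below uses only the standard library and repeats the age calculation from MyCacheAccess.age_in_seconds(). The temporary file and its placeholder contents are stand-ins, not real cache output.

```python
import os
import tempfile
import time
from datetime import datetime
from pathlib import Path


def age_in_seconds(path: Path) -> int:
    """Same calculation as MyCacheAccess.age_in_seconds()."""
    modified = datetime.fromtimestamp(path.stat().st_mtime)
    return round((datetime.now() - modified).total_seconds())


ttl = 3600
with tempfile.TemporaryDirectory() as root:
    path = Path(root) / "people.ftr"
    path.write_bytes(b"placeholder")  # Stand-in for real feather data.
    fresh_age = age_in_seconds(path)

    # Move the access and modification times two hours into the past.
    two_hours_ago = time.time() - 2 * 3600
    os.utime(path, (two_hours_ago, two_hours_ago))
    stale_age = age_in_seconds(path)

# A fresh file passes the TTL check; the backdated one would be rejected.
print(fresh_age <= ttl, stale_age > ttl)
```

Applying the same os.utime() call to cache/people.ftr should make the next load() take the "Reject cache" branch, assuming the file was written by the store() method shown earlier.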
This concludes the example.