CRUK-CI Genologics API Client

Caching

Using the in-memory cache with the client can massively increase the speed a script can run. It accumulates a number of entities from REST API calls so subsequent calls to retrieve these same entities need not pass through to the API. This also has the effect to simplifying user code, as the script can make the calls through the ClarityAPI interface without worrying about whether it already has some or all of the objects requested.

Like any client side caching, there are risks that an entity will be updated from elsewhere while the client holds an earlier copy of the object. This hasn't presented us much of a problem in EPP scripts (where this client is used a lot) as these run from beginning to end and then stop, so the lifetime of the cache is simply the duration of the script.

If used in a server environment, the cache can still be beneficial but it may need to be tweaked so objects don't live as long in the cache. We've found that most operations rarely hit the same objects from two different people (though this is a risk) and the performance gain much outweighs the risk of out of date information. The cache can be tuned or disabled as required by the application.

Enabling the cache

The cache is essentially an aspect around the main ClarityAPI implementation. It is configured with Spring and can be enabled for an application by adding a file to Spring's application context path:

/org/cruk/clarity/api/clarity-cache-context.xml

This configuration does require the general /org/cruk/clarity/api/clarity-client-context.xml configuration to be in Spring's application context path as it depends on beans defined there.

clarity-cache-context.xml is provided in the client's JAR file.

Tuning the cache

The cache is controlled by the file:

/org/cruk/clarity/api/ehcache.xml

This file is also provided in the client's JAR file. It provides caches for different entities that can be individually tuned for each class in the API. Any class that does not have a specific cache defined goes into the catch-all cache com.genologics.ri.LimsEntity.

If you do need to change the cache configuration, make a new org/cruk/clarity/api/ehcache.xml that appears in a folder or JAR before the client's JAR file in the class path.

As of release 2.31, the implementation of EhCache used has been moved from version 2 to version 3. This has resulted in a change in how the ehcache.xml file is structured. The file in this project provides an example, and the full documentation can be found on the (EhCache web site)[https://www.ehcache.org/documentation/3.0/xml.html].

Bulk fetch, create and update operations

Real world use has found that the bulk operations for fetching, creating and updating objects from the API has a “sweet spot” for the number of objects in an operation. This is a number that minimises the request time per object. For example, if one tries fetching 10,000 artifacts in one call to the API's “.../artifacts/batch/retrieve” end point it will take longer than making 20 calls for 500 artifacts (500 per call is the optimum number given by Genologics, though testing has shown this can vary between installations).

As of release 2.22, the client library will batch bulk operations with large numbers of objects into one or more calls to the relevant API end point transparently to the user. So we can request our 10,000 artifacts in one call to the client and let that fetch them in the most efficient manner.

The number of objects to retrieve or send in each batch can be tuned with the setBulkOperationBatchSize method on the client. The default is the recommended 500 per call to the API.

Artifacts and their state parameter

The Artifact class provides some complication to the cache by always having a state parameter attached to its URIs. This means a given artifact can be returned in different states depending on the state in the URI.

The cache can be put in one of three modes for dealing with Artifact objects:

EXACT: The version of the artifact requested by the state parameter must match the version in the cache exactly. If it is older or newer than the cached version, the requested version is fetched and replaces that already in the cache.
LATEST: If the version of the artifact requested by the state parameter is newer than the version in the cache, the requested version is fetched and replaces the version in the cache. If it is the same version or older, the newer version in the cache is returned.
ANY: The cached version of the artifact is returned regardless of the version requested by the state parameter.

If an artifact is requested without an explicit state in the URI, the artifact in the cache is returned regardless of its version or the mode of cache operation.

The default mode is LATEST, which is the way the cache behaved in releases of the client before 2.22. Modes can be set in the Spring configuration (clarity-cache-context.xml) or programmatically on the ClarityAPICache bean using the setStatefulBehaviour method.

Overriding the cache behaviour for a single fetch

In actual use, the only time the state parameter is important for an artifact is when one is looking for its QC flag value. No other fields have a history that is preserved by the state: they are always their current database value. Most of the time the cache behaviour is irrelevant, and the LATEST or ANY cache behaviours are most suitable for the performance gain they provide. This is not true when using the post process URI of the input/output maps to obtain the QC flags.

The overrideStateful method of ClarityAPI allows a single call to be made through the API that will fetch the artifact at either the exact version given in the URI or the latest available state.

When overrideStateful is set to EXACT, the client will call through to the Clarity API unless we already have exactly this version of the artifact in the cache. As the call returns the cache follows its normal rules for whether this version of the artifact is cached.

When overrideStateful is set to LATEST, the client will remove the state parameter from the artifact's URI before always calling through to the Clarity API. Again, as the call returns the cache follows its normal rules for whether this version of the artifact is cached. If the client is again instructed to fetch one of the artifact with this same override, the call will go through to the Clarity API again.

The stateful override behaviour is only relevant to artifacts and is in place for exactly one call to the API after the overrideStateful call. It will always be cleared after the next call to the API, regardless of whether it actually has an effect after that call (e.g. calling overrideStateful and then getServer will cause the override to be cleared), so one should put the call to overrideStateful immediately before the call to retrieve, load, loadAll or reload. This behaviour is local to the calling thread, so will not affect calls from other threads.