Caching
So you've got a performance problem, and you're pretty sure that it lies in a bottleneck between your database and your application server. You've used IronTrack SQL or some other tool to analyze the SQL sent between your application and the database, and you're pretty sure that there isn't much advantage to be squeezed from refining your queries. Instead, you feel certain that the problems are due to the amount of traffic between your application and the database. The solution in this case may be a cache. By storing the data in a cache instead of relying solely on the database, you may be able to significantly reduce the load on the database, and possibly to increase overall performance as well.
Understanding Caches
Generally speaking, anything you can do to minimize traffic between a database and an application server is probably a good thing. In theory, an application ought to be able to maintain a cache containing data already loaded from the database, and only hit the database when information has to be updated. When the database is hit, the changes may invalidate the cache.
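The idea can be sketched in plain Java, independent of Hibernate. The following is a minimal illustration only: FakeDatabase and ReadThroughCache are hypothetical names invented for this example, and the "database" is just an in-memory map that counts how often it is read.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a database table keyed by id. */
class FakeDatabase {
    private final Map<Long, String> rows = new HashMap<>();
    int reads = 0;   // number of times the "database" was queried

    String select(Long id) { reads++; return rows.get(id); }
    void update(Long id, String value) { rows.put(id, value); }
}

/** Read-through cache: consult memory first, fall back to the database. */
class ReadThroughCache {
    private final FakeDatabase db;
    private final Map<Long, String> cached = new HashMap<>();

    ReadThroughCache(FakeDatabase db) { this.db = db; }

    String load(Long id) {
        if (cached.containsKey(id)) {
            return cached.get(id);       // hit: no database traffic
        }
        String value = db.select(id);    // miss: one database read
        cached.put(id, value);
        return value;
    }

    void save(Long id, String value) {
        db.update(id, value);            // the write goes to the database...
        cached.remove(id);               // ...and invalidates the stale entry
    }
}

class ReadThroughDemo {
    public static void main(String[] args) {
        FakeDatabase db = new FakeDatabase();
        ReadThroughCache cache = new ReadThroughCache(db);
        cache.save(1L, "first");
        cache.load(1L);
        cache.load(1L);
        cache.load(1L);
        System.out.println("database reads: " + db.reads);
    }
}
```

Three loads of the same record cost a single database read; only the update forces the next load back to the database.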
First-Level and Second-Level Caches
Hibernate actually implements a simple session-level cache, useful on a per-transaction basis. This cache is primarily used to optimize the SQL generated by Hibernate. It is sometimes referred to as a first-level Hibernate cache. For more information on the relationship between a session and the underlying SQL, see Chapter 9.
The JVM and distributed caches discussed in this section are referred to as second-level caches in other sources. Since you will never need to configure the first-level cache, the rest of this chapter refers to the second-level cache simply as "the cache."
Let's start by looking at Hibernate without a cache, as shown in Figure 10.8. Data is transferred between Hibernate and the database, and transactions are managed by the database. Hibernate assumes that the data in memory should be refreshed on every access (a reasonable assumption, especially if Hibernate does not have exclusive access to the database).
Figure 10.8. Hibernate without a Cache
Figure 10.9 shows Hibernate operating with a single JVM cache used to minimize traffic between Hibernate and the database. This will increase the performance of the application and minimize the load on the database, but at the cost of a bit more configuration complexity (described later in this chapter) and memory usage.
You may wonder how to use Hibernate to perform multithreaded object access and begin pondering strategies for sharing persistent objects across threads. The short answer is: don't! Instead, if you are interested in sharing object data across threads, simply use a cache, as shown in Figure 10.9. If you try to implement your own sharing scheme, the odds are good that you'll have to write complex, difficult-to-manage thread-management code, only to end up with cache and concurrency problems.
Figure 10.9. Hibernate with a Cache
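To make the advice concrete, here is a rough sketch, in plain Java rather than Hibernate, of threads sharing data through a cache instead of passing objects to one another. SharedValueCache and its "row-" values are hypothetical names invented for this example; the lambda stands in for a database read.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical shared cache of immutable values; safe to read from any thread. */
class SharedValueCache {
    private final ConcurrentMap<Long, String> values = new ConcurrentHashMap<>();
    final AtomicInteger loads = new AtomicInteger();

    /** Returns the cached value, loading it at most once per key. */
    String get(Long id) {
        return values.computeIfAbsent(id, key -> {
            loads.incrementAndGet();      // stands in for a database read
            return "row-" + key;
        });
    }
}

class SharedValueCacheDemo {
    /** Reads keys 0..keyCount-1 from each of threadCount threads; returns total loads. */
    static int run(int threadCount, int keyCount) {
        SharedValueCache cache = new SharedValueCache();
        Thread[] workers = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            workers[i] = new Thread(() -> {
                for (long id = 0; id < keyCount; id++) {
                    cache.get(id);
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return cache.loads.get();
    }

    public static void main(String[] args) {
        System.out.println("loads: " + run(4, 100));
    }
}
```

Each thread works with its own references to immutable values, and the concurrency handling lives in one place (the cache) rather than being spread through application code.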
Figure 10.10 illustrates a problem that may arise when you use a cache. If your application does not have exclusive access to the database (a common situation in an enterprise environment), your cache can easily become out of sync with the database. If a legacy application updates a record stored in the cache, there is no notification that the data is stale, and therefore the data in the cache will be incorrect.
Figure 10.10. Hibernate and a Legacy System
Multiple SessionFactory Objects
A JVM cache, as described here, is actually a SessionFactory-level cache (see Chapter 9 for more information on the scope of a SessionFactory). There is normally no reason not to share a single SessionFactory instance throughout your JVM, but if for some reason your application uses more than one SessionFactory, you're effectively building a multiple-JVM system, and will therefore need to use a distributed cache.
Similarly, if you have multiple JVMs running on a single physical system, that still counts as a distributed system.
Unfortunately, there is no ideal solution to the problem of using a distributed object cache in conjunction with a legacy system. If your Hibernate application has a read-only view of the database, you may be able to configure your cache provider to expire data periodically, which bounds how long stale data can remain in the cache.
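Periodic expiration can be illustrated with a small time-to-live cache. This is only a sketch in plain Java, not the mechanism of any particular cache provider; ExpiringCache is a hypothetical name, and the clock is injected so that the behavior can be demonstrated without waiting.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

/** Hypothetical time-to-live cache: entries older than ttlMillis count as missing. */
class ExpiringCache {
    private static final class Entry {
        final String value;
        final long storedAt;
        Entry(String value, long storedAt) {
            this.value = value;
            this.storedAt = storedAt;
        }
    }

    private final Map<Long, Entry> entries = new HashMap<>();
    private final long ttlMillis;
    private final LongSupplier clock;   // injectable so the example can be tested

    ExpiringCache(long ttlMillis, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    void put(Long id, String value) {
        entries.put(id, new Entry(value, clock.getAsLong()));
    }

    /** Returns the cached value, or null if it is absent or has expired. */
    String get(Long id) {
        Entry entry = entries.get(id);
        if (entry == null) {
            return null;
        }
        if (clock.getAsLong() - entry.storedAt >= ttlMillis) {
            entries.remove(id);          // expired: caller must re-read the database
            return null;
        }
        return entry.value;
    }
}

class ExpiringCacheDemo {
    public static void main(String[] args) {
        long[] now = {0L};                               // fake clock, in milliseconds
        ExpiringCache cache = new ExpiringCache(60_000L, () -> now[0]);
        cache.put(1L, "cached");
        now[0] = 59_999L;
        System.out.println(cache.get(1L));               // still within the TTL
        now[0] = 60_000L;
        System.out.println(cache.get(1L));               // expired, prints null
    }
}
```

With a 60-second time to live, a change made by a legacy application is visible to readers at most 60 seconds late; shorter values trade database load for freshness.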
If you are able to control all the access to a particular database instance, you may be able to use a distributed cache to ensure that the data traffic is properly synchronized. An example of this is shown in Figure 10.11. Take care when choosing a distributed cache to ensure that the overhead of the cache traffic does not overwhelm the advantages of the cached data.
Figure 10.11. Hibernate and a Distributed Cache
As a final note, keep in mind that a distributed cache is only one of several possible solutions to a performance problem. Some databases, for example, support an internal distribution mechanism, allowing for the distribution complexity to be entirely subsumed by the database infrastructure (thereby letting the application continue to treat a multisystem database as a single data source).
Configuring a Cache
Applications that perform a large number of read operations in relation to the number of write operations generally benefit the most from the addition of a cache.
The best type of cache depends on factors such as the use of JTA, transaction isolation-level requirements, and the use of clusters. Because these needs vary so widely from one application to the next, Hibernate does not implement caches itself, but instead relies on a configurable third-party library. Table 10.5 lists the supported caches.
Table 10.5. Supported Cache Environments
| Cache | Type | URL |
|---|---|---|
| EHCache (Easy Hibernate Cache) | In Process | |
| OSCache (Open Symphony) | In Process or Cluster | |
| SwarmCache | Cluster | |
| JBoss TreeCache | Cluster | |
Standard Caches
In addition to the open-source caches described above, you may wish to investigate Tangosol Coherence, a commercial cache. For more information, see http://hibernate.org/132.html and http://tangosol.com/.
Table 10.6 shows the proper setting for the hibernate.cache.provider_class property to be passed via the hibernate.properties file to enable the use of a cache.
Each cache offers different capabilities in terms of memory and disk-based cache storage and a wide variety of possible configuration options.
Table 10.6. Specifying a Cache
| Cache | Property Value |
|---|---|
| EHCache (Easy Hibernate Cache) | net.sf.ehcache.hibernate.Provider (default) |
| OSCache (Open Symphony) | net.sf.hibernate.cache.OSCacheProvider |
| SwarmCache | net.sf.hibernate.cache.SwarmCacheProvider |
| JBoss TreeCache | net.sf.hibernate.cache.TreeCacheProvider |
| Custom (User-Defined) | Fully qualified class name pointing to a net.sf.hibernate.cache.CacheProvider implementation |
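For example, assuming you have chosen OSCache, the corresponding line in hibernate.properties might look like the following (the value is taken from Table 10.6; substitute the class name for whichever provider you use):

```properties
# Select the second-level cache provider (here, OSCache)
hibernate.cache.provider_class=net.sf.hibernate.cache.OSCacheProvider
```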
Regardless of which cache you choose, you will need to tell Hibernate what sort of cache rules should be applied to your data. This is defined using the cache tag (as described in Chapter 5). You can place the cache tag in your *.hbm.xml files or in the hibernate.cfg.xml file. Alternatively, you can configure cache settings programmatically using the Configuration object. Table 10.7 shows the values allowed for the usage attribute of the cache tag.
Table 10.7. Cache Options
| Option | Comment |
|---|---|
| read-only | Only useful if your application reads (but does not update) data in the database. Especially useful if your cache provider supports automatic, regular cache expiration. You should also set mutable=false for the parent class/collection tag (see Chapter 5). |
| read-write | If JTA is not used, ensure that Session.close() or Session.disconnect() is used to complete all transactions. |
| nonstrict-read-write | Does not verify that two transactions will not affect the same data; this is left to the application. If JTA is not used, ensure that Session.close() or Session.disconnect() is used to complete all transactions. |
| transactional | Distributed transaction cache. |
Conceptually, you are using the options in Table 10.7 to set the per-table read-write options for your data.
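As an illustration, a mapping for a table that the application only ever reads might combine the read-only usage with mutable="false", as Table 10.7 suggests. The class and column names below are hypothetical and exist only for this sketch; the structure follows the mapping files used elsewhere in this chapter.

```xml
<!-- Hypothetical lookup-table class; names are illustrative only. -->
<class name="com.example.CountryCode" mutable="false">
    <!-- Cached data is never updated by this application -->
    <cache usage="read-only" />
    <id name="id" column="id" type="long">
        <generator class="native" />
    </id>
    <property name="name" type="java.lang.String" column="name" />
</class>
```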
Some providers do not support every cache option. Table 10.8 shows which options the various providers support.
Table 10.8. Cache Options Supported by Provider
| Cache | read-only | nonstrict-read-write | read-write | transactional |
|---|---|---|---|---|
| EHCache | Yes | Yes | Yes | |
| OSCache | Yes | Yes | Yes | |
| SwarmCache | Yes | Yes | | |
| JBoss TreeCache | Yes | | | Yes |
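If you want to catch an unsupported combination early, the contents of Table 10.8 can be encoded as a simple lookup. The sketch below is plain Java and not part of Hibernate; the CacheSupport class and its Usage enum are hypothetical names, and the data is copied from the table above.

```java
import java.util.EnumSet;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper encoding Table 10.8 (provider versus supported usage). */
class CacheSupport {
    enum Usage { READ_ONLY, NONSTRICT_READ_WRITE, READ_WRITE, TRANSACTIONAL }

    private static final Map<String, Set<Usage>> SUPPORTED = new HashMap<>();
    static {
        SUPPORTED.put("EHCache",
                EnumSet.of(Usage.READ_ONLY, Usage.NONSTRICT_READ_WRITE, Usage.READ_WRITE));
        SUPPORTED.put("OSCache",
                EnumSet.of(Usage.READ_ONLY, Usage.NONSTRICT_READ_WRITE, Usage.READ_WRITE));
        SUPPORTED.put("SwarmCache",
                EnumSet.of(Usage.READ_ONLY, Usage.NONSTRICT_READ_WRITE));
        SUPPORTED.put("JBoss TreeCache",
                EnumSet.of(Usage.READ_ONLY, Usage.TRANSACTIONAL));
    }

    /** Returns true if the table lists the usage as supported by the provider. */
    static boolean supports(String provider, Usage usage) {
        Set<Usage> usages = SUPPORTED.get(provider);
        return usages != null && usages.contains(usage);
    }

    public static void main(String[] args) {
        System.out.println(supports("EHCache", Usage.READ_WRITE));
        System.out.println(supports("SwarmCache", Usage.READ_WRITE));
    }
}
```

According to the table, the first check succeeds and the second does not: SwarmCache is not listed as supporting the read-write option.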
Java Transaction API (JTA)
According to Sun's documentation, JTA "specifies standard Java interfaces between a transaction manager and the parties involved in a distributed transaction system: the resource manager, the application server, and the transactional applications." In other words, JTA provides for transactions that span multiple application servers, a powerful capability for scaling. Covering JTA is beyond the scope of this text (see http://java.sun.com/products/jta/), but you may wish to consult Chapter 9 for more information on transactions.
Using a Custom Cache
Understanding the interaction between a cache and your application can be very difficult. To help make it clearer, we have included below an example cache implementation that generates logging and statistics about your application's use of the cache (as generated by Hibernate).
Needless to say, don't use this custom cache in a production system.
Configuring the Custom Cache
For this test application, set the property hibernate.cache.provider_class=com.cascadetg.ch10.DebugHashtableCacheProvider in your hibernate.properties file.
Custom Cache Provider
Listing 10.3 shows the options for our simple cache provider. Note that the statistical details are tracked for the allocated caches.
Listing 10.3 Custom Cache Provider
package com.cascadetg.ch10;

import java.util.Hashtable;

public class DebugHashtableCacheProvider
        implements net.sf.hibernate.cache.CacheProvider {

    /** All caches built so far, keyed by cache name. */
    private static Hashtable caches = new Hashtable();

    public static Hashtable getCaches() {
        return caches;
    }

    /** Returns the name and statistics of every allocated cache. */
    public static String getCacheDetails() {
        StringBuffer newResult = new StringBuffer();

        java.util.Enumeration myCaches = caches.keys();
        while (myCaches.hasMoreElements()) {
            String myCacheName = myCaches.nextElement().toString();
            newResult.append(myCacheName);
            newResult.append("\n");

            DebugHashtableCache myCache =
                    (DebugHashtableCache) caches.get(myCacheName);
            newResult.append(myCache.getStats());
            newResult.append("\n\n");
        }

        return newResult.toString();
    }

    /** Creates a new instance of DebugHashtableCacheProvider */
    public DebugHashtableCacheProvider() {
    }

    public net.sf.hibernate.cache.Cache buildCache(String str,
            java.util.Properties properties) {
        System.out.println("New Cache Created");
        DebugHashtableCache newCache = new DebugHashtableCache();
        caches.put(str, newCache);
        return newCache;
    }

    public long nextTimestamp() {
        return net.sf.hibernate.cache.Timestamper.next();
    }
}
Custom Cache Implementation
Listing 10.4 shows the implementation of our simple cache. It's a pretty dumb cache: it just uses a java.util.Hashtable as the backing store. Of more interest is the use of long values to keep track of the number of accesses to the various cache methods. This can be useful for understanding the kind of access a section of code is generating. For example, you may wish to consider a different approach if your code generates a tremendous number of reads relative to writes.
Listing 10.4 Custom Cache Implementation
package com.cascadetg.ch10;

import net.sf.hibernate.cache.CacheException;
import net.sf.hibernate.cache.Timestamper;

import java.util.Hashtable;
import java.util.Map;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

public class DebugHashtableCache implements net.sf.hibernate.cache.Cache {

    private static Log log = LogFactory.getLog(DebugHashtableCache.class);

    /** The backing store for cached objects. */
    private Map hashtable = new Hashtable(5000);

    /** Appends one "label : value" line to the statistics buffer. */
    public void addStat(StringBuffer in, String label, long value) {
        in.append("\t");
        in.append(label);
        in.append(" : ");
        in.append(value);
        in.append("\n");
    }

    public String getStats() {
        StringBuffer result = new StringBuffer();

        addStat(result, "get hits", get_hits);
        addStat(result, "get misses", get_misses);
        addStat(result, "put replacements", put_hits);
        addStat(result, "put new objects", put_misses);
        addStat(result, "locks", locks);
        addStat(result, "unlocks", unlocks);
        addStat(result, "remove existing", remove_hits);
        addStat(result, "remove unknown", remove_misses);
        addStat(result, "clears", clears);
        addStat(result, "destroys", destroys);

        return result.toString();
    }

    // Counters for each kind of cache access
    long get_hits = 0;
    long get_misses = 0;
    long put_hits = 0;
    long put_misses = 0;
    long locks = 0;
    long unlocks = 0;
    long remove_hits = 0;
    long remove_misses = 0;
    long clears = 0;
    long destroys = 0;

    public Object get(Object key) throws CacheException {
        Object value = hashtable.get(key);
        if (value == null) {
            log.info("get " + key.toString() + " missed");
            get_misses++;
        } else {
            log.info("get " + key.toString() + " hit");
            get_hits++;
        }
        return value;
    }

    public void put(Object key, Object value) throws CacheException {
        log.info("put " + key.toString());
        if (hashtable.containsKey(key)) {
            put_hits++;
        } else {
            put_misses++;
        }
        hashtable.put(key, value);
    }

    public void remove(Object key) throws CacheException {
        log.info("remove " + key.toString());
        if (hashtable.containsKey(key)) {
            remove_hits++;
        } else {
            remove_misses++;
        }
        hashtable.remove(key);
    }

    public void clear() throws CacheException {
        log.info("clear ");
        clears++;
        hashtable.clear();
    }

    public void destroy() throws CacheException {
        log.info("destroy ");
        destroys++;
    }

    // Locking is only counted, not enforced, in this debugging cache.
    public void lock(Object key) throws CacheException {
        log.info("lock " + key.toString());
        locks++;
    }

    public void unlock(Object key) throws CacheException {
        log.info("unlock " + key.toString());
        unlocks++;
    }

    public long nextTimestamp() {
        return Timestamper.next();
    }

    public int getTimeout() {
        return Timestamper.ONE_MS * 60000; // i.e., 60 seconds
    }
}
Cache Test Object
Listing 10.5 shows a simple mapping file used to test our object. In particular, note the use of the cache tag to indicate the type of cache management that should be performed.
Listing 10.5 Simple Performance Test Object Mapping File
<?xml version="1.0"?>
<!DOCTYPE hibernate-mapping PUBLIC
    "-//Hibernate/Hibernate Mapping DTD 2.0//EN"
    "http://hibernate.sourceforge.net/hibernate-mapping-2.0.dtd">
<hibernate-mapping>
    <class name="com.cascadetg.ch10.PerfObject"
           dynamic-update="false" dynamic-insert="false">
        <cache usage="read-write" />
        <id name="id" column="id" type="long" >
            <generator class="native" />
        </id>
        <property name="value" type="java.lang.String"
                  update="true" insert="true" column="comments" />
    </class>
</hibernate-mapping>
Listing 10.6 shows the source generated from the mapping file shown in Listing 10.5.
Listing 10.6 Simple Performance Test Object Java Source
package com.cascadetg.ch10;

import java.io.Serializable;
import org.apache.commons.lang.builder.EqualsBuilder;
import org.apache.commons.lang.builder.HashCodeBuilder;
import org.apache.commons.lang.builder.ToStringBuilder;

/** @author Hibernate CodeGenerator */
public class PerfObject implements Serializable {

    /** identifier field */
    private Long id;

    /** nullable persistent field */
    private String value;

    /** full constructor */
    public PerfObject(String value) {
        this.value = value;
    }

    /** default constructor */
    public PerfObject() {
    }

    public Long getId() {
        return this.id;
    }

    public void setId(Long id) {
        this.id = id;
    }

    public String getValue() {
        return this.value;
    }

    public void setValue(String value) {
        this.value = value;
    }

    public String toString() {
        return new ToStringBuilder(this)
            .append("id", getId())
            .toString();
    }

    public boolean equals(Object other) {
        if ( !(other instanceof PerfObject) ) return false;
        PerfObject castOther = (PerfObject) other;
        return new EqualsBuilder()
            .append(this.getId(), castOther.getId())
            .isEquals();
    }

    public int hashCode() {
        return new HashCodeBuilder()
            .append(getId())
            .toHashCode();
    }
}
Testing the Cache
Listing 10.7 shows a simple program that tests the cache. If you wish to test this using a larger number of objects, simply change objects = 5 to a higher value.
Listing 10.7 Testing Cache Hits
package com.cascadetg.ch10;

/** Various Hibernate-related imports */
import net.sf.hibernate.*;
import net.sf.hibernate.cfg.*;
import net.sf.hibernate.tool.hbm2ddl.SchemaUpdate;
import net.sf.hibernate.tool.hbm2ddl.SchemaExport;

public class CacheTest {

    static long objects = 5;

    /** We use this session factory to create our sessions */
    public static SessionFactory sessionFactory;

    /**
     * Loads the Hibernate configuration information, sets up
     * the database and the Hibernate session factory.
     */
    public static void initialization() {
        System.out.println("initialization");
        try {
            Configuration myConfiguration = new Configuration();
            myConfiguration.addClass(PerfObject.class);

            // Drop any existing schema so each run starts clean.
            new SchemaExport(myConfiguration).drop(true, true);

            // This is the code that updates the database to
            // the current schema.
            new SchemaUpdate(myConfiguration).execute(true, true);

            // Sets up the session factory (used in the rest
            // of the application).
            sessionFactory = myConfiguration.buildSessionFactory();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public static void createObjects() {
        System.out.println();
        System.out.println("createObjects");

        Session hibernateSession = null;
        Transaction myTransaction = null;
        try {
            hibernateSession = sessionFactory.openSession();

            // Each object is saved in its own transaction.
            for (int i = 0; i < objects; i++) {
                myTransaction = hibernateSession.beginTransaction();

                PerfObject myPerfObject = new PerfObject();
                myPerfObject.setValue("");
                hibernateSession.save(myPerfObject);
                hibernateSession.flush();

                myTransaction.commit();
            }

            // Explicitly evict the local session cache. This is
            // done while the session is still open and non-null.
            hibernateSession.clear();
        } catch (Exception e) {
            e.printStackTrace();
            try {
                myTransaction.rollback();
            } catch (Exception e2) {
                // Silent failure of transaction rollback
            }
        } finally {
            try {
                if (hibernateSession != null)
                    hibernateSession.close();
            } catch (Exception e2) {
                // Silent failure of session close
            }
        }
    }

    public static void loadAllObjects() {
        System.out.println();
        System.out.println("loadAllObjects");

        Session hibernateSession = null;
        Transaction myTransaction = null;
        try {
            hibernateSession = sessionFactory.openSession();
            myTransaction = hibernateSession.beginTransaction();

            // In this example, we use the Criteria API. We
            // could also have used HQL, but the Criteria API
            // allows us to express this query more easily.

            // First indicate that we want to grab all of
            // the PerfObject instances.
            Criteria query =
                    hibernateSession.createCriteria(PerfObject.class);

            // This actually performs the database request,
            // based on the query we've built.
            java.util.Iterator results = query.list().iterator();

            PerfObject myPerfObject;

            // Collect each distinct object returned by the query.
            java.util.LinkedList retrievedObjects =
                    new java.util.LinkedList();
            while (results.hasNext()) {
                // Note that each result is cast to PerfObject
                // directly - no manual binding required.
                myPerfObject = (PerfObject) results.next();
                if (!retrievedObjects.contains(myPerfObject))
                    retrievedObjects.add(myPerfObject);
            }

            myTransaction.commit();
            hibernateSession.clear();
        } catch (Exception e) {
            e.printStackTrace();
            try {
                myTransaction.rollback();
            } catch (Exception e2) {
                // Silent failure of transaction rollback
            }
        } finally {
            try {
                if (hibernateSession != null)
                    hibernateSession.close();
            } catch (Exception e) {
                // Silent failure of session close
            }
        }
    }

    public static void main(String[] args) {
        initialization();
        createObjects();

        // Load all objects five times, timing each pass.
        for (int run = 1; run <= 5; run++) {
            long timing = System.currentTimeMillis();
            loadAllObjects();
            System.out.println("Timing #" + run + " : "
                    + (System.currentTimeMillis() - timing));
        }

        System.out.println(
                DebugHashtableCacheProvider.getCacheDetails());
    }
}
As the output in Listing 10.8 shows, our simple application was able to cache the results from the first loadAllObjects() call, leading to lower timing values for the remaining calls. This is reflected in the statistics for the cache, shown in terms of gets, puts, and so on. The 5 get misses and 5 new puts are consistent with the first pass over the five objects, and the 20 get hits with the four subsequent passes (4 x 5 = 20).
Listing 10.8 Testing Cache Hits
initialization
New Cache Created
createObjects
loadAllObjects
Timing #1 : 40
loadAllObjects
Timing #2 : 10
loadAllObjects
Timing #3 : 0
loadAllObjects
Timing #4 : 10
loadAllObjects
Timing #5 : 0
com.cascadetg.ch10.PerfObject
    get hits : 20
    get misses : 5
    put replacements : 0
    put new objects : 5
    locks : 25
    unlocks : 25
    remove existing : 0
    remove unknown : 0
    clears : 0
    destroys : 0
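The hit, miss, and put counts in this output can be reproduced with a small stand-alone simulation. The sketch below is plain Java and makes a simplifying assumption: each pass looks up every object once and populates the cache on a miss. It does not model how Hibernate actually drives the cache (for example, the lock and unlock calls), and CountingCacheSimulation is a hypothetical name.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical simulation of repeated passes over a fixed set of cached objects. */
class CountingCacheSimulation {
    long getHits;
    long getMisses;
    long putNew;
    private final Map<Long, String> store = new HashMap<>();

    /** One pass over all objects: read from the cache, populate on a miss. */
    void loadAll(int objectCount) {
        for (long id = 1; id <= objectCount; id++) {
            if (store.containsKey(id)) {
                getHits++;
            } else {
                getMisses++;
                store.put(id, "row-" + id);
                putNew++;
            }
        }
    }

    public static void main(String[] args) {
        CountingCacheSimulation sim = new CountingCacheSimulation();
        for (int pass = 0; pass < 5; pass++) {
            sim.loadAll(5);              // five passes over five objects
        }
        System.out.println("get hits : " + sim.getHits);
        System.out.println("get misses : " + sim.getMisses);
        System.out.println("put new objects : " + sim.putNew);
    }
}
```

Under that assumption the simulation reports 20 hits, 5 misses, and 5 new puts, matching the corresponding lines of the statistics above.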