November 23, 2009
Welcome to Part 9 of my series on Caching Enhancements in ColdFusion 9. Today we're going to cover something called dependent template caching. Strange name, I know. If you remember back from Part 7, we said that by default, when you cache a web page or page fragment with the cfcache tag, that page/fragment goes into the cache and is retrieved for any subsequent visits to that page. The same content is returned for everyone. We also covered a method for caching pages based on unique URL parameters such that different versions of the same page (say a product display page) would be cached and retrieved from the cache based on URL parameters. But what about page/page fragments that vary based on other variables that aren't passed in via URL? This is where dependent template caching comes in.
Dependent template caching allows you to specify a variable or list of variables to "watch" for changes. If the value of one of these variables changes from the first page or fragment that was cached, ColdFusion will create a new variant for the page/fragment and store that in the cache as well. This is all handled by using the new dependsOn attribute of the cfcache tag in ColdFusion 9. If you are reading this and wondering where this might be useful, you aren't alone. When I first read about this feature in the ColdFusion docs, I misunderstood the intent of the attribute and how it's supposed to work. Here's what the ColdFusion 9 docs have to say about dependsOn:
A comma separated list of variables. If any of the variable values change, ColdFusion updates the cache. This attribute can take an expression that returns a list of variables.
I think the key to the misunderstanding people have about this feature is in the part that says "If any of the variable values change, ColdFusion updates the cache." To me, updating the cache means replacing an old/expired/changed value with a new one. It's a one for one swap of items in the cache. Out with the old and in with the new. But this isn't what happens when you use dependsOn. What the docs should say is that when a variable value changes, ColdFusion creates a new entry in the cache for the changed item so that both the original page/fragment as well as the new page/fragment are now in the cache. Here's a quick example to illustrate how this works:
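<cfset y = true>
<cfloop index="x" from="1" to="5">
<cfif x is 3><cfset y = false>
<cfelseif x is 5><cfset y = true>
</cfif>
<cfcache action="serverCache" dependsOn="#y#" stripWhiteSpace="true">
<cfoutput>I'm cached dynamic data: #now()# <br/></cfoutput>
</cfcache>
</cfloop>
<!--- dump what's in the template cache --->
<cfdump var="#getAllTemplateCacheIds()#">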
If you run this code, you should see something that looks like this:
What this code does is create a page fragment and cache it within a loop. The cfcache tag is set to watch a variable called y for changes. The value of y is initially set to true. There's also some conditional code in the loop which waits for the third and fifth iterations of the loop to fire. We'll get to that in just a moment. For now, let's step through each iteration of the loop and discuss what's happening. During the first iteration of the loop, the fragment is added to the cache. During the second iteration, the fragment is pulled from the cache and displayed. During the third iteration, the value of x is 3 and our cfif statement fires, updating the value of y to false. Because y is the value we set to watch in the dependsOn attribute of our cfcache tag and it has now changed from true to false, this signals ColdFusion to go ahead and cache the version of the loop output we're now on for iteration 3. This is where we end up with a second fragment in the cache, not an update to the existing fragment. The fourth iteration also displays the second cached fragment since the value of y is still false. For the fifth and final iteration, our conditional code fires again. This time it sets y back to true, a value for which we already have a fragment stored in the cache. ColdFusion knows to go grab the fragment for true from the cache and displays it.
There's one other thing to note in here. I didn't think to include this in any of the previous posts on the template cache so I've decided to add it here. If you look at the end of the cfcache tag in our example, you'll notice a parameter you probably haven't seen before: stripWhiteSpace. This is an optional parameter that only works if you are using the template cache to cache page fragments. Setting it to true (it's false by default) tells ColdFusion to strip any unnecessary whitespace from the fragment before storing it in the cache.
While this is a good example of the mechanics of dependent caching, it's not really a practical example. For that, let's consider a real world example where you would want to make use of dependent caching. Say you have an application that requires authentication. The main landing page for the application is personalized based on who is logged in. In this case, you can't cache a single version of the main page as you wouldn't want it to say "Hello Tom" when Mary logs in. Sure you could solve this by passing the username along in the URL, but you probably don't want to do that – who wants to deal with all of the extra validation code to make sure someone doesn't go and change that URL variable to someone else's username? No, in this case, you would probably be using session variables in your application to maintain persistence, and session variables are a perfect use case for dependent caching. Here, we could set dependsOn to watch something that uniquely identifies a user and when that changes (a different user is logged in), the personalized version of the page for them could be added to the cache. Let's take a look at some simple code that implements this idea. The first thing we'll need is an Application.cfc file to set up session management and handle security basics for us:
<cfcomponent output="false">
<cfset this.name = "dependentCaching" />
<cfset this.sessionManagement = true>
<cffunction name="onRequestStart" returnType="boolean" output="false">
<cfif StructKeyExists( URL, "logout" )>
<cfset this.onSessionStart() />
</cfif>
<cfreturn true />
</cffunction>
<cffunction name="onRequest" returnType="void" output="true">
<cfargument name="Page" type="string" required="true">
<cfif session.loggedIn>
<cfinclude template="#arguments.Page#">
<cfelse>
<cfinclude template="login.cfm">
</cfif>
</cffunction>
<cffunction name="onSessionStart" returnType="void" output="false">
<cfset session.loggedIn = false>
</cffunction>
</cfcomponent>
This code gives our application a name and turns on session management. It also has an onRequestStart() method that looks for a URL variable called logout, and if it finds one it fires off the onSessionStart() method, effectively logging the user out by changing the value of session.loggedIn to false.
The onRequest() method handles the check to see if a user is authenticated for a requested page. If session.loggedIn is true, the page they were requesting is included. Otherwise, we assume that the user is not logged in and include the login form instead.
The onSessionStart() method fires at the beginning of a user's session and sets their logged in status to false by default.
Remember that this is just a simple example and for that reason does not contain all of the code you would use to implement something like this in real life (validation checks, error handling, etc.).
The next file we need is our login.cfm page:
<cfif structKeyExists(form, "userName")>
<cfset session.loggedIn = true>
<cfset session.userName = form.userName>
<!--- send the user back to the main page --->
<cflocation url="index.cfm" addtoken="false" />
</cfif>
<form action="login.cfm" method="post">
Name: <input type="text" name="userName"><br />
Password: <input type="password" name="password"><br />
<input type="submit" name="Submit" value="Submit">
</form>
This page is just a simple login form. It first checks to see if it was called via a form submit, and if so sets the value of session.loggedIn to true. It also sets another session variable to hold the user's username. After the variables are set, the user is redirected to the main landing page for the application (index.cfm). Again, if this were a real application we would have an actual login check here, but for the purposes of this example we're just assuming that any username/password combo is valid.
If the user arrived at the page directly from the onRequest() method of our Application.cfc file, we know that they have not yet logged in so we display a login form for them. When they submit this, the page submits to itself and the code we previously discussed fires, logging the user in and redirecting them to the main application page. Here's the code for the main index.cfm page:
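<cfcache action="serverCache" timespan="#createTimeSpan(0,0,5,0)#" dependsOn="session.userName">
<cfoutput><p>This is your personalized page.</p>
<p>Timestamp: #timeFormat(now(), 'hh:mm:ss')#</p></cfoutput>
<p><a href="index.cfm?logout">Logout</a></p>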
There's not a whole lot going on here. All we do is set a cfcache tag at the top of the page telling ColdFusion that we want to cache the contents of the entire page. A timespan of 5 minutes is set just to keep the example from staying in the cache forever. Notice we also set dependsOn="session.username". This is where the magic happens. What we've done is tell ColdFusion that every time a different user tries to call this page, it should first check the cache to see if there's already a page stored for this user and if so, grab and use that version. If not, it should generate a new version of the page and cache that value for later use.
If you want to see this in action, go ahead and open the index.cfm page in your browser. You should be redirected to the login form. You can enter anything you like for the username and password. Once you submit the login form, you should be redirected to a personalized version of the index.cfm page. Note the value of the timestamp.
Now go ahead and click on the logout link. This will clear your session and cause the login form to display again. Try logging in using a different username this time. After submitting, you'll again be redirected to a personalized version of index.cfm.
Log out again and repeat the process, but this time use the username you entered the first time. When you submit and are taken to the index.cfm page you should notice that the value of the timestamp is the same as the first time you logged in as this user. This is because ColdFusion saw that session.userName changed and found a page in the cache that corresponded to the username you logged in with (the username becomes part of the key for the page in the template cache). If you want to see that there are two distinct pages in the cache, just create a new ColdFusion page in the same directory as the rest of your application and dump the template cache by passing the result of getAllTemplateCacheIds() to cfdump.
You'll end up with something like this:
As you can see, each individual user has their own copy of the index.cfm page in the cache thanks to dependsOn.
I hope these examples were straightforward and useful enough to demonstrate the usage and power of dependent caching in ColdFusion 9. This is the last post on the template cache I have planned for the series. In Part 10, we'll start to take a look at the object cache in ColdFusion 9 before moving on to more advanced topics.
November 21, 2009
When using the template cache in ColdFusion 9, you have two main options for getting pages and page fragments out of the cache – time based expiry and flushing. This blog post covers both. You should note that all of the examples here cache full pages. In most cases, the techniques discussed can be applied equally to page fragments.
Expiring Items in the Template Cache
It's also possible to set expiry periods for items in the template cache. Here, you have two optional parameters built in to the cfcache tag to help you expire pages and fragments from the cache based on time periods. The two parameters you can use are idletime and timespan. Idletime lets you specify a period of time after which to flush the cache if the cached item has not been accessed. In other words, if the cached item hasn't been accessed in the time period specified by idletime, the item will be removed from the cache. Here's an example that caches a page and will flush it after 30 seconds of inactivity:
<cfcache action="serverCache" idletime="#createTimeSpan(0,0,0,30)#">
<cfoutput>I'm dynamic. The time is currently #timeFormat(now(),'hh:mm:ss')# </cfoutput>
If you run this code and then run it again, you'll see that the timestamp for the page is cached. Now wait for 30 seconds or so and reload again. You should see that the time has updated.
The timespan parameter lets you specify a period of time after which the cached item should be flushed regardless of whether it's ever been accessed or not. This basically lets you say "keep this page in the cache for 30 seconds". Here's an example:
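A minimal sketch of such a page, assuming the same timestamp output used for the idletime discussion above:

```cfml
<!--- keep this page in the template cache for 30 seconds, whether or not it is accessed --->
<cfcache action="serverCache" timespan="#createTimeSpan(0,0,0,30)#">
<cfoutput>
I'm dynamic. The time is currently #timeFormat(now(),'hh:mm:ss')#
</cfoutput>
```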
This code sets a timespan of 30 seconds. If you run the code then run it again, you'll see that the timestamp gets cached. Go ahead and keep hitting the reload button on your browser every few seconds. After 30 seconds have gone by, the timestamp will update as the old content is flushed from the cache.
Flushing Items from the Template Cache
In addition to time based expiry, you can also manually flush both pages and page fragments from the template cache using the cfcache tag. Here's an example:
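In its simplest form, a sketch of the flush call is a single cfcache tag with no attributes other than the action:

```cfml
<!--- flush the template cache for the current application --->
<cfcache action="flush">
```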
There are a couple of things to note here. First, when you call this code, it flushes all templates in the template cache for your application, not just the page you call it from. Let me repeat that. Using the cfcache tag with action="flush" causes the template cache to flush all of the content for the current application. You need to be very careful using this as it's possible that on a high traffic site with a large template cache you could bring your server down when hundreds or thousands of requests hit at the same time for uncached data, causing your system to queue up requests while it's busy rebuilding the cache. In most cases what you'll want to do is to flush a single page or perhaps a group of related pages from the template cache.
So, how do you go about flushing a single page or a group of related pages from the template cache? There's another parameter of the cfcache tag you can use called expireURL just for this purpose. Let's consider an example of how we would use this:
<cfparam name="URL.productID" default="0">
<!--- only flush when the flush URL parameter is present --->
<cfif structKeyExists(URL, "flush")>
<cfcache action="flush" expireURL="*.cfm?productID=#URL.productID#">
</cfif>
<cfcache action="serverCache" usequerystring="true" timespan="#createTimeSpan(0,0,5,0)#">
<cfoutput>
Welcome to the page for product ID #URL.productID#<br />
Timestamp: #timeFormat(now(), 'hh:mm:ss')#
</cfoutput>
Go ahead and run this example in your browser. You should have something on your screen that looks like this:
Welcome to the page for product ID 0
Reloading the page should return the same timestamp over and over as the template is cached after the first call. Now try adding the URL parameter ?productID=72 to the URL string in your browser and reload the page a few times to get the new version of the page into the cache. Try changing the value of productID a few more times, each time reloading the page so that we get a handful of pages in the template cache. Now try dumping the contents of the template cache by running a cfdump of getAllTemplateCacheIds() in a separate ColdFusion file.
You should see a bunch of different entries – one for each unique productID you provided in the URL. Now go ahead and call the code we've been working with for this example, only this time append ?productID=72&flush to the URL. Go ahead and reload the page with those URL parameters a couple of times. Notice the timestamp updating each time you reload? That's because passing in the URL parameter flush causes the ColdFusion page to call the cfcache tag with expireURL="*.cfm?productID=#URL.productID#". The expireURL parameter lets you specify a wildcarded URL pattern such that only pages that match the pattern get flushed. This allows you to get as generic or as specific as you want in determining exactly what pages to flush. In this case we're telling ColdFusion to go ahead and flush any cached .cfm pages that have the URL parameter productID equal to the value that we pass in on the URL.
If you dump the contents of the template cache you'll notice that there's still an entry for the item(s) we just flushed. That's because the code in the example we've been running repopulates the cache immediately after the flush. If you didn't want the cache repopulated immediately after flushing the content, you could just as easily locate the code to flush the cache elsewhere and just remove the item.
If you want to remove a group of related pages from the cache, say all of the pages that have a productID, you could modify the code to wildcard the productID parameter like so:
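A sketch of the modified flush, assuming the same expireURL pattern as before with the productID value replaced by a wildcard:

```cfml
<!--- flush every cached .cfm page that carries a productID URL parameter --->
<cfcache action="flush" expireURL="*.cfm?productID=*">
```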
This should result in the removal of all pages from the cache that have a productID URL parameter. Note that if you run this right now in ColdFusion 9.0 it will not work. There is a bug in ColdFusion 9.0 with the wildcard feature. If you put the flush code in a separate template and run it multiple times, what you'll see is that one page at a time is flushed from the cache instead of all of the pages that match the URL pattern at once. Credit goes to Aaron West for catching this bug.
That about wraps up everything I have to say about expiring and flushing pages and page fragments from the template cache. In Part 9 we'll talk about one last feature of the template cache – Dependent Caching.
November 20, 2009
In Part 5 of this series, we introduced the template cache with a quick example that showed how to cache page fragments. As we mentioned then, the template cache can be used to cache page fragments as well as entire web pages. This post is broken up into three sections that explore the three typical use cases of the template cache: caching entire web pages, caching entire web pages with URL parameters, and caching page fragments.
Caching Entire Web Pages
To cache an entire web page, simply place a single cfcache tag at the top of your page. Here's an example:
<cfcache action="serverCache">
I'm some text on a page<br />
<cfoutput>I'm dynamic. The time is currently #timeFormat(now(),'hh:mm:ss')#</cfoutput> <br />
I'm some more text on a page
With an open cfcache tag at the beginning of the page, this code tells ColdFusion to go ahead and write the entire page to the template cache. If you haven't modified your cache configuration from the default ColdFusion install, this item will live in the cache for 24 hours (86400 seconds) unless you restart your JVM by restarting your ColdFusion server or you manually flush the cache first, which you can do by using the cfcache tag with action="flush" which we'll talk about in the next part of this series.
As I mentioned in Part 5 of this series, the ColdFusion template cache acts as a black box. You don't really get to see how it magically does gets and puts of your content or how it generates keys for the data going into the cache. What I didn't mention before is that there's an undocumented function in ColdFusion 9.0 called getAllTemplateCacheIDs() that will show you all of the keys for items that ColdFusion puts in the template cache. There isn't really a lot you can do with this information since neither the cfcache tag nor any of the cache functions allow you to specify a key for items in the template cache. However, the function is useful for showing how ColdFusion handles general ColdFusion pages, pages with URL parameters and page fragments differently within the cache.
When ColdFusion takes a regular old ColdFusion page and caches it in the template cache, it generates a unique key for it based on the URL and a UUID appended to it. Here's an example of a file called cache_entire_page.cfm:
<cfcache action="serverCache">
<cfoutput>I'm dynamic. The time is currently #timeFormat(now(),'hh:mm:ss')# </cfoutput>
If you run this page, ColdFusion puts it in the template cache as you would expect. To see the key that ColdFusion automatically generates for it, create a new ColdFusion page with this code in it:
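A one-line sketch, using the undocumented getAllTemplateCacheIds() function mentioned above (being undocumented, its behavior may differ between ColdFusion versions):

```cfml
<!--- dump the keys of everything in this application's template cache --->
<cfdump var="#getAllTemplateCacheIds()#">
```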
If you run that newly created page, you should see a dump of all the keys in the template cache for your application. It should look something like this:
Caching Pages with Unique URLs
By default, URL parameters for ColdFusion pages stored in the template cache are ignored. If you want to see this in action, try adding ?id=10 to the URL of the cache_entire_page.cfm file we used for our previous example. After you've added the URL parameter and reloaded the page, go ahead and dump the contents of the template cache again. What you'll notice is that it contains the same exact key as the page without the URL parameter. Obviously this isn't good if your application has what it considers to be unique pages that need to be cached individually based on URL parameters. Luckily, the cfcache tag gives you a way to handle this. All you need to do is add useQueryString="true" to your cfcache tag and that will tell ColdFusion to treat pages that have URL parameters as unique pages when it caches them. Go ahead and save this code as a new ColdFusion page:
<cfparam name="URL.productID" default="0">
<cfcache action="serverCache" usequerystring="true">
<cfoutput>
Welcome to the page for product ID #URL.productID#<br />
Timestamp: #timeFormat(now(), 'hh:mm:ss')#
</cfoutput>
Now try calling this page with no URL parameter passed to it. Right after you've done that, append ?productID=55 to the URL and call it again. Once you've done this, run the dump of all the keys in the template cache and you should have output that looks like this:
What you should notice right away is that there are now two new keys being returned for the page you just called twice. The first key looks just like the key from our last example. It contains just the template name and a UUID. The other key, however, contains something new. In addition to the URL for the template and the appended UUID, it also contains the query string as part of the key name. As you can see, by using the useQueryString parameter of the cfcache tag, we've instructed ColdFusion to treat pages with URL parameters as unique when it places them in the template cache.
Caching Page Fragments
When ColdFusion caches a page fragment, it uses a combination of the page's file name as well as the position of the actual code block that's surrounded by cfcache tags in your ColdFusion file. Each page fragment gets its own entry in the cache since fragments can be independently expired or flushed from the cache. Consider the following example:
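<cfcache action="serverCache">This is a fragment</cfcache>
<cfcache action="serverCache"><cfoutput>And so is this #now()#</cfoutput></cfcache>
<cfdump var="#getAllTemplateCacheIds()#">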
If you execute this, you'll get some output as well as a dump of all of the cached items in the template cache. At this point, there are two items in the cache:
Now say you weren't happy that the two lines of output on your page were all run together and you wanted to put a line break between each one. You might modify your code and add in a simple paragraph tag or line break like so:
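<cfcache action="serverCache">This is a fragment</cfcache>
<br />
<cfcache action="serverCache"><cfoutput>And so is this #now()#</cfoutput></cfcache>
<cfdump var="#getAllTemplateCacheIds()#">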
Go ahead and run the code once you've inserted the <br /> tag. What you'll now see is that you have three items in the cache:
What's happening is that ColdFusion is using the position of the code in your page as part of the ID for the cached item. When you inserted the <br /> tag, the position of the fragment you cached also changed by moving down a few lines, and ColdFusion assumed you had added a new page fragment that you wanted to cache.
This may or may not be a big deal to people as CF will still pull the correct cached item every time. The gotcha is that if you have a lot of caching going on, and you're making a lot of changes in a development environment, it's possible to fill your cache up with a lot of junk pretty quickly – just something to be aware of with the template cache when working with page fragments.
Now that we've covered the basics of the template cache, keep an eye out for my next post where I'll cover the ins and outs of updating items in the template cache including techniques for time based expiry, cache flushing, and dependent caching.
November 17, 2009
In Part 5 of this series, we mentioned that Ehcache could be configured at runtime via ColdFusion code as well as by using an XML configuration file called ehcache.xml.
On a Java EE install of ColdFusion 9, you can find the ehcache.xml file located here:
There are two main sections of the file you need to be concerned with for basic configuration. The first section you should take a look at is the DiskStore configuration:
This tag tells Ehcache where it should write cache files if you have the cache configured to overflow to disk or to persist to disk. We'll get into the specifics of those options, but for now it's important to know that by default Ehcache will use your Java temp directory to store cache files if it is configured to do so. On Windows, the Java temp directory is located in c:/windows/temp. You can change the value here to any drive/directory on your system if you wish to use a location other than the Java temp directory.
The next section to take a look at comes at the end of the ehcache.xml file. Skip on down to the very end where you should see a block of XML that looks like this:
This block of XML tells ColdFusion how to configure all of the Object and Template caches that are automatically created for your application. When a cache is automatically created, the name for the cache is also automatically created using the convention appnameOBJECT for object caches and appnameTEMPLATE for template caches. Each cache has a number of configurable parameters:
- maxElementsInMemory: Sets the max number of objects that will be created in memory. Once this limit is reached, the cache will either overflow to disk (if overflowToDisk is set to true), or the appropriate eviction policy will be executed against the cache to make enough room for the new item(s) being added.
- eternal: Sets whether elements are eternal. If eternal is set to true, timeouts are ignored and the element is never expired.
- timeToIdleSeconds: Sets the time to idle for an element before it expires.
- timeToLiveSeconds: Sets the time to live for an element before it expires.
- overflowToDisk: Sets whether elements can overflow to disk when the memory store has reached the maxElementsInMemory limit.
- diskSpoolBufferSizeMB: Specifies the spool buffer size for the DiskStore, if enabled. Writes are made to the spool buffer before they are asynchronously written to the DiskStore.
- maxElementsOnDisk: Sets the max number of objects that will be maintained in the DiskStore
- diskPersistent: Whether the disk store persists between restarts of the Virtual Machine.
- diskExpiryThreadIntervalSeconds: Specifies the interval (in seconds) between runs of the disk expiry thread.
- memoryEvictionPolicy: Policy to enforce upon reaching the maxElementsInMemory limit (LRU, LFU, FIFO).
If you make changes to any of these parameters, ColdFusion will apply them to any new caches that it automatically creates. If you have disk persistence or overflow to disk turned on, two files will be written to your file system per cache, an index file and a data file. For an object cache you would get appnameOBJECT.index and appnameOBJECT.data.
If you want to see what properties have been set for your application's cache, you can do so using the cacheGetProperties() function. The function takes a single optional parameter that specifies the type of cache to return the properties for. Options are Template or Object. If you don't specify the cache type to return properties for, ColdFusion returns them for both cache types. Here's an example that dumps the properties for both the default Object and Template caches:
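A minimal sketch, calling cacheGetProperties() with no argument so that the properties for both cache types come back:

```cfml
<!--- dump the properties of the default Object and Template caches --->
<cfdump var="#cacheGetProperties()#">
```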
This will result in output that looks like this:
As you can see from the screenshot, the structure keys correlate to parameters from the ehcache.xml file with two notable exceptions. Neither diskSpoolBufferSizeMB nor diskExpiryThreadIntervalSeconds is reported as a property that can be changed programmatically at runtime.
If you wish to change any of these properties programmatically, you can do so using the cacheSetProperties() function. This function takes a single argument – a structure containing all of the properties that should be configured. You can configure any of the following parameters:
- objectType: Specifies the cache type: Object, Template, or All
- diskStore: Supposed to specify the location of the DiskStore for disk based caching but this is currently not working as of ColdFusion 9.0.
- diskPersistent: Whether the disk store persists between restarts of the Virtual Machine. True|False
- eternal: Sets whether elements are eternal. If eternal, timeouts are ignored and the element is never expired. True|False
- maxElementsInMemory: Sets the max number of objects that will be created in memory. Integer
- maxElementsOnDisk: Sets the max number of objects that will be maintained in the DiskStore. Integer
- memoryEvictionPolicy: Policy to enforce upon reaching the maxElementsInMemory limit: LRU, LFU, FIFO
- overflowToDisk: Sets whether elements can overflow to disk when the memory store has reached the maxElementsInMemory limit: True|False
- timeToIdleSeconds: Sets the time to idle for an element before it expires: Integer number of seconds
- timeToLiveSeconds: Sets the time to live for an element before it expires: Integer number of seconds
Remember, this only applies to caches automatically created by ColdFusion. If a cache doesn't yet exist and you call cacheSetProperties(), ColdFusion will automatically create the cache for you based on whether you are setting properties for an Object cache, a Template cache, or both.
The following code shows how to build the structure of parameters necessary to configure a cache and set the cache properties using the cacheSetProperties() function:
<cfset myProps = structNew()>
<cfset myProps.diskstore = "c:/temp"> <!--- in the docs, but not currently implemented --->
<cfset myProps.diskpersistent = "true">
<cfset myProps.eternal = "false">
<cfset myProps.maxelementsinmemory = "5000">
<cfset myProps.maxelementsondisk = "100000">
<cfset myProps.memoryevictionpolicy = "LRU">
<cfset myProps.objecttype = "Object">
<cfset myProps.overflowtodisk = "true">
<cfset myProps.timetoidleseconds = "86400">
<cfset myProps.timetoliveseconds = "86400">
<!--- update the cache properties --->
<cfset cacheSetProperties(myProps)>
It's also possible to create more caches than just the default template and object caches that ColdFusion creates automatically for you. This can be achieved by defining them in your ehcache.xml file or at runtime using the cfcache tag. To configure a new cache region in your ehcache.xml file, you would do so like this (place this before or after the defaultCache block in your ehcache.xml file):
As you can see, the only difference between this code and the code for the default cache is that you give the cache region a name using the name parameter.
You should know that neither cacheGetProperties() nor cacheSetProperties() can be used to configure the properties for a custom cache in ColdFusion 9.0. Hopefully this is a feature that will be added in a future version of ColdFusion.
If you want to read from or write to a custom cache, you can only do so using the cfcache tag. Here's an example:
<!--- try to get the query from the custom object cache --->
<!--- if the item isn't there, it'll return null.
In that case, run the query and cache the
results and return the data from the db instead --->
<cfquery name="getArtists" datasource="cfartgallery">
<!--- dump the query from cache --->
<!--- dump the cache meta data --->
This code is almost identical to the code we wrote in Part 5 where we introduced the object cache (don't worry about the extra metadata we're pulling using cfcache; we'll cover that later). The only real difference is that here we specify a name for our cache in both cfcache tags by defining it in the key attribute. Key allows us to specify a custom name for our cache. If a cache by that name hasn't been configured in your ehcache.xml file, ColdFusion will automatically create it using the parameters set in the default cache settings. Remember, you can only work with custom caches using the cfcache tag. None of the cache functions in ColdFusion 9.0 allow you to specify the cache name you want to apply the function to. Go ahead and run the code a few times to verify it's pulling from the custom cache.
There's a lot more advanced stuff that can be configured in the ehcache.xml file such as cache clustering. These topics deserve their own posts, which we'll get to soon. For now, thanks for sticking with the series and I hope you've learned a little more about the ins and outs of cache configuration in ColdFusion 9.
September 21, 2009
So far in this series, we've covered why you would want to cache, what to cache and when, and basic caching architectures. In part 4 of this series, we're going to talk about caching strategies and eviction policies.
A caching strategy is nothing more than an architectural decision on how you're going to manage putting data in and retrieving data from your cache and the corresponding relationship between the cache and your backend data source. There are two main caching strategies you need to be aware of, deterministic and non-deterministic.
A non-deterministic caching strategy involves first looking in the cache for the object or data you want to retrieve. If it's there, your application uses the cached copy. If it's not there, you must then query the backend system for the object or data you want to retrieve. This is by far the most popular caching strategy as it's relatively simple to implement and is very flexible.
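A minimal sketch of that look-in-the-cache-first flow, using the ColdFusion 9 cacheGet() and cachePut() functions against the default object cache. The id artistQuery, the one-hour timespan, and the cfartgallery query are just example choices:

```cfml
<!--- 1. look in the cache first --->
<cfset artists = cacheGet("artistQuery")>

<!--- 2. cache miss: go to the backend, then repopulate the cache --->
<cfif isNull(artists)>
    <cfquery name="artists" datasource="cfartgallery">
        SELECT artistID, firstName, lastName
        FROM artists
    </cfquery>
    <cfset cachePut("artistQuery", artists, createTimeSpan(0, 1, 0, 0))>
</cfif>

<!--- 3. either way, the application now has the data --->
<cfdump var="#artists#">
```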
A deterministic caching strategy is one in which you always go to the cache for the object or data that you need. It's assumed that if it's not in the cache, then it doesn't exist. This strategy requires that your cache be pre-populated with data as there's no mechanism for a cache miss to query the backend system for the missing object or data.
Both deterministic and non-deterministic caching strategies have their pros and cons. For non-deterministic caching, the upside is that it's simple to implement in code and you have a lot of flexibility in how you do this. The downside to this caching strategy is an issue called stampeding requests, otherwise known as the dog pile. This occurs, usually under load, when a cache miss results in multiple threads simultaneously querying the backend system for the missing cache data. Under this scenario, it's very easy to overwhelm the backend system with requests as the database struggles to fetch the data and repopulate the cache. There are various ways that you can code around this, which we'll discuss later on in another blog post. For now, it's just important to realize that it can happen.
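To make the dog pile a little more concrete, here's one possible way to guard against it: a named exclusive lock with a second cache check inside it, so only the first thread to miss rebuilds the item. Treat it as a sketch rather than a recommendation. The lock name, cache id, and timeouts are made up for the example, and a cflock only coordinates threads within a single server:

```cfml
<cfset artists = cacheGet("artistQuery")>

<cfif isNull(artists)>
    <!--- only one thread at a time may rebuild this cache item --->
    <cflock name="rebuild_artistQuery" type="exclusive" timeout="10">
        <!--- check again: another thread may have repopulated the
              cache while this one was waiting for the lock --->
        <cfset artists = cacheGet("artistQuery")>
        <cfif isNull(artists)>
            <cfquery name="artists" datasource="cfartgallery">
                SELECT artistID, firstName, lastName
                FROM artists
            </cfquery>
            <cfset cachePut("artistQuery", artists, createTimeSpan(0, 1, 0, 0))>
        </cfif>
    </cflock>
</cfif>
```

The trade-off is that the waiting threads block for as long as the rebuild takes, but the database only sees one query instead of dozens.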
For the purposes of the rest of this discussion as well as the rest of the series, we'll be focusing on non-deterministic caching. That said, let's now turn our attention to cache eviction algorithms. Think of a cache like a box. A box has a limit on how much stuff it can hold before things start falling out when you try to pile on more. A cache is the same way when it comes to the objects and data you store in it – eventually it runs out of room.
Cache eviction policies can be broken down into two categories: time based and cost based. Time based policies let you associate a time period or an expiration date for individual cache items. This lets you do things like keep an item in the cache for 6 hours, or 30 days, or until December 15, 2040 at 10:00pm. When a request is made to a cache that contains items with time based expirations, the cache first checks to see if the item is expired. If it is, the item is evicted from the cache and is not returned to the operation that called it (most caches simply return null).
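As a quick illustration of time based expiration in ColdFusion 9 terms, cachePut() takes an optional timespan and idle time after the value. The id dailyReport and the struct being cached are placeholders:

```cfml
<cfset report = {generated = now()}>

<!--- keep the item for at most 6 hours, but let it expire
      sooner if nobody asks for it for 30 minutes --->
<cfset cachePut("dailyReport", report, createTimeSpan(0, 6, 0, 0), createTimeSpan(0, 0, 30, 0))>

<!--- once the item has expired, this comes back null --->
<cfset cached = cacheGet("dailyReport")>
<cfif isNull(cached)>
    <cfoutput>dailyReport has expired or was never cached</cfoutput>
</cfif>
```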
Cost based eviction policies work a little differently. A cost based eviction policy doesn't kick in until a cache is full and needs to kick some items out (evict) before allowing new ones in. Most caches give you several cost based eviction policies to choose from. In this scenario, when you attempt to put a new item in the cache, the cache first looks to see if it's full. If it is, it runs whatever cost based eviction policy has been set for the cache and evicts the appropriate item(s). The following are some of the most common cost based eviction policies you'll encounter:
First In First Out (FIFO): The first item that was placed in the cache is the first item to be evicted when it becomes full. It's essential to remember that the first item in the cache is not necessarily the least important. If the first item in your cache is also the most frequently accessed item you might want to think twice about implementing an eviction policy that would result in evicting it from the cache first in the event the cache fills up.
Least Recently Used (LRU): This policy implements an algorithm to track which items in the cache were accessed least recently. Various cache providers implement this algorithm in different ways, but the result is that the items in the cache that haven't been used in a while are evicted first.
Less Frequently Used (LFU): This algorithm is unique to Ehcache. It uses a random sampling of items in the cache and picks the item with the lowest number of hits to evict. The Ehcache documentation claims that an element in the lowest quartile of use is evicted 99.99% of the time with this algorithm. In a cache that follows a Pareto distribution (20% of the items in the cache account for 80% of the requests) this algorithm may offer better performance than LRU. For more detailed discussion of various cache eviction algorithms, see the cache algorithms page on Wikipedia.
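If it helps to see one of these policies in action, here's a toy LRU illustration. It is not how Ehcache implements eviction; it just borrows java.util.LinkedHashMap in access-order mode to show the idea. The capacity of two items and the keys are invented for the example:

```cfml
<cfscript>
// hypothetical capacity for the illustration
maxItems = 2;

// with accessOrder = true, a LinkedHashMap keeps its entries ordered
// from least recently used to most recently used
lru = createObject("java", "java.util.LinkedHashMap").init(
    javaCast("int", 16), javaCast("float", 0.75), javaCast("boolean", true)
);

lru.put("a", "first");
lru.put("b", "second");
lru.get("a");           // touching "a" makes "b" the least recently used
lru.put("c", "third");  // the cache is now one item over capacity

// cost based eviction only kicks in once the cache is full
while (lru.size() GT maxItems) {
    eldestKey = lru.keySet().iterator().next();
    lru.remove(eldestKey);
}

writeDump(lru.keySet().toArray());
</cfscript>
```

Because "a" was read after "b" was written, "b" is the item that should get evicted, even though "a" went into the cache first. Under FIFO it would have been "a".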
That's about it for this post on caching strategies and eviction policies. In Part 5 of this series, we'll finally start to take a look at caching in ColdFusion including what's always been there and what's new in ColdFusion 9.
A quick little plug: If you're heading to Adobe MAX 2009 in LA this October and want to know more about caching in ColdFusion 9, check out my session on Advanced ColdFusion Caching Strategies where I'll be covering a lot of what's already been discussed on my blog as well as a whole bunch of new material. I hope to see you there!
September 3, 2009
Welcome to Part 3 in my series on Caching Enhancements in ColdFusion 9. In Part 2, we talked about caching granularity. This time around, we're going to spend some time discussing caching architectures. When talking about caching architectures, it's important to understand the type of cache being referred to. Basically, caches come in two flavors: in-process and out-of-process.
An in-process cache operates in the same process as its host application server. As I mentioned in Part 1 of this series, the new caching functionality in ColdFusion 9 is based on an implementation of Ehcache. Because Ehcache is an in-process caching provider, the cache operates in the same JVM as the ColdFusion server. The biggest advantage to an in-process cache is that it's lightning fast as data/object serialization is generally not required when writing to or reading from the cache. On the other side of the coin, in-process caches have limitations that you need to be aware of when it comes to system memory - particularly if you're on a 32-bit platform or a system that's light on RAM. On 32-bit systems, the JVM is typically limited to between 1.2GB and 2GB of RAM, depending on platform (although some 32-bit JVMs running on 64-bit systems may be able to use up to 4GB of RAM). Because you have to share this with your application server, that leaves considerably less RAM available to your cache.
In-process caches can be scaled up by adding more RAM, but not out by adding more servers as each cache is local to the application server's JVM it's deployed with. We'll discuss this in more depth when we talk about clustered caching. When using an in-process cache you always need to be aware of the number of items you'll be caching and how much RAM they take up to avoid a sudden spike in cache evictions if the available memory to both your application server and cache tops out. Fortunately for ColdFusion, Ehcache can be configured so that it fails over from RAM based storage to disk in the event that the cache fills up.
Out-of-process caches, like their name suggests, run outside of the same process as the application server. In the Java world, they run inside their own JVM. Out-of-process caches tend to be highly scalable on both 32-bit and 64-bit platforms as they scale both out and up. If you need to scale an out-of-process cache, you simply install more instances of the cache on any machines with spare RAM on your network. The main drawback to out-of-process caches is speed. Data and objects being written to and read from an out-of-process cache must be serialized and deserialized. Although the overhead for doing so is relatively small, it's still considerable enough to have an impact on performance.
Although Ehcache itself is not an out-of-process cache, it does come with something called Ehcache Server which is available as a WAR file that can be run with most popular web containers or standalone. The Ehcache server has both SOAP and REST based web services APIs for cache reads/writes. Another example of an out-of-process cache is the ever popular Memcached.
Now that we've covered the basics of in-process and out-of-process caches, it's time to make things a little more complicated by adding distributed caching and cache clustering to the mix. My experience over the last few years with caching has been that the term distributed tends to be a catch-all for what most would consider a true distributed cache as well as for a clustered cache. Confused yet? Let me attempt to clarify. Most of you are probably already familiar with how clustering works. In the application server world, you take an application server such as ColdFusion and you deploy it on two or more identically configured machines (or you can deploy multiple instances to one or more machines) which you then tie together through hardware and/or software. The result is that you are able to distribute load to your application across multiple servers which allows you to scale your application out. Need to be able to support more users? Add more servers to the cluster. It's the same for caching. If you have an in-process cache, you can't make the cache hold more items
When it comes to cache clustering, the primary reason for doing so is usually that you already have or are planning to deploy your application on a cluster. If you have a clustered application that needs to make use of caching, the first problem you face is that each application server has its own in-process cache which is local to the server. If Server A writes a piece of data to its in-process cache, that data is not available to Server B. This might not be a big deal for some clustered applications that implement sticky sessions, have light load or have data that doesn't necessarily need to be synchronized, but it becomes a serious problem for clusters that are configured for failover, have heavier load, or have cached data that needs to be in sync across every server in the cluster. In these instances, standalone in-process caching doesn't work well. The solution is to cluster your in-process caches as well as your application server. In the case of ColdFusion 9, the underlying Ehcache implementation fully supports cache clustering. When configured, each local cache automatically replicates its content via RMI, JMS, JGroups, Terracotta, or other pluggable mechanisms to all other caches specified in the configuration. There's a small amount of latency while the data replicates but it's negligible in all but the most extreme use cases. I have set this up, tested, and verified it works with the ColdFusion 9 implementation. I'll put up a detailed post on exactly how to do this in the future. The important thing to understand here is that clustering of in-process caches gets you redundancy, but the size of a single cache is still the limiting factor on scalability (e.g. if the cache you want to cluster has a limit of 500MB of data, clustering the cache between two servers means you are still limited to that 500MB of data in the cache, only now it's stored on two different servers).
Distributed caching differs from clustered caching in that a distributed cache is essentially one gigantic out-of-process cache spread across multiple machines. If you think of a clustered cache as comparable to a clustered application server then a distributed cache is much like a computing grid. Whereas a clustered cache gets you redundancy, a distributed cache gets you horizontal scalability with respect to how much data or how many objects can be put in the cache. Different distributed caching providers handle the exact caching mechanics differently, but the basics remain the same. If you need redundancy in a distributed cache, many distributed caching providers, including Ehcache Server, let you cluster distributed cache nodes. The following diagram illustrates how a distributed, out-of-process cache cluster using Ehcache Server might look.
You should note that this is just one of many possible configurations. Using a combination of hardware and software it's possible to build out some pretty sophisticated caching architectures depending on your performance, scalability and redundancy requirements. It's even possible to create hybrid in-process/out-of-process architectures using solutions such as Terracotta.
That's about it for caching architectures. If you want to learn more, a fantastic resource is the website High Scalability. I hope you continue to find this series helpful. In Part 4 we'll cover our last foundation topic - the basics of caching strategies, before moving into ColdFusion 9's specific Ehcache implementation.
July 29, 2009
In Part 1 of this series, I talked about what caching is and why you would want to consider it as part of your application design. In this post, I'm going to spend some time talking about caching granularity. Caching granularity is just a fancy way of saying "what to cache". Before we go further, let's take a look at various caching opportunities you have when architecting an application:
As you can see, there are quite a few places where you can implement caching. For the purposes of our discussion, we're going to focus only on caching at the ColdFusion application server level. We'll take a look at what you can cache within your applications as well as the pros and cons associated with each item. There are 5 basic items to consider for caching at the application server level:
Data - Most ColdFusion developers have cached data at some point or another. In its simplest form, caching data is nothing more than taking a simple value like a username or some other data type such as a structure or list and sticking it in a shared scope variable in the application, session, client or server scope.
Pros:
- Easy to implement
- Easy to invalidate individual data elements
Cons:
- Most data still needs to be manipulated before it can be rendered - especially values stored in lists, arrays and structs.
Query Result Sets - Another popular technique familiar to ColdFusion developers is query caching. I don't think I know a ColdFusion developer who doesn't make regular use of this feature. This was one of the earliest caching enhancements made to ColdFusion and it's dead simple to implement. In fact, it's as simple as adding one of two possible attributes to the cfquery tag (cachedwithin or cachedafter). Here's an example that caches query results for 60 minutes:
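A minimal version of such a query, assuming the cfartgallery sample datasource that ships with ColdFusion (the query name and column list are just for illustration):

```cfml
<!--- createTimeSpan(days, hours, minutes, seconds): cache for 1 hour --->
<cfquery name="getArtists" datasource="cfartgallery" cachedwithin="#createTimeSpan(0, 1, 0, 0)#">
    SELECT artistID, firstName, lastName
    FROM artists
</cfquery>
```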
Pros:
- Simple to implement
- Will provide performance gain in many cases
- ColdFusion 8 added support for cfstoredproc and cfqueryparam
Cons:
- No visibility into the cache
- Difficult to invalidate single cached queries
- Clearing the entire query cache does it for the entire server
- Recordsets still need to be processed before being displayed. This can have serious consequences for CPU and memory.
- Storage of an entire recordset when only partial data will be used
- Cache miss results in re-execution of the query. Can lead to the "dog-pile effect", which we'll cover in a later post.
It's also possible to cache query results in ColdFusion by assigning the result set of a cfquery operation to a shared scope variable such as a session or application variable. There are also additional pros and cons to using this method:
Pros:
- Allows for more granular control over cached items
Cons:
- Requires programmatic cache management
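Here's a sketch of what caching a query in a shared scope might look like, and it also shows what "programmatic cache management" means in practice: you write the existence check, the locking, and the invalidation yourself. The variable name application.artists is a placeholder:

```cfml
<!--- populate the application scope copy only if it isn't there yet --->
<cfif NOT structKeyExists(application, "artists")>
    <cflock scope="application" type="exclusive" timeout="10">
        <cfif NOT structKeyExists(application, "artists")>
            <cfquery name="getArtists" datasource="cfartgallery">
                SELECT artistID, firstName, lastName
                FROM artists
            </cfquery>
            <cfset application.artists = getArtists>
        </cfif>
    </cflock>
</cfif>

<cfdump var="#application.artists#">

<!--- invalidating a single cached query is one line:
      structDelete(application, "artists") --->
```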
Objects - Objects in ColdFusion can refer to native CFC based objects or those instantiated through other technologies such as COM, CORBA and Java. Until ColdFusion 9, the only way to natively cache an object in ColdFusion was to place it in a shared scope variable such as an application or session variable. We'll talk about how this changes in ColdFusion 9 in a later post. For now, consider the pros and cons of caching objects.
Pros:
- Objects can represent complex relationships that may be impossible or at the least very expensive to compute at the data tier
Cons:
- Objects may need to be serialized/deserialized depending on the caching mechanism being used.
- Requires programmatic cache management
Partial Page Content (Fragments) - Caching partial page content is something that's always been possible in ColdFusion but has never been elegant - until ColdFusion 9. Prior to version 9, you could cache part of a page by using the cfsavecontent tag and caching the enclosed content in a shared scope variable such as an application or session variable. There are also several custom tag based solutions that achieved the same thing (always storing in a shared scope variable).
Pros:
- Allows you to cache sections or fragments of content
- Multiple cached fragments can be used within a single page.
- Works well in situations where pages are made up of customized content, but the content itself is not necessarily unique
Cons:
- Requires programmatic cache management
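The pre-ColdFusion 9 cfsavecontent approach described above might look something like this. The variable name application.navFragment and the markup inside it are invented for the example:

```cfml
<!--- build the fragment once and park it in a shared scope --->
<cfif NOT structKeyExists(application, "navFragment")>
    <cfsavecontent variable="application.navFragment">
        <ul>
            <li><a href="index.cfm">Home</a></li>
            <li><a href="artists.cfm">Artists</a></li>
        </ul>
    </cfsavecontent>
</cfif>

<!--- every later request just outputs the cached markup --->
<cfoutput>#application.navFragment#</cfoutput>
```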
Entire Web Pages - The final type of content to consider caching is the entire web page generated by ColdFusion. In terms of pure performance, this is the most desirable item to cache. Realistically, though, it's often impossible to cache entire web pages because of the amount of dynamic content on a page, or because the page is updated too frequently. Caching entire ColdFusion generated pages goes back pretty far in the language history and has been supported via the cfcache tag. The main issue in versions of ColdFusion prior to ColdFusion 9 has been that the cfcache tag has always cached full pages to disk for server side caching. While this would be ok for static files served up by your web server, disk based caches are relatively slow for application servers when compared to RAM based caches. A secondary issue with cfcache pre-ColdFusion 9 is that there was no fine grained control over the cache, making cache management difficult at best. All that changes in ColdFusion 9, of course.
Pros:
- Provides for the fastest performance
Cons:
- Won't work for pages with lots of customized content (see partial page caching)
- May be problematic if the page content is updated too frequently
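For completeness, full page caching with cfcache is typically a single tag at the top of the template. A minimal sketch, with a 30 minute timespan chosen arbitrarily:

```cfml
<!--- cache the generated page on the server for 30 minutes --->
<cfcache action="serverCache" timespan="#createTimeSpan(0, 0, 30, 0)#">

<cfoutput>
    <h1>Artist Directory</h1>
    <p>Page generated at #timeFormat(now(), "HH:mm:ss")#</p>
</cfoutput>
```

If the cache is doing its job, the timestamp should stay the same across reloads until the timespan runs out.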
Now that we've discussed what you can cache, here are a few additional tips worth considering:
Cache as close to the final state as possible
- E.g. don't cache a recordset if you'll ultimately use it to build a dropdown box
- Cache entire pages whenever possible
Cache to static files whenever possible and let your web server serve the files
- Works well for content that rarely changes
- For dynamic sites, look to other options
Be mindful of cache size
- May limit what/how much you can cache
I hope this has given you a good overview of the types of things that can be cached in ColdFusion. The next post in this series will introduce caching architectures.
July 21, 2009
One ColdFusion 9 feature I haven't heard much buzz about but I think has the potential to really enhance high performance and large scale ColdFusion applications is caching. ColdFusion has always had caching capabilities, but more often than not they've been black boxed, giving the developer limited control and visibility over the process. All that changes in ColdFusion 9 with a major overhaul of the cfcache tag. The biggest single enhancement here is the implementation of the popular distributed caching provider Ehcache under the covers. What this means is that ColdFusion now implements one of the most popular and certainly one of the fastest caching mechanisms available for Java.
Before I get too deeply into configuration and code, I want to take a little time to talk about caching theory, strategy, and patterns. Ehcache changes the caching game in ColdFusion, and a lot of the knowledge we have as ColdFusion developers about caching is no longer relevant. Some of it in fact is just plain problematic, and I hope to shed some light on those issues and talk about how Ehcache helps solve those problems as well as gotchas to look out for when implementing large caching systems.
Just so that we're all on the same page, let's start with a definition of caching as found on Wikipedia:
"...a collection of data duplicating original values stored elsewhere or computed earlier, where the original data is expensive to fetch (owing to longer access time) or to compute, compared to the cost of reading the cache."
There are two important concepts here. First is that cached data is duplicate data. The second is that we're going to duplicate it where it would otherwise be expensive to compute it or fetch it relative to how quickly it can be grabbed from the cache. Keep these two things in mind as we continue through this post.
When a lot of people talk about caching, they talk about it in terms of performance. You may want to cache a particular web page because it's slow to load, or perhaps you want to cache the stats shown on a particular page because it takes a long time to run the query that crunches the numbers you're going to display. These are both valid cases where using cached data can speed up the performance of your application. What I find to be a more compelling use case, though, is caching for scalability. What I mean by caching for scalability is using cached data to reduce the load on critical resources such as the database, app server, web server, network, or client. At each of these phases there's an opportunity to use cached data to allow you to do more with less. What's really cool here is that a byproduct of caching for scalability tends to be increased application performance.
Let's look at an example involving the database. Say for example your database is capable of handling 100 requests per second. Now what if you need to be able to handle more requests? One option would be to throw more hardware at the problem - increase the amount of memory available to the server, add more processors, or maybe even add a 2nd or 3rd database server to cluster and distribute the load. That's certainly one option, but it's also expensive and potentially complicated to manage. A second option would be to cache the data you're requesting. Let's assume you're able to cache the data such that you achieve a hit ratio of 90% (hitRatio = hits/(hits+misses)). That is, 9 out of every 10 requests for data go to the cache instead of to the database (certainly doable in most circumstances). What you've now gone ahead and done is effectively reduced your database load to 10 requests per second. This means that the same database with the addition of a cache is now able to scale by a factor of 10. That's a pretty significant increase in scalability.
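If you want to play with the numbers yourself, here's the same arithmetic as a small script. The variable names are mine, and the 100 requests per second and 90/10 split are the figures from the example above:

```cfml
<cfscript>
incomingPerSecond = 100; // requests per second arriving at the application
hits = 90;               // requests served from the cache
misses = 10;             // requests that fall through to the database

hitRatio = hits / (hits + misses);

// only the misses ever reach the database
dbLoadPerSecond = incomingPerSecond * (1 - hitRatio);

// how much more traffic the same database can now sit behind
scaleFactor = 1 / (1 - hitRatio);

writeOutput("Hit ratio: " & numberFormat(hitRatio * 100, "0") & "%<br>");
writeOutput("Database load: " & numberFormat(dbLoadPerSecond, "0") & " requests per second<br>");
writeOutput("Scale factor: " & numberFormat(scaleFactor, "0") & "x");
</cfscript>
```

Try bumping the hit ratio to 95% or 99% to see how quickly the scale factor climbs.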
That's it for Part 1 of this series. Stay tuned for Part 2 where I'll discuss what to cache and why. If you're planning to be at Adobe MAX 2009, stop by my session on Advanced ColdFusion Caching where I'll be talking about this as well as all of the great new Caching features in ColdFusion 9 in a lot more depth.
July 14, 2009
In a previous entry, I introduced ColdFusion 9's new RAM based virtual file system (VFS). One question that came up over and over again was how long files stored in the VFS persist. That's a pretty straightforward question. The answer, however, isn't quite as simple. Right now, as of the initial public beta of ColdFusion 9, files saved to the VFS persist until server restart. Obviously this isn't an ideal situation in all cases.
There are a few items you should consider when working with the VFS in ColdFusion 9 when it comes to file and directory persistence as well as security. First, a directory must be created in the VFS before you can write to it.
<cfdirectory action="create" directory="ram://myDir/mySubDir">
<cffile action="write" output="#content#" file="ram://myDir/mySubDir/foo.txt"/>
By default, any directory and any file in the VFS can be read or written to by any .cfm page or CFC. If you need to create a secure VFS environment, you can do so using sandbox security through the ColdFusion Administrator. I won't go into the details here as it's covered in the beta documentation.
The next issue to be aware of is that currently directories and files in the VFS persist until the server is restarted. I'd be surprised if ColdFusion 9 ships this way as I see it as risky to allow anyone on a server (especially a shared server) to create as many files/directories in RAM as they want to. Sandboxing prevents people from gaining access to files and directories that they shouldn't have access to but it doesn't prevent them from denying access to system resources by unnecessarily leaving virtual files littered around the server. I don't know how the ColdFusion Engineering team is planning to deal with this, but I would think at a minimum they would provide a server wide setting in the ColdFusion Administrator letting a server admin specify how long files/directories should be allowed to persist in RAM before the server runs a job to delete them. Which brings me to another point - if you're planning to read/write files to the VFS you need to make sure you always verify the existence of a directory or file before you try to read it - especially if Adobe adds a server-wide way for an Admin to specify a timeout for virtual files.
If you want to see what files are currently stored in the VFS on your server, you can use the cfdirectory tag like so:
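A minimal sketch of such a listing, using the ram:// prefix as written in this post (the query name vfsContents is arbitrary):

```cfml
<!--- list everything in the VFS, including subdirectories --->
<cfdirectory action="list" directory="ram://" name="vfsContents" recurse="true">

<!--- cfdirectory returns a query object you can dump or loop over --->
<cfdump var="#vfsContents#">
```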
To delete files, you can either use the cffile tag with action="delete" or you can use the cfdirectory tag to wipe out an entire directory worth of files at one time.
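Here's a rough sketch of both kinds of delete, wrapped in the existence checks I recommended above. The paths reuse the myDir/mySubDir/foo.txt example from earlier in this post:

```cfml
<!--- delete a single virtual file, but only if it's still there --->
<cfif fileExists("ram://myDir/mySubDir/foo.txt")>
    <cffile action="delete" file="ram://myDir/mySubDir/foo.txt">
</cfif>

<!--- wipe out a whole virtual directory and everything under it --->
<cfif directoryExists("ram://myDir")>
    <cfdirectory action="delete" directory="ram://myDir" recurse="true">
</cfif>
```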
I hope this helps clear up some questions that aren't directly answered in the current beta documentation. If you have other questions about the new VFS, let me know.
July 13, 2009
One of the great new features in ColdFusion 9 that I haven't seen much press about is its Virtual File System. The virtual file system is essentially a RAM disk (remember back to DOS?). This allows you to do three really cool things. First, you can now write files such as images, spreadsheets, etc. to memory instead of disk before serving them back to the browser. Here's an example from the beta docs that shows this in use for writing a JPG file to memory and serving it up:
<cffile action="write" output="#myImage#" file="ram://a.jpg">
<cfoutput>a.jpg Doesn't exists</cfoutput>
The second thing this lets you do is write dynamic .cfm files to memory and execute them. Again from the beta docs, to write a file you would do something like this:
How you use/execute an in-memory file depends on whether the tag/function you are using requires a relative or absolute path. For tags/functions that require a relative path, you need to first create a mapping for ram:// in the ColdFusion Administrator. Once you've done that, you simply use the mapping in the relative path. For example if you create a mapping called /inmemory, you would use it within cfinclude like this:
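Assuming the /inmemory mapping points at ram:// and a file named dynamic.cfm (a hypothetical name) has already been written to the VFS, the include would look something like this:

```cfml
<!--- /inmemory is the ColdFusion Administrator mapping for ram:// --->
<cfinclude template="/inmemory/dynamic.cfm">
```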
For tags/functions that take an absolute path, the syntax is straightforward. From the beta docs:
The third thing you can do with the virtual file system is write and execute CFCs in memory. To write a CFC to the virtual file system you do the following, from the beta docs:
You execute the CFC like so:
There are some limitations to the ram based file system. First and foremost, you can't write Application.cfm or Application.cfc to memory. Additionally, paths are case-sensitive.
The full list of tags that support the virtual file system is as follows:
Supported file functions:
So, what do you all think? I think this opens up a lot of interesting possibilities, especially in terms of performance improvement.