Rob Brooks-Bilson
Tech, Photography, Stuff
August 9, 2010
I submitted a bug report for this, but the system seems to have swallowed it without giving me a bug number (probably because I submitted it against CF 9.0.1).
I've found what I consider to be a serious issue with ColdFusion's Ehcache implementation and query objects that may bite you if you're not aware of how it works. If I put a query object in the cache and then perform an operation on the original query object, such as adding a column or running certain query of queries operations, ColdFusion treats the cached query object as if it were a reference to the original. That is, if I make a change to the original query object, ColdFusion also applies the change to the version that's in the cache.
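To make the reference behavior concrete outside of ColdFusion, here is a minimal Java sketch (Ehcache itself is a Java library). The `cachePut`/`cacheGet` helpers and the list-of-column-names stand-in for a query object are hypothetical. This is not ColdFusion's or Ehcache's actual code, just a cache that stores whatever reference it is handed, which matches the behavior described above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ReferenceCacheDemo {
    // Hypothetical in-memory cache that keeps the exact object reference it is given.
    static final Map<String, List<String>> cache = new HashMap<>();

    static void cachePut(String key, List<String> value) {
        cache.put(key, value); // no copy: the cache holds the caller's object
    }

    static List<String> cacheGet(String key) {
        return cache.get(key); // no copy: the caller gets the cached object itself
    }

    public static void main(String[] args) {
        // Stand-in for a query object: just its column names.
        List<String> columns = new ArrayList<>(List.of("ARTID", "ARTNAME"));
        cachePut("art", columns);

        // Modify the ORIGINAL object after caching it ("adding a column").
        columns.add("PRICE");

        // The cached entry shows the change, because both names point at one object.
        System.out.println(cacheGet("art"));            // [ARTID, ARTNAME, PRICE]
        System.out.println(cacheGet("art") == columns); // true
    }
}
```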
As far as I'm concerned, the cache represents a boundary that ColdFusion should not implicitly cross. Most caching systems I've worked with in the past (such as memcached) acted as a dumb key/value store. Unless I perform an explicit cachePut(), I wouldn't expect CF to update values in the cache. Here are two code snippets that reproduce the problem. The first uses the cfartgallery data source:
The second example shows how this can happen with a query of queries, based on a different bug Ray found previously (see http://www.coldfusionjedi.com/index.cfm/2009/8/28/Another-example-of-the-QofQ-Bug). In this sample, note that the date gets reformatted by the query of queries, and both the original query and the cached version are updated:
The workaround for this problem is to use the duplicate() function to make a clone of the query object before calling cachePut(). Although this works, keeping two copies of the query around has its own potential consequences (at minimum, extra memory use), so be careful.
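The duplicate()-before-cachePut() workaround amounts to a defensive deep copy on the way into the cache. Below is a rough Java analogue, assuming a query is modeled as a list of row maps; `deepCopy` and `cachePutCopy` are hypothetical helpers, not ColdFusion functions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class DuplicateBeforePutDemo {
    static final Map<String, List<Map<String, String>>> cache = new HashMap<>();

    // Hypothetical analogue of duplicate(): copies the row list AND every row map,
    // so no mutable object is shared between the caller and the cache.
    static List<Map<String, String>> deepCopy(List<Map<String, String>> rows) {
        List<Map<String, String>> copy = new ArrayList<>(rows.size());
        for (Map<String, String> row : rows) {
            // String values are immutable, so copying each map is deep enough here.
            copy.add(new HashMap<>(row));
        }
        return copy;
    }

    static void cachePutCopy(String key, List<Map<String, String>> rows) {
        cache.put(key, deepCopy(rows)); // cache a clone, never the caller's object
    }

    public static void main(String[] args) {
        List<Map<String, String>> query = new ArrayList<>();
        Map<String, String> row = new HashMap<>();
        row.put("ARTNAME", "original");
        query.add(row);

        cachePutCopy("art", query);

        // Change the original after caching: add a column and edit a value.
        row.put("PRICE", "100");
        row.put("ARTNAME", "changed");

        Map<String, String> cachedRow = cache.get("art").get(0);
        System.out.println(cachedRow.get("ARTNAME"));       // original
        System.out.println(cachedRow.containsKey("PRICE")); // false
    }
}
```

Note that this only protects the value on the way in. Code that later fetches the cached object and modifies it in place would still change what's in the cache, which is the case copy-on-read addresses.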
Update: It looks like this is actually expected behavior in Ehcache. It isn't documented anywhere in the ColdFusion documentation, but Ehcache has two configurable parameters (as of v. 2.10), copyOnRead and copyOnWrite, that determine whether values stored in and returned from the cache are references or copies of the original values. By default, items are stored and returned by reference. Unfortunately, we can't take advantage of these parameters right now, as CF 9.0.1 ships with Ehcache 2.0.
I can live with this, but it's not what I expected: I've always viewed Ehcache as a "dumb" key/value store. Even if ColdFusion were running v. 2.10 of Ehcache, it still wouldn't be easy to configure. Since ColdFusion currently doesn't have a cacheNew() function for creating user-defined cache regions, the only way to turn off the by-reference behavior would be to hard-code your cache configuration in the ehcache.xml file for each user-defined region where you want to enable copyOnRead and copyOnWrite.
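Here is a sketch of what the two flags change, written as a toy Java cache. `CopyingCache` and its constructor flags are hypothetical and only model the semantics of Ehcache's copyOnRead/copyOnWrite settings; this is not the Ehcache API. Copies are made by serializing and deserializing the value, which mirrors the serialization-based default copy strategy mentioned in the comments below.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Map;

final class CopyingCacheDemo {
    // Hypothetical cache modeling copy-on-read / copy-on-write semantics.
    static final class CopyingCache {
        private final Map<String, Object> store = new HashMap<>();
        private final boolean copyOnRead;
        private final boolean copyOnWrite;

        CopyingCache(boolean copyOnRead, boolean copyOnWrite) {
            this.copyOnRead = copyOnRead;
            this.copyOnWrite = copyOnWrite;
        }

        void put(String key, Serializable value) {
            // copyOnWrite: store a clone, so later changes to the caller's object are invisible.
            store.put(key, copyOnWrite ? copy(value) : value);
        }

        Object get(String key) {
            Object value = store.get(key);
            // copyOnRead: hand out a clone, so callers can't mutate the stored value.
            return (copyOnRead && value != null) ? copy((Serializable) value) : value;
        }

        // Deep copy of the whole object graph via Java serialization (not free: see below).
        private static Object copy(Serializable value) {
            try {
                ByteArrayOutputStream bytes = new ByteArrayOutputStream();
                try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                    out.writeObject(value);
                }
                try (ObjectInputStream in =
                         new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                    return in.readObject();
                }
            } catch (IOException | ClassNotFoundException e) {
                throw new IllegalStateException("copy failed", e);
            }
        }
    }

    public static void main(String[] args) {
        ArrayList<String> columns = new ArrayList<>();
        columns.add("ARTID");

        CopyingCache byReference = new CopyingCache(false, false); // the default behavior
        CopyingCache copying = new CopyingCache(true, true);
        byReference.put("art", columns);
        copying.put("art", columns);

        columns.add("PRICE"); // mutate the original after caching

        System.out.println(byReference.get("art")); // [ARTID, PRICE]
        System.out.println(copying.get("art"));     // [ARTID]

        // Mutating a value returned by get() doesn't touch the stored copy either.
        @SuppressWarnings("unchecked")
        ArrayList<String> fetched = (ArrayList<String>) copying.get("art");
        fetched.add("TEMP");
        System.out.println(copying.get("art"));     // [ARTID]
    }
}
```

The trade-off is that every put() and get() on the copying cache pays for a full serialization round trip of the object graph, which is the performance cost raised in the comments.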
8/9/10 4:25 PM
Hibernate's caching API used to have this identical problem - it's a common caching bug. I'm not too familiar with how ColdFusion uses Ehcache but you might also be able to fix it via the Ehcache configuration using copyOnWrite="true" - more here: http://ehcache.org/documentation/configuration.htm...
8/9/10 5:06 PM
Alex,
Thanks for the pointer. I don't know why I never noticed that in the docs before. I wonder if it was added after Ehcache 1.6?
In any case, I think the CF Docs are going to need to be updated to reflect this or I have a feeling it will end up biting people who expect Ehcache to be a "dumb" key/value store.
I'm also glad it's configurable - I just wish that CF allowed you to configure user-defined cache regions ahead of time so you could set those parameters without having to hard-code your cache regions in the ehcache.xml file.
8/9/10 10:23 PM
Thanks for catching this. This could be a BIG SNAFU if you didn't know about it.
8/10/10 6:34 PM
Rob, this is also due to the nature of the strong references the objects have. If copy-on-write is enabled, a duplicate of the object has to be made, and you have to be prepared for that, because the entire object graph will be duplicated in order to store it as a new entry in the caching engine.
"The default implementation uses serialization to copy elements," as per the Ehcache docs, or you could build your own copy strategy.
To me this is normal behavior for objects, since you are putting references to them into the cache. If you put in a string, it can't be modified because it's immutable, so simple values are no problem.
But yes, I agree the ColdFusion documentation should expose the fact that object references ARE placed into the cache unless the user configures the cache to explicitly do duplications and express the concern about object graphs.
You also have to consider the performance hit you take when you enable copy-on-write and copy-on-read, due to the serialization of the objects.
8/10/10 7:00 PM
Luis, makes perfect sense (now). My stumbling block was that I didn't realize that Ehcache was designed to cache objects that way. It makes total sense since they don't have to be serialized first, though. Thanks for the additional and thorough explanation.
I generally wouldn't need copyOnRead/Write, but in this case I was just trying to put together some quick examples from an existing codebase that I don't have the time to rewrite. I solved my problem with duplicate() - again, something I normally wouldn't do but it served my narrow purpose.
8/10/10 7:04 PM
Totally understandable Rob!! The docs definitely need this kinda input for sure.
By the way, I will be posting the CacheBox documentation soon and would love your input and if you are willing, to maybe write a few chapters about caching concepts and topology. Please ping me if you are interested, I can bribe you with t-shirts and candy.
8/21/10 2:33 PM
I think this is almost a case of knowing too much about caching :) By that, I mean that I would suspect a lay person (i.e. someone who is not used to caching) to think of this as the expected behavior. Unless someone really understands what is going on under the covers, I would think that using cachePut() is no different than using something like:
session.key = value;
.... to perform caching. After all, I think that's how a lot of people think about caching (as in, I need to "cache" it in the session scope or the application scope).
That said, it never hurts to have better documentation either.