Summary
The SharePoint CrossListQueryCache class provides a highly scalable way to run site-wide queries. When used with its caching overloads, it provides all the power of SPSiteDataQuery without the performance implications. However, it has some major gotchas that you should be aware of before using it.
I'm always looking for ways to improve application performance (and with SharePoint I have more opportunities than time). I was focusing on some code we use to pull list items. This code is in the critical path for our runtime performance and scalability, so it is the perfect place for a cached data query. I started looking at different ways to do data querying in SharePoint.
My first stop was Google, where I found some information about the SharePoint SPSiteDataQuery object. According to Eric Shupps (The SharePoint Cowboy :D), SPSiteDataQuery is great for small data sets and small to medium traffic environments. Eric actually recommends using the good ole GetListItems method. Well, we already use GetListItems today, so I was not encouraged. Waldek Mastykarz put together a really nice posting that compares several different approaches for querying lists (just what I needed).
Even though I had read the warnings about SPSiteDataQuery, I decided to check it out (for some reason I really like to repeat the mistakes of others ;-). I really like the functionality provided by the class. I used Reflector to look under the covers and didn't see anything crazy going on (it makes a call to a COM method called CrossListQuery... I didn't bother chasing it to the SQL layer). I ran some performance tests with it, but didn't see a compelling reason to dump GetListItems for this routine.
Next I jumped into the CrossListQueryInfo/CrossListQueryCache routines. It is worth mentioning that these live in the Microsoft.SharePoint.Publishing library, which means you need Microsoft Office SharePoint Server (MOSS) 2007; WSS v3 does not include this library. What I found shocked and delighted me (hmm, doesn't sound quite right).
First the shock...
According to the documentation, CrossListQueryCache was built specifically for what I needed to do: query one or more lists and store the results in cache so subsequent calls run at near lightning speed.
But when I ran my tests I was seeing no better performance than SPSiteDataQuery (all you haters saying CrossListQueryInfo.UseCache must be set to true, bite your tongue and keep reading). I assumed I was doing something wrong, so I cracked open the world's best documentation (the source code).
I used Reflector (one of the best damn tools ever developed) to look at the routine I was using, CrossListQueryCache.GetSiteData(SPWeb). Well, according to the code there is no caching going on; in fact, this routine is nothing more than a wrapper around SPSiteDataQuery. A little more checking and I discovered that something strange was afoot in the CrossListQueryCache library.
There are 4 overloads for CrossListQueryCache.GetSiteData. Two of the overloads cache the results and two do not.
| CrossListQueryCache overload | Uses cache |
|---|---|
| GetSiteData(SPSite) | Yes |
| GetSiteData(SPWeb) | No |
| GetSiteData(SPSite, String) | Yes |
| GetSiteData(SPWeb, SPSiteDataQuery) | No |
So unless you are calling one of the caching overloads (with the CrossListQueryInfo.UseCache property set to true), you might as well use SPSiteDataQuery directly.
There is one more gotcha with the caching overloads: your query cannot contain a "<UserId/>" tag. If it does, the cache is not used.
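The rules above can be collected into a small predicate, handy as a checklist when a query is slower than expected. This is a hypothetical helper: CrossListCacheRules, WillUseCache and the GetSiteDataOverload enum are my own names, not part of Microsoft.SharePoint.Publishing. It only encodes the behavior described in this post, plus one commenter's report that setting CrossListQueryInfo.WebUrl also disables caching, so treat it as a sketch rather than a description of the library's internal code.

```csharp
using System;

// Hypothetical: one value per CrossListQueryCache.GetSiteData overload.
public enum GetSiteDataOverload
{
    Site,               // GetSiteData(SPSite)
    Web,                // GetSiteData(SPWeb)
    SiteAndWebUrl,      // GetSiteData(SPSite, String)
    WebAndSiteDataQuery // GetSiteData(SPWeb, SPSiteDataQuery)
}

// Hypothetical helper encoding the caching rules observed in this post.
public static class CrossListCacheRules
{
    public static bool WillUseCache(GetSiteDataOverload overload, bool useCache,
                                    string query, string queryInfoWebUrl)
    {
        // Only the SPSite-based overloads consult the cache.
        bool cachingOverload = overload == GetSiteDataOverload.Site
                            || overload == GetSiteDataOverload.SiteAndWebUrl;

        // A query containing a user-id tag is not cached. Assumption: match
        // case-insensitively so both <UserId/> and <UserID /> are caught.
        bool referencesUser = query != null &&
            query.IndexOf("<UserID", StringComparison.OrdinalIgnoreCase) >= 0;

        // Reported by a commenter: a non-empty CrossListQueryInfo.WebUrl disables caching.
        bool hasWebUrl = !string.IsNullOrEmpty(queryInfoWebUrl);

        return useCache && cachingOverload && !referencesUser && !hasWebUrl;
    }
}
```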
Next the delight...
When you use the GetSiteData overloads that support caching, you see major performance improvements. In fact, my testing showed that the performance of CrossListQueryCache was as good as the Content Query Web Part (not surprising, since they both cache their results).
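// Requires references to Microsoft.SharePoint and Microsoft.SharePoint.Publishing (MOSS 2007).
using System.Data;
using Microsoft.SharePoint;
using Microsoft.SharePoint.Publishing;
// Items whose Title begins with "Test", sorted by Title descending.
CrossListQueryInfo clqi = new CrossListQueryInfo();
clqi.Query = "<OrderBy><FieldRef Ascending=\"FALSE\" Name=\"Title\" /></OrderBy><Where><BeginsWith><FieldRef Name=\"Title\" /><Value Type=\"Text\">Test</Value></BeginsWith></Where>";
clqi.ViewFields = "<FieldRef Name=\"Title\" /><FieldRef Name=\"ContentType\" />";
// BaseType 1 = document libraries, searched across every web in the site collection.
clqi.Lists = "<Lists BaseType=\"1\" />";
clqi.Webs = "<Webs Scope=\"SiteCollection\" />";
clqi.UseCache = true; // has no effect with the SPWeb overload used below
CrossListQueryCache clqc = new CrossListQueryCache(clqi);
// GetSiteData(SPWeb) never caches; it is a thin wrapper around SPSiteDataQuery.
DataTable tbl = clqc.GetSiteData(SPContext.Current.Web);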
The code above, which bypasses the cache because it passes in an SPWeb, averaged 0.0500 seconds.
The same code with one change, calling the SPSite overload instead (see below), averaged 0.0008 seconds, more than 60 times faster.
CrossListQueryInfo clqi = new CrossListQueryInfo();
clqi.Query = "<OrderBy><FieldRef Ascending=\"FALSE\" Name=\"Title\" /></OrderBy><Where><BeginsWith><FieldRef Name=\"Title\" /><Value Type=\"Text\">Test</Value></BeginsWith></Where>";
clqi.ViewFields = "<FieldRef Name=\"Title\" /><FieldRef Name=\"ContentType\" />";
clqi.Lists = "<Lists BaseType=\"1\" />";
clqi.Webs = "<Webs Scope=\"SiteCollection\" />";
clqi.UseCache = true; // honored by the SPSite overloads
CrossListQueryCache clqc = new CrossListQueryCache(clqi);
// The only change: call the caching GetSiteData(SPSite, String) overload,
// passing the current context URL from CrossListQueryCache.ContextUrl().
DataTable tbl = clqc.GetSiteData(SPContext.Current.Site, CrossListQueryCache.ContextUrl());
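The post doesn't say exactly how the averages above were measured. One way to get comparable numbers is a plain Stopwatch loop like the sketch below. QueryTimer and AverageSeconds are hypothetical names, and both the single warm-up call (so the first, cache-filling call does not dominate the average) and the iteration count are my assumptions, not the author's method.

```csharp
using System;
using System.Diagnostics;

// Hypothetical timing helper for comparing GetSiteData overloads.
public static class QueryTimer
{
    // Calls 'action' once as a warm-up, then returns the mean elapsed
    // time in seconds over 'iterations' timed calls.
    public static double AverageSeconds(Action action, int iterations)
    {
        if (action == null) throw new ArgumentNullException(nameof(action));
        if (iterations <= 0) throw new ArgumentOutOfRangeException(nameof(iterations));

        // Warm-up: for a caching overload, this is the call that fills the cache.
        action();

        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            action();
        }
        sw.Stop();
        return sw.Elapsed.TotalSeconds / iterations;
    }
}

// Example (inside a SharePoint request, with clqc built as in the snippets above):
// double avg = QueryTimer.AverageSeconds(
//     () => clqc.GetSiteData(SPContext.Current.Site, CrossListQueryCache.ContextUrl()), 100);
```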
Slightly Different Results
As I said, the performance is great, but one thing really bothered me: the DataTable results did not match. Apart from the GetSiteData overload, everything about the code is identical (same query, same CrossListQueryInfo configuration), yet the results differ.
I discovered that the routines that actually use the cache contain extra code that checks whether each item is checked in. Checked-in items are returned; items that are not checked in are dropped. The routines that do not use the cache return every item that matches the query, including items that are not checked in.
I was worried that the two kinds of routines were not using the same technique to query across the site's lists, so I did some more digging with Reflector. It turns out that both use SPWeb.GetSiteData to execute the query; the routines that cache the results simply do a little extra work to trim them. Take a look at CachedArea.GetCrossListQuery if you want to see what is happening.
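When two result sets disagree like this, it helps to list exactly which items are missing from one of them. The hypothetical helper below (ResultDiff and MissingFrom are my own names) keys each row on its WebId, ListId and ID columns. I'm assuming both tables carry those columns, as SPSiteDataQuery-style results normally do; adjust KeyOf if yours differ. Calling MissingFrom(uncachedTable, cachedTable) would then list the items the caching overload trimmed.

```csharp
using System.Collections.Generic;
using System.Data;
using System.Linq;

// Hypothetical helper for comparing two cross-list query results.
public static class ResultDiff
{
    // Builds a key that identifies an item across webs and lists.
    // Assumes the table has WebId, ListId and ID columns.
    private static string KeyOf(DataRow row) =>
        row["WebId"] + "|" + row["ListId"] + "|" + row["ID"];

    // Returns the keys of items present in 'left' but absent from 'right',
    // in the order they appear in 'left'.
    public static List<string> MissingFrom(DataTable left, DataTable right)
    {
        var rightKeys = new HashSet<string>(right.Rows.Cast<DataRow>().Select(KeyOf));
        return left.Rows.Cast<DataRow>()
                   .Select(KeyOf)
                   .Where(key => !rightKeys.Contains(key))
                   .ToList();
    }
}
```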
In the end, I found that, used correctly (a caching overload with UseCache set to true), the CrossListQueryCache object performs as efficiently as the Content Query Web Part. It provides an out-of-the-box way to have your GetSiteData results cached. I highly recommend it if you are running MOSS 2007 rather than plain WSS 3.0.
6 comments:
Thanks for the informative article Jeff.
Thanks for a good post!! It helped me a lot =).
I had two mysterious issues when using GetSiteData(SPSite). The first was that a checked-in document was returned by the method, but once that document was "Approved" it was no longer returned by GetSiteData(SPSite)! The second was that dates were returned in the web site's time zone instead of the time zone set for the currently logged-in user.
When I changed to GetSiteData(SPWeb), both issues were solved!
Apparently we can get either caching or correct functionality from SharePoint, but not both =)
Again, thanks a lot!
Awesome post Jeff. I'm loving it.
Great article and thanks a lot! Very informative!!!
Jeff, during your performance tests, did you look at the implications of querying large amounts of data, say over a few hundred sites (500+) across site collections, and the impact that has on memory consumption?
Thanks for the post on this. I also found that setting the CrossListQueryInfo's WebUrl property will keep it from using the cache (even when calling your noted GetSiteData methods with an SPSite object).