Sharepoint – Clean a huge Document Library
February 20, 2009 — Florent ClairambaultI recently had to clean a huge (113 000 rows) document library. The first question that must come in mind is : Why was it so big ? Well, someone thought it was beter to store data in lots of XML files instead of in a classic list.
The second question is : How ?
First, you have to understand that you can’t use a DeleteAll() method, it doesn’t exist. You must fetch data by little packet of rows. If you try to get everything, you will have a OutOfMemoryException.
I can’t give you the final app as I am not sure that I have the right to, but I can give you the core sourcecode :
First of all, you want to avoid the memory exception problem. So you will cut your results in little packets of 100 rows, that’s pretty simple :
1 2 3 4 5 6 | // We get everything but we limit the result to 100 rows SPQuery q = new SPQuery(); q.RowLimit = 100; // We get the results SPListItemCollection coll = list.GetItems( q ); |
Then, what you could do is delete each item one by one :
1 2 3 | foreach( SPListItem item in cColl ) { item.Delete(); } |
But it’s soooo freaking slow. On a production server, we had something like 1 delete per second.
What you need to use is a CAML batch delete, this method builds a CAML delete batch :
1 2 3 4 5 6 7 8 9 10 11 12 | private static String BuildBatchDeleteCommand( SPList list, SPListItemCollection coll ) { StringBuilder sbDelete = new StringBuilder(); sbDelete.Append( "<?xml version=\"1.0\" encoding=\"UTF-8\"?><Batch>" ); // We prepare a String.Format with a String.Format, this is why we have a {{0}} string command = String.Format( "<Method><SetList Scope=\"Request\">{0}</SetList><SetVar Name=\"ID\">{{0}}</SetVar><SetVar Name=\"Cmd\">Delete</SetVar><SetVar Name=\"owsfileref\">{{1}}</SetVar></Method>", list.ID ); foreach ( SPListItem item in coll ) { sbDelete.Append( string.Format( command, item.ID.ToString(), item.File.ServerRelativeUrl ) ); } sbDelete.Append( "</Batch>" ); return sbDelete.ToString(); } |
With that method, we could delete 100 rows in 2/3 seconds. Still, you can’t expect magic with Sharepoint, everything around Sharepoint is slow.
The final main code should look a little bit like that :
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 | // While there's something left while ( list.ItemCount > 0 ) { // We get everything but we limit the result to 100 rows SPQuery q = new SPQuery(); q.RowLimit = 100; // We get the results SPListItemCollection coll = list.GetItems( q ); // We process a CAML batch String batch = BuildBatchDeleteCommand( list, coll ); // We execute it web.ProcessBatchData( batch ); list.Update(); } |
Note SPWeb::ProcessBatchData returns a String, if your code doesn’t work, it could help you find why. This is the only way, because a wrong batch won’t throw any exception. Think about it…
loading...