Sharepoint – Clean a huge Document Library

I recently had to clean a huge (113 000 rows) document library. The first question that must come in mind is : Why was it so big ? Well, someone thought it was beter to store data in lots of XML files instead of in a classic list.

The second question is : How ?

First, you have to understand that you can’t use a DeleteAll() method, it doesn’t exist. You must fetch data by little packet of rows. If you try to get everything, you will have a OutOfMemoryException.

I can’t give you the final app as I am not sure that I have the right to, but I can give you the core sourcecode :

First of all, you want to avoid the memory exception problem. So you will cut your results in little packets of 100 rows, that’s pretty simple :

1
2
3
4
5
6
// We get everything but we limit the result to 100 rows
SPQuery q = new SPQuery();
q.RowLimit = 100;
 
// We get the results
SPListItemCollection coll = list.GetItems( q );

Then, what you could do is delete each item one by one :

1
2
3
foreach( SPListItem item in cColl ) {
   item.Delete();
}

But it’s soooo freaking slow. On a production server, we had something like 1 delete per second.

What you need to use is a CAML batch delete, this method builds a CAML delete batch :

1
2
3
4
5
6
7
8
9
10
11
12
private static String BuildBatchDeleteCommand( SPList list, SPListItemCollection coll ) {
	StringBuilder sbDelete = new StringBuilder();
	sbDelete.Append( "<?xml version=\"1.0\" encoding=\"UTF-8\"?><Batch>" );
 
	// We prepare a String.Format with a String.Format, this is why we have a {{0}}
	string command = String.Format( "<Method><SetList Scope=\"Request\">{0}</SetList><SetVar Name=\"ID\">{{0}}</SetVar><SetVar Name=\"Cmd\">Delete</SetVar><SetVar Name=\"owsfileref\">{{1}}</SetVar></Method>", list.ID );
	foreach ( SPListItem item in coll ) {
		sbDelete.Append( string.Format( command, item.ID.ToString(), item.File.ServerRelativeUrl ) );
	}
	sbDelete.Append( "</Batch>" );
	return sbDelete.ToString();
}

With that method, we could delete 100 rows in 2/3 seconds. Still, you can’t expect magic with Sharepoint, everything around Sharepoint is slow.

The final main code should look a little bit like that :

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
// While there's something left
while ( list.ItemCount > 0 ) {
 
	// We get everything but we limit the result to 100 rows
	SPQuery q = new SPQuery();
	q.RowLimit = 100;
 
	// We get the results
	SPListItemCollection coll = list.GetItems( q );
 
	// We process a CAML batch
	String batch = BuildBatchDeleteCommand( list, coll );
 
	// We execute it
	web.ProcessBatchData( batch );
 
	list.Update();
}

Note SPWeb::ProcessBatchData returns a String, if your code doesn’t work, it could help you find out why. This is the only way, because a wrong batch won’t throw any exception. Keep that in mind.

4 thoughts on “Sharepoint – Clean a huge Document Library”

  1. Hello,

    Thank you for this post, I need to use your code in order to delete all versions of a document in a SharePoint Document Library.

    What will the CAM Query look like?

    Thank you.

  2. I have applied the same logic, but for me its taking 15-20 mins to delete 100 files,

    Can share which architecture you have used?
    2-Tier
    3-Tier
    Supporting Hardware config.

Leave a Reply

Your email address will not be published. Required fields are marked *