MOSS 2007: Managing search properties

Microsoft Office SharePoint Server ships with a powerful search service. I can’t say I like everything in SharePoint, but the MOSS search engine really is impressive: it lets you run full-text searches on everything while, at the same time, filtering your results precisely by searching on specific columns.

But the MOSS search engine isn’t as straightforward as querying directly with CAML. Before you can search on your list columns, you have to map them to managed properties.
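To see where this leads, here is roughly what querying a managed property looks like once everything below is in place. This is only a sketch: the managed property owsReleaseDate is a hypothetical example (created the way step 5 creates them), and the class name is mine.

```csharp
using System;
using System.Data;
using Microsoft.SharePoint;
using Microsoft.Office.Server.Search.Query;

// Sketch: full-text search combined with a filter on a managed property.
// "owsReleaseDate" is a hypothetical managed property, mapped as shown below.
public static class ManagedPropertySearch {
	public static DataTable Search( SPSite site ) {
		var query = new FullTextSqlQuery( site ) {
			QueryText =
				"SELECT Title, Path FROM Scope() " +
				"WHERE FREETEXT(DefaultProperties, 'amazing') " +
				"AND owsReleaseDate >= '2008/01/01'",
			ResultTypes = ResultType.RelevantResults
		};

		ResultTableCollection results = query.Execute();

		// ResultTable implements IDataReader, so it loads straight into a DataTable
		var table = new DataTable();
		table.Load( results[ ResultType.RelevantResults ], LoadOption.OverwriteChanges );
		return table;
	}
}
```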

All the code below is simplified to make it easier to understand; it illustrates what you need to build, and will require some rework before you can use it as-is.

How to map columns to managed properties

So, to deploy a (WSP) solution that maps the right columns, you need to:

  • (0. Have an SSP for your site collection)
  • 1. Create the list programmatically
  • 2. Add the site collection to the content sources
  • 3. Put some data in the list (you have to fill every column with some dummy data)
  • 4. Start a crawl and wait for it to finish (loop on the CrawlStatus of the ContentSource object)
  • 5. Create all the managed properties from the crawled properties in the “SharePoint” category (not really hard). The best way to do this is to work from the fields of your list, so that when you later change some fields, you won’t have to update your mapping code.

If that isn’t clear yet, here is how data is organized in the MOSS search engine: data (a field “Data”, for instance) → crawled property (“ows_Data”) → managed property (“owsData”, “Data”, or any name you want). It works this way because you might want to map multiple crawled properties onto a single managed property.

Step 1 is really easy. If you don’t know how to do that, you should stop reading here.
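For completeness, a minimal sketch of step 1 anyway. The list title “Products” and the “ReleaseDate” field are placeholders; note that the field is added to the default view, because the step 5 code below only maps the fields of the default view.

```csharp
using System;
using Microsoft.SharePoint;

// Sketch for step 1: creating a custom list with one extra column.
// "Products" and "ReleaseDate" are placeholder names.
public static class ListFactory {
	public static SPList CreateList( SPWeb web ) {
		Guid listId = web.Lists.Add( "Products", "Demo list", SPListTemplateType.GenericList );
		SPList list = web.Lists[ listId ];

		// Add a column and show it in the default view
		// (step 5 below maps only the fields of the default view)
		string fieldName = list.Fields.Add( "ReleaseDate", SPFieldType.DateTime, false );
		SPView view = list.DefaultView;
		view.ViewFields.Add( fieldName );
		view.Update();

		list.Update();
		return list;
	}
}
```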

Step 2: creating a content source programmatically and adding a 15-minute incremental crawl schedule to it:

// With a given web SPWeb...
var site = web.Site;
var sspContent = new Content( SearchContext.GetContext( site ) );
 
// We choose a "unique" name for this content source; we don't want to add it twice
var name = String.Format( "Web - {0}", web.Url );
 
{ // We check that the content source doesn't exist yet
	foreach ( ContentSource cs in sspContent.ContentSources ) {
		if ( cs.Name == name )
			return;
	}
}
 
{ // We add the content source
	var cs = sspContent.ContentSources.Create(
		typeof( SharePointContentSource ),
		name
	);
	cs.StartAddresses.Add( new Uri( web.Url ) );
 
	var schedule = new DailySchedule( SearchContext.GetContext( site ) ) {
		RepeatDuration = 1440,
		RepeatInterval = 15,
		StartHour = 8,
		StartMinute = 00
	};
 
	cs.IncrementalCrawlSchedule = schedule;
	cs.Update();
}

Step 3 is pretty easy too, so there isn’t much to show. The only difficulty you might run into is that for document libraries, inserting data works differently. Here is how you can handle it:

// For a given list SPList (and its parent web SPWeb)...
 
var rand = new Random();
 
SPListItem item;
 
// If it's a document library
if ( list.BaseTemplate == SPListTemplateType.DocumentLibrary ) {
	var fileUrl = String.Format( "{0}/Data_{1}.txt", list.RootFolder.Name, rand.Next() );
 
	var content =  UTF8Encoding.UTF8.GetBytes( String.Format( "Random data file ! {0}", rand.Next() ) );
 
        // New "items" are inserted by adding new files
	var file = web.Files.Add( fileUrl, content );
 
	file.Update();
 
	item = file.Item;
 
} else {
	item = list.Items.Add();
}
 
// From here, the SPListItem can be handled the same way for both list types

Step 4: starting an incremental crawl on every content source and waiting for them to finish:

// For a defined site SPSite...
var sspContent = new Content( SearchContext.GetContext( site ) );
 
foreach ( ContentSource cs in sspContent.ContentSources ) {
 
	// Without this copy, starting the crawl can be seen as altering the enumeration (and throw an exception)
	ContentSource cs2 = cs;
	cs2.StartIncrementalCrawl();
}
 
{ // We wait until it ends
	Boolean indexFinished;
 
	while ( true ) {
		indexFinished = true;
 
		// We check if each content-source is still crawling something
		foreach ( ContentSource cs in sspContent.ContentSources ) {
			if (
				cs.CrawlStatus == CrawlStatus.CrawlingFull ||
				cs.CrawlStatus == CrawlStatus.CrawlingIncremental )
				indexFinished = false;
		}
 
		if ( indexFinished )
			break;
		else {
			Thread.Sleep( 1000 );
		}
	}
}

You might wonder why we wait for the crawl to end. The reason is that you can’t use crawled properties before they have been created, and they are only created when a crawl detects the new columns in the list. This is why everything has to be done step by step.

And step 5: making sure that every column of the list has its managed property. This is why you might prefer automatic mapping instead.
Here, to make things easier, we will say that we want each managed property to be named like “owsInternalName”. You will see why later.

Category sharepoint = null;
 
// We take the "SharePoint" category
foreach ( Category cat in _sspSchema.AllCategories )
	if ( cat.Name == "SharePoint" )
		sharepoint = cat;
 
// We only select fields that are in the defaultView 
// (you might want to change this behaviour)
var fields = new List<SPField>();
foreach ( String fieldName in list.DefaultView.ViewFields )
	fields.Add( list.Fields.GetFieldByInternalName( fieldName ) );
 
 
// for every one of these fields...
foreach ( SPField field in fields ) {
 
	var owsNameUnderscore = String.Format( "ows_{0}", field.InternalName );
	var owsName = String.Format( "ows{0}", field.InternalName );
 
	CrawledProperty cp;
 
	// We check if the crawled property exists
	if ( ( cp = CrawledPropertyExists( sharepoint.GetAllCrawledProperties(), owsNameUnderscore ) ) != null ) {
 
		// We then try to get the linked managed property
		ManagedProperty mp = ManagedPropertyExists( cp.GetMappedManagedProperties(), owsName );
 
		// If it doesn't exist
		if ( mp == null ) {
 
			// We create this mapped property
			try {
				mp = _sspSchema.AllManagedProperties.Create( owsName, CrawledPropertyTypeToManagedPropertyType( cp ) );
			} catch ( SqlException ) {
				// If the mapped property already exists
				// it means that it isn't mapped with our crawled property, so we get it from the 
				// global Managed property store
				mp = ManagedPropertyExists( _sspSchema.AllManagedProperties, owsName );
			}
 
			// And we finally map it with the crawled property
			var mappingColl = mp.GetMappings();
 
			mappingColl.Add(
				new Mapping(
					cp.Propset,
					cp.Name,
					cp.VariantType,
					mp.PID
				)
			);
 
			mp.SetMappings( mappingColl );
 
			mp.Update();
		}
	} else {
		// The crawled property doesn't exist: put some data in the column and crawl it first.
	}
}

To make this code work, you need the following helper methods and enum in your code.

public static ManagedDataType CrawledPropertyTypeToManagedPropertyType( CrawledProperty cp ) {
	switch ( (VariantType) cp.VariantType ) {
		case VariantType.Array:
		case VariantType.UserDefinedType:
		case VariantType.Object:
		case VariantType.Error:
		case VariantType.Variant:
		case VariantType.DataObject:
		case VariantType.Empty:
		case VariantType.Null:
		case VariantType.Currency:
			return ManagedDataType.Unsupported;
		case VariantType.Single:
		case VariantType.Double:
		case VariantType.Decimal:
			return ManagedDataType.Decimal;
		case VariantType.Boolean:
			return ManagedDataType.YesNo;
		case VariantType.Byte:
		case VariantType.Long:
		case VariantType.Short:
		case VariantType.Integer:
			return ManagedDataType.Integer;
		case VariantType.Char:
		case VariantType.String:
			return ManagedDataType.Text;
		case VariantType.Date:
		case VariantType.Date2:
			return ManagedDataType.DateTime;
		default:
			return ManagedDataType.Text;
	}
}
 
public enum VariantType {
	Empty = 0x0000,
	Null = 0x0001,
	Short = 0x0002,
	Integer = 0x0003,
	Single = 0x0004,
	Double = 0x0005,
	Currency = 0x0006,
	Date = 0x0007,
	Date2 = 0x0040,
	String = 0x0008,
	Object = 0x0009,
	Error = 0x000A,
	Boolean = 0x000B,
	Variant = 0x000C,
	DataObject = 0x000D,
	Decimal = 0x000E,
	Byte = 0x0011,
	Char = 0x0012,
	Long = 0x0014,
	UserDefinedType = 0x0024,
	Array = 0x2000
};
 
public static CrawledProperty CrawledPropertyExists( IEnumerable enu, String name ) {
	foreach ( CrawledProperty cp in enu ) {
		if ( cp.Name == name )
			return cp;
	}
 
	return null;
}
 
public static ManagedProperty ManagedPropertyExists( ManagedPropertyCollection coll, String name ) {
	foreach ( ManagedProperty mp in coll ) {
		if ( mp.Name == name )
			return mp;
	}
 
	return null;
}
 
public static ManagedProperty ManagedPropertyExists( IEnumerable enu, String name ) {
	foreach ( ManagedProperty mp in enu ) {
		if ( mp.Name == name )
			return mp;
	}
 
	return null;
}

Automatically added columns

There is also another way around this, and it is the actual subject of this post: you can tell the “SharePoint” category to automatically map new crawled properties to managed properties. You just have to enable this option for the “SharePoint” category in the SharePoint UI, or do it programmatically like this:

// For a defined site SPSite...
var sspSchema = new Schema( SearchContext.GetContext( site ) );
 
foreach ( Category cat in sspSchema.AllCategories ) {
        if ( cat.Name == "SharePoint" ) {
		cat.AutoCreateNewManagedProperties = true;
		cat.MapToContents = true;
		cat.DiscoverNewProperties = true;
		cat.Update();
        }
}

You should only apply this to the “SharePoint” category, because other categories keep adding new crawled properties and then stop needing them. That would leave junk crawled (and managed) properties that you would have to clean up later.

These new crawled properties are indexed by their internal name (“ows_ReleaseDate” for instance), and the corresponding managed property gets a similar name (“owsReleaseDate”).

By the way, with this option enabled, you don’t have to wait for the crawl to end within your solution deployment, because new columns will be mapped automatically. But you should still do it, just in case…

The catch is that this only applies to newly crawled columns. For columns that have already been crawled, you have no other choice but to map them to managed properties yourself (step 5).

Delete crawled properties

You might want to delete useless crawled properties, either to clean up your site collection or to allow future crawled properties with the same name to be mapped automatically. To do this, you need to:

  • Make sure they are not mapped to any managed property
  • Disable their mapping to the data, controlled by the “contains indexed values of the columns” checkbox in the SharePoint UI (IsMappedToContents in code)
  • Delete all unmapped properties (by code or through the UI) in the “SharePoint” category options

Programmatically, it looks like this:

// For a defined site SPSite...
var sspSchema = new Schema( SearchContext.GetContext( site ) );
foreach ( Category cat in sspSchema.AllCategories ) {
 
	// We don't need to mess with any other category.
	// Doing this to the whole "SharePoint" category is brutal enough...
	if ( cat.Name != "SharePoint")
		continue;
 
	foreach ( CrawledProperty pr in cat.GetAllCrawledProperties() ) {
 
		// We check whether the crawled property still indexes any data
		if ( ! pr.GetSamples( 1 ).MoveNext() ) {
			// This is necessary to be able to delete this unmapped property
			pr.IsMappedToContents = false;
			pr.Update();
		}
	}
 
	// Let's do it !
	cat.DeleteUnmappedProperties();
	cat.Update();
}

Search database reset

If your crawled properties have been deleted, they won’t come back when you add a list with the same columns. What you need to do is reset the search database and launch a new crawl:

var sspContent = new Content( SearchContext.GetContext( site ) );
sspContent.SearchDatabaseCleanup( false );

The scope problem

If you want to use a managed property in a scope, you have to enable it for scoping:

// For a defined site SPSite...
var sspSchema = new Schema( SearchContext.GetContext( site ) );
foreach ( ManagedProperty mp in sspSchema.AllManagedProperties ) {
	if ( ! mp.EnabledForScoping ) {
		mp.EnabledForScoping = true;
		mp.Update();
	}
}

Note on crawls: I think you should always run an incremental crawl instead of a full crawl; it’s really fast (it can take less than 20 seconds) and does all the work. If your search database has been reset, or if you added a new content source, SharePoint will perform a full crawl even when you ask for an incremental one.

Second note on crawls: the crawler doesn’t index data as soon as new items are added, but you can make it do so. You just have to create an event receiver that starts an incremental crawl when the ItemAdded / ItemUpdated event fires.
If your lists are updated frequently, modifications may arrive while MOSS is already performing a crawl. In that case, you have to build an asynchronous mechanism that triggers a “recrawl” right after the current crawl has finished (a threaded object stored in the ASP.NET Application object store, for instance).
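A minimal sketch of such an event receiver, assuming the content source is named after the web URL as in step 2 above (the class name is mine, and error handling is omitted):

```csharp
using System;
using Microsoft.SharePoint;
using Microsoft.Office.Server.Search.Administration;

// Sketch: starting an incremental crawl whenever an item is added or updated.
// Assumes the "Web - {url}" content source naming convention used in step 2.
public class CrawlOnChangeReceiver : SPItemEventReceiver {

	public override void ItemAdded( SPItemEventProperties properties ) {
		StartCrawl( properties );
	}

	public override void ItemUpdated( SPItemEventProperties properties ) {
		StartCrawl( properties );
	}

	private static void StartCrawl( SPItemEventProperties properties ) {
		using ( SPWeb web = properties.OpenWeb() ) {
			var sspContent = new Content( SearchContext.GetContext( web.Site ) );
			var name = String.Format( "Web - {0}", web.Url );

			foreach ( ContentSource cs in sspContent.ContentSources ) {
				// Only start a crawl if this source isn't already crawling
				if ( cs.Name == name && cs.CrawlStatus == CrawlStatus.Idle )
					cs.StartIncrementalCrawl();
			}
		}
	}
}
```

This does not handle the “recrawl after the current crawl finishes” case described above; that part needs the asynchronous mechanism mentioned in the note.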
