Failed incremental crawl can remove items from an index

I stumbled across this today whilst reading ‘Plan for crawling and federation (SharePoint Server 2010)’: http://technet.microsoft.com/en-us/library/cc262926.aspx

In the ‘Reasons to do a full crawl’ section, the article states:

You want to resolve consecutive incremental crawl failures. If an incremental crawl fails one hundred consecutive times at any level in a repository, the system removes the affected content from the index.

This could mean that if an incremental crawl fails to reach a piece of content 100 times in a row, that content will be removed from the search index. If you are running incremental crawls every 5 minutes, and they are consistently failing, your index could start to be trimmed after approx. 8.5 hours (100 crawls at 5-minute intervals is roughly 500 minutes).

Best keep an eye on those crawl logs!

UPDATE 28-Sep-2011:

My previous comment about 8.5 hours was incorrect. There is another setting, documented at http://technet.microsoft.com/en-us/library/hh127009.aspx, that sets the minimum time that must elapse before content can be removed from the index. The ErrorDeleteIntervalAllowed setting must also be exceeded before content is trimmed from the index. You can maintain this setting, and a few other related ones, with PowerShell:

$SearchApplication = Get-SPEnterpriseSearchServiceApplication -Identity "<SearchServiceApplicationName>"

# 1008 = number of hours (6 weeks)
$SearchApplication.SetProperty("ErrorDeleteIntervalAllowed", 1008)
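
If you want to check what your farm is currently set to before changing anything, the same GetProperty/SetProperty pattern can be used to read the values back. The extra property names below (ErrorDeleteCountAllowed, ErrorCountAllowed, ErrorIntervalAllowed) are the related thresholds described on the TechNet page above; treat this as a sketch and verify the names against that article before relying on them:

$SearchApplication = Get-SPEnterpriseSearchServiceApplication -Identity "<SearchServiceApplicationName>"

# Read back the current crawl error thresholds before changing anything.
# Property names other than ErrorDeleteIntervalAllowed are taken on the
# assumption they match the TechNet article above - verify them first.
"ErrorDeleteCountAllowed", "ErrorDeleteIntervalAllowed",
"ErrorCountAllowed", "ErrorIntervalAllowed" | ForEach-Object {
    "{0} = {1}" -f $_, $SearchApplication.GetProperty($_)
}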
