PDF iFilter and Sitecore Search

Over the past week I’ve seen a similar issue arise in two separate places around PDF iFilter setup for Sitecore indexing. One was over on the Sitecore Slack channel; the other was around the Paragon office.

The Error

The errors were similar: both involved a ‘Could not compute value’ message for the _content field, ending in a System.Runtime.InteropServices.COMException. Here are the error messages for the bots:

ManagedPoolThread #5 15:59:33 ERROR Could not compute value for ComputedIndexField: _content for indexable: sitecore://master/{BA9C8FA0-B9BC-43A2-8CF9-9B6B4139E6BE}?lang=en&ver=1
 Exception: System.Runtime.InteropServices.COMException
 Message: Exception from HRESULT: 0x80048605
 Source: Sitecore.ContentSearch
 at Sitecore.ContentSearch.Extracters.IFilterTextExtraction.IPersistStream.Load(IStream stream)
 at Sitecore.ContentSearch.Extracters.IFilterTextExtraction.FilterLoader.InitializeFilterAsPersistStream(IFilter filter, String fileName)
 at Sitecore.ContentSearch.Extracters.IFilterTextExtraction.FilterLoader.LoadAndInitIFilter(String fileName, String extension)
 at Sitecore.ContentSearch.Extracters.IFilterTextExtraction.FilterReader..ctor(String fileName)
 at Sitecore.ContentSearch.ComputedFields.MediaItemIFilterTextExtractor.ComputeFieldValue(IIndexable indexable)
 at Sitecore.ContentSearch.ComputedFields.MediaItemContentExtractor.ComputeFieldValue(IIndexable indexable)
 at Sitecore.ContentSearch.LuceneProvider.LuceneDocumentBuilder.AddComputedIndexFields()
ManagedPoolThread #10 09:26:31 WARN Could not compute value for ComputedIndexField: _content for indexable: sitecore://web/{3F5EDAB1-805D-470F-A3B6-C326B9F118E5}?lang=en&ver=1
 Exception: System.Runtime.InteropServices.COMException
 Message: Error HRESULT E_FAIL has been returned from a call to a COM component.
 Source: mscorlib
 at System.Runtime.InteropServices.ComTypes.IPersistFile.Load(String pszFileName, Int32 dwMode)
 at Sitecore.ContentSearch.Extracters.IFilterTextExtraction.FilterLoader.InitializeFilterAsPersistFile(IFilter filter, String fileName)
 at Sitecore.ContentSearch.Extracters.IFilterTextExtraction.FilterLoader.LoadAndInitIFilter(String fileName, String extension)
 at Sitecore.ContentSearch.Extracters.IFilterTextExtraction.FilterReader..ctor(String fileName)
 at Sitecore.ContentSearch.ComputedFields.MediaItemIFilterTextExtractor.ComputeFieldValue(IIndexable indexable)
 at Sitecore.ContentSearch.ComputedFields.MediaItemContentExtractor.ComputeFieldValue(IIndexable indexable)
 at Sitecore.ContentSearch.SolrProvider.SolrDocumentBuilder.AddComputedIndexField(IComputedIndexField computedIndexField, ParallelLoopState parallelLoopState, ConcurrentQueue`1 exceptions)
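
If you want to know which media items are affected, a small script can pull the item IDs out of the log. The following is a minimal sketch in Python, assuming the log entries look like the two shown above; the function name failed_content_items is made up for illustration and is not part of Sitecore.

```python
import re

# Matches the first line of a log entry like the ones above and captures
# the database name and the item ID from the indexable URI.
ENTRY = re.compile(
    r"Could not compute value for ComputedIndexField: _content "
    r"for indexable: sitecore://(?P<db>\w+)/\{(?P<id>[0-9A-Fa-f-]+)\}"
)


def failed_content_items(log_lines):
    """Return (database, item_id) pairs for _content failures.

    Hypothetical helper: an entry only counts when the line directly
    after it reports a COMException, as in the log excerpts above.
    """
    lines = list(log_lines)
    found = []
    for i, line in enumerate(lines):
        match = ENTRY.search(line)
        if not match:
            continue
        next_line = lines[i + 1] if i + 1 < len(lines) else ""
        if "COMException" in next_line:
            found.append((match.group("db"), match.group("id")))
    return found
```

Pass it an open log file (or any list of lines) and it returns the database and item ID for each failing PDF, which you can then look up in the Media Library.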

The Solution

It seems there are at least two potential fixes for the COMException.

First, confirm the iFilter assemblies have been installed (or copied) to C:\Windows\System32\inetsrv. Then restart IIS and kick off a re-index of your Media Library.
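
One way to run that check is to compare the iFilter install folder against inetsrv. This is a rough sketch in Python, assuming the iFilter sits in the bin folder named in the steps at the end of this post and that every file in that folder should also exist in inetsrv; the function name missing_ifilter_files is hypothetical.

```python
from pathlib import Path


def missing_ifilter_files(ifilter_bin, inetsrv):
    """Return the names of files in ifilter_bin that are not in inetsrv.

    Hypothetical helper: it only compares file names, not versions or
    contents, and it ignores subfolders.
    """
    source = Path(ifilter_bin)
    target = Path(inetsrv)
    return sorted(
        f.name
        for f in source.iterdir()
        if f.is_file() and not (target / f.name).exists()
    )
```

An empty list means every file from the iFilter folder is already present; anything it returns still needs to be copied.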

If this doesn’t resolve the problem, you most likely need to install version 11 or higher of the PDF iFilter from Adobe: http://supportdownloads.adobe.com/detail.jsp?ftpID=5542.

General Help – PDF iFilter Setup

When starting to set up PDF iFilters to support search scenarios, most of us have landed on the following Sitecore documentation page: Index PDF files.

At the time of writing, the recommended version 9 of the iFilter does not seem to index properly for users, and COM exceptions show up in the log. Based on documented issues with version 9 on Windows 8, I recommend version 11 of the iFilter, which I have had success running.

Steps

  1. Download version 11 or higher of the PDF iFilter from Adobe at http://supportdownloads.adobe.com/detail.jsp?ftpID=5542
  2. Run the install on the server (or local machine)
  3. Copy all assemblies from “C:\Program Files\Adobe\Adobe PDF iFilter 11 for 64-bit platforms\bin” to “C:\Windows\System32\inetsrv”
  4. Restart IIS
  5. Kick off your re-index of PDFs
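
If you do this on more than one server, step 3 is easy to script. Here is a minimal sketch in Python, assuming that copying every file in the bin folder (not its subfolders) is what "all assemblies" means; the function name copy_ifilter_files is made up for illustration.

```python
import shutil
from pathlib import Path


def copy_ifilter_files(ifilter_bin, inetsrv):
    """Copy every file in ifilter_bin into inetsrv and return their names.

    Hypothetical helper: existing files with the same name are
    overwritten, and subfolders are skipped.
    """
    target = Path(inetsrv)
    copied = []
    for f in sorted(Path(ifilter_bin).iterdir()):
        if f.is_file():
            shutil.copy2(f, target / f.name)
            copied.append(f.name)
    return copied


# Example call with the paths from step 3 (needs an elevated prompt):
# copy_ifilter_files(
#     r"C:\Program Files\Adobe\Adobe PDF iFilter 11 for 64-bit platforms\bin",
#     r"C:\Windows\System32\inetsrv",
# )
```

After the copy, running iisreset from an elevated command prompt covers step 4; the re-index in step 5 still has to be started from Sitecore.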