Recently, I encountered an issue with SharePoint 2013 search crawls where .pdf files smaller than 1 MB were reporting a warning: “The item has been truncated in the index because it exceeds the maximum size”. The default MaxDownloadSize for documents in SharePoint is 64 MB, which was more than enough to handle these relatively small .pdf files.
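To rule out a changed limit, the current value can be read from the Search service application. This is a minimal sketch, assuming a farm with a single Search service application and a SharePoint Management Shell session on a farm server; the value is reported in megabytes.

```powershell
# Load the SharePoint cmdlets if this is a plain PowerShell session.
Add-PSSnapin "Microsoft.SharePoint.PowerShell" -ErrorAction SilentlyContinue

# Assumption: the farm has exactly one Search service application.
$ssa = Get-SPEnterpriseSearchServiceApplication

# MaxDownloadSize is stored as a property on the service application (in MB).
$maxDownloadSizeMB = $ssa.GetProperty("MaxDownloadSize")
Write-Host "MaxDownloadSize: $maxDownloadSizeMB MB"
```

If this reports the default of 64, files under 1 MB are nowhere near the limit, so the warning has to come from somewhere else.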

I reached out to some co-workers, and one suggested that the warning might be a false positive and that the entire document had in fact been crawled. I tested this by first searching for words at the end of the document. No matches were found, which would be expected if the document had been truncated. Next, I searched for text in the middle of the document, and again no matches were found. Assuming a lot of text must have been truncated, I searched for text at the very beginning of the document. No results were found! So when the warning said the item had been truncated, it meant the whole document.

I decided to test a Microsoft Word document of approximately the same size and found that it did not throw a warning. I then exported that Word document to .pdf and crawled it, and surprisingly no warning was thrown. I took a look at the properties of both .pdf files. I found that the problem .pdf file was version 1.3 (Acrobat 4.x) and had been generated by Microsoft SQL Server Reporting Services (SSRS), while my test document was version 1.5 (Acrobat 6.x).

Figure 1. PDF Version 1.3 (Acrobat 4.x) generated by SQL Server Reporting Services (SSRS).

Figure 2. PDF Version 1.5 (Acrobat 6.x) generated by Microsoft Word.
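The version shown in the document properties can also be read straight from the file, because a PDF starts with a header such as "%PDF-1.3". The sketch below is not part of the original troubleshooting; Get-PdfHeaderVersion is a hypothetical helper name. Note that for PDF 1.4 and later, a /Version entry in the document catalog can override the header, so treat the result as a first approximation.

```powershell
# Hypothetical helper: returns the version from the "%PDF-x.y" file header,
# or $null if the file does not start with a PDF header.
function Get-PdfHeaderVersion {
    param(
        [Parameter(Mandatory = $true)]
        [string]$Path
    )

    # Resolve to a full file system path, since .NET does not follow
    # the current PowerShell location.
    $fullPath = (Resolve-Path -LiteralPath $Path).ProviderPath

    # The header sits at the very start of the file, so 16 bytes is plenty.
    $buffer = New-Object byte[] 16
    $stream = [System.IO.File]::OpenRead($fullPath)
    try {
        $read = $stream.Read($buffer, 0, $buffer.Length)
    }
    finally {
        $stream.Dispose()
    }

    $header = [System.Text.Encoding]::ASCII.GetString($buffer, 0, $read)
    if ($header -match '^%PDF-(\d+\.\d+)') {
        return $Matches[1]
    }
    return $null
}
```

Calling Get-PdfHeaderVersion -Path on each exported report makes it quick to list which files in a folder still carry the older header.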

This got me thinking that the problem was version 1.3, so I converted my test .pdf to version 1.3 and crawled it. I was surprised to find that it did not throw the warning as I had expected it to. This would mean that either 1) the culprit was SSRS report generation (not version 1.3 itself), or 2) something about converting a version 1.5 document to version 1.3 avoided the issue. Either way, I preferred a centralized solution to converting every .pdf document in our site.

I decided to try using the official Adobe IFilter in place of the one that is built into SharePoint 2013. In order to leverage the custom IFilter, I did the following:

1. Downloaded & installed the March 2013 PU, as it is a prerequisite for the next CU (http://support.microsoft.com/kb/2767999)
2. Downloaded & installed the July 2014 CU (http://support.microsoft.com/kb/2767999)
3. Downloaded & installed the Adobe IFilter (http://www.adobe.com/support/downloads/detail.jsp?ftpID=5542)
   NOTE: May require a reboot.
4. Created the following script based on these instructions for SharePoint 2010: http://www.adobe.com/devnet-docs/acrobatetk/tools/AdminGuide/Acrobat_Reader_IFilter_configuration.pdf
   NOTE: Not all steps in the documentation were necessary, as SharePoint 2013 natively supports .pdf.

[sourcecode language="powershell"]
Add-PSSnapin "Microsoft.SharePoint.PowerShell" -ErrorAction SilentlyContinue
$iFilterPath = "C:\Program Files\Adobe\Adobe PDF iFilter 11 for 64-bit platforms\bin\"
$oldPath=(Get-ItemProperty -Path 'Registry::HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\Session Manager\Environment' -Name PATH).Path
if(!$oldPath.Contains($ifilterPath))
{
    Write-Host "Adding Environment Path"
    $newPath="$oldPath;$iFilterPath"
    Set-ItemProperty -Path 'Registry::HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\Session Manager\Environment' -Name PATH -Value $newPath
}

Write-Host "Setting Up Registry entries"
#See: http://www.adobe.com/devnet-docs/acrobatetk/tools/AdminGuide/Acrobat_Reader_IFilter_configuration.pdf
$path = "Registry::HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server\15.0\Search\Setup\Filters\.pdf"
if(Test-Path $path) {Remove-Item -Path $path -Recurse}
New-Item $path
New-ItemProperty -Path $path -Name Extension -PropertyType String -Value "pdf"
New-ItemProperty -Path $path -Name FileTypeBucket -PropertyType DWord -Value 1
New-ItemProperty -Path $path -Name MimeTypes -PropertyType String -Value "application/pdf"

# Point the .pdf extension at the Adobe IFilter by its CLSID
$path = "Registry::HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server\15.0\Search\Setup\ContentIndexCommon\Filters\Extension\.pdf"
# Recreate the key so it starts out empty
if(Test-Path $path) {Remove-Item -Path $path -Recurse}
New-Item $path
New-ItemProperty -Path $path -Name "(Default)" -PropertyType MultiString -Value "{E8978DA6-047F-4E3D-9C78-CDBE46041603}"



$ssa = Get-SPEnterpriseSearchServiceApplication
$filter = Get-SPEnterpriseSearchFileFormat -SearchApplication $ssa -Identity pdf
if(!$filter.UseIFilter)
{
    Write-Host "Enabling IFilter for SharePoint"
    Set-SPEnterpriseSearchFileFormatState -SearchApplication $ssa -Identity pdf -UseIFilter $true -Enable $true

    Write-Host "Restarting Search Service"
    $service = Get-SPServiceInstance | ? {$_.TypeName -eq "Search Host Controller Service"}
    $service | Stop-SPServiceInstance -Confirm:$false
    while(-not (($service | Where-Object {$_.Status -eq "Disabled"}).Count -eq $service.Count)){
        write-host -ForegroundColor Yellow $service.Status; sleep 5;
        $service = Get-SPServiceInstance | ? {$_.TypeName -eq "Search Host Controller Service"}
    } 
    $service | Start-SPServiceInstance
    Write-Host "Restarting IIS"
    iisreset
}

[/sourcecode]

After executing the script, I performed a full crawl and was pleased to find that all of the .pdf size-related warnings were gone. I repeated the earlier searches for text from the end, middle, and beginning of the document, and this time results were returned.
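The full crawl can be started from the same PowerShell session instead of Central Administration. This is a minimal sketch, assuming a single Search service application and a content source with the default name "Local SharePoint sites"; substitute the name of your own content source. The 30-second polling interval is arbitrary.

```powershell
Add-PSSnapin "Microsoft.SharePoint.PowerShell" -ErrorAction SilentlyContinue

# Assumption: one Search service application in the farm.
$ssa = Get-SPEnterpriseSearchServiceApplication

# Assumption: the content source still has its default name.
$sourceName = "Local SharePoint sites"
$source = Get-SPEnterpriseSearchCrawlContentSource -SearchApplication $ssa -Identity $sourceName

Write-Host "Starting full crawl of '$sourceName'"
$source.StartFullCrawl()

# Poll until the content source reports Idle again.
do {
    Start-Sleep -Seconds 30
    # Re-read the content source so the status is current.
    $source = Get-SPEnterpriseSearchCrawlContentSource -SearchApplication $ssa -Identity $sourceName
    Write-Host "Crawl status: $($source.CrawlStatus)"
} while ($source.CrawlStatus -ne "Idle")
```

Once the loop exits, the crawl log for the content source shows whether any truncation warnings remain.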

Conclusion

While I am not certain whether the cause of this issue is SSRS document generation or legacy version 1.3 .pdf documents, I was able to solve it using the official IFilter from Adobe. If someone else encounters this problem, hopefully this will help.

The post Small .pdf files reporting – “The item has been truncated in the index because it exceeds the maximum size” appeared first on Messor Solutions, LLC.
