SEO Content Strategy
March 26th 2020

Why PDFs Are Not Ideal for SEO


The worst thing you can do is waste time optimizing a PDF file for Search. Granted, PDFs are a common format for content assets: case studies, eBooks, whitepapers, and product data sheets are frequently converted into PDF documents for digital marketing use. But optimizing PDFs for Search? You’ve got better things to do with your time.

Here’s why PDFs are not ideal for SEO

Search engines can crawl PDFs much as they crawl web pages. In most cases, however, PDFs lack information found in standard web pages. Google can still index them, but they don’t give the search engine everything it looks for when analyzing and ranking pages. The same goes for Word and Excel documents: search engine bots can crawl, index, and rank them, but these files lack the data needed to be ideal SEO assets for content producers.

Here are some common issues and mistakes content producers run into with PDFs:

Most PDFs lack the metadata Google looks for. Saving a text document or slide presentation as a PDF file doesn’t automatically fill in the correct document properties. You can use Adobe Acrobat to set some of that metadata, but most people skip this step before publishing the PDF on the web.

Without that metadata, these documents usually look inferior to search engines. Yes, the PDF will probably be indexed, but its SEO performance will be substandard. For example, if the PDF has no title, Google may fall back on the file name. Because file names are rarely descriptive, you can expect the click-through rate to suffer.

The click-through rate on a PDF in a SERP is generally low anyway, due to the odd way PDF results are formatted. If you publish PDFs, check their metadata; the people who create them don’t always know how to add it.

Links in PDFs aren’t processed like links on web pages, and they don’t pass authority the same way. So if you’re using internal links in a PDF, they won’t be as effective as you might think.

PDFs aren’t as easy to format as web pages, and as a web document, a PDF is usually weaker than an HTML page.

Most businesses using standard Google Analytics or other analytics packages cannot, or do not, track PDF traffic. The standard Google Analytics code is a page tag that fires when a web page loads. When a user opens a PDF, no web page loads and no tag exists, so the visit is essentially a dead end: it registers as a bounce, and analytics platforms can’t track any resulting conversions.

MarketMuse reads PDFs the same way Google does: it either extracts the text directly or, if the text is stored as an image, processes it with optical character recognition (OCR).
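One way to see some of the PDF traffic that page-tag analytics misses is your web server’s access log, which records every file request whether or not a tag fires. The sketch below is a minimal illustration, not a complete solution: it assumes Apache/Nginx “combined” (or common) log format, counts only full downloads (status 200, so range requests answered with 206 by in-browser PDF viewers are skipped), and does not filter out bots. The function name and sample paths are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable

# Assumed log format: Apache/Nginx "combined" or "common" format, e.g.
# 203.0.113.5 - - [26/Mar/2020:10:00:00 +0000] "GET /a.pdf HTTP/1.1" 200 5234 ...
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] '                    # client, identity, user, timestamp
    r'"(?P<method>[A-Z]+) (?P<path>\S+)[^"]*" '    # request line
    r'(?P<status>\d{3}) '                          # HTTP status code
)


def count_pdf_downloads(lines: Iterable[str]) -> Counter:
    """Count successful GET requests for .pdf files, keyed by URL path."""
    hits: Counter = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if m is None:
            continue  # skip lines that don't fit the assumed format
        if m.group("method") != "GET" or m.group("status") != "200":
            continue  # keep only full, successful downloads
        path = m.group("path").split("?", 1)[0]  # drop query strings such as UTM tags
        if path.lower().endswith(".pdf"):
            hits[path] += 1
    return hits


if __name__ == "__main__":
    sample = [
        '203.0.113.5 - - [26/Mar/2020:10:00:00 +0000] "GET /downloads/whitepaper.pdf HTTP/1.1" 200 52341 "https://www.google.com/" "Mozilla/5.0"',
        '203.0.113.6 - - [26/Mar/2020:10:05:00 +0000] "GET /downloads/whitepaper.pdf?utm_source=x HTTP/1.1" 200 52341 "-" "Mozilla/5.0"',
        '203.0.113.7 - - [26/Mar/2020:10:06:00 +0000] "GET /blog/post HTTP/1.1" 200 1024 "-" "Mozilla/5.0"',
    ]
    print(count_pdf_downloads(sample))  # Counter({'/downloads/whitepaper.pdf': 2})
```

The referrer field in the combined format (the first quoted field after the byte count) could be parsed the same way to see which PDF downloads came from search.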

Using PDFs for content planning

PDFs are not bad. They can earn great links and be great resources, and when they are full of rich, relevant keywords, they can become very powerful.

So if you have PDFs that rank well, how do you use them to your advantage? First of all, don’t spend time trying to make the PDF itself SEO friendly.

Create a content strategy that surrounds the PDF with strong access points to it. Break big, authoritative PDFs out into blog posts that repurpose the content and link back to the PDF. Support the PDF with topic clusters: which non-branded terms is it already ranking reasonably well for? What concepts surround the product or topic of the PDF?

If you see a PDF ranking for an important intent, create a page, such as a blog post, to take its place; it should do well fairly quickly. Give that post a self-referencing canonical tag, a standard SEO best practice for avoiding duplicate content issues. Extra points if you can point the PDF’s canonical to the blog post, but don’t fret if you can’t.
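A PDF has no HTML head, so it can’t carry a canonical tag itself. Google supports declaring the canonical for non-HTML files in an HTTP response header instead, in the form Link: <https://www.example.com/blog/post>; rel="canonical". Below is a minimal sketch of that idea as Python WSGI middleware; the mapping, paths, and URLs are hypothetical placeholders, and in practice this header is more often set in the web server or CDN configuration.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical mapping: PDF URL path -> the web page that should replace it in Search.
CANONICAL_FOR_PDF: Dict[str, str] = {
    "/downloads/whitepaper.pdf": "https://www.example.com/blog/whitepaper-summary",
}


def canonical_link_header(path: str) -> Optional[str]:
    """Return a Link header value pointing at the canonical page, or None if unmapped."""
    target = CANONICAL_FOR_PDF.get(path)
    if target is None:
        return None
    return f'<{target}>; rel="canonical"'


def add_pdf_canonical(app: Callable) -> Callable:
    """WSGI middleware: append a rel="canonical" Link header to mapped PDF responses."""
    def wrapped(environ, start_response):
        link = canonical_link_header(environ.get("PATH_INFO", ""))

        def start_with_link(status: str, headers: List[Tuple[str, str]], exc_info=None):
            if link is not None:
                headers = list(headers) + [("Link", link)]
            return start_response(status, headers, exc_info)

        return app(environ, start_with_link)

    return wrapped


if __name__ == "__main__":
    # A stand-in app that serves PDF bytes (hypothetical).
    def pdf_app(environ, start_response):
        start_response("200 OK", [("Content-Type", "application/pdf")])
        return [b"%PDF-1.4 ..."]

    def show(status, headers, exc_info=None):
        print(status, headers)

    add_pdf_canonical(pdf_app)({"PATH_INFO": "/downloads/whitepaper.pdf"}, show)
```

Keeping the mapping explicit means only PDFs you have deliberately replaced with a web page get a canonical pointing elsewhere.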

Look for PDFs with intent mismatches; a PDF has to be closely aligned with user intent to rank well. If you want to turn a PDF’s contents into a web page, you can reproduce it almost exactly in a more web-friendly format.

Or, if you want to drive people to download the PDF, create a landing page and use it for lead capture. HubSpot does this masterfully: they put a PDF behind a landing page and use many related content pieces to send authority to that landing page.

A high number of PDFs on a given SERP means the SERP is weak. PDFs are rarely managed properly, so you can easily gain an advantage with well-optimized, crawlable web content. It’s a great way to find quick wins.

Summary

The PDF is a wonderful file format with many legitimate uses. Optimizing it for Search, however, is not one of them.

Camden Gaspar
