Analysing SEO Content Depth Using Screaming Frog
In this blog post, I’ll be exploring content depth for SEO, identifying some signals to look out for, and showing how we can measure them using Screaming Frog.
What is content depth?
Content depth is a measure of how long and in-depth the content on a web page is. Analyses by SEOs have regularly found that more in-depth content tends to rank better in organic search, and our EAT Whitepaper supports these findings.
Content depth is more than how much copy you have on your page: it’s how useful that copy is to potential visitors. Naturally, Google attempts to estimate how useful your content is algorithmically. We’re going to do something similar, grabbing the available data points to help break down the on-page content approach of a list of URLs.
What does copy depth mean for SEO?
Copy depth is important for SEO due to the interplay with EAT signals: More copy can signal that your content goes in-depth into the subject topic and makes you more competitive in search results.
First, you need to understand how much the competition is investing in content depth on important landing pages, as this helps us prioritise landing page reviews in SEO roadmaps. For example, if we can see that the top 10 or top 3 ranking pages in Search are deploying similar on-page approaches to copy, perhaps we need to prioritise it in our strategy.
Whether you’re reviewing the top 30 URLs for a keyword, your own site or the competition, this post will help you use Screaming Frog to get those important data points.
Content depth signals: Going beyond word count
Content depth and quality are important ranking signals for on-page SEO, but measuring something subjective like content quality is hard, and we’ll leave that to Google. However, we can look at some adjacent signals to help us measure some important elements of on-page SEO to better understand what is going on in, for example, the top 10 for a keyword or on our own websites.
Word count is great, but it runs into issues – particularly on larger e-commerce sites that can have huge navigation elements – when attempting to use it as a measure of how much actual content is on a page. Counting the number of paragraphs might be a better signal.
However, some badly templated websites use paragraphs in navigational elements, use empty paragraphs for formatting, or just use a lot of really short paragraphs which don’t go into depth. So let’s also count the number of paragraphs over 150 characters.
Assuming there is only one H1 on the page (not always the case), the number of sub-headings could also be a good signal for content depth, where each new heading implies the existence of a new section of content.
FAQ and Q&A content can work very well in search. With Google deploying more Quick Answers and FAQs in search results, SEOs often target them with this type of approach. Let’s also count the number of question marks in headings.
SEOs working in YMYL verticals will be familiar with talking-head “experts” deployed on important landing pages: Let’s see if the competition mentions “expert” in their content.
Comparethemarket.com’s talking-head expert
So our data points are:
- Word Count
- Number of H2s
- Number of H3s
- Number of Paragraphs
- Number of Long Paragraphs
- Number of Question Hs
- Number of Expert Mentions
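To pin down what each of these data points means before we reach for XPath, here is a minimal Python sketch that computes rough equivalents from a single HTML string using only the standard library. It is not how Screaming Frog works internally: the class name `DepthSignals`, the function `depth_signals` and the dictionary keys are all made up for illustration, the word count is a crude count over all text in the document, and an "expert mention" here means a paragraph or heading whose text contains ‘expert’ or ‘Expert’.

```python
from html.parser import HTMLParser

LONG_PARAGRAPH_CHARS = 150  # threshold suggested in the post
HEADINGS = {"h1", "h2", "h3"}


class DepthSignals(HTMLParser):
    """Hypothetical helper: collects rough content-depth counts for one page."""

    def __init__(self):
        super().__init__()
        self.counts = {
            "words": 0, "h2": 0, "h3": 0, "paragraphs": 0,
            "long_paragraphs": 0, "question_headings": 0, "expert_mentions": 0,
        }
        self._skip = 0      # > 0 while inside <script> or <style>
        self._tag = None    # the <p> or heading element currently open, if any
        self._buffer = []   # text collected for that element

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "p" or tag in HEADINGS:
            self._close_block()  # tolerate a previous element left unclosed
            self._tag = tag
            self._buffer = []

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self._skip = max(0, self._skip - 1)
        elif tag == self._tag:
            self._close_block()

    def handle_data(self, data):
        if self._skip:
            return
        # Crude word count: every whitespace-separated token in visible text.
        self.counts["words"] += len(data.split())
        if self._tag:
            self._buffer.append(data)

    def _close_block(self):
        if self._tag is None:
            return
        text = "".join(self._buffer)
        if self._tag == "p":
            self.counts["paragraphs"] += 1
            if len(text) > LONG_PARAGRAPH_CHARS:
                self.counts["long_paragraphs"] += 1
        else:
            if self._tag in ("h2", "h3"):
                self.counts[self._tag] += 1
            if "?" in text:
                self.counts["question_headings"] += 1
        # Case-sensitive check for the two variants, as in the post.
        if "expert" in text or "Expert" in text:
            self.counts["expert_mentions"] += 1
        self._tag = None
        self._buffer = []


def depth_signals(html):
    """Return a dict of rough content-depth counts for an HTML string."""
    parser = DepthSignals()
    parser.feed(html)
    parser.close()         # flush any text the parser is still buffering
    parser._close_block()  # count a trailing element that was never closed
    return parser.counts
```

Calling `depth_signals(page_html)` on a saved page gives a quick way to sanity-check a handful of URLs by hand. The counts will not match a crawler’s output exactly, since navigation text and inline markup are handled differently.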
Before we dive into scraping these in Screaming Frog, let’s talk about the methodology.
Using XPath functions for SEO
XPath is a language used to query XML documents – where it becomes useful to SEOs is using XPath to count the number of elements of a certain type on a webpage. As long as we can specify the element with some accuracy, we can count it.
For our use-case, XPath selectors mostly look like this:
| // | p | [string-length() > 150] |
| Starting from the root of the HTML document | Select these elements | That match these rules |
See the excellent XPath cheat sheet to learn more useful XPath.
We then build a series of XPath extractors using this basic structure, grabbing key elements like paragraphs and headings to count how in-depth the content goes.
The basic functionality for us is to wrap selectors like the above in a count() function:
count(//p)
#returns the number of <p> elements
We can add a predicate to the node selection as follows, and XPath has a few built-in functions we can take advantage of:
count((//p)[string-length() > 150])
#returns the number of <p> elements containing more than 150 characters
Scanning contained text of elements is straightforward:
count((//p)[text()[contains(.,'expert') or contains(.,'Expert')]])
#returns the number of <p> elements containing ‘expert’ or ‘Expert’ (case-sensitive)
XPath also has a union operator (|) that works like an OR across selectors, so we can look in different heading elements for what we need:
count((//h1|//h2|//h3)[text()[contains(.,'?')]])
#returns the number of <h1> OR <h2> OR <h3> elements with text containing a question mark
Most of these require using the Custom Extraction functionality in Screaming Frog – let’s start building those functions now.
Using XPath in Screaming Frog for custom extraction
To use custom extraction in Screaming Frog, navigate via Configuration → Custom → Extraction, where you will see an empty list.
Let’s start with counting paragraphs:
| Extractor name | Mode of extraction | Expression | Data to extract |
| Name the column for the data export | We’re using XPath for this | The function: We’re counting <p> tags | We want to extract the function value, not inner HTML or similar |
You should see a tick, which means your XPath is passing Screaming Frog’s XPath validator and the full row looks like this:
Let’s go ahead and build rules for the rest of our needs:
| Extractor name | Mode of extraction | Expression | Data to extract |
| p-long-count | XPath | count((//p)[string-length() > 150]) | Function Value |
| expert-count | XPath | count((//p|//h2|//p/*|//h1|//h3)[text()[contains(.,'expert') or contains(.,'Expert')]]) | Function Value |
Running the crawl
Once the custom extraction rules are in place, we can go ahead and crawl our URLs for testing. If the extraction is working as intended, you will see the extraction output in the Custom Extraction tab and the Internal tab.
Not all pages are parseable this way – Screaming Frog will occasionally return ‘unparseable HTML’ errors – but we find this approach works well for general analysis purposes.
How you use the output data is up to you. As hinted earlier in this post, at Kaizen we use these and other on-page signals to review the competition in the top 10, top 5 and top 3, to understand how our clients’ content shapes up and what we need to prioritise within the roadmap.
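As one possible way to run that comparison, the sketch below takes the custom extraction data joined with ranking positions and reports the median of each metric for the top 3, top 5 and top 10. Everything about the input is an assumption: the `position` column is hypothetical (Screaming Frog does not supply rankings, so it would need joining from a rank tracker), the sample rows are made-up toy values, and the function name `median_by_top_n` is invented for illustration. Only the `p-long-count` and `expert-count` column names come from the extractors above.

```python
import csv
import io
from statistics import median

# Extractor names from the post; "position" below is a hypothetical extra column.
METRICS = ["p-long-count", "expert-count"]


def median_by_top_n(rows, metrics, cutoffs=(3, 5, 10)):
    """Median of each metric across pages ranking at or above each cutoff."""
    summary = {}
    for n in cutoffs:
        top = [row for row in rows if int(row["position"]) <= n]
        if top:
            summary[n] = {m: median(float(row[m]) for row in top) for m in metrics}
        else:
            summary[n] = {}
    return summary


# Toy data for illustration only, not real crawl output.
SAMPLE = """url,position,p-long-count,expert-count
https://example.com/a,1,12,2
https://example.com/b,2,8,0
https://example.com/c,3,10,1
https://example.com/d,7,3,0
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
summary = median_by_top_n(rows, METRICS)
for cutoff, values in summary.items():
    print(f"top {cutoff}: {values}")
```

Medians are used here rather than means so that one unusually long page does not skew the picture for a small group like the top 3. That is a design choice for this sketch, not something prescribed by the post.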
This post only scratches the surface of what is possible with Screaming Frog, and Custom Extraction is an excellent way to get more from the tool. Understanding and deploying it is a key milestone for any SEO.
Are you interested in learning more about how we could use Screaming Frog to enhance your SEO Strategy? Get in touch.
By Harry Clarke - 21/09/2021