
Advanced Use of Regex in Screaming Frog for Link Analysis
As any SEO professional knows, link analysis is a crucial part of website optimization. Screaming Frog, a tool for crawling and analyzing websites, lets users apply regular expressions (regex) to more complex link analysis tasks. In this article, we’ll look at how regex works in Screaming Frog and walk through some advanced use cases.
What is Regex?
Regex, short for regular expression, is a pattern-matching language used to search and manipulate strings. It’s a powerful tool that allows users to define complex patterns and match them against strings. In the context of link analysis, regex can be used to extract specific types of links or attributes from URLs.
Using Regex in Screaming Frog
Screaming Frog supports regular expressions through its “Regular Expression” feature. To access this feature, follow these steps:
- Open your website’s crawl report in Screaming Frog.
- Select the “Attributes” tab and click on the “+” button to add a new attribute filter.
- In the “Filter” dropdown menu, select “Regular Expression”.
- Enter your regex pattern in the “Expression” field.
Example 1: Extracting Internal Links
Suppose you want to extract all internal links from your website’s crawl report. You can use the following regex pattern:
^https?://(?:www\.)?yourdomain\.com/.*$
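This pattern matches any URL that starts with “http://” or “https://”, followed by an optional “www.” prefix and the host “yourdomain.com”, then a slash and any path. The escaped dots (“\.”) match literal periods only. URLs on other subdomains, such as “blog.yourdomain.com”, and the bare “https://yourdomain.com” with no trailing slash do not match.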
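Before pasting a pattern into a crawl filter, it can help to try it against a handful of URLs. The following is a minimal sketch in Python, assuming the placeholder domain “yourdomain.com” and a made-up list of sample URLs; Python’s re module is only a stand-in for testing here, and Screaming Frog’s own regex engine may differ in edge cases.

```python
import re

# Placeholder domain from the example; replace it with the site being audited.
INTERNAL = re.compile(r"^https?://(?:www\.)?yourdomain\.com/.*$")

def is_internal(url: str) -> bool:
    """Return True when the URL matches the internal-link pattern."""
    return INTERNAL.match(url) is not None

# Hypothetical sample URLs, chosen to show what the pattern accepts and rejects.
sample_urls = [
    "https://www.yourdomain.com/blog/post",
    "http://yourdomain.com/",
    "https://yourdomain.com",               # no trailing slash
    "https://shop.yourdomain.com/cart",     # different subdomain
    "https://example.org/yourdomain.com/",  # domain appears only in the path
]

internal = [u for u in sample_urls if is_internal(u)]
for url in internal:
    print(url)
```

Only the first two sample URLs pass. If other subdomains should count as internal, the optional “www\.” group would need to be widened.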
Example 2: Extracting Links from Specific Pages
Suppose you want to isolate the URLs of specific pages or sections of your website. You can use a regex pattern that matches the beginning of each page’s URL path:
^https?://(?:www\.)?yourdomain\.com/(path/to/page|category-page).*$
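This pattern matches any URL that starts with “http://” or “https://”, followed by an optional “www.” prefix and “yourdomain.com”, whose path begins with either “/path/to/page” or “/category-page”. Because the pattern ends in “.*”, longer paths that share one of those prefixes, such as “/category-page-2”, also match.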
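To see which alternative in the group a given URL matched, the pattern can be tried outside the crawler. This is a small Python sketch under the same assumptions as the example (placeholder domain and paths); the function name matched_section is hypothetical.

```python
import re

# Same pattern as above; the parenthesized group captures the matched path prefix.
SECTION = re.compile(
    r"^https?://(?:www\.)?yourdomain\.com/(path/to/page|category-page).*$"
)

def matched_section(url: str):
    """Return the path prefix that matched, or None when the URL does not match."""
    m = SECTION.match(url)
    return m.group(1) if m else None

# Hypothetical URLs for illustration.
for url in [
    "https://yourdomain.com/path/to/page?ref=1",
    "https://www.yourdomain.com/category-page/shoes",
    "https://yourdomain.com/category-pages-archive",  # shares the prefix, so it matches
    "https://yourdomain.com/about",
]:
    print(url, "->", matched_section(url))
```

The third URL shows the prefix behavior: if only exact pages are wanted, replacing the trailing “.*” with something stricter, for example “(?:[/?#].*)?”, is one option.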
Example 3: Extracting Links with Specific Attributes
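Suppose you want to find all links with a specific attribute, such as a “rel” attribute set to “nofollow”. The “rel” attribute is part of the anchor tag in the page’s HTML, not part of the URL, so the pattern has to be matched against the page source rather than against a list of URLs. You can use the following regex pattern:
<a\b[^>]*\srel=["'][^"']*\bnofollow\b[^"']*["'][^>]*>
This pattern matches any opening anchor tag whose “rel” attribute contains the value “nofollow”, including multi-value attributes such as rel="sponsored nofollow".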
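Since “rel” lives in the anchor tag’s markup rather than in the URL itself, a nofollow check is easiest to verify against raw HTML. The sketch below is an illustration only: the anchor-tag pattern, the helper name nofollow_hrefs, and the sample markup are all assumptions, and regex is a rough tool for HTML that can miss unusual markup such as unquoted attribute values.

```python
import re

# Assumed pattern: an opening <a> tag whose rel value contains "nofollow".
NOFOLLOW_TAG = re.compile(
    r"""<a\b[^>]*\srel=["'][^"']*\bnofollow\b[^"']*["'][^>]*>""",
    re.IGNORECASE,
)
# Pulls the quoted href value out of a single tag.
HREF = re.compile(r"""\shref=["']([^"']+)["']""", re.IGNORECASE)

def nofollow_hrefs(html: str) -> list:
    """Return the href values of anchor tags whose rel includes nofollow."""
    hrefs = []
    for tag in NOFOLLOW_TAG.findall(html):
        m = HREF.search(tag)
        if m:
            hrefs.append(m.group(1))
    return hrefs

# Hypothetical markup: attribute order varies and rel may hold several values.
sample_html = (
    '<a href="https://example.com/a" rel="nofollow">A</a>'
    '<a rel="sponsored nofollow" href="https://example.com/b">B</a>'
    '<a href="https://example.com/c">C</a>'
)

print(nofollow_hrefs(sample_html))
```

The helper finds the tag first and then extracts the href from it, so it does not depend on whether “href” or “rel” comes first in the tag.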
Conclusion
Regular expressions are a powerful tool for link analysis in Screaming Frog. By using advanced regex patterns, users can extract specific types of links or attributes from URLs. In this article, we’ve explored three example use cases: extracting internal links, extracting links from specific pages, and extracting links with specific attributes. With practice and patience, you’ll be able to unlock the full potential of regex in Screaming Frog for your link analysis tasks.