What Is Google Crawl Budget?
The number of times a search engine spider crawls your website in a given period of time is what we call your “crawl budget.”
Ensure Your Pages Are Crawlable
A page is crawlable if search engine spiders can find and follow links to it, so you’ll need to configure your .htaccess and robots.txt files so that they don’t block your site’s critical pages.
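As an illustration, Python’s standard-library `urllib.robotparser` module can check how a given rule set affects crawlability. The rules and URLs below are hypothetical:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block an internal search path,
# but leave the rest of the site crawlable.
robots_txt = """\
User-agent: *
Disallow: /search/
Allow: /
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# Critical pages should remain fetchable for crawlers.
print(parser.can_fetch("Googlebot", "https://example.com/products/widget"))
print(parser.can_fetch("Googlebot", "https://example.com/search/results"))
```

Running a check like this against your own robots.txt before deploying it is a cheap way to catch an accidental block of an important page.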
Of course, the opposite is true if you do want to prevent a page from showing up in search results. However, if you want to stop a page from being indexed, it’s not enough to simply add a “Disallow” rule to your robots.txt. According to Google: “Robots.txt Disallow does not guarantee that a page will not appear in results.”
If external information (e.g. incoming links) continues to direct traffic to the page you’ve disallowed, Google may decide the page is still relevant. In this case, you’ll need to explicitly block the page from being indexed by using the noindex robots meta tag or the X-Robots-Tag HTTP header.
noindex meta tag: Place the following meta tag in the <head> section of your page to prevent most web crawlers from indexing your page.
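This is the standard robots meta tag for blocking indexing:

```html
<meta name="robots" content="noindex">
```

To target only Google’s crawler, you can use `name="googlebot"` instead of `name="robots"`.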
X-Robots-Tag: Place the following in your HTTP header response to tell crawlers not to index a page.
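The header itself looks like this in the HTTP response:

```
X-Robots-Tag: noindex
```

On an Apache server with mod_headers enabled, you can send it by adding `Header set X-Robots-Tag "noindex"` to your .htaccess or server config.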
Use Rich Media Files Cautiously
Even if Google can read most of your rich media files, other search engines may not be able to, which means you should use these files judiciously, and you probably want to avoid them entirely on pages you want to rank.
Avoid Redirect Chains
Each URL you redirect to wastes a little of your crawl budget. When your website has long redirect chains, i.e. a large number of 301 and 302 redirects in a row, spiders such as Googlebot may drop off before they reach your destination page, which means that page won’t be indexed. Best practice with redirects is to have as few as possible on your website, and no more than two in a row.
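To make the drop-off concrete, here is a minimal sketch of how a crawler experiences a redirect chain. The URLs and the redirect map are made up for illustration; a real crawler would discover each hop from 301/302 responses:

```python
# Maps each URL to the URL it redirects to; None means the
# page resolves directly (a 200 response).
REDIRECTS = {
    "http://example.com/old-page": "https://example.com/old-page",
    "https://example.com/old-page": "https://example.com/new-page",
    "https://example.com/new-page": None,  # final destination
}

def chain_length(url, max_hops=5):
    """Count redirect hops before reaching a page that resolves directly."""
    hops = 0
    while REDIRECTS.get(url) is not None:
        url = REDIRECTS[url]
        hops += 1
        if hops > max_hops:  # a crawler may simply give up at this point
            raise RuntimeError("redirect chain too long")
    return hops, url

hops, final = chain_length("http://example.com/old-page")
```

Here the chain is already two hops long (HTTP to HTTPS, then old page to new page), which is the most you should ever allow in a row.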
Fix Your Broken Links ASAP
If what Mueller says is true, this is one of the fundamental differences between SEO and Googlebot optimization, because it would mean that broken links do not play a substantial role in rankings, even though they greatly impede Googlebot’s ability to index and rank your website.
That said, you should take Mueller’s advice with a grain of salt – Google’s algorithm has improved substantially over the years, and anything that affects user experience is likely to impact SERPs.
Set Parameters on Dynamic URLs
Spiders treat dynamic URLs that lead to the same page as separate pages, which means you may be squandering your crawl budget unnecessarily. You can manage your URL parameters in Google Search Console by clicking Crawl > URL Parameters. From here, you can let Googlebot know if your CMS adds parameters to your URLs that don’t change a page’s content.
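The underlying idea is URL canonicalization: strip the parameters that don’t change the content so duplicate URLs collapse into one. A minimal sketch, assuming a hypothetical list of content-neutral parameters (the real list depends on your CMS):

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical parameters that do not change page content
# (session IDs, tracking tags, etc.).
IGNORED_PARAMS = {"sessionid", "utm_source", "utm_medium"}

def canonicalize(url):
    """Drop content-neutral query parameters from a URL."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query) if k not in IGNORED_PARAMS]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))

print(canonicalize("https://example.com/shoes?sessionid=abc123&color=red"))
# https://example.com/shoes?color=red
```

Both URL variants now map to the same canonical address, which is exactly what you are telling Googlebot via the URL Parameters tool.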
Clean Up Your Sitemap
XML sitemaps help users and spider bots alike by making your content better organized and easier to find. Keep your sitemap up to date and purge it of any clutter that may harm your site’s usability, including 400-level pages, unnecessary redirects, non-canonical pages, and blocked pages.
The easiest way to clean up your sitemap is to use a tool like Website Auditor (disclaimer: my tool). You can use Website Auditor’s XML sitemap generator to create a clean sitemap that excludes all pages blocked from indexing. Plus, by going to Site Audit, you can easily find and fix all 4xx status pages, 301 and 302 redirects, and non-canonical pages.
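For reference, the sitemap format itself is simple enough to generate with the standard library. A minimal sketch, assuming the page list has already been filtered down to indexable, canonical URLs (the URLs below are placeholders):

```python
import xml.etree.ElementTree as ET

# Only clean, indexable, canonical URLs should make the list;
# 4xx pages, redirects, and blocked pages are excluded upstream.
pages = [
    "https://example.com/",
    "https://example.com/blog/",
]

urlset = ET.Element("urlset", xmlns="http://www.sitemaps.org/schemas/sitemap/0.9")
for page in pages:
    url = ET.SubElement(urlset, "url")
    ET.SubElement(url, "loc").text = page

sitemap_xml = ET.tostring(urlset, encoding="unicode")
print(sitemap_xml)
```

A production sitemap would also carry the XML declaration and, optionally, `lastmod` entries per URL.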
Make Use of Feeds
Feeds, such as RSS and Atom, allow websites to deliver content to users even when they’re not browsing your website. This lets users subscribe to their favorite sites and receive regular updates whenever new content is published.
While RSS feeds have long been a good way to boost your readership and engagement, they’re also among the pages Googlebot visits most often. When your website receives an update (e.g. new products or a new blog post), submit it to Google’s FeedBurner so that you’re sure it’s properly indexed.
Build External Links
Now, in addition to Crowe’s excellent point, we also have evidence from Yauhen Khutarniuk’s experiment that external links closely correlate with the number of spider visits your website receives.
In his experiment, he used our tools to measure all of the internal and external links pointing to every page on 11 different sites. He then analyzed crawl stats on each page and compared the results.
Maintain Internal Link Integrity
While Khutarniuk’s experiment suggested that internal link building doesn’t play a substantial role in crawl rate, that doesn’t mean you can disregard it altogether. A well-maintained site structure makes your content easily discoverable by search bots without wasting your crawl budget.
A well-organized internal linking structure may also improve user experience – especially if users can reach any area of your website within three clicks. Making everything more easily accessible in general means visitors will linger longer, which may improve your SERPs.
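The three-click rule can be checked programmatically with a breadth-first search over your internal-link graph. A minimal sketch using a made-up link graph (page names and structure are illustrative):

```python
from collections import deque

# Toy internal-link graph: page -> pages it links to.
LINKS = {
    "home": ["blog", "products"],
    "blog": ["post-1"],
    "products": ["widget"],
    "post-1": [],
    "widget": ["widget-manual"],
    "widget-manual": [],
}

def click_depths(start="home"):
    """BFS: minimum number of clicks to reach each page from the start page."""
    depths = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for linked in LINKS.get(page, []):
            if linked not in depths:
                depths[linked] = depths[page] + 1
                queue.append(linked)
    return depths

depths = click_depths()
# Flag pages more than three clicks from the homepage.
deep_pages = [page for page, depth in depths.items() if depth > 3]
```

Pages that show up in `deep_pages` (or that never appear in `depths` at all, i.e. orphan pages) are the ones a restructuring effort should target first.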