What You Need to Know About Search Engine Indexing

Written by John Boitnott

Google can produce tens of millions of results for almost any question or search term you can come up with. But before it can do that, it has to index the pages that provide the necessary information.

If your site’s pages aren’t being indexed properly, they may not show up in search engine results at all. Understanding how search engines like Google index the millions of sites and pages available on the web can also help you improve your page rankings in search results.

How search engine indexing works

Publishing the best and most valuable content in your niche won’t do you a bit of good if your audience never knows you’re there. Most site owners know this, and that’s why SEO is such an important aspect of content marketing.

Yet, before your SEO efforts can begin to bear fruit, the search engines must first know your site and its pages exist. The search engine bots must also be able to access and crawl your pages, scanning them for relevant content.

This is where search indexing comes in. While the specifics are more complex, the basic tasks involved are fairly simple:

  1. Search engines send out pieces of code (bots, crawlers or spiders) to ferret out new or updated content on the web.
  2. As the search engines crawl through a new site or page, they make note of any outgoing links they’ll need to crawl as well.
  3. The search engines then index all of the new text content they’ve found and input that content into a huge database.

From there, when a user enters a search term or keyword, the search engine analyzes the content in the database and produces a ranked list of pages in response to that keyword. These lists are called SERPs, or search engine results pages. The goal of SEO is to help your page show up higher in those SERPs (typically, within the first 10 results).
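To make the index-then-rank idea concrete, here is a toy sketch in Python. It assumes a tiny in-memory "database" and ranks pages only by how often the query words appear, which is far simpler than anything a real search engine does. The page URLs, text and function names are all made up for illustration.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Map each word to {url: number of times the word appears on that page}."""
    index = defaultdict(dict)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word][url] = index[word].get(url, 0) + 1
    return index

def search(index, query):
    """Return URLs ranked by how often the query words appear on each page."""
    scores = defaultdict(int)
    for word in tokenize(query):
        for url, count in index.get(word, {}).items():
            scores[url] += count
    # Highest score first; ties are broken alphabetically by URL.
    return sorted(scores, key=lambda url: (-scores[url], url))

# Hypothetical pages standing in for crawled content.
pages = {
    "https://example.com/coffee": "How to brew coffee. Coffee brewing tips.",
    "https://example.com/tea": "How to brew tea at home.",
}
index = build_index(pages)
results = search(index, "brew coffee")
```

In this sketch, the coffee page ranks above the tea page for "brew coffee" because it contains more of the query words. A page that was never added to the index can never appear in the results, no matter how good its content is.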

How search engines rank content

In order to fully understand indexing, it’s important to have a basic idea of what the various search engines such as Google and Bing look for in a piece of content. Essentially, they gauge your content according to an algorithm, a piece of code that establishes a set of rules by which the search engine’s ranking function operates.

The specific factors that are weighted more heavily in the algorithm are considered highly proprietary and as such are well guarded. However, they’re not quite as secretive as they used to be, thanks to constant experimentation and official releases. These days, we can be pretty sure what Google is looking for when it ranks your pages along with your competitors’ content for relevant search terms.

These factors are aimed at helping an objective process run by computer code make what can often be subjective evaluations. Which page most thoroughly answers the question posed by the search term? By analyzing each page according to these objective metrics or factors, the search engine can produce a fairly reliable ranking of authoritative content.

How to optimize your site's indexing in Google

Usually, the search engines have no trouble crawling new content as you publish it on your site. However, in some cases, the software that’s trying to crawl your site’s pages runs into difficulties stemming from one technical issue or another. Make sure Google and other search engines are properly indexing and evaluating your pages with the following steps.

Check whether Google has indexed your URL

Your first step should be to go straight to the source. Google’s Search Console will tell you how many pages it’s indexing on your site and whether any errors exist that are impacting its ability to reach the rest.

  1. Start by signing in to Search Console.
  2. Make sure you’ve added your website to your account and that the correct domain URL has been selected if you have more than one website linked to your account.
  3. Scroll down to the “Coverage” panel and click “Open Report.” This page will show you whether there are any existing indexing errors and what they are.

Changes made to the page since Google last accessed it might change your results. In that event, you can also perform a live URL test, as long as the page in question can be readily accessed without any sign-in or password. This test establishes whether Google can access your URL, not whether the URL has been indexed. Simply open the URL Inspection tool and then click “Test live URL.”

Create and submit sitemaps to help Google do its job more effectively

An XML sitemap lists out all the pages on your website as a guide or digital roadmap. Search engine bots can then use the sitemap to more readily locate and index your website’s pages. Creating an XML sitemap may seem like an intimidating prospect if you’re not a coder or website developer. In reality, multiple different tools are available to make the process almost instantaneous, especially if you’ve built your site using WordPress.
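If you're curious what such a tool produces, the following Python sketch builds a bare-bones sitemap using only the standard library. It assumes you already have a list of page URLs; the example.com addresses and the build_sitemap function name are placeholders, and a real generator would typically discover the URLs for you and may add optional fields such as last-modified dates.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(urls):
    """Return sitemap XML (as a string) with one <url><loc> entry per page."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body

# Hypothetical pages; replace with your site's real URLs.
sitemap_xml = build_sitemap([
    "https://example.com/",
    "https://example.com/blog/",
])
```

You would then save the result as a UTF-8 file (commonly named sitemap.xml), upload it to your site and submit its address in Search Console.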

No matter how you create your XML sitemap, you can take a few steps to ensure your site is optimized for the best possible indexing results. First and foremost, you should ensure your site’s information architecture is solid—specifically, your site’s page organization and navigation.

One popular and often-cited rule of thumb is that you should ensure every page on your site requires no more than three clicks to access from any other page. However, studies tend not to support the three-click rule. Instead, focus on making your site’s navigational tools user-friendly and intuitive with clear labels. Consider adding wayfinding tools such as navigational breadcrumbs to make it as easy and frictionless as possible for your user to find what they’re looking for and achieve their purpose in visiting your site.

Create and optimize your robots.txt file, if necessary

For many websites (if not most), a robots.txt file isn’t necessary. However, creating a robots.txt file is useful if you’ve tried other strategies yet are still having difficulty getting your important pages indexed.

This may indicate what’s known as a “crawl budget” problem, which can be resolved by directing search engine bots away from your unimportant pages or duplicate content. That leaves the bots more time to spend on the mission-critical pages on your site.

To create your site’s robots.txt file, follow these simple steps:

  1. Open up a plain text file using whatever app you prefer, as long as it creates plain text files. If you’re using a Mac, select the TextEdit app; in Windows, open the Notepad app.
  2. Type in the applicable code. To instruct all search engine bots to avoid crawling the files and directories you’re about to specify, use an asterisk wildcard: User-agent: *
  3. Then type in the folders and files you want the bots to skip: Disallow: [file path]. Each path starts with a forward slash and is relative to your site’s root. For example, if you have a page in the main root directory of your site with the URL of weirdstuff.html, you’d type: Disallow: /weirdstuff.html. If the file in question is located in your blog folder, you’d type: Disallow: /blog/weirdstuff.html. To allow specific files, use the similar code syntax: Allow: /about.html. Note that these are just suggestions. If you’re creating your robots.txt file from scratch, consult Google’s helpful guide for web developers.
  4. Save the file and upload it to your root directory with the robots.txt file name.

To test your code before you upload it, try entering it via copy and paste into the Google robots.txt test tool.
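You can also run a rough local check with Python's built-in urllib.robotparser module. The sketch below uses a hypothetical rule set with the example file names from this section, written as root-relative paths with a leading slash; the can_crawl helper and the example.com URLs are made up for illustration. Keep in mind that Python's parser applies the first rule that matches, which may not mirror how Google resolves conflicting rules, so treat it as a sanity check rather than a final verdict.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules reusing the example file names from this section.
ROBOTS_TXT = """\
User-agent: *
Disallow: /weirdstuff.html
Disallow: /blog/weirdstuff.html
Allow: /about.html
"""

def can_crawl(robots_txt, url, user_agent="*"):
    """Return True if the rules let the given bot fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

blocked = can_crawl(ROBOTS_TXT, "https://example.com/weirdstuff.html")
allowed = can_crawl(ROBOTS_TXT, "https://example.com/about.html")
```

Here, blocked comes back False and allowed comes back True. Any URL that no rule mentions is treated as crawlable by default.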

Know when to use a 301 redirect

Whether you’ve changed domains, finally gotten around to converting your site to the security of https://, merged content, or simply optimized your URL structure for existing content, it’s important to properly direct both your visitors and the search engines to the right URL. That way, you can be sure the search engines are indexing only the most current version of your content.

The best practice in these circumstances is to establish a 301 redirect so that hits on the old URL will automatically be redirected to the new, preferred URL. That way, the transition to the new content is seamless and search engines will “see” only the most current page.

Depending on the way your site is structured, properly establishing 301 redirects for multiple pages can be a technically complex project. You may wish to outsource the task to a web developer. However, if you’re using a CMS such as WordPress, you can most likely find numerous plugins or extensions that greatly simplify the process.
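For a sense of what a redirect plugin does behind the scenes, here is a minimal Python sketch of a server that answers requests for retired URLs with a 301 status and the new location. The paths, the target URLs and the names REDIRECTS, resolve_redirect, RedirectHandler and run are all hypothetical, and a production site would normally configure redirects in its web server or CMS rather than run a script like this.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical mapping of retired paths to their preferred replacements.
REDIRECTS = {
    "/old-post.html": "https://example.com/blog/new-post/",
    "/about-us.html": "https://example.com/about/",
}

def resolve_redirect(path):
    """Return the new URL for a retired path, or None if no redirect applies."""
    return REDIRECTS.get(path)

class RedirectHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        target = resolve_redirect(self.path)
        if target is None:
            self.send_error(404, "Not Found")
            return
        # 301 tells browsers and crawlers that the move is permanent.
        self.send_response(301)
        self.send_header("Location", target)
        self.end_headers()

def run(port=8000):
    """Serve redirects on localhost until interrupted."""
    HTTPServer(("localhost", port), RedirectHandler).serve_forever()
```

Calling run() starts the server locally. A request for /old-post.html would then receive a 301 response whose Location header points to the new URL, which is the signal search engines use to update the address they index.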

Check for broken links

Broken links in your website’s content can make a search bot’s job much harder and negatively impact your site’s SEO performance as well. To make sure your site’s links aren’t working against you, you’ll need to perform a site audit for broken links.

Many online tools are available to perform a broken link audit. Some will audit the entire site while others will focus on a single page on your site. While some of these tools charge a fee, there are also many such as DeadLinkChecker and the W3C Link Checker that are free to use. Once you have a list of broken links, you can then go through the website to fix or remove them.
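If you'd rather see how such an audit works, the Python sketch below shows the two basic steps for a single page: pull the links out of the HTML, then request each one and flag those that fail. The function names and the user-agent string are made up for this example. It is deliberately simplistic; for instance, some servers reject the lightweight HEAD requests used here, so a flagged link deserves a manual check before you remove it.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import Request, urlopen

class LinkExtractor(HTMLParser):
    """Collect the href value of every <a> tag in an HTML document."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def extract_links(html, base_url):
    """Return absolute URLs for all http(s) links found in the HTML."""
    parser = LinkExtractor()
    parser.feed(html)
    absolute = [urljoin(base_url, link) for link in parser.links]
    return [url for url in absolute if url.startswith(("http://", "https://"))]

def is_broken(url, timeout=10):
    """Return True if the URL fails to load or answers with an error status."""
    request = Request(url, method="HEAD",
                      headers={"User-Agent": "link-audit-sketch"})
    try:
        with urlopen(request, timeout=timeout) as response:
            return response.status >= 400
    except OSError:
        # Covers HTTP error statuses, DNS failures, refused connections, timeouts.
        return True

def find_broken_links(html, base_url):
    """Return the links on a page that could not be loaded."""
    return [url for url in extract_links(html, base_url) if is_broken(url)]
```

To audit a page, pass its HTML and its own address to find_broken_links. The second argument lets relative links such as /about be resolved to full URLs before they are checked.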

Eliminate Flash

Flash is an obsolete format for web content. Adobe ended support for Flash Player at the end of 2020, and current browsers no longer run it. Even before that, many mobile devices didn’t support Flash content, and if you’d embedded text-based content or a link into a Flash file, the search engine bots would likely ignore it altogether. If your site still relies on Flash, it’s better to move away from it entirely.

About the author

John Boitnott

A journalist and digital consultant, John Boitnott has worked at TV, newspaper, radio and Internet companies for 25 years. He currently writes at ClearVoice and Motley Fool. He’s also written for Fast Company, NBC, Inc Magazine, USA Today and BusinessInsider, among others.
