Regular evaluation and maintenance of your website’s Google index, or “Google Garden,” is easily overlooked by webmasters and SEOs alike. It is rarely a top optimization priority when other issues are pressing, and many assume that “the more pages you have indexed in Google, the more search traffic you may receive.”
However, robust websites with a wide array of files, frames, and secure pages that are regularly crawled and indexed by Google can easily develop “weed-like” indexing results. These weeds typically offer little to no value to your search audience, and they can suck nutrients away from your main crop: your website’s core pages, the ones you really want your search audience to feed on!
Below are examples of the weeds that can suck nutrients away from your Google Garden and prevent it from flourishing:
Unoptimized PDF files and Microsoft Office Documents (doc, xls, ppt)
How many times have you been served up a PDF search result that is “Untitled” and doesn’t have a proper title or description?
PS files (PostScript) and EPS files (Encapsulated PostScript)
These file types are rarely found on websites, but they do get indexed by Google.
Flash files (swf)
How often do you search for Flash files?
Frames and iFrames
Clicks on these indexed results can lead people to pages without navigation.
Secure https:// website pages
Google can index pages of your secure site instead of your main public site.
Which version do you want your visitors to view?
Pages with parameters for link tracking
Google can index these pages as duplicate content.
Example: http://www.domain.com/page.htm?link=contact can be indexed and it has the same content as http://www.domain.com/page.htm
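One way to spot these duplicates is to strip the tracking parameters from each URL and see which ones collapse to the same address. The sketch below is only an illustration: the `TRACKING_PARAMS` set and the `canonical_url` helper are hypothetical names, and `link` is assumed to be the only tracking parameter because it is the one used in the example above.

```python
from urllib.parse import urlsplit, parse_qsl, urlencode

# Assumption: "link" is the only tracking parameter, as in the example above.
# Add whatever tracking parameters your own site actually uses.
TRACKING_PARAMS = {"link"}

def canonical_url(url):
    """Return the URL with known tracking parameters removed."""
    parts = urlsplit(url)
    kept = [(key, value)
            for key, value in parse_qsl(parts.query, keep_blank_values=True)
            if key not in TRACKING_PARAMS]
    # Rebuild the URL with only the parameters that change the content.
    return parts._replace(query=urlencode(kept)).geturl()

print(canonical_url("http://www.domain.com/page.htm?link=contact"))
# prints http://www.domain.com/page.htm
```

Any two indexed URLs that produce the same canonical form are candidates for duplicate content.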
Low-value / Low-traffic pages
Does your audience truly benefit from visiting these pages from a search engine?
Blog archive pages
Monthly archives, category, tag/label, and search pages get frequently indexed by Google.
Do you really want them saturating your overall index and competing with your main blog postings?
Is your garden already overgrown with weeds? Do you want to remove a few sprouting dandelions before they get out of control? Here are some basic tips to keep those pesky weeds from overtaking your Google Garden:
Perform a site:www.domain.com query on Google
- Evaluate the presentation of the results in terms of the keywords, branding, and call-to-action
- Make sure you have addressed the basic SEO elements on ALL pages (Page Titles and Meta Descriptions)
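The check for basic SEO elements can be partly automated. Below is a minimal sketch that reports whether a page's HTML lacks a title or a meta description, using only Python's standard library. The `HeadAudit` class and `missing_elements` function are hypothetical helpers, and fetching the HTML for each page is left to you.

```python
from html.parser import HTMLParser

class HeadAudit(HTMLParser):
    """Collects the <title> text and the meta description of one page."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""
        self.description = ""

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self.in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.description = (attrs.get("content") or "").strip()

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data

def missing_elements(html):
    """Return the basic SEO elements that are absent or empty in the HTML."""
    audit = HeadAudit()
    audit.feed(html)
    missing = []
    if not audit.title.strip():
        missing.append("title")
    if not audit.description:
        missing.append("meta description")
    return missing
```

Running this over every URL returned by the site: query gives a quick list of pages that still need attention. It does not judge the quality of a title, only whether one exists.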
Analyze your webstats
- Identify pages with low keyword traffic and low overall search value as candidates for removal from the Google index
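If your stats package can export search visits per page, a few lines of code will shortlist the candidates. The figures, the paths, and the threshold of 5 visits below are all made up for illustration, and `low_value_pages` is a hypothetical helper. Pick a cutoff that makes sense for your own traffic.

```python
# Hypothetical monthly search visits per page, e.g. exported from your stats package.
SEARCH_VISITS = {
    "/": 1200,
    "/services.htm": 340,
    "/2008/05/": 2,
    "/tag/misc/": 0,
}

def low_value_pages(search_visits, threshold=5):
    """Return pages whose search visits fall below the threshold, sorted by path."""
    return sorted(page for page, visits in search_visits.items()
                  if visits < threshold)

print(low_value_pages(SEARCH_VISITS))
# prints ['/2008/05/', '/tag/misc/']
```

Treat the result as a list to review by hand, since a page with little search traffic may still matter to visitors who arrive another way.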
Clean up your sitewide internal linking
- Example: Change http://www.domain.com/directory/index.htm to http://www.domain.com/directory/
- If the directory name contains keywords that can be searched, it’ll stand out more if it’s not followed by /index.htm
- This also cuts down on page source code and makes your links look cleaner in the search result listings
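The link cleanup described above is easy to script when you have many internal links to rewrite. This is a rough sketch. `clean_directory_link` is a hypothetical helper, and the list of index file names is an assumption that should match whatever your server treats as the directory default.

```python
from urllib.parse import urlsplit

# Assumption: these are the file names your server serves for a bare directory URL.
INDEX_NAMES = ("index.htm", "index.html")

def clean_directory_link(url):
    """Rewrite .../directory/index.htm to .../directory/ and leave other URLs alone."""
    parts = urlsplit(url)
    directory, _, filename = parts.path.rpartition("/")
    if filename.lower() in INDEX_NAMES:
        parts = parts._replace(path=directory + "/")
    return parts.geturl()

print(clean_directory_link("http://www.domain.com/directory/index.htm"))
# prints http://www.domain.com/directory/
```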
Sculpt your site’s template links with the nofollow attribute
- Identify low-value pages on your site that you do not want indexed, and add rel="nofollow" to links pointing to those pages (Wikipedia says that this is ‘what nofollow is not for’, but it is another technique besides robots.txt for discouraging the crawling and indexing of these pages; it does not guarantee exclusion, because the pages can still be reached through other links)
- This may improve your internal link structure and give extra weight to your main pages
- For more information read SEOmoz’s post on sculpting with nofollow tags
Optimize Titles for all PDFs and Microsoft Office Docs
- Don’t overlook this simple step as these files rank for keyword searches and can receive quality traffic if optimized properly
Create an XML sitemap file and keep it updated
- If you don’t want a page or file on your site to be indexed, remove it from this file (however this does not guarantee that page or file won’t be indexed)
- Visit sitemaps.org to view proper protocol
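A basic sitemap can be generated from a list of the pages you do want indexed. The sketch below uses Python's standard library and the `urlset`, `url`, and `loc` elements from the sitemaps.org protocol. `build_sitemap` is a hypothetical helper and the URLs are placeholders.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(urls):
    """Return sitemap XML (as a string) with one <url><loc> entry per URL."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body

# Placeholder URLs; list only the pages you want indexed.
sitemap_xml = build_sitemap([
    "http://www.domain.com/",
    "http://www.domain.com/directory/",
])
print(sitemap_xml)
```

Save the result as UTF-8 when writing it to a file. Regenerating the file from the current page list each time the site changes is one way to keep it updated.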
Utilize Google Webmaster Tools
- Enough said. Keep on the lookout for new tools
The robots.txt file is your friend
- Visit robotstxt.org for information on proper formatting
- Create a separate robots.txt file for the https:// site and disallow the pages that also reside on the http:// site (to help ensure these secure pages do not get indexed instead of the http:// pages)
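Before uploading a robots.txt for the secure site, it is worth checking that its rules block what you expect. The sketch below parses a sample file with Python's standard `urllib.robotparser`. The file contents, the paths, and the `blocked_paths` helper are all hypothetical. Note that Python's parser applies rules in the order they appear, which can differ from how Google resolves overlapping Allow and Disallow rules, so keep the rules simple.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for the https:// site only. The disallowed paths are
# assumed examples of pages that also exist on the http:// site.
HTTPS_ROBOTS = """\
User-agent: *
Disallow: /page.htm
Disallow: /directory/
"""

def blocked_paths(robots_txt, paths, user_agent="Googlebot"):
    """Return the paths that the given robots.txt tells the user agent not to crawl."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [path for path in paths if not parser.can_fetch(user_agent, path)]

print(blocked_paths(HTTPS_ROBOTS, ["/page.htm", "/directory/", "/checkout/"]))
# prints ['/page.htm', '/directory/']
```

Here the hypothetical /checkout/ page stays crawlable on the secure site, while the duplicated pages are disallowed.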
Clean up your index!
- Upload a clean XML sitemap with pages you want indexed
- Upload a robots.txt excluding pages you do not want indexed (see examples above)
- Add a meta noindex tag to pages you want removed (Google has to crawl a page to see this tag, so do not also block that same page in robots.txt)
- Submit a URL removal request in Google Webmaster Tools
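After adding noindex tags, you can confirm they are actually present in the HTML your server sends. This is a minimal sketch using Python's standard library. `RobotsMetaFinder` and `has_noindex` are hypothetical helpers, and the check covers only meta tags named `robots` or `googlebot`, not HTTP headers.

```python
from html.parser import HTMLParser

class RobotsMetaFinder(HTMLParser):
    """Collects directives from <meta name="robots"> and <meta name="googlebot">."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() in ("robots", "googlebot"):
            content = attrs.get("content") or ""
            # Directives are comma-separated, e.g. "noindex, follow".
            self.directives.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )

def has_noindex(html):
    """Return True if the page carries a noindex directive in a robots meta tag."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return "noindex" in finder.directives

print(has_noindex('<head><meta name="robots" content="noindex, follow"></head>'))
# prints True
```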
Keeping these tools readily available in your garden shed and using them when necessary will help keep your Google Garden free of weeds and allow your cash crop to grow big and bountiful for your search audience to feed on. Remember to keep planting new seeds (valuable content) each season to expand your garden!