Avoiding Duplicate Content with Robots.txt
Wednesday, July 30, 2008
Anyone who has spent some time researching organic Search Engine Optimization knows that avoiding duplicate content is both a must and a struggle. To provide the best search results possible, Google tries to show only the original version of a document in the search results.
This is carried out algorithmically by a specially designed Google bot which scours the web looking for duplicate content, and then tries to figure out which is the original version. Those documents deemed to be unoriginal duplicates are de-indexed from Google.
As the SEO business is especially prone to rumors, there are a number of people who will tell you that having duplicate content on your site will cause Google to penalize you. This is not technically true: when Google finds duplicate content, it removes it from the Google index. Although some may argue that this is a penalty in itself, it does not directly harm search engine rankings like a real Google penalty.
Although this is an effective way to fight webspam artists who scrape Wikipedia and other sites for content, it can often cause problems for webmasters, and here’s how:
Assume you make an awesome blog post on your site www.example.com and it gets a lot of attention. Great job! Now lots of people are noticing example.com and even linking to you in their blogs. But there is trouble brewing… your blog post is located at http://www.example.com/blog/greatpost/ but your blogging system keeps another copy at http://www.example.com/blog/archive/greatpost/
Google’s duplicate content bot determines that the location at http://www.example.com/blog/archive/greatpost/ is the original and indexes that, nixing the main version from the Google index. This is a BIG PROBLEM because nobody has linked to the version in the archive, and it therefore has no PageRank and can’t rank well in search engine results pages, despite the fact that it is the only version of your post visible in Google search results.
So here you have created a great blog post, had a lot of attention, and now you don’t get any SEO benefit because of duplicate content rules. Talk about frustrating!
How could this have been prevented?
Perhaps the quickest and easiest way to avoid duplicate content issues is with the robots.txt file. By adding the line:
Disallow: /blog/archive/
to the site’s robots.txt file (beneath a User-agent line such as User-agent: *), search engines that honor the file will know not to crawl any pages in the /blog/archive/ directory, which keeps the duplicate copy from competing with the original. Not only will this solve the issue above, but it will also prevent it from happening again.
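Before publishing a rule like this, it can be worth checking that it blocks the archive copy and nothing else. The following is a minimal sketch using Python's standard urllib.robotparser module; the robots.txt contents and the helper name is_crawlable are assumptions made for illustration, and the "User-agent: *" line is added because a Disallow rule has to belong to a user-agent group.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for www.example.com. The "User-agent: *" line is an
# assumption: it applies the Disallow rule to every crawler that honors the file.
ROBOTS_TXT = """\
User-agent: *
Disallow: /blog/archive/
"""


def is_crawlable(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


main_url = "http://www.example.com/blog/greatpost/"
archive_url = "http://www.example.com/blog/archive/greatpost/"

print(is_crawlable(ROBOTS_TXT, main_url))     # main post stays crawlable
print(is_crawlable(ROBOTS_TXT, archive_url))  # archive copy is blocked
```

The same check can be repeated for any other path where a CMS is suspected of keeping a second copy of a page.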
It is important when building a site, especially when it contains a CMS like a blog, to consider all places where content may be duplicated. A decision should be made at that point whether to block those pages from being indexed in the robots.txt file.
Posted by Quentin Muhlert on 07/30 at 10:44 AM
Microsoft in Quest for Search Ad Dollars
Monday, July 28, 2008
Last week Microsoft announced it has expanded its ad deal with Facebook. Currently the exclusive seller and manager of all Facebook’s display ads, Microsoft will soon provide search and paid listings to the social networking site.
Currently Facebook provides search for its users so that they may seek out profiles and find friends through its current search function; however, it does not offer web-wide search capabilities.
Microsoft, whose main search competitor Google handles about 500 billion search queries a year, has recently described its search business as “mission critical”. Presumably, Facebook would help bolster search query volume for Microsoft by “up to 40 percent year on year” and increase Microsoft’s search revenue by getting more search queries and attracting more advertisers...Read full story
Posted by Chris Breikss on 07/28 at 09:18 AM
Google Rumored to Buy Digg for Third Time
Thursday, July 24, 2008
Google and Digg have apparently signed a letter of intent and are close to closing a deal that will make Digg part of the Google News empire.
The deal is reportedly in the $200 million range and would be a huge competitive blow to Microsoft, which has an advertising deal with Digg. This is the third time the “Google Digg” rumor has surfaced; however, Digg has yet to publicly comment on or “digg” this rumor.
Why would Digg be worth $200 million to Google? The auction for the social search engine would have strategic value for Google. Google would be able to build a superior Digg voting algorithm, which Digg apparently can’t. Ask Google execs what the near-term future of search looks like and they’ll answer “social search.”...read more on the value of Digg for Google
Posted by Chris Breikss on 07/24 at 09:00 AM
Google Ad Planner Explained
Monday, July 21, 2008
The team at Google recently held a webinar on Google Ad Planner which covered the how, why and what it can do for you and your online advertising campaigns:
What it is: Ad Planner is a free media planning tool that shows 30 days’ worth of data, updated at least once per month, so you can easily build media plans for yourself and your clients.
Why is this tool useful?
Google Ad Planner allows you to:
- Research sites by demographic (age, sex, education, income) and online behavior
- View detailed site data such as user demographics, other sites users visited, and keywords searched to arrive at the site
- Compare selected sites and see aggregated data on how many unique visitors (UVs) and page views you will reach, as well as aggregated demographic data for users
- Find the niche sites that fit well with your target demographic
- Find sites relevant to your audience if you have little understanding of where they surf online
- View sites on an international scale: millions of sites, providing visibility well beyond the top sites in your area
- Undertake site discovery & research within minutes
- Create an entire online media plan in minutes
The focus of Google Ad Planner is purely on media planning. It takes a multi-data approach, using both internal and external data sources, including demographic data from a leading audience measurement provider. It provides data on any website that people spend time on, not just those on the Google content or search networks.
Currently, Google Ad Planner is still in Beta testing. To register for the Beta, visit the Google Ad Planner sign up page.
Posted by Chris Breikss on 07/21 at 12:59 PM
6S On The Cover of Yaletown Magazine!
Friday, July 18, 2008
6S Marketing was recently featured on the cover of Yaletown Magazine...how fitting! From its first day of operation in the apartment of 6S President Chris Breikss to its current trendy loft-style office space tucked away atop some of Yaletown’s hottest attractions, the company has never moved from the Yaletown neighborhood. “I have been living and working in Yaletown since 1999,” says Breikss.
In November of 2000, 6S Marketing Co-Founders and Directors Chris Breikss and John Blown left their full-time jobs to pursue their dream of owning their own business. At a time when the rest of North America was skeptical of the Dot.Com industry, Chris and John were determined to form 6S Marketing to provide companies with a solution for marketing themselves on the web. “Our philosophy from the start was to offer excellent customer service and internet marketing solutions to our clients,” explains Chris.
Having recently doubled in size, 6S has grown from 2 employees to over 20 full-time employees in under 8 years and shows no signs of stopping. Focusing purely on internet marketing over the years has allowed 6S to carve out a position as an industry expert, and it is now one of the largest internet marketing firms of its kind in North America...Read Full Story
How To Market To Canadians
MarketingSherpa released a report last week on the language, cultural, location and regulation differences faced by companies when marketing to Canadians. The report includes challenges, survival tips, promotion examples and plenty of advice for making websites appeal to Canadian consumers.
The report also covers why right now is the perfect time for U.S. companies to start marketing to their Northern neighbours. “The economy is strong and Canadians feel comfortable buying luxury items, whereas there’s been a real pullback in the U.S.,” says Chris Breikss, President of 6S Marketing in Vancouver.
With the Canadian dollar on par with the U.S. dollar, now is the time for marketers to reap the rewards of Canada’s new-found spending power and its under-served online advertising market, which usually means better rates for advertisers...Read full story.
Posted by Chris Breikss on 07/18 at 08:00 AM
Google’s Algorithm Explained, Sort Of
Thursday, July 17, 2008
As the search marketplace is becoming more competitive and refined, the big players like Google are starting to lower the shroud of secrecy that surrounds their most valuable commodity—their algorithm.
The other day, Amit Singhal, an Information Retrieval (IR) researcher and Google Fellow, opened up and began explaining some of Google’s search ranking techniques. While none of this is going to blow the doors off the search marketing world, the intended transparency is nice.
After stating that Google intends to “delve deeper into the technology behind [their search service] in a later post”, this one dealt mainly with Google’s philosophy.
The three main ideas in that search philosophy are as follows:
1) Best locally relevant results served globally.
2) Keep it simple.
3) No manual intervention.
But, I can hear you asking, if there is no manual intervention by Google, how are sites getting banned, penalized, punk’d, sandboxed, and otherwise pushed into the search oblivion? Google, through Amit, provides the answer for that one by stating that the subjective human element that goes into the search-o-sphere comes through the creation and linking of pages. The intervention comes from algorithmic refinements that counteract imperfect search results.
As this is really just a delicious taste from Google’s secret recipe, any search geek will certainly welcome the scraps from the algorithmic table.
From the post, we can see more of Google’s desire for quality in listings, and the importance of the content, linking and user-directed results in order to rank well. As search advances, it’s getting harder and harder to “game” the system, so long-term results are best based on the guidelines and philosophies of the most important rule-maker.
As usual, to check this out in full, head over to the Official Google Blog.
Web Analytics Unleashed
Wednesday, July 16, 2008
Tired of having a huge amount of data in your Analytics program and not knowing what to do with it? Join leading Analytics author and evangelist Avinash Kaushik for his upcoming webinar: 3 Things To Die For: Web Analytics Unleashed taking place on Thursday, July 17th at 9 am PDT.
In this webinar, Avinash will cover three specific areas you need to focus on to get the highest possible ROI from your Web Analytics investment, including:
- The fun clickstream analysis you should do
- Interesting ways to listen to your customers
- Why you need to fall in love with experimentation!
An added bonus: All attendees qualify to win 1 of 5 autographed copies of Avinash’s top-rated book, “Web Analytics: An Hour a Day”.
Avinash Kaushik is Google’s analytics evangelist, and is also the co-founder and chief education officer of Market Motive and is on the board of directors of University of California Irvine and several companies.
So make the most of your Analytics investment and register for the Web Analytics Unleashed Webinar!
Designers Rejoice, Google Crawls Flash
Monday, July 14, 2008
Before you can walk, you need to crawl. And before you could design large portions of your site with Flash, Google needed to learn how to crawl...and index...and ... you get the picture. Well, they did it.
That’s right, a partnership between Adobe, Google and Yahoo has opened the door to proper search indexing for dynamic content and rich internet applications.
Identifying user embedded information, the search engines will be able to use the Adobe Flash Player Technology to see what’s actually going on in the SWF (Flash file format) files—something that was previously impossible.
As a result, SEO-conscious sites would often forgo prettier design elements for key content in favour of simple, easy-to-access content. Now that this content is no longer invisible (or at least very difficult to find) in the eyes of the engines, search quality will improve.
Is this going to dramatically affect the SERPs (Search Engine Results Pages) that you’re used to? As you know, it’s difficult to immediately shoot up the results pages, even due to algorithmic refinements, but this does give Flash-heavy sites the chance to be indexed, deemed relevant and given the credit that’s due to them in automated search.
Because this is a new element of Google’s algorithm, expect some changes in it as it matures. Adobe is committed to continually working with the top two engines, and you can expect the others to follow suit. It is, after all, in Adobe’s best interest to enable the search functionality of one of its most popular products.
Look for more information on Flash optimization soon. For more from the big guys themselves, check out the Official Google Blog Post and Adobe’s press release.
Crawl, Google, crawl. Aww geez. Someone get my camera. They grow up so fast.
Posted by Chris Breikss on 07/14 at 11:43 AM
Google’s Improved Ability to Index Flash
Friday, July 11, 2008
This blog post is important for anyone with an interactive, beautifully designed website built in Flash. Google announced recently they have greatly improved their ability to index Flash.
In short, Google has been developing a new algorithm for indexing textual content in Flash files of all kinds from menus/buttons/banners, to full self-contained Flash websites. Recently, they’ve improved the performance of this Flash indexing algorithm by integrating Adobe’s Flash Player technology.
Currently there are multiple technical limitations (i.e., there are difficulties with Flash content written in bidirectional languages; Google currently does not attach content from external resources that are loaded by Flash files; Googlebot does not execute some types of JavaScript, so if the page loads a Flash file via JavaScript, Google may not be aware of that Flash file, in which case it will not be indexed). That being said, these limitations are already in the process of being resolved...Read more.
Yahoo Introduces “BOSS” To Boost Its Search Business
Thursday, July 10, 2008
Yahoo is about to launch an innovative platform that lets anyone build a customized search engine atop the Internet company’s technology. The service, which enters public beta testing on Wednesday, is called BOSS (Build Your Own Search Service) and will be free to use; however, those advertisers who use the new platform successfully will eventually be required to show search ads.
With BOSS, a programmer, or anyone with a website, can build an independent site with a search box, pass users’ queries to BOSS, process the results returned by Yahoo’s search engines in any manner, and display results....Read full article on BOSS
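The query, process and display flow described above might look roughly like the sketch below. It does not call the real BOSS service: fake_search_backend, custom_search, the canned results and the preferred_domain parameter are all hypothetical stand-ins, used only to show where a site owner's own processing step would sit between the search backend and the results page.

```python
from typing import Callable, Dict, List

Result = Dict[str, str]


def fake_search_backend(query: str) -> List[Result]:
    """Hypothetical stand-in for a remote search service; returns canned results."""
    corpus = [
        {"title": "Soccer Balls Online", "url": "http://shop.example.com/soccer-balls"},
        {"title": "History of Soccer", "url": "http://info.example.org/soccer-history"},
        {"title": "Tennis Rackets", "url": "http://shop.example.com/tennis"},
    ]
    words = query.lower().split()
    return [r for r in corpus if any(w in r["title"].lower() for w in words)]


def custom_search(
    query: str,
    backend: Callable[[str], List[Result]],
    preferred_domain: str,
) -> List[str]:
    """Pass the query to the backend, re-rank the results, and format them for display."""
    results = backend(query)
    # Processing step: a stable sort that moves results from the preferred
    # domain to the top while keeping the backend's order otherwise.
    ranked = sorted(results, key=lambda r: preferred_domain not in r["url"])
    return [f'{r["title"]} - {r["url"]}' for r in ranked]


for line in custom_search("soccer", fake_search_backend, "info.example.org"):
    print(line)
```

Swapping the stand-in for a real backend would only change the function passed in as backend; the re-ranking and display steps stay under the site owner's control.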
Friday Fun With “Google Suggest”
Friday, July 4, 2008
Have you tried Google Suggest? It’s a free tool that will auto-complete your search query and offer suggestions in real time. Similar to Google’s “Did you mean?” application that offers alternative spellings for your search query, this tool offers a list of refinements when searching for one word. For example, if you type in “spa”, Google Suggest will possibly suggest “day spa” or “car spa”. Similarly, if you type in only half a word, such as “prog”, Google Suggest might offer refinements such as “programming” or “progressive”.
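To make the idea concrete, here is a toy sketch of suggestion-by-prefix. It is not how Google Suggest actually works; the suggest function and the small KNOWN_QUERIES list are assumptions for illustration, matching what has been typed against the start of any word in a list of known queries.

```python
from typing import List

# Hypothetical list of previously seen queries.
KNOWN_QUERIES = ["day spa", "car spa", "programming", "progressive", "soccer balls"]


def suggest(typed: str, known_queries: List[str], limit: int = 5) -> List[str]:
    """Return known queries in which any word starts with the typed text."""
    prefix = typed.strip().lower()
    if not prefix:
        return []
    matches = [
        query
        for query in known_queries
        if any(word.startswith(prefix) for word in query.lower().split())
    ]
    return matches[:limit]


print(suggest("spa", KNOWN_QUERIES))   # ['day spa', 'car spa']
print(suggest("prog", KNOWN_QUERIES))  # ['programming', 'progressive']
```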
But the best thing about Google Suggest is what people are actually using it to search for. Check out these screen-shots for a little Friday Fun!
Posted by Chris Breikss on 07/04 at 09:00 AM
Google Website Optimizer Cartoons
Thursday, July 3, 2008
These are some funny Google Website Optimizer-related cartoons that an artist/copywriter by the name of Sean D’Souza drew during one of the presentations given by Tom Leung (from the Google Website Optimizer Team) at a conference in Chicago.
http://websiteoptimizer.blogspot.com/2008/07/you-know-testing-is-going-mainstream.html
Simple but worth a glance; I especially like the one for “bad landing pages”.
MSN adCenter Desktop
Microsoft has launched a new tool for their adCenter advertisers called MSN adCenter Desktop.
MSN adCenter Desktop is Microsoft’s answer to Google’s AdWords Editor, which has been a complete success in the Internet marketing community for its easy-to-use campaign management capabilities since its public debut on January 24th, 2006.
Since then, online advertisers have only had AdWords Editor to work with when wanting to create an online campaign quickly and effectively.
Enter adCenter Desktop, a campaign management tool which extends the capabilities of Microsoft online tools for adCenter, especially in the areas of bulk campaign management, and the ability to create campaigns through a simple interface - saving time and allowing adCenter advertisers to create and manage campaigns more efficiently.
Some features of the adCenter Desktop:
These basic features are also found in AdWords Editor; however, there are some new features not present in the competition for Microsoft to boast about:
MSN adCenter Desktop is currently in beta and only available to advertisers who sign up for the pilot program. To be considered for participation in this beta pilot program, there is an online form available.
Posted by Chris Breikss on 07/03 at 10:52 AM
Google North: Google.com vs. Google.ca
Tuesday, July 1, 2008
Effectively Targeting Your National Audience
To celebrate Canada Day today I thought that I would write about the differences between search results in Google USA (google.com) vs. Google Canada (google.ca).
When looking at SERPs (Search Engine Results Pages) from Google, one consideration that is often overlooked is which Google is doing the searching—in this post, we’re going to look at SERPs from Google.com vs. Google.ca.
To illustrate my point about how different the search engine results pages can be, I’ll be using some examples from general search terms. The first search results pictured are from Google.com (using a reliable American proxy) and the second results are from Google.ca (using my normal internet connection from Vancouver, BC, Canada).
The Search Terms:
- “news”
- “soccer balls”
News
Google.com identifies CNN.com (with site link directory), Google News, MSNBC and Fox News as the top information sources online. (I love the “Related searches: wrestling news” suggestion. I mean, it’s not like there’s a political race, or other events going on or anything).
When we check out the Google.ca results, we see a strong Canadian showing—kicked off by the CBC (site link directory), CTV, Google News, CNN and Google News Canada.
With this general term as an example, we can see who Google identifies as the top “news” sites for the .com and .ca results pages. In these instances, we can say that the Big G is identifying these top-ranking sites as not only the best results for the user, but the best search results for a user in that area.
Soccer Balls
While this probably isn’t the top term you’re going after, it does further show how different the results can be. Specifically, take note of the limited competition in the PPC side of things. In addition to the focussed sales portion of the .com pay per click listings, look at the discrepancy of e-commerce sites vs. informational sites from the .com to .ca.
This shows that in competitive areas of e-commerce, there can be available space in nationally-targeted areas of search.
The Size and Scope of the Market
Of course, Canada is a much smaller market than the US (Canadian population: approx. 33 million vs. approx. 301 million in the States, as of July 2007), but it can be important as a Canadian company to focus your efforts on the home front before looking internationally.
A lot of Canadian searchers are used to typing in “Google.com” because of that international and ubiquitous branding, but after identifying user IPs, Google redirects them to Google.ca, which supplies the results from above even when the “search the web” field is selected instead of “search pages from Canada”.
How to Capitalize On This
Basic techniques for going after national SERP results
- country specific domain (get the .ca)
- geo-targeted PPC (go after those polite searchers above the 49th parallel)
- nationally specified content and meta information (Be Canada’s Sports Leader)
Canada is, in many ways, a relatively unsophisticated search market. By using a .ca domain, a company has a great advantage in geo-targeting and getting a leg up on competition that is slow to follow, or going after a general online audience.
As a Canadian company, with a proper search and internet marketing strategy, you’ve got a great chance at really cornering the market at home. Having a nationally focused site drills down your niche even further. While it can, in some ways, limit your audience, it does give you the opportunity to dominate the .ca SERPs. With first page results at home, it’ll be a lot more comfortable to take the show on the road.
Posted by Chris Breikss on 07/01 at 01:37 PM

