We have a client we are assisting in getting out from under a partial match manual action penalty assessed by Google’s webspam team. Our client had not set up a Google Webmaster Tools account until very recently, so they weren’t aware of this issue until a couple of weeks ago. The manual actions section in Webmaster Tools indicated that unnatural links pointing at our client’s domain were causing the problem, and that the webspam team was zeroing in on certain pages within the client’s site rather than assessing an all-encompassing, site-wide penalty.

A thorough link audit of the site came back surprisingly clean. Nothing stood out as particularly suspect; it was a very clean link profile with lots of naturally earned links. We spoke to the client about their prior link practices, and they informed us that many years ago they had sponsored some link block ads that could have come into play with this penalty, but those ads had been disabled for over a year at this point. Thinking these paid links could be our culprit, we went ahead and submitted a reconsideration request to see whether this was simply an old penalty that had yet to expire.

We waited two weeks for the webspam team to get back to us, but surprisingly the unnatural links issue was still lingering. They posted three example links they considered artificial in nature. Curiously, every linking page contained the same content. We did a quick search for the article on the client’s site, thinking other sites might have been scraping it. Nothing turned up. We then searched Google for the first paragraph of the referenced article and found ten pages of results. The second result was a February article from the Huffington Post.

Our client is an authority in their space, and the Huffington Post article had linked back to one of their pages, along with several other authoritative sites in that genre. Getting a link from the Huffington Post would normally be great news. What the client didn’t realize was that 20-30 sites would come in behind that publish and scrape the Huffington Post’s content (via software or through its RSS feed), posting it verbatim on extremely low-quality scraper sites and thus triggering this manual penalty.

It took about five minutes of research to pin down exactly what was going on here, so it was difficult to fathom that the webspam team had failed to do the same research and arrive at the same conclusion: this issue was out of our client’s control and certainly not a penalty-worthy offense.

Still, we are not in the business of telling Google they are wrong. We got to work rounding up the low-quality Huffington Post scraper domains (e.g., new2000.com, gadzuess.com, and buzzweep.com, to name just a few) and promptly submitted a disavow file covering all of these scraper sites at the domain level.
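For reference, a disavow file is just a plain-text list uploaded through Google’s Disavow Links tool; a `domain:` entry tells Google to ignore all links from that domain, and lines starting with `#` are comments. A minimal sketch using the scraper domains named above might look like this:

```
# Scraper sites republishing the Huffington Post article
# that links to our client — disavowed at the domain level
domain:new2000.com
domain:gadzuess.com
domain:buzzweep.com
```

The domain-level entries are the safer choice here, since scraper sites tend to repost the same content at multiple URLs and a single-URL entry would only cover one of them.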

We also started doing whois research to pinpoint who owned these trash domains so we could contact them about removing the links, but the majority were shielded behind privacy protection, so the odds of success for that outreach are slim at best.

After submitting the disavow file to Google, we drafted a new reconsideration request outlining the scraper problem, the steps we’d taken to disavow these domains and remove the offending links, and our client’s role, or lack thereof, in this link spam. I’ll update the article with the resolution once we hear back from Google’s webspam team in a few weeks.

We had not seen a penalty of this variety before. Usually scraper sites reprint content from a client’s own domain, devaluing the original content and often triggering a Panda penalty in the process. Here, scrapers lifting content from a major news hub like the Huffington Post led to a penalty for the sites those scrapers link out to, on the grounds that the linking pages sit on very low-quality blogs. That pattern could potentially affect a lot of sites on the web, and high-quality sites at that. Hopefully, Google will take this into consideration in future algorithm releases and in its review team’s evaluations, and lay the blame where it truly lies: on the scraper sites.

Image Courtesy of HuffingtonPost.com