COMBATING SEO ABUSE
WEB DATA INVESTIGATION 101
How is it possible to spot illegal activity on the internet, namely the sale of counterfeit goods?
HTML tags can be used for this purpose.
HTML tags, or in other words metadata, are our friend when attempting to spot counterfeits, phishing, and so on. Metadata is structured information that describes, explains, or otherwise makes it easier to retrieve or manage an information resource. Web pages often include metadata in the form of metatags.
Metatags are snippets of text that describe a page’s content; they don’t appear on the page itself, only in the page’s source code. They give browsers and search engines information about the page in a form they can parse. Google uses some metatags to obtain information about a page’s content and, ultimately, to present it in the search results. When brands, slogans, and trademarks are placed on a website (as visible text, hidden text, or within HTML metatags), they affect search engine rankings.
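To make the idea of reading metatags from a page’s source code concrete, here is a minimal sketch using only Python’s standard library. The names MetaTagParser and extract_metadata, and the sample HTML, are illustrative assumptions; this is not how Domain Research itself collects data.

```python
from html.parser import HTMLParser


class MetaTagParser(HTMLParser):
    """Collect the <title>, <h1> texts, and named <meta> tags from raw HTML."""

    def __init__(self):
        super().__init__()
        self.meta = {}        # e.g. {"description": "...", "keywords": "..."}
        self.title = ""
        self.h1 = []
        self._current = None  # tag whose text is currently being collected

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta":
            name, content = attrs.get("name"), attrs.get("content")
            if name and content is not None:
                self.meta[name.lower()] = content
        elif tag in ("title", "h1"):
            self._current = tag
            if tag == "h1":
                self.h1.append("")

    def handle_endtag(self, tag):
        if tag == self._current:
            self._current = None

    def handle_data(self, data):
        if self._current == "title":
            self.title += data
        elif self._current == "h1":
            self.h1[-1] += data


def extract_metadata(html):
    """Return title, h1 texts, and meta name/content pairs found in html."""
    parser = MetaTagParser()
    parser.feed(html)
    return {
        "title": parser.title.strip(),
        "h1": [text.strip() for text in parser.h1],
        "meta": parser.meta,
    }


# Hypothetical page source, for illustration only.
sample = (
    '<html><head><title>Shop</title>'
    '<meta name="Description" content="Prada outlet sale">'
    '<meta name="keywords" content="prada, sale"></head>'
    '<body><h1>Big Sale</h1></body></html>'
)
result = extract_metadata(sample)
print(result["meta"]["description"])
```

None of the meta values are shown to a visitor in the rendered page, which is why they are easy to overlook without looking at the source.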
When someone other than the brand owner uses this type of SEO, they unfairly affect rankings; this is known as SEO abuse. Such abuse occurs when unauthorized parties use a brand as a keyword in search marketing, triggering ads that divert traffic to sites promoting unrelated, counterfeit, or competing brands. In one example of search engine marketing abuse, a luxury goods e-tailer used another brand’s trademarked name in its ad copy and inappropriately purchased the corresponding keyword. That search ad leads to a site that is likely selling counterfeit goods.
These are just a few examples of the SEO manipulation that online abusers like to use. In the majority of cases, scammers and web shops selling counterfeit goods put popular, attention-grabbing keywords in their HTML tags, such as the brand’s name plus words like ‘outlet’, ‘sale’, and ‘50% off’. Let’s see how we can spot that activity using the Domain Research platform.
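The pattern described above (a brand name combined with bait words) can be checked with a few lines of code. The following is a rough sketch only: the function name find_bait_terms and the term list are assumptions taken from the examples in this article, not a feature of Domain Research.

```python
import re

# Bait words taken from the examples in the text; extend as needed.
BAIT_TERMS = ("outlet", "sale", "50% off")


def find_bait_terms(text, brand, bait_terms=BAIT_TERMS):
    """Return the bait terms found in text, but only if the brand is present too."""
    lowered = text.lower()

    def has_term(term):
        # Whole-word match, so 'sale' does not match inside 'wholesale'.
        pattern = r"(?<!\w)" + re.escape(term.lower()) + r"(?!\w)"
        return re.search(pattern, lowered) is not None

    if not has_term(brand):
        return []
    return [term for term in bait_terms if has_term(term)]


# Hypothetical meta description, for illustration only.
hits = find_bait_terms("Prada Outlet - up to 50% off", "Prada")
print(hits)
```

An empty result means either the brand is absent or no bait word accompanies it. A non-empty result is a reason to look closer, not proof of counterfeiting.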
Using HTML tags to combat counterfeiters
The Domain Research database contains the HTML data of over 1.4 billion sites. This HTML data includes the meta description, meta keywords, H1 titles, and other tags that reflect up to 90% of the content of a particular site’s homepage.
So how do we use HTML tags within Domain Research for brand protection or marketing purposes?
Let’s say you want to identify online stores that sell counterfeit goods. A good starting point is to google a list of the top 10 best-selling clothing brands in the world. One of them is Prada, so let’s check whether counterfeiters use this brand name.
As discussed before, it is safe to assume that they will put the brand name they are ripping off, together with words like ‘sale’ or ‘discount’, in their meta-information.
So that is what we are going to use.
If you type the name of a brand and the word ‘sale’ into the HTML content filter of the Advanced Search tool, you will be able to identify all the websites that contain these words in their meta-information.
We can search for all words in any order, which matches ‘Prada sale’, ‘sale Prada’, and phrases with other words in between, or we can search for the exact phrase.
It is important to narrow your search by excluding certain keywords, like ‘blog’ or ‘magazine’, or even the words ‘Devil wears’, since we are not interested in websites relating to the movie. Excluding these reduces the number of false positives.
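The three filter behaviours described above (all words in any order, exact phrase, and excluded keywords) can be approximated locally as follows. This is a sketch under assumptions: the function names are hypothetical, and the matching rules of the actual Advanced Search tool may differ.

```python
import re


def _words(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def matches_all_words(text, query):
    """True if every word of the query occurs in the text, in any order."""
    words = set(_words(text))
    return all(word in words for word in _words(query))


def matches_exact_phrase(text, phrase):
    """True if the words of the phrase occur consecutively in the text."""
    haystack, needle = _words(text), _words(phrase)
    n = len(needle)
    return n > 0 and any(
        haystack[i:i + n] == needle for i in range(len(haystack) - n + 1)
    )


def keep_record(text, query, excluded_phrases=()):
    """All query words must match and no excluded phrase may appear."""
    if not matches_all_words(text, query):
        return False
    return not any(matches_exact_phrase(text, p) for p in excluded_phrases)


# Exclusions taken from the text; the sample strings are hypothetical.
EXCLUDED = ("blog", "magazine", "Devil wears")
kept = keep_record("Prada outlet sale 50% off", "Prada sale", EXCLUDED)
dropped = keep_record("The Devil Wears Prada DVD sale", "Prada sale", EXCLUDED)
print(kept, dropped)
```

The all-words mode is broader and catches reordered or padded phrases, while the exact-phrase mode is stricter. Exclusions are applied last to remove known false positives.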
DomainCrawler’s database, for example, contains 387 domains that have these keywords somewhere in their metadata, and these appear as a list of TLDs.
We can then select one of the links, for example 2882shop.com.
After selecting the link, the first thing we have to do is to see where exactly the searched keyword is mentioned and which other brands might be there.
We can track and trace HTML content historically, including information about other brands, which can be useful when taking joint action with another brand to shut down a website. From this link we see that Prada is used in all HTML tags, sometimes combined with the word ‘sale’, and we can also detect other brands used alongside Prada, such as Gucci, Louis Vuitton, and others.
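Checking which brands appear in which HTML fields can also be sketched in code. The example below assumes the tag contents have already been collected into a dictionary; the function name brands_by_field, the field names, and the sample values are hypothetical and do not reproduce the data of any real site.

```python
import re


def brands_by_field(fields, brands):
    """Map each brand to the list of HTML fields whose text mentions it."""
    hits = {}
    for brand in brands:
        # Whole-word, case-insensitive match on the brand name.
        pattern = re.compile(
            r"(?<!\w)" + re.escape(brand) + r"(?!\w)", re.IGNORECASE
        )
        found = [name for name, text in fields.items() if pattern.search(text)]
        if found:
            hits[brand] = found
    return hits


# Hypothetical tag contents, for illustration only.
fields = {
    "title": "Prada sale - Gucci outlet",
    "meta description": "Cheap Prada and Louis Vuitton bags",
    "h1": "Welcome",
}
report = brands_by_field(fields, ["Prada", "Gucci", "Louis Vuitton", "Chanel"])
print(report)
```

A report like this shows at a glance which other brand owners are affected by the same site, which is the information needed when considering joint action.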
Using DomainCrawler’s Domain Research tool, we can find domains that have specific keywords in their HTML content. We can see historical changes and reduce the number of false-positive results by specifying exactly what we are looking for. From the results we can understand what SEO manipulations scammers are carrying out with the brand. Once we have a set of domains that meet our search criteria, we are ready to take the next step, which is using DNS records.