SEO 101: Mastery of Technical SEO


It’s trusted by millions, and the Wikimedia Foundation deserves credit for its quality-assurance processes, such as Featured Articles and the Good Articles nomination process, which continually improve the academic accuracy of its content.

Perfecting an open-source encyclopedia takes time, as does perfecting technical SEO for a large enterprise site. Without robust technical SEO, search engine spiders can’t properly crawl, index, and rank your thousands or millions of pages.

While rich content and high site authority play a significant role in Wikipedia’s SEO dominance, I argue that its technical SEO plays the most important role, allowing it to rank for almost every informational keyword at the top of the results, or at least on page 1.

This is a tribute post that extracts lessons from Wikipedia’s platform and praises its SEO mastery. The same platform also powers Wikipedia’s sister Project sites, such as Wikidata, Wikinews, and Wiktionary.

We’ll study the technical SEO techniques that helped Wikipedia rank at scale on page one, on desktop and on mobile:

  • Domain Setup / Internationalization
  • Sitewide Links / HTML Layout
  • Meta Descriptions
  • Page Templates / URL Formats
  • Site Architecture
  • Mobile-First Indexing
  • Page Speed
  • The Single Most Important Enterprise SEO Success Factor

DOMAIN SETUP / INTERNATIONALIZATION

Wikipedia’s mission is to empower and engage people around the world to collect and develop educational content under a free license or in the public domain, and to disseminate it effectively and globally.

The domain wikipedia.org is set up to support 300 desktop sites on sub-domains:

  • www.wikipedia.org (the global homepage)
  • en.wikipedia.org (English)
  • fr.wikipedia.org (French)
  • And another 298 language sites

It’s also set up to support 300 mobile (m.) sites on sub-sub-domains, such as en.m.wikipedia.org.

The global home page, www.wikipedia.org, is the starting point. It invites bots and humans alike to browse the most popular Wikipedia sites by language. Or, in other words, it makes information freely available to everyone.

The role of SEO in this mission: to help every single person in the world find and comprehend information on any subject in any spoken language. This is where a robust, scalable international SEO strategy comes into play.

You can localize content in multiple languages using sub-domains and sub-directories on a single domain, or you can set up websites on multiple regional domains using country code Top-Level Domains (ccTLDs), such as wikipedia.in (India), wikipedia.fr (France), and wikipedia.gr (Greece).

Any of the three approaches (sub-domains, sub-directories, or ccTLDs) can help you dominate international SEO if done correctly.

While Wikipedia sporadically uses ccTLDs, it primarily goes the sub-domain route to excel in ranking worldwide.

On each sub-domain’s Main page (the English site’s home page, for example), you’ll notice that Wikipedia continues linking to the equivalent Main page in other languages.

Every subsequent page, including the millions of Wiki articles, contains a link section where dynamically generated links point to the corresponding article in all other available languages.

Here’s the English page for The Office.

Google suggests you annotate links to articles in other languages using hreflang and rel="alternate" annotations in the HTML link element in the <head>, in HTTP headers, or in Sitemaps.

Wikipedia may use its Sitemaps for this, but it certainly doesn’t use the <head> or HTTP headers to point to multilingual content.
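
For illustration, here’s a minimal sketch of what those <head> annotations would look like per Google’s guidelines (the markup is hypothetical, since Wikipedia doesn’t implement it; the article URLs are real):

    <!-- hreflang annotations in the <head>; Wikipedia skips these -->
    <link rel="alternate" hreflang="en" href="https://en.wikipedia.org/wiki/Sun" />
    <link rel="alternate" hreflang="fr" href="https://fr.wikipedia.org/wiki/Soleil" />
    <link rel="alternate" hreflang="de" href="https://de.wikipedia.org/wiki/Sonne" />
    <link rel="alternate" hreflang="x-default" href="https://www.wikipedia.org/" />

Per Google’s spec, every language version would carry the full set of annotations, pointing at itself and at each alternate.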

SITEWIDE LINKS & HTML LAYOUT

Wikipedia codes its page templates beautifully for spiders by prioritizing content and internal links in two ways:

  1. It doesn’t place any “important” links (i.e., links to pages with a lot of SEO traffic potential) in the <footer> section.
  2. It places sidebar links towards the bottom of the source code in the <body> section.

But why are these two things beneficial from a technical perspective?

Well, Google has stated that both sitewide and footer links are not given much weight; that is, they pass less link equity (or PageRank). This is likely because such links are deemed less important for users. After all, when did you last click a link in a website’s footer? I can’t remember the last time I did.

Thus, Wikipedia places less-important links in its footer section: links to pages that don’t need to rank for any competitive keywords, such as its privacy policy, “about” page, and so forth.

But Wikipedia has another trick up its sleeve: it adds a kind of additional “sub-footer” section to each page, containing a bunch of dynamically generated links related to the overall topic of the page.

Because this sub-footer is dynamically generated for each page, none of the links are sitewide. Therefore, they don’t get devalued by Google.

And it’s a similar story with the links in the left sidebar.

Most links in the sidebar are for editors and users (i.e., for navigational purposes). And the sidebar is sitewide, so it makes sense not to include any important links (in terms of SEO) in this section.

But again, Wikipedia goes one step further…

Their pages are coded in such a way that the sidebar HTML is placed towards the bottom of the source code (still in the <body> section, but right at the end). This allows the contextual links at the top of the page to receive more link equity, while the sidebar links are further demoted.
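
As a simplified sketch (the element IDs here are illustrative, not Wikipedia’s actual markup), the source order looks like this:

    <body>
      <!-- main article content and its contextual links come first -->
      <div id="content">
        <h1>The Office (U.S. TV series)</h1>
        <p>... contextual links such as <a href="/wiki/Sitcom">sitcom</a> ...</p>
      </div>
      <!-- low-value sitewide links -->
      <div id="footer">...</div>
      <!-- sitewide sidebar navigation comes last in the source;
           CSS positions it visually on the left -->
      <div id="sidebar">...</div>
    </body>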

Lesson: Place links to rank-worthy target pages high on each page, contextually and with rich anchor text. Place second-tier links in sidebars that sit below your contextual links in the page source.

META DESCRIPTIONS

What meta descriptions? Wikipedia leaves them blank.

Wikipedia uses title tags in the <head> but never populates descriptions, which goes against all standard SEO advice. Not gonna lie, if I were consulting them, I’d give them the same advice and plead my case, too:

You should have a keyword-rich description, with at least one call to action, to encourage high click-through rates, which could indirectly lead to higher rankings.

I’d be wrong to give this advice. Creating a generic, one-size-fits-all description template, or even encouraging contributors to update the meta description by hand, doesn’t make sense for Wikipedia. Every Wiki article fits thousands of search queries.

Long-form Wiki articles, like this one about the Sun, rank for over 4,900 keywords, per Ahrefs’ Organic Keywords report. The best thing to do is leave the description blank and let Google figure out which snippets to display for any given query.
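
In other words, the <head> of a Wiki article looks roughly like this (a sketch, not Wikipedia’s verbatim markup):

    <head>
      <title>Sun - Wikipedia</title>
      <!-- no <meta name="description"> element at all; Google generates
           its own snippet from the article body for each query -->
    </head>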

Every Wiki article starts with the topic in bold and a simple sentence structure, clearly answering the who, what, where, or when, just as a meta description would do anyway. This formatting could also improve Wikipedia’s chances of winning featured snippets.

Here’s the search result for “What is the Sun?”

PAGE TEMPLATES & URL FORMATS

Wikipedia predominantly uses just one page template and URL format (en.wikipedia.org/wiki/Article_Title) to rank pages: the Wiki article.

SITE ARCHITECTURE

If you dig into Wikipedia’s website architecture, here’s what you will find.

SECONDARY NAVIGATION

While Wikipedia uses other page templates, such as Portals, Categories, and Lists, as well as the editor-friendly pages you see in the left sidebar, these pages exist for secondary navigation and general information. They rarely rank for any search terms.

For example, Portals are topic pages, such as Geography, that exist as additional entry points from the home page. A Portal seemingly exists for editors to click into the topics they’re interested in contributing to. For bots, it’s like an index sitemap welcoming search engines into the world of all the Wikis.

Geography is both a Portal and a Wiki. Guess which one ranks better?

The Wiki!

As an SEO, you wonder: “Shouldn’t the Portal Geography page outrank the Wiki? It’s one click away from the home page, the most authoritative page, and it contains good, unique content.” The reasons it doesn’t are likely as follows.

While the Portals link to the Wikis, Wikis don’t typically link back to Portals. And all geography-related Wikis link back to the Geography Wiki, so overall, in terms of URL Rating, the Wiki Geography page is stronger than the Portal Geography page.

In fact, the Geography Portal page’s URL Rating is 40, and it ranks for zero organic keywords.

 

But the Geography Wiki page’s URL Rating is 73, and it ranks for 2.5K+ organic keywords.

Lesson: Both the prominence and the quantity of internal links determine which of two pages with similar content and the same target keyword outranks the other. Placing a page high up in the site hierarchy, even just one click away from the home page, doesn’t guarantee good rankings.

If Wikipedia pointed more links at the Portal Geography page from all relevant Wiki articles, perhaps by using breadcrumbs, that page would likely beat the Wiki Geography page.

PRIMARY NAVIGATION

The primary way to navigate Wikipedia is through its on-site search, or by clicking from Wiki to Wiki. Wikipedia’s contextual linking makes it easy for bots and users to browse the site. While the secondary navigation shown above uses a rather deep structure, the primary navigation comprises a beautifully designed flat site architecture.

The five front-and-center content sections feature constantly rotating timely or random Wiki articles. These articles contextually link to other related Wiki articles, and so on. Wikipedia doesn’t use mega menus or faceted navigation, as it doesn’t use a top-down categorization structure. It’s only two levels deep.

CATEGORIZATION

As an SEO, it’s perplexing to see an encyclopedia with millions of articles forgo a categorization structure it could so easily follow.

You wonder how Google categorizes all this content and indexes it in neatly ordered taxonomies. I mean, Wikipedia doesn’t even use breadcrumbs, so how is Google supposed to establish parent-child relationships between categories and articles?

Well, what Wikipedia teaches us is that parent-child relationships don’t matter if your contextual internal linking is super-relevant, abundant, and free of wasteful links (404s, duplicate or thin-content pages, etc.).

Basically, Wikipedia treats every topic (categories, subcategories, and sub-sub-categories) as a Wiki article, and interlinks them all contextually.

In comparison, Encyclopedia.com’s top-down structure requires four clicks from the home page to get to an article, so it has to turn to faceted navigation and breadcrumbs to help reinforce its parent-child categorization.
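
For reference, breadcrumb markup with schema.org structured data, the kind of parent-child signal a top-down site leans on and Wikipedia skips, typically looks something like this (the URLs are illustrative):

    <ol itemscope itemtype="https://schema.org/BreadcrumbList">
      <li itemprop="itemListElement" itemscope
          itemtype="https://schema.org/ListItem">
        <a itemprop="item" href="https://www.example.com/science">
          <span itemprop="name">Science</span></a>
        <meta itemprop="position" content="1" />
      </li>
      <li itemprop="itemListElement" itemscope
          itemtype="https://schema.org/ListItem">
        <a itemprop="item" href="https://www.example.com/science/geography">
          <span itemprop="name">Geography</span></a>
        <meta itemprop="position" content="2" />
      </li>
    </ol>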

INTERNAL LINK PROXIMITY

The Six Degrees of Separation is the idea that any person is connected to any other person on the planet by no more than 5 intermediary acquaintances.

Likewise, on Wikipedia, it takes on average only 4.5 clicks to get from a Wiki article to any other Wiki article.

One of Wikipedia’s greatest software functions is the ability for editors to easily cross-link to other Wikis. Within the body of each article, you’ll notice that editors tend to hyperlink almost every concept or subject to its matching Wiki article. If you’re not using menus and breadcrumbs, this is the only way to establish strong link relationships across a site with millions of pages without resorting to automatic linking software.
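
Here’s what that looks like in practice: editors write double-bracket wikitext, and MediaWiki renders it as contextual links with descriptive anchors (the sentence below is illustrative):

    <!-- editors write: The [[Sun]] is the star at the center of
         the [[Solar System]]. -->
    <!-- MediaWiki renders: -->
    <p>The <a href="/wiki/Sun" title="Sun">Sun</a> is the star at the center
       of the <a href="/wiki/Solar_System" title="Solar System">Solar System</a>.</p>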

INTERNAL & EXTERNAL LINK COUNTS PER PAGE

Moz, for example, suggests keeping to roughly 150 links per page. Matt Cutts suggested keeping it to 100 so that you don’t overwhelm users with a poor experience. It’s widely believed, and for good reason, that excess links on a page hurt PageRank distribution and don’t do users any good. Most sites should stick to 150 links or fewer.

Wikipedia wishes it could, but can’t.

The Wiki page for the US version of The Office contains over 2,300 links:

  • ~100 sidebar links for editors and users, pointing to pages with little to no value for ranking on non-brand terms.
  • ~225 (10%) external links and citations, the infamous ‘nofollow’ links SEOs love debating.
  • ~150 links to foreign-language versions of the Wiki article.

Together, these make up roughly 20% of all the links on the page, and they live at the bottom of the HTML, in the lower-priority section.

Rand Fishkin suggests that Google gives links placed higher in the HTML more weight than those placed lower. I still believe this works today as an evergreen internal linking tactic.

SEARCH ENGINE CRAWLS

Question: When Googlebot starts crawling Wikipedia, does it ever finish?

Googlebot’s crawls are determined by some combination of so-called domain authority and individual page authority (PageRank), the frequency and prominence of internal links, URL prioritization (via Sitemaps, for example), and content updates. Considering Wikipedia nails all of these factors, what does a typical Wikipedia site crawl look like?

While it might delight any SEO professional to take a sneak peek at Wikipedia’s server logs or its Webmaster Tools Crawl Stats, that information isn’t publicly available. What is publicly available is a little-known traffic statistics tool by WMF Labs, where you can see all kinds of interesting pageview stats at a page level or at a Project level. You can even see search engine spider crawl activity by Project dating back to July 2015:

While not specific to Googlebot, search engine crawlers collectively average over 40MM pageviews per day. In comparison, humans average over 250MM pageviews per day.

How does this compare to the number of pages search engines crawl on your site? If you’re not already doing so, check your Webmaster Tools Crawl Stats, and, for deeper analysis, try to review your server logs regularly. For most sites, you can use a tool such as the Screaming Frog Log File Analyser.

MOBILE-FIRST INDEXING

It’s only a matter of time until Google rolls out mobile-first indexing in early-mid 2018, and SEO blogs and forums hit the panic button.

Wikipedia will ignore the chatter. They’ve been ready for mobile-first.

Look at the mobile version of The Office page and notice the similarity with its desktop content. The main article content is the only content that appears on the mobile site. The only sidebar links that appear on mobile point to the equivalent article in other languages and, again, are pushed to the bottom of the HTML. All the other bloat-causing sidebar links are removed altogether.
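
Google’s documented setup for separate mobile URLs is a pair of annotations like the ones below. This is a sketch based on Google’s guidelines; check Wikipedia’s live markup yourself before treating it as their exact implementation:

    <!-- on the desktop page (en.wikipedia.org): -->
    <link rel="alternate"
          media="only screen and (max-width: 640px)"
          href="https://en.m.wikipedia.org/wiki/The_Office_(American_TV_series)" />

    <!-- on the mobile page (en.m.wikipedia.org): -->
    <link rel="canonical"
          href="https://en.wikipedia.org/wiki/The_Office_(American_TV_series)" />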

Needless to say, when Google rolls out algorithms based on mobile-first indexing, Wikipedia is ready to keep on ranking. Worth noting: it looks like Wikipedia has also shied away from hopping on the AMP bandwagon.

PAGE SPEED

The mobile page for The Office in the screenshot above scores 89 for the mobile experience and 95 for desktop. Wikipedia uses a separate mobile site (m.) approach, as opposed to a responsive or adaptive approach, to serve mobile content, and it doesn’t redirect desktop users when they request the mobile page.

When you request a desktop page from a mobile device, however, it does redirect you to the mobile site accordingly.

All three tools find that Wikipedia can improve page load by leveraging browser caching, optimizing images, and combining JS and CSS files so that above-the-fold content loads quickly.

Here’s the thing about page speed testing and SEO: very few sites get a perfect score, because most web teams don’t try to nail every tiny recommendation. Page speed tools want you to optimize every single thing that loads on a page, whether visibly or invisibly in the background.

Even if you optimize all of your own assets (your servers have fast response times, you use a CDN to deliver static files, you serve clean, efficient HTML), you still have to solve for the third-party applications, plugins, HTML, JavaScript, CSS, images, etc. that make up the remainder of the page.

Setting expiration dates on all cacheable resources, compressing every single image, and in-lining all above-the-fold CSS, JavaScript, and HTML are just a few of the tasks a webmaster must tackle to score above 90 in any of these page speed test tools.
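
In-lining above-the-fold CSS, for instance, looks something like this (a generic sketch; the selectors and file names are illustrative, not Wikipedia’s):

    <head>
      <style>
        /* critical above-the-fold styles, inlined so the first paint
           doesn't wait on a render-blocking stylesheet request */
        body { margin: 0; font-family: sans-serif; }
        .site-header { height: 60px; }
      </style>
      <!-- the full stylesheet loads asynchronously after first paint -->
      <link rel="preload" href="/styles/main.css" as="style"
            onload="this.onload=null; this.rel='stylesheet'" />
      <noscript><link rel="stylesheet" href="/styles/main.css" /></noscript>
    </head>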

Even then, there’s no guarantee of excellence, because standard HTTP/1.0 and HTTP/1.x connections can only request a few files at a time. The solution lies in HTTP/2, which allows multiplexing: a browser or a spider can request multiple files in parallel over a single connection.

Wikipedia has adopted HTTP/2 to improve page speeds for users today, and while Google still hasn’t enabled HTTP/2 for Googlebot crawls, it recommends you implement it anyway.

THE MOST IMPORTANT TECHNICAL SEO SUCCESS FACTOR

The platform on which a website is built, whether small, medium, or enterprise, ultimately determines your SEO ability and scalability. The platform (a.k.a. the framework) comprises everything that powers a website or family of websites: the servers, the software and code (MediaWiki and PHP, in Wikipedia’s case), the databases, the content, and the design.

If you understand these building blocks, and how they combine to create your overall site experience, you realize the true limits and true possibilities of technical SEO. You can experiment and infuse different SEO techniques, as your platform allows, to design a harmonious search engine and user experience.

The Wikimedia Foundation wins enterprise SEO with its platform, where most enterprise organizations struggle due to archaic infrastructure, internal politics, and inefficiency.

If Wikipedia executed like most large enterprises, its SEO technology wouldn’t be as powerful as it is today, and without SEO domination, who knows whether Wikipedia would be the household brand it is now?

The platform also enables Wikipedia’s dozen sister Project sites, such as Wikidata, Wikinews, and Wiktionary, to piggyback on it and position themselves to dominate web search, too.

CONCLUSION

Wikipedia is far from perfect, both as an SEO platform and as the world’s most accurate encyclopedia. Like YouTube, Reddit, and Twitter, it has systemic biases that keep it from truly becoming the deepest, truest, and richest source of knowledge.

Hopefully, Wikipedia’s founders keep working on it.

Whether you agree or disagree with my assessments of their technical SEO foundation, I believe SEO professionals of all levels can greatly benefit from observing how large websites like Wikipedia structure their code and content at a page level and at a site level.

What Wikipedia tactics have you tested, and what results have you seen? What tactics are you hoping to apply? Which of Wikimedia’s sister projects do you predict to be most successful?

Blog Source: Ahrefs | Technical SEO Mastery: Lessons from the GOAT, Wikipedia
