Lesson 6: Links and PageRank
Article by: Matt Polsky
Breaking down what makes a link valuable is an essential skill for every SEO. It allows you to prioritize link-building efforts, helps uncover tactics used by competitors and even shows you what to remove or avoid.
Valuing links effectively helps you:
- Create more valuable link-building campaigns.
- Perform link audits to determine risk in a link profile.
- Perform competitor analysis to understand what links drive competitor rankings.
- Provide additional value and insight to teams that can impact link acquisition (digital PR and partnerships).
- Determine if you should purchase a website.
Before jumping straight into evaluation, it's important to understand the history behind links and the algorithms directly impacting links.
The History of Link-Based Algorithms
Before link-based algorithms, search engines used keyword-based algorithms to determine ranking positions. Marketers could easily manipulate keyword-based algorithms by stuffing keywords in pages or even hiding them with whitewashing or cloaking.
Whitewashing is the tactic of hiding keywords on a page by blending them in with the page's background. A site owner could easily hide hundreds or thousands of keywords in the page's background (think of white text on a white background) to prevent users from seeing lists of keywords while still influencing search rankings.
Cloaking is the tactic of showing a search engine one thing and the user another. For example, it's possible to show different content to different user agents.
What is a user agent?
In computing, a user agent is any software acting on behalf of a user that retrieves, renders and facilitates end-user interaction with web content. Or, more simply put, a user agent string is a line of text that identifies the browser and OS hitting a server.
Google's user agent for crawling and indexing websites is Googlebot and typically looks like this in server logs:
- Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
You can view Google's complete list of user agents here.
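As a rough illustration of spotting this string when reviewing server logs, here is a minimal Python sketch. The function name `googlebot_version` and the sample browser string are hypothetical, chosen only for this example. Keep in mind that a user agent string is just text sent by the client and can be spoofed, so a match only shows what the client claims to be.

```python
import re

# Matches the "Googlebot/<version>" token found in the user agent shown above.
GOOGLEBOT_TOKEN = re.compile(r"Googlebot/(\d+(?:\.\d+)*)", re.IGNORECASE)


def googlebot_version(user_agent):
    """Return the Googlebot version a user agent string claims, or None."""
    match = GOOGLEBOT_TOKEN.search(user_agent)
    return match.group(1) if match else None


# The Googlebot string from this lesson, plus a made-up browser string.
crawler_ua = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
browser_ua = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"

print(googlebot_version(crawler_ua))  # 2.1
print(googlebot_version(browser_ua))  # None
```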
Since Google provides specific user agents, a site owner could easily show Google's user agent different content for the same URL than what a user actually sees.
Both tactics of hiding or changing content for search engines are against Google's search guidelines and can get a website removed from search results.
At the time, these tactics only helped websites, which caused issues when determining quality. Google's founders Larry Page and Sergey Brin recognized the problems with the search algorithms of the day and saw a massive opportunity to create better search results by using links. They named their link-based algorithm PageRank after its inventor, Larry Page, and because the algorithm assigns ranks to web pages.
"It turns out, people who win the Nobel Prize have citations from 10,000 different papers. A large number of citations in scientific literature, he said, means your work was important, because other people thought it was worth mentioning."
-Larry Page
What is PageRank and How Does it Work?
PageRank, or some far more complex version of it, still exists in the algorithm today. However, the current version is drastically different from the formula created in 1998. We do not know exactly what goes into the current link algorithm, but the past algorithms provide a guide.
Understanding concepts like PageRank can help you understand where Google is going, even if we no longer have the exact inputs.
So, what is PageRank?
PageRank is a mathematical formula that ranks pages based on the probability a random surfer – or user – ends up on a specific page from randomly clicking links. Essentially, if you click on random pages forever, PageRank is the percentage of time you spend on any given page.
PageRank assigns each page a score (think of a value between 0 and 1), and pages with higher scores typically ranked better than others in this era.
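To make the random surfer idea concrete, here is a minimal Python sketch of the textbook random-surfer model on a tiny, made-up web of three pages. It is an illustration under stated assumptions, not Google's implementation: the `toy_web` graph is hypothetical, the damping value of 0.85 is assumed, and the code uses the common normalized variant in which all scores add up to 1, so each score can be read as the share of time the surfer spends on that page.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Estimate PageRank by repeatedly passing rank along links.

    `links` maps each page to the list of pages it links to. Every linked
    page must also appear as a key (an assumption of this sketch).
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}  # start with equal scores

    for _ in range(iterations):
        # Rank held by pages with no outgoing links is spread over all pages.
        dangling = sum(rank[p] for p in pages if not links[p])
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}

        # Each page splits its damped rank evenly across its outgoing links.
        for page, targets in links.items():
            for target in targets:
                new_rank[target] += damping * rank[page] / len(targets)
        rank = new_rank
    return rank


# Hypothetical three-page web: A links to B and C, B links to C, C links to A.
toy_web = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
scores = pagerank(toy_web)

for page, score in sorted(scores.items()):
    print(page, round(score, 3))
```

In this toy graph, page C ends up with a higher score than page B because it receives links from both A and B, while B is only linked from A.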
The PageRank Formula Simplified
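For reference, the formula as published in the original paper by Brin and Page can be written as follows. Here T_1 through T_n are the pages linking to page A, C(T_i) is the number of outbound links on page T_i, and d is the damping factor.

```latex
PR(A) = (1 - d) + d \left( \frac{PR(T_1)}{C(T_1)} + \cdots + \frac{PR(T_n)}{C(T_n)} \right)
```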
The formula above is a recreation based on the original Stanford paper. To simplify this formula, let's look at an example of how PageRank works if the entire internet was composed of only a few pages.
We'll assume the damping factor (d) is 0.85. The actual damping factor is likely different, but Google listed 0.85 in an early paper, so it's the best we have to go on. The damping factor exists because the random surfer will eventually stop clicking links.
If we start with only two pages, and page A links to the second page, page A passes 0.85 of its PageRank (0.85 × A) to that page.
Now imagine the internet grows to three pages, and page A links to both of the other pages. We'll keep this example simple and say the PageRank passed gets divided evenly among links, which isn't true today but works for this example. Each page receives 0.85 / 2 = 0.425 of page A's PageRank.
Note: Something we rarely talk about as SEOs is that when you add a link to a page, the amount of equity passed to all other links on the page decreases. This is why you may hear recommendations around the number of links on a page – or that faceted navigation is bad for SEO.
Now the internet grows to four pages. Page A doesn't link to the newest page, but a page linked from page A does. PageRank flows from page to page, so some equity from page A still reaches the newest page even without a direct link. When this happens, the damping factor is applied again: (0.85 / 2) × 0.85 ≈ 0.36.
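The arithmetic in these examples can be checked with a short Python sketch. It rests on the same simplifications as the text: a damping factor of 0.85, equity split evenly among a page's links, and page A's PageRank treated as one unit. The helper name `equity_per_link` is hypothetical, and the four-page case assumes the intermediate page has a single outbound link.

```python
DAMPING = 0.85  # damping factor assumed throughout this lesson's examples


def equity_per_link(source_equity, outbound_links, damping=DAMPING):
    """Equity passed through each link when it is split evenly across links."""
    return source_equity * damping / outbound_links


page_a = 1.0  # treat page A's PageRank as one unit

# Two pages: page A has one link.
two_pages = equity_per_link(page_a, 1)

# Three pages: page A links to both other pages, so the equity is halved.
three_pages = equity_per_link(page_a, 2)

# Four pages: a page linked from A passes its share on, damped a second time.
four_pages = equity_per_link(three_pages, 1)

print(round(two_pages, 3))    # 0.85
print(round(three_pages, 3))  # 0.425
print(round(four_pages, 2))   # 0.36
```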
Increasing PageRank
A site gains PageRank when external pages link to it. The original PageRank algorithm made two assumptions when assigning PageRank:
- Pages that receive more links have a higher probability that someone finds their way to the page by randomly clicking links.
- Pages that receive links from authoritative or important sites have a higher probability that someone finds their way to the page by randomly clicking links.
Point 1 is fairly self-explanatory. More links = more chances to find a site. Point 2 is a bit more complex and is where much of link building today focuses: more authoritative sites pass more equity.
The idea behind point 2 is that a well-known and trusted site receives more clicks than an unknown site.
Increasing PageRank doesn't happen overnight. PageRank scores are generally understood to behave logarithmically, meaning a score becomes significantly harder to grow the higher it gets. For example, it's much easier for a site to go from 0.1 to 0.4 than it is for a site to go from 0.91 to 0.92.
As we mentioned previously, PageRank isn't split evenly between every link on a page. In our next section, we break down additional factors that influence links and how that affects PageRank.