Using a New Correlation Model to Predict Future Rankings with Page Authority
Posted by rjonesx.
Correlation studies have been a staple of the search engine optimization community for many years. Each time a new study is released, a chorus of naysayers seems to come magically out of the woodwork to remind us of the one thing they remember from high school statistics: that "correlation doesn't mean causation." They are, of course, right in their protestations and, to their credit, an unfortunate number of times it seems that those conducting the correlation studies have forgotten this simple aphorism.
That being said, correlation studies are not altogether fruitless simply because they don't necessarily uncover causal relationships (ie: actual ranking factors). What correlation studies discover or confirm are correlates.
Correlates are simply measurements that share some relationship with the dependent variable (in this case, the order of search results on a page). For example, we know that backlink counts are correlates of rank order. We also know that social shares are correlates of rank order.
Correlation studies also provide us with the direction of the relationship. For example, ice cream sales are positive correlates of temperature and winter jacket sales are negative correlates of temperature; that is to say, when the temperature goes up, ice cream sales go up but winter jacket sales go down.
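As a rough illustration of how the sign of a correlation coefficient captures direction, here is a minimal Python sketch. The temperature and sales figures are invented purely for illustration and do not come from any study; `pearson` is a hypothetical helper name.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Made-up numbers, for illustration only.
temperature = [5, 10, 15, 20, 25, 30]
ice_cream_sales = [12, 18, 25, 33, 41, 52]
jacket_sales = [48, 40, 31, 22, 15, 9]

print(pearson(temperature, ice_cream_sales) > 0)  # positive correlate
print(pearson(temperature, jacket_sales) < 0)     # negative correlate
```

The coefficient's sign gives the direction and its magnitude gives the strength of the linear relationship; neither says anything about cause.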
Finally, correlation studies can help us rule out proposed ranking factors. This is often overlooked, but it is an incredibly important part of correlation studies. Research that provides a negative result is often just as valuable as research that yields a positive result. We've been able to rule out many types of potential factors — like keyword density and the meta keywords tag — using correlation studies.
Unfortunately, the value of correlation studies tends to end there. In particular, we still want to know whether a correlate causes the rankings or is spurious. Spurious is just a fancy-sounding word for "false" or "fake." A good example of a spurious relationship would be that ice cream sales cause an increase in drownings. In reality, the heat of the summer increases both ice cream sales and the number of people who go for a swim, and that swimming can lead to drownings. So while ice cream sales are a correlate of drowning, the relationship is *spurious.* Ice cream sales do not cause the drownings.
How might we go about teasing out the difference between causal and spurious relationships? One thing we know is that a cause happens before its effect, which means that a causal variable should predict a future change.
An alternative model for correlation studies
I propose an alternate methodology for conducting correlation studies. Rather than measure the correlation between a factor (like links or shares) and a SERP, we can measure the correlation between a factor and changes in the SERP over time.
The process works like this:
- Collect a SERP on day 1
- Collect the link counts for each of the URLs in that SERP
- Look for any URLs that are out of order with respect to links; for example, if position 2 has fewer links than position 3
- Record that anomaly
- Collect the same SERP in 14 days
- Record if the anomaly has been corrected (ie: position 3 now out-ranks position 2)
- Repeat across ten thousand keywords and test a variety of factors (backlinks, social shares, etc.)
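The steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not the code used in the study: the function names, the sample URLs, and the link counts are all hypothetical, and the choice to skip pairs where a URL has dropped out of the later SERP is an assumption the process above does not address.

```python
from typing import Dict, List, Tuple

def find_anomalies(serp: List[str], metric: Dict[str, float]) -> List[Tuple[str, str]]:
    """Return adjacent (upper, lower) URL pairs where the lower-ranked URL
    has a strictly larger metric value than the URL directly above it."""
    anomalies = []
    for upper, lower in zip(serp, serp[1:]):
        if metric[lower] > metric[upper]:
            anomalies.append((upper, lower))
    return anomalies

def corrected_fraction(anomalies: List[Tuple[str, str]], later_serp: List[str]) -> float:
    """Fraction of recorded anomalies where the lower URL now outranks the upper one.
    Assumption: pairs with a URL missing from the later SERP are skipped."""
    position = {url: i for i, url in enumerate(later_serp)}
    checked = corrected = 0
    for upper, lower in anomalies:
        if upper not in position or lower not in position:
            continue
        checked += 1
        if position[lower] < position[upper]:
            corrected += 1
    return corrected / checked if checked else 0.0

# Hypothetical data for a single keyword.
day_1 = ["a.com", "b.com", "c.com", "d.com"]
links = {"a.com": 100, "b.com": 20, "c.com": 50, "d.com": 60}
day_15 = ["a.com", "c.com", "b.com", "d.com"]

anomalies = find_anomalies(day_1, links)
print(anomalies)                              # [('b.com', 'c.com'), ('c.com', 'd.com')]
print(corrected_fraction(anomalies, day_15))  # 0.5
```

In a full run, the anomalies from every keyword would be pooled before computing the corrected fraction, once per factor being tested.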
So what are the benefits of this methodology? By looking at change over time, we can see whether the ranking factor (correlate) is a leading or lagging feature. A lagging feature can automatically be ruled out as causal. A leading factor has the potential to be a causal factor.
Following this methodology, we tested 3 different common correlates produced by ranking factors studies: Facebook shares, number of root linking domains, and Page Authority. The first step involved collecting 10,000 SERPs from randomly selected keywords in our Keyword Explorer corpus. We then recorded Facebook Shares, Root Linking Domains, and Page Authority for every URL. We noted every example where 2 adjacent URLs (like positions 2 and 3 or 7 and 8) were flipped with respect to the expected order predicted by the correlating factor. For example, if the #2 position had 30 shares while the #3 position had 50 shares, we noted that pair. Finally, 2 weeks later, we captured the same SERPs and identified the percent of times that Google rearranged the pair of URLs to match the expected correlation. We also randomly selected pairs of URLs to get a baseline percent likelihood that any 2 adjacent URLs would switch positions. Here were the results...
The outcome
It's important to note that it is incredibly rare for a leading factor to show up strongly in an analysis like this. While the experimental method is sound, it's not as simple as a factor predicting the future: the methodology only detects a factor when it is both leading and Moz Link Explorer discovered the relevant change before Google did. The underlying assumption is that in some cases we have seen a change in a ranking factor (like an increase in links or social shares) before Googlebot has, and that in the 2-week period Google will catch up and correct the incorrectly ordered results. As you can expect, this is a rare occasion. However, with a sufficient number of observations, we should be able to see a statistically significant difference between lagging and leading results.
Factor | Percent Corrected | P-Value | 95% Min | 95% Max |
Control | 18.93% | 0 | | |
Facebook Shares Controlled for PA | 18.31% | 0.00001 | -0.6849 | -0.5551 |
Root Linking Domains | 20.58% | 0.00001 | 0.016268 | 0.016732 |
Page Authority | 20.98% | 0.00001 | 0.026202 | 0.026398 |
Control:
In order to create a control, we randomly selected adjacent URL pairs in the first SERP collection and determined the likelihood that the second will outrank the first in the final SERP collection. Approximately 18.93% of the time the worse ranking URL would overtake the better ranking URL. By setting this control, we can determine if any of the potential correlates are leading factors - that is to say that they are potential causes of improved rankings.
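A minimal sketch of how such a control could be computed, assuming one randomly chosen adjacent pair per keyword. The sampling scheme, the function name `control_flip_rate`, and the handling of URLs that drop out of the later SERP are assumptions for illustration, not details taken from the study.

```python
import random
from typing import List, Sequence, Tuple

def control_flip_rate(serp_pairs: Sequence[Tuple[List[str], List[str]]], seed: int = 0) -> float:
    """For each (first_serp, later_serp) pair, pick one adjacent URL pair at
    random from the first SERP and check whether the lower URL outranks the
    upper URL in the later SERP. Returns the fraction of pairs that flipped."""
    rng = random.Random(seed)  # seeded so the sample is reproducible
    checked = flipped = 0
    for first, later in serp_pairs:
        if len(first) < 2:
            continue
        i = rng.randrange(len(first) - 1)
        upper, lower = first[i], first[i + 1]
        position = {url: idx for idx, url in enumerate(later)}
        # Assumption: skip pairs where either URL left the later SERP.
        if upper not in position or lower not in position:
            continue
        checked += 1
        if position[lower] < position[upper]:
            flipped += 1
    return flipped / checked if checked else 0.0

# Hypothetical example: two keywords, one unchanged and one fully reversed.
unchanged = (["a", "b", "c"], ["a", "b", "c"])
reversed_serp = (["x", "y", "z"], ["z", "y", "x"])
print(control_flip_rate([unchanged, reversed_serp]))  # 0.5
```

Because the pairs are chosen without reference to any factor, this rate is the background level of position swapping that each factor has to beat.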
Facebook Shares:
Facebook Shares performed the worst of the three tested variables. Facebook Shares actually performed worse than random (18.31% vs 18.93%), meaning that randomly selected pairs would be more likely to switch than those where shares of the second were higher than the first. This is not altogether surprising, as it is the general industry consensus that social signals are lagging factors; that is to say, the traffic from higher rankings drives higher social shares, rather than social shares driving higher rankings. Consequently, we would expect to see the ranking change first before we would see the increase in social shares.
RLDs
Raw root linking domain counts performed substantially better than shares at ~20.5%. As I indicated before, this type of analysis is incredibly subtle because it only detects when a factor is both leading and Moz Link Explorer discovered the relevant factor before Google. Nevertheless, this result was statistically significant with a P value <0.0001 and a 95% confidence interval that RLDs will predict future ranking changes around 1.5% greater than random.
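The article does not state which statistical test or sample sizes produced these figures. One common way to compare a factor's correction rate against the control is a two-proportion z-test, sketched below. The counts in the example are hypothetical (10,000 pairs in each group is an assumption chosen only so the rates match 20.58% and 18.93%), so the resulting p-value and interval will not match the reported ones.

```python
import math

def two_proportion_test(x1: int, n1: int, x2: int, n2: int, z_crit: float = 1.959964):
    """Two-sided z-test for the difference between two proportions.
    Returns (difference, z, p_value, (ci_low, ci_high)) with a ~95% interval."""
    p1, p2 = x1 / n1, x2 / n2
    diff = p1 - p2
    # Pooled standard error under the null hypothesis of equal proportions.
    pooled = (x1 + x2) / (n1 + n2)
    se_pooled = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = diff / se_pooled
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail area
    # Unpooled standard error for the confidence interval on the difference.
    se_diff = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return diff, z, p_value, (diff - z_crit * se_diff, diff + z_crit * se_diff)

# Hypothetical counts: 2,058 of 10,000 factor pairs corrected vs 1,893 of 10,000 control pairs.
diff, z, p_value, (ci_low, ci_high) = two_proportion_test(2058, 10_000, 1893, 10_000)
print(f"difference={diff:.4f}, z={z:.2f}, p={p_value:.4f}, 95% CI=({ci_low:.4f}, {ci_high:.4f})")
```

A confidence interval that excludes zero on the positive side is what would mark a factor as predicting corrections more often than random pairs do.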
Page Authority
By far, the highest performing factor was Page Authority. At 21.5%, PA correctly predicted changes in SERPs 2.6% better than random. This is a strong indication of a leading factor, greatly outperforming social shares and outperforming the best predictive raw metric, root linking domains. This is not surprising. Page Authority is built to predict rankings, so we should expect that it would outperform raw metrics in identifying when a shift in rankings might occur. Now, this is not to say that Google uses Moz Page Authority to rank sites, but rather that Moz Page Authority is a relatively good approximation of whatever link metrics Google is using to rank sites.
Concluding thoughts
There are so many different experimental designs we can use to help improve our research industry-wide, and this is just one of the methods that can help us tease out the differences between causal ranking factors and lagging correlates. Experimental design does not need to be elaborate and the statistics to determine reliability do not need to be cutting edge. While machine learning offers much promise for improving our predictive models, simple statistics can do the trick when we're establishing the fundamentals.
Now, get out there and do some great research!