The May 2024 Google leak is perhaps the most significant leak in Google’s history.
Beyond offering a rare peek into the dominant search engine’s closely guarded algorithms, the leak is momentous because some details appear to contradict past public statements by Google representatives.
However, factoring in the timeline and nature of these statements, the contradictions are not entirely black and white.
The documentation mentions a feature that implies Google computes a “siteAuthority” score. This calls into question a seven-year-old statement from Google spokesperson John Mueller, in which he said the search engine does not use a “website authority score.”
But Mueller used phrases such as “from my point of view” and “something like that,” which suggests that Google spokespeople could be leaning on vagueness and semantics, a common tactic for corporate representatives.
Understandably, Google wants to protect its search results from bad actors. But to keep the trust of well-meaning practitioners, it might be better to say something along the lines of “We cannot comment on that” next time rather than make statements that later appear misleading.
So, while the Google API leak revealed some fascinating insights into the search engine giant’s algorithms, this isn’t a “gotcha” moment for search engine optimization (SEO) and digital marketing professionals.
Instead, the leak should prompt professionals to explore and test what Google says rather than take it at face value.
This blog will look at the Google leaks as follows:
• How Did the Google Leak Unfold?
• Summary of the Latest Google Leak: What It Revealed
• Google’s Response: Cautions Against Misinterpreting Leaked Documents
• Another Google Data Leak on the Horizon
• How Should the SEO Community Move Forward
• Get Help: Let’s Guide You Through the Google Algo Updates
How Did the Google Leak Unfold?
“The leak itself seems to have been made by accident on Google’s side, a lapse,” said Tristan Harris, Thrive Internet Marketing Agency’s Director of Digital PR.
“The person that reported the leak has no connection to Google or any of its companies.”
The sequence of events surrounding the Google leak began on March 13, when an anonymous source known as “yoshi-code-bot” posted the leaked documentation on GitHub. Despite the potential significance of the content, the post did not attract attention from the public or media.
In an effort to bring more awareness to the leak, the anonymous source reached out to Rand Fishkin via email on May 5. Fishkin, a well-known figure in the SEO community, was initially hesitant but eventually took the matter seriously and sought advice from Mike King, another respected expert in the field.
Their collaboration culminated in Fishkin and King publishing their respective analyses and commentaries on May 27. These posts significantly raised the leak’s profile and drew the attention of the SEO community and beyond.
The following day, on May 28, the anonymous source revealed his identity. Erfan Azimi, an SEO practitioner and EA Eagle Digital founder, came forward and posted a YouTube video discussing why he pursued the Google algorithm leak.
Azimi stressed that he has “no financial motive,” only that “the truth needs to come out.”
Summary of the Latest Google Leak: What It Revealed
Mentions of large language models and generative features point towards the leak being a recent document.
“The leak gives us the most extensive view we’ve ever had of how Google potentially ranks sites,” Harris noted.
“It’s wise to keep ourselves informed, but jumping the gun before the industry has started in-depth testing would be the opposite, especially since we don’t know what weight each signal has.”
Here’s a summary of what’s been revealed in the Google leak:
Over 2,500 Modules Containing 14,000 Ranking Features
The Google data leak exposes an elaborate framework of 2,596 modules containing 14,014 attributes. Despite the extensive documentation, the weighting of these features remains a mystery, which suggests a highly complex and nuanced ranking system. Still, the leak highlights the importance of various elements for SEO.
Links Still Matter, But How They Matter Is Different
The Google search leaked documentation seems to emphasize link diversity and relevance, along with the enduring influence of PageRank for website homepages. However, the focus seems to have shifted from sheer quantity to quality and actual user behavior (clicks). It suggests that links from reputable sites that don’t get clicks could be irrelevant.
An Undeniable Focus on Successful Clicks
Several modules within the Google search API leaked documentation hint at how Google tracks various click metrics like “goodClicks,” “badClicks,” “lastLongestClicks” and “unicornClicks” to gauge user engagement and satisfaction. It reinforces the need for high-quality content that keeps users engaged.
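For teams that want to test this idea against their own analytics, the rough sketch below shows one way to estimate a share of "good" visits. The leak does not define how “goodClicks” or “badClicks” are calculated, so the `Visit` fields, the 30-second dwell threshold and the `good_click_share` function are all hypothetical assumptions for illustration, not Google's method.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Visit:
    """One visit from your own analytics (hypothetical structure)."""
    dwell_seconds: float        # time spent on the page
    returned_to_results: bool   # visitor went straight back to the results page


def good_click_share(visits: Iterable[Visit], min_dwell: float = 30.0) -> float:
    """Share of visits that stayed at least `min_dwell` seconds and did not bounce back.

    The 30-second default is an arbitrary assumption, not a value from the leak.
    """
    visits = list(visits)
    if not visits:
        return 0.0
    good = sum(
        1
        for v in visits
        if v.dwell_seconds >= min_dwell and not v.returned_to_results
    )
    return good / len(visits)


sample = [
    Visit(120, False),
    Visit(5, True),
    Visit(45, False),
    Visit(40, True),
]
print(good_click_share(sample))
```

Tracking a number like this before and after a content change is one way to observe whether engagement moves, without assuming anything about how Google weighs it.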
Chrome Data Does Play a Role
A module named “ChromeInTotal” implies that Google leverages data from Chrome users for ranking purposes, meaning user behavior within the Chrome browser might influence search results. In the past, Google reps like Matt Cutts and John Mueller said Google does not use Chrome data for search ranking (which could have been true at that time).
Whitelisted Domains Raise Questions
Modules like “isElectionAuthority” and “isCovidLocalAuthority” in the Google algorithm leak suggest Google whitelists specific domains for elections and COVID information. While this helps ensure that high-quality, trusted sources are prioritized, it likewise raises concerns about potential bias and the criteria for such whitelisting.
Publisher and Author Entities Get Underscored
Features such as “isPublisher” (website owner) and “isAuthor” (content creator) appear to indicate that Google values these entities. But these attributes are boolean, meaning the algorithm either knows who the publisher or author is or it does not. If it does not, Google would likely lose confidence in the website or content.
Authorship Information Is Being Stored
The Content Warehouse API leak also shows that Google seems to store documentation on authors. It suggests that Google could be using the information to determine the author’s expertise and authoritativeness, which are characteristics of Google’s E-E-A-T.
Several Reasons for Demotion
The Google search leak also sheds light on some possible reasons why content might be demoted in search results, through features like “anchorMismatchDemotion” (links don’t correspond to the target site). Other reasons include low-quality product reviews, location mismatch, exact-match domains (lacking content quality) and adult content.
Small Sites Get a Boost?
A feature named “smallPersonalSite” in the Google search leaked documentation hints that Google might adjust rankings for small websites and blogs. The exact impact remains unclear, but such adjustments could be applied through Twiddlers (re-ranking functions that adjust results after the initial ranking).
Dates Matter for Gauging Content Freshness
Several attributes imply Google values dates, looking at dates on bylines (“bylineDate”), in URLs (“syntacticDate”) and in the content itself (“semanticDate”). These features highlight Google’s focus on freshness and, possibly, on identifying original content creators or sources in order to detect duplicate content.
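To check whether the dates on your own pages are consistent, a short script can help. The sketch below pulls a date out of a URL and compares it with a byline date. It is only an illustration: the leak does not describe how Google extracts dates, and the regular expression and the `url_date` and `dates_agree` functions are hypothetical names chosen for this example.

```python
import re
from datetime import date
from typing import Optional

# Hypothetical pattern: /YYYY/MM/DD or /YYYY-MM-DD somewhere in the URL path.
_URL_DATE = re.compile(r"/(\d{4})[/-](\d{1,2})[/-](\d{1,2})(?!\d)")


def url_date(url: str) -> Optional[date]:
    """Return the first valid calendar date found in the URL, or None."""
    match = _URL_DATE.search(url)
    if match is None:
        return None
    year, month, day = (int(part) for part in match.groups())
    try:
        return date(year, month, day)
    except ValueError:
        # Digits matched the pattern but are not a real date (e.g. month 13).
        return None


def dates_agree(url: str, byline: date) -> bool:
    """True when the URL contains a date and it equals the byline date."""
    found = url_date(url)
    return found is not None and found == byline


print(url_date("https://example.com/blog/2024/05/27/google-leak"))
print(dates_agree("https://example.com/blog/2024/05/27/google-leak", date(2024, 5, 28)))
```

A mismatch between the URL date and the byline date is the kind of inconsistency worth cleaning up during a content audit.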
Content Length and Originality Get Scored
The attribute “OriginalContentScore” suggests that content originality is scored from 0 to 512. Interestingly, for shorter pieces, the maximum score is capped at 127. The feature’s focus on originality indicates that thin content is defined not by length but by a lack of substance, which can occur in both short and long pieces.
Nothing New About Page Titles, but Font Size?
The “titlematchScore” feature seems to reiterate the importance of page titles. It appears to measure how well a page title matches search queries. Google also appears to consider the average weighted font size of terms (“avgTermWeight”) in documents and anchor text.
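The leak does not say how “titlematchScore” is calculated. As a stand-in for your own testing, the sketch below uses simple word overlap: the fraction of query words that also appear in the page title. The `tokens` and `title_match` functions are hypothetical and are not Google's formula.

```python
import re
from typing import Set


def tokens(text: str) -> Set[str]:
    """Lowercase the text and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def title_match(title: str, query: str) -> float:
    """Fraction of distinct query words that appear in the title (0.0 to 1.0)."""
    query_terms = tokens(query)
    if not query_terms:
        return 0.0
    return len(query_terms & tokens(title)) / len(query_terms)


print(title_match("Google Leak Explained: What SEOs Should Know", "google leak"))
print(title_match("Google Search Tips", "google leak"))
```

Running target queries against existing titles this way can flag pages whose titles share no words with the searches they are meant to rank for.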
Now, there’s more where this comes from.
“A lot of the things that have come out seem to confirm suspicions we’ve had for years. But there is an enormous amount of data to comb through, so I’d expect more revelations to come out over the coming weeks and months.
“It’s always been important to stay on top of industry news, tests and studies, but it’s even more important now than ever,” Harris said.
The Google search API leaked documentation also reveals attributes for tracking domain registrations and identifying and handling video-oriented websites.
It also includes documents on various Google products, like Google Cloud, People API, YouTube recommendations, Google Assistant responses, how books are ranked in search results and how videos are found.
Still, all things considered for Search, it’s unclear how the leaked attributes are precisely weighted as Google ranking factors or which are even being used right now.
Google’s Response: Cautions Against Misinterpreting Leaked Documents
The search engine giant’s response to the Google search API leak was unsurprisingly measured, indirectly acknowledging the leak while reiterating its commitment to keeping the integrity of its search results.
“We would caution against making inaccurate assumptions about Search based on out-of-context, outdated, or incomplete information. We’ve shared extensive information about how Search works and the types of factors that our systems weigh, while also working to protect the integrity of our results from manipulation.”
– Davis Thompson, Google spokesperson
In its conversation with Search Engine Land, Google emphasized that Google ranking factors or signals are “constantly changing.”
However, Google clarified that this doesn’t mean their core ranking principles are altered – those remain consistent.
Google also said that it will remain committed to sharing what information it can with the SEO community.
Another Google Data Leak on the Horizon
On June 4, 404 Media reported that it had obtained a leaked internal Google database containing privacy and security issues from 2013 to 2018. The leak revealed that Google tracked these issues but did not necessarily disclose them publicly, and that some were resolved quickly.
Here are some of the incidents in the Google leaks:
• Google was found making YouTube recommendations based on users’ deleted watch history.
• An incident was flagged wherein someone was manipulating customer marketing accounts on Google’s ad platform.
• Following Google’s acquisition of Socratic.org, over one million user email addresses were publicly exposed for over a year.
• Google Street View accidentally captured and stored thousands of car license plate numbers in 2016. An employee reported the issue, and Google subsequently wiped the data.
• Waze, acquired by Google for $1.3 billion in 2013, had a carpool feature that leaked users’ trip histories and home addresses.
• A contractor used their admin access to download and watch a private game trailer on Nintendo’s YouTube channel in 2017. The contractor shared a screenshot of the trailer with a friend, who then posted it on Reddit. An internal review by Google deemed the incident “non-intentional.” Nintendo later released the trailer the same year and launched the game Yoshi’s Crafted World on the Switch in 2019.
While the individual incidents may seem insignificant to the masses, the bigger picture they paint is concerning. This leak from Google, a powerful company entrusted with a massive amount of our personal data, reveals a pattern of mishandling sensitive user information.
Among Amazon, Apple, Facebook, X (previously Twitter) and Google, Google is reported to collect the most user data.
The Alphabet Inc. company confirmed these Google leaks, saying its employees have a system for reporting potential problems with Google products. It also said that “the reports obtained by 404 [Media] are from over six years ago and are examples of these flags.”
How Should the SEO Community Move Forward
The Google algorithm leak offers a rare glimpse into Google’s inner workings, not the entire picture. It’s not wise to take any attribute mentioned in the documentation and treat it as a definitive Google ranking factor.
Again, we cannot be certain which of these attributes or systems are active and which have been retired.
We may know some key ingredients to Google’s “secret sauce,” but we don’t necessarily know how much sugar or spice to add.
As with most things in life, Harris advised taking things (the recent leak and future statements from Google included) with a grain of salt.
“Focus on delivering the best user experience possible on your site and pairing that with strong content that caters to your audience’s needs,” Harris added.
Test, observe and refine.
The results won’t lie to you.
Let’s Guide You Through the Google Algo Updates
Thrive is a digital marketing agency that champions search optimization. Our team of specialists follows Google algo updates while making space for testing to maximize results.
Some of our SEO services are:
• Local SEO
• Technical SEO
• Franchise SEO
• Enterprise SEO
For expert help and a better SEO strategy, get in touch with our team.