Reliability of domain authority scores calculated by Moz, Semrush, and Ahrefs

Search engine optimization (SEO), the practice of improving website visibility on search engines, faces the considerable challenges posed by the opacity of Google's relevance ranking algorithm. Attempts at understanding how this algorithm operates have generated a sizeable number of studies in the worlds of both business and academia. Indeed, this research tradition has managed to present strong evidence regarding the participation of certain factors and their relative importance. For instance, there is a widespread consensus that domain authority is one of the key factors in optimizing positioning. This study seeks to determine the reliability of the domain authority scores provided by three leading platforms for SEO professionals: Moz's Domain Authority, Semrush's Authority Score


Introduction
The factors that determine the positioning of a website on a search engine results page (SERP) are of considerable interest to researchers as they allow us to both understand and predict how the ranking algorithm works (Vállez; Ventura, 2020; Zakharenko; Smagulova, 2020; Vállez; Lopezosa; Pedraza-Jiménez, 2022). Likewise, in a marketing environment, interest in the development and application of techniques that optimize website visibility is growing, since it is essential for a company to be ranked at the top of SERPs (Saura; Palos-Sánchez; Cerdá-Suárez, 2017). In Spain, for example, more than 1,000 firms have recently been identified as offering services related to search engine positioning (Escandell-Poveda; Iglesias-García; Papí-Gálvez, 2021). For this reason, search engine optimization (SEO), understood as: "the mechanism by which a website or web page is improved to maximize the frequency and quantity of organic traffic from search engines" (Almukhtar; Mahmoodd; Kareem, 2021, p. 70), attracts the attention of multiple sectors, especially within the worlds of business and academia. Indeed, SEO is a critical activity for any online business today, since it has been reported as being able to reduce customer acquisition costs by 87.41% and to improve the return on investment up to 12.2 times (Sickler, 2022).
One of the key challenges facing the application of SEO techniques is detecting the factors incorporated in the ranking algorithms of search engines such as Google, Bing, DuckDuckGo, etc. Here, given that Google is the most popular search engine (StatCounter Global Stats, 2023; NetMarketShare, n.d.), the academic world is especially interested in understanding how its relevance ranking algorithm works. Yet, one of the elements that limits such analyses is the scant information that Google itself provides about its algorithm (Google, 2022). This lack of transparency has led many researchers to analyze the characteristics of the SERPs in an effort to deduce the factors they involve and the weighting afforded them. In so doing, a number of different reverse engineering methods have been applied in different research contexts. However, ranking algorithms are complex and subject to frequent changes (Van-der-Graaf, 2012; Gupta et al., 2016), which means studies of this type quickly become obsolete and require constant revision. Moreover, it has been reported that more than 200 factors are involved in the Google algorithm (Davies, 2021; Dean, 2021), which further impedes the possibility of performing reliable analyses of their behavior. Yet, while this number of factors may not be entirely accurate, it is a clear indication of the complexity of studies of this type.
Having said that, certain positioning factors can be isolated and studied, most notably, inbound links, download speed, traffic, and website or domain authority. To do so, what is required are quantitative data about that specific factor, obtained from a reliable external source. Google's ranking can then be compared with the ranking of the isolated factor in order to determine the importance of that factor in the relevance ranking and, as such, for increasing visibility and traffic (Gupta et al., 2016). Here, it is not only the number of factors involved in the ranking that is relevant: their quantitative or qualitative nature and their relative importance also need to be taken into consideration. For instance, page download speed (Sp) is a quantitative factor, while user experience (UX) is qualitative.
To be able to isolate and study a factor, reliable quantitative data must be available. This explains why most researchers use external tools to obtain information that Google itself does not supply, including the number of unique visitors, the bounce rate, the number of links, and the domain authority, among others (Font-Julián; Ontalba-Ruipérez; Orduña-Malea, Halibas et al., 2020; Linares-Rufo et al., 2021; Mladenović et al., 2022).
The last factor in the list, domain authority, is one of the most recurrent indicators employed in the professional world and one that has been widely used in academic studies (Saberi; Mohd, 2013; Vyas, 2019; Urosa-Barreto, 2020; Nagpal; Petersen, 2021; Ganguly, 2022). It refers to a set of positioning factors that depend on the website as a whole, and not on specific webpages. It is based on quality signals associated with the entire website, such as the number of backlinks (Rowe, 2018), the domain authority of the linking sites, and website age and size, among other factors.
Google does not have a specific, independent domain authority score that it stores and updates for each website. Yet, it does recognize that there is a set of sitewide quality signals, that is, signals that depend on the website as a whole, which are constantly being recalculated and applied to all the pages of the website (Schwartz, 2016), and which can boost or lower their positioning.
This has led various SEO service companies to calculate metrics that allow them to quantify the quality of the signals that Google uses and which it applies to a website. In this case, these are isolated metrics that are assigned to all known domains and, moreover, they are updated on a regular basis.
The main companies offering such an indicator are Moz, Semrush, Ahrefs, and Majestic. Yet, the task is far from easy given that it requires an index similar to that used by Google to be able to identify the quality signals involved. The leader in this field is Moz, which has developed its Domain Authority indicator. Today, it is widely used by SEO professionals; so much so, in fact, that the company's name for its metric (Domain Authority) has become synonymous with the concept itself. To avoid confusion, hereinafter, we use upper case (Domain Authority) to refer specifically to Moz's metric and lower case (domain authority) for the general concept.
Moz defines its Domain Authority (DA) as: "a search engine ranking score (…) that predicts how likely a website is to rank in search engine result pages" (Moz, n.d.).
Domains with greater authority are therefore more likely to be ranked highly and, so, to generate more traffic (Chandler; Munday, 2016). Obviously, Google does not recognize that Moz's DA plays any role in its positioning.
DA is a quantitative indicator that operates in a similar way to Google's PageRank, using a logarithmic scale from 0 to 100 (Orduña-Malea; Aytac, 2015). PageRank was patented in 1998, becoming the key element in Google's ranking algorithm and making a decisive contribution to the enormous success of its search engine. Its competitive advantage lay in the fact that the relevance ranking it generated was of much higher quality than that of its then competitors, including AltaVista and Yahoo! (Redding, 2018).
Yet, PageRank is a score given to each webpage, while domain authority is a global value associated with the domain name and, therefore, with the entire website. Thus, although the two indicators are calculated in a very similar fashion and with the same objective, they operate at different levels. In a sense, domain authority is a composite score of the authority afforded each of a website's pages.
For years, Google published its PageRank values using a toolbar installed in the browser. However, in 2016, Google stopped providing this information to avoid the generation of spam and reported that its ranking algorithm was no longer based on this indicator (Sullivan, 2016). Despite these claims, various experts conclude that Google today uses an updated version of PageRank, incorporating new qualitative factors (Marcilla, 2022; Mendoza-Castro, 2021; West, 2021), and that several of the metrics involved in its ranking algorithm act at the sitewide level (Schwartz, 2016; Critchlow, 2018; Haynes, 2022).
John Mueller, a Google analyst, acknowledged that the company was still using PageRank internally. In 2020, he published a famous tweet that quickly went viral, saying: "Yes, we do use PageRank internally, among many, many other signals" (Mueller, 2020).
He also admitted that Google uses quality signals from across the whole website and that these are applied to all the pages of the site to improve its positioning: "... when we're looking at, for example, quality signals that are more sitewide, then that's something that applies across the whole website in the state that it's at now. So it's not the case that we would say, oh, five years ago, you had this score for your website. Therefore, your contact will be rated like this forever. But rather we look at your website overall now, and we apply the current score to all of your pages on the website. So that's what we do when it comes to sitewide signals" (cited in Schwartz, 2016).
Moz reports domain authority as a score ranging from 0 to 100, based on a logarithmic scale, which implies that climbing from 20 to 30 is significantly easier than climbing from 70 to 80. DA provides a prediction of the position that a website's pages will occupy in the SERPs, with higher scores having a better chance of obtaining good rankings (Moz, n.d.). A high DA, therefore, is an indication that a greater number of quality signals have been identified and that the pages are more likely to be ranked highly.
Moz reports that the metric is based on data obtained from its own web index, Link Explorer, and that it uses multiple factors in its calculation. It also applies a machine learning model that correlates its data with real Google results, which are then used as references to adjust the values obtained.
Semrush and Ahrefs calculate a very similar indicator to that of Moz (Soulo, 2022; Mendoza-Castro, 2020): the former has developed what it calls an Authority Score, while the latter provides a Domain Rating. All three companies calculate domain authority by applying different procedures and using different indexes, but each has the same objective.
In a similar way to Moz, Semrush defines its Authority Score (AS) as a: "metric used for measuring a domain's or webpage's overall quality and SEO performance" (Semrush Team, 2023).
It is based on multiple factors of trustworthiness and authority, including search, traffic, and link data, especially backlinks. AS employs a neural network and machine learning to ensure accuracy and that its information is based on actual standings of the most recent results pages. Like DA, the AS is measured on a logarithmic scale from 0 to 100, with the highest scores corresponding to more traffic and a higher ranking (Varagouli, 2020).
Finally, Ahrefs defines its Domain Rating (DR) as a: "metric that shows the relative strength of a website's backlink profile", using a logarithmic scale that goes from 0 to 100 (Soulo, 2022).
The company reports that the DR is calculated in a similar way to PageRank, the main difference being that PageRank is calculated for pages, while DR is calculated for websites. The indicator considers multiple factors such as the number of websites that link to the site being evaluated, the DR of the linking domains, and the number of sites to which each of those domains links.
Each company has developed a distinct indicator which, despite operating on the same logarithmic scale from 0 to 100, uses different processing mechanisms and indices (that is, databases) to measure a fundamental element of SEO, namely domain authority. This is an indicator based on the analysis of backlinks that helps evaluate the ability to attract website traffic and which provides useful information for the creation of a strategy to increase visibility (Khan; Mahmood, 2018). However, despite the consolidation of companies developing widely used SEO analytics tools of considerable maturity, the need remains to evaluate their validity and reliability when applied to a range of different contexts (García-Carretero et al., 2016).
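The PageRank-style calculation described above for Domain Rating can be illustrated with a minimal sketch. This is not the algorithm used by Ahrefs, Moz, or Semrush, whose details are not public; the function name, the damping factor of 0.85, the iteration count, and the toy link graph are all assumptions made for illustration only. It shows how a domain's score depends on how many domains link to it, on the scores of those linking domains, and on how many other domains each of them links to.

```python
def domain_pagerank(links, damping=0.85, iterations=50):
    """Toy PageRank over a domain-level link graph (illustrative only).

    links maps each domain to the set of domains it links to.
    Returns a dict of scores that sum to 1.
    """
    # Collect every domain that appears as a source or as a target.
    domains = set(links) | {t for targets in links.values() for t in targets}
    n = len(domains)
    score = {d: 1.0 / n for d in domains}

    for _ in range(iterations):
        # Every domain receives a small baseline share.
        new_score = {d: (1.0 - damping) / n for d in domains}
        for source in domains:
            targets = links.get(source, ())
            if targets:
                # A linking domain splits its score among the domains it links to.
                share = damping * score[source] / len(targets)
                for target in targets:
                    new_score[target] += share
            else:
                # A domain with no outgoing links spreads its score evenly.
                share = damping * score[source] / n
                for d in domains:
                    new_score[d] += share
        score = new_score
    return score


# Hypothetical link graph: two domains link to c.example, which links back to a.example.
toy_links = {
    "a.example": {"c.example"},
    "b.example": {"c.example"},
    "c.example": {"a.example"},
}
toy_scores = domain_pagerank(toy_links)
```

In this toy graph, c.example ends up with the highest score because it receives links from two domains, while b.example, which receives none, ends up with the lowest. A commercial metric would additionally map such raw scores onto the 0 to 100 logarithmic scale, a step that is not modeled here.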
To the aforementioned scarcity of information about Google's algorithm, we can add the rather vague, general descriptions provided by the three companies regarding the operation of their respective versions of domain authority scores.
In light of this situation, this study seeks to evaluate the reliability of the indicators developed by Moz, Semrush, and Ahrefs for measuring domain authority. The need arises because we do not know, in any great detail, how these companies calculate domain authority, nor what data they use to do so. More specifically, the goal of this study is to determine the extent to which the three companies coincide in their calculation of the domain authority applied by Google and, in this way, to deduce their reliability. These results should be of particular interest to SEO professionals and researchers alike.

Methodology
We hypothesize that the three domain authority indicators provide very similar values. If we are able to corroborate this hypothesis, then it can be deduced that the three companies are reliable, since the values of one platform serve to cross verify the values from the other two. Reliability is understood here in relation to the objective the companies pursue, i.e. providing similar metrics for the quality signals that depend on the overall website and which are applied by Google in its ranking algorithm (Schwartz, 2016).
However, the three companies calculate domain authority based on their own index data; these are, as such, three different indices, obtained independently. Moreover, the details of the calculation procedures employed are not made known to the general public, nor are they shared between the three companies. They are direct competitors and naturally keep the factors that are included, and the weighting given to them, secret. Yet, they must necessarily employ similar methods of calculation given that they have the same origin and objective; it remains unknown, however, just how similar they are.
In adopting such an approach to this study, we implicitly apply, albeit at a small scale, a method based on data triangulation. The same indicator (i.e. domain authority) is calculated for a sample of domains using three different data sources. We then compare and contrast the degree to which the scores coincide: the greater the match, the more reliable the data from all three sources can be considered.

The objective of the triangulation method is to confirm or validate the results of a study applying different methodologies, data sources, theories and even researchers (Thurmond, 2001; Wilson, 2014; Arias-Valencia, 2000; Heale; Forbes, 2013). The main advantage of triangulation is that when two strategies provide similar results, the findings are corroborated, increasing the internal validity of the study (Feria-Avila; Matilla-González; Mantecón-Licea, 2019).
In the present case, we have not only two, but three, different data sources and, if the hypothesis is upheld and the three sources are similar, then the domain authority indicators will have been doubly validated. Each indicator, therefore, was triangulated by the other two, as follows:
- Moz was triangulated by Ahrefs and Semrush;
- Semrush was triangulated by Ahrefs and Moz;
- Ahrefs was triangulated by Moz and Semrush.
The direct consequence of this validation is that we are able to obtain clear indications of the reliability of the three platforms responsible for their calculation, which constitutes the ultimate goal of this study.
The degree to which the three coincide, moreover, is an important factor to bear in mind: the greater the match, the greater the reliability. To determine just how similar the values of the three tools are, a statistical analysis based on Spearman's correlation coefficient (rho) was used. Correlation coefficients measure the strength of the association between two variables: the greater the correlation, the stronger this association is, in the sense that if one variable increases (or decreases), so will the other. As such, this statistic also indicates the degree of similarity between the two variables analyzed, which is precisely our objective here. Spearman's rho was selected rather than Pearson's r because the variables corresponding to the three tools examined are not normally distributed.
For the calculation, the three indicators were paired off, obtaining the following three pairs: Ahrefs vs. Moz, Ahrefs vs. Semrush, and Moz vs. Semrush. Next, the Spearman's correlation coefficient of each pair was calculated and, by so doing, we were able to obtain a partial comparison with pairs of variables. Subsequently, the three pairs were unified to obtain a single value to express the degree of general coincidence.
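The pairwise comparison can be sketched in a few lines of code. The following is a minimal illustration under stated assumptions, not the procedure actually run for the study: the function names and the example scores are hypothetical, Spearman's rho is computed as Pearson's r on average ranks (the standard definition when ties are present), and each pair is restricted to the domains for which both tools returned a score.

```python
from itertools import combinations
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


def pairwise_spearman(scores):
    """scores maps tool name -> {domain: score}; returns rho for each pair of tools."""
    result = {}
    for tool_a, tool_b in combinations(sorted(scores), 2):
        # Only domains scored by both tools can be compared.
        shared = sorted(set(scores[tool_a]) & set(scores[tool_b]))
        result[(tool_a, tool_b)] = spearman_rho(
            [scores[tool_a][d] for d in shared],
            [scores[tool_b][d] for d in shared],
        )
    return result


# Hypothetical scores, for illustration only (not data from the study).
example_scores = {
    "Ahrefs": {"a.example": 91, "b.example": 40, "c.example": 12, "d.example": 66},
    "Moz": {"a.example": 88, "b.example": 35, "c.example": 20, "d.example": 70},
    "Semrush": {"a.example": 80, "b.example": 45, "c.example": 15},
}
example_rho = pairwise_spearman(example_scores)
```

Because the coefficient depends only on ranks, it is unaffected by the different ways in which each platform scales its scores, which is one reason it suits a comparison of this kind.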
It is normal practice to use Spearman's correlation coefficient in SEO research to identify which factors play a role, and the extent to which they do so, in the relevance ranking algorithms of Google (Ziakis et al., 2020; Rovira et al., 2019; Codina; Lopezosa, 2021; Tavosi; Naghshineh, 2022) and Google Scholar (Rovira; Guerrero-Solé; Codina, 2018). Applying the reverse engineering method, the native order provided by Google in a sample of unbiased searches is correlated with a second ranking of the same websites, but this time applying a single ranking factor, i.e. the one under study. The higher the correlation, the more similar the two rankings are and, consequently, the more importance the studied factor can be considered as having in Google's ranking algorithm.
The context and objective of the present study differ, however, as we are not seeking to implement reverse engineering but rather to conduct a simple triangulation of data. Having said that, the role played by our statistical analysis (in this case, Spearman's correlation) is identical. In both cases, the similarity of two variables, that is, two different rankings of the same sites, domains or webpages, is measured.
To carry out the statistical analysis, we first selected a sample of domain names, avoiding biases, especially of a thematic or geographical kind. Subsequently, the domain authority scores provided by the three tools were obtained for each domain. To select the sample of domains, different searches were conducted in Google using keywords selected in the most neutral way possible. To avoid any bias, the selection of keywords was made by applying two criteria: 1) fifty words of four or more characters were selected from those identified as being the most frequently used on the web, according to the WordFrequency ranking, and 2) the eleven keywords used most during the previous six months to conduct searches on Google, according to Google Trends, were added. The selection was made in November 2022, just before the data collection (Table 2). It should be noted that the order by relevance of the results of these searches has no influence on the study. The objective of the searches was exclusively to select a random sample of domain names so as then to be able to correlate the domain authority values awarded to them by the three platforms. At no time does the order in the list of results interfere with this objective.
For the collection of data, we used extensions installed in the browser. These allow SEO metrics to be obtained, both from the webpage that is being visited and from listings of Google's results. The following browser extensions were used: The three extensions were installed in the Google Chrome browser to carry out all searches and to obtain the scores corresponding to the domain sample for each of the three indicators.
As discussed, to carry out the study, 61 keywords were used (see Table 2). Thus, we conducted a total of 183 searches, with each word being searched for three times, that is, once for each extension. Google's settings were adjusted to display 100 results per page and the values corresponding to Domain Authority, Authority Score, and Domain Rating were extracted. In total, 16,937 results were obtained.
All the searches were conducted simultaneously and in the same geographical location to avoid any potential bias. Subsequently, the URL information corresponding to the path, file name and parameters was removed, giving us 6,268 domains. All duplicates were then removed, leaving a final sample of 3,151 distinct domains.
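The reduction from result URLs to distinct domains can be sketched as follows. This is an illustrative sketch only: the function name and the example URLs are hypothetical, and it assumes that the host name is kept exactly as it appears in the URL (for example, without merging "www" variants or subdomains), a detail the text does not specify.

```python
from urllib.parse import urlsplit


def distinct_domains(urls):
    """Strip path, file name and parameters from each URL and drop duplicates.

    Returns the host names in order of first appearance.
    """
    seen = set()
    domains = []
    for url in urls:
        # hostname is already lower-cased by urlsplit; it is None if the URL has no scheme.
        host = urlsplit(url).hostname
        if host is None:
            continue
        if host not in seen:
            seen.add(host)
            domains.append(host)
    return domains


# Hypothetical search results, for illustration only.
example_urls = [
    "https://example.com/news/article.html?id=7",
    "https://example.com/about",
    "http://Example.org/",
]
example_domains = distinct_domains(example_urls)
```

Deduplicating on the host name is what turns a list of search results, in which popular sites recur across many queries, into a sample in which each domain is counted once.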
Data triangulation was conducted in different phases. First, the reliability of the tools was evaluated to test the hypothesis that the values of the three indicators are similar. To do this, the data were triangulated by correlating pairs of tools, that is, Ahrefs with Moz, Ahrefs with Semrush, and Moz with Semrush. There are only three pairs because the order is interchangeable, that is, the correlation of A with B is the same as that of B with A. Then, second, a global analysis was also conducted, combining the three pairs and integrating all the data in a single sample. Third, based on the initial statistical analyses, we were able to verify that the degree of coincidence was not homogeneous for all domain authority scores. In domains of low authority, the coincidence was lower than in those with high authority scores. For this reason, we opted to carry out an additional statistical analysis comparing low with high values to determine the degree of difference.
A domain was classed as having "high" authority when its score was greater than 50 and as having "low" authority when the indicator was 50 or less. When collecting data with the different applications, if a domain scored more than 50 on one application but less on another, it was classed as "mixed". In the segmented analysis, mixed values were discarded, but they are included in the sample and in the global analysis.
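The classification rule just described can be written down directly. The sketch below is illustrative: the function name is hypothetical, and it applies the threshold of 50 stated above to a pair of scores from two tools.

```python
def classify_pair(score_a, score_b, threshold=50):
    """Classify a domain by the scores two tools assign to it.

    "high" if both scores are above the threshold, "low" if both are
    at or below it, and "mixed" if the two tools disagree.
    """
    high_a = score_a > threshold
    high_b = score_b > threshold
    if high_a and high_b:
        return "high"
    if not high_a and not high_b:
        return "low"
    return "mixed"


def segmented_pairs(pairs, threshold=50):
    """Split (score_a, score_b) pairs into high and low subsamples, discarding mixed ones."""
    subsamples = {"high": [], "low": []}
    for score_a, score_b in pairs:
        label = classify_pair(score_a, score_b, threshold)
        if label != "mixed":
            subsamples[label].append((score_a, score_b))
    return subsamples
```

For example, a domain scoring 50 on one tool and 51 on another falls on both sides of the threshold and is therefore classed as mixed and left out of the segmented analysis.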

Results
As indicated, the sample of domains for analysis was selected from 61 Google searches. These provided 16,937 results with a total of 3,151 different domains, which constituted our main sample. The second column in Table 3 shows the number of domains by SEO company for which the extensions actually provided a domain authority score. As is evident, in all three cases, this number is lower than the total of 3,151 domains analyzed. This is attributable to the fact that for 10% of the domains this indicator could not be obtained because the domain was not present in the platform's index. Note that the row headed "Total" corresponds to an aggregate analysis for all three companies.
In the statistical analysis by pairs of the three platforms, Spearman's correlation coefficients greater than 0.9 were obtained in all cases (Table 4 and Figures 2, 3, and 4). This indicates a very high correlation and, therefore, we can deduce a very high similarity between the domain authority values provided by the three companies.
When combining all the pairs by aggregating the data from the three comparisons, we obtain a global value for all the data. In this case, too, the coefficient is higher than 0.9 (see the last row of Table 4 and Figure 1).
In all cases, the p-values are indicative of statistical significance. All four figures highlight this strong correlation, with the data points concentrated along the diagonal. Thus, we conclude that the hypothesis is verified: the high correlation coefficients indicate that the data from the three companies are reliable, especially given that they have been doubly triangulated.
In addition, we divided the main sample into subsamples based on the value of domain authority (i.e. high, low or mixed), with 79% of the domains being assigned a high value and 21% a low value. The reason for this imbalance is that the first 100 results of each search were selected and normally the highest ranked results tend to have a high domain authority, given that this is an important factor in Google's ranking algorithm.
The analysis of the two subsamples shows that the correlation coefficients of the domains with low authority are notably lower than those of the domains with high authority. In addition, the z-scores indicate that these differences are statistically significant, since, as can be seen in the last column of Table 5, all the values are greater than 1.96.
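The text does not state which test produced these z-scores. One common way to compare two correlation coefficients obtained from independent subsamples is Fisher's r-to-z transformation, sketched below as an assumption rather than as the procedure actually used. The function name and the example values are hypothetical, and the transformation is strictly derived for Pearson's r, so applying it to Spearman's rho is an approximation.

```python
from math import atanh, sqrt


def fisher_z_difference(r_high, n_high, r_low, n_low):
    """z-score for the difference between two independent correlation coefficients.

    Each coefficient is transformed with z = atanh(r); the difference is then
    divided by its standard error, sqrt(1/(n1 - 3) + 1/(n2 - 3)).
    """
    z_high = atanh(r_high)
    z_low = atanh(r_low)
    standard_error = sqrt(1.0 / (n_high - 3) + 1.0 / (n_low - 3))
    return (z_high - z_low) / standard_error


# Hypothetical coefficients and subsample sizes, for illustration only.
example_z = fisher_z_difference(r_high=0.9, n_high=100, r_low=0.6, n_low=100)
significant = abs(example_z) > 1.96  # two-tailed test at the 5% level
```

Under this approach, a z-score whose absolute value exceeds 1.96 means that the difference between the two coefficients is significant at the 5% level in a two-tailed test, which matches the threshold cited above.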
In Table 5 and Figure 5, we eliminate the mixed data, that is, where one of the variables presents a high value and the other a low one, since they do not belong to either of the two subsamples. Thus, there are no biases in the results as we are comparing exclusively sites of high and low domain authority. The values corresponding to domains with low authority and a low correlation coefficient (last row of Table 5) are located in the lower-left quadrant of Figure 2. As is evident, the points are clearly more dispersed than is the case of the values of the domains with high authority, located in the upper-right quadrant of the same figure. This dispersion can also be seen in Figures 1, 2, 3, and 4, which show all the values, including the mixed ones. In the following section, we seek to provide explanations for this difference.

Data analysis
The most surprising outcome to emerge from the preceding study is the high degree of agreement between the values of domain authority calculated by the three SEO companies. Employing different indices but largely similar calculation procedures, albeit with some differences, the companies' tools provide very similar values.
Two findings emerge from the data analysis that seem to confirm the fact that the three companies do indeed operate different indices. The first is that 10% of the domains do not appear in one or more of the three indicators. The second is that the correlation coefficient of the subsample of low domain authority is lower than that recorded for the high domain authority subsample. This difference is statistically significant and would appear to indicate that the amount of information available to the three platforms in relation to the low authority domains is not as great, and does not coincide to the same degree, as that available to them in relation to the subsample of high authority domains. This explanation, however, needs to be corroborated by conducting new studies based on larger samples, especially as far as the sample of low authority domains is concerned, as it represents just 21% of the data here.
The affirmation that we are in fact dealing with three different indices, and three calculation procedures with certain differences, is important as regards the ultimate objective of the present study: specifically, an evaluation of the reliability of these domain authority tools. Clearly, if we were dealing with three versions of the same index based on similar calculation methods, we would not need to apply the triangulation methodology, since we would be working with a single data source. Thus, what we have are indeed three different data sources but which overall provide very similar data, something we have been able to confirm by double triangulation following the pairing of the three sources.
The correlation coefficients are in all cases greater than 0.9. This is true both for the comparison between pairs of tools and for the overall analysis. These high coefficients confirm that the domain authority values for all three platforms are very similar. Even when analyzing the subsample of low domain authority, for which the correlation coefficient is lower, we still obtain a strong correlation above 0.6. As all the data are highly correlated, we can conclude that they are globally very similar. Each tool triangulates the other two, corroborating their values. Thus, we can safely state, as hypothesized, that all three tools are reliable.

Conclusions
The methodological and statistical framework designed to undertake this study has had the sole purpose of determining the extent to which the domain authority values provided by Moz, Semrush, and Ahrefs can be considered reliable. According to the results obtained, and as discussed above, the three tools are highly reliable. The values of the correlation coefficients and the cross validation provided by triangulation are indisputable in this regard.
As stressed throughout this study, none of the three indicators studied is directly employed by Google in its ranking. But we can conclude that Moz's Domain Authority, Semrush's Authority Score, and Ahrefs' Domain Rating are three good estimates of the metrics that act at the website level and which Google uses in its ranking algorithm. As discussed, various studies consider this metric to be a quality signal and use it to gain new insights. Our study has provided evidence of the validity of the three commercial versions of the indicator and, in so doing, contributes a more solid basis for future studies in this same line of academic research focused on search engine optimization.
This study, it should be noted, is not without its limitations. Here, the most obvious limitation occurs when selecting the sample. Even though the Google results page was expanded to obtain 100 items per search query, most of the resulting websites present a high domain authority. With this in mind, in future studies we wish to increase the sample size and analyze a greater number of domains, while seeking to ensure that the proportion between the subsamples of high and low authority domains is more balanced. We also intend to include other companies that have tools for calculating domain authority, most obviously Majestic and its Trust Flow indicator. Likewise, it would be especially interesting to develop a procedure that would allow us to obtain more precise indications of the degree of similarity between the domain authority values provided by the tools of these companies and Google's metrics on the quality signals of an overall website. However, given the lack of transparency on the part of Google, this task will be far from easy.
The results of the present study demonstrate the reliability of the domain authority calculated by Moz, Semrush, and Ahrefs