Using algorithms to identify social activism and climate skepticism in user-generated content on Twitter

Climate change has become an issue of great relevance in society in recent years.


Introduction
At present, social platform users actively share information about their activities, ideas, and personal experiences through their smartphones (Dubrofsky; Wood, 2014), which leads to the generation of massive amounts of data. In recent years, such user-generated content has been used extensively for research in the social sciences (Schmid, 2016) and has been analyzed in terms of both language aspects and contexts. The results of such analysis provide a comprehensive understanding of users' behavior (Schmid, 2016; Terkourafi; Haugh, 2019).
Social identity, which is an essential component of self-concept, stems from an individual's perception of their membership of a social group or groups, as well as from the significance that this individual ascribes to that membership (Tajfel, 1974). Therefore, social identity explains how identification works from the individual, interactional, and institutional perspectives (Jenkins, 2014). Accordingly, social identity has been defined as the individual's concept of the self with respect to specific aspects of social behavior (Tajfel, 1981; Kastanakis; Balabanis, 2012; Singh et al., 2021).
Among the major characteristics of social media platforms is users' organization into networks, that is, communities that share common interests. This makes social platforms a valuable source of data for social scientists investigating different cultural and social issues (Ntontis et al., 2018).
Another important characteristic of social platforms is that their users collectively create and interact with content. This content, referred to as user-generated content (UGC), includes any content created by social platform users that is publicly shared with other users (Reyes-Menéndez et al., 2020). This makes social platforms such as Twitter structured communities where UGC offers an enriched source of users' activities (Fujita; Harrigan; Soutar, 2018). The analysis of the network structure on social platforms enables an analysis of not only the individual activities of specific users but also their social activism. Starting social activism on a social platform is facilitated by the joint performance of the following three effects: network structure, collaboration, and the interaction of users (Ntontis et al., 2018; Reyes-Menéndez; Álvarez-Alonso, 2019). Social activism can emerge around a social profile, for instance, @GretaThunberg (Olesen, 2022), or a hashtag (#), such as #WorldEnvironmentDay (Reyes-Menéndez; Saura; Álvarez-Alonso, 2018).
The issue of climate change has grown in importance in recent decades, causing greater social concern about the effects it may have in the future.
Public interest in climate change is growing, and the European Union, for example, has earmarked significant funding within the Horizon 2020 program for the study of this issue. The European Green Deal Research and Innovation Program funded a study aimed at collecting data on climate change and human opinions via Twitter, spanning 13 years and including more than 15 million tweets spatially distributed around the world. The variables analyzed were geolocation, user gender, climate change stance and sentiment, aggressiveness, deviations from historical temperature, topic modeling, and information about environmental disaster events. The data provided by scientists, dramatic extreme weather events, and the reports published periodically by the Intergovernmental Panel on Climate Change (IPCC) have highlighted the need to act quickly and forcefully. As pointed out by Eide and Kunelius (2021), the year 2018 represented a turning point toward a general discourse on the subject, further promoting activism and conveying a message of urgency. As the European Parliament has done, this discourse speaks of a "climate emergency" rather than of climate change. This idea of urgency, and of risk if action is not taken quickly, is also reinforced by movements such as Fridays for Future, which have had significant social repercussions and have demonstrated the importance of citizen mobilization, in this case led by young people. All this favors an activist stance on the issue that involves citizens rather than traditional agents such as nongovernmental organizations (NGOs).
Despite the data and general consensus in the scientific community, the climate change debate is highly polarized (Dunlap; McCright, 2011; Elgesem; Steskal; Diakopoulos, 2015; Hoggan, 2009; Washington; Cook, 2011; Moernaut et al., 2022; Pearce et al., 2019), and along with the voices that promote social awareness, there is also a current of thought that denies or questions climate change and downplays its effects or the role that people themselves play in it (anthropogenic climate change theory).
Social platforms have become an environmental protest space where users express their opinions and concerns about this topic. For example, the findings of research on the #WorldEnvironmentDay tag include conclusions that, among all the Sustainable Development Goals (SDGs), those of most concern to users are related to the environment and public health, such as climate change, global warming, extreme weather, water pollution, deforestation, climate risks, acid rain, and massive industrialization.
In this sense, relying on the aforementioned research, it becomes clear that the Twitter platform offers an opportunity to analyze UGC related to environmental issues such as climate change (Pearce et al., 2019; Moernaut et al., 2022) from a user perspective by enabling an analysis of both types of interactions: those organized around a profile and those organized around a hashtag.
In this context, we seek in this work to understand the social skepticism around climate change through an analysis of users' social activism and behavioral patterns. The analysis was performed on a UGC corpus of a total of 78,168 tweets using textual analysis techniques. The first of these techniques was latent Dirichlet allocation (LDA), a machine-based technique, applied in combination with a corpus linguistic approach. We also performed discourse analysis using the log-likelihood and mutual information (MI) statistical measures.
The research question (RQ1) addressed in the present study is: What are the key indicators of social skepticism around climate change according to the analysis of users' social activism and behavioral patterns on Twitter?
In what follows, we explain the theoretical framework of the present study (Section 2). This is followed by a description of the data collection process and the methodology (Section 3). The results are reported in Section 4. Section 5 presents the discussion. Conclusions, limitations of the present study, and directions for further research are presented in Section 6.

Theoretical framework
Previous research has focused on understanding social activism around climate change (Reyes-Menéndez; Saura; Álvarez-Alonso, 2018; Pearce et al., 2019; Moernaut et al., 2022). However, the conversation on climate change is highly polarized (Elgesem; Steskal; Diakopoulos, 2015; Pearce et al., 2019; Moernaut et al., 2022). In general, people adopting these two positions are referred to as accepters/believers and skeptics in the literature. Authors such as Washington and Cook (2011) question the use of the term "skeptics" and propose that it would be more correct to call those who oppose the theory of anthropogenic climate change "deniers". However, we use the term "skeptics" herein to refer both to those who deny and to those who question or minimize the scientific data or theories that indicate that climate change is taking place, because this term is most commonly used in previous studies ( show less concern about climate change than those who defend a leftist position. However, more work is needed since one study carried out in Germany by Engels et al. (2013) found a negative correlation between political participation and skepticism. Another factor that has been studied is the influence of geographical region. Whitmarsh and Capstick Germany) and studies carried out in the United States (Smith; Leiserowitz, 2012) and Great Britain (Corner; Markowitz; Pidgeon, 2014; Capstick et al., 2015) also highlight skepticism in public opinion and further suggest that it has become especially marked over the last two decades.
Some of the factors that are argued to be possible reasons for the greater skepticism in public opinion in recent years are:
- news in the media and skeptical positions defended by politicians (Corner; Markowitz; Pidgeon, 2014) or the scientific community (Lahsen, 2013);
- a lack of commitments, which were postponed to subsequent summits, at the Copenhagen UNFCCC in 2009 (Van-Eck; Feindt, 2022); and
- the climategate case (Grundmann, 2013; Matthews, 2015; Van-Eck; Feindt, 2022).
Notably, the level of education and scientific knowledge are not important factors in explaining this position (Kahan et al., 2012; Whitmarsh, 2011), and contrary voices can even be heard within the scientific community itself (Lahsen, 2013), which may also have contributed to increasing the level of skepticism. Additionally, opinions that deny climate change have had greater acceptance.
McCright and Dunlap (2003), analyzing documents produced by 14 different conservative think-tanks between 1990 and 1997, conclude that climate skeptics challenge the science of global warming by:
- treating supporting evidence as weak or nonexistent;
- highlighting the potential net benefits that might result if climate change should occur; and
- arguing that policies designed to address climate change would be economically harmful and ineffective.
Given this increase in climate skepticism, various studies have tried to establish a categorization or typology for it, although a consensus has yet to be achieved owing to the different viewpoints and attitudes associated with climate skepticism (Matthews, 2015). Capstick and Pidgeon (2014) distinguish two categories:
- epistemic skepticism: those who question science; and
- response skepticism: those who question the value of acting to prevent climate change.
Lahsen (2013) analyzes the positions defended by scientists and distinguishes two types:
- mainstream scientists, who show moderate levels of skepticism; and
- contrarian scientists, who show a high level of skepticism.
On the other hand, Matthews (2015) analyzes the communication from climate skeptics in blogs and distinguishes three degrees of skepticism:
- lukewarmers: who believe that pollution is affecting the planet and will continue to do so but that its impact is less than what the experts predicted; therefore, they do not deny climate change but understand that the generated concern is exaggerated;
- moderate skeptics: who do not consider global warming to be a problem, believe that it has been exaggerated, and distrust the scientific theories that defend it; they understand that climate change has occurred throughout history but depends more on natural processes than on human action; and
- strong skeptics: who do not believe in the opinions of climate scientists or activists and think they are dishonest and fraudulent.
Social networks have brought about a change in traditional communication structures, making it possible for messages to be spread by citizens so that they coexist alongside the messages of traditional gatekeepers (legacy news media, companies, political parties, or the scientific community). Social media promote a more interaction-oriented and open horizontal communication than legacy media (Dahlberg, 2001). Especially over the last decade, it has been observed that people turn to social networks to search for information and to understand and discuss different scientific topics (Anderson; Huntington, 2017; Su et al., 2015). This represents a great opportunity because it enables social debate on relevant issues such as climate change, but at the same time it can contribute to misinformation and polarization. Williams et al. (2015) propose that the online debate on climate change is polarized, with each group (believers and skeptics) considering the position of its opponents to be illegitimate or unnatural. Social media platforms make it easier for anti-climate-change activists to spread their ideas than it would be in legacy news media (Moernaut et al., 2022).
In their work, Bolsen and Shapiro (2017) review the climate change topic in the US news media and the emergence of related frames in the public discourse, focusing on divisions and highlighting the role that events, journalistic practices, technological changes, and individual-level factors such as ideology and identity have played in fostering polarization. They identify the core challenges facing communicators who seek to build consensus for action on climate change and highlight the most viable solutions for generating efficient messages.
In "The US news media, polarization on climate change, and pathways to effective communication," Bolsen and Shapiro (2017) review the results obtained from various studies over the years regarding the debate taking place about climate change on online platforms and social networks. Regarding YouTube use in the United States, they report that post-video discussions among members of the YouTube-viewing public tend to debate the science of climate change regardless of its relevance to the content of the videos to which they are attached (Bolsen; Shapiro, 2017). In other words, the public is using YouTube, and likely other social media discussion platforms, not to deliberate but rather to campaign for increased activism or skepticism about climate change.
One of the recommendations they make is that communicating the existence of a scientific consensus about human-caused climate change shifts the public's belief toward the scientific consensus.
There has been extensive research on social activism on social platforms. Although different elements of interaction can serve as objects of such analysis, the means most frequently used to identify relevant content are keywords, whether they incorporate a hashtag or not (Zappavigna, 2015) (Table 2). Of all the identified movements about climate change, special attention has been paid to those regarding anti-climate views or climate skepticism. The main platform from which the analyzed content has been obtained is Twitter. For this, user profiles (Hardaker; McGlashan, 2016) or hashtags (Singh et al., 2018) have been used.
Despite its advantages, Twitter can also contribute to disinformation and polarization. Williams et al. carried out an interesting study analyzing user opinions on Twitter and concluded that active users (either skeptics or believers) show strong attitudes in their discussions about climate change, "characterized by strong attitude-based homophily and widespread segregation of users" (Williams et al., 2015, p. 135).
Anderson and Huntington (2017) also analyze the sentiment of comments on Twitter, finding a persistent presence of incivility and sarcasm. The authors find that these characteristics are more frequent among skeptics and those who mention right-leaning politics in their profiles.
It is common to analyze comments in important moments such as weather events (Anderson; (2004), a shared reality is created through words and their specific uses in a discourse. Accordingly, an analysis of language opens up a way to understand the shared reality, as well as the underlying shared identity, of its participants. As mentioned above, social identity is shaped by individuals' perceptions of their belonging to a social group or groups and by the significance they attach to this (Tajfel, 1974). In this respect, Grover et al. (2019) argue that users' exposure to certain Twitter content can reinforce their previous opinions, thus causing a polarization of such views. A parallel process that can also occur is acculturation, which is defined as the adaptation of an individual's views and opinions under the influence of individuals or groups from other cultural backgrounds. This suggests that such interactions should be carefully investigated (Stieglitz et al., 2018). Interestingly, a study that used the information system success model showed that the influence of UGC can occur at not only the user level but also the organizational and social levels (Alalwan et al., 2017). Therefore, online social movements can be investigated through the analysis of UGC on social platforms.
Contrary to the aforementioned studies, another paradigm that can be very useful in terms of providing a holistic perspective is that of information management (Dwivedi; Kapoor; Chen, 2015; Pace; Buzzanca; Fratocchi, 2016). From this perspective, what matters the most is not the management of information but rather the ways in which information must be provided to initiate changes in individuals' behavior. Through a review of the literature on climate skeptics, we identified an important research gap in previous research, specifically regarding social activism and climate skepticism in UGC. To address this research gap, we investigate herein the association among topics that determine the social skepticism around climate change. Our aim is to identify relevant users' social activism and behavioral patterns.
To this end, in our application of the holistic approach, we focus on both differences ( With regard to the latter, the hypothesis tested in the present study is as follows: (H1): There will be correlations between the UGC topics that identify the social skepticism around climate change through the analysis of users' social activism and behavioral patterns.
The data were collected from Twitter, with a focus on keywords related to social skepticism around climate change.
The collected tweets were then analyzed using corpus linguistics tools. In doing so, we adopted the approach previously proposed by Fujita, Harrigan and Soutar (2018). We also drew on the computational analysis of texts about feminism carried out by Al-Nakeeb and Mufleh (2018).

Data collection
Following the work of Reyes-Menéndez, Saura and Álvarez-Alonso (2020), we extracted a sample of tweets containing keywords related to the climate skeptic movement, posted between World Environment Day (June 5) and October 2, 2022, for subsequent linguistic analysis. The optimal sample size was determined using previous studies (Saura; Rodríguez-Herráez; Reyes-Menéndez, 2019; Hardaker; McGlashan, 2016). The criteria used to extract the initial tweets collected in the present study are presented in Table 3, resulting in 78,168 tweets. To collect the database of tweets, we used Python 3.7.0. Next, since our aim was not to analyze multimedia content, a series of quality filters were applied to clean the data, and we eliminated images and videos (Saura; Reyes-Menéndez; Álvarez-Alonso, 2018). To increase the quality of the data, we also removed URLs from the tweets. Python and the Pandas library were used for data cleaning. Specifically, we ran commands to select or replace columns and indices, to reshape lost or empty values, and to remove repeated or unnecessary data. Finally, since retweets represent users' opinions and individual behaviors, they were analyzed separately. Table 4 presents examples of the tweets included in the final sample.
In this way, once the algorithm determines the total number of words and that of repeated words, as well as the number of each of the most frequent words that occur before and after the identified words, each topic is assigned a name. The quality of the data is important for the quality of our model, so we preprocess the data by removing symbols with regular expressions, tokenizing, deleting punctuation, creating N-grams (bigrams and trigrams), applying lemmatization, and removing stop-words. Using a standardized process of the LDA model based on grounded theory studies, each topic's name is derived from the words within each of the clusters identified.
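The cleaning and preprocessing steps described above can be illustrated with a minimal sketch. It is not the pipeline used in the study: it relies only on the Python standard library rather than Pandas so that it stays self-contained, the stop-word list and the sample tweets are hypothetical, retweets are detected with the assumed "RT @" prefix convention, and lemmatization is omitted because it requires an external library.

```python
import re

# Hypothetical, deliberately tiny stop-word list; a real run would use a fuller one.
STOP_WORDS = {"the", "is", "a", "an", "of", "and", "to", "in", "on", "for", "it"}

URL_RE = re.compile(r"https?://\S+")
SYMBOL_RE = re.compile(r"[^a-z0-9\s]")  # anything except lowercase letters, digits, whitespace


def clean_tweet(text):
    """Lowercase, drop URLs and symbols, tokenize on whitespace, remove stop words."""
    text = URL_RE.sub(" ", text.lower())
    text = SYMBOL_RE.sub(" ", text)
    return [tok for tok in text.split() if tok not in STOP_WORDS]


def ngrams(tokens, n):
    """Return the n-grams of a token list, joined with underscores."""
    return ["_".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def split_retweets(tweets):
    """Drop exact duplicates, then separate retweets from original tweets."""
    seen, originals, retweets = set(), [], []
    for text in tweets:
        if text in seen:
            continue
        seen.add(text)
        (retweets if text.startswith("RT @") else originals).append(text)
    return originals, retweets


# Hypothetical sample data, for illustration only.
tweets = [
    "Climate hoax again! https://example.com/x #ClimateHoax",
    "Climate hoax again! https://example.com/x #ClimateHoax",
    "RT @someone: The farmers are protesting",
]
originals, retweets = split_retweets(tweets)
tokens = clean_tweet(originals[0])
bigrams = ngrams(tokens, 2)
trigrams = ngrams(tokens, 3)
```

Keeping retweets in a separate list mirrors the decision to analyze them separately; the token lists and N-grams are the kind of input a topic model would then consume.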

Corpus linguistics and the latent Dirichlet allocation model
Thus, the LDA model consists of the following two steps: First, all keywords present in the corpus are obtained. Second, the topics linked to these keywords are identified (Reyes-Menéndez et al., 2020). To identify topics in a maximally objective way, the mathematical distribution shown in Equation (1) is employed.
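$$p(\beta_{1:K}, \theta_{1:D}, z_{1:D}, w_{1:D}) = \prod_{i=1}^{K} p(\beta_i) \prod_{d=1}^{D} p(\theta_d) \left( \prod_{n=1}^{N} p(z_{d,n} \mid \theta_d)\, p(w_{d,n} \mid \beta_{1:K}, z_{d,n}) \right) \qquad (1)$$
where:
- $\beta_i$ is the distribution of words in topic $i$, among a total of $K$ topics;
- $\theta_d$ is the proportion of topics in document $d$, among a total of $D$ documents;
- $z_d$ is the topic assignment in document $d$;
- $z_{d,n}$ is the topic assignment for the $n$th word in document $d$, among a total of $N$ words;
- $w_d$ is the observed words for document $d$; and
- $w_{d,n}$ is the $n$th word for document $d$.
In the next step, to identify the topics that make up the dataset, we used Gibbs sampling [Equation (2); Jia, 2018] using the Mac version of the Python software LDA 1.0.5.
(2)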
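To make the Gibbs sampling step more concrete, the following is a minimal sketch of collapsed Gibbs sampling for LDA in its commonly used textbook form; it is not necessarily the form of Equation (2) or the implementation used in the study. Each word's topic is resampled with probability proportional to (n_{d,k} + alpha) (n_{k,w} + eta) / (n_k + V eta), where the counts exclude the word being resampled. The symmetric priors alpha and eta, the number of iterations, and the toy corpus of word ids are all assumptions made for illustration.

```python
import random


def gibbs_lda(docs, num_topics, vocab_size, alpha=0.1, eta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA on documents given as lists of word ids."""
    rng = random.Random(seed)
    doc_topic = [[0] * num_topics for _ in docs]                 # n_{d,k}
    topic_word = [[0] * vocab_size for _ in range(num_topics)]   # n_{k,w}
    topic_total = [0] * num_topics                               # n_k
    assignments = []

    # Random initialisation of the topic assignment z_{d,n} of every word.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_d)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for n, w in enumerate(doc):
                # Remove the current assignment from the counts.
                k = assignments[d][n]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Unnormalised conditional probability of each topic for this word.
                weights = [
                    (doc_topic[d][j] + alpha)
                    * (topic_word[j][w] + eta)
                    / (topic_total[j] + vocab_size * eta)
                    for j in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][n] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    # Point estimates of theta_d (topic proportions) and beta_i (word distributions).
    theta = [
        [(c + alpha) / (len(doc) + num_topics * alpha) for c in doc_topic[d]]
        for d, doc in enumerate(docs)
    ]
    beta = [
        [(c + eta) / (topic_total[k] + vocab_size * eta) for c in topic_word[k]]
        for k in range(num_topics)
    ]
    return theta, beta


# Toy corpus (hypothetical): word ids 0-2 and 3-5 never share a document.
docs = [[0, 1, 2, 0, 1], [0, 2, 1, 2], [3, 4, 5, 3], [4, 5, 3, 5, 4]]
theta, beta = gibbs_lda(docs, num_topics=2, vocab_size=6)
```

Each row of theta estimates the topic proportions of one document and each row of beta the word distribution of one topic, matching the roles of the corresponding symbols defined for Equation (1).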

Results
This section reports the results we obtained on the keywords and frequency related to social skepticism around climate change through the analysis of users' social activism and behavioral patterns identified in our corpus (Section 4.1), the topics (Section 4.2), and the corresponding categories, social activism, and behavioral patterns (Section 4.3).

Keywords and frequency
We carried out an analysis of the keywords in the corpus, considering the importance of the fact that keywords express user behavior and the linguistic importance of the terms (Reyes-Menéndez et al., 2020).
Along the same lines, the frequency of a term's occurrence in a text is a key measure in corpus linguistics. Frequency is assumed to highlight users' social identity (McEnery; Hardie, 2013). Here, frequency is defined as the number of times a word appears in a given text (Baker et al., 2008). Table 5 lists the frequencies of the 10 main words identified in our data. As seen in Table 5, the most frequent term in our data is the keyword "ClimateBrawl" (4,246 times), which was previously identified as a hashtag to be extracted. This was also the case with the hashtag "ClimateHoax" (3,461 times).
The fact that "WEF," which corresponds to "World Economic Forum," is present 1,205 times is interesting, as this is a global forum for economic development. Also, "science" is present 801 times, in terms such as "JunkScience." Additionally, "farmers" is mentioned 766 times, showing the interest that food production has for users.
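Frequency in the sense used here, the number of times a word appears in the text, can be computed with a simple counter. The sketch below is illustrative only: the sample tweets are hypothetical and the tokenization rule (lowercased runs of letters and digits) is an assumption, not the procedure used to build Table 5.

```python
import re
from collections import Counter


def term_frequencies(tweets):
    """Count how many times each lowercased alphanumeric token occurs in the tweets."""
    counts = Counter()
    for text in tweets:
        counts.update(re.findall(r"[a-z0-9]+", text.lower()))
    return counts


# Hypothetical sample, for illustration only.
sample = [
    "#ClimateHoax pushed by the WEF",
    "The WEF and farmers",
    "Farmers against #ClimateHoax",
]
freq = term_frequencies(sample)
top_terms = freq.most_common(10)  # the ten most frequent tokens, as in a top-10 table
```

Because hashtags are reduced to their bare token, a term such as "climatehoax" is counted whether or not it carries the "#" symbol.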

Topics
Topics in a corpus are clusters of words linked to each other. Accordingly, topics are intrinsically related to their keywords (Reyes-Menéndez et al., 2020). To find topics in our database, the LDA model and its corresponding Equation (1) were used (Section 3.2).
Next, to evaluate our LDA model, we used the metric referred to as topic coherence, which measures the relative distance between words within a topic (Syed; Spruit, 2017; Rama-Maneiro; Vidal; Lama, 2020). It is rare to see a coherence of 1 or above 0.9 unless the words being measured are either identical words or bigrams. The overall coherence score of a topic is the average of the distances between words. Our LDA model attains a value of 0.34, since there is no strong topic correlation; in other words, the words within topics are not very close to one another.
To determine whether the identified topics are relevant key indicators of social skepticism around climate change using the analysis of users' social activism and behavioral patterns on Twitter, we relied on the measure of coherence. This function, built in Python, searches for an optimal number of topics in the dataset. The graph (Figure 1) shows 28 topics as optimal, with a coherence score of ~0.34; this is the number of topics used to characterize the social skepticism around climate change through the analysis of users' social activism and behavioral patterns.
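The idea of scoring a topic by averaging a pairwise measure over its words, and then choosing the number of topics with the best score, can be sketched as follows. The text does not state which coherence measure was used, so normalized pointwise mutual information (NPMI) over document-level co-occurrence is an assumption here; NPMI ranges from -1 to 1, so its values are not directly comparable with the 0.34 reported above. The documents and candidate scores are hypothetical.

```python
import math
from itertools import combinations


def npmi_coherence(top_words, documents):
    """Average pairwise NPMI of a topic's top words, using document co-occurrence."""
    doc_sets = [set(doc) for doc in documents]
    total = len(doc_sets)

    def prob(*words):
        # Fraction of documents that contain all the given words.
        return sum(all(w in s for w in words) for s in doc_sets) / total

    scores = []
    for w1, w2 in combinations(top_words, 2):
        p1, p2, p12 = prob(w1), prob(w2), prob(w1, w2)
        if p12 == 0:
            scores.append(-1.0)   # the words never co-occur: minimum NPMI
        elif p12 == 1:
            scores.append(1.0)    # the words co-occur in every document
        else:
            scores.append(math.log(p12 / (p1 * p2)) / -math.log(p12))
    return sum(scores) / len(scores)


def best_num_topics(coherence_by_k):
    """Return the candidate number of topics with the highest coherence."""
    return max(coherence_by_k, key=coherence_by_k.get)


# Hypothetical tokenized documents, for illustration only.
documents = [
    ["climate", "hoax", "scam"],
    ["climate", "hoax"],
    ["farmers", "biden"],
    ["farmers", "biden", "wef"],
]
coherent = npmi_coherence(["climate", "hoax"], documents)    # words always co-occur
mixed = npmi_coherence(["climate", "farmers"], documents)    # words never co-occur
chosen_k = best_num_topics({2: 0.10, 3: 0.25, 4: 0.18})      # hypothetical scores
```

In practice the scores passed to `best_num_topics` would come from fitting one model per candidate number of topics and averaging the coherence of its topics.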
As seen in Figure 2, the 28 identified topics have different contributions to the overall research. The topic with the greatest contribution is topic 8.0.
Below, we present the contribution of the 28 topics identified (Table 6) in the tweets database. We also highlight the main keywords that make up each topic, and each topic has been assigned a name with a randomized controlled process (Jia, 2018).
This visualization reveals that topic 19.0 "biden farmers" in Table 7 is isolated and has a greater distance from the other topics, while topics 14 "dream," 18 "green new deal," and 23 "support" lie in the same quadrant and can form a category of topics.
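Distances between topics of this kind are typically derived from a divergence between the topics' word distributions, which is then projected onto two dimensions. The text does not say which distance underlies the visualization, so the Jensen-Shannon divergence used in the sketch below is an assumption, and the three topic-word distributions are hypothetical.

```python
import math


def js_divergence(p, q):
    """Jensen-Shannon divergence (natural log) between two discrete distributions."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Kullback-Leibler divergence, skipping zero-probability terms of x.
        return sum(a * math.log(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Hypothetical topic-word distributions over a four-word vocabulary.
topic_a = [0.70, 0.20, 0.05, 0.05]
topic_b = [0.60, 0.30, 0.05, 0.05]   # similar to topic_a
topic_c = [0.05, 0.05, 0.20, 0.70]   # very different from topic_a

near = js_divergence(topic_a, topic_b)
far = js_divergence(topic_a, topic_c)
```

Under this measure, an isolated topic is one whose divergence from every other topic is large, whereas overlapping topics have divergences close to zero.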

Categories of topics, social skepticism, and behavioral patterns around climate change
We have named the categories of topics to understand social skepticism around climate change using the analysis of users' social activism and behavioral patterns. Furthermore, to identify the different categories in which the topics fall, a name has been assigned through a randomized controlled process (Jia, 2018).
The groupings of topics explained above serve as the basis for the development of the categories of the social identity and behavioral patterns. In this way, we obtained the following three categories: Biden, Green New Deal, and Hoax.
Which topics correspond to which categories is explained in Table 7.
As can be seen in Table 7, there are a series of topics that are well determined and cohesive and that are of considerable size. These topics are "dream," "green new deal," and "support," and they belong to the category Green New Deal.
It should be highlighted that the topic "biden farmers," owing to its position in the intertopic distance map and its size, is a singleton category, while the rest of the topics overlap, composing the last category referred to as Hoax. This category includes all of the other topics that are also linked by the hashtags (#) included in the search as well as other hashtags such as #ClimateCrisis or #ClimateScam. The fact that these hashtags are strongly present in all the topics means that they are presented as a consolidated category determined by the strong use of hashtags linked to anti-climate-change activism.

Discussion
In the present study, we used a systematic literature review to identify, evaluate, and synthesize social skepticism around climate change indicators through an analysis of users' social activism and behavioral patterns on Twitter. Our study answers Veltri and Atasanova (2017) Matthews, 2015). This is in line with the results obtained herein because the political presence is evident in the topic analysis (Section 4.2) with topics such as topic 19.0 "biden farmers," "trump," "politics," and "obama."
- that Twitter can promote polarization and misinformation (Anderson; Huntington, 2017), as users tend to search for opinions similar to their own in order to reinforce them (Grover et al., 2019); and
- that previous studies highlight that the degree of education and scientific knowledge is not a decisive factor in explaining whether people take a position for or against.
Moreover, within the scientific community there are opinions that deny or question the importance of climate change (Lahsen, 2013) and that, therefore, may encourage those opinions to have greater credibility.
Among the theoretical implications are the development of new research based on the results obtained (e.g., on the relationship between political events and the polarization of opinion on climate change) and the fact that, by using data from Twitter, it is possible to analyze the discourse of climate change skeptics.
Among the limitations of this work are the amount of data extracted, the hashtags used, the language of the extraction, the dates selected, and the analysis carried out, which does not identify whether the comments are positive or negative; we therefore cannot know whether they are for or against the arguments presented in the topics.
Future lines of research could include the modeling of the different topics identified and a model that integrates opinions based on political preferences, as well as longitudinal analysis using data extracted from the different editions of World Environment Day (WED) on Twitter to determine how the conversation and behavioral patterns evolved.

Conclusions
In this study, we used machine learning and artificial intelligence techniques to review 78,168 tweets and identify the key indicators of social skepticism around climate change through an analysis of users' social activism and behavioral patterns on Twitter. These results were analyzed in depth to address the aim of this research.
Based on our results, we were able to test hypothesis H1, that there will be correlations between the UGC topics that identify the social skepticism around climate change through the analysis of users' social activism and behavioral patterns. Specifically, we identified 28 topics that, in turn, could be grouped into 3 categories that identify the social skepticism around climate change through the analysis of users' social activism and behavioral patterns (Table 7).
The investigation has produced a series of results that confirm the proposed hypothesis.
In addition, some relevant conclusions have been obtained. The first is that 24 of the 28 topics are overlapping on the intertopic distance map. The second is that the size of the topics is relatively small and linked to specific events. The third is that there is a significant political presence, especially from the United States.
There is a group of topics, 24 of the 28, that appear superimposed such that, although they use different words and therefore form different clusters, they have a close relationship and, therefore, appear not only close but also superimposed. This opens up the possibility for new research focused only on these topics to better understand the reason for this overlap, although it does not permit their combination into a single larger topic.
The size of the topics is relatively small. There is no big theme, which means that attention is divided among the 28 themes developed by the skeptics opposed to climate change. This may be due to the fact that each of the user groups defends a viewpoint on a specific climate change topic, without any of them having gone viral, or because of their temporary nature. Themes arise, but none stay around for a long time. For example, in the political arena, one can mention the different political leaders in their corresponding topics ("trump", "biden", and "obama").
In relation to this point, note that the small size of the topics may be related to the relationship between the communication actions against climate change and the specific facts related to it, for example, Biden's proposal to support farmers in the topic "biden farmers", the "covid" health crisis, or the "wef" meeting. Regarding this point, the important presence of the United States, with its different presidents, also stands out, while there is no mention of other leaders of other countries.

References
Al-Nakeeb, Ohood A. M. S.; Mufleh, Basher A. H. (2018). "Collocations and collocational networks of characters: A corpus-based feminist stylistic analysis". Language in India, v. 18, n. 9. https://onx.la/05e12