The Voice of the Guests: Analysing Airbnb Reviews as a Representative Source for Tourism Studies

User-generated content on social media has led to a new form of communication known as electronic word of mouth, which generates millions of comments about goods and services on the internet every day. This openly accessible content is crucial for prospective consumers, as it helps in decision-making, but it is also valuable for product or service providers, as it allows them to improve their businesses based on user reviews, some of which are highly detailed. Researchers and academics form another interest group that benefits from these comments, as they can obtain and analyse information for their studies at a relatively low cost in terms of time and money. The present study aims to perform a sentiment analysis of comments posted by guests staying at properties offered on Airbnb to determine whether their opinions about their experience are positive or negative. However, before doing so, it is necessary to find out the percentage of people who write a review about the service received on Airbnb in order to verify the representativeness of the reviews on this platform. To achieve this, thousands of comments posted in one year on Airbnb for the four most touristic cities in Spain are analysed: Madrid, Barcelona, Seville, and Valencia. The results show that opinions on Airbnb are much more representative than those on other platforms, as a very high participation rate is calculated. Furthermore, these opinions are predominantly positive, indicating a high level of satisfaction with the service provided.


Introduction
The sharing economy movement has revolutionized the tourist accommodation industry in recent years. Platforms like Airbnb have allowed private individuals to offer their properties to travellers as an alternative to traditional hotels (Meleo; Romolini; De Marco, 2016) thanks to information and communication technologies (ICT) that have made it easier for buyers and sellers to come into direct contact (Martin-Fuentes; Mellinas, 2018), in addition to distinguishing themselves from traditional accommodation by offering guests the chance to "feel at home" (Liu; Mattila, 2017).
e330202 Profesional de la información, 2024, v. 33, n. 2. e-ISSN: 1699-2407
In turn, ICT have led to a change in the communication system due to their immediacy but also thanks to the growth of social networks, which have completely revolutionized the way in which people interact with each other (Patmanthara; Febiharsa; Dwiyanto, 2019) and are a rich source of information (Arora et al., 2019).
While social networks are gaining importance, so too are the opinions of travellers and their influence on online sales; in the tourist accommodation sector, the industry's active participation in these networks is fundamental to customer loyalty and to the fulfilment of customers' expectations (Jiménez García; Pérez Delgado, 2018). By sharing their opinions, feelings and experiences, users of social networks add quality, satisfaction and value to the product (Chae; Ko; Han, 2015). These networks are even a reference tool that can play an important role, for example, when planning a trip (Xiang; Gretzel, 2010).
Opinions expressed through social networks and opinion platforms or even on platforms for the distribution and sale of goods and services influence the decision-making of other consumers (Cheng; Ho, 2015), help improve the service on offer (Naeem, 2019), and are a source of data for academics who base their research on these opinions (Guo; Liu; Wu; Zhang, 2021). This is the case with Airbnb, where guests use comments and feedback to assess the quality and authenticity of the accommodation and the reliability of the host, while hosts rely on comments and feedback to attract potential guests to their property. As a result, comments and feedback by real people play a vital and decisive role in building trust and security, facilitating the following guest's booking process (Cruz; Freitas, 2021). These reviews have not only served as a source of data for numerous investigations, but have also made it possible to ascertain customer satisfaction through sentiment analysis (Cavique; Ribeiro; Batista; Correia, 2022), identify customer and host complaints (Cenni; Vásquez, 2023), and examine the image of the tourist destination (Lalicic; Marine-Roig; Ferrer-Rosell; Martin-Fuentes, 2021), among other relevant aspects.
Airbnb's review system is characterized by its interactive, two-way approach. At the end of the stay, both guest and host receive a message from Airbnb inviting them to leave a review within 14 days. One distinctive feature of Airbnb is that once the message to write a review has been sent, the Airbnb system prevents the parties from accessing each other's reviews until both have completed their reviews or until the 14-day period has elapsed.
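The reveal rule described above can be sketched as a small predicate. This is only an illustrative reading of the text, not Airbnb's implementation: the function name, its parameters and the treatment of day 14 itself as "elapsed" are assumptions.

```python
REVIEW_WINDOW_DAYS = 14  # review period stated in the text


def reviews_visible(guest_submitted: bool, host_submitted: bool,
                    days_since_invitation: int) -> bool:
    """Return True once each party may read the other's review.

    Hypothetical helper: reviews are revealed when both parties have
    written theirs, or when the 14-day window has elapsed.
    """
    both_done = guest_submitted and host_submitted
    window_closed = days_since_invitation >= REVIEW_WINDOW_DAYS
    return both_done or window_closed
```

Under this reading, a guest who reviews on day 3 cannot see the host's review until the host also writes one or the window closes.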
The analysis of guest satisfaction or dissatisfaction can be difficult due to the large volume of data generated (Martin-Fuentes; Mellinas, 2018). Moreover, automatically classifying a text written in natural language as positive or negative in sentiment (Pang; Lee, 2008) is complicated, since human interpretation can differ when influenced by cultural factors and experiences. Most existing approaches to sentiment analysis have used the semantic approach, which assumes that sentiment is expressed explicitly through affective words (Saif; Fernandez; He; Alani, 2014). In the last decade, the main sources of this approach for obtaining information on feelings have been online websites and social networks (Akankasha; Arora, 2019).
Hence, this study aims to expand the research related to sentiment analysis, also known as opinion mining (Arcila-Calderón; Barbosa-Caro; Cabezuelo-Lorenzo, 2016), through the semantic analysis of the text in the comments of the hospitality and tourism services platform, Airbnb, in order to ascertain the level of satisfaction of active users automatically and using massive data analysis. However, first it is necessary to check whether the percentage of people who write reviews on Airbnb is representative, since some studies indicate that there may be biases in the degree of representativeness in the use of reviews in research from other similar platforms such as Booking.com or TripAdvisor (Mellinas; Martin-Fuentes, 2022; Mellinas, 2019a; 2019b).
To do so, we analyse data from the Airbnb platform for the four Spanish cities with the most tourists in recent years (2022 and 2023): Barcelona, Madrid, Seville and Valencia (INE, 2023).
The findings of this article make it possible to compare the level of representativeness of tourism research that uses user-generated content from this platform with that of studies that use data collected from other platforms. They also reveal the sentiment, through the semantic analysis of thousands of opinions, of guests who have booked accommodation through Airbnb, and allow the results to be compared between the four cities under study.

Analysis of the Literature
Reviews on platforms like Airbnb provide a wide variety of valuable information for tourism research (Wong; Lin; Lin; Xiong, 2022), including guest preferences and expectations, highlights concerning the accommodation and its amenities, and guest satisfaction, demonstrating that most reviews are largely positive (Bridges; Vásquez, 2018), as the guest often avoids damaging the host's reputation due to the personal bond they have established (Melián-González; Bulchand-Gidumal, 2020). Moreover, the findings of the study by Ye, Liang, Wei, and Law (2023) show that guests with previous positive reviews tend to be more satisfied with their future bookings.
Along the same lines, studies have also been carried out in some of the cities included in this article using the sentiment analysis method, including a comparison of Madrid and Bogotá (Vargas-Calderón; Moros Ochoa; Castro Nieto; Camargo, 2021). There is also a study of tourist overcrowding in the city of Barcelona with tourists of Chinese origin (Alonso-Almeida; Borrajo-Millán; Yi, 2019). However, the sentiments reported in Airbnb reviews in the four cities studied with such a large volume of data have not yet been analysed.

Review Representativeness
On Airbnb, unlike other accommodation platforms such as online travel agencies, the host can also comment on the guest (Lladós-Masllorens; Meseguer-Artola; Rodríguez-Ardura, 2020). The veracity and credibility of reviews on the Airbnb platform is very relevant (Cruz; Freitas, 2021); hence, only people who have stayed at the accommodation can post reviews, for which they have 14 days following their last night spent in the lodging.
Another way of analysing consumer opinions would be through surveys (Guttentag; Smith, 2017). There is widespread debate among academics as to whether they should be carried out online or in person. Studies corroborate that both online and traditional mail surveys are susceptible to sample space and time limitations (Benítez-Aurioles, 2022). As a result, neither technique can be considered completely free of bias (Dolničar; Boh Podgornik, 2019). On the positive side, the analysis of surveys allows the sample to be segmented for greater study detail, with self-selection by factors such as age or country of origin, among others.
There is also a wide variety of research on the representativeness of samples in reviews on platforms such as Expedia.es, Booking.com or TripAdvisor, but no scientific research has analysed the representativeness of reviews on the most important and relevant short-term accommodation rental platform, Airbnb.
Studies conducted to find out the response rate of writing reviews about hotels on TripAdvisor reveal that only 2% of customers write them (Mellinas, 2019a). This percentage varies for each hotel or geographic area, depending on TripAdvisor's popularity and the actions hotels take to increase the number of reviews. Booking.com appears to have higher participation rates than TripAdvisor (around 20%), which could be due to guests receiving an email after their stay with an invitation to write a review (Mellinas, 2019a). The study by Mellinas (2019b) concludes that the response rate on Booking.com for urban hotels in Spain is at around 40%, while Booking.com in 2019 states that 38% of its guests leave reviews (Mellinas; Martin-Fuentes, 2022). Another study found that the average percentage of travellers writing reviews on TripAdvisor was between 0.382% and 0.644% in hotels in Belgrade (Mašić; Vićić, 2018) and 1.88% in Valencia and 2.95% in Barcelona, which may be due to the percentage of English speakers visiting each city (Mellinas, 2019a).
Meanwhile, Mellinas and Martin-Fuentes (2022) note that the rate of participation by customers commenting on hotels on TripAdvisor in European capitals is close to 2%, but there are substantial differences between hotels at the same destination, some with participation rates below 1% and others above 3%. Such differences could depend on the level of involvement of each hotel in harnessing reviews, as well as the size of the hotel, as there is a negative correlation between the number of rooms in a hotel and its ability to collect reviews on TripAdvisor (Mellinas; Martin-Fuentes, 2019).
Representativeness has been researched not only for reviews of tourist accommodation but also for reviews of tourist attractions (Mellinas; Martin-Fuentes, 2022). Although most tourist sites on TripAdvisor have thousands of reviews, giving an initial impression of high user participation rates, the reality is that participation levels are low (0.075%), approximately 20 times lower than the participation rate of guests staying in hotels (Mellinas; Martin-Fuentes, 2022). This observation is in line with the suggestion that there are potential biases and limitations in the use of TripAdvisor data for tourism research, as well as through online surveys or other methodologies.
As mentioned, there are no scientific articles on the percentage of reviews written by Airbnb guests and hosts, but some business reports show differences on this topic. Some speak of 68% of guests (Airbnb, 2020), whereas InsideAirbnb speaks of approximately 50% (Villeneuve; O'Brien, 2020), while Brian Chesky, CEO and co-founder of Airbnb, confirmed that 80% of hosts leave a review for their guests and 72% of guests leave a review for their hosts (Chesky, 2013). According to the Keycafe blog, "community forums report a wide variation between 33% and 85% for any individual property", although they confirm that "the average review rate is 78% according to Airbnb".

Methodology
In order to measure, firstly, the representativeness of the reviews on the Airbnb platform, before performing the sentiment analysis, the same methodology has been applied as in other studies on this subject (Mellinas, 2019a; 2019b; Mellinas; Martin-Fuentes, 2022). This methodology is based on analysing the number of reviews per property to establish the percentage of responses. In addition, the average occupancy rate of each city, the average length of stay and the number of nights each property is available are also calculated. The data was downloaded from the InsideAirbnb open access website (InsideAirbnb, 2023), a project whose mission is to provide data and advocacy on the impact of Airbnb on residential communities. The aggregate data compiles 24,278 listings and a total of 522,999 new reviews generated in the past year for properties available on Airbnb.
All InsideAirbnb variables, including accommodation type, price, listing identification, name, minimum number of nights' lodging, and licences, have been downloaded. For this study, the following were used: type of establishment, number of reviews in the last year/month, total days on which the listing is available, and date on which the review was posted.
Another source of data that has been used is the Airdna platform (Airdna, 2023) to obtain the average annual occupancy rate of the four cities analysed. Finally, the average number of nights spent in tourist apartments has been extracted from the Spanish National Institute of Statistics (INE, 2023) for the different cities analysed.
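The text lists the inputs to the participation rate but does not spell out the formula. A minimal sketch, assuming that the number of stays is estimated as booked nights divided by average length of stay, is given below; the function name and this exact formula are assumptions. Applied to the Barcelona and Seville figures reported in the Results section, it returns values that round to the 75% and 60% stated there.

```python
def participation_rate(listings: int, available_nights: float,
                       occupancy: float, avg_stay_nights: float,
                       new_reviews: int) -> float:
    """Estimate the share of stays that produced a review.

    Assumed formula (not stated explicitly in the source):
      booked nights   = listings * available nights * occupancy rate
      estimated stays = booked nights / average length of stay
      participation   = new reviews / estimated stays
    """
    booked_nights = listings * available_nights * occupancy
    estimated_stays = booked_nights / avg_stay_nights
    return new_reviews / estimated_stays


# Figures as reported in the Results section of this article.
barcelona = participation_rate(6630, 201, 0.84, 6.1, 137421)
seville = participation_rate(4588, 173, 0.71, 3.3, 101830)
print(f"Barcelona: {barcelona:.0%}, Seville: {seville:.0%}")
# prints "Barcelona: 75%, Seville: 60%"
```

The estimate treats city-level occupancy and length of stay as if they applied uniformly to every listing, which is a simplification.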
With regard to the accommodation for inclusion in the analysis, following Rechavia (2018), a lodging has been deemed active if it is available between one and 365 days in the current year and if it has recorded at least one comment in the last eight months. Thus, from the database of properties available on Airbnb between April 2022 and March 2023, properties that had not received any comment in the previous eight months were discarded as inactive. It should also be noted that, of the four types of accommodation offered by Airbnb (private room, shared room, hotel room and entire dwelling), this study focuses on the entire dwelling type, since each reservation (of an entire apartment) is unique, and the platform only allows the guest to write one review. For the sentiment analysis herein, the MeaningCloud tool (Martínez et al., 2016; Singh; Singh, 2021) has been applied to 187,494 reviews written in Spanish, downloaded from the InsideAirbnb website (Hu; Lin; Liu; Ma, 2023; Prentice; Pawlicz, 2024) and generated by Airbnb guests and hosts in the same period of time and location as analysed for representativeness.
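The activity filter described above can be sketched as follows. The dictionary keys mirror column names used in InsideAirbnb listing files (`availability_365`, `last_review`, `room_type`), but their use here, the cutoff date and the sample listings are assumptions for illustration only.

```python
from datetime import date


def is_active_entire_home(listing: dict, cutoff: date) -> bool:
    """Apply the inclusion criteria to one listing.

    A listing is kept when it is available between 1 and 365 days,
    has a review on or after `cutoff`, and is an entire dwelling.
    """
    available = 1 <= listing["availability_365"] <= 365
    last_review = listing["last_review"]  # a date, or None if never reviewed
    recently_reviewed = last_review is not None and last_review >= cutoff
    entire_home = listing["room_type"] == "Entire home/apt"
    return available and recently_reviewed and entire_home


# Assumed cutoff: eight months before the end of the study window (March 2023).
CUTOFF = date(2022, 7, 31)

# Hypothetical listings, invented for illustration.
listings = [
    {"availability_365": 200, "last_review": date(2023, 1, 15), "room_type": "Entire home/apt"},
    {"availability_365": 0, "last_review": date(2023, 2, 1), "room_type": "Entire home/apt"},
    {"availability_365": 150, "last_review": date(2021, 5, 3), "room_type": "Entire home/apt"},
    {"availability_365": 300, "last_review": date(2023, 3, 1), "room_type": "Private room"},
]
active = [item for item in listings if is_active_entire_home(item, CUTOFF)]
```

Only the first hypothetical listing passes all three criteria.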
MeaningCloud is a software platform that provides a wide variety of tools for text processing and for text and voice mining in multiple languages. Its purpose is to provide researchers with a solution to problems related to natural language processing, with functions such as summarizing and extracting topics, language identification and sentiment analysis, in several languages (MeaningCloud, 2023).
MeaningCloud's sentiment analysis application programming interface (API) uses semantic approaches, based on advanced processing of natural language in all aspects of morphology, syntax, semantics and pragmatics. First, it generates the syntactic-semantic tree of the text; it then applies the terms of the lexicon by propagating the polarity values throughout the tree, combining the values appropriately depending on the morphological category of the word and the syntactic relationships that affect them.
Specifically, the score_tag field of the MeaningCloud API response has been used for this study, which indicates the overall polarity of the text in six different categories: strong positive (P+), positive (P), neutral (NEU), negative (N), strong negative (N+) and without sentiment (NONE). Finally, the field indicating agreement or disagreement has also been used. This add-on has three differentiators: it extracts sentiment based on aspects, distinguishes facts and opinions, and detects irony and polarity disagreement (MeaningCloud, 2023).
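As a rough illustration of how the six polarity categories might be handled once a response has been received, the sketch below maps each tag to its label. The `score_tag` and `agreement` keys follow the field names of the MeaningCloud sentiment response as we understand them; the helper function and the sample payload are hypothetical and are not real API output.

```python
# Labels for the six polarity categories listed in the text.
SCORE_TAG_LABELS = {
    "P+": "strong positive",
    "P": "positive",
    "NEU": "neutral",
    "N": "negative",
    "N+": "strong negative",
    "NONE": "without sentiment",
}


def summarise_response(response: dict) -> dict:
    """Reduce one (already parsed) sentiment response to the fields of interest."""
    tag = response["score_tag"]
    return {
        "polarity": SCORE_TAG_LABELS[tag],
        "is_positive": tag in ("P", "P+"),
        "agreement": response.get("agreement"),  # None if the field is absent
    }


# Hand-written sample payload, for illustration only.
sample = {"score_tag": "P+", "agreement": "AGREEMENT"}
summary = summarise_response(sample)
```

An unexpected tag raises a `KeyError`, which makes malformed responses visible instead of silently counting them.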

Results
As can be seen in Table 1, the results show a very high rate of representativeness in the writing of reviews on Airbnb for the four Spanish cities analysed. Madrid receives the highest number of reviews, with a total of 214,032. It has an average guest stay of 4.7 nights in tourist apartments, an average availability of listings of 197 nights per year and a 77% occupancy rate. The result obtained for the participation rate is 71%.
Barcelona is the destination with the longest length of stay in tourist apartments (6.1 nights). We analysed a total of 6,630 listings in which 137,421 new reviews were exchanged in the last year. These listings show a very similar average to Madrid for available nights, with a total of 201. Barcelona stands out for being the city with the highest occupancy rate, 84%, and the highest rate of representativeness for the four cities in the writing of reviews, with 75%.
In Seville, a total of 101,830 new annual reviews are analysed from 4,588 listings, with an average stay of 3.3 nights and 173 days a year of apartment availability. It scores 71% occupancy, and has the lowest review participation rate of the four cities, at 60%.
Meanwhile, Valencia is the city with the fewest reviews in the last year, with 70,716. It also has the lowest number of apartments on offer, an average length of stay of 4.6 nights, and an average of 164 days of apartment availability. Despite its notable 81% occupancy rate, it comes third out of the four cities analysed for its participation rate, at 67%.
In summary, the results show that the rate of participation in the Airbnb platform (for entire dwelling accommodation) in Spain is high and more so when compared to other studies on the degree of representativeness in other opinion and tourism services sales platforms.
In turn, the results regarding the degree of Airbnb user satisfaction for the four cities, broken down by quarters between April 2022 and March 2023, show very high satisfaction, as can be seen in Table 2.
Regarding Madrid, there is a total of 108,877 reviews written in Spanish in the year of study that achieve satisfaction of between 76 and 82%. It reaches its highest point of satisfaction compared to the other Spanish cities in the last quarter of 2022. In relation to Barcelona, the total number of reviews in Spanish is 24,285; the level of satisfaction is between 75 and 78%, slightly lower than the figures for Madrid. For Seville, satisfaction in the reviews is also high, at between 78 and 82%, and in Valencia it is slightly lower but also very high, at between 70 and 75%, as in Barcelona.
For greater detail, the quarterly analysis has been extended to obtain the temporal trend. In this regard, it can be seen that it is in the second quarter, spring, that more positive comments were posted, both for Barcelona and for Valencia, while in Madrid the best quarter is the fourth (between October and December), and in Seville, the summer months. In Table 3, the analysis of types of sentiment is carried out. The six aforementioned categories are examined: strong negative (N+), negative (N), neutral (NEU), without sentiment (NONE), positive (P) and strong positive (P+). A clear majority of user reviews express positive feelings towards the experience in the accommodation in the four cities.
By merging all positive feelings, i.e., positive (P) and strong positive (P+), following the same aggregation methodology as applied in the study of user experience (Sanchis-Font et al., 2021), results of between 88 and 93% positivity were obtained for all quarters and cities studied.
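The merging of P and P+ by city and quarter can be sketched as below. This is not the authors' code: the function names, the input layout and the choice of denominator are assumptions. In particular, the source does not say whether reviews tagged NONE are counted in the denominator, so the sketch exposes that as a parameter. The sample reviews are invented.

```python
from collections import Counter, defaultdict
from datetime import date


def calendar_quarter(posted: date) -> str:
    """Label a date with its calendar quarter, e.g. '2022-Q2' for April to June 2022."""
    return f"{posted.year}-Q{(posted.month - 1) // 3 + 1}"


def positive_share(reviews, count_none: bool = True) -> dict:
    """Share of reviews tagged P or P+ for each (city, quarter) pair.

    `reviews` is an iterable of (city, posting date, score tag) tuples.
    """
    tallies = defaultdict(Counter)
    for city, posted, tag in reviews:
        if tag == "NONE" and not count_none:
            continue  # optionally leave reviews without sentiment out of the total
        tallies[(city, calendar_quarter(posted))][tag] += 1
    return {
        key: (counts["P"] + counts["P+"]) / sum(counts.values())
        for key, counts in tallies.items()
    }


# Invented sample reviews, for illustration only.
sample_reviews = [
    ("Madrid", date(2022, 5, 1), "P+"),
    ("Madrid", date(2022, 6, 1), "N"),
    ("Madrid", date(2022, 6, 2), "NONE"),
    ("Madrid", date(2022, 6, 3), "P"),
]
shares = positive_share(sample_reviews)
```

With the four invented reviews, two of the four are positive, so the share for Madrid in the second quarter of 2022 is 0.5; excluding NONE raises it to two out of three.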

Conclusions
This study shows a high rate of participation in reviews on Airbnb, far higher than on other platforms such as Booking.com for reservations in urban hotels in Spain, which scored between 38 and 40% (Mellinas, 2019b), or TripAdvisor, where the participation rate for European hotels was close to 2% (Mellinas; Martin-Fuentes, 2022). In the specific case of cities such as Barcelona, the aforementioned study recorded participation rates of 1.39%, and of 1.97% for Madrid, far lower than those of the present research, where it is observed that the percentage of Airbnb guests who comment on their experience is 75% and 71% of total reservations, respectively.
User-generated content on social networks has thus been used as a source of data in numerous tourism and hospitality studies in recent years, including this one, but it is important to take into account the representativeness of the sample in order to contribute to the academic literature. Specifically, studies that draw on Airbnb guest reviews achieve a higher degree of representativeness than those based on reviews from other tourism and hospitality industry platforms.
The high participation rates on Airbnb compared to TripAdvisor may be due to the fact that the Airbnb platform notifies both the guest and the host via a message at the end of the stay so that they can post a comment within a short period of 14 days. This system increases participation on this platform compared to TripAdvisor, where a person wishing to leave a review has to enter voluntarily to write one, since TripAdvisor does not control customer reservations. In this sense, the system of requesting a rating after the stay is similar to the one used by Booking.com, which also has higher levels compared to TripAdvisor, although far below those reported in this study with Airbnb reviews.
Therefore, one of the main functional differences between Airbnb and the other platforms that collect opinions is the previously mentioned two-way guest-host evaluation system, which encourages curiosity and participation, since both parties are usually interested in knowing what has been said about them as soon as possible, thus encouraging them to leave their own review.This could be one of the main reasons why the percentage representativeness of reviews is so high on Airbnb compared to other rating systems.
Having verified the high degree of representativeness of the sample of reviews, the sentiment analysis carried out on Airbnb reviews is better grounded. Drawing an analogy with survey-based studies that enquire about satisfaction with a tourist experience, obtaining a response from 1.37% of the people who took part in that experience is not the same as obtaining one from 75%: the sampling margin of error will be much lower in the second case, and therefore the level of confidence in the results will be much higher.
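The survey analogy can be made concrete with the standard margin of error for a proportion, including the finite population correction. The sketch below assumes simple random sampling, a 95% confidence level (z = 1.96), the most conservative proportion (0.5) and a hypothetical population of 10,000 stays; none of these values come from the study.

```python
import math


def margin_of_error(population: int, respondents: int,
                    proportion: float = 0.5, z: float = 1.96) -> float:
    """Margin of error for a proportion, with finite population correction."""
    standard_error = math.sqrt(proportion * (1 - proportion) / respondents)
    finite_correction = math.sqrt((population - respondents) / (population - 1))
    return z * standard_error * finite_correction


stays = 10_000  # hypothetical number of stays
low_participation = margin_of_error(stays, round(stays * 0.0137))
high_participation = margin_of_error(stays, round(stays * 0.75))
print(f"1.37% response: +/- {low_participation:.1%}")
print(f"75% response:   +/- {high_participation:.1%}")
```

This covers sampling error only. Because reviewers select themselves, a higher response rate also leaves less room for non-response bias, which the formula does not capture.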
In turn, the sentiment analysis conducted in this study shows a high degree of positivity in the experience of Spanish-speaking tourists, between 88 and 93%, reflected through the opinions posted on the Airbnb platform. This result is similar to that of other studies that focused only on the city of Barcelona, where 94% positivity was obtained in the reviews (Güçlü; Roche; Marimon, 2020).
The high participation rates and high positive sentiment found in user-generated comments on the Airbnb social network compared to the other platforms could be due to various factors.On the one hand, upon arrival at the property rented through Airbnb, personal ties are generated in communications, such as the first impression at the entrance when meeting the person/family/agent who owns the accommodation rather than the communication that a hotel reception may provide.This personal bond, reinforced by the principle of 'liking' (Cialdini, 2021; Halttu; Oinas-Kukkonen, 2022), suggests that people tend to value more positively the experiences and services provided by individuals with whom they feel a personal connection or affinity.
This personal bond makes the ratings generally more positive (Melián-González; Bulchand-Gidumal, 2020), as we corroborate in the polarity analysis of this study, reaching rates of 90% positive sentiment towards the experience. It can also motivate the host to encourage the guest to write a review about their stay to increase their reputation on the platform and, consequently, to increase the percentage of reviews that are written on this platform. This communication is reinforced by external software integrated into the chat to synchronize messages. This tool automates and prepares templates, such as thanking guests for their reservation, instructions for check-in, or a reminder to leave their reviews. Closer communication leads to a more personal, and therefore more positive, bond with the traveller.
In terms of practical and theoretical implications, and given the importance of comments on Airbnb, the response rate is higher than on other platforms. It can therefore be concluded that, for scientific research in tourism that uses user-generated content, data from Airbnb is more representative than data collected from other platforms where the percentage of consumers who respond is considerably lower.
Previous studies have already revealed the importance of the use of user-generated content in other areas and booking platforms for tourism products and services, but we go a step further and analyse the reviews of the platform considered the most disruptive in the tourist accommodation industry. Likewise, this study can help accommodation service providers, whether individuals or companies, to understand customers' feelings about the service offered, which can help them make the necessary adjustments to optimize customer satisfaction.
It is also observed that, in the quarterly sentiment analysis, there are small differences in the satisfaction voiced in the opinions depending on the quarter in which the visit takes place. Airbnb properties are most highly rated in spring in the coastal cities analysed, in autumn in Madrid, and in summer in Seville. In this regard, in the future, the content of the opinions could be further analysed to discover the reasons for these variations, whether they are due to the properties themselves or to the destination.
In addition, sentiment could be analysed with approaches based on computational learning. Using a supervised learning algorithm trained on a collection of annotated texts, techniques based on support vector machines (SVM), naive Bayes, k-nearest neighbour (KNN), latent semantic analysis (LSA) and even deep learning with artificial intelligence can be utilized.
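To give a flavour of the supervised alternative, the following is a minimal multinomial naive Bayes classifier with add-one smoothing, written in plain Python. It is a toy sketch and was not used in this study; the function names are hypothetical and the four annotated reviews are invented for illustration.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(samples):
    """Fit word counts per label from (text, label) pairs."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    vocabulary = set()
    for text, label in samples:
        tokens = text.lower().split()
        label_counts[label] += 1
        word_counts[label].update(tokens)
        vocabulary.update(tokens)
    return {"word_counts": word_counts, "label_counts": label_counts,
            "vocabulary": vocabulary}


def predict(model, text):
    """Return the label with the highest log-posterior for `text`."""
    tokens = text.lower().split()
    total_docs = sum(model["label_counts"].values())
    vocab_size = len(model["vocabulary"])
    best_label, best_score = None, -math.inf
    for label, doc_count in model["label_counts"].items():
        counts = model["word_counts"][label]
        total_words = sum(counts.values())
        score = math.log(doc_count / total_docs)  # log prior
        for token in tokens:
            # add-one (Laplace) smoothing avoids zero probabilities
            score += math.log((counts[token] + 1) / (total_words + vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented annotated reviews (P = positive, N = negative).
training = [
    ("apartamento limpio y muy bien ubicado", "P"),
    ("anfitrión amable y atento", "P"),
    ("piso sucio y ruidoso", "N"),
    ("mala comunicación y mucho ruido", "N"),
]
model = train_naive_bayes(training)
```

With this toy training set, `predict(model, "apartamento limpio y amable")` returns "P" and `predict(model, "piso sucio y mucho ruido")` returns "N". A real application would need a much larger annotated corpus and proper tokenization.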
The main limitation of the present study is that it focuses solely on sentiment analysis and does not address other ways of classifying text, such as extracting topics of interest relating to the user experience. In addition, the sample focused on the four most important cities in Spain in terms of urban tourism. In the future, research could be extended to study other destinations to compare whether the findings are similar or if there are significant variations.

Funding
This study was subsidized by:
- Spanish Ministry of Industry, Trade and Tourism, funded by the European Union - Next Generation EU, within the GASTROTUR project [Ref: TUR-RETOS2022-017] "Revalorización de los destinos a través de los aspectos semióticos de la imagen gastronómica y del contenido generado por los turistas" (Revaluation of destinations through the semiotic aspects of the gastronomic image and the content generated by tourists).
- Spanish Ministry of Science and Innovation within the RevTour project [Ref: PID2022-138564OA-I00] "Uso de las reseñas en línea para la inteligencia turística y el establecimiento de estándares de evaluación transparentes y confiables" (Use of online reviews for tourism intelligence and for the establishment of transparent and reliable evaluation standards).
- Project TradiTur [Grant Id. TED2021-129763B-I00] "Retos para la transición digital en turismo: análisis de la inteligencia turística y propuestas normativas" (Challenges for the digital transition in tourism: analysis of tourism intelligence and regulatory proposals), and finally the Institute of Social and Territorial Development within the ResTur project for the 2023CRINDESTABC call.

Bibliography
The data refer to the four most visited cities in Spain: Barcelona, Madrid, Seville and Valencia. Recent studies have focused on the parallel study of Barcelona and Madrid on the Airbnb platform (Cerdá-Mansilla; Henche; Devesa, 2021), both of which are smart cities whose residents and visitors are major communicators on social networks (Molinillo; Anaya-Sánchez; Morrison; Coca-Stefaniak, 2019). At the same time, we should include studies that analyse data from Barcelona, Madrid and Seville to determine different accommodation price characteristics, highlighted for being the three most visited cities in Spain (Tong; Gunter, 2022), as well as specific research on the gentrification of the city of Valencia (Gil García; Martínez López, 2023) or the image of these four destinations through Airbnb reviews (Lalicic et al., 2021).

Table 1: Results for each City Analysed.

Table 2: Satisfaction between Reviews Analysed Using MeaningCloud.