Semantic similarity models for automated fact-checking: ClaimCheck as a claim matching tool
DOI:
https://doi.org/10.3145/epi.2023.may.21
Keywords:
Verification, Automated fact-checking, Claim matching, Semantic similarity, Paraphrase models, Disinformation, Artificial intelligence, AI, Algorithms, Software
Abstract
This article presents the experimental design of ClaimCheck, an artificial intelligence tool for detecting repeated falsehoods in political discourse using a semantic similarity model developed by the fact-checking organization Newtral in collaboration with ABC Australia. The study reviews the state of the art in algorithmic fact-checking and proposes a definition of claim matching. Additionally, it outlines the scheme for annotating similar sentences and presents the results of experiments conducted with the tool.
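As a rough illustration of what claim matching involves, the sketch below compares a new sentence against a small store of previously checked claims and returns the closest one if it clears a similarity threshold. It is not ClaimCheck's implementation: the bag-of-words vectors are a deliberately simple stand-in for the sentence embeddings a semantic similarity model would produce, and the function names, the example claims and the 0.5 threshold are all hypothetical.

```python
import math
import re
from collections import Counter


def embed(sentence: str) -> Counter:
    """Toy stand-in for a sentence embedding: lowercase word counts."""
    return Counter(re.findall(r"\w+", sentence.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_claim(new_claim, checked_claims, threshold=0.5):
    """Return (best_match, score); best_match is None below the threshold.

    The threshold value is an assumption chosen for illustration only.
    """
    query = embed(new_claim)
    scored = [(cosine(query, embed(c)), c) for c in checked_claims]
    if not scored:
        return None, 0.0
    best_score, best_claim = max(scored, key=lambda pair: pair[0])
    if best_score >= threshold:
        return best_claim, best_score
    return None, best_score


# Hypothetical store of previously fact-checked claims.
checked = [
    "Unemployment reached four million people last year",
    "The city built ten new schools",
]

repeat, repeat_score = match_claim(
    "Last year unemployment reached four million people", checked
)
unrelated, unrelated_score = match_claim("The weather is nice today", checked)
```

Word counts only catch reorderings and near-verbatim repeats. A paraphrase that uses different vocabulary would score low here, which is the gap that a trained semantic similarity model is meant to close.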
License
Copyright (c) 2023 Profesional de la información
This work is licensed under a Creative Commons Attribution 4.0 International License.
Dissemination conditions of the articles once they are published
Authors can freely disseminate their articles on websites, social networks and repositories. However, the following conditions must be respected:
- Only the final editorial version should be made public. Please do not publish preprints, postprints or proofs.
- Along with this copy, a specific mention of the publication in which the text has appeared must be included, together with a clickable link to the URL http://www.profesionaldelainformacion.com or http://revista.profesionaldelainformacion.com
Profesional de la información journal offers the articles in open access with a Creative Commons BY license.