Semantic similarity models for automated fact-checking: ClaimCheck as a claim matching tool

Authors

I. Larraz, R. Míguez, F. Sallicati

DOI:

https://doi.org/10.3145/epi.2023.may.21

Keywords:

Verification, Automated fact-checking, Claim matching, Semantic similarity, Paraphrase models, Disinformation, Artificial intelligence, AI, Algorithms, Software

Abstract

This article presents the experimental design of ClaimCheck, an artificial intelligence tool for detecting repeated falsehoods in political discourse using a semantic similarity model developed by the fact-checking organization Newtral in collaboration with ABC Australia. The study reviews the state of the art in algorithmic fact-checking and proposes a definition of claim matching. Additionally, it outlines the scheme for annotating similar sentences and presents the results of experiments conducted with the tool.
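
The abstract frames claim matching as scoring how similar a new statement is to claims that have already been fact-checked, so that repeated falsehoods can be flagged. As a rough illustration only, the Python sketch below ranks previously checked claims against a new statement by cosine similarity. It is not ClaimCheck's implementation. The article relies on a semantic similarity model, whereas `embed()` here is a hypothetical stand-in that counts lowercase tokens. It therefore captures word overlap but not paraphrase. The function names and the 0.5 threshold are also assumptions.

```python
import math
import re
from collections import Counter


def embed(sentence: str) -> Counter:
    """Hypothetical stand-in for a sentence encoder: lowercase token counts.

    A real claim matcher would use a trained semantic similarity model here;
    token counts only capture lexical overlap, not paraphrase.
    """
    return Counter(re.findall(r"\w+", sentence.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_claim(new_claim: str, checked_claims: list[str],
                threshold: float = 0.5) -> list[tuple[float, str]]:
    """Return (score, checked_claim) pairs at or above `threshold`, best first.

    `threshold` is an assumed value; in practice it would be tuned on
    annotated pairs of similar and dissimilar sentences.
    """
    query = embed(new_claim)
    scored = [(cosine(query, embed(claim)), claim) for claim in checked_claims]
    hits = [(score, claim) for score, claim in scored if score >= threshold]
    return sorted(hits, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    checked = [
        "There are 4 million unemployed people in Spain",
        "The minimum wage has doubled since 2018",
    ]
    for score, claim in match_claim("Spain has 4 million unemployed people", checked):
        print(f"{score:.2f}  {claim}")
```

When run as a script, the example prints a single match, the unemployment claim, with a score of about 0.72. The minimum-wage claim shares only the word "has" with the new statement and falls below the threshold. A system closer to the one the article describes would replace `embed()` with a neural sentence encoder, so that reworded versions of the same falsehood also score highly.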

Published

2023-06-15

How to Cite

Larraz, I., Míguez, R., & Sallicati, F. (2023). Semantic similarity models for automated fact-checking: ClaimCheck as a claim matching tool. Profesional de la información, 32(3). https://doi.org/10.3145/epi.2023.may.21

Issue

Vol. 32 No. 3 (2023)

Section

Artificial Intelligence