Ribonucleic acid (RNA) viruses and coronaviruses in Google Dataset Search: scope and epidemiological correlation

Keywords: Data, Datasets, Data sets, Viruses, RNA viruses, Coronavirus, SARS-CoV-2, Covid-19, Pandemics, Data reuse, Google, Google Dataset Search, Data providers, Search engines, Information retrieval, Open science

Abstract

This study analyses the datasets indexed by the Google Dataset Search engine that deal with families of RNA viruses, using terminology drawn from the thesaurus of the National Cancer Institute (NCI), produced by the United States Department of Health and Human Services. The aim is to assess the scope and reusability of the available data by determining the number of datasets, whether they are freely accessible, the proportion offered in reusable download formats, the main providers, the publication timeline, and whether their scientific provenance can be verified. A further aim is to identify possible links between dataset publication and the main pandemics of the last 10 years. Among the results, only 52% of the datasets correspond to scientific research, and an even smaller share, 15%, are reusable. Dataset publication also shows an upward trend, particularly in connection with the impact of the main epidemics. This is clearly confirmed for the Ebola, Zika, SARS-CoV, H1N1 and H1N5 viruses and, in particular, for the SARS-CoV-2 coronavirus. Finally, the search engine has not yet implemented adequate methods for filtering and monitoring datasets. These results reveal some of the difficulties that open science still faces in the field of datasets.

Published
2020-12-21
How to cite
Blázquez-Ochando, M., & Prieto-Gutiérrez, J.-J. (2020). Virus de ácido ribonucleico (ARN) y coronavirus en Google Dataset Search: alcance y correlación epidemiológica. Profesional De La Información, 29(6). https://doi.org/10.3145/epi.2020.nov.28
Section
Covid-19 research articles