Ribonucleic acid (RNA) viruses and coronavirus in Google Dataset Search: scope and epidemiological correlation
DOI:
https://doi.org/10.3145/epi.2020.nov.28
Keywords:
Data, Datasets, Data sets, Viruses, RNA viruses, Coronavirus, SARS-CoV-2, Covid-19, Pandemics, Data reuse, Google, Google Dataset Search, Data providers, Search engines, Information retrieval, Open science
Abstract
This article analyzes the publication of datasets indexed by the Google Dataset Search engine that relate to families of RNA viruses, whose terminology was taken from the thesaurus of the National Cancer Institute (NCI), produced by the United States Department of Health and Human Services. The aim is to evaluate the scope and reusability of the available data by determining the number of datasets, whether they are freely accessible, the proportion offered in reusable download formats, the main providers, the publication timeline, and whether their scientific provenance can be verified. A further aim is to identify possible links between the publication of datasets and the main pandemics of the last 10 years. Among the results, only 52% of the datasets correspond to scientific research and a smaller share, 15%, are reusable. Dataset publication also shows an upward trend, linked in particular to the impact of the main epidemics. This is clearly confirmed for the Ebola, Zika, SARS-CoV, H1N1 and H1N5 viruses and, in particular, for the SARS-CoV-2 coronavirus. Finally, the search engine has not yet implemented adequate methods for filtering and supervising datasets. These results illustrate some of the difficulties that open science still faces in the field of datasets.
License
Conditions for disseminating articles once they have been published
Authors may freely publicize their articles on websites, social networks, and repositories.
The following conditions must, however, be respected:
- Only the publisher's version should be made public. Please do not publish preprints, postprints, or printer's proofs.
- The copy must be accompanied by a specific mention of the publication in which the text appeared, together with a clickable link to the URL: http://revista.profesionaldelainformacion.com
The journal Profesional de la información offers its articles in open access under a Creative Commons BY license.