Uncovering Companies Missing from the SABI Database: A Web Scraping Approach
DOI:
https://doi.org/10.3145/epi.2025.ene.34202Resumen
This study evaluates the completeness and representativeness of the SABI database, a widely used commercial source for firm-level data in Spain and Portugal, by comparing it to BORME, the official Spanish business register. Using web scraping techniques, we collected and processed approximately 100,000 BORME publications in PDF format, covering the period from 2010 to 2023. These were transformed into a structured dataset comprising over 1.2 million companies, which we then matched against SABI records from the same period. Our analysis reveals that SABI covers only 38.3% of newly established companies, with significant underrepresentation of younger firms, small enterprises, specific sectors, and certain regions. Furthermore, we find clear evidence of survivorship bias: the longer a company has been dissolved, the less likely it is to appear in SABI. Sectoral and geographic disparities are also substantial, and the coverage is skewed toward firms with higher initial capital and specific legal forms. These findings suggest that SABI represents a non-random subset of the Spanish business population, and caution should be exercised when using it for empirical research. Adjustments for sample bias are recommended to improve the reliability of analyses based on this database.
Descargas
Descargas
Publicado
Cómo citar
Número
Sección
Licencia
Derechos de autor 2025 Profesional de la información

Esta obra está bajo una licencia internacional Creative Commons Atribución 4.0.
Condiciones de difusión de los artículos una vez son publicados
Los autores pueden publicitar libremente sus artículos en webs, redes sociales y repositorios
Deberán respetarse sin embargo, las siguientes condiciones:
- Solo deberá hacerse pública la versión editorial. Rogamos que no se publiquen preprints, postprints o pruebas de imprenta.
- Junto con esa copia ha de incluirse una mención específica de la publicación en la que ha aparecido el texto, añadiendo además un enlace clicable a la URL: http://revista.profesionaldelainformacion.com
La revista Profesional de la información ofrece los artículos en acceso abierto con una licencia Creative Commons BY.