Uncovering Companies Missing from the SABI Database: A Web Scraping Approach
DOI:
https://doi.org/10.3145/epi.2025.ene.34202
Abstract
This study evaluates the completeness and representativeness of the SABI database, a widely used commercial source for firm-level data in Spain and Portugal, by comparing it to BORME, the official Spanish business register. Using web scraping techniques, we collected and processed approximately 100,000 BORME publications in PDF format, covering the period from 2010 to 2023. These were transformed into a structured dataset comprising over 1.2 million companies, which we then matched against SABI records from the same period. Our analysis reveals that SABI covers only 38.3% of newly established companies, with significant underrepresentation of younger firms, small enterprises, specific sectors, and certain regions. Furthermore, we find clear evidence of survivorship bias: the longer a company has been dissolved, the less likely it is to appear in SABI. Sectoral and geographic disparities are also substantial, and the coverage is skewed toward firms with higher initial capital and specific legal forms. These findings suggest that SABI represents a non-random subset of the Spanish business population, and caution should be exercised when using it for empirical research. Adjustments for sample bias are recommended to improve the reliability of analyses based on this database.
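The coverage comparison described in the abstract can be illustrated with a minimal sketch. The abstract does not state how BORME and SABI records were matched, so everything here is an assumption made for illustration: matching on a normalized company name, the list of legal-form suffixes, the record layout, and the helper names `normalize_name` and `coverage_by_group` are all hypothetical, and the sample records are invented toy data, not figures from the study.

```python
import re
import unicodedata
from collections import defaultdict

# Hypothetical legal-form suffixes to strip; not taken from the article.
LEGAL_FORMS = re.compile(r"\b(SOCIEDAD LIMITADA|SOCIEDAD ANONIMA|SLU|SL|SA)\b")


def normalize_name(name: str) -> str:
    """Uppercase, strip accents, punctuation and common legal-form suffixes."""
    text = unicodedata.normalize("NFKD", name.upper())
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^A-Z0-9 ]", "", text)  # "S.L." becomes "SL"
    text = LEGAL_FORMS.sub(" ", text)
    return " ".join(text.split())


def coverage_by_group(register, commercial_names, key):
    """Share of register records whose normalized name is in the commercial set."""
    known = {normalize_name(n) for n in commercial_names}
    hits, totals = defaultdict(int), defaultdict(int)
    for record in register:
        group = record[key]
        totals[group] += 1
        if normalize_name(record["name"]) in known:
            hits[group] += 1
    return {group: hits[group] / totals[group] for group in totals}


# Toy data (invented): official-register records and commercial-database names.
register = [
    {"name": "Panadería López, S.L.", "year": 2015},
    {"name": "Tecnologías Norte SA", "year": 2015},
    {"name": "Bar El Puerto S.L.U.", "year": 2020},
    {"name": "Consultora Atlas, S.L.", "year": 2020},
]
commercial = [
    "PANADERIA LOPEZ SL",
    "Consultora Atlas Sociedad Limitada",
    "Tecnologias Norte, S.A.",
]

coverage = coverage_by_group(register, commercial, "year")
print(coverage)
```

Grouping by another field (sector, region, legal form) only requires a different `key`. Exact matching on normalized names is a simplification: it would miss renamed companies and could wrongly strip a token such as "SA" when it is part of the actual name, so a real comparison would likely rely on a stable identifier where one is available.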
License
Copyright (c) 2025 Profesional de la información

This work is licensed under a Creative Commons Attribution 4.0 International License.
Dissemination conditions of the articles once they are published
Authors can freely disseminate their articles on websites, social networks and repositories.
However, the following conditions must be respected:
- Only the final editorial version should be made public. Please do not publish preprints, postprints or proofs.
- Along with that copy, a specific mention of the publication in which the text has appeared must be included, together with a clickable link to the journal's URL: http://www.profesionaldelainformacion.com or http://revista.profesionaldelainformacion.com
The journal Profesional de la información offers its articles in open access under a Creative Commons BY license.