In 2019, security researchers discovered that Dalil, a popular Saudi Arabian caller
identification application with over 5 million users, had left its entire MongoDB
database exposed to the internet without any authentication or access controls. The
585-gigabyte database contained detailed personal information including real names,
phone numbers, precise GPS location data, device information, and mobile carrier
details for millions of Saudi users.

The database had been publicly accessible for an undetermined period before its
discovery, during which time any individual with basic technical knowledge could have
accessed, downloaded, or manipulated the complete user dataset.
## Key Facts
- **What:** Dalil caller ID app left 585GB MongoDB database open without authentication.
- **Who:** Over 5 million Saudi users of the Dalil app.
- **Data Exposed:** Real names, phone numbers, GPS locations, device IDs, and carrier data.
- **Outcome:** Pre-PDPL; zero security controls on a surveillance-grade dataset.
## What Was Exposed
- Real names associated with phone numbers for approximately 5 million users,
  predominantly Saudi Arabian nationals
- Phone numbers including both the primary number and contacts from users' address
  books uploaded by the app
- Precise GPS location data with latitude and longitude coordinates, revealing
  users' physical movements and frequented locations
- Device information including phone model, operating system version, IMEI numbers,
  and device identifiers
- Mobile carrier information revealing which telecommunications provider each user
  subscribed to
- App usage data including installation dates, last active timestamps, and feature
  usage patterns
- Email addresses associated with user accounts

The Dalil breach is a case study in the risks posed by the proliferation of mobile
applications that collect far more data than their core functionality requires. A
caller identification app needs access to a phone number database and, arguably, a
user's own phone number. It does not need continuous GPS location tracking, device
IMEI numbers, or carrier information. The collection of this excessive data, combined
with the absence of any database security whatsoever, created a surveillance-grade
dataset on 5 million people that was freely accessible to anyone who knew where to
look.

The GPS location data is the most concerning element of the exposure. With latitude
and longitude coordinates associated with identified individuals, the dataset could
be used to track users' movements, identify their home and workplace addresses,
determine their daily routines, and establish patterns of association between
individuals who frequent the same locations. In a region where personal privacy is
culturally valued and where certain gatherings or associations may carry social or
legal sensitivity, the exposure of granular location data poses risks that extend
far beyond conventional identity theft.

The data could be weaponized for stalking, blackmail, social engineering, or targeted
surveillance. A domestic abuser could track a victim's movements. A business rival
could monitor a competitor's meetings and associations. A criminal organization could
identify high-value targets based on their home addresses and daily patterns. The
potential for harm from location data is limited only by the imagination and intent
of those who access it.

The 585GB volume of the database suggests that the exposure included not only current
user data but potentially historical location data spanning the application's entire
operational lifetime. This longitudinal location dataset would enable the construction
of detailed movement profiles showing how individuals' patterns changed over time,
where they traveled, and whom they met. For security services, intelligence agencies,
or malicious actors, this type of historical location data is extraordinarily
valuable and would normally require significant resources and legal authority to
obtain.

Open MongoDB instances have been a persistent vulnerability pattern across the
technology industry, with search engines like Shodan making it trivial to discover
unprotected databases. The Dalil case demonstrates that this vulnerability is not
limited to obscure startups in mature technology markets; it is a global problem
that affects applications serving populations in every region. The fact that an app
serving 5 million users in a high-profile market like Saudi Arabia could operate
with zero database authentication suggests a fundamental lack of security awareness
and resources during the application's development and deployment.
## Regulatory Analysis
The Dalil breach occurred in 2019, four years before Saudi Arabia's PDPL came into
force in September 2023. At the time of the breach, Saudi Arabia lacked a comprehensive
data protection law, which meant there was no dedicated regulatory body to investigate
the breach, no mandatory notification requirements, and no specific penalties for the
type of data exposure that occurred. This regulatory vacuum illustrates precisely why
the PDPL was necessary, as the Dalil incident, and others like it, demonstrated that
the Saudi digital ecosystem had outpaced the legal framework designed to protect it.

Analyzed under the PDPL as it now stands, the Dalil breach would trigger multiple
violations. Article 6 establishes consent requirements for the collection of personal
data, mandating that individuals be informed of the purpose for which their data is
being collected and that consent be specific, informed, and freely given. The
collection of GPS location data and device identifiers by a caller ID application
raises serious questions about whether users were adequately informed about the scope
of data collection and whether their consent extended to continuous location tracking.
If the app's privacy policy did not explicitly disclose the collection of GPS data,
or if consent was bundled into a general terms-of-service acceptance, the processing
would lack a valid legal basis under Article 6. The PDPL requires granular consent
that is specific to each purpose of processing, not blanket authorization hidden in
dense legal language that no reasonable user would read or understand.

Article 10 addresses the processing of personal data by third parties and the
obligations of data controllers to ensure processor compliance. If Dalil used
third-party cloud hosting services for its MongoDB instance, the responsibility for
securing the database would remain with Dalil as the data controller. The fact that
the database was deployed without authentication suggests either a catastrophic
misconfiguration or a deliberate decision to forgo security controls, neither of
which would constitute a valid defense under the PDPL.

Article 14's requirement for appropriate technical and organizational measures is
where the PDPL would apply most directly to the Dalil case. An exposed MongoDB
database with no authentication represents the most basic possible failure of
technical security measures. There were no access controls, no encryption, no
monitoring, and no alerts. This is not a case where sophisticated attackers defeated
well-designed defenses; it is a case where no defenses existed. Under the PDPL,
SDAIA would be justified in imposing the maximum fine of SAR 5 million, as it would
be difficult to identify a more complete failure of the duty to implement appropriate
security measures.
## What Should Have Been Done
The most fundamental control that should have been in place was database
authentication and access control. MongoDB supports robust authentication mechanisms
including SCRAM-SHA-256, X.509 certificates, and LDAP integration. Any of these
mechanisms, even the simplest username-and-password authentication, would have
prevented the database from being publicly accessible. The fact that the database
was deployed without any authentication represents a failure of the most basic
security hygiene.
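
As a rough illustration of the simplest option, the sketch below builds a client connection string that insists on credentials, SCRAM-SHA-256, and TLS. The account name, password, and hostname are hypothetical placeholders, and nothing here reflects how Dalil's systems were actually configured. A connection string alone is not sufficient: the server must also enforce access control (in `mongod`, the `security.authorization: enabled` setting), otherwise unauthenticated clients are still accepted.

```python
from urllib.parse import quote_plus, urlencode


def build_mongo_uri(user: str, password: str, host: str, port: int = 27017) -> str:
    """Build a MongoDB connection string that uses SCRAM-SHA-256 over TLS."""
    if not user or not password:
        # Fail closed: never hand back a URI without credentials.
        raise ValueError("refusing to build an unauthenticated connection string")
    options = urlencode({
        "authSource": "admin",             # database that stores the user account
        "authMechanism": "SCRAM-SHA-256",  # salted challenge-response authentication
        "tls": "true",                     # encrypt the connection in transit
    })
    # Percent-encode credentials so characters such as '@' or ':' cannot break the URI.
    return f"mongodb://{quote_plus(user)}:{quote_plus(password)}@{host}:{port}/?{options}"


# Hypothetical values for illustration only.
uri = build_mongo_uri("caller_id_app", "p@ss:word", "db.internal.example")
print(uri)
```

In practice the password would come from a secrets manager or environment variable rather than source code.
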
Network-level access controls should have restricted database connections to only
the application servers that needed access, using firewall rules, VPN requirements,
or cloud security groups to ensure that the database was never directly reachable
from the internet. Defense in depth means that even if database authentication were
somehow bypassed, network-level controls would prevent unauthorized connections from
reaching the database at all.
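
The allowlist idea behind those network controls can be sketched in a few lines. This is only a model of the policy, not a replacement for a real firewall or security group; the subnet `10.0.1.0/24` and the function names are assumptions made for the example.

```python
import ipaddress

# Hypothetical private subnet that holds the application servers.
ALLOWED_NETWORKS = [ipaddress.ip_network("10.0.1.0/24")]


def connection_allowed(client_ip: str) -> bool:
    """Return True only if the client address falls inside an allowed network."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in ALLOWED_NETWORKS)


def bind_is_public(bind_ip: str) -> bool:
    """Return True if a listener address accepts traffic on every interface
    (0.0.0.0 or ::) or is itself a globally routable address."""
    addr = ipaddress.ip_address(bind_ip)
    return addr.is_unspecified or addr.is_global


print(connection_allowed("10.0.1.25"))    # application server inside the subnet
print(connection_allowed("203.0.113.7"))  # address outside the subnet
print(bind_is_public("0.0.0.0"))          # listening on all interfaces
print(bind_is_public("127.0.0.1"))        # loopback only
```

A check like `bind_is_public` could run in a deployment pipeline to reject a database configuration that listens on all interfaces.
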
Data minimization should have been a core design principle from the application's
inception. A caller identification app does not need to collect continuous GPS
location data, device IMEI numbers, or carrier information to perform its primary
function. The PDPL's data minimization principle, which requires that data collection
be limited to what is necessary for the stated purpose, would have prevented the
creation of this surveillance-grade dataset in the first place. By collecting only
the data necessary for caller identification, the blast radius of any potential
breach would have been limited to phone numbers and associated names, rather than a
comprehensive profile including physical location histories.
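
One way to enforce minimization in code is to pass every incoming record through a field allowlist before it is stored. The sketch below assumes a hypothetical schema in which only `phone_number` and `display_name` are needed for caller identification; the field names are illustrative and are not taken from Dalil's actual data model.

```python
# Hypothetical allowlist: the only fields a caller ID lookup needs to persist.
ALLOWED_FIELDS = frozenset({"phone_number", "display_name"})


def minimize(record: dict) -> dict:
    """Drop every field that is not explicitly allowlisted for storage."""
    return {key: value for key, value in record.items() if key in ALLOWED_FIELDS}


# Example payload as a client might submit it (all values are made up).
incoming = {
    "phone_number": "+966500000000",
    "display_name": "Example User",
    "gps": {"lat": 24.7136, "lon": 46.6753},
    "imei": "000000000000000",
    "carrier": "ExampleTel",
}

stored = minimize(incoming)
print(stored)
```

An allowlist is preferable to a blocklist here because any new field a client starts sending is dropped by default rather than silently retained.
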
Encryption of data at rest should have been implemented to protect the database
contents if the underlying storage were ever accessed. MongoDB Enterprise provides a
native encrypted storage engine; the Community edition does not, so deployments on it
typically rely on disk-level or filesystem-level encryption instead. Either approach
protects raw data files and backups, but it does not protect data read through an
unauthenticated database connection, because the server decrypts records
transparently for any client it accepts. Closing that gap requires client-side
field-level encryption of the most sensitive fields, such as GPS coordinates and
device identifiers, so that those values remain unreadable without keys held outside
the database. Together, these layers would have provided a critical last line of
defense.

Continuous security monitoring and vulnerability scanning should have been part of
the application's operational procedures. Automated tools that scan for exposed
databases, open ports, and misconfigured cloud services are widely available and are
considered a baseline requirement for any internet-facing application. Regular
penetration testing would have identified the open MongoDB instance as a critical
vulnerability, and a vulnerability management program would have ensured that the
issue was remediated promptly. The extended period of exposure suggests that no such
monitoring or testing was in place, leaving the 5 million users' data accessible to
anyone with the technical skill to run a Shodan search.
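
A minimal version of such a check is a script that tries to open a TCP connection to the default MongoDB port on hosts the operator owns. The function names below are made up for this sketch, and an open port only shows that the service is reachable; it does not by itself prove that authentication is missing, so a hit should prompt a follow-up review rather than be treated as conclusive.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def find_reachable(hosts: list[str], port: int = 27017) -> list[str]:
    """Return the subset of hosts that accept connections on the given port."""
    return [host for host in hosts if port_is_open(host, port)]
```

Run from outside the production network against the organization's own public addresses, for example `find_reachable(["db.example.com"])` with a hypothetical hostname, a non-empty result would indicate that a database port is exposed to the internet.
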
An open MongoDB containing the location histories and personal details of 5 million
users is not a sophisticated attack; it is a failure to implement the most
elementary security controls. The Dalil breach predated the PDPL, but it exemplifies
the data protection failures the law was designed to prevent. Any application
collecting personal data in Saudi Arabia today must treat database security as a
non-negotiable minimum, not an afterthought.