Data scraped from 2.6 million Duolingo users has been released on a hacking forum, allowing threat actors to run targeted phishing campaigns using the exposed details. Duolingo is one of the world's largest language-learning platforms, with more than 74 million monthly users.
In January 2023, a threat actor offered the scraped data of 2.6 million Duolingo users for sale on the now-defunct Breached hacking forum, asking over £1,000.
The dataset combines publicly available usernames and real names with non-public information, such as email addresses and internal data related to the Duolingo platform.
Although a user's real name and username are publicly visible on their Duolingo profile, the email addresses are more concerning. They are not public, and pairing them with the public profile details makes it easier to target users in malicious campaigns.
When the data was first offered for sale, Duolingo confirmed that it had been scraped from public profile information and said it was investigating whether further precautions were needed.
However, Duolingo did not address the fact that the dataset also contained email addresses, which are not public information.
A post on a hacking forum reads, "Today, I've made the Duolingo Scrape available for download. Thank you for your attention and enjoy!"
The data was collected using a publicly exposed application programming interface (API) that has been openly shared since at least March 2023. Researchers have posted tweets and public documentation explaining how to use it.
The API lets anyone submit a username and receive JSON output containing that user's public profile details. However, it also accepts an email address and reveals whether that address is linked to a valid Duolingo account.
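For contrast, here is a minimal Python sketch of a profile lookup that avoids the email-checking behaviour described above. It assumes a hypothetical `UserRecord` type, an in-memory store keyed by username, and a hypothetical `PUBLIC_FIELDS` allowlist; none of these names reflect Duolingo's actual API or schema. The lookup accepts only usernames, returns only allowlisted public fields, and gives the same empty response for email queries and unknown usernames, so it cannot be used to confirm whether an address is registered.

```python
from dataclasses import asdict, dataclass, field
from typing import Dict, List, Optional

# Hypothetical user record; field names are illustrative, not Duolingo's schema.
@dataclass
class UserRecord:
    username: str
    name: str
    email: str                                      # private: never returned
    roles: List[str] = field(default_factory=list)  # internal: never returned

# Assumption for this sketch: only these fields are safe to expose publicly.
PUBLIC_FIELDS = {"username", "name"}

def public_profile(record: UserRecord) -> Dict[str, str]:
    """Return only the allowlisted public fields of a record."""
    return {k: v for k, v in asdict(record).items() if k in PUBLIC_FIELDS}

def lookup(users_by_username: Dict[str, UserRecord], query: str) -> Optional[Dict[str, str]]:
    """Look up a profile by username only.

    Email-style queries and unknown usernames both return None, so the
    response does not reveal whether an email address has an account.
    """
    if "@" in query:
        return None
    record = users_by_username.get(query)
    return public_profile(record) if record is not None else None

users = {"alice": UserRecord("alice", "Alice A.", "alice@example.com", ["admin"])}
print(lookup(users, "alice"))              # {'username': 'alice', 'name': 'Alice A.'}
print(lookup(users, "alice@example.com"))  # None
```

The allowlist is the key design choice: new private fields added to the record later stay hidden by default instead of leaking until someone remembers to exclude them.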
Although its misuse was reported to Duolingo in January, the Duolingo API remains openly accessible to anyone on the internet.
Using this API, the scraper submitted large numbers of email addresses, likely taken from previous data breaches, to confirm which ones belonged to Duolingo accounts. The confirmed addresses were then used to build a dataset combining public profile information with non-public data.
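Bulk email checking of this kind leaves a recognisable pattern: a single client querying many different identifiers in a short time. The following Python sketch flags that pattern by counting distinct identifiers per client over a sliding time window. The `EnumerationMonitor` class, its default thresholds, and the use of a client ID as the grouping key are illustrative assumptions, not a description of any system Duolingo runs.

```python
import time
from collections import defaultdict, deque
from typing import Deque, Dict, Optional, Tuple

class EnumerationMonitor:
    """Flag clients that look up many distinct identifiers within a time window.

    Hypothetical sketch: the thresholds below are illustrative, not recommendations.
    """

    def __init__(self, max_distinct: int = 50, window_seconds: float = 60.0):
        self.max_distinct = max_distinct
        self.window = window_seconds
        # client_id -> deque of (timestamp, identifier) inside the window
        self._events: Dict[str, Deque[Tuple[float, str]]] = defaultdict(deque)

    def record(self, client_id: str, identifier: str, now: Optional[float] = None) -> bool:
        """Record one lookup and return True if the client should be flagged."""
        now = time.monotonic() if now is None else now
        events = self._events[client_id]
        events.append((now, identifier))
        # Discard lookups that have fallen outside the sliding window.
        while events and now - events[0][0] > self.window:
            events.popleft()
        # Repeated lookups of the same identifier count only once.
        distinct = {ident for _, ident in events}
        return len(distinct) > self.max_distinct

monitor = EnumerationMonitor(max_distinct=3, window_seconds=60.0)
for i in range(5):
    flagged = monitor.record("client-a", f"user{i}@example.com", now=float(i))
    print(i, flagged)  # flagged from the fourth distinct address onwards
```

Counting distinct identifiers rather than raw requests avoids flagging a legitimate user who simply refreshes their own profile many times.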
Another threat actor has also shared their own data from the API scrape. They advised anyone planning to use it for phishing to focus on specific fields that indicate elevated permissions for certain Duolingo users, since those accounts make more valuable targets.
Frequently Disregarded Scraped Data
Companies often downplay the significance of scraped data on the grounds that it is mostly public, even though compiling it can take considerable effort.
However, when public data is combined with private details such as phone numbers and email addresses, the exposed information becomes far more dangerous. Such a combination may also breach data protection regulations.
To sum up
Dismissing scraped data overlooks its real consequences. Much of the information may be publicly accessible, but the risk grows sharply once it is combined with sensitive details such as phone numbers and email addresses, and that combination may violate data protection laws.
Cyber security companies play a crucial role in countering the weaknesses exposed by attacks like this one. Proactive measures such as robust intrusion detection systems, continuous monitoring, and advanced threat analysis help them identify and fix the vulnerabilities that attackers exploit. They can also help design secure APIs that limit access to sensitive user information, and carry out regular security audits to find and patch potential weaknesses.
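One common way for an API to limit how quickly a single client can retrieve data is rate limiting. Below is a minimal Python sketch of a per-client token-bucket limiter; the `TokenBucket` class name, capacity, and refill rate are hypothetical and chosen only for illustration. Each client gets a bucket of tokens that refills at a steady rate, and each request spends one token, so short bursts are allowed while sustained bulk lookups, like those used to build the Duolingo dataset, are slowed down.

```python
import time
from typing import Dict, Optional, Tuple

class TokenBucket:
    """Per-client token-bucket rate limiter (illustrative parameters)."""

    def __init__(self, capacity: float, refill_per_second: float):
        self.capacity = capacity
        self.refill = refill_per_second
        # client_id -> (tokens remaining, timestamp of last update)
        self._state: Dict[str, Tuple[float, float]] = {}

    def allow(self, client_id: str, now: Optional[float] = None) -> bool:
        """Return True if the request may proceed, spending one token."""
        now = time.monotonic() if now is None else now
        tokens, last = self._state.get(client_id, (self.capacity, now))
        # Refill in proportion to elapsed time, capped at the bucket capacity.
        tokens = min(self.capacity, tokens + (now - last) * self.refill)
        if tokens >= 1:
            self._state[client_id] = (tokens - 1, now)
            return True
        self._state[client_id] = (tokens, now)
        return False

limiter = TokenBucket(capacity=3, refill_per_second=1.0)
print([limiter.allow("client-a", now=0.0) for _ in range(4)])  # [True, True, True, False]
```

A per-client limit alone will not stop a scraper that spreads requests across many clients, so it is best treated as one layer alongside authentication and monitoring.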

