
Where Unicode Collation Meets Punycode Domains: A Zero-Click Account Takeover
Summary for readers: This post explains a subtle Unicode/Punycode pitfall that can appear in modern authentication flows. It highlights how a normalization mismatch enabled a zero‑click account takeover (ATO) scenario and how to remediate it safely.
This vulnerability was reported through Defend Iceland's bug bounty platform and affected one or more customers. The independent security research surfaced a subtle authentication quirk worth sharing with the broader community. For engineers, it's a clear lesson in how the intersection of database collation and internationalized domain names can create unexpected attack vectors.
The vulnerability can affect both OTP-based authentication and password reset flows, where normalization mismatches can lead to account compromise through intercepted tokens. By sharing this technical analysis, we're helping everyone understand these cybersecurity threats and how to defend against them.
Video: Zero-Click ATO Demonstration
This demonstration video shows a realistic example of how this vulnerability presented itself, affecting one or more of Defend Iceland's customers.
Technical Deep-Dive: The Collation Vulnerability
The core technical issue stems from database collation behavior. Many database systems use accent-insensitive collations that perform Unicode normalization, treating visually similar characters as equivalent during string comparisons. We'll use MySQL's utf8mb4_unicode_ci collation as an example, though this applies to similar collations across different database systems.
Our testing revealed that several characters commonly used in Icelandic and other European languages are particularly vulnerable:
- í = i ✅ MATCHES
- á = a ✅ MATCHES
- é = e ✅ MATCHES
- ó = o ✅ MATCHES
- ý = y ✅ MATCHES
In our testing, characters like ö, þ, ð, and æ remained safe, as they don't normalize to a single ASCII equivalent.
This means that in a vulnerable system, a database query for SELECT * FROM users WHERE email = 'victim@gmáîl.com' would successfully match a stored record of victim@gmail.com, enabling the attack chain.
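The matching behavior described above can be approximated outside the database. The following is a minimal sketch, assuming that decomposing a string (NFD), dropping combining marks, and case-folding is a reasonable stand-in for an accent-insensitive collation. It is not MySQL's actual algorithm, the helper names are hypothetical, and results for individual characters can differ from a real collation, so treat the database itself as the source of truth.

```python
import unicodedata


def accent_insensitive_key(text: str) -> str:
    """Approximate an accent- and case-insensitive comparison key.

    Assumption: NFD decomposition + dropping combining marks + casefold.
    Real collations use their own weight tables and may behave differently.
    """
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def loosely_equal(a: str, b: str) -> bool:
    """Return True when two strings compare equal under the approximate key."""
    return accent_insensitive_key(a) == accent_insensitive_key(b)


for accented, plain in [("í", "i"), ("á", "a"), ("é", "e"), ("ó", "o"), ("ý", "y")]:
    print(accented, "=", plain, loosely_equal(accented, plain))

print(loosely_equal("victim@gmáîl.com", "victim@gmail.com"))  # True
```

Letters such as þ and ð have no decomposition into a base letter plus an accent, so this approximation leaves them unchanged.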
Why Punycode Matters (And Where Normalization Bites)
Punycode encoding lets domain names include accented characters (é, í, ó, and more) through Internationalized Domain Names (IDNs). These Unicode labels map to ASCII using Punycode: for example, gmáîl.com becomes xn--gml-fla9d.com. Many top-level domains support IDN registrations. In the .is TLD, it's possible to register defendíceland.is via ISNIC.is, and it would be stored as xn--defendceland-xfb.is.
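The Unicode-to-ASCII mapping can be reproduced with Python's built-in "idna" codec. This is a small sketch; the helper name to_ascii_domain is hypothetical, and the stdlib codec implements the older IDNA 2003 rules, which is sufficient for the simple accented labels used in this post.

```python
def to_ascii_domain(domain: str) -> str:
    """Convert a Unicode domain to its ASCII (Punycode) form.

    Uses the standard-library "idna" codec (IDNA 2003 rules).
    """
    return domain.encode("idna").decode("ascii")


for name in ("gmáîl.com", "defendíceland.is", "éxample.com"):
    print(name, "->", to_ascii_domain(name))
```

A mail system that resolves the Punycode form delivers to whoever registered that domain, which is why the literal address matters on the delivery side.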
At the same time, database collations perform accent-insensitive comparisons where characters like "é" are treated as equivalent to "e". When these two systems interact, you get a dangerous mismatch:
- Lookup side (accent-insensitive): victim@gmáîl.com matches victim@gmail.com in the user table.
- Mailer side (literal): OTP is sent to victim@gmáîl.com (the attacker-controlled domain), not the verified user email on file.
The attack becomes practical through domain registration and email forwarding. An attacker can register domains like gmáîl.com and configure email forwarding to route all emails to their controlled inbox, creating a seamless interception mechanism.
Realistic Example
- A victim's account email is victim@gmail.com.
- An attacker purchases & controls the domain gmáîl.com (Punycode: xn--gml-fla9d.com) and sets up email forwarding.
- The attacker requests a sign-in OTP using victim@gmáîl.com.
- The application normalizes the email for lookup using accent-insensitive collation, so it matches victim@gmail.com.
- The mailer sends the OTP to the original (Unicode) address victim@gmáîl.com.
- Email forwarding delivers the OTP to the attacker's inbox.
- The attacker receives the OTP and signs in as the victim. Zero‑click ATO.
The same technique can be applied broadly to target common domains and even company domains with accented variants, e.g., éxample.com → xn--xample-9ua.com.
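The attack chain can be simulated locally without touching a real mail system. The sketch below is an illustration under stated assumptions: SQLite with a custom collation (here called AI_CI) stands in for an accent-insensitive database collation, the outbox list stands in for the mailer, and the table layout, function names, and fixed OTP are all hypothetical.

```python
import sqlite3
import unicodedata


def fold(text: str) -> str:
    """Approximate accent- and case-insensitive folding (not MySQL's real algorithm)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).casefold()


def accent_insensitive(a: str, b: str) -> int:
    """Collation callback: negative, zero, or positive, like a classic comparator."""
    ka, kb = fold(a), fold(b)
    return (ka > kb) - (ka < kb)


conn = sqlite3.connect(":memory:")
conn.create_collation("AI_CI", accent_insensitive)
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.execute("INSERT INTO users (email) VALUES (?)", ("victim@gmail.com",))

outbox = []  # stand-in for the mailer: (recipient, otp) pairs


def request_otp_vulnerable(requested_email: str):
    """Look up the user loosely, then mail the OTP to the address from the request."""
    row = conn.execute(
        "SELECT id, email FROM users WHERE email = ? COLLATE AI_CI",
        (requested_email,),
    ).fetchone()
    if row is None:
        return None
    user_id, stored_email = row
    otp = "123456"  # fixed value for illustration only
    # The flaw: the recipient is the inbound address, not stored_email.
    outbox.append((requested_email, otp))
    return user_id


matched_id = request_otp_vulnerable("victim@gmáîl.com")
print("matched user id:", matched_id)
print("OTP recipient:", outbox[-1][0])
```

The lookup matches the victim's record, yet the recipient is the attacker-controlled address, which is the whole attack in two lines of application logic.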
How This Vulnerability Can Be Discovered (Safely)
This can be validated without buying multiple domains:
- Plus-addressing: Register with user+e@gmail.com, then request an OTP for user+é@gmail.com. The normalized lookup maps both to the same account, but the OTP is delivered to the Unicode variant.
- Burp intercept: If the frontend rejects Unicode, intercept a valid request and replace the email with the Unicode version before it hits the server.
- Database testing: Direct SQL queries can validate collation behavior: SELECT 'á' = 'a' returns 1 (true) in vulnerable collations.
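For the plus-addressing check, the probe address can be derived mechanically from the address you registered. This is a minimal sketch with hypothetical names (ACCENTED, plus_tag_probe); it only touches the tag after the "+", so the domain stays your own mail provider, and the character map covers just the five pairs listed earlier in this post.

```python
# Accented counterparts of the five letters listed in this post.
ACCENTED = {
    "a": "\u00e1",  # á
    "e": "\u00e9",  # é
    "i": "\u00ed",  # í
    "o": "\u00f3",  # ó
    "y": "\u00fd",  # ý
}


def plus_tag_probe(registered: str) -> str:
    """Return the registered address with its plus-tag letters accented.

    Example: user+e@gmail.com becomes user+é@gmail.com.
    Raises ValueError if there is no plus-tag containing a mappable letter.
    """
    local, _, domain = registered.rpartition("@")
    base, plus, tag = local.partition("+")
    if not plus or not any(ch in ACCENTED for ch in tag):
        raise ValueError("expected a plus-tag containing a, e, i, o, or y")
    accented_tag = "".join(ACCENTED.get(ch, ch) for ch in tag)
    return f"{base}+{accented_tag}@{domain}"


print(plus_tag_probe("user+e@gmail.com"))
```

Only run probes like this against accounts you own and within the scope of an authorized test.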
Root Cause - Lookup vs Delivery Mismatch
The email lookup treated accented characters as equivalent (accent‑insensitive comparison), while the mailer sent the one‑time password (OTP) to the literal address provided in the request. That discrepancy let an attacker receive the OTP for a legitimate account.
Where this can happen: If your database or framework uses accent‑insensitive equality for emails, you're in scope. This applies across DBMSs, including MySQL (e.g., utf8mb4_unicode_ci and the utf8mb4_*_ai_* collations) and Microsoft SQL Server (e.g., *_CI_AI collations). The vulnerability lies in the normalization logic that these collations implement: they're designed to be user-friendly for searching, but they create security gaps when used for authentication identifiers.
Remediation in Practice
- Authoritative delivery address: Always send sign‑in codes/OTPs (a.k.a. PINs) to the stored, verified email on the user record—never the raw inbound address.
- Align comparisons: For security‑sensitive identifiers (emails, usernames), avoid accent‑insensitive equality. Consider using binary collations or exact-match comparisons after proper canonicalization.
- Keep handling consistent: Ensure the same canonicalization rules are applied across lookup and delivery so there's no split‑brain behavior.
- Input validation: Implement strict validation that rejects or normalizes Unicode input at the application boundary before it reaches the database layer.
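The remediation points above can be combined in one request handler. The following is a sketch under stated assumptions, not a drop-in implementation: it takes the "reject" option for non-ASCII input, lowercases the whole address as its canonical form, uses an in-memory list in place of a user table, and records outgoing mail in an outbox list. All names are hypothetical.

```python
import secrets

# Hypothetical user store; "email" holds the verified, canonical address.
USERS = [{"id": 1, "email": "victim@gmail.com"}]
outbox = []  # stand-in for the mailer: (recipient, otp) pairs


def canonicalize_email(raw: str):
    """Return the canonical form of an address, or None if it is rejected.

    Assumed policy: ASCII only, exactly one "@", lowercased as a whole.
    """
    candidate = raw.strip()
    if not candidate.isascii() or candidate.count("@") != 1:
        return None
    local, domain = candidate.split("@")
    if not local or not domain:
        return None
    return candidate.lower()


def request_otp(raw_email: str) -> bool:
    """Issue an OTP only on an exact match, and send it to the address on record."""
    email = canonicalize_email(raw_email)
    if email is None:
        return False
    # Exact string equality: no accent-insensitive matching.
    user = next((u for u in USERS if u["email"] == email), None)
    if user is None:
        return False
    otp = f"{secrets.randbelow(10**6):06d}"
    # Recipient comes from the stored record, never from the request.
    outbox.append((user["email"], otp))
    return True


print(request_otp("victim@gmáîl.com"))  # rejected at the boundary
print(request_otp("Victim@Gmail.com"))  # exact match after canonicalization
print([recipient for recipient, _ in outbox])
```

Either safeguard alone would stop the attack described here; applying both means a later change to one layer does not silently reopen the gap.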
The Role of Defend Iceland & Responsible Disclosure
This vulnerability was discovered and responsibly disclosed through Defend Iceland's bug bounty platform. Defend Iceland's mission to strengthen our digital infrastructure through collaborative security research made this discovery possible. Organizations participating in such programs are actively investing in cybersecurity, welcoming external expertise, and demonstrating a commitment to protecting their users and our broader digital community.
The security research that identified this issue was conducted by researcher "lil_endian" (@lil_endian), whose analysis and responsible disclosure helped prevent potential exploitation while contributing valuable knowledge to the security community.
This class of issue is sneaky and easy to miss: many teams aren't aware their database or framework treats accented characters as equivalent. It took dedicated security research and careful testing to surface it. By sharing these technical details publicly, we're helping the broader community recognize and remediate a tricky class of issues before they become real threats.
This transparency isn't just good security practice; it's a contribution to building a stronger cybersecurity culture where organizations learn from each other's discoveries and collectively improve our digital defenses.