What is a valid email address?

There's only one real answer to this: a valid email address is one that you can send emails to.

There are acknowledged standards for what constitutes a valid email address. These are defined in the Request for Comments (RFC) documents written by the lords of the internet. These documents are not rules but simply statements of what some people feel is appropriate behaviour.

Consequently, the people who make email software have often ignored the RFCs and done their own thing. Thus it is perfectly possible for you to have been issued an email address by your internet service provider (ISP) that flouts the RFC conventions and is in that sense invalid.

But if your address works then why does it matter if it's invalid?

That brings us onto the most important principle in distributed software.

The Robustness Principle

A very great man, now sadly dead, once said:

be conservative in what you do, be liberal in what you accept from others

We take this to mean that all messages you send out should conform carefully to the accepted standards. Messages you receive should be interpreted as the sender intended so long as the meaning is clear.

This is a very valuable principle that allows networked software written by different people at different times to work together. If we are picky about the standards conformance of other people's work then we will lose useful functions and services.

How does this apply to validating email addresses?

If a friend says to you “this is my email address” then there's no point saying to her “Ah, but it violates RFC 5321”. That's not her fault. Her ISP has given her that address and it works and she's committed to it.

If you've got an online business that she wants to register for, she will enter her email address into the registration page. If you then refuse to create her account on the grounds that her email address is non-conformant then you've lost a customer. More fool you.

If she says her address is sally.@herisp.com, the chances are she's typed it in wrong. Maybe she missed off her surname. So there is a point in validating the address – you can ask her if she's sure it's right before you lose her attention and your only means of communicating with a potential customer. Most likely she'll say “Oh yes, silly me” and correct it.

Occasionally a user might say “Damn right that's my email address. Quit bugging me and register my account”. Better register the account before you lose a customer, even if it's not a valid email address.
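
The "ask, then accept" flow described above can be sketched roughly like this. It is only an illustration in Python: the names looks_suspicious, register and RegistrationResult are made up for this example, and the checks are deliberately shallow hints that the user may have mistyped, not a verdict on RFC conformance.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RegistrationResult:
    registered: bool
    warning: Optional[str] = None


def looks_suspicious(address: str) -> Optional[str]:
    """Return a human-readable doubt about the address, or None if nothing stands out."""
    if "@" not in address:
        return "There is no @ sign in that address."
    # Split on the last @ so the domain is whatever follows it.
    local, _, domain = address.rpartition("@")
    if not local or not domain:
        return "Something seems to be missing on one side of the @ sign."
    if local.startswith(".") or local.endswith(".") or ".." in local:
        return "The part before the @ sign has a dot in an unusual place."
    return None


def register(address: str, user_confirmed: bool = False) -> RegistrationResult:
    """Warn once about a doubtful address, but register it if the user insists."""
    doubt = looks_suspicious(address)
    if doubt and not user_confirmed:
        return RegistrationResult(False, doubt + " Are you sure it's right?")
    return RegistrationResult(True)
```

Calling register("sally.@herisp.com") comes back with a warning and no account. Calling it again with user_confirmed=True creates the account anyway, which is the point: the validator advises, the user decides.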

Getting it right

If you're going to validate an email address you should get it right. Hardly anybody does.

The worst error is to reject email addresses that are perfectly valid. If you have a Gmail account (e.g. sally.phillips@gmail.com) then you can send emails to sally.phillips+anything@gmail.com. It will arrive in your inbox perfectly. This is great for registering with websites because you can see if they've passed your address on to somebody else when email starts arriving addressed to the unique address you gave to the website (e.g. sally.phillips+unique_reference@gmail.com).

But.

Sadly, many websites won't let you register an address with a plus sign in it. Not because they are trying to defeat your tracking strategy but just because they are crap. They've copied a broken regular expression from a dodgy website and they are using it to validate email addresses. And losing customers as a result.
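
To make the failure concrete, here is a small Python sketch. The first pattern is my own illustration of the over-strict kind of expression that gets copied around; it is not taken from any particular website. The second widens the local part to the punctuation that, as far as I understand the RFCs, is allowed in an unquoted local part. Neither is a complete validator.

```python
import re

# Illustrative only: an over-strict pattern of the kind described above.
# The local part allows letters, digits, dot, underscore and hyphen, so "+" is refused.
OVERLY_STRICT = re.compile(r"^[A-Za-z0-9._-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$")

# Same shape, but the local part also accepts ! # $ % & ' * + / = ? ^ ` { | } ~
# (an assumption about what the RFCs permit unquoted; still far from a full validator).
MORE_PERMISSIVE = re.compile(
    r"^[A-Za-z0-9.!#$%&'*+/=?^_`{|}~-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$"
)

tagged = "sally.phillips+unique_reference@gmail.com"

rejected_by_strict = OVERLY_STRICT.match(tagged) is None
accepted_by_permissive = MORE_PERMISSIVE.match(tagged) is not None
```

Both patterns accept sally.phillips@gmail.com. Only the strict one turns away the tagged address, and with it the customer.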

How long can an email address be? A lot of people say 320 characters. A lot of people are wrong. It's 254 characters.
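
As I understand it, the 320 figure comes from adding the longest local part (64), the @ sign (1) and the longest domain (255). The smaller figure comes from the limit on the whole path in an SMTP command, which is 256 including the two angle brackets around the address, leaving 254. A minimal Python sketch of the check follows. The function name is hypothetical, the 64 limit on the local part is my addition rather than something stated above, and lengths are counted in octets, which is the same as characters for plain ASCII addresses.

```python
from typing import Optional

MAX_ADDRESS_OCTETS = 254     # the figure given in the text
MAX_LOCAL_PART_OCTETS = 64   # assumption added for illustration


def length_problem(address: str) -> Optional[str]:
    """Return a description of a length problem, or None if the lengths are fine."""
    total = len(address.encode("utf-8"))
    if total > MAX_ADDRESS_OCTETS:
        return f"address is {total} octets; the limit is {MAX_ADDRESS_OCTETS}"
    # Split on the last @ so everything before it counts as the local part.
    local, at, _domain = address.rpartition("@")
    if at and len(local.encode("utf-8")) > MAX_LOCAL_PART_OCTETS:
        return f"local part is longer than {MAX_LOCAL_PART_OCTETS} octets"
    return None
```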

What RFC is the authority for mailbox formats? RFC 822? RFC 2822? Nope, it's RFC 5321.

Getting it right is hard because the RFCs that define the conventions are trying to serve many masters and they document conventions that grew up in the early wild west days of email.

My recommendation is: don't try this yourself. There's free code out there in many languages that will do this better than anybody's first attempt. My own first attempt was particularly laughable.

Test cases

If you do try to write validation code yourself then you should at least test it. Even if you're adopting somebody else's validator you should test it.

To do this you're going to have to write a series of unit tests that explore all the nooks and crannies of what is allowed by the RFCs.

Oh wait. You don't have to do that because I've done it for you.

Packaged along with the free is_email() code is an XML file of 164 unit tests. If you can write a validator that passes all of them: congratulations, you've done something hard.
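
If you want to point a test file like that at your own validator, a harness need not be complicated. The Python sketch below is a guess at the general shape only: the XML layout (test elements with an id attribute, an address and a valid flag) is invented for this example and is not the format of the file that ships with is_email(), and the three sample cases are mine. Adapt the element names to whatever the real file uses.

```python
import xml.etree.ElementTree as ET
from typing import Callable, List, Tuple

# Hypothetical schema and sample cases, for illustration only.
SAMPLE_TESTS = """\
<tests>
  <test id="1"><address>sally.phillips@gmail.com</address><valid>true</valid></test>
  <test id="2"><address>sally.phillips+unique_reference@gmail.com</address><valid>true</valid></test>
  <test id="3"><address>sally.phillips</address><valid>false</valid></test>
</tests>
"""


def run_tests(xml_text: str, validator: Callable[[str], bool]) -> List[Tuple[str, str, bool]]:
    """Run every test case and return (id, address, expected) for each disagreement."""
    failures = []
    for test in ET.fromstring(xml_text).iter("test"):
        address = test.findtext("address", default="")
        expected = test.findtext("valid", default="").strip().lower() == "true"
        if validator(address) != expected:
            failures.append((test.get("id", "?"), address, expected))
    return failures


def toy_validator(address: str) -> bool:
    """Stand-in validator: something on both sides of the last @ sign."""
    local, at, domain = address.rpartition("@")
    return bool(local and at and domain)


failures = run_tests(SAMPLE_TESTS, toy_validator)
```

An empty failures list means the validator agreed with every case. Swap in a validator that refuses plus signs and case 2 shows up in the list straight away.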

See the tests and the results for is_email() here.

If you think any of the test cases is wrong please leave a comment here.