An email (short for “electronic mail”) is a message sent over a computer network, typically the internet, using a protocol called Simple Mail Transfer Protocol (SMTP). An email message consists of two main parts: the message header, which contains information about the sender, recipient, and the message itself, and the message body, which contains the content of the message. This article shows how to build a regex for email addresses and how to match addresses against it.
Regex (short for regular expression) is a powerful tool used for searching and manipulating text. It is composed of a sequence of characters that define a search pattern. Regex can be used to find patterns in large amounts of text, validate user input, and manipulate strings. It is widely used in programming languages, text editors, and command line tools.
Structure of Email Address
- The local part contains one or more characters that can appear before the @ symbol in an email address: alphabetic characters (upper and lower case), digits, and some special characters.
- The @ symbol separates the local part from the domain and is matched literally.
- The domain begins with a label made up of alphabetic characters (upper and lower case), digits, and hyphens. A label must start and end with a letter or digit.
- The domain may continue with zero or more additional labels (subdomains and the top-level domain), each preceded by a dot and following the same rules as the first label.
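The structure above can be illustrated with a small Python sketch that splits an address into its local part and its dot-separated domain labels. The helper name `split_email` and the sample address are made up for illustration; the function only splits the string and performs no validation.

```python
from typing import List, Tuple


def split_email(address: str) -> Tuple[str, List[str]]:
    """Split an address into its local part and dot-separated domain labels."""
    # rpartition splits on the LAST "@", so everything before it is the local part.
    local_part, _, domain = address.rpartition("@")
    return local_part, domain.split(".")


local_part, labels = split_email("jane.doe@mail.example.com")
print(local_part)  # jane.doe
print(labels)      # ['mail', 'example', 'com']
```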
Regex for checking if Email Address is valid
- The caret (^) anchors the match at the start of the string.
- The first part, [a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+ , matches one or more characters that can appear before the @ symbol in an email address. It includes all alphabetic characters (upper and lower case), digits, and some special characters.
- The @ symbol is matched literally.
- The second part, [a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])? , matches the first label of the domain. The label starts with a letter or digit, optionally followed by up to 61 letters, digits, or hyphens and then a closing letter or digit. A label is therefore 1 to 63 characters long and never starts or ends with a hyphen.
- The final part, (?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)* , matches zero or more additional labels, each preceded by a literal dot and following the same rules as the first label.
- The dollar sign ($) anchors the match at the end of the string.
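As a minimal sketch, the parts described above can be assembled in Python and checked against a candidate string. The constant names and the `is_valid_email` helper are illustrative choices, not part of the original regex, and `re.fullmatch` is used here so that the whole string must match.

```python
import re

# The three parts described above, kept as separate strings (names are illustrative).
LOCAL_PART = r"[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+"
FIRST_LABEL = r"[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?"
MORE_LABELS = r"(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*"

# Start anchor, local part, literal @, domain labels, end anchor.
EMAIL_PATTERN = "^" + LOCAL_PART + "@" + FIRST_LABEL + MORE_LABELS + "$"
EMAIL_RE = re.compile(EMAIL_PATTERN)


def is_valid_email(candidate: str) -> bool:
    """Return True if the whole candidate string matches the pattern."""
    return EMAIL_RE.fullmatch(candidate) is not None


print(is_valid_email("user@example.com"))  # True
print(is_valid_email("user@"))             # False
```

Keeping the parts separate makes it easier to see which piece rejects a given input when a match fails.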
Regular Expression-
/^[a-zA-Z0-9.!#$%&'*+\/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$/gmi
Test string examples for the above regex-
Input String | Match Output |
---|---|
asd:[email protected] | does not match |
[email protected] | matches |
[email protected]:port | does not match |
[email protected] | matches |
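Cases like the ones in the table can be exercised in code. The Python sketch below uses made-up stand-in addresses (they are assumptions, not the table's inputs): two ordinary addresses, one with a colon before the @, and one with a port-like suffix.

```python
import re

EMAIL_RE = re.compile(
    r"^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?"
    r"(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$"
)

# Hypothetical sample inputs chosen for illustration.
samples = [
    "name@example.com",
    "first.last@mail.example.org",
    "asd:name@example.com",   # colon is not allowed in the local part
    "name@example.com:8080",  # colon is not allowed in the domain
]

results = {s: EMAIL_RE.fullmatch(s) is not None for s in samples}
for address, ok in results.items():
    print(f"{address!r}: {'matches' if ok else 'does not match'}")
```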
This regex will match most standard email addresses, but it may not match all possible email addresses due to the complexity of the email address specification. In particular, this regex does not support email addresses with quoted local parts (e.g., “[email protected]”), or email addresses with comments (e.g., “user@(comment)example.com”).
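The limitation can be seen directly in a short Python sketch. The quoted-local-part address below is a made-up example; the comment example is the one given in the text. Both are rejected because the double quote, the space, and the parentheses are not in the character classes the pattern allows.

```python
import re

EMAIL_RE = re.compile(
    r"^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?"
    r"(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$"
)

# Hypothetical quoted local part, and the comment example from the text.
quoted_local_part = '"john doe"@example.com'
with_comment = "user@(comment)example.com"

quoted_ok = EMAIL_RE.fullmatch(quoted_local_part) is not None
comment_ok = EMAIL_RE.fullmatch(with_comment) is not None

print(quoted_ok)   # False
print(comment_ok)  # False
```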
Here is a detailed explanation of the above regex-
/^[a-zA-Z0-9.!#$%&'*+\/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$/gmi
^ asserts position at start of a line
Match a single character present in the list below [a-zA-Z0-9.!#$%&'*+\/=?^_`{|}~-]
+ matches the previous token between one and unlimited times, as many times as possible, giving back as needed (greedy)
a-z matches a single character in the range between a (index 97) and z (index 122) (case insensitive)
A-Z matches a single character in the range between A (index 65) and Z (index 90) (case insensitive)
0-9 matches a single character in the range between 0 (index 48) and 9 (index 57) (case insensitive)
.!#$%&'*+ matches a single character in the list .!#$%&'*+ (case insensitive)
\/ matches the character / with index 47₁₀ (2F₁₆ or 57₈) literally (case insensitive)
=?^_`{|}~- matches a single character in the list =?^_`{|}~- (case insensitive)
@ matches the character @ with index 64₁₀ (40₁₆ or 100₈) literally (case insensitive)
Match a single character present in the list below [a-zA-Z0-9]
a-z matches a single character in the range between a (index 97) and z (index 122) (case insensitive)
A-Z matches a single character in the range between A (index 65) and Z (index 90) (case insensitive)
0-9 matches a single character in the range between 0 (index 48) and 9 (index 57) (case insensitive)
Non-capturing group (?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?
? matches the previous token between zero and one times, as many times as possible, giving back as needed (greedy)
Match a single character present in the list below [a-zA-Z0-9-]
{0,61} matches the previous token between 0 and 61 times, as many times as possible, giving back as needed (greedy)
a-z matches a single character in the range between a (index 97) and z (index 122) (case insensitive)
A-Z matches a single character in the range between A (index 65) and Z (index 90) (case insensitive)
0-9 matches a single character in the range between 0 (index 48) and 9 (index 57) (case insensitive)
- matches the character - with index 45₁₀ (2D₁₆ or 55₈) literally (case insensitive)
Match a single character present in the list below [a-zA-Z0-9]
a-z matches a single character in the range between a (index 97) and z (index 122) (case insensitive)
A-Z matches a single character in the range between A (index 65) and Z (index 90) (case insensitive)
0-9 matches a single character in the range between 0 (index 48) and 9 (index 57) (case insensitive)
Non-capturing group (?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*
* matches the previous token between zero and unlimited times, as many times as possible, giving back as needed (greedy)
\. matches the character . with index 46₁₀ (2E₁₆ or 56₈) literally (case insensitive)
Match a single character present in the list below [a-zA-Z0-9]
a-z matches a single character in the range between a (index 97) and z (index 122) (case insensitive)
A-Z matches a single character in the range between A (index 65) and Z (index 90) (case insensitive)
0-9 matches a single character in the range between 0 (index 48) and 9 (index 57) (case insensitive)
Non-capturing group (?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?
? matches the previous token between zero and one times, as many times as possible, giving back as needed (greedy)
Match a single character present in the list below [a-zA-Z0-9-]
{0,61} matches the previous token between 0 and 61 times, as many times as possible, giving back as needed (greedy)
a-z matches a single character in the range between a (index 97) and z (index 122) (case insensitive)
A-Z matches a single character in the range between A (index 65) and Z (index 90) (case insensitive)
0-9 matches a single character in the range between 0 (index 48) and 9 (index 57) (case insensitive)
- matches the character - with index 45₁₀ (2D₁₆ or 55₈) literally (case insensitive)
Match a single character present in the list below [a-zA-Z0-9]
a-z matches a single character in the range between a (index 97) and z (index 122) (case insensitive)
A-Z matches a single character in the range between A (index 65) and Z (index 90) (case insensitive)
0-9 matches a single character in the range between 0 (index 48) and 9 (index 57) (case insensitive)
$ asserts position at the end of a line
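The quantifiers explained above put concrete bounds on each domain label: one leading character, up to 61 middle characters, and one closing character, for at most 63 in total, with hyphens allowed only in the middle. The Python sketch below checks those bounds; the `matches` helper and the sample host names are made up for illustration.

```python
import re

EMAIL_RE = re.compile(
    r"^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?"
    r"(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$"
)


def matches(address: str) -> bool:
    """Return True if the whole address matches the pattern."""
    return EMAIL_RE.fullmatch(address) is not None


checks = {
    "63-character label": matches("user@" + "a" * 63 + ".com"),   # 1 + 61 + 1 characters
    "64-character label": matches("user@" + "a" * 64 + ".com"),   # one character too long
    "hyphen inside label": matches("user@my-host.example.com"),
    "hyphen at label start": matches("user@-host.example.com"),
    "hyphen at label end": matches("user@host-.example.com"),
}
for name, ok in checks.items():
    print(f"{name}: {ok}")
```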
Global pattern flags
g modifier: global. All matches (don't return after first match)
m modifier: multi line. Causes ^ and $ to match the begin/end of each line (not only begin/end of string)
i modifier: insensitive. Case insensitive match (ignores case of [a-zA-Z])
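The flags above use the `/.../gmi` notation. As a rough Python equivalent (an assumption about how one might port it, since Python has no `g` flag), `re.MULTILINE` and `re.IGNORECASE` stand in for `m` and `i`, and `finditer` or `findall` plays the role of `g` by returning every match. The sample text is made up.

```python
import re

PATTERN = (
    r"^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?"
    r"(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$"
)

# Hypothetical input with one candidate per line.
text = "alice@example.com\nnot an email\nBOB@EXAMPLE.ORG\n"

# With MULTILINE, ^ and $ match at the start and end of every line.
multi_line_re = re.compile(PATTERN, re.MULTILINE | re.IGNORECASE)
per_line = [m.group(0) for m in multi_line_re.finditer(text)]
print(per_line)  # ['alice@example.com', 'BOB@EXAMPLE.ORG']

# Without MULTILINE, ^ and $ refer to the whole text, so nothing matches here.
whole_text = re.findall(PATTERN, text, re.IGNORECASE)
print(whole_text)  # []
```

Because the character classes already list both a-z and A-Z, the case-insensitive flag does not change which addresses match this particular pattern.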
It is important to note that using a regular expression to validate email addresses is not a foolproof method, as there are many subtleties and edge cases to consider. It is generally a better idea to use a library or service that is specifically designed for email validation.
Hope this article helped you understand how to match email addresses with a regex pattern.