A domain name is a unique name that identifies a website or other resource on the internet. You reach that resource by typing the domain name into a web browser’s address bar or by clicking a link. In this article, let’s understand how to write a regex for domain names and how that regex matches them.
Regex (short for regular expression) is a powerful tool for searching and manipulating text. A regex is a sequence of characters that defines a search pattern. It can be used to find patterns in large amounts of text, validate user input, and manipulate strings, and it is widely used in programming languages, text editors, and command-line tools.
Structure of a Domain Name
A domain name should have the following structure-
- It may or may not start with www. or, optionally, another subdomain.
- This must be followed by the domain name itself.
- It then ends with a top-level domain (TLD) like .com, .net, .io etc.
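To make the structure above concrete, here is a minimal sketch that splits a host name into those three parts. The helper name `splitDomain` is hypothetical, and the sketch assumes a single-label TLD such as com; multi-label suffixes like co.uk would need a public-suffix list, which is not covered here.

```javascript
// Hypothetical helper for illustration only: splits a host name into
// subdomain, domain name, and TLD, assuming the TLD is the last label.
function splitDomain(host) {
  const labels = host.toLowerCase().split(".");
  if (labels.length < 2) {
    return null; // a single label has no TLD
  }
  const tld = labels[labels.length - 1];
  const domain = labels[labels.length - 2];
  // Everything before the domain name is treated as the subdomain.
  const subdomain = labels.slice(0, -2).join(".") || null;
  return { subdomain, domain, tld };
}

console.log(splitDomain("www.google.com"));
// { subdomain: 'www', domain: 'google', tld: 'com' }
console.log(splitDomain("google.com"));
// { subdomain: null, domain: 'google', tld: 'com' }
```

This only splits on dots; it does not validate the characters in each part, which is what the regex is for.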
Regex for checking if Domain Name is valid or not
Regular Expression-
/^((?!-))(xn--)?[a-z0-9][a-z0-9-_]{0,61}[a-z0-9]{0,1}\.(xn--)?([a-z0-9\-]{1,61}|[a-z0-9-]{1,30}\.[a-z]{2,})$/igm
Test string examples for the above regex-
Input String | Match Output |
---|---|
.as10 | does not match |
www.google.com | matches |
#@$some .qwq.eras | does not match |
something.debugpointer.com | matches |
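The table above can be reproduced with a short JavaScript sketch. The helper name `isValidDomain` is an assumption made for this example, and the g and m flags are dropped here because each input is a single string and `test()` is simpler to reason about without them.

```javascript
// The pattern from this article, with only the i flag (an assumption for
// single-string validation; see the lead-in).
const DOMAIN_RE = /^((?!-))(xn--)?[a-z0-9][a-z0-9-_]{0,61}[a-z0-9]{0,1}\.(xn--)?([a-z0-9\-]{1,61}|[a-z0-9-]{1,30}\.[a-z]{2,})$/i;

// Hypothetical helper: returns true when the whole input matches the pattern.
function isValidDomain(input) {
  return DOMAIN_RE.test(input);
}

const samples = [
  ".as10",
  "www.google.com",
  "#@$some .qwq.eras",
  "something.debugpointer.com",
];

for (const sample of samples) {
  const verdict = isValidDomain(sample) ? "matches" : "does not match";
  console.log(sample + " | " + verdict);
}
```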
Here is a detailed explanation of the above regex-
/^((?!-))(xn--)?[a-z0-9][a-z0-9-_]{0,61}[a-z0-9]{0,1}\.(xn--)?([a-z0-9\-]{1,61}|[a-z0-9-]{1,30}\.[a-z]{2,})$/igm
^ asserts position at start of a line
1st Capturing Group ((?!-))
Negative Lookahead (?!-)
Assert that the Regex below does not match
- matches the character - with index 45 (hex 2D, octal 55) literally (case insensitive)
2nd Capturing Group (xn--)?
? matches the previous token between zero and one times, as many times as possible, giving back as needed (greedy)
xn-- matches the characters xn-- literally (case insensitive)
Match a single character present in the list below [a-z0-9]
a-z matches a single character in the range between a (index 97) and z (index 122) (case insensitive)
0-9 matches a single character in the range between 0 (index 48) and 9 (index 57) (case insensitive)
Match a single character present in the list below [a-z0-9-_]
{0,61} matches the previous token between 0 and 61 times, as many times as possible, giving back as needed (greedy)
a-z matches a single character in the range between a (index 97) and z (index 122) (case insensitive)
0-9 matches a single character in the range between 0 (index 48) and 9 (index 57) (case insensitive)
- matches the character - with index 45 (hex 2D, octal 55) literally (case insensitive)
This hyphen is treated literally, which might be confusing for others. Consider escaping it or placing it at the start or end of the class.
_ matches the character _ with index 95 (hex 5F, octal 137) literally (case insensitive)
Match a single character present in the list below [a-z0-9]
{0,1} matches the previous token between zero and one times, as many times as possible, giving back as needed (greedy)
a-z matches a single character in the range between a (index 97) and z (index 122) (case insensitive)
0-9 matches a single character in the range between 0 (index 48) and 9 (index 57) (case insensitive)
\. matches the character . with index 46 (hex 2E, octal 56) literally (case insensitive)
3rd Capturing Group (xn--)?
? matches the previous token between zero and one times, as many times as possible, giving back as needed (greedy)
xn-- matches the characters xn-- literally (case insensitive)
4th Capturing Group ([a-z0-9\-]{1,61}|[a-z0-9-]{1,30}\.[a-z]{2,})
1st Alternative [a-z0-9\-]{1,61}
Match a single character present in the list below [a-z0-9\-]
{1,61} matches the previous token between 1 and 61 times, as many times as possible, giving back as needed (greedy)
a-z matches a single character in the range between a (index 97) and z (index 122) (case insensitive)
0-9 matches a single character in the range between 0 (index 48) and 9 (index 57) (case insensitive)
\- matches the character - with index 45 (hex 2D, octal 55) literally (case insensitive)
2nd Alternative [a-z0-9-]{1,30}\.[a-z]{2,}
Match a single character present in the list below [a-z0-9-]
{1,30} matches the previous token between 1 and 30 times, as many times as possible, giving back as needed (greedy)
a-z matches a single character in the range between a (index 97) and z (index 122) (case insensitive)
0-9 matches a single character in the range between 0 (index 48) and 9 (index 57) (case insensitive)
- matches the character - with index 45 (hex 2D, octal 55) literally (case insensitive)
\. matches the character . with index 46 (hex 2E, octal 56) literally (case insensitive)
Match a single character present in the list below [a-z]
{2,} matches the previous token between 2 and unlimited times, as many times as possible, giving back as needed (greedy)
a-z matches a single character in the range between a (index 97) and z (index 122) (case insensitive)
$ asserts position at the end of a line
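The token-by-token breakdown above can be checked in code. The following sketch (variable names are assumptions, and only the i flag is used) inspects the capture groups for a Punycode name, confirms the 1 + 61 + 1 = 63 character cap on the first label, and probes a few edge cases. As written, the pattern seems fairly permissive: it accepts a label ending in a hyphen and a label containing an underscore, while rejecting names with more than two dots.

```javascript
// The pattern from this article, with only the i flag (an assumption).
const DOMAIN_RE = /^((?!-))(xn--)?[a-z0-9][a-z0-9-_]{0,61}[a-z0-9]{0,1}\.(xn--)?([a-z0-9\-]{1,61}|[a-z0-9-]{1,30}\.[a-z]{2,})$/i;

// Capture groups for a name that starts with the Punycode prefix.
const groups = DOMAIN_RE.exec("xn--bcher-kva.com");
console.log(groups[2]); // "xn--" (2nd group: optional prefix before the first label)
console.log(groups[3]); // undefined (3rd group: no prefix after the dot)
console.log(groups[4]); // "com" (4th group: everything after the first dot)

// The first label is limited to 63 characters by the three quantifiers.
const label63 = DOMAIN_RE.test("a".repeat(63) + ".com");
const label64 = DOMAIN_RE.test("a".repeat(64) + ".com");
console.log(label63, label64); // true false

// Edge cases worth knowing about before relying on the pattern.
const edgeCases = {
  leadingHyphen: DOMAIN_RE.test("-example.com"),   // false
  trailingHyphen: DOMAIN_RE.test("example-.com"),  // true
  underscore: DOMAIN_RE.test("my_site.com"),       // true
  threeDots: DOMAIN_RE.test("a.b.example.com"),    // false
};
console.log(edgeCases);
```

If stricter behaviour is needed, for example rejecting trailing hyphens, the character classes would have to be tightened; that is a design choice left to the reader.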
Global pattern flags
i modifier: insensitive. Case insensitive match (ignores case of [a-zA-Z])
g modifier: global. All matches (don't return after first match)
m modifier: multi line. Causes ^ and $ to match the beginning/end of each line (not only the beginning/end of the string)
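Here is a small sketch of what the three flags change in practice, assuming JavaScript and made-up variable names. It also shows one thing to watch for in JavaScript: a regex object created with the g flag remembers its `lastIndex`, so calling `test()` twice on the same object can give different answers.

```javascript
// The pattern from this article with all three flags.
const DOMAIN_RE_IGM = /^((?!-))(xn--)?[a-z0-9][a-z0-9-_]{0,61}[a-z0-9]{0,1}\.(xn--)?([a-z0-9\-]{1,61}|[a-z0-9-]{1,30}\.[a-z]{2,})$/igm;

const text = "www.google.com\n.as10\nSomething.DebugPointer.com";

// g + m: every line that is a valid domain on its own is returned,
// and i lets the mixed-case third line match.
const found = text.match(DOMAIN_RE_IGM);
console.log(found); // [ 'www.google.com', 'Something.DebugPointer.com' ]

// Without m, ^ and $ anchor to the whole string, so the multi-line text fails.
const DOMAIN_RE_I = new RegExp(DOMAIN_RE_IGM.source, "i");
const wholeText = DOMAIN_RE_I.test(text);
console.log(wholeText); // false

// With g, test() resumes from lastIndex on the same regex object,
// so the second call starts at the end of the string and fails.
const first = DOMAIN_RE_IGM.test("google.com");
const second = DOMAIN_RE_IGM.test("google.com");
console.log(first, second); // true false
```

For validating one input at a time, dropping the g flag avoids this stateful behaviour.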
I hope this article helped you check whether a string is a valid domain name. We looked at how domain names are structured, how regular expressions validate and manipulate text, and how each part of the domain-name pattern works. With that breakdown in hand, you can assess domain names and adapt the pattern to your own programming and text-processing tasks.