CIN stands for Corporate Identification Number, a unique 21-character alphanumeric code that the Ministry of Corporate Affairs (MCA) in India assigns to a company. It identifies the company and appears on the company’s registration documents. In this article, let’s understand how to write a regex for a CIN and how to use that regex to check whether a string is a valid CIN.
Regex (short for regular expression) is a powerful tool used for searching and manipulating text. It is composed of a sequence of characters that define a search pattern. Regex can be used to find patterns in large amounts of text, validate user input, and manipulate strings. It is widely used in programming languages, text editors, and command line tools.
Structure of Corporate Identification Number (CIN)
- A CIN is a 21-character alphanumeric code.
- It starts with the letter U or L.
- The next five characters are digits (0-9).
- The next two characters are letters (A-Z, a-z).
- The next four characters are digits (0-9).
- The next three characters are letters (A-Z, a-z).
- The last six characters are digits (0-9).
- It must not contain any special characters or whitespace.
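The rules above can also be checked without a regex, which makes the segment boundaries explicit. The following is a minimal sketch; the names `SEGMENTS` and `has_cin_structure` are made up for illustration, and accepting a lowercase first letter is an assumption that mirrors the case-insensitive regex used in this article.

```python
import string

LETTERS = set(string.ascii_letters)  # A-Z and a-z only
DIGITS = set(string.digits)          # 0-9 only

# (start, end, allowed characters) for each segment of the 21-character code.
SEGMENTS = [
    (0, 1, set("LUlu")),  # first character: L or U (lowercase accepted by assumption)
    (1, 6, DIGITS),       # five digits
    (6, 8, LETTERS),      # two letters
    (8, 12, DIGITS),      # four digits
    (12, 15, LETTERS),    # three letters
    (15, 21, DIGITS),     # six digits
]


def has_cin_structure(text: str) -> bool:
    """Return True if text follows the 1-5-2-4-3-6 layout described above."""
    if len(text) != 21:
        return False
    # Every character of every segment must belong to that segment's allowed set.
    return all(set(text[start:end]) <= allowed for start, end, allowed in SEGMENTS)


print(has_cin_structure("U43245ZA3424ERE134343"))  # True
print(has_cin_structure("U12345AB6784CDE1234"))    # False (only 19 characters)
```

Because every character must belong to a letter or digit set, special characters and whitespace are rejected automatically.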
Regex for checking if CIN is valid
Regular Expression-
/^([LUu]{1})([0-9]{5})([A-Za-z]{2})([0-9]{4})([A-Za-z]{3})([0-9]{6})$/gmi
Test string examples for the above regex-
Input String | Match Output |
---|---|
U12345ASDAS784CDE1234 | does not match |
U43245ZA3424ERE134343 | matches |
U12345AB6784CDE1234 | does not match |
L75645XX2344FFE643322 | matches |
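To reproduce the table in code, here is a minimal Python sketch. The regex above is written as a JavaScript-style literal, so this is a translation rather than the article's own code: case-insensitivity is spelled out in the character classes instead of using the `i` flag, and `re.fullmatch` replaces the `^` and `$` anchors. The names `CIN_PATTERN`, `is_valid_cin`, and `SAMPLES` are illustrative.

```python
import re

# Same structure as the article's pattern. Upper and lower case are listed
# explicitly in the character classes, so no case-insensitive flag is needed.
CIN_PATTERN = re.compile(r"[LUlu][0-9]{5}[A-Za-z]{2}[0-9]{4}[A-Za-z]{3}[0-9]{6}")


def is_valid_cin(candidate: str) -> bool:
    """Return True if the entire string has the CIN format."""
    # fullmatch requires the whole string to match, so no anchors are needed.
    return CIN_PATTERN.fullmatch(candidate) is not None


SAMPLES = [
    "U12345ASDAS784CDE1234",
    "U43245ZA3424ERE134343",
    "U12345AB6784CDE1234",
    "L75645XX2344FFE643322",
]

for text in SAMPLES:
    print(text, "matches" if is_valid_cin(text) else "does not match")
```

One reason to prefer `fullmatch` in Python is that `$` also matches just before a trailing newline, so an anchored `re.match` would accept a valid code followed by a newline character.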
Here is a detailed explanation of the above regex-
/^([LUu]{1})([0-9]{5})([A-Za-z]{2})([0-9]{4})([A-Za-z]{3})([0-9]{6})$/gmi
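^ asserts position at the start of a line
1st Capturing Group ([LUu]{1})
Match a single character present in the list below [LUu]
{1} matches the previous token exactly one time (redundant quantifier; [LUu] alone behaves the same)
LUu matches a single character in the list LUu (case insensitive, so a lowercase l is accepted too and the extra u in the list is redundant)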
2nd Capturing Group ([0-9]{5})
Match a single character present in the list below [0-9]
{5} matches the previous token exactly 5 times
0-9 matches a single character in the range between 0 (index 48) and 9 (index 57)
3rd Capturing Group ([A-Za-z]{2})
Match a single character present in the list below [A-Za-z]
{2} matches the previous token exactly 2 times
A-Z matches a single character in the range between A (index 65) and Z (index 90) (case insensitive)
a-z matches a single character in the range between a (index 97) and z (index 122) (case insensitive)
4th Capturing Group ([0-9]{4})
Match a single character present in the list below [0-9]
{4} matches the previous token exactly 4 times
0-9 matches a single character in the range between 0 (index 48) and 9 (index 57)
5th Capturing Group ([A-Za-z]{3})
Match a single character present in the list below [A-Za-z]
{3} matches the previous token exactly 3 times
A-Z matches a single character in the range between A (index 65) and Z (index 90) (case insensitive)
a-z matches a single character in the range between a (index 97) and z (index 122) (case insensitive)
6th Capturing Group ([0-9]{6})
Match a single character present in the list below [0-9]
{6} matches the previous token exactly 6 times
0-9 matches a single character in the range between 0 (index 48) and 9 (index 57)
$ asserts position at the end of a line
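The six capturing groups can be read back individually after a match. The sketch below shows this in Python, assuming the same pattern with case handled in the character classes; `CIN_GROUPS` and `split_cin` are hypothetical names.

```python
import re
from typing import Optional, Tuple

# One pair of parentheses per segment, in the same order as the explanation above.
CIN_GROUPS = re.compile(
    r"([LUlu])([0-9]{5})([A-Za-z]{2})([0-9]{4})([A-Za-z]{3})([0-9]{6})"
)


def split_cin(candidate: str) -> Optional[Tuple[str, ...]]:
    """Return the six captured segments, or None if the string is not a CIN."""
    match = CIN_GROUPS.fullmatch(candidate)
    return match.groups() if match else None


print(split_cin("L75645XX2344FFE643322"))
# ('L', '75645', 'XX', '2344', 'FFE', '643322')
print(split_cin("U12345AB6784CDE1234"))
# None
```

If only validation is needed, the parentheses can be dropped; they matter when the individual segments are used afterwards.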
Global pattern flags
g modifier: global. All matches (don't return after first match)
m modifier: multi line. Causes ^ and $ to match the beginning/end of each line (not only the beginning/end of the string)
i modifier: insensitive. Case insensitive match (ignores case of [a-zA-Z])
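These flags matter mostly when scanning text that holds one candidate per line. As a rough Python equivalent (an assumption about how the JavaScript-style flags map over, not part of the original pattern): `re.MULTILINE` plays the role of `m`, `re.IGNORECASE` the role of `i`, and iterating with `finditer` stands in for `g`. `re.ASCII` is added so that case-insensitive matching stays limited to ASCII letters. `CIN_LINE` and `find_cins` are illustrative names.

```python
import re
from typing import List

CIN_LINE = re.compile(
    r"^[LU][0-9]{5}[A-Z]{2}[0-9]{4}[A-Z]{3}[0-9]{6}$",
    re.MULTILINE | re.IGNORECASE | re.ASCII,
)


def find_cins(text: str) -> List[str]:
    """Return every line of text that consists of exactly one CIN."""
    # finditer yields all matches, which is what the g flag does in JavaScript.
    return [m.group(0) for m in CIN_LINE.finditer(text)]


lines = "\n".join([
    "U43245ZA3424ERE134343",
    "l75645xx2344ffe643322",   # lowercase is accepted because of IGNORECASE
    "U12345AB6784CDE1234",     # too short
    "U12345ASDAS784CDE1234",   # letters where digits are expected
])
print(find_cins(lines))
# ['U43245ZA3424ERE134343', 'l75645xx2344ffe643322']
```

Because `^` and `$` anchor whole lines, several codes separated by spaces on a single line produce no matches with this pattern.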
We hope this article helped you build and understand a regex for the CIN pattern. In conclusion, a Corporate Identification Number (CIN) is a unique 21-character alphanumeric code assigned to companies in India by the Ministry of Corporate Affairs (MCA). A regex that encodes its structure lets you validate CINs and pick them out of text efficiently. Keep in mind that a match confirms only the format of the string, not that the number has actually been issued to a company.