A comma-separated string is a sequence of values separated by commas. It is a way of representing a list of items as a single string, with each item in the list separated from the next by a comma. In this article, let's understand how we can create a regex for a comma-separated (CSV) string and how that regex can be matched against such a string.
Regex (short for regular expression) is a powerful tool used for searching and manipulating text. It is composed of a sequence of characters that define a search pattern. Regex can be used to find patterns in large amounts of text, validate user input, and manipulate strings. It is widely used in programming languages, text editors, and command line tools.
Conditions to match a Comma Separated String (CSV)
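The CSV string has to be in the following format-
- Strings – empty or of any length, optionally wrapped in double quotes (a quoted string may contain commas, and a literal quote inside it is written as "")
- Adjacent strings separated by a comma (,)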
Regex for checking if it's a valid Comma Separated String (CSV)
Regular Expression-
/(?:^|,)(?=[^"]|(")?)"?((?(1)(?:[^"]|"")*|[^,"]*))"?(?=,|$)/gm
Test string examples for the above regex-
Input String | Match Output |
---|---|
,2.99,AMO024,Title,"Description, more info",,123987564 | matches |
123,2.99,AMO024,Title,"Description, more info",,123987564 | matches |
"""test"" test","test""fdas""fdad", | matches |
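To try the test strings above, here is a minimal sketch in Python. Python is an assumption made for illustration: its `re` module accepts the conditional `(?(1)...|...)` used in the pattern, whereas JavaScript's `RegExp` does not. The helper name `split_fields` is hypothetical. `re.MULTILINE` stands in for the `m` flag and `finditer()` for the `g` flag.

```python
import re

# The article's pattern, unchanged. Group 1 records whether the field
# opens with a quote; group 2 holds the field content.
CSV_FIELD = re.compile(
    r'(?:^|,)(?=[^"]|(")?)"?((?(1)(?:[^"]|"")*|[^,"]*))"?(?=,|$)',
    re.MULTILINE,
)

def split_fields(line):
    """Return the content of group 2 for every field matched in one line."""
    return [m.group(2) for m in CSV_FIELD.finditer(line)]

samples = [
    ',2.99,AMO024,Title,"Description, more info",,123987564',
    '123,2.99,AMO024,Title,"Description, more info",,123987564',
    '"""test"" test","test""fdas""fdad",',
]

for sample in samples:
    print(split_fields(sample))
```

Each match corresponds to one field, so an empty field (for example, the one before the leading comma in the first sample) appears as an empty string. Doubled quotes inside a quoted field are returned as they were written; the pattern does not unescape them.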
Here is a detailed explanation of the above regex-
/(?:^|,)(?=[^"]|(")?)"?((?(1)(?:[^"]|"")*|[^,"]*))"?(?=,|$)/gm
Non-capturing group (?:^|,)
1st Alternative ^
^ asserts position at start of a line
2nd Alternative ,
, matches the character , with index 44 in decimal (2C in hexadecimal or 54 in octal) literally (case sensitive)
Positive Lookahead (?=[^"]|(")?)
Assert that the Regex below matches
1st Alternative [^"]
Match a single character not present in the list below [^"]
" matches the character " with index 34 in decimal (22 in hexadecimal or 42 in octal) literally (case sensitive)
2nd Alternative (")?
1st Capturing Group (")?
? matches the previous token between zero and one times, as many times as possible, giving back as needed (greedy)
" matches the character " with index 34 in decimal (22 in hexadecimal or 42 in octal) literally (case sensitive)
" matches the character " with index 34 in decimal (22 in hexadecimal or 42 in octal) literally (case sensitive)
? matches the previous token between zero and one times, as many times as possible, giving back as needed (greedy)
2nd Capturing Group ((?(1)(?:[^"]|"")*|[^,"]*))
Conditional (?(1)(?:[^"]|"")*|[^,"]*)
Conditionally matches one of two options depending on whether the 1st capturing group matched
If condition is met, match the following regex (?:[^"]|"")*
Non-capturing group (?:[^"]|"")*
* matches the previous token between zero and unlimited times, as many times as possible, giving back as needed (greedy)
1st Alternative [^"]
Match a single character not present in the list below [^"]
" matches the character " with index 34 in decimal (22 in hexadecimal or 42 in octal) literally (case sensitive)
2nd Alternative ""
"" matches the characters "" literally (case sensitive)
Else match the following regex [^,"]*
Match a single character not present in the list below [^,"]
* matches the previous token between zero and unlimited times, as many times as possible, giving back as needed (greedy)
," matches a single character in the list ," (case sensitive)
" matches the character " with index 34 in decimal (22 in hexadecimal or 42 in octal) literally (case sensitive)
? matches the previous token between zero and one times, as many times as possible, giving back as needed (greedy)
Positive Lookahead (?=,|$)
Assert that the Regex below matches
1st Alternative ,
, matches the character , with index 44 in decimal (2C in hexadecimal or 54 in octal) literally (case sensitive)
2nd Alternative $
$ asserts position at the end of a line
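Because every field must start at a line start or a comma and end just before a comma or a line end, one way to use the pattern as a validity check is to confirm that the matches cover the whole line with no gaps. The following is a rough sketch of that idea in Python, not a definitive validator; the function name `is_valid_csv_line` is hypothetical, and it assumes Python 3.7 or later, where a non-empty match may start right after an empty one.

```python
import re

CSV_FIELD = re.compile(
    r'(?:^|,)(?=[^"]|(")?)"?((?(1)(?:[^"]|"")*|[^,"]*))"?(?=,|$)',
    re.MULTILINE,
)

def is_valid_csv_line(line):
    """Return True when the field matches tile the whole line without gaps."""
    pos = 0
    for m in CSV_FIELD.finditer(line):
        if m.start() != pos:
            # Some text between two fields was not matched by the pattern.
            return False
        pos = m.end()
    # Every character up to the end of the line must belong to a field.
    return pos == len(line)

print(is_valid_csv_line('a,"b, c",'))  # quoted field with an embedded comma
print(is_valid_csv_line('a"b,c'))      # stray quote inside an unquoted field
```

Two limits of this sketch are worth knowing. The closing quote is optional in the pattern, so an unterminated quoted field such as `a,"b` still passes. The class `[^,"]` can also match a newline, so it is safer to pass one line at a time.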
Global pattern flags
g modifier: global. All matches (don't return after first match)
m modifier: multi line. Causes ^ and $ to match the begin/end of each line (not only begin/end of string)
x modifier (optional, not part of the /gm flags above): extended. Whitespace and text after a # in the pattern are ignored
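As an illustration of how these flags might map onto a concrete regex engine, here is a sketch in Python (an assumed choice of language). There, `finditer()` plays the role of `g`, `re.MULTILINE` corresponds to `m`, and `re.VERBOSE` corresponds to `x`, which lets the same pattern be laid out with comments. The names `COMPACT` and `VERBOSE` and the sample line are made up for this example.

```python
import re

# Compact form, as written in the article.
COMPACT = re.compile(
    r'(?:^|,)(?=[^"]|(")?)"?((?(1)(?:[^"]|"")*|[^,"]*))"?(?=,|$)',
    re.MULTILINE,
)

# The same pattern in extended (x) mode: whitespace and # comments are ignored.
VERBOSE = re.compile(r'''
    (?:^|,)              # a field starts at a line start or after a comma
    (?=[^"]|(")?)        # peek ahead: group 1 is set only if a quote opens the field
    "?                   # optional opening quote
    (                    # group 2: the field content
      (?(1)              # if group 1 matched (quoted field)
        (?:[^"]|"")*     #   any non-quote character, or a doubled quote
      |                  # otherwise (unquoted field)
        [^,"]*           #   anything except a comma or a quote
      )
    )
    "?                   # optional closing quote
    (?=,|$)              # the field ends before a comma or at a line end
''', re.MULTILINE | re.VERBOSE)

line = '123,"Description, more info",,x'
compact_fields = [m.group(2) for m in COMPACT.finditer(line)]
verbose_fields = [m.group(2) for m in VERBOSE.finditer(line)]

print(compact_fields)
print(compact_fields == verbose_fields)
```

The extended layout changes only how the pattern is written, not what it matches, so both compiled patterns should return the same fields.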
Hope this article helped you understand how to check a comma-separated (CSV) string and capture its fields with a regex.