A YouTube ID can appear in three forms: inside a youtube.com URL, inside a youtu.be URL, or on its own as a bare ID. In this article, let's see how to build a regex that extracts the YouTube ID and how that regex matches each form.
Regex (short for regular expression) is a powerful tool used for searching and manipulating text. It is composed of a sequence of characters that define a search pattern. Regex can be used to find patterns in large amounts of text, validate user input, and manipulate strings. It is widely used in programming languages, text editors, and command line tools.
Structure of a YouTube ID
- It has 11 characters (letters, digits, underscores, or hyphens)
- It can be part of a youtube.com URL
- It can be part of a youtu.be URL
- It can exist independently as a string
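The rules above can be sketched as a small shape check for the bare-ID form. The function name `looksLikeYouTubeId` is a hypothetical helper, and the allowed characters are taken from the `[\w\-]` class in the regex discussed in this article. The check only looks at the shape of the string and says nothing about whether the video exists.

```javascript
// Hypothetical helper: checks only the shape of a bare ID,
// i.e. exactly 11 characters drawn from letters, digits, "_" and "-".
function looksLikeYouTubeId(candidate) {
  return /^[\w-]{11}$/.test(candidate);
}

console.log(looksLikeYouTubeId("l_lasASQJdk")); // true
console.log(looksLikeYouTubeId("1213233"));     // false
```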
Regex for extracting YouTube ID from YouTube URL
Regular Expression-
/(?<=\d\/|\.be\/|v[=\/])([\w\-]{11,})|^([\w\-]{11})$/gim
Test string examples for the above regex-
Input String | Match Output |
---|---|
thisdoesnotmatchs | does not match |
http://www.youtube.com/watch?v=vpiMAaPTze8 | matches |
1213233 | does not match |
l_lasASQJdk | matches |
http://youtu.be/l_lca11iQsdk | matches |
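The table can be reproduced with a short script. This is a minimal sketch that assumes a JavaScript engine with lookbehind support (ES2018 or later). The first alternative is written without a leading `^`, as in the breakdown of the regex in this article, because an anchor in front of the lookbehind would prevent it from ever succeeding. The `i` flag is included because the flag list in this article describes it. The names `youtubeIdPattern`, `testStrings`, and `results` are illustrative.

```javascript
// Pattern from this article: the first alternative has no leading "^",
// and the flags are g (global), i (case insensitive), m (multi line).
const youtubeIdPattern = /(?<=\d\/|\.be\/|v[=\/])([\w\-]{11,})|^([\w\-]{11})$/gim;

const testStrings = [
  "thisdoesnotmatchs",
  "http://www.youtube.com/watch?v=vpiMAaPTze8",
  "1213233",
  "l_lasASQJdk",
  "http://youtu.be/l_lca11iQsdk",
];

// String.prototype.match with a global regex returns every match, or null.
const results = testStrings.map((input) => input.match(youtubeIdPattern));

testStrings.forEach((input, i) => {
  console.log(input, "->", results[i] ? results[i][0] : "does not match");
});
```

The last row yields a 12-character match, because the first alternative uses `{11,}` rather than `{11}`.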
Here is a detailed explanation of the above regex-
/(?<=\d\/|\.be\/|v[=\/])([\w\-]{11,})|^([\w\-]{11})$/gim
- 1st Alternative (?<=\d\/|\.be\/|v[=\/])([\w\-]{11,})
  - Positive Lookbehind (?<=\d\/|\.be\/|v[=\/])
    - Assert that the Regex below matches
    - 1st Alternative \d\/
      - \d matches a digit (equivalent to [0-9])
      - \/ matches the character / with index 47 (hex 2F, octal 57) literally (case insensitive)
    - 2nd Alternative \.be\/
      - \. matches the character . with index 46 (hex 2E, octal 56) literally (case insensitive)
      - be matches the characters be literally (case insensitive)
      - \/ matches the character / with index 47 (hex 2F, octal 57) literally (case insensitive)
    - 3rd Alternative v[=\/]
      - v matches the character v with index 118 (hex 76, octal 166) literally (case insensitive)
      - Match a single character present in the list below [=\/]
        - = matches the character = with index 61 (hex 3D, octal 75) literally (case insensitive)
        - \/ matches the character / with index 47 (hex 2F, octal 57) literally (case insensitive)
  - 1st Capturing Group ([\w\-]{11,})
    - Match a single character present in the list below [\w\-]
      - {11,} matches the previous token between 11 and unlimited times, as many times as possible, giving back as needed (greedy)
      - \w matches any word character (equivalent to [a-zA-Z0-9_])
      - \- matches the character - with index 45 (hex 2D, octal 55) literally (case insensitive)
- 2nd Alternative ^([\w\-]{11})$
  - ^ asserts position at start of a line
  - 2nd Capturing Group ([\w\-]{11})
    - Match a single character present in the list below [\w\-]
      - {11} matches the previous token exactly 11 times
      - \w matches any word character (equivalent to [a-zA-Z0-9_])
      - \- matches the character - with index 45 (hex 2D, octal 55) literally (case insensitive)
  - $ asserts position at the end of a line
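Because each alternative has its own capturing group, an ID found inside a URL lands in group 1 and a bare ID lands in group 2. The sketch below wraps this in a hypothetical helper, `extractYouTubeId`. It uses a non-global copy of the pattern so that `exec` always starts from the beginning of the input, and returning `null` for non-matches is an assumption about the desired interface.

```javascript
// Non-global copy of the article's pattern (flags i and m only), so that
// exec() keeps no lastIndex state between calls.
const singleMatchPattern = /(?<=\d\/|\.be\/|v[=\/])([\w\-]{11,})|^([\w\-]{11})$/im;

// Hypothetical helper: returns the captured ID, or null when nothing matches.
function extractYouTubeId(input) {
  const match = singleMatchPattern.exec(input);
  if (match === null) return null;
  // Group 1 is filled by the URL alternative, group 2 by the bare-ID one.
  return match[1] !== undefined ? match[1] : match[2];
}

console.log(extractYouTubeId("http://www.youtube.com/watch?v=vpiMAaPTze8")); // vpiMAaPTze8
console.log(extractYouTubeId("l_lasASQJdk")); // l_lasASQJdk
console.log(extractYouTubeId("1213233")); // null
```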
Global pattern flags
g modifier: global. All matches (don't return after first match)
i modifier: insensitive. Case insensitive match (ignores case of [a-zA-Z])
m modifier: multi line. Causes ^ and $ to match the begin/end of each line (not only begin/end of string)
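The effect of the `g` and `m` flags is easiest to see on multi-line input. In this sketch the sample text and the variable names are made up for illustration. Note that `matchAll` requires the `g` flag and yields one match object per ID found.

```javascript
const flaggedPattern = /(?<=\d\/|\.be\/|v[=\/])([\w\-]{11,})|^([\w\-]{11})$/gim;

// Three lines: a watch URL, a bare ID, and a string that is too short.
const multiLineText = [
  "http://www.youtube.com/watch?v=vpiMAaPTze8",
  "l_lasASQJdk",
  "1213233",
].join("\n");

// g: every match is returned, not just the first.
// m: ^ and $ match at each line break, so the bare ID on line 2 is found.
const foundIds = [...multiLineText.matchAll(flaggedPattern)].map((m) => m[0]);
console.log(foundIds); // [ 'vpiMAaPTze8', 'l_lasASQJdk' ]

// Without m, ^ and $ only match at the ends of the whole string,
// so the bare ID in the middle is missed.
const withoutMultiline = /(?<=\d\/|\.be\/|v[=\/])([\w\-]{11,})|^([\w\-]{11})$/gi;
const foundWithoutM = [...multiLineText.matchAll(withoutMultiline)].map((m) => m[0]);
console.log(foundWithoutM); // [ 'vpiMAaPTze8' ]
```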
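The above regex only checks the pattern and extracts the YouTube ID from the URL. It does not verify that the ID actually exists on YouTube.com; that validation is a separate problem. Also note that the first alternative uses {11,}, so it accepts runs longer than 11 characters, as with the 12-character l_lca11iQsdk in the test table.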
We hope this article was useful for matching a YouTube ID with a regex pattern. It covered the three forms an ID can take (inside a youtube.com URL, inside a youtu.be URL, or as a bare ID) and a single regex that handles all of them: a lookbehind alternative for the URL forms and an anchored alternative for the bare ID.