Vimeo URLs come in several public formats. In this article, let's look at how to build a regex that matches these URLs and extracts the Vimeo ID from them.
Regex (short for regular expression) is a powerful tool used for searching and manipulating text. It is composed of a sequence of characters that define a search pattern. Regex can be used to find patterns in large amounts of text, validate user input, and manipulate strings. It is widely used in programming languages, text editors, and command line tools.
Structure of Vimeo URL
- It must contain the domain
vimeo.com
- It can contain a Video ID
vimeo.com/[Video ID]
- It can contain a Channel
vimeo.com/channels/[Channel]/[Video ID]
- It can contain a Group
vimeo.com/groups/[Group]/videos/[Video ID]
- It can use the Player subdomain
player.vimeo.com/video/[Video ID]
Regex for Vimeo URL
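This expression matches a Vimeo URL and also extracts the Vimeo ID from it: the ID is the run of digits captured by the 4th capturing group. Note that the pattern requires :// before the domain, so a URL written without a scheme, such as vimeo.com/1234323, does not match.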
Regular Expression-
/^(http|https)?:\/\/(www\.|player\.)?vimeo\.com\/(?:channels\/(?:\w+\/)?|groups\/([^\/]*)\/videos\/|video\/|)(\d+)(?:|\/\?)$/gmi
Test string examples for the above regex-
Input String | Match Output |
---|---|
www.google.com | does not match |
https://vimeo.com/1234323 | matches |
www.youtube.com | does not match |
https://vimeo.com/channels/mychannel/1234323 | matches |
https://player.vimeo.com/video/1234323 | matches |
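To try the table above in code, here is a minimal Python sketch. It assumes Python's re module, so the /.../gmi literal is rewritten as a raw string with re.IGNORECASE and the slashes no longer need escaping. The helper name extract_vimeo_id and the extra groups URL in the sample list are illustrative and not part of the original pattern.

```python
import re
from typing import Optional

# Same pattern as above, written as a Python raw string (assumption: Python's
# re module instead of a /.../gmi literal, so "/" is left unescaped).
VIMEO_RE = re.compile(
    r"^(http|https)?://(www\.|player\.)?vimeo\.com/"
    r"(?:channels/(?:\w+/)?|groups/([^/]*)/videos/|video/|)"
    r"(\d+)(?:|/\?)$",
    re.IGNORECASE,
)


def extract_vimeo_id(url: str) -> Optional[str]:
    """Return the Vimeo ID (capturing group 4), or None if the URL does not match."""
    match = VIMEO_RE.match(url.strip())
    return match.group(4) if match else None


# Test strings from the table, plus one hypothetical groups URL.
samples = [
    "www.google.com",
    "https://vimeo.com/1234323",
    "www.youtube.com",
    "https://vimeo.com/channels/mychannel/1234323",
    "https://player.vimeo.com/video/1234323",
    "https://vimeo.com/groups/mygroup/videos/1234323",
]

for url in samples:
    video_id = extract_vimeo_id(url)
    result = f"matches, ID {video_id}" if video_id else "does not match"
    print(f"{url} | {result}")
```

The function reads group 4 by number, so it would need updating if capturing groups were added to or removed from the pattern.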
Here is a detailed explanation of the above regex-
/^(http|https)?:\/\/(www\.|player\.)?vimeo\.com\/(?:channels\/(?:\w+\/)?|groups\/([^\/]*)\/videos\/|video\/|)(\d+)(?:|\/\?)$/gmi
^ asserts position at start of a line
1st Capturing Group (http|https)?
? matches the previous token between zero and one times, as many times as possible, giving back as needed (greedy)
1st Alternative http
http matches the characters http literally (case insensitive)
2nd Alternative https
https matches the characters https literally (case insensitive)
: matches the character : with index 58₁₀ (3A₁₆ or 72₈) literally (case insensitive)
\/ matches the character / with index 47₁₀ (2F₁₆ or 57₈) literally (case insensitive)
\/ matches the character / with index 47₁₀ (2F₁₆ or 57₈) literally (case insensitive)
2nd Capturing Group (www\.|player\.)?
? matches the previous token between zero and one times, as many times as possible, giving back as needed (greedy)
1st Alternative www\.
www matches the characters www literally (case insensitive)
\. matches the character . with index 46₁₀ (2E₁₆ or 56₈) literally (case insensitive)
2nd Alternative player\.
player matches the characters player literally (case insensitive)
\. matches the character . with index 46₁₀ (2E₁₆ or 56₈) literally (case insensitive)
vimeo matches the characters vimeo literally (case insensitive)
\. matches the character . with index 46₁₀ (2E₁₆ or 56₈) literally (case insensitive)
com matches the characters com literally (case insensitive)
\/ matches the character / with index 47₁₀ (2F₁₆ or 57₈) literally (case insensitive)
Non-capturing group (?:channels\/(?:\w+\/)?|groups\/([^\/]*)\/videos\/|video\/|)
1st Alternative channels\/(?:\w+\/)?
channels matches the characters channels literally (case insensitive)
\/ matches the character / with index 47₁₀ (2F₁₆ or 57₈) literally (case insensitive)
Non-capturing group (?:\w+\/)?
? matches the previous token between zero and one times, as many times as possible, giving back as needed (greedy)
\w matches any word character (equivalent to [a-zA-Z0-9_])
+ matches the previous token between one and unlimited times, as many times as possible, giving back as needed (greedy)
\/ matches the character / with index 47₁₀ (2F₁₆ or 57₈) literally (case insensitive)
2nd Alternative groups\/([^\/]*)\/videos\/
groups matches the characters groups literally (case insensitive)
\/ matches the character / with index 47₁₀ (2F₁₆ or 57₈) literally (case insensitive)
3rd Capturing Group ([^\/]*)
Match a single character not present in the list below [^\/]
* matches the previous token between zero and unlimited times, as many times as possible, giving back as needed (greedy)
\/ matches the character / with index 47₁₀ (2F₁₆ or 57₈) literally (case insensitive); this is the one character in the list, so the class matches anything except /
\/ matches the character / with index 47₁₀ (2F₁₆ or 57₈) literally (case insensitive); this is the slash that follows the 3rd capturing group
videos matches the characters videos literally (case insensitive)
\/ matches the character / with index 47₁₀ (2F₁₆ or 57₈) literally (case insensitive)
3rd Alternative video\/
video matches the characters video literally (case insensitive)
\/ matches the character / with index 47₁₀ (2F₁₆ or 57₈) literally (case insensitive)
4th Alternative — always finds a zero-length match
4th Capturing Group (\d+)
\d matches a digit (equivalent to [0-9])
+ matches the previous token between one and unlimited times, as many times as possible, giving back as needed (greedy)
Non-capturing group (?:|\/\?)
1st Alternative — always finds a zero-length match
2nd Alternative \/\?
\/ matches the character / with index 47₁₀ (2F₁₆ or 57₈) literally (case insensitive)
\? matches the character ? with index 63₁₀ (3F₁₆ or 77₈) literally (case insensitive)
$ asserts position at the end of a line
Global pattern flags
g modifier: global. All matches (don't return after first match)
m modifier: multi line. Causes ^ and $ to match the begin/end of each line (not only begin/end of string)
i modifier: insensitive. Case insensitive match (ignores case of [a-zA-Z])
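The g, m and i flags are written differently from one language to another. As a rough Python sketch (an assumption, since the pattern above is written as a /.../gmi literal): i maps to re.IGNORECASE, m maps to re.MULTILINE, and g corresponds to iterating with finditer instead of stopping at the first match. The multi-line sample text is made up for illustration.

```python
import re

# i -> re.IGNORECASE, m -> re.MULTILINE; there is no "g" flag in Python,
# finditer() plays that role by returning every match.
VIMEO_RE = re.compile(
    r"^(http|https)?://(www\.|player\.)?vimeo\.com/"
    r"(?:channels/(?:\w+/)?|groups/([^/]*)/videos/|video/|)"
    r"(\d+)(?:|/\?)$",
    re.IGNORECASE | re.MULTILINE,
)

# Hypothetical input: one URL per line, mixed case, one non-Vimeo line.
text = """HTTPS://Vimeo.com/1234323
www.google.com
https://vimeo.com/groups/mygroup/videos/555
https://player.vimeo.com/video/42/?"""

# Each tuple holds capturing groups 1 to 4: scheme, subdomain, group name, ID.
matches = [m.groups() for m in VIMEO_RE.finditer(text)]
for groups in matches:
    print(groups)
```

Each printed tuple lists the four capturing groups in order, with None for a group that did not take part in the match, so the Vimeo ID is always the last element.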
The above regex only checks the pattern of the URL and extracts the Vimeo ID from it. Checking whether that ID actually exists on Vimeo.com is a separate problem that has to be solved on its own.
Hope this article was useful for matching Vimeo URLs and extracting the Vimeo ID with a regex pattern. We looked at the structure of the different Vimeo URL formats and walked through a regex that captures the Vimeo ID from each of them. Remember that while regex can identify patterns, additional validation is required to confirm that an extracted ID exists on Vimeo's platform.