A file is a collection of data stored on a computer or other electronic device. It can contain any type of information, such as text, numbers, images, audio, or video, and there are many kinds of files: text files, data files, audio files, video files, and more. A file's type is usually indicated by its file extension, the short code that follows the last period in the file name. Extensions are often three or four letters long, but “.7z” and “.gnumeric” show that they can be shorter or longer. For example, a file with the extension “.txt” is a text file, while a file with the extension “.mp3” is an audio file. The extension is only a naming convention, so it does not guarantee what the file actually contains. In this article, let's see how to write a regex that checks whether a file name ends with one of a given set of extensions.
Regex (short for regular expression) is a powerful tool used for searching and manipulating text. It is composed of a sequence of characters that define a search pattern. Regex can be used to find patterns in large amounts of text, validate user input, and manipulate strings. It is widely used in programming languages, text editors, and command line tools.
Structure of a file name
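- A complete file name consists of the name of the file and its extension, separated by a period (for example, report.pdf)
- On Windows, a file name cannot contain any of the following characters: \ / : * ? " < > | (Unix-like systems such as Linux forbid only / and the null character)
- The file extension is subject to the same restrictions, since it is part of the file name
- The file extension has no fixed length; both .7z and .gnumeric are valid extensions
- The maximum length of a file name depends on the operating system and file system (255 characters is a common limit)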
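The structure above can be sketched in Python. This minimal example splits a file name into its name and extension with the standard library's `os.path.splitext`, and checks for the characters listed above. The helper names `split_filename` and `has_forbidden_chars` are hypothetical, and the forbidden set is the Windows one; other systems forbid fewer characters.

```python
import os
import re

# Forbidden characters from the list above (the Windows set): \ / : * ? " < > |
FORBIDDEN = re.compile(r'[\\/:*?"<>|]')

def split_filename(filename: str) -> tuple[str, str]:
    """Split 'name.ext' into ('name', 'ext'); the extension is '' if there is none."""
    name, ext = os.path.splitext(filename)  # splits at the last period
    return name, ext[1:]  # splitext keeps the leading period, so drop it

def has_forbidden_chars(filename: str) -> bool:
    """Return True if the name contains any character from the forbidden set."""
    return FORBIDDEN.search(filename) is not None

print(split_filename("mydoc.docx"))      # ('mydoc', 'docx')
print(split_filename("archive.tar.gz"))  # ('archive.tar', 'gz')
print(has_forbidden_chars("what?.txt"))  # True
```

Note that only the last period separates the extension, so `archive.tar.gz` has the extension `gz`.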
Regex for checking whether a file has an allowed extension
Regular Expression for document file extensions – .doc, .docx, .pdf, .txt, .rtf, .odt, .wps, .wpd, .pages
/(?i:^.*\.(doc|docx|pdf|txt|rtf|odt|wps|wpd|pages)$)/gm
Test strings for the above regex:
Input String | Match Output |
---|---|
something.wrongext | does not match |
donald-trump.is.from.usa.pdf | matches |
213245 | does not match |
mydoc.docx | matches |
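To try the document pattern outside a regex tester, here is a minimal Python sketch that reproduces the table. Python's `re` module has no `/.../gm` delimiter syntax, so only the part between the slashes is used; Python 3.6 and later accept the scoped `(?i:...)` flag group. The name `DOC_PATTERN` is illustrative.

```python
import re

# Pattern body from above, without the /.../gm delimiters.
DOC_PATTERN = re.compile(r"(?i:^.*\.(doc|docx|pdf|txt|rtf|odt|wps|wpd|pages)$)")

# The test strings from the table.
samples = [
    "something.wrongext",
    "donald-trump.is.from.usa.pdf",
    "213245",
    "mydoc.docx",
]

for name in samples:
    # search() returns a Match object on success and None otherwise.
    result = "matches" if DOC_PATTERN.search(name) else "does not match"
    print(f"{name} | {result}")
```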
Regular Expression for spreadsheet file extensions – .xls, .xlsx, .csv, .ods, .fods, .ots, .gnumeric, .numbers
/(?i:^.*\.(xls|xlsx|csv|ods|fods|ots|gnumeric|numbers)$)/gm
Test strings for the above regex:
Input String | Match Output |
---|---|
something.wrongext | does not match |
finance-report.xlsx | matches |
213245 | does not match |
email-list.csv | matches |
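In Python (and many other languages) the case-insensitive behaviour of `(?i:...)` can also be requested with a compile flag. This sketch compiles the spreadsheet pattern both ways, once with the inline group and once with `re.IGNORECASE`, and checks that they agree on the table's inputs plus a made-up upper-case name, `Q1-Budget.XLSX`.

```python
import re

# Inline form, as written above.
INLINE = re.compile(r"(?i:^.*\.(xls|xlsx|csv|ods|fods|ots|gnumeric|numbers)$)")
# Same alternatives, with case-insensitivity passed as a flag instead.
FLAGGED = re.compile(
    r"^.*\.(xls|xlsx|csv|ods|fods|ots|gnumeric|numbers)$", re.IGNORECASE
)

samples = ["something.wrongext", "finance-report.xlsx", "213245",
           "email-list.csv", "Q1-Budget.XLSX"]

for name in samples:
    inline_hit = INLINE.search(name) is not None
    flagged_hit = FLAGGED.search(name) is not None
    assert inline_hit == flagged_hit  # both forms should give the same answer
    print(name, inline_hit)
```

The flag form is handy when the same pattern string is reused with different options.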
Regular Expression for presentation file extensions – .ppt, .pptx, .pps, .ppsx, .odp, .fodp, .otp, .key
/(?i:^.*\.(ppt|pptx|pps|ppsx|odp|fodp|otp|key)$)/gm
Test strings for the above regex:
Input String | Match Output |
---|---|
something.wrongext | does not match |
finance-report.odp | matches |
213245 | does not match |
pitch-deck.pptx | matches |
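Every pattern in this article has the same shape, so it can be generated from a list of extensions. The helper `build_extension_regex` below is hypothetical; it passes each extension through `re.escape` so that any character with a special meaning in regex is matched literally, and it compiles the result case-insensitively.

```python
import re

def build_extension_regex(extensions: list[str]) -> re.Pattern:
    """Compile a case-insensitive '<anything>.<ext>' pattern from a list of extensions."""
    # Strip a leading period if present and escape regex metacharacters.
    alternatives = "|".join(re.escape(ext.lstrip(".")) for ext in extensions)
    return re.compile(rf"^.*\.({alternatives})$", re.IGNORECASE)

PRESENTATION = build_extension_regex(
    [".ppt", ".pptx", ".pps", ".ppsx", ".odp", ".fodp", ".otp", ".key"]
)
print(PRESENTATION.pattern)  # ^.*\.(ppt|pptx|pps|ppsx|odp|fodp|otp|key)$
print(bool(PRESENTATION.search("pitch-deck.pptx")))  # True
```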
Regular Expression for image file extensions – .jpg, .jpeg, .png, .gif, .bmp, .tiff, .psd, .raw, .cr2, .nef, .orf, .sr2
Regex supporting standard image file formats like .jpg, .jpeg, .png, .gif
/(?i:^.*\.(jpg|jpeg|png|gif)$)/gm
Regex supporting all the image formats listed above
/(?i:^.*\.(jpg|jpeg|png|gif|bmp|tiff|psd|raw|cr2|nef|orf|sr2)$)/gm
Test strings for the full image regex above:
Input String | Match Output |
---|---|
something.wrongext | does not match |
an-image.png | matches |
213245 | does not match |
image-raw-file.psd | matches |
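The two image patterns differ only in how many alternatives they list. A short Python sketch that filters a list of made-up file names with each one shows the difference: the `.tiff` scan and the `.psd` file pass the full pattern but not the standard one.

```python
import re

STANDARD_IMAGES = re.compile(r"(?i:^.*\.(jpg|jpeg|png|gif)$)")
ALL_IMAGES = re.compile(
    r"(?i:^.*\.(jpg|jpeg|png|gif|bmp|tiff|psd|raw|cr2|nef|orf|sr2)$)"
)

# Hypothetical file names, for illustration only.
files = ["an-image.png", "image-raw-file.psd", "scan.tiff", "notes.txt", "holiday.JPG"]

standard = [f for f in files if STANDARD_IMAGES.search(f)]
everything = [f for f in files if ALL_IMAGES.search(f)]

print(standard)    # ['an-image.png', 'holiday.JPG']
print(everything)  # ['an-image.png', 'image-raw-file.psd', 'scan.tiff', 'holiday.JPG']
```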
Regular Expression for audio file extensions – .mp3, .wav, .wma, .aac, .flac, .ogg, .m4a, .aiff, .alac, .amr, .ape, .au, .mpc, .tta, .wv, .opus
Regex supporting standard audio file formats like .mp3, .wav, .m4a
/(?i:^.*\.(mp3|wav|m4a)$)/gm
Regex supporting all the audio formats listed above
/(?i:^.*\.(mp3|wav|wma|aac|flac|ogg|m4a|aiff|alac|amr|ape|au|mpc|tta|wv|opus)$)/gm
Test strings for the full audio regex above:
Input String | Match Output |
---|---|
something.wrongext | does not match |
my-song.wav | matches |
4532.png | does not match |
audio-raw-file.aiff | matches |
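Because `.*` can match zero characters, these patterns also accept a name that is nothing but an extension, such as `.mp3`, and they accept paths that include directories. If an empty base name should be rejected, one possible variant (an assumption, not part of the patterns above) replaces `.*` with `.+`, which requires at least one character before the final period. This Python sketch compares the two on the standard audio pattern.

```python
import re

AUDIO = re.compile(r"(?i:^.*\.(mp3|wav|m4a)$)")
# Hypothetical variant: .+ requires at least one character before the final period.
AUDIO_NONEMPTY = re.compile(r"(?i:^.+\.(mp3|wav|m4a)$)")

for name in [".mp3", "my-song.wav", "music/albums/track01.m4a"]:
    print(name, bool(AUDIO.search(name)), bool(AUDIO_NONEMPTY.search(name)))
# .mp3 True False
# my-song.wav True True
# music/albums/track01.m4a True True
```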
Regular Expression for video file extensions – .mp4, .avi, .wmv, .mov, .flv, .mkv, .webm, .vob, .ogv, .m4v, .3gp, .3g2, .mpeg, .mpg, .m2v, .svi, .3gpp, .3gpp2, .mxf, .roq, .nsv, .f4v, .f4p, .f4a, .f4b
Regex supporting standard video file formats like .mp4, .avi, .wmv, .mov, .flv, .mkv
/(?i:^.*\.(mp4|avi|wmv|mov|flv|mkv)$)/gm
Regex supporting all the video formats listed above
/(?i:^.*\.(mp4|avi|wmv|mov|flv|mkv|webm|vob|ogv|m4v|3gp|3g2|mpeg|mpg|m2v|svi|3gpp|3gpp2|mxf|roq|nsv|f4v|f4p|f4a|f4b)$)/gm
Test strings for the full video regex above:
Input String | Match Output |
---|---|
something.wrongext | does not match |
my-video.mp4 | matches |
4532.png | does not match |
video-file.webm | matches |
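A regex is not the only way to test an extension. As an alternative technique (not the one this article uses), this Python sketch takes the last extension with `os.path.splitext` and looks it up in a set, which also makes duplicate entries in an extension list harmless. The names `VIDEO_EXTENSIONS` and `is_video` are illustrative.

```python
import os

# Lower-case extensions from the full video list above; a set ignores duplicates.
VIDEO_EXTENSIONS = {
    "mp4", "avi", "wmv", "mov", "flv", "mkv", "webm", "vob", "ogv", "m4v",
    "3gp", "3g2", "mpeg", "mpg", "m2v", "svi", "3gpp", "3gpp2", "mxf",
    "roq", "nsv", "f4v", "f4p", "f4a", "f4b",
}

def is_video(filename: str) -> bool:
    """Case-insensitive check of the last extension against VIDEO_EXTENSIONS."""
    ext = os.path.splitext(filename)[1][1:].lower()  # drop the leading period
    return ext in VIDEO_EXTENSIONS

for name in ["something.wrongext", "my-video.mp4", "4532.png", "video-file.webm"]:
    print(name, is_video(name))
```

One behavioural difference: `os.path.splitext` treats a leading period as part of the name, so a file called just `.mp4` has no extension here, whereas the regex accepts it.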
Regular Expression for compressed file extensions – .zip, .rar, .7z, .tar, .gz, .bz2, .xz, .iso, .dmg
Regex supporting standard compressed file formats like .zip, .rar, .7z, .tar
/(?i:^.*\.(zip|rar|7z|tar)$)/gm
Regex supporting all the compressed file formats listed above
/(?i:^.*\.(zip|rar|7z|tar|gz|bz2|xz|iso|dmg)$)/gm
Test strings for the full compressed-file regex above:
Input String | Match Output |
---|---|
something.wrongext | does not match |
backup.zip | matches |
my-video.mp4 | does not match |
archive.tar.gz | matches |
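The parentheses around the alternatives form a capturing group, so a match also reports which extension was found. In this Python sketch, which uses the compressed extensions listed above, `match.group(1)` returns that extension. Note that `archive.tar.gz` reports only the final `gz`, because the greedy `.*` absorbs `archive.tar`.

```python
import re

COMPRESSED = re.compile(r"(?i:^.*\.(zip|rar|7z|tar|gz|bz2|xz|iso|dmg)$)")

for name in ["backup.zip", "archive.tar.gz", "my-video.mp4"]:
    match = COMPRESSED.search(name)
    # group(1) is the text captured by (zip|rar|...), i.e. the extension.
    print(name, match.group(1) if match else None)
# backup.zip zip
# archive.tar.gz gz
# my-video.mp4 None
```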
Explanation of Regex
Here is a detailed explanation of the document file extension regex:
/(?i:^.*\.(doc|docx|pdf|txt|rtf|odt|wps|wpd|pages)$)/gm
Non-capturing group (?i:^.*\.(doc|docx|pdf|txt|rtf|odt|wps|wpd|pages)$). Matches the tokens it contains with the effective flags gmi
i modifier: insensitive. Case insensitive match (ignores case of [a-zA-Z])
^ asserts position at start of a line
. matches any character (except for line terminators)
The * quantifier matches the previous token between zero and unlimited times, as many times as possible, giving back as needed (greedy)
\. matches the character . (index 46 in decimal, 2E in hexadecimal, 56 in octal) literally
1st Capturing Group (doc|docx|pdf|txt|rtf|odt|wps|wpd|pages)
Alternatives 1 to 9: doc, docx, pdf, txt, rtf, odt, wps, wpd, pages. Each alternative matches its characters literally (case insensitive), and the group captures whichever one matched
$ asserts position at the end of a line
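One point worth checking: `doc` is listed before `docx`, yet `mydoc.docx` still matches. When the engine tries `doc` first, the `$` anchor fails on the trailing `x`, so it backtracks and tries the later alternatives until `docx` succeeds. This Python sketch shows the captured extension with and without the anchor; the shortened unanchored pattern is for illustration only.

```python
import re

DOC_PATTERN = re.compile(r"(?i:^.*\.(doc|docx|pdf|txt|rtf|odt|wps|wpd|pages)$)")
print(DOC_PATTERN.search("mydoc.docx").group(1))  # docx: $ forces the longer alternative

# Without the $ anchor, the first alternative that fits is accepted.
unanchored = re.compile(r"(?i:^.*\.(doc|docx))")
print(unanchored.search("mydoc.docx").group(1))  # doc
```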
Global pattern flags
g modifier: global. All matches (don't return after first match)
m modifier: multi line. Causes ^ and $ to match the begin/end of each line (not only begin/end of string)
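The `g` and `m` flags matter when the input holds several file names, one per line. Python has no `g` flag; `re.findall` (or `re.finditer`) returns every match instead, and `re.MULTILINE` plays the role of `m`. This sketch runs the document pattern over a made-up three-line listing with and without `re.MULTILINE`.

```python
import re

PATTERN = r"(?i:^.*\.(doc|docx|pdf|txt|rtf|odt|wps|wpd|pages)$)"
listing = "mydoc.docx\nphoto.png\nreport.pdf"

# With MULTILINE, ^ and $ match at every line; findall returns the captured extensions.
print(re.findall(PATTERN, listing, re.MULTILINE))  # ['docx', 'pdf']

# Without it, ^ matches only at the start of the string and $ only at the end,
# and . cannot cross a newline, so nothing matches.
print(re.findall(PATTERN, listing))  # []
```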
We hope this article helps you match file extensions with regex patterns. In this article, we explored the concept of files, their structure, and the role file extensions play in indicating file types. We looked at what regular expressions (regex) can do and provided regex patterns for checking file extensions in several categories: document, spreadsheet, presentation, image, audio, video, and compressed files. Understanding and applying these patterns will help programmers and developers validate file extensions efficiently in their projects.