How to Use Regular Expressions TODAY in Your Windows PowerShell Code
If you're a Windows systems administrator (and decidedly not a programmer), I would hazard a guess that your PowerShell adoption thus far has been a bit...slow. Am I correct?
Let me speed things up for you. I'll teach you in this article how to use regular expressions (regex for short, typically pronounced REJ-ex) in your PowerShell code to parse string data with laser-like efficiency.
Suppose you're tasked with one or more of the following real-world scenarios:
- Finding personally identifiable information in a folder containing hundreds of files
- Finding and replacing globally unique identifiers (GUIDs) in hundreds of server log files
- Validating date formats and password strength in your company's intranet portal
The aforementioned tasks are trivial for .NET programmers: "I'll just use regex!" they say. However, if you're getting into PowerShell automation slowly, your blood might run cold at the thought of performing complicated pattern matches.
Don't stress! By the end of this article, you'll understand what regex actually does, and you'll learn how to implement regex patterns in PowerShell by using the -match operator, the -replace operator, and the Select-String cmdlet. Let's begin.
Regular Expression Basics
In a nutshell, regular expressions represent a rule set for performing pattern matching on string data. You're probably familiar with using the old MS-DOS wildcard characters. For instance, we can run the following command at the prompt to find all .xls or .xlsx files in the current folder whose names contain the word report:
C:\>dir *report*.xl?
In this example, the asterisk (*) represents zero or more characters, and the question mark (?) substitutes for any single character.
Open an administrative PowerShell console, and let's dive right in. We can use the -match operator to perform true/false tests against incoming string data. Doing so gives you valuable practice with both regex and PowerShell syntax.
The following tests should both evaluate to True. Can you see why?
'project14' -match 'pro'
'project14' -match '14'
Your first regular expressions lesson is that you can perform literal matches. The subject string project14 contains both pro and 14, so both expressions evaluate to True. Of course, this question arises: Does the match value include just the matching characters, or the entire string?
Windows PowerShell populates the $Matches automatic variable (a hashtable) with the result of the most recent successful regex match. Key 0 holds the text that the whole pattern matched. Run the previous tests again, this time adding $Matches after each. In the following code, I'm using the PowerShell command separator, the semicolon (;), to keep the example compact:
PS C:\> 'project14' -match 'pro' ; $Matches
True

Name                           Value
----                           -----
0                              pro

PS C:\> 'project14' -match '14' ; $Matches
True

Name                           Value
----                           -----
0                              14
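If you only want the matched text rather than the whole table, you can index $Matches by key. The following is a minimal sketch; the variable name $matched is mine and is used purely for illustration.

```powershell
# -match returns True or False and, on success, fills the automatic $Matches variable
$matched = 'project14' -match '14'

# Key 0 holds the text that the whole pattern matched
$matched        # True
$Matches[0]     # 14
```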
Now let's say we have a bunch of files whose names start with the word project. Do you think the following expression will result in True or False?
'project14' -match 'project*'
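If you tried the previous example, you know we get True here, but probably not for the reason you expect. Inspect $Matches and you'll see that the match is only project, not project14. Why? Your second regular expression lesson is that quantifier metacharacters operate only on the preceding character, so 'project*' can be translated as "projec followed by zero or more occurrences of t." The asterisk is not an "anything goes" wildcard as it is in MS-DOS; the same pattern would happily match projec14. Yes, that's right. With regex, you need to construct your match patterns one character at a time.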
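A quick way to check what the asterisk actually consumed is to run the pattern and look at $Matches[0]. This is a small sketch; the variable names and the second subject string (projec14) are my own additions for illustration.

```powershell
# '*' applies only to the character before it (the final t)
$first = 'project14' -match 'project*'
$firstValue = $Matches[0]      # the text the pattern actually consumed

# The same pattern still succeeds when the t is missing entirely
$second = 'projec14' -match 'project*'
$secondValue = $Matches[0]

"$first : $firstValue"         # True : project
"$second : $secondValue"       # True : projec
```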
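While the asterisk matches zero or more occurrences of the preceding character, the question mark matches zero or one occurrence of it, which is not how the MS-DOS question mark wildcard behaves. The regex metacharacter that behaves much like the MS-DOS question mark is the period/dot (.), which matches any single character. Let's say we wanted to match project10 through project19:
'project14' -match 'project1.'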
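To see the difference between the question mark and the dot, it helps to pin the pattern to the whole string. The sketch below assumes two anchors that haven't been introduced yet: ^ (start of string) and $ (end of string). The variable names are illustrative only.

```powershell
# '?' makes the preceding character optional (zero or one occurrence).
# The ^ and $ anchors force the pattern to account for the entire string.
$bare     = 'project'   -match '^project1?$'   # True: the 1 is absent
$withOne  = 'project1'  -match '^project1?$'   # True: the 1 is present
$withTwo  = 'project14' -match '^project1?$'   # False: the trailing 4 is left over

# '.' stands for exactly one character of any kind
$dotMatch = 'project14' -match '^project1.$'   # True
$dotShort = 'project1'  -match '^project1.$'   # False: nothing follows the 1
```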
A metacharacter in regex is a character (or character combination) that's processed by the regex engine in a non-literal way.
Let's check out another metacharacter:
'8675309' -match '\d'
The \d metacharacter is called a character class, and it matches (you guessed it) a single digit. On its own it matches only the first digit it finds, so $matches contains just 8 here. You can use quantifiers to match a specific number of occurrences. Take a look:
'8675309' -match '\d{7}'
The $matches variable should show you the entire subject string (8675309) instead of only the number 8, because the {7} denotes seven repetitions of the digit match. The following table shows other examples of using the \d character class with the { } quantifier.
Example | Interpretation
'\d{1,3}' | Match between one and three times
'\d{5,}' | Match five or more times
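You can try both quantifier forms from the table against the same subject string. This is a minimal sketch; the variable names and the short test string 8675 are assumptions of mine.

```powershell
# {1,3} is greedy: it takes as many digits as it is allowed, up to three
$null = '8675309' -match '\d{1,3}'
$upToThree = $Matches[0]               # 867

# {5,} needs at least five digits and then keeps going
$null = '8675309' -match '\d{5,}'
$fiveOrMore = $Matches[0]              # 8675309

# Only four digits are available, so the match fails
$tooShort = '8675' -match '\d{5,}'     # False
```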
Regex has many character classes, but I can't explain them all here. Instead, the following table gives you a "punchlist" of my favorites.
Character Class | Action
\w | Matches a single word character (a letter, digit, or underscore)
\b | Matches a word boundary (a position, not a character)
\s | Matches a single whitespace character
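Here is a short sketch that exercises all three classes from the table. The sample string and variable names are hypothetical, and the -split operator (which also accepts a regex pattern) is something I'm introducing only for this illustration.

```powershell
$text = 'backup server01 nightly'

# \w+ grabs the first run of word characters
$null = $text -match '\w+'
$firstWord = $Matches[0]                  # backup

# \b anchors the pattern to the edges of a word
$null = $text -match '\bserver\d+\b'
$serverName = $Matches[0]                 # server01
$insideWord = 'backup' -match '\bup\b'    # False: "up" is not a word by itself

# \s+ matches runs of whitespace, which makes it a handy delimiter for -split
$parts = $text -split '\s+'               # backup, server01, nightly
```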
One more regex concept before we do some "real world" examples: Put match ranges in square brackets ([ ]). The following expression should evaluate to True (be sure to inspect $matches as well):
'admin@company.com' -match '[a-z]+'
The match should have been 'admin' in this case, because the at sign (@) falls outside the a-z range and ends the match. Yes, I sneaked in another metacharacter; in regex syntax, the plus (+) quantifier matches one or more instances of the preceding element (here, any character in the range a through z). This is unlike the asterisk, which matches zero or more instances. The range construct is awesome in regex, because your subject string might have variable length.
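The sketch below tries a few more ranges. Keep in mind that -match ignores case by default; the case-sensitive -cmatch operator is my addition here, as are the sample strings and variable names.

```powershell
# A lowercase range followed by +, as in the example above
$null = 'admin@company.com' -match '[a-z]+'
$user = $Matches[0]                        # admin

# Two ranges side by side: one letter followed by one or more digits
$null = 'Room B12 is open' -match '[A-Z][0-9]+'
$room = $Matches[0]                        # B12

# -match is case-insensitive; -cmatch respects case
$loose  = 'ADMIN' -match  '[a-z]+'         # True
$strict = 'ADMIN' -cmatch '[a-z]+'         # False
```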
Using the -match Operator in the Real World
Let's say we need to parse a list of universal naming convention (UNC) paths in a text file named C:\input\servers.txt:
\\dc1\logs
\\mem2\documents
\\23ressvr\share1
\\sharepoint.company.pri\doclibe
\\server234\dfs1
\\server532\dfs2
\\server99\dfs5
We need to find out (a) whether server532 exists in the file; and, if so, (b) the name(s) of any shared folder(s) hosted by that server. How can we do this? Well, the first thing we need to do is grab all the servers.txt content and import the data into our PowerShell run space:
Get-Content -Path 'C:\input\servers.txt'
That's not enough, though. We need to filter that file content by using the Where-Object cmdlet, the -match operator, and a regex pattern:
Get-Content -Path 'C:\input\servers.txt' | Where-Object { $_ -match '\\\\server532' }
You probably know that the $_ token is shorthand notation for the current object in the PowerShell pipeline. But doubtless you're wondering what \\\\ means. Get ready for regular expression lesson three: We need to escape certain characters to prevent the .NET regex engine from processing them as metacharacters.
The UNC example is particularly confusing because the backslash (\) is the escape character, and we need to escape the two literal backslashes that precede any UNC path.
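If counting backslashes by hand feels error-prone, the .NET Regex class can do the escaping for you through its static Escape method. This is a small sketch; the variable names and the test path are illustrative.

```powershell
# The literal text we want to find
$server = '\\server532'

# [regex]::Escape doubles each backslash (and escapes other metacharacters)
$pattern = [regex]::Escape($server)                           # \\\\server532
$found = '\\server532\dfs2' -match $pattern                   # True

# Periods are escaped as well, so a hostname is matched literally
$hostPattern = [regex]::Escape('sharepoint.company.pri')      # sharepoint\.company\.pri
```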
Let's try another example. This time, we want to match \\sharepoint.company.pri from servers.txt:
Get-Content -Path 'C:\input\servers.txt' | Where-Object { $_ -match '\\\\\w+\.\w+\.\w+' }
Whoa, Nelly! Now we're truly getting into the thick of things. Notice that I used the shorthand \w+ construction to match one or more occurrences of a word character. Because the period/dot (.) isn't a word character, I escape the two periods in the hostname sharepoint.company.pri. Cool, eh?
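To answer part (b) of the earlier question, the share name hosted by server532, one option is a named capture group. The sketch below uses an in-memory array instead of servers.txt so that it runs anywhere; the group name share and the variable names are my own choices.

```powershell
# A few of the UNC paths from servers.txt, held in memory for this demo
$lines = @(
    '\\dc1\logs'
    '\\server532\dfs2'
    '\\server99\dfs5'
)

# (?<share>.+) captures everything after the server name into $Matches['share']
$shares = foreach ($line in $lines) {
    if ($line -match '^\\\\server532\\(?<share>.+)$') {
        $Matches['share']
    }
}

$shares    # dfs2
```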
Introducing Select-String
For jobs when you need to dip into one or more files, find matches, and potentially make replacements, Select-String is what you need. Consider the following sample file named C:\input\customers.csv:
FirstName,LastName,SSN,Birthdate
Carey,Landry,123-45-6789,5/22/1981
Kayla,Duquette,344-55-5677,4/2/1970
Mike,Connor,543-21-9876,11/29/1955
Wendy,Robbins,987-32-4244,10/4/1968
First of all, the names and metadata in this example are entirely fictional. Second, notice that we have a comma and no intervening space separating each column entry (this file contains comma-separated values, after all).
Now imagine that instead of four records this database file has several thousand records. We're tasked with identifying every U.S. Social Security number (SSN) in the file. As you may know, the SSN has the following general format:
111-22-3333
We're keeping things extra-simple here; in the real world, you'll want to employ a regex expression that matches only valid SSNs. For instance, real SSNs don't start with 000 or 666.
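As a rough sketch of what a stricter pattern might look like, the following uses a negative lookahead to reject the two prefixes mentioned above. It implements only those two rules and is not a complete SSN validator; the variable names are hypothetical.

```powershell
# (?!000|666) fails the match if the next three characters are 000 or 666
$ssnPattern = '\b(?!000|666)\d{3}-\d{2}-\d{4}\b'

$ok      = '123-45-6789' -match $ssnPattern    # True
$zeros   = '000-45-6789' -match $ssnPattern    # False
$sixes   = '666-45-6789' -match $ssnPattern    # False
```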
I habitually use the Select-String -AllMatches switch parameter to gather all matches instead of only one match per line:
Select-String -Path 'C:\input\customers.csv' -Pattern '\d{3}\-\d{2}\-\d{4}' -AllMatches

input\customers.csv:2:Carey,Landry,123-45-6789,5/22/1981
input\customers.csv:3:Kayla,Duquette,344-55-5677,4/2/1970
input\customers.csv:4:Mike,Connor,543-21-9876,11/29/1955
input\customers.csv:5:Wendy,Robbins,987-32-4244,10/4/1968
Notice that the result set gives us the line number where each match took place. Let's finish up by redacting each exposed Social Security number with the string pattern XXX-XX-XXXX:
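Each result that Select-String returns also carries the individual matches in its Matches property, which is where -AllMatches pays off. The sketch below writes two of the sample rows to a temporary file (the file name customers-sample.csv and the variable names are assumptions of mine) and lists each SSN next to its line number.

```powershell
# Build a small sample file in the temp folder so the example is self-contained
$tmp = Join-Path ([System.IO.Path]::GetTempPath()) 'customers-sample.csv'
Set-Content -Path $tmp -Value @(
    'FirstName,LastName,SSN,Birthdate'
    'Carey,Landry,123-45-6789,5/22/1981'
    'Kayla,Duquette,344-55-5677,4/2/1970'
)

# Emit one object per match, pairing the line number with the matched text
$found = Select-String -Path $tmp -Pattern '\d{3}-\d{2}-\d{4}' -AllMatches |
    ForEach-Object {
        foreach ($m in $_.Matches) {
            [pscustomobject]@{ Line = $_.LineNumber; SSN = $m.Value }
        }
    }

$found

# Clean up the sample file
Remove-Item -Path $tmp
```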
Select-String -Path 'C:\input\customers.csv' -Pattern '\d{3}\-\d{2}\-\d{4}' -AllMatches | ForEach { $_ -replace '\d{3}\-\d{2}\-\d{4}', 'XXX-XX-XXXX' }

C:\input\customers.csv:2:Carey,Landry,XXX-XX-XXXX,5/22/1981
C:\input\customers.csv:3:Kayla,Duquette,XXX-XX-XXXX,4/2/1970
C:\input\customers.csv:4:Mike,Connor,XXX-XX-XXXX,11/29/1955
C:\input\customers.csv:5:Wendy,Robbins,XXX-XX-XXXX,10/4/1968
I used the ForEach construct to loop through the dataset and the -replace operator to replace the SSN matches with our redaction string. The results look good, but if you open the source file, you won't see the letter X everywhere. What's up?
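Well, Select-String writes MatchInfo objects to the pipeline, and the -replace operator returns new strings rather than changing anything on disk. To change the file itself, we need to read its text, perform the replacement on that text, and write the result back out.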
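You can see this for yourself with a single row. In this sketch (variable names are mine), the MatchInfo object exposes the original text through its Line property, and -replace hands back a new string while the source stays untouched.

```powershell
$row = 'Carey,Landry,123-45-6789,5/22/1981'

# Piping a string to Select-String produces a MatchInfo object
$info = $row | Select-String -Pattern '\d{3}-\d{2}-\d{4}'
$typeName = $info.GetType().Name        # MatchInfo

# -replace returns a new string; neither $row nor any file is modified
$redacted = $info.Line -replace '\d{3}-\d{2}-\d{4}', 'XXX-XX-XXXX'

$redacted    # Carey,Landry,XXX-XX-XXXX,5/22/1981
$row         # Carey,Landry,123-45-6789,5/22/1981
```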
My proposed solution is to use Get-Content to "vacuum" the customers.csv text into our run space, perform the match/replace, and then write the final result set back to the file. Try the following:
$file = 'C:\input\customers.csv'
$content = Get-Content $file
$content | ForEach-Object {$_ -Replace '\d{3}\-\d{2}\-\d{4}', 'XXX-XX-XXXX' } | Set-Content $file
Here's what each line does:
- Store the .csv file path string as the variable $file.
- Create a second variable named $content that actually contains the .csv file contents.
- Perform the replacement using our regex expression, and write the result set back to the file with Set-Content. Done and done!
Get-Content -Path 'C:\input\customers.csv'

FirstName,LastName,SSN,Birthdate
Carey,Landry,XXX-XX-XXXX,5/22/1981
Kayla,Duquette,XXX-XX-XXXX,4/2/1970
Mike,Connor,XXX-XX-XXXX,11/29/1955
Wendy,Robbins,XXX-XX-XXXX,10/4/1968
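If you'd rather keep the original file intact, one variation is to write the redacted text to a separate file. The sketch below works in a temporary folder so it can run anywhere; the folder name regex-demo, the file name customers-redacted.csv, and the variable names are all hypothetical.

```powershell
# Set up a scratch folder with a small copy of the sample data
$dir = Join-Path ([System.IO.Path]::GetTempPath()) 'regex-demo'
$null = New-Item -ItemType Directory -Path $dir -Force
$source = Join-Path $dir 'customers.csv'
$target = Join-Path $dir 'customers-redacted.csv'
Set-Content -Path $source -Value @(
    'FirstName,LastName,SSN,Birthdate'
    'Carey,Landry,123-45-6789,5/22/1981'
)

# -replace works on every line of the array that Get-Content returns
(Get-Content -Path $source) -replace '\d{3}-\d{2}-\d{4}', 'XXX-XX-XXXX' |
    Set-Content -Path $target

$original = Get-Content -Path $source    # still contains the real SSN
$result   = Get-Content -Path $target    # contains XXX-XX-XXXX
$result

# Remove the scratch folder
Remove-Item -Path $dir -Recurse
```

Writing to a second file costs a little disk space, but it leaves you a way back if the pattern turns out to match more than you intended.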
Next Steps
Clearly, using regular expressions is an enormous subject. I'm going to leave you with a laundry list of useful resources. Enjoy!
Some online regex testers that I like:
Some neat online regex tutorials:
Some Windows regex applications that can make learning and using regex easier: