General regular expressions
ReplaceStringRegex: Replace with a matching group
If your regular expression contains groups, delimited by parentheses (), you can use the value matched by a group in the replacement. Reference a matching group with \n, where n is the number of the group.
For example:
Input string: foo 1234|qwerASDF 23bar
Regular expression: ([0-9]+)\|([a-zA-Z])
Will match: 1234|q (within foo 1234|qwerASDF 23bar)
Replace with: \1\2
Result: foo 1234qwerASDF 23bar
Explanation: The ReplaceStringRegex function replaces the complete match of the regex (here 1234|q) with the value matched by the first group ([0-9]+) (here 1234), followed by the value matched by the second group ([a-zA-Z]) (here q). The net effect is that the pipe character is removed.
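The same substitution can be reproduced outside migration-center to check the pattern. The following is a minimal sketch in Python; it assumes Python's re module as a stand-in engine (it uses the same \n syntax for group references), and the variable names are illustrative only.

```python
import re

# Pattern and replacement copied from the example above.
PATTERN = r"([0-9]+)\|([a-zA-Z])"
REPLACEMENT = r"\1\2"  # value of group 1 followed by value of group 2

source = "foo 1234|qwerASDF 23bar"

match = re.search(PATTERN, source)
result = re.sub(PATTERN, REPLACEMENT, source)

print(match.group(0))                  # the complete match
print(match.group(1), match.group(2))  # the two group values
print(result)                          # the input with the pipe removed
```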
This is how the example looks in migration-center: (screenshot not included)
Matching control characters
If you would like to match control characters, such as carriage return (CR) or line feed (LF), you should use the predefined character class cntrl.
For example:
The regex [[:cntrl:]] will match any control character.
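For testing outside migration-center, note that not every regex engine supports POSIX classes. The sketch below uses Python, whose re module does not understand [[:cntrl:]], so it substitutes the explicit range for the ASCII control characters (0x00 to 0x1F plus 0x7F). That range and the function name strip_control_chars are assumptions made for illustration.

```python
import re

# Assumption: ASCII control characters only (0x00-0x1F and 0x7F).
# Python's re module has no [[:cntrl:]] class, so the range is spelled out.
CNTRL = re.compile(r"[\x00-\x1f\x7f]")


def strip_control_chars(text: str) -> str:
    """Remove every ASCII control character (CR, LF, TAB, ...) from text."""
    return CNTRL.sub("", text)


print(repr(strip_control_chars("line1\r\nline2\tend")))
```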
Get last value of a repeating attribute
With two processing steps you can get the last value of a repeating attribute, no matter how many values are stored in the attribute.
- Use the RepeatingToSingleValue function with a separator that is not in the values list (the pipe character ("|") usually works), in order to transform all repeating attribute values into a single value.
- Then use the SubstringRegex function with the regex [^\|]+$ in order to fetch only the last value.
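The two steps above can be sketched in Python to see what the regex returns. This is only an illustration: the join stands in for RepeatingToSingleValue, re.search stands in for SubstringRegex, and the function name last_value is hypothetical.

```python
import re


def last_value(values):
    """Return the last entry of a list of values, or None if there is none."""
    # Step 1: join all values with a separator that occurs in none of them.
    joined = "|".join(values)
    # Step 2: match the run of non-pipe characters at the end of the string.
    match = re.search(r"[^\|]+$", joined)
    return match.group(0) if match else None


print(last_value(["draft", "review", "approved"]))
```

If the last value is empty, or the list has no values, there is nothing for the regex to match and the sketch returns None.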
Match invalid XML 1.0 characters
The following regular expression will match all two-byte Unicode characters that are not valid in XML 1.0:
[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD]
Note: Code points in the range U+10000 to U+10FFFF are also valid in XML 1.0. Unfortunately, the regex above matches them as well, so they are removed when it is used in the ReplaceStringRegex function.
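The behaviour described in the note can be observed with a short Python sketch. The first pattern is the regex from this section. The second pattern is an assumption that only applies to engines supporting eight-digit \U escapes (Python does); it adds the supplementary range so that those characters are kept. Whether migration-center accepts such a pattern is not confirmed here.

```python
import re

# Regex from this section: anything outside the listed ranges is matched.
INVALID_XML_BMP = re.compile(r"[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD]")

# Assumption (Python-specific syntax): also allow U+10000 to U+10FFFF.
INVALID_XML_FULL = re.compile(
    r"[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)

# NUL and vertical tab are invalid in XML 1.0; U+1F600 is valid.
sample = "ok\u0000\u000Bbad \U0001F600"

print(ascii(INVALID_XML_BMP.sub("", sample)))   # supplementary char is removed too
print(ascii(INVALID_XML_FULL.sub("", sample)))  # supplementary char is kept
```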
Extract the filename from a path
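Use SubstringRegex with the regular expression \\[^\\]+$. This assumes a backslash-separated path, and the match includes the leading backslash; the expression [^\\]+$ matches the filename alone.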
Extract the file extension from a path
Use SubstringRegex with the regular expression \.[^\.]+$. The match includes the leading period.
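To see exactly what the filename and extension expressions return, here is a small Python sketch. The sample path is made up, and re.search is used as an assumed stand-in for SubstringRegex returning the complete match.

```python
import re

# Hypothetical Windows-style path used only for illustration.
path = r"C:\projects\reports\summary.final.pdf"

# Last backslash followed by everything up to the end of the string.
filename = re.search(r"\\[^\\]+$", path).group(0)

# Last period followed by everything up to the end of the string.
extension = re.search(r"\.[^\.]+$", path).group(0)

# Variant without the leading backslash.
bare_filename = re.search(r"[^\\]+$", path).group(0)

print(filename)
print(extension)
print(bare_filename)
```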
SharePoint specific regular expressions
Find invalid characters in a path or file name
In each of the following expressions, the leading " is part of the regular expression (it matches the double quote character).
SP 2013: Use ReplaceStringRegex with the regular expression "|\*|:|<|>|\?|\\|/|\||~|#|%|&|\{|} and replace each match with an empty string or with any valid character.
SP 2016: Use ReplaceStringRegex with the regular expression "|\*|:|<|>|\?|\\|/|\||#|% and replace each match with an empty string or with any valid character.
SP Online: Use ReplaceStringRegex with the regular expression "|\*|:|<|>|\?|\\|/|\| and replace each match with an empty string or with any valid character.
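The three expressions differ only in which characters they cover, which is easy to see by running them against the same name. The sketch below uses Python as an assumed test engine; the dictionary keys, the function name sanitize and the sample file name are illustrative and not part of migration-center.

```python
import re

# Patterns copied from this section, keyed by SharePoint version.
INVALID_CHARS = {
    "2013": r'"|\*|:|<|>|\?|\\|/|\||~|#|%|&|\{|}',
    "2016": r'"|\*|:|<|>|\?|\\|/|\||#|%',
    "online": r'"|\*|:|<|>|\?|\\|/|\|',
}


def sanitize(name: str, version: str, replacement: str = "") -> str:
    """Replace every character that is invalid for the given version."""
    return re.sub(INVALID_CHARS[version], replacement, name)


name = "Q&A #1: draft?.docx"
for version in INVALID_CHARS:
    print(version, "->", sanitize(name, version))
```

Apply these patterns to a single file or folder name rather than to a full path, because they also match the path separators / and \.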
Remove leading and trailing [SPACE] characters in path elements
Use ReplaceStringRegex with the regular expression ( )*/( )* and replace with a single /. Replacing with an empty string would also remove the slash itself and merge adjacent path elements.
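A quick Python sketch shows the effect of the expression on a sample path. Note that the sketch substitutes a single / for each match so that the path separator is kept; the sample path and function name are made up for illustration.

```python
import re


def trim_path_elements(path: str) -> str:
    """Remove spaces directly before and after every slash in the path."""
    # The match covers the slash plus any surrounding spaces, so the
    # replacement puts the slash back.
    return re.sub(r"( )*/( )*", "/", path)


print(trim_path_elements("sites/team / Shared Documents /report.docx"))
```

Spaces inside a path element, such as the one in Shared Documents, are not touched. Spaces at the very start or end of the whole path are also left alone, because they are not adjacent to a slash.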
Replace consecutive chars with only one char
If you would like to replace consecutive chars, e.g. several consecutive periods in a file name, with only one char, e.g. only one period, you can use the following regular expression:
(<enter matching char here>)\1+
For example, to replace all consecutive periods in a file name with a single period, use ([\.])\1+ as the regular expression and replace each match with a single period.
The example uses [\.] as the matching character to match the period character (note that . is a special character in regular expressions and thus needs to be escaped by \).
An input value of aaa...bbb...ccc.pdf will be converted to aaa.bbb.ccc.pdf by the above replace function.
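The conversion can be verified with a few lines of Python. This is a sketch under two assumptions: Python's re module is used as the test engine, and the replacement value is \1 (the captured character), which gives the same result as a literal period here. The function name collapse_repeats is hypothetical.

```python
import re


def collapse_repeats(text: str, char_pattern: str) -> str:
    """Collapse runs of the same matching character into one occurrence."""
    # Group 1 captures one character; \1+ requires at least one repeat of it.
    pattern = "(" + char_pattern + r")\1+"
    return re.sub(pattern, r"\1", text)


print(collapse_repeats("aaa...bbb...ccc.pdf", r"[\.]"))
```

A single period, such as the one before pdf, is not matched, because the expression requires at least two consecutive occurrences.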
Comments
2 comments
Hi Cosmin,
How is tab character handled? Any regex specific to tab character?
Hi Prafull,
For tab and any other whitespace character you should use: '[[:space:]]'. More details here: https://www.regular-expressions.info/posixbrackets.html
Thanks!
Vasile.