Helpful regular expressions – Product Support

General regular expressions

ReplaceStringRegex: Using matched groups in the replacement

If your regular expression contains groups, delimited by parentheses (), you can use the value matched by a group in the replacement string. Reference a group with \n, where n is the number of the group.

For example:

Input string: foo 1234|qwerASDF 23bar

Regular expression: ([0-9]+)\|([a-zA-Z])

Will match: 1234|q (within foo 1234|qwerASDF 23bar)

Replace with: \1\2

Result: foo 1234qwerASDF 23bar

Explanation: The ReplaceStringRegex function replaces the complete match of the regex, here 1234|q, with the value matched by the first group ([0-9]+), here 1234, followed by the value matched by the second group ([a-zA-Z]), here q.
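
The same replacement can be tried outside migration-center. The following is a minimal sketch in Python, assuming that Python's re module handles groups and \n references the same way ReplaceStringRegex does for this expression. The variable names are illustrative only.

```python
import re

input_string = "foo 1234|qwerASDF 23bar"
pattern = r"([0-9]+)\|([a-zA-Z])"

# \1 and \2 insert the text captured by the first and second group,
# so only the pipe inside the match "1234|q" is dropped.
result = re.sub(pattern, r"\1\2", input_string)
print(result)  # foo 1234qwerASDF 23bar
```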

This is how the example looks in migration-center: [screenshot not included]

Matching control characters

If you would like to match control characters, such as carriage return (CR) or line feed (LF), use the predefined POSIX character class [:cntrl:] inside a bracket expression.

See more here: https://www.regular-expressions.info/posixbrackets.html

For example:

The regex [[:cntrl:]] will match any control character.
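
Not every regex engine understands POSIX bracket expressions. Python's re module, for instance, does not, so the sketch below uses the explicit ASCII ranges that [[:cntrl:]] covers (code points 0 to 31 and 127) as a stand-in. The function name is hypothetical and the code only illustrates what the class matches; it is not how migration-center evaluates it.

```python
import re

# ASCII equivalent of [[:cntrl:]]: code points 0-31 and 127.
CONTROL_CHARS = re.compile(r"[\x00-\x1f\x7f]")

def strip_control_chars(value):
    """Remove CR, LF, TAB and all other ASCII control characters."""
    return CONTROL_CHARS.sub("", value)

print(strip_control_chars("line1\r\nline2\tend"))  # line1line2end
```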

Get last value of a repeating attribute

With two processing steps you can get the last value of a repeating attribute, no matter how many values are stored in the attribute.

Use the RepeatingToSingleValue function with a separator that does not occur in any of the values (the pipe character "|" usually works) to transform all repeating attribute values into a single value.
Then use the SubstringRegex function with the regex [^\|]+$ to fetch only the last value.
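
The two steps can be sketched in Python as follows. This assumes that joining with "|" behaves like RepeatingToSingleValue and that re.search behaves like SubstringRegex; last_value is a hypothetical helper, not a migration-center function.

```python
import re

def last_value(values):
    # Step 1: join all values into one string (stand-in for RepeatingToSingleValue).
    joined = "|".join(values)
    # Step 2: match the run of non-pipe characters at the end of the string
    # (stand-in for SubstringRegex).
    match = re.search(r"[^\|]+$", joined)
    return match.group(0) if match else ""

print(last_value(["draft", "review", "approved"]))  # approved
```

If the last value is empty, or the list has no values at all, the regex finds no match and the sketch returns an empty string.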

Match invalid XML 1.0 characters

The following regular expression matches all characters in the Basic Multilingual Plane (U+0000 to U+FFFF) that are not valid in XML 1.0:

[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD]

Note: Code points in the range U+10000 to U+10FFFF are also valid in XML 1.0, but the regex above matches them as well, so they are removed too when it is used in the ReplaceStringRegex function.
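
The effect described in the note can be observed with a short Python sketch. The first pattern is the expression from above. The second one is an assumed variant that additionally whitelists U+10000 to U+10FFFF using Python's \U escape; whether migration-center's regex engine accepts such an escape is not confirmed here.

```python
import re

# Expression from above: everything outside the listed ranges is matched.
INVALID_XML_BMP = re.compile(r"[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD]")

# Assumed variant that also keeps U+10000 to U+10FFFF (Python-specific \U escape).
INVALID_XML_FULL = re.compile(
    r"[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)

sample = "a\x00b\x0bc\U0001F600"  # NUL, vertical tab, and one supplementary character

bmp_only = INVALID_XML_BMP.sub("", sample)    # supplementary character is removed too
full_range = INVALID_XML_FULL.sub("", sample)  # supplementary character is kept
print(len(bmp_only), len(full_range))  # 3 4
```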

Extract the filename from a path

Use SubstringRegex with the regular expression \\[^\\]+$. Note that the expression matches the last backslash together with the filename.
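
As a rough illustration, the sketch below applies the expression with Python's re.search, assumed here to behave like SubstringRegex. The path is made up. Because the match starts at the last backslash, the sketch strips that character afterwards.

```python
import re

path = r"C:\projects\reports\summary.pdf"

match = re.search(r"\\[^\\]+$", path)
# The match starts at the last backslash, so strip it to get the bare name.
filename = match.group(0).lstrip("\\") if match else path
print(filename)  # summary.pdf
```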

Extract the file extension from a path

Use SubstringRegex with the regular expression \.[^\.]+$. Note that the expression matches the last period together with the extension.
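
A minimal Python sketch of the same extraction, assuming re.search behaves like SubstringRegex; file_extension is a hypothetical helper and the paths are made up.

```python
import re

def file_extension(path):
    # Match the last period and everything after it up to the end of the string.
    match = re.search(r"\.[^\.]+$", path)
    return match.group(0) if match else ""

print(file_extension(r"C:\projects\reports\summary.pdf"))  # .pdf
print(file_extension("archive.tar.gz"))                    # .gz
```

One caveat of this expression: if the file has no extension but a folder name contains a period, the match starts at that period and spans the rest of the path.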

SharePoint specific regular expressions

Find invalid characters in a path or file name

SP 2013: Use ReplaceStringRegex with the regular expression "|\*|:|<|>|\?|\\|/|\||~|#|%|&|\{|} (the leading " is part of the expression) and replace each match with an empty string or any other valid character.

SP 2016: Use ReplaceStringRegex with the regular expression "|\*|:|<|>|\?|\\|/|\||#|% (the leading " is part of the expression) and replace each match with an empty string or any other valid character.

SP Online: Use ReplaceStringRegex with the regular expression "|\*|:|<|>|\?|\\|/|\| (the leading " is part of the expression) and replace each match with an empty string or any other valid character.
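
To compare the three expressions side by side, here is a small Python sketch. It assumes re.sub is an adequate stand-in for ReplaceStringRegex; the dictionary and the clean_name helper are hypothetical names used only for illustration.

```python
import re

# The three expressions from above, copied verbatim into raw strings.
INVALID_CHARS = {
    "SP 2013": r'"|\*|:|<|>|\?|\\|/|\||~|#|%|&|\{|}',
    "SP 2016": r'"|\*|:|<|>|\?|\\|/|\||#|%',
    "SP Online": r'"|\*|:|<|>|\?|\\|/|\|',
}

def clean_name(name, version, replacement=""):
    """Replace every character that is invalid for the given SharePoint version."""
    return re.sub(INVALID_CHARS[version], replacement, name)

for version in INVALID_CHARS:
    print(version, "->", clean_name("a~b#c&d", version))
```

The loop prints abcd for SP 2013, a~bc&d for SP 2016, and a~b#c&d for SP Online, which reflects the shrinking set of invalid characters.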

Remove leading and trailing [SPACE] characters in path elements

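Use ReplaceStringRegex with the regular expression ( )*/( )* and replace each match with a single slash /. Replacing with an empty string would also remove the slash itself and merge the path elements.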
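
A minimal Python sketch of this rule follows, with re.sub assumed as a stand-in for ReplaceStringRegex and trim_path_elements as a hypothetical helper. Note that it substitutes a single / for each match, since an empty replacement would also delete the separator.

```python
import re

def trim_path_elements(path):
    # Collapse optional spaces on either side of each slash into a bare "/".
    return re.sub(r"( )*/( )*", "/", path)

print(trim_path_elements("Folder A / Sub B /file.txt"))  # Folder A/Sub B/file.txt
```

Spaces at the very beginning or end of the whole path are not next to a slash, so this expression leaves them in place.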

Replace consecutive chars with only one char

If you would like to replace consecutive chars, e.g. several consecutive periods in a file name, with only one char, e.g. only one period, you can use the following regular expression:

(<enter matching char here>)\1+

For example, to replace all consecutive periods in a file name with a single period, use the regular expression ([\.])\1+ and replace each match with a single period.

Here [\.] is the matching character; it matches the period (note that . is a special character in regular expressions and therefore needs to be escaped with \).

An input value of aaa...bbb...ccc.pdf will be converted to aaa.bbb.ccc.pdf by the above replace function.
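
The conversion can be checked with a short Python sketch. It assumes re.sub as a stand-in for ReplaceStringRegex and uses \1 as the replacement, which is an assumption (any replacement that yields a single period gives the same result). collapse_repeats is a hypothetical helper.

```python
import re

def collapse_repeats(value, char_pattern=r"[\.]"):
    # Group 1 captures one occurrence; \1+ requires at least one more copy
    # of exactly the same character directly after it.
    return re.sub("(" + char_pattern + r")\1+", r"\1", value)

print(collapse_repeats("aaa...bbb...ccc.pdf"))  # aaa.bbb.ccc.pdf
```

Because \1 refers to the character that was actually captured, a class such as [-_] collapses "--" and "__" separately but leaves a mixed sequence like "-_" untouched.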

Comments

2 comments

Prafull KhosePatil

September 18, 2019 09:00
Hi Cosmin,

How is tab character handled? Any regex specific to tab character?
Vasile Gavrila

October 18, 2019 13:21
Hi Prafull,

For tab and any whitespace character you should use: '[[:space:]]'. More details here: https://www.regular-expressions.info/posixbrackets.html

Thanks!

Vasile.