Regex Split

  • Thread starter: etl2016

Hi,

I am trying to split a line into an array of strings based on two criteria. First, certain columns are text-qualified with multi-character boundaries. Second, the fields are separated by a multi-character delimiter. Things get complicated when the qualifier and the delimiter share characters, and metacharacters such as $ and ^ add further challenges. Regex seems best suited for this. The implementation below works for most cases, but breaks when metacharacters are used in the text qualifier and/or the delimiter.


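using System;
using System.Text.RegularExpressions;

public static string[] Split(string expression, string delimiter,
    string qualifier, bool ignoreCase)
{
    // Split on the delimiter only when it is followed by an even number of
    // qualifiers, i.e. when it is not inside a qualified section.
    // Note: [^{1}] is a character class, so for a multi-character qualifier it
    // excludes each of its characters individually, not the qualifier as a whole.
    string _Statement = String.Format(
        "{0}(?=(?:[^{1}]*{1}[^{1}]*{1})*(?![^{1}]*{1}))",
        Regex.Escape(delimiter), Regex.Escape(qualifier));

    RegexOptions _Options = RegexOptions.Compiled | RegexOptions.Multiline;
    if (ignoreCase) _Options = _Options | RegexOptions.IgnoreCase;

    Regex _Expression = new Regex(_Statement, _Options); // C# keyword is "new", not "New"
    return _Expression.Split(expression);
}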

This works for most scenarios, but not when metacharacters such as $ are involved, especially as part of the text qualifier. It looks as though some particular kind of escaping is needed.


string input = "*|This is an .. example*|..Am2..Cool!";
string input2 = "*|This is an $ example*|$Am2$Cool!";
string input3 = "$|This is an $ example$|$Am2$Cool!";
string input4 = "|$This is an $ example|$$Am2$Cool!";

foreach (string _Part in Split(input, "..", "*|", true))
    Console.WriteLine(_Part);

foreach (string _Part in Split(input2, "$", "*|", true))
    Console.WriteLine(_Part);

foreach (string _Part in Split(input3, "$", "$|", true)) // doesn't work correctly
    Console.WriteLine(_Part);

foreach (string _Part in Split(input4, "$", "|$", true)) // doesn't work correctly
    Console.WriteLine(_Part);
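A possible explanation for the last two failures: Regex.Escape already escapes $ and | correctly, so the escaping itself is probably not the culprit. The trouble seems to come from the [^{1}] parts of the pattern. Once the escaped qualifier is placed inside square brackets, it becomes a character class meaning "any single character other than $ or |", not "any text that is not the sequence $|". The small helper below (the class name OriginalPatternCheck is hypothetical) only rebuilds the same pattern so it can be printed and inspected.

```csharp
using System;
using System.Text.RegularExpressions;

public static class OriginalPatternCheck
{
    // Builds the pattern exactly the way the original Split method formats it.
    public static string BuildPattern(string delimiter, string qualifier)
    {
        return String.Format(
            "{0}(?=(?:[^{1}]*{1}[^{1}]*{1})*(?![^{1}]*{1}))",
            Regex.Escape(delimiter), Regex.Escape(qualifier));
    }

    // Prints the generated pattern and every piece Regex.Split produces with it.
    public static void Show(string input, string delimiter, string qualifier)
    {
        string pattern = BuildPattern(delimiter, qualifier);
        Console.WriteLine(pattern);
        foreach (string part in Regex.Split(input, pattern))
            Console.WriteLine("[" + part + "]");
    }
}
```

For delimiter "$" and qualifier "$|", the pattern is \$(?=(?:[^\$\|]*\$\|[^\$\|]*\$\|)*(?![^\$\|]*\$\|)), and splitting input3 with it should give five pieces: an empty string, "|This is an $ example", "|", "Am2" and "Cool!". The leading $ of each "$|" is itself matched as a delimiter, because the | right after it cannot be consumed by [^\$\|]*, so the lookahead never reaches the next qualifier.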

Could you please let me know how to handle all situations, including those where metacharacters appear in the text qualifier and/or the delimiter?

Thank you.
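One way around the problem, sketched here under stated assumptions, is to drop the regex and scan the line once, comparing the qualifier and the delimiter as literal strings with string.Compare, so no escaping is involved at all. This is a swapped-in technique (a plain character scan), not a repair of the regex above. Assumptions not stated in the question: qualifiers are kept in the output fields (as Regex.Split does with the original pattern), a qualifier takes precedence over a delimiter that starts at the same position, an unterminated qualified section runs to the end of the line, and ignoreCase means an ordinal case-insensitive comparison. The class name QualifiedSplitter is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class QualifiedSplitter
{
    // Splits 'input' on 'delimiter', except inside sections enclosed by 'qualifier'.
    // Both tokens are compared literally, so $, ^, |, . etc. need no escaping.
    // Assumption: a qualifier is checked before the delimiter at each position.
    public static string[] Split(string input, string delimiter, string qualifier, bool ignoreCase)
    {
        if (input == null) throw new ArgumentNullException(nameof(input));
        if (string.IsNullOrEmpty(delimiter)) throw new ArgumentException("Delimiter must not be empty.", nameof(delimiter));
        if (string.IsNullOrEmpty(qualifier)) throw new ArgumentException("Qualifier must not be empty.", nameof(qualifier));

        StringComparison comparison = ignoreCase ? StringComparison.OrdinalIgnoreCase : StringComparison.Ordinal;
        var fields = new List<string>();
        var current = new StringBuilder();
        bool insideQualifier = false;
        int i = 0;

        while (i < input.Length)
        {
            if (StartsWithAt(input, i, qualifier, comparison))
            {
                // Opening or closing qualifier: toggle state and keep the original text.
                insideQualifier = !insideQualifier;
                current.Append(input, i, qualifier.Length);
                i += qualifier.Length;
            }
            else if (!insideQualifier && StartsWithAt(input, i, delimiter, comparison))
            {
                // A delimiter outside a qualified section ends the current field.
                fields.Add(current.ToString());
                current.Clear();
                i += delimiter.Length;
            }
            else
            {
                // Ordinary character (or a delimiter inside a qualified section).
                current.Append(input[i]);
                i++;
            }
        }

        fields.Add(current.ToString()); // last field
        return fields.ToArray();
    }

    // True when 'token' occurs in 'text' starting exactly at 'index'.
    private static bool StartsWithAt(string text, int index, string token, StringComparison comparison)
    {
        return index + token.Length <= text.Length
            && string.Compare(text, index, token, 0, token.Length, comparison) == 0;
    }
}
```

Under these assumptions, QualifiedSplitter.Split(input3, "$", "$|", true) should return "$|This is an $ example$|", "Am2" and "Cool!", and input4 with qualifier "|$" gives "|$This is an $ example|$", "Am2" and "Cool!". Checking the qualifier before the delimiter is what resolves the overlap: at the start of "$|" both tokens match, and the qualifier is consumed first. If the qualifiers are not wanted in the output, they can be trimmed from each field afterwards.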
