How to trim all whitespace from a string?

  • Thread starter: RickyM777
I'm using HtmlAgilityPack along with the CSS selector extension to scrape a website and get specific data. Each page that I scrape contains an H3 HTML tag that holds the title of the page. When I scrape the page, I get a lot of whitespace. I managed to get rid of all the whitespace in the string except for one leading character, which looks like a single space. I tried Trim, TrimStart, and TrimEnd, but none of them removes the leading character that appears to be a space.

The website uses an XHTML doctype and has the following meta tag in its HTML source. I thought I'd mention it here in case this has something to do with an invisible character in the us-ascii character set or something. Below is the HTML code from the site I'm scraping, as well as my C# function that attempts to strip the whitespace and any other unwanted characters from the string.

<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">


// requires: using System; using System.Linq; using System.Text.RegularExpressions;
public string FixAndReplaceStringsInString(string dataString)
{
    // remove the lead-time text, line breaks, and tabs
    string fixedString1 = dataString
        .Replace("Lead Time: 1~3 Days", "")
        .Replace("Lead Time:2~5Days", "")
        .Replace("\r\n", "")
        .Replace("\n", "")
        .Replace("\r", "")
        .Replace("\t", "");

    // remove any HTML tags and &nbsp; entities from the title
    string noHTML = Regex.Replace(fixedString1, @"<[^>]+>|&nbsp;", "").Trim();

    // remove the remaining lead-time and shipping notices
    string fixedString2 = noHTML.Replace("Lead Time: 2~5 Days", "");
    string fixedString3 = fixedString2.Replace("Same Day Shipping on orders before 6PM (GMT+8)", "");

    // remove double quotes
    string fixedString4 = fixedString3.Replace("\"", "");

    // remove single quotes (apostrophes)
    string fixedString5 = fixedString4.Replace("'", "").Trim();

    // remove $ signs, then any line breaks and tabs that are still left
    string fixedString6 = fixedString5
        .Replace("$", "")
        .Trim()
        .Replace("\r\n", "")
        .Replace("\n", "")
        .Replace("\r", "")
        .Replace("\t", "");

    // remove every character that char.IsWhiteSpace reports as whitespace
    string whitespaceStripped = new string(
        fixedString6.Where(c => !char.IsWhiteSpace(c)).ToArray());

    return whitespaceStripped;
}
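
One possible explanation, offered as a guess rather than a diagnosis: the leading character may not be whitespace as far as .NET is concerned. A zero-width space (U+200B) or a byte order mark (U+FEFF) is a "format" character, so `char.IsWhiteSpace` and `Trim` skip it, and a page declared as us-ascii may write a non-breaking space as a numeric entity such as `&#160;`, which the `&nbsp;` pattern above does not match. The sketch below decodes entities first, then drops whitespace, control, and format characters. `TitleCleaner`, `StripInvisible`, and `DumpCodePoints` are hypothetical names made up for this illustration; they are not part of HtmlAgilityPack.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Net;

public static class TitleCleaner
{
    // Hypothetical helper: removes whitespace plus invisible characters
    // that char.IsWhiteSpace does not classify as whitespace.
    public static string StripInvisible(string input)
    {
        if (string.IsNullOrEmpty(input))
        {
            return string.Empty;
        }

        // Turn entities such as &nbsp; or &#160; into real characters first.
        string decoded = WebUtility.HtmlDecode(input);

        // Keep only characters that are not whitespace, not control characters,
        // and not Unicode "format" characters (e.g. U+200B, U+FEFF).
        return new string(decoded.Where(c =>
            !char.IsWhiteSpace(c) &&
            !char.IsControl(c) &&
            char.GetUnicodeCategory(c) != UnicodeCategory.Format).ToArray());
    }

    // Hypothetical diagnostic: lists each UTF-16 code unit in hex so the
    // unknown leading character can be identified instead of guessed at.
    public static string DumpCodePoints(string input)
    {
        return string.Join(" ", input.Select(c => "U+" + ((int)c).ToString("X4")));
    }
}
```

Calling `TitleCleaner.DumpCodePoints` on the scraped title before cleaning it should show exactly which character sits at the start of the string, which would confirm or rule out this guess.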




<meta content="text/html; charset=us-ascii" http-equiv="Content-Type" />
