<p class=MsoNormal>Hello,
<p class=MsoNormal>
<p class=MsoNormal>(This is my first post on MSDN, so please feel free to move it to a more appropriate place.)
<p class=MsoNormal>
<p class=MsoNormal>While parallelizing my application, I ran into the problem that STL streams cannot be used efficiently from multiple threads. That conclusion seems too strong to me, so I am asking for help and, hopefully, for someone to disprove it.
<p class=MsoNormal>
<p class=MsoNormal>Here are the details. I have tens of thousands of container objects, each holding a char* buffer, and I want to parse them concurrently using std::istream and operator >>. The point is simply to avoid reinventing the wheel (a string parser).
<p class=MsoNormal>Here is a code snippet:
<p class=MsoNormal>
<pre>
#include &lt;cassert&gt;
#include &lt;istream&gt;
#include &lt;locale&gt;
#include &lt;strstream&gt;

class DataContainer
{
public:

    ...

    /*! Reads integers from a buffer populated in the constructor.
    */
    void Parse() const
    {
        // represent myBuf as an STL stream
        std::strstreambuf aDataBuf (myBuf, mySize);
        std::istream aDataStream (&amp;aDataBuf);
        //aDataStream.imbue (std::locale ("C"));

        int n = 0;
        aDataStream &gt;&gt; n;
        for (int i = 0; i &lt; n; i++) {
            int k;
            assert (aDataStream.good());
            aDataStream &gt;&gt; k;
        }
    }

private:
    char*  myBuf;
    int    mySize;
};
</pre>
<p class=MsoNormal>
<p class=MsoNormal>
<p class=MsoNormal>There is an array of 20,000+ such DataContainers and a loop that calls Parse() on each one. Run concurrently, the code is about 2x-5x slower than when run sequentially! Analyzing the hotspots with Intel Parallel Amplifier, I traced the root cause to the following:
<p class=MsoNormal>
<p class=MsoNormal>The slowdown comes from a critical section that protects a shared locale object. Internally, operator >>() creates a basic_istream::sentry object on the stack. Its constructor calls (through another method) ios_base::getloc(), which returns a std::locale object by value (see the declaration below). The copy constructor is therefore invoked, and it calls Incref() to increment a reference counter. That increment is guarded by a critical section.
<p class=MsoNormal>Since all streams hold a pointer to the same shared locale object, contention is high.
<p class=MsoNormal>
<pre>
locale ios_base::getloc( ) const;

locale __CLR_OR_THIS_CALL getloc() const
{   // get locale
    return (*_Ploc);
}
</pre>
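<p class=MsoNormal>As a small illustration of the sharing described above, here is a minimal sketch in standard C++. The function name stream_uses_global_locale is made up for this example, and std::istringstream stands in for my strstreambuf-based stream. It only checks that a freshly constructed stream hands back a copy of the global locale from getloc(); it does not show the library-internal Incref() call or the critical section.

```cpp
#include <cassert>
#include <locale>
#include <sstream>

// Hypothetical helper, not part of my application.
// Returns true when a freshly constructed stream reports the same locale
// as the current global locale, i.e. it did not get a private one.
bool stream_uses_global_locale()
{
    std::istringstream in("42");

    // getloc() returns std::locale by value, so a copy is made on every call.
    std::locale fromStream = in.getloc();

    // The default constructor yields a copy of the current global locale.
    std::locale global;

    // operator== is true when one locale is a copy of the other.
    return fromStream == global;
}
```

<p class=MsoNormal>If this returns true for every stream, then all of them refer to one shared locale, which would match what the profiler shows.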
<p class=MsoNormal>
<p class=MsoNormal>Setting an individual locale object on each stream (by uncommenting the imbue() line above) does not help much, although there is a minor performance improvement. Stepping into the STL code with the debugger, I see that the strstreambuf and stream constructors still call Incref() on the global locale object, so contention remains high.
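<p class=MsoNormal>To make the imbue() attempt concrete, here is a minimal sketch of what I mean. The assumptions: parse_with_private_locale is a hypothetical stand-in for Parse(), it uses std::istringstream instead of strstreambuf, it uses std::locale::classic() in place of std::locale ("C"), and it returns the sum of the parsed values only so that the result can be checked.

```cpp
#include <cassert>
#include <cstddef>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical stand-in for DataContainer::Parse(): reads a count n followed
// by n integers and returns their sum.
long parse_with_private_locale(const char* buf, std::size_t size)
{
    const std::string text(buf, size);

    // The stream constructor has already copied the global locale here,
    // so the shared reference counter is touched before imbue() runs.
    std::istringstream in(text);

    // Explicitly give the stream and its buffer the classic "C" locale.
    in.imbue(std::locale::classic());

    int n = 0;
    long sum = 0;
    in >> n;
    for (int i = 0; i < n; ++i) {
        int k = 0;
        if (!(in >> k))
            break;          // stop on malformed or truncated input
        sum += k;
    }
    return sum;
}
```

<p class=MsoNormal>For example, calling it on the 10-character buffer "3 10 20 30" yields 60. The parsing works as expected; the concern is only that the constructors run before imbue() gets a chance to replace the locale.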
<p class=MsoNormal>
<p class=MsoNormal>Am I doing something wrong, or is it a fundamental limitation of the STL that streams cannot be used concurrently?
<p class=MsoNormal>
<p class=MsoNormal>Thank you very much in advance.
<p class=MsoNormal>Roman