EDN Admin
I have a trivial 64-bit do-nothing assembly language procedure that runs 10x slower than an equivalent procedure in C++. Can anyone point out why? I am building a high-performance 128-bit integer math library and need to use the 128-bit division
instruction, which, unlike the 128-bit multiply, has no compiler intrinsic. Alternatively, if someone can point me to a hidden 128-bit division intrinsic, that would be great too!
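The post does not show the division routine itself. As a hedged aid for testing one, here is a portable C++ sketch of what the two operations compute. umul128_ref mirrors the 64 x 64 -> 128-bit product returned by MSVC's _umul128 intrinsic (low half returned, high half stored through a pointer), and udiv128_ref mirrors the x64 DIV r/m64 instruction, which divides RDX:RAX by a 64-bit operand, leaving the quotient in RAX and the remainder in RDX, and raises a divide error if the divisor is zero or the quotient does not fit in 64 bits. Both function names are hypothetical, and the bit-by-bit shift-subtract loop is far slower than the hardware instruction; it is only meant as a reference to check an assembly version against.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Hypothetical reference for a 64 x 64 -> 128-bit unsigned multiply, with the
// same shape as MSVC's _umul128: the low 64 bits are returned and the high
// 64 bits are stored through `hi`. Built from four 32 x 32 -> 64 partial products.
std::uint64_t umul128_ref(std::uint64_t a, std::uint64_t b, std::uint64_t* hi)
{
    assert(hi != nullptr);
    const std::uint64_t mask = 0xFFFFFFFFu;
    const std::uint64_t p0 = (a & mask) * (b & mask);   // low  x low
    const std::uint64_t p1 = (a & mask) * (b >> 32);    // low  x high
    const std::uint64_t p2 = (a >> 32) * (b & mask);    // high x low
    const std::uint64_t p3 = (a >> 32) * (b >> 32);     // high x high
    // Sum the pieces that land in bits 32..95; this sum fits in 64 bits.
    const std::uint64_t mid = (p0 >> 32) + (p1 & mask) + (p2 & mask);
    *hi = p3 + (p1 >> 32) + (p2 >> 32) + (mid >> 32);
    return (mid << 32) | (p0 & mask);
}

// Hypothetical reference for the x64 "DIV r/m64" instruction: divides the
// 128-bit value hi:lo (RDX:RAX) by `divisor`, returns the 64-bit quotient
// (RAX) and stores the remainder (RDX) through `remainder`. The hardware
// raises a divide error when divisor == 0 or the quotient needs more than
// 64 bits (hi >= divisor); this sketch throws std::overflow_error instead.
std::uint64_t udiv128_ref(std::uint64_t hi, std::uint64_t lo,
                          std::uint64_t divisor, std::uint64_t* remainder)
{
    assert(remainder != nullptr);
    if (divisor == 0 || hi >= divisor)
        throw std::overflow_error("quotient does not fit in 64 bits");

    std::uint64_t rem = hi;  // partial remainder; invariant: rem < divisor
    std::uint64_t quot = 0;
    for (int bit = 63; bit >= 0; --bit) {
        // Shift the next dividend bit into rem. The true value 2*rem + bit
        // can need 65 bits, so remember the bit shifted out of the top.
        const bool carry = (rem >> 63) != 0;
        rem = (rem << 1) | ((lo >> bit) & 1u);
        quot <<= 1;
        if (carry || rem >= divisor) {
            rem -= divisor;  // unsigned wraparound gives the right value when carry is set
            quot |= 1u;
        }
    }
    *remainder = rem;
    return quot;
}
```

For example, multiplying two 64-bit values with umul128_ref and then dividing the 128-bit result by one of them with udiv128_ref should return the other factor with a zero remainder.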
My program code to do this is simple. I have two do-nothing procedures, one written in assembly (ML64) and one in C++. I call each 5,000,000 times, timing them with System::Diagnostics::Stopwatch. As expected, the code stream executed for the C++ call
is considerably larger than for the assembly call: 8 instructions vs. 1. What is strange is that the C++ version still runs faster. I've already tried increasing the test reps, and the discrepancy persists. All timings were taken in debug builds,
because a release build simply optimizes the C++ calls away.
The source for the timing program and the two routines follows.
[code]
using namespace System;

extern "C" void __fastcall DoNothing();
void CppDoNothing(void);

int main(array<System::String ^> ^args)
{
    System::Diagnostics::Stopwatch^ timer = gcnew System::Diagnostics::Stopwatch();
    int currTestNo;
    long long cpDoNothingElapsedTime;        // ElapsedMilliseconds is a 64-bit value
    long long assemblyDoNothingElapsedTime;
    double relativeSpeed;                    // double, so the ratio is not truncated
    int noTests = 10000000;

    // C++ DoNothing
    timer->Restart();
    for (currTestNo = 0; currTestNo < noTests; currTestNo++)
        CppDoNothing();
    timer->Stop();
    cpDoNothingElapsedTime = timer->ElapsedMilliseconds;

    // Assembly DoNothing
    timer->Restart();
    for (currTestNo = 0; currTestNo < noTests; currTestNo++)
        DoNothing();
    timer->Stop();
    assemblyDoNothingElapsedTime = timer->ElapsedMilliseconds;

    // Only compute the ratio if the C++ loop took at least 1 ms (avoids dividing by zero)
    if (cpDoNothingElapsedTime > 0)
    {
        relativeSpeed = static_cast<double>(assemblyDoNothingElapsedTime) / cpDoNothingElapsedTime;
        Console::WriteLine(String::Format(L"Speed vs C++ is {0:F2}x, or {1} ms vs {2} ms", relativeSpeed, assemblyDoNothingElapsedTime, cpDoNothingElapsedTime));
    }
    else
    {
        Console::WriteLine(String::Format(L"Speed is {0} ms C++ DoNothing, {1} ms assembly DoNothing", cpDoNothingElapsedTime, assemblyDoNothingElapsedTime));
    }
    return 0;
}
[/code]
[code]
void CppDoNothing(void)
{
    return;
}
[/code]
[code]
.CODE
DoNothing PROC EXPORT
    xor eax, eax
    ret
DoNothing ENDP
END
[/code]
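Since the timing loop above had to run in a debug build to keep the C++ calls from being optimized away, here is one hedged way to time a trivial call in an optimized build. It is written in standard native C++ rather than C++/CLI (the harness above is a C++/CLI program using gcnew and Stopwatch), and it uses std::chrono::steady_clock in place of Stopwatch. The call goes through a volatile function pointer that is re-read on every iteration, so the compiler cannot inline the callee or delete the loop. TimeCallsMs and CppDoNothingNative are hypothetical names for this sketch.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical native stand-in for the post's CppDoNothing.
void CppDoNothingNative() {}

// Times `reps` calls to `f` and returns the elapsed wall-clock time in ms.
// The callee is reached through a volatile function pointer that is re-read
// on every iteration, so the optimizer can neither inline it nor remove the
// loop; this lets the measurement run in a release (optimized) build.
double TimeCallsMs(void (*f)(), int reps)
{
    assert(f != nullptr && reps >= 0);
    void (*volatile target)() = f;
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < reps; ++i)
        target();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

For example, TimeCallsMs(CppDoNothingNative, 10000000) returns the elapsed milliseconds for ten million calls; the same wrapper could time an assembly routine declared extern "C" with a matching void() signature, so both loops are measured with the same compiler settings.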