Post#19 » Tue Mar 18, 2025 10:16 pm
I ran a disassembly comparison of the two versions to understand the difference better.
The posted version improves upon the first by using the full 64-bit timestamp counter for more accurate and robust timing measurements, especially over longer intervals. The use of shrd to shift the 64-bit difference and the adjusted right shift after multiplication suggest a refined approach to converting cycles to a time unit, making it a more precise implementation of the timing function.
Key Observations
Timestamp Precision:
The first version uses only the lower 32 bits of the timestamp counter, so the measured interval wraps after 2^32 (about 4.29 billion) cycles, which is roughly 1.4 seconds at 3 GHz.
The second version uses the full 64-bit timestamp, providing a much larger range and greater accuracy, especially for longer measurements.
Timestamp Difference Calculation:
The first version computes a 32-bit difference, which is only correct modulo 2^32. The second version computes a 64-bit difference using subtraction with borrow (sbb) on the high dwords, so the borrow from the low half propagates and the result stays correct for intervals longer than 2^32 cycles.
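To make the difference concrete, here is a small Python sketch that emulates both calculations with explicit masks. The function names and the sample timestamp values are made up for illustration and do not come from the disassembly.

```python
MASK32 = 0xFFFFFFFF

def diff32(start, end):
    """Elapsed cycles using only the low 32 bits of each timestamp."""
    return ((end & MASK32) - (start & MASK32)) & MASK32

def diff64(start, end):
    """Elapsed cycles from full 64-bit timestamps, emulating sub + sbb."""
    lo = (end & MASK32) - (start & MASK32)       # sub on the low dwords
    borrow = 1 if lo < 0 else 0                  # carry flag after the sub
    hi = (end >> 32) - (start >> 32) - borrow    # sbb on the high dwords
    return ((hi & MASK32) << 32) | (lo & MASK32)

# Hypothetical readings: the low dword is about to wrap at the start.
start = 0x00000001_FFFFFF00
short_end = start + 0x200                # 512 cycles later
long_end = start + (1 << 32) + 0x200     # more than 2**32 cycles later

print(diff32(start, short_end), diff64(start, short_end))
print(diff32(start, long_end), diff64(start, long_end))
```

For the short interval both functions agree, even though the low dword wrapped. For the long interval the 32-bit version silently drops the extra 2^32 cycles, while the 64-bit version keeps them.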
Scaling Adjustments:
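The second version shifts the 64-bit timestamp difference right by 2 bits (shrd eax, edx, 2) before multiplication, which divides it by 4 and leaves bits 2 to 33 of the difference in eax.
The constant 10624DD3h is ceil(2^38 / 1000), the usual fixed-point reciprocal for dividing by 1000, not a CPU-specific value. Assuming the final shift is applied to the high dword of the product, the first version computes (diff * 10624DD3h) >> 38 = diff / 1000, and the second computes ((diff / 4) * 10624DD3h) >> 36 = (diff / 4) / 250, which is also diff / 1000. The 4-bit shift instead of 6 bits compensates for the initial 2-bit shift, and the pre-shift lets a difference of up to 2^34 cycles fit into the 32-bit multiply.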
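The two scaling sequences can be checked against plain integer division with a short Python sketch. It assumes the post-multiply shift operates on the high dword of the product (edx after mul), which is the usual compiler idiom but is not shown in the post. The function names are hypothetical.

```python
MAGIC = 0x10624DD3  # equals ceil(2**38 / 1000)

def v1_scale(diff):
    """First version: 32-bit diff, mul by MAGIC, high dword shifted right by 6."""
    edx = ((diff & 0xFFFFFFFF) * MAGIC) >> 32
    return edx >> 6

def v2_scale(diff):
    """Second version: shrd by 2, mul by MAGIC, high dword shifted right by 4."""
    eax = (diff >> 2) & 0xFFFFFFFF   # bits 2..33 of the 64-bit difference
    edx = (eax * MAGIC) >> 32
    return edx >> 4

samples = [0, 1, 999, 1000, 1001, 123456789, 0xFFFFFFFF]
for x in samples:
    print(x, v1_scale(x), v2_scale(x), x // 1000)
```

Both functions match x // 1000 for every 32-bit input. The second one keeps matching up to 2^34 - 1, which is where the extra range from the 2-bit pre-shift ends.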
Initialization and Padding:
The first version explicitly zeros out memory locations before storing timestamps, which is unnecessary since they are overwritten.
The second version skips this step and adds nop instructions, likely for alignment or to match the instruction count of the first version.