
Do we still need all the time stamping modes? #1829

@guyharris

While looking at #1407 and the fix for it in pull request nmap/npcap/#19, and working on changes to make the time stamp type per-instance rather than global (as per my comments in the pull request), I was looking at packetWin7/npf/npf/time_calls.h, and wondered whether all the code there is still needed, given that Npcap either doesn't support anything prior to Vista or doesn't support Vista (the home page says both "Npcap works on Windows 7 and later by making use of the new NDIS 6 Light-Weight Filter (LWF) API." and "Npcap 0.9984 installer for Windows Vista/2008, 7/2008R2, 8/2012, 8.1/2012R2, 10/2016 (x86 and x64).").

The Windows System Information page "Acquiring high-resolution time stamps" has a bunch of information on time stamps; it's oriented towards userland, but some applies to kernel-mode code as well.

It says, for the following platforms ("QPC" means "QueryPerformanceCounter()"; presumably what it says also applies to KeQueryPerformanceCounter() in kernel mode):

"Windows Vista and Windows Server 2008

All computers that shipped with Windows Vista and Windows Server 2008 used a platform counter (High Precision Event Timer (HPET)) or the ACPI Power Management Timer (PM timer) as the basis for QPC. Such platform timers have higher access latency than the TSC and are shared between multiple processors. This limits scalability of QPC if it is called concurrently from multiple processors."

So it sounds as if, on Vista/WS2008, the performance counter is in sync on all CPUs, but that may impose a latency slowing down fetches, and may further slow down if multiple processors are accessing the QPC.

"Windows 7 and Windows Server 2008 R2

The majority of Windows 7 and Windows Server 2008 R2 computers have processors with constant-rate TSCs and use these counters as the basis for QPC. TSCs are high-resolution per-processor hardware counters that can be accessed with very low latency and overhead (in the order of 10s or 100s of machine cycles, depending on the processor type). Windows 7 and Windows Server 2008 R2 use TSCs as the basis of QPC on single-clock domain systems where the operating system (or the hypervisor) is able to tightly synchronize the individual TSCs across all processors during system initialization. On such systems, the cost of reading the performance counter is significantly lower compared to systems that use a platform counter.

Furthermore, there is no added overhead for concurrent calls and user-mode queries often bypass system calls, which further reduces overhead. On systems where the TSC is not suitable for timekeeping, Windows automatically selects a platform counter (either the HPET timer or the ACPI PM timer) as the basis for QPC."

So it sounds as if, on W7/WS2008R2, the performance counter is in sync on all CPUs, and on newer systems, the performance hit for using it may be reduced.

"Windows 8, Windows 8.1, Windows Server 2012, and Windows Server 2012 R2

Windows 8, Windows 8.1, Windows Server 2012, and Windows Server 2012 R2 use TSCs as the basis for the performance counter. The TSC synchronization algorithm was significantly improved to better accommodate large systems with many processors. In addition, support for the new precise time-of-day API was added, which enables acquiring precise wall clock time stamps from the operating system. For more info, see GetSystemTimePreciseAsFileTime. ..."

So it sounds as if, on Windows 8/WS2012 and later, all machines use TSCs, so the performance hit on machines without constant-rate TSCs goes away (those machines aren't supported), and the multiprocessor overhead may be further reduced. In addition, there's a kernel equivalent to GetSystemTimePreciseAsFileTime(), namely KeQuerySystemTimePrecise().
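KeQuerySystemTime() and KeQuerySystemTimePrecise() both report the time as a count of 100-nanosecond intervals since January 1, 1601, so a driver that wants pcap-style seconds/microseconds has to rebase the value to the Unix epoch. A minimal user-space sketch of that arithmetic, where `struct ts_usec` and `system_time_to_unix()` are hypothetical names for illustration and not anything in time_calls.h:

```c
#include <stdint.h>

/* Number of 100-ns intervals between 1601-01-01 and 1970-01-01 (UTC). */
#define EPOCH_DIFF_100NS 116444736000000000ULL

/* Hypothetical helper type: seconds and microseconds since the Unix epoch. */
struct ts_usec {
    int64_t  sec;
    uint32_t usec;
};

/*
 * Convert a system time value (100-ns units since 1601) to seconds and
 * microseconds since 1970. Assumes the input is not earlier than 1970.
 */
struct ts_usec system_time_to_unix(uint64_t t100ns)
{
    struct ts_usec ts;
    uint64_t since_1970 = t100ns - EPOCH_DIFF_100NS;

    ts.sec  = (int64_t)(since_1970 / 10000000ULL);            /* 10^7 ticks per second */
    ts.usec = (uint32_t)((since_1970 % 10000000ULL) / 10ULL); /* 10 ticks per microsecond */
    return ts;
}
```

The same conversion would apply whichever of the two calls supplied the value; only the precision of the input differs.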

All the extra work here may date back to Windows NT 3.x, Windows NT 4.0, and Windows 2000/Windows XP/Windows Server 2003, where they had to deal with machines that didn't have high-resolution counters that would deliver results synchronized across processors, so they offered timers based on:

  • raw KeQueryPerformanceCounter() - high precision, but doesn't necessarily give consistent results on an MP machine;

  • KeQuerySystemTime() - low precision, but gives consistent results on an MP machine (as it fetches a shared clock value updated by clock interrupts);

  • KeQueryPerformanceCounter() with separate per-CPU time bases - high precision, may give less inconsistent results on an MP machine if different per-CPU performance counters have different time bases; may give out-of-order time stamps;

  • KeQueryPerformanceCounter() with separate per-CPU time bases and a check to make sure time doesn't go backwards from the last result - high precision, may give less inconsistent results on an MP machine if different per-CPU performance counters have different time bases; won't give out-of-order time stamps;
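The "make sure time doesn't go backwards" check in the last mode can be sketched as a clamp against the last value handed out. This is only an illustration of the idea, not the code in time_calls.h; `struct mono_state` and `mono_fixup()` are hypothetical names, and a real driver would need a lock or interlocked operation around the shared state:

```c
#include <stdint.h>

/* Hypothetical state: the last time stamp handed out, in any fixed unit. */
struct mono_state {
    uint64_t last;
};

/*
 * Return the raw time stamp unless it would move time backwards, in which
 * case repeat the last value. Single-threaded sketch: no locking is done.
 */
uint64_t mono_fixup(struct mono_state *st, uint64_t raw)
{
    if (raw < st->last)
        return st->last;   /* clamp: never report an earlier time */
    st->last = raw;
    return raw;
}
```

With this clamp, a sequence of raw readings 100, 90, 120 (say, from CPUs whose time bases disagree) comes out as 100, 100, 120: still possibly inaccurate, but never out of order.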

I couldn't find any mail on any WinPcap mailing lists about this (maybe I'll ask Loris Degioanni to see if he remembers any of that), but Loris sent a message to winpcap-users and another message to winpcap-bugs (scroll down to Loris Degioanni's mail from 2005-04-27) talking about time stamping.

At least in Windows 7 and later, it may be possible to just use KeQueryPerformanceCounter(), without the extra per-CPU stuff (TIMESTAMPMODE_SINGLE_SYNCHRONIZATION), to get high-precision time stamps that, on most machines, are synchronized across CPUs and are reasonably quick to fetch. On Vista/WS2008, it appears that it'll give time stamps synchronized across CPUs but they might not be quick to fetch (and less so the more CPUs/cores you have).
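For reference, the single-synchronization approach boils down to capturing one (wall-clock time, counter value) pair and then deriving every later time stamp from the counter alone. A rough user-space sketch of that arithmetic follows; in the driver the tick count and frequency would come from KeQueryPerformanceCounter(), while `struct time_base` and `ticks_to_usec()` are hypothetical names for illustration:

```c
#include <stdint.h>

/* Hypothetical time base, captured once when the counter is synchronized. */
struct time_base {
    uint64_t start_usec;   /* wall-clock time at synchronization, in microseconds */
    uint64_t start_ticks;  /* performance counter value at the same instant */
    uint64_t freq;         /* counter frequency, in ticks per second */
};

/*
 * Convert a later counter reading to microseconds. Whole seconds and the
 * sub-second remainder are scaled separately so that the multiplication by
 * 1000000 cannot overflow for realistic counter frequencies.
 */
uint64_t ticks_to_usec(const struct time_base *tb, uint64_t now_ticks)
{
    uint64_t delta = now_ticks - tb->start_ticks;
    uint64_t whole = delta / tb->freq;   /* whole seconds elapsed */
    uint64_t rem   = delta % tb->freq;   /* leftover ticks, always < freq */

    return tb->start_usec + whole * 1000000ULL + rem * 1000000ULL / tb->freq;
}
```

Because the result depends only on the counter after the initial synchronization, it advances steadily but drifts away from the system clock if that clock is later adjusted.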

TIMESTAMPMODE_SYNCHRONIZATION_ON_CPU_WITH_FIXUP and TIMESTAMPMODE_SYNCHRONIZATION_ON_CPU_NO_FIXUP, neither of which appears to have been documented, may have been experiments to try to synchronize KeQueryPerformanceCounter() across CPUs, but may have been abandoned.

I'm not sure TIMESTAMPMODE_RDTSC is useful any more. It was only supported on 32-bit x86, it didn't deal with cross-CPU synchronization, and it may not have dealt with variable CPU speed machines (hence the reference to "SpeedStep machines"); KeQueryPerformanceCounter() may have at least worked around the latter, and, on Windows 7, is apparently highly likely to be running on machines where the time stamp counters can be synchronized across CPUs without too much pain (and falls back on a slower-to-access per-system timer for machines where that can't be done). Perhaps its only advantage was speed, because it just went straight to the (current) CPU's time stamp counter.

So perhaps we can leave just TIMESTAMPMODE_SINGLE_SYNCHRONIZATION and TIMESTAMPMODE_QUERYSYSTEMTIME. If there's a backwards compatibility concern, we could also leave TIMESTAMPMODE_RDTSC for 32-bit x86.

So what you'd get for time stamps with those three modes is:

  • TIMESTAMPMODE_SINGLE_SYNCHRONIZATION: high-precision, guaranteed to be monotonic (never going backwards) and advancing at a constant rate(?), not guaranteed to be synchronized with the system clock, may be slow to fetch on Windows Vista and on some machines with Windows 7, guaranteed(?) to be consistent between CPUs;

  • TIMESTAMPMODE_QUERYSYSTEMTIME: low-precision, not guaranteed to be monotonic (the system clock can be turned backwards), not guaranteed to advance at a constant rate (if time adjustments to sync with UTC are done either by slowing down or speeding up the clock), guaranteed to be synchronized with the system clock, not sure how fast it'd be to fetch relative to the others, guaranteed to be consistent between CPUs;

  • TIMESTAMPMODE_RDTSC: high-precision, guaranteed to be monotonic (never going backwards) and advancing at a constant rate(?), not guaranteed to be synchronized with the system clock, probably pretty fast to fetch, not guaranteed to be consistent between CPUs.
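The trade-offs in that list could be written down as a small property table, which makes it easy to see that no single mode offers everything. The sketch below is illustrative only: the enum names and their numeric values are hypothetical (they are not the TIMESTAMPMODE_ values in time_calls.h), and the entries marked "(?)" above are recorded here as if they were certain.

```c
#include <stdbool.h>

/* Hypothetical identifiers; the numeric values are NOT those in time_calls.h. */
enum ts_mode { MODE_SINGLE_SYNC, MODE_QUERYSYSTEMTIME, MODE_RDTSC, MODE_COUNT };

struct ts_props {
    bool high_precision;
    bool monotonic;
    bool synced_with_system_clock;
    bool consistent_across_cpus;
};

/* Properties as summarized in the list above ("(?)" entries taken at face value). */
const struct ts_props ts_mode_props[MODE_COUNT] = {
    [MODE_SINGLE_SYNC]     = { true,  true,  false, true  },
    [MODE_QUERYSYSTEMTIME] = { false, false, true,  true  },
    [MODE_RDTSC]           = { true,  true,  false, false },
};

/* Return the first mode that has every requested property, or -1 if none does. */
int pick_mode(struct ts_props want)
{
    for (int m = 0; m < MODE_COUNT; m++) {
        const struct ts_props *p = &ts_mode_props[m];

        if ((!want.high_precision || p->high_precision) &&
            (!want.monotonic || p->monotonic) &&
            (!want.synced_with_system_clock || p->synced_with_system_clock) &&
            (!want.consistent_across_cpus || p->consistent_across_cpus))
            return m;
    }
    return -1;
}
```

Asking for both high precision and synchronization with the system clock returns -1 with these three modes, which is the gap a fourth mode would have to fill.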

A fourth mode could be added for #1407, and we could work on allowing per-instance modes rather than a single global mode set from the Registry (I'm in the middle of that; most of this text was extracted from a huge comment I was putting into npf/time_calls.h), adding an ioctl to set the per-instance mode, adding a packet.dll routine to do that ioctl, and using that routine in packet-npc.c to support pcap_set_tstamp_type().
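The per-instance idea could look roughly like the following: each open instance carries its own mode, which falls back to the global (Registry-derived) default until something sets it. This is a user-space sketch under assumed names; `struct open_instance`, `global_ts_mode`, `set_instance_mode()`, `effective_mode()` and the mode constants are all hypothetical, and the real change would route the setter through an ioctl and a packet.dll routine.

```c
/* Hypothetical mode values, for illustration only. */
enum {
    TS_MODE_UNSET = -1,    /* instance has no mode of its own yet */
    TS_MODE_PERF_COUNTER,  /* performance-counter-based time stamps */
    TS_MODE_SYSTEM_TIME,   /* system-clock-based time stamps */
    TS_MODE_MAX            /* one past the last valid mode */
};

/* Stand-in for the single global mode currently read from the Registry. */
int global_ts_mode = TS_MODE_PERF_COUNTER;

/* Hypothetical per-open-instance state. */
struct open_instance {
    int ts_mode;           /* TS_MODE_UNSET means "use the global default" */
};

/* Set the mode for one instance; returns 0 on success, -1 for an unknown mode. */
int set_instance_mode(struct open_instance *inst, int mode)
{
    if (mode < 0 || mode >= TS_MODE_MAX)
        return -1;
    inst->ts_mode = mode;
    return 0;
}

/* Mode actually used when stamping packets for this instance. */
int effective_mode(const struct open_instance *inst)
{
    return inst->ts_mode == TS_MODE_UNSET ? global_ts_mode : inst->ts_mode;
}
```

Keeping the global value as a fallback would mean existing Registry-based configurations behave as before, while a capture that sets its own mode leaves other instances untouched.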
