Monday, July 6, 2009

Some problems with GPS and NTPD

I found some problems when trying to set up the ntpd server connected to a GPS.

First, I had to solve how to create the devices required by ntpd: /dev/gps0 and /dev/gpspps0. This is easy: just create a udev rules file, for example /etc/udev/rules.d/90-gps.rules:

SUBSYSTEM=="pps", MODE="0660" GROUP="uucp" SYMLINK="gps%k"
KERNEL=="ttyS0", SYMLINK="gps0"

the above creates the following links:
gps0 -> ttyS0
gpspps0 -> pps0

and the target devices are accessible to members of the uucp group. Therefore, the 'ntp' user must belong to the uucp group:

# usermod -G ntp,uucp ntp

Then ntpd experienced some serious jitter problems in the system clock, and I found out that the ntpd driver was not using the PPS signal, but only the NMEA output. Running ntpd in debug mode (-d flag) showed a 'permission denied' error when opening /dev/gpspps0. The file permissions are ok, but the problem was in the selinux layer. I inadvertently had the selinux enabled. Disabling it fixed the problem (set to 'disable' in /etc/selinux/config).

Also, the serial port had to be carefully set up before ntpd was started, otherwise, some interactions in the tty driver would be executed, and the DCD interrupt detection would be lost, as explained in the previous posting. The minimum settings of the serial port at start-up are:

# stty -F /dev/gps0 raw ispeed 4800 ospeed 4800 -hupcl

In particular, finding the selinux problem was quite confusing sometimes. If ntpd is started from the command-line as root, the "permission denied" error does not show up. However, if started using the 'service ntpd start' command, the error appears.

Wednesday, July 1, 2009

Synchronizing an NTP server to a GPS/PPS

The goal now is to have a Garmin GPS 18LVC driving a PPS (pulse-per-second) signal to the ntpd server for a highly accurate time reference server.

1. The GPS device

There are at least a couple of ways to propagate the PPS signal to the ntpd server, plus some variants in each case. However, the GPS device must be seen as a device that sources two different types of data: the absolute date and time, and the 1-Hz clock signal (PPS).

The first one provides the complete information about when it is now, including a complete time and date, but with poor accuracy because this information is sent over the data lines of the serial port and encoded using some type of protocol, i.e. NMEA. PPS provides a very accurate clock
(about 1 uS in the GPS 18LVC device) but without any reference to the absolute time. This clock is wired to the DCD (Data Carrier Detect) pin of the serial port. In other words, PPS tells us with good precision when each second begins, but it doesn't tell us which second it is. This timing
information must be combined with the protocol messages sent by the GPS to have both precision and a complete timestamp at the same time.

The Garmin 18LVC model speaks the NMEA protocol, which sends out every second the NMEA messages that have been previously selected with the PGRMO command. Only the GPRMC message is necessary to get the current date and time, and it is important to keep to amount of information sent out over the serial port every second to a minimum. Once the PGRMO command has selected the message types, the GPS will retain this configuration in its internal memory.

The output from the GPS can be seen using a terminal program, such as minicom, if properly configured at 4800 baud, no parity, 1 stop bit.

2. NTPD reference clocks


The ntpd server supports several types of drivers. Here, the term 'driver' has nothing to do with a 'kernel driver'. ntps drivers are low-level callback functions that are registered within the ntpd core and implement the access to several types of local clocks, such as GPSs , radio clocks, etc. Each driver is identified by a pseudo-IP address identifier. The identifiers involved here are:

127.127.20.x : NMEA Reference Clock driver
127.127.28.x : SHM (shared memory) driver

2.1 NMEA reference clock

The NMEA clock driver assumes that a GPS device sending out NMEA messages is connected to the system via a serial port, named /dev/gpsX and its PPS signal, wired through the DCD pin, is accessible from a /dev/gpsppsX device.

/dev/gpsX is actually a link to some /dev/ttySX serial device, and /dev/gpsppsX is a link to a /dev/ppsX device which, in turn, is provided by the kernel PPS API. This API collects and distributes a precision kernel clock information from/to userland programs, and supports some predefined client drivers, such as the DCD pin connected to a 8250 UART. The DCD pin is sensed using a new serial line discipline, named PPS, which is an extension of the TTY line discipline. The sensing takes place during interrupt time, so it provides a very precise timestamping of the DCD events.

This PPS API, also known as LinuxPPS, is not yet available in the kernel, so a patch must be applied to the kernel mainline tree.

Note the /dev/gpsppsX device is optional and if ntpd cannot open this device at startup, it will silently fall back to the NMEA-only functionality. This means that it will use the arrival time of the NMEA messages to discipline the system clock, which will result in a very poor precision, usually, worse than using a remote NTP server over the internet. Running ntpd in debug mode will, however, log this condition.

The /etc/ntp.conf must contain these lines:


server 127.127.20.0 mode 1 minpoll 4 prefer
fudge 127.127.20.0 flag3 1 flag2 0 time1 0.0

Here note that a single NTP device provides both the timestamp and the PPS timing. The meaning of these flags are as follows:

mode=1, means that only the GPRMC messages of the NMEA protocol will be analyzed. flag3=1 tells ntpd to use the PPS line discipline of the kernel, and flag2=0 tells the driver to use the rising edge of the DCD signal to signal the start of each second.

In order to activate the PPS line discipline on the serial port connected to the GPS, it is necessary to run the 'ldattach' utility, which actually will stay in the background to keep the serial port open and the discipline active:

# ldattach pps /dev/ttyS0


2.2 SHM reference clock

The SHM driver accepts delayed timing information from a System-V IPC shared memory (with key "NTPx"). The timing information is written there by some external process, whatever it is. This process would read the timing information from the GPS and write it to the shared memory so that ntpd can process it. There are some user-space utilities that can do this job, for example, gpsd and shmpps. Gpsd is a general-purpose daemon that has been designed to talk to most types of GPS models using a wide variety of protocols and, in addition, is capable of processing the PPS signals and sending timing information to ntpd via a shared memory device. Or, at least,
this is what it claims to do... because I couldn't achieve this.

2.2.1 gpsd

Actually gpds feeds two devices to ntpd, one with the absolute timestamp parsed from the NMEA messages, or any other protocol supported by gpsd, and another feeding the PPS timing information. Ntpd sees both devices as two different SHM devices, so the ntp.conf file must be like this:

server 127.127.28.0 minpoll 4
fudge 127.127.28.0 refid GPS
server 127.127.28.1 minpoll 4 prefer
fudge 127.127.28.1 refid PPS

2.2.2. shmpps

shmpps is a much simpler daemon that does exactly one job: detect the PPS changes, get a timestamp for each change, and send them to ntpd via a single shared memory. No absolute time is passed to ntpd, so ntpd should still use a NMEA device (on the same serial port) to get the absolute time reference.

So, /etc/ntp.conf must look like this:

server 127.127.20.0 mode 1 minpoll 4 prefer
fudge 127.127.20.0 flag3 0 flag2 0 refid NMEA
server 127.127.28.0 minpoll 4
fudge 127.127.28.0 refid PPS

Both gpsd and shmpps share a common problem: they detect DCD changes from user-space, that is, a program that blocks on the TIOCMGET ioctl until a change is detected and then gets a timestamp. In this case, the latency is larger that in the PPS kernel API, where the timestamp is read at interrupt time.

In conclusion, the LinuxPPS approach offers the best precision and simplicity as it requires no external daemons, but the kernel must be patched. On the other side, the SHM devices offer worse precision but are easier to set up.

3. How to enable support for PPS API

Here I explain how to patch and build the software modules involved in having GPS/PPS connected to ntpd. I also rebuilt the RPMs for these modules so that instalations is easier. The process involved in rebuilding the RPMs is also described below.

3.1. Building the Linux kernel

There are two ways to get LinuxPPS: via git or via patches. The latest version of LinuxPPS is available only via the git repository and it contains an entire Linux kernel tree, but only for one
kernel version version (2.6.28-rc6 at the time of writing). However, the patches are against a wider repertory of kernel versions, but the LinuxPPS implementations are older.

I'd rather not use the kernel available from the git repository because I prefer to stick to the same kernel version that is already installed in the target system, in my case, a Fedora 10 (2.6.27.5-117) but, at the same time, I want to have the latest LinuxPPS implementation.

So, I decided to patch the 2.6.27 kernel manually. This shouldn't take much time, as the patches look quite straightforward to apply.

First, I downloaded the entire git repository and diff'ed it against a stock 2.6.28-rc6 kernel, the resulting patch is the LinuxPPS implementation.

To get the original kernel that comes with my Fedora 10, I had to download the source RPM for the installed kernel version (2.6.27.5-117) and install it. This must be done on the target PC.

Then, I prepared a source tree for the patch and build:

# cd ~/rpmbuild
# rpmbuild --target i686 -bp SPECS/kernel.spec

Note the spec file already creates the kernel config file so there is no need to configure the kernel to match our current settings.

Just to test, I tried to apply the LinuxPPs patch directly to my 2.6.27.5 kernel but, of course, it didn't work as there were too many differences.

Then, I split the patch into two parts: one containing the new files specific to the pps implementation, and another for the existing kernel files. The first sub-patch was applied successfully, and the second one had to be manually applied, but it wasn't too painful. At the end,
I diff'ed against the original 2.6.27.5 kernel and saved them to a patch file (patch-2.6.27.5-ppsapi).

Then, I configured the kernel to enable PPS support:

make menuconfig
Device drivers -> PPS support
* Enable high-resolution timestamps
* Enable 'ktimer' and 'line discipline' as modules

General setup -> append to kernel version: "pps" to have this tag
on the kernel version string.

Then I compiled the kernel and the modules

# make
# make install

Check that /etc/grub.conf has the correct entries (it should have) so that when the system boots again, the new kernel is used

Now, the kernel header files must be prepared for compilation of the user-space tools:

# cd /usr/include
# mv linux linux.orig
# mv asm asm.orig
# mv asm-generic asm-generic.orig
# ln -s /lib/modules/2.6.27.5pps/build/include/linux
# ln -s /lib/modules/2.6.27.5pps/build/include/asm
# ln -s /lib/modules/2.6.27.5pps/build/include/asm-generic
# cp /lib/modules/2.6.27.5pps/build/Documentation/pps/timepps.h .

Note the timepps.h header is required for ntpd to detect the presence of the PPSAPI in the Linux kernel. Otherwise, ntpd wouldn't complain and silently revert to using the NMEA protocol only.

3.2. Building the 'ntpd' server


Now, it is time to patch and compile the NTP server (ntpd). I used the latest available release, 4.2.4p7 at the time of writing.

Initially, I used the nmea.patch from the LinuxPPS homepage but the resulting ntpd failed to detect the PPS signal from the GPS. After some detailed debugging using gdb, I found out that the DCD interrupts from the UART were disabled by ntpd during initialization, preventing the PPS signal from reaching the processor.

A more detailed debugging showed that the call to tcsetattr() in the refclock_setup function, indirectly caused the DCD interrupts to be disabled, though the bit mask of c_iflag should not cause this problem. I believe that changing some of the c_iflag bits causes the UART to
be incorrectly reprogrammed, perhaps this is why the ldattach utility requires patching. So, I decided to patch ldattach so that the c_iflag settings are the same as the ones set by ntpd. This is a dirty hack, but it seems to be how the LinuxPPS patches work here.

If you are experiencing problems and you suspect that PPS is not being read by ntpd, try reading the IER register (Interrupt Enable Register) of your 8250 UART (or compatible) and check that bit 3 is set. The IER register is at offset 0x01 of the UART (i.e. address 0x3F9).

To compile ntpd, download ntp-4.2.4p7 and apply the patch-ntp-4.2.4p7-ppsapi, then configure it to enable the NMEA driver (ID 127.127.20.u):

# ./configure --disable-all-clocks --disable-parse-clocks --enable-NMEA --enable-linuxcaps

Additionally, to may want to enable support for SHM drivers, in case you want to experience with user-space drivers, which don't require kernel patching but are more likely to be affected by latencies and be less precise. If so, add the "--enable-SHM" argument to the configure command.

Now, run 'make' and use the produced 'ntpd' driver and utilities.

Remember that the Linux PPSAPI must be enabled in the kernel, and that the correct kernel include files must be visible under /usr/include, as explained in the previous chapter.

3.3. Building the ldattach utility

The ldattach utility is used to set the line discipline associated to a serial port. LinuxPPS uses a new line discipline, named PPS, that detects changes in the DCD line of the serial port and feeds those changes into the kernel PPS API.

A small patch must be applied to ldattach, but it must be slightly different to the one proposed in the LinuxPPS homepage. I believe the spirit behind this patch is to set the same terminal config both by ldattach and ntpd so that no change is done and the UART registers preserve the
DCD detection.

ldattach is provided by the util-linux package in the Fedora 10 distribution.

The easiest way is to install the source RPM, add the new patch in the spec file and rebuild the rpm.

3.4 Rebuilding the RPMs

The affected RPMs are: kernel, kernel-headers, ntp, ntpdate, util-linux-ng and their debug and devel variants.

Rebuilding an RPM so that more patches are applied is quite straighforward. Usually, an RPM that comes from a distribution package has already several patches that get applied when the rpm is built. All we have to do is add the corresponding pps patches to each spec file and rebuild the RPM.

In general, the process is as follows: the source RPM is installed and this installs a spec file in the SPECS directory, and a source tarball and the patch files under the SOURCES directory. The SPECS and SOURCES directory can be found under the buildroot directory of the rpmbuild utility, usually under ~/rpmbuild. Then, the additional pps patch must be copied in the SOURCES
directory, and then edit the spec file to add the new patch file using a %patch clause. Number the patch clause so that our patch is applied after the others.

Also, it is a good idea to add the 'pps' suffix in the release string of the RPM, otherwise it would be impossible to tell a pps-capable rpm from a non-capable one.

The kernel RPM is the most complex of these RPMs, and the spec file had to be changed more:

* The 'buildid' macro was defined to be ".pps"
* Patches are applied using the "ApplyPatch" macro
* The timepps.h header must be copied to /usr/include when building
the kernel-headers rpm.

As described in the above paragraphs, before building the util-linux and ntp packages, the links to the new kernel include files must be done. Alternatively, the pps kernel rpm can be built and installed first, and then rebuild the rest of the packages.