Friday, January 26, 2007

Time Bugs

A short while ago I received a bug report for Mulgara, where certain Date/Time stamps were being randomly changed. I dreaded to think what this could be, and avoided it in the hope that the reporter (DavidM - but not the DavidM who wrote the storage layer!) or someone actively working in the storage layer (like Andrae) would find the problem.

Another report yesterday made me realize that DavidM didn't know where in the system the problem was to be found. Rather than simply tell him to look at SPObject and SPDateTimeImpl, I decided to have a look myself. After all, I know this area reasonably well.

The latest bug report claims that whenever the following timestamp is entered:
  April 26, 1981, 02:56:00
Then the system always returns:
  April 26, 1981, 03:56:00
(one hour later)

First, I ran the test queries and confirmed that the problem occurs for me too, and that it is definitely the string pool causing the problem. To help me narrow things down more quickly, I copied out some of the relevant code from the string pool so I could play with SPDateTimeImpl in an isolated environment.

I quickly realized that the problem was occurring when a third-party library was parsing XSD dates. So I reported this to the developers list, and stopped looking. There were some alternative library suggestions, and Brian said he would fix it.

Third Party F/OSS

It was still bothering me that this library was failing for us. Given that it was open source software, I decided that I could at least report the problem to the developers. I don't know about taking the time to debug the code, but at least I could give them a heads-up.

(I won't mention the name of the library, because I don't want anyone to think there is anything wrong with their library. Read on.)

So I wrote an example piece of code that parses one of these bad date/times, and sent it to the developers of this library.

A couple of hours later a developer wrote back to tell me that the library works correctly for him, and that the problem may be related to timezone issues. I'm in CST and he is in BST. He gave me a test case, and asked for some details of my system.

So I ran the test, and discovered 2 things. First, the time was still being printed incorrectly, though it worked for him. Second, the tests that he wrote all passed.

Initially I thought that the tests must have been written poorly, but on inspection I discovered that they were perfectly correct. In this case, he was comparing the results of their parsed date/time with a Calendar object from the Sun JDK. Of course, since the printed data was incorrect, I expected it to compare incorrectly to the Calendar object from the Sun JDK. But this wasn't happening.

The next step was to work out why this was happening, so I asked the third-party library to print its internal long value, and compare it to the internal long value from the Calendar. But the Calendar was also giving the wrong value. That's when I realized something was screwy with the JDK, and not the third-party library.

Timeslip

A quick test class shows the problem:
import java.util.Calendar;

public class TimeTest {
    public static void main(String[] args) {
        // Uses the JVM's default timezone, which is what makes the result machine-dependent.
        Calendar cal = Calendar.getInstance();
        cal.clear();
        cal.set(1981, Calendar.APRIL, 26, 2, 56);
        System.out.println("Time @2:56 in millis: " + cal.getTimeInMillis());
        cal.clear();
        cal.set(1981, Calendar.APRIL, 26, 3, 56);
        System.out.println("Time @3:56 in millis: " + cal.getTimeInMillis());
    }
}
If you live in Australia or Great Britain, then this code will work perfectly. For instance, running this in Australia will give you:
Time @2:56 in millis: 357065760000
Time @3:56 in millis: 357069360000
Note that there is a time difference between 2:56 and 3:56 of 3600000 milliseconds.

However, if you happen to live in the US Central timezone, then you will get this:
Time @2:56 in millis: 357123360000
Time @3:56 in millis: 357123360000
Note that these numbers are identical! They both represent 3:56am on that day.

I tried this with Java 1.4, 1.5 and 1.6 on my Mac. I also tried it with Java 1.5 on x86/Linux and Windows. All of these configurations return identical results.
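The test class above depends on the default timezone of the machine it runs on. To see both results on one machine, here is a minimal variant that passes the timezone in explicitly. The zone IDs "Australia/Brisbane" and "America/Chicago" are my own picks for illustration (any UTC+10 zone without a transition that night, and any US Central zone, should behave the same way), and the class and method names are made up for this sketch.

```java
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.TimeZone;

public class ZonedTimeTest {
    /** Milliseconds since the epoch for 1981-04-26 hour:minute, read as wall-clock time in zoneId. */
    static long millisAt(String zoneId, int hour, int minute) {
        // An explicit zone replaces the JVM default used by Calendar.getInstance().
        Calendar cal = new GregorianCalendar(TimeZone.getTimeZone(zoneId));
        cal.clear();
        cal.set(1981, Calendar.APRIL, 26, hour, minute);
        return cal.getTimeInMillis();
    }

    public static void main(String[] args) {
        // Zone IDs are assumptions chosen for illustration.
        String[] zones = {"Australia/Brisbane", "America/Chicago"};
        for (String zone : zones) {
            long at256 = millisAt(zone, 2, 56);
            long at356 = millisAt(zone, 3, 56);
            System.out.println(zone + " @2:56: " + at256
                + ", @3:56: " + at356
                + ", difference: " + (at356 - at256));
        }
    }
}
```

This should report a difference of 3600000 for the Australian zone and 0 for the US Central zone, matching the two sets of numbers above.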

Could Sun have created a problem here? Why does it only affect timezones in the USA?

At this point it was getting late, and I was startled at what I was seeing. Andrae was online, so I showed him. I was too tired to look in the source for java.util.Calendar and java.util.GregorianCalendar, so he offered to look for me. But rather than letting me go to bed, he insisted I should blog about it so that it was written up somewhere.

Fortunately, it takes a little while for me to write a blog entry. This allowed Andrae time to discover what was happening. It comes down to the following:

The Uniform Time Act of 1966 (15 U.S. Code Section 260a), signed into Public Law 89-387 on April 12, 1966, by President Lyndon Johnson, created Daylight Saving Time to begin on the last Sunday of April and to end on the last Sunday of October. Any State that wanted to be exempt from Daylight Saving Time could do so by passing a state law.

According to comments in java.util.GregorianCalendar, the hour sequence on that night is:
  12 Std -> 1 Std -> 3 Dst -> 4 Dst
This means that there was no such time as 02:56 on that morning. So the Mulgara date/time bug is not a bug at all, but required behavior.
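The missing hour can be checked directly. The sketch below does two things: it adds one elapsed hour to 01:56 and reads back the wall-clock hour, and it asks a non-lenient calendar whether a given time exists at all. It assumes the "America/Chicago" zone ID stands in for US Central time, and the class and method names are only for illustration. A non-lenient GregorianCalendar rejects field values that it would otherwise have to adjust, which is what the second check relies on.

```java
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.TimeZone;

public class GapCheck {
    /** A calendar set to 1981-04-26 hour:minute in US Central time (zone ID is an assumption). */
    static Calendar centralTime(int hour, int minute) {
        Calendar cal = new GregorianCalendar(TimeZone.getTimeZone("America/Chicago"));
        cal.clear();
        cal.set(1981, Calendar.APRIL, 26, hour, minute);
        return cal;
    }

    /** Wall-clock hour reached one elapsed hour after 01:56 that morning. */
    static int hourAfterOneFiftySix() {
        Calendar cal = centralTime(1, 56);
        cal.add(Calendar.HOUR_OF_DAY, 1); // adds 3600000 ms of elapsed time
        return cal.get(Calendar.HOUR_OF_DAY);
    }

    /** True if a strict (non-lenient) calendar accepts this wall-clock time. */
    static boolean existsThatMorning(int hour, int minute) {
        Calendar cal = centralTime(hour, minute);
        cal.setLenient(false);
        try {
            cal.getTimeInMillis(); // forces the fields to be validated
            return true;
        } catch (IllegalArgumentException e) {
            return false; // the requested hour falls in the skipped range
        }
    }

    public static void main(String[] args) {
        System.out.println("Hour after 01:56 + 1h: " + hourAfterOneFiftySix());
        System.out.println("01:56 exists: " + existsThatMorning(1, 56));
        System.out.println("02:56 exists: " + existsThatMorning(2, 56));
        System.out.println("03:56 exists: " + existsThatMorning(3, 56));
    }
}
```

In the default lenient mode the calendar quietly moves 02:56 forward to 03:56 instead of complaining, which is exactly the behavior reported in the bug.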

Disambiguation

The Mulgara behavior may be correct from one perspective, but it can still be troublesome. A time like this can be supplied from a non-US source, but it cannot be entered into a US system. This demonstrates that Mulgara date/times do not support timezones, even though they are described in the XSD specification that they implement.

I'm thinking that the default behavior in Mulgara should be to assume the local timezone (as it does now), but to allow for explicit timezones as well. That way any ambiguity can be removed.

Date/times on the input should allow the timezone to be included, as per the spec. I was disappointed to learn that we don't support that at the moment, but it should be easy enough to add the option to the parser.
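To illustrate what an explicit timezone buys us, here is a small sketch using the JDK's own javax.xml.datatype classes to parse xsd:dateTime values that carry an offset. This is not Mulgara's parser, and the class and method names are invented for the example. It only shows that once the offset is part of the lexical value, the resulting instant no longer depends on the local timezone.

```java
import javax.xml.datatype.DatatypeConfigurationException;
import javax.xml.datatype.DatatypeFactory;
import javax.xml.datatype.XMLGregorianCalendar;

public class XsdDateTimeParse {
    /** Parses an xsd:dateTime lexical value and returns milliseconds since the epoch. */
    static long toMillis(String lexical) {
        try {
            XMLGregorianCalendar xcal =
                DatatypeFactory.newInstance().newXMLGregorianCalendar(lexical);
            // If the value carries an offset it is used as-is; a value without
            // one would fall back to the JVM's default timezone.
            return xcal.toGregorianCalendar().getTimeInMillis();
        } catch (DatatypeConfigurationException e) {
            throw new IllegalStateException("No DatatypeFactory available", e);
        }
    }

    public static void main(String[] args) {
        String[] inputs = {
            "1981-04-26T02:56:00+10:00", // eastern Australia
            "1981-04-26T02:56:00-06:00", // US Central standard offset
            "1981-04-26T02:56:00Z"       // UTC
        };
        for (String input : inputs) {
            System.out.println(input + " -> " + toMillis(input));
        }
    }
}
```

With an explicit offset, 02:56 on that morning is a perfectly good time wherever the code runs, because nothing has to be interpreted against local Daylight Saving rules.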

That leaves the issue of the output. At the moment, it will be converted to the client's local timezone though it will be printed without timezone information. I'd rather not change the default behavior, so I think we should introduce an option of printing the timezone. Strictly speaking, the timezone used won't matter, as the times will be converted according to the timezone being used. However, it would still be nice to specify the exact zone to be used. The time is stored relative to UTC, and the conversion to the time in a specific locality is only really relevant when it needs to be presented to a person. This means that we only need to worry about it at the client end. The infrastructure doesn't exist to store a timezone at the server, so it is only really feasible to make the conversion at the client end anyway.

I expect to put in a system property to control the output format in a few days. I have a lot of travel ahead of me tomorrow, so perhaps I'll get the time then.
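As a rough idea of what that could look like on the client side, here is a sketch that formats a UTC-based value in a chosen timezone and only appends the offset when a system property asks for it. The property name "mulgara.xsd.dateTime.showTimezone" and the class are hypothetical, invented for this example, and not an existing Mulgara setting. The "XXX" pattern letter needs Java 7 or later.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

public class DateTimeOutput {
    // Hypothetical property name, used only for this sketch.
    static final String SHOW_TZ_PROPERTY = "mulgara.xsd.dateTime.showTimezone";

    /** Formats using the system property to decide whether the offset is printed. */
    static String format(long utcMillis, TimeZone zone) {
        return format(utcMillis, zone, Boolean.getBoolean(SHOW_TZ_PROPERTY));
    }

    /** Formats a UTC-based instant as local time in zone, with or without the offset. */
    static String format(long utcMillis, TimeZone zone, boolean showZone) {
        String pattern = showZone
            ? "yyyy-MM-dd'T'HH:mm:ssXXX" // XXX prints offsets like -05:00
            : "yyyy-MM-dd'T'HH:mm:ss";   // current default: no timezone shown
        SimpleDateFormat fmt = new SimpleDateFormat(pattern, Locale.ROOT);
        fmt.setTimeZone(zone);
        return fmt.format(new Date(utcMillis));
    }

    public static void main(String[] args) {
        long stored = 357123360000L; // the value both US Central times mapped to
        TimeZone central = TimeZone.getTimeZone("America/Chicago");
        TimeZone brisbane = TimeZone.getTimeZone("Australia/Brisbane");
        System.out.println(format(stored, central, false));
        System.out.println(format(stored, central, true));
        System.out.println(format(stored, brisbane, true));
    }
}
```

Running with -Dmulgara.xsd.dateTime.showTimezone=true would switch the two-argument format method over to the offset-bearing form, while leaving the default output unchanged.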
