Sunday, May 31, 2009

Random numbers in Java

A few updates have been made to the site's section on generating random numbers in Java. Random numbers can crop up in all sorts of applications, and it's worth having a good understanding of them.

Traditionally, the bread-and-butter means of random number generation has been the java.util.Ranom class. The technique it uses (see the section on the java.util.Random algorithm for more details) is still suitable for some very casual applications of random number generation, such as some simple games. But in many cases, you should probably be thinking of moving away from java.util.Random towards a higher quality generator. As discussed in the section, problems with java.util.Random include its low period, biases in the different bits of the numbers generated, and its unsuitability for generating combinations of values. Using a weak random generator can have side effects such as the following:
  • it can skew the results of a test harness that introduces subtle biases in the code path due to the random number generator;
  • when testing the performance of data structures, and in various simulations, a generator such as java.util.Random will not produce a good range of possible combinations of values, giving false results;
  • the series of numbers can be predictable, leading to disasterous results if used as the basis for security (e.g. generating a random encryption key, nonce or session ID).
One interesting technique, published in 2003 by Goerge Marsaglia, is the XORShift generator. This generates medium-quality random numbers in a few simple machine instructions. For example, given a Java long, x, we can generate the next random number as simply as follows (in fact, various combinations of shift values are possible, as discussed in Marsaglia's paper):


x ^= (x << 21);
x ^= (x >>> 35);
x ^= (x << 4);


continually executing these three lines will cycle through all (2^64)-1 possible values of x (i.e. all values of a long except zero) in pseudorandom order. Initialising x with the value of System.nanoTime(), or some other "random" seed, gives us a fast, medium-quality generator suitable for, say, generating random game data. It will make an excellent choice in many J2ME games, for example.

We also consider a Java implementation of the combined generator suggested by the Numerical Recipes authors, which generates numbers within a similar order of execution time as java.util.Random, but with a higher period and quality.

For applications where security depends on the quality and unpredictability of the random numbers generated, Java's SecureRandom class provides cryptographic strength random numbers, though is some 20-30 times slower than the other techniques.

Saturday, May 30, 2009

New article: how to choose a Java collection

A new addition to the Java Collections section of the Javamex site looks at a question that, perhaps unsurprisingly, crops up fairly frequently: which collections class should you use for a given task? The various collection classes provide a powerful means of managing objects and data in memory, but they're so powerful that choosing between them can sometimes be a bit daunting.

The approach that I take is to split the question firstly into two sub-questions. Firstly, what is the general type of structure that you need? That is, how is the data basically going to be organised? Usually, there is not too much confusion between a list and a map, especially if you consider whether or not you need to answer the question "for a given X, what is the Y"? But the difference between a set and a list is sometimes not well understood, or at least, not considered. With a little bit of thought, it is usually possible to decide in advance whether the purpose of your collection is to decide "is something there or not"? The trick is for programmers to remember to ask that question in the first place, and not simply plump for a list regardless.

Then, having decided on a list, map, set or queue, the next subquestion is obviously which particular flavour is required. In the case of a list, the cases where you wouldn't plump for an ArrayList are relatively uncommon and it should be clear in your head if you do choose something else that you have a "special case".

But in the case of maps, sets and queues, there are definitely more "horses for courses". But, especially in the first two, there are essentially two factors to consider: concurrency and ordering. As in the article, when the various choices are presented in tabular form according to these criteria, things become a little easier. As mentioned, a key rule of thumb is to choose the class that provides the minimum features that you require. If you don't need ordering, don't pay for it.

Java queues present a slightly complex set of choices, but in at least some cases-- especially a DelayQueue or SynchronousQueue-- it should be really clear that you need that class for a special case scenario. It should also be borne in mind that the executors framework means that in many common producer-consumer scenarios, you don't explicitly have to deal with the underlying job queue.

As usual, further questions and comments to any of the articles on Javamex are welcome on either this blog or the associated Java forum.

Wednesday, May 27, 2009

Patently stupid...

Like anyone, I have my fair share of stop-the-world-I-want-to-get-off moments. But I really did have to check the calendar to make sure it wasn't April 1st today when I read through IBM's patent application for their "solution for providing real-time validation of text input fields using regular expression evaluation during text entry" (20090132950). Wading through all the verbiage, it really does appear that IBM think they have "invented" evaluating a regular expression inside an onChange event handler.

In terms of this specific patent application, I don't think many programmers are too worried that they're suddenly going to be prosecuted for calling RegExp in the wrong place-- needless to say, plenty of prior art is being cited, in case it were needed.

But the case does leave some other more interesting questions. Why does IBM of all companies think it needs to stoop this low? And how can it help us ditinguish the "bleeding obvious" from the "possibly patentable"?

I'm really at a loss to answer the first question. Possibly we're simply looking at a clerical error on the part of an overenthusiastic junior employee. But for a company that is supposedly at the forefront of computing research and invention, claiming patent rights on what amounts to a chain of JavaScript API calls doesn't really help them uphold their reputation.

In answer to the second question, I'm reminded of a comment by Donald Knuth, that he regularly sees patents granted to "solutions" to problems that he sets as undergraduate homework questions. If "how do you validate input as the user types into a web form?" had been posted as a question on one of the various Internet programming forums (and probably it has...), it would almost certainly have been tagged as "smells like homework". Few programmers, I suspect, would have tagged it as "goodness me how novel-- I think we should patent this".

I really have no idea how in the field of software patents you can concretely separate "significant invention" from "doing the bleeding obvious" (though, as in this case, I can sometimes recognise the latter when I see it!). But in order for a software solution to be patentable, I would at least expect to see:

- evidence of significant empirical research necessary to find the solution
- an invention of an actual algorithm, with perhaps some guage of number of API calls per other instructions/lines of code in order to determine if a new algorithm had actually been invented.

I would also like to see severe financial penalities for cases such as this, where a company is clearly attempting to use its abundant resources to abuse the system. The small developers that the patent system was originally designed to protect really have to think twice before committing to the thousands of dollars per year that it costs to keep a patent going. A company such as IBM really has nothing to loose by continually paying for nonesensical patent applications out of petty cash on the off-chance that at some point, one of them will sucessfully fly over the cuckoo's nest.

In the case of this particular patent application, I really really hope for the sake of human sanity that common sense fianlly prevails...

Saturday, May 23, 2009

Java on your Kindle

It's excellent to see an array of Java programming books now available on the Kindle. Various of these books, which you may not have considered buying due to their cost, are available at a sometimes significantly reduced price on the Kindle.

Of notable interest to Javamex readers will be Brian Goetz's venerable Java Concurrency in Practice, which is really something of a bible for information on the Java Memory Model and the Java 5 concurrency library. As I say in the review, I can pretty much guarantee that if you're writing concurrent code (as many of us either are or will soon have to, not just on the server, but increasingly client-side), then Java Concurrency in Practice will allow you to fix various bugs in your code that you were possibly unaware of (it certainly fixed some in mine!), as well as help you make decisions about how to architecture your program around the Java 5 concurrency utilities.

For a more light-hearted read, but still an extremely enlightening one, Java Puzzlers remains excellent as ever. Josh Bloch's Effective Java got a whole new lease of life when its second edition was published, and its addition to the Kindle repertoire is most welcome.

Also worthy of note are the Core Java books. These, and in particular the second volume of Advanced Features, is not the most succinct of works, but covers various Java topics that you don't readily see explained in detail in other books.

Java for beginners turorial: what would you like to see?

Readers of the Javamex web site may or may not have seen the first pages of the site's Java for beginners tutorial, which covers a number of very basic Java topics such as variables, control structures (for loops, if/else), plus information on Java arrays.

Various topics are going to be added to the tutorial over the coming weeks. But how do you think it should be expanded? Are you just starting out in Java and having trouble with a particular topic? Or are you an experienced Java programmer, but remember having particular trouble with some aspect of learning Java with the tutorials that you used? Any feedback is welcome on this blog entry.

Tuesday, May 12, 2009

The secret life of the Java 'final' keyword

I'll freely admit that an omission that the Javamex site should have addressed sooner is a more explicit discussion of the Java final keyword. Its use for guaranteeing thread-safety is hinted at in some of the articles, but some more explicit information has been long overdue.

As a step towards addressing this gap, the aforementioned article looks at how, as of Java 5, the final keyword has acquired a very important characteristic for thread-safe programming. Most programmers think of final in terms of its impact on program design or presumed optimisation (which, as I'll mention in a moment, is probably a red herring). But its thread-safety guarantee is far more significant: any field declared as final can be safely accessed by another thread once the constructor completes, whereas, subtly, without this or any other thread-safety mechanism, that guarantee does not hold.

The final keyword and optimisation

An issue with the Java final keyword, also criticised by Josh Bloch and Neal Gafter in their excellent Java Puzzlers book, is that it means different things in different places. When applied to a class or method, it means that that class or method cannot be extended or overridden. This has led to a general (false) conception that "final is to do with optimisation", and its importance in thread-safety has been overlooked.

Even when applied to classes and methods, the notion that final is about optimisation is probably false in most cases. The argument appears to stem from languages such as C++, where "compilation" is a one-off process. In such languages, there are optimisations that you can make to method calls and field accesses if you know that the classes and fields in question will never be overridden. But when you're running in a VM, whether a class or method might "potentially" be overridden doens't really matter. What counts is whether it has been overridden at a given moment. It it hasn't at the point of JIT-compilation, and the JVM makes certain optimisations based on that observation, but then later the class/method is overridden, the JVM generally has the luxury (which a C++ compiler or linker generally doesn't) of being able to re-compile.

So when applied to classes and methods (and in fact, local variables inside methods), final is essentially a program design feature. When applied to instance and class variables, final is an important thread-safety mechanism.

Monday, May 11, 2009

CyclicBarrier

A new section of the Javamex web site looks at the CyclicBarrier class. In case you haven't come across it, CyclicBarrier is a construct that makes coordinating threads easier. Unlike CountDownLatch, which is designed for "one-off" coordinated actions, CyclicBarrier is designed to be re-used, so that it is suitable for iterative processes where the participating threads need to repeatedly perform a parallel operation, but periodically "converge" so that their results can be amalgamated.

In the article, we discuss the example of using a CyclicBarrier to coordinate a parallel sort algorithm. Essentially, the sort takes place in three stages. Each of the stages occurs in parallel, but at the end of each stage, a small single-threaded step must take place to amalgamate the results of the previous parallel operation.

Essentially, the code for a worker thread involved in the operation repeatedly carries out an operation and then calls the barrier's await() method. The latter method blocks until all participating threads have also called await() (we sometimes call this "arriving at the barrier"). At that point, CyclicBarrier executes our pre-determined "amalgamation" routine on one of the threads (the last one to arrive at the barrier, in fact), and then relases all the threads to move on to the next stage. Overall, the class takes out a lot of the actual thread coordination work, although, as we discuss in the article, we must still think about regular concurrency issues such as data synchronization and lock contention.

An additional feature of CyclicBarrier which we discuss is that it handles propagation of interruptions to all participating threads. In other words, if any thread involved in the operation is interrupted, then the whole operation will cease once all threads have called the await() method.

Whilst definitely one of the lesser used of the Java 5 concurrency utilities, CyclicBarrier is definitely a useful class that should not be overlooked.