Added by Tatu Saloranta, last edited by Tatu Saloranta on Sep 05, 2005


"JUG" - Java Uuid Generator FAQ

1. Why JUG?

1.1. Don't we already have "uuidgen"?

Some do, some don't.

Most platforms have a uuidgen command-line tool (or something similar), but not all do. Additionally, accessing uuidgen from Java may be tricky, since its location in the native filesystem depends on the OS and possibly other factors.

So portability is one benefit: Jug works anywhere Java 1.2 (or later) is available.

Performance may be another benefit when using Jug from Java. Interfacing with native functionality (either via uuidgen or directly via libuuid) is likely to be slower than calling Jug methods, even if the native generation itself is faster.
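To make the portability point concrete, here is a minimal sketch of shelling out to uuidgen with the JDK's ProcessBuilder. The class name UuidgenCall is hypothetical; the sketch assumes uuidgen is on the PATH and prints one UUID per line, and returns an empty Optional otherwise. Every call starts a new OS process, which is one likely source of the overhead mentioned above.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.Optional;

/** Hypothetical helper: asks the native uuidgen tool for one UUID. */
final class UuidgenCall {
    static Optional<String> run() {
        try {
            // Relies on uuidgen being on the PATH; its location varies by OS.
            Process p = new ProcessBuilder("uuidgen").redirectErrorStream(true).start();
            String line;
            try (BufferedReader out = new BufferedReader(
                    new InputStreamReader(p.getInputStream(), StandardCharsets.US_ASCII))) {
                line = out.readLine();
            }
            int exitCode = p.waitFor();
            return (exitCode == 0 && line != null) ? Optional.of(line.trim()) : Optional.empty();
        } catch (IOException e) {
            return Optional.empty(); // tool missing or not executable on this platform
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return Optional.empty();
        }
    }
}
```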

1.2. But what about my (favourite project's) own unique identifier generator?

Many projects (and even individual developers) build their own home-grown unique identifier schemes, usually combining Java's system timer with the host's IP address, and sometimes adding a memory location (via the identity hash code of a system class or of the generator's singleton class).
Fine; this is usually plenty for a single application. But for good interoperability (in this case, avoiding collisions) it may make sense to use a true standard, and not everything people call a UUID really is one (they often just mean "unique within my domain and for my needs"). Jug is also well tested, performs about as fast as such a scheme can be implemented, is compact code-wise, and is licensed under simple open source licenses. What is not to like?

2. Why NOT use JUG?

If you are paranoid about duplicate UUIDs (especially when using the time-based algorithm), note that there is no way to guarantee that multiple UUID generators won't produce the same UUID. It is still unlikely to happen (thanks to the clock sequence field etc.), but it is a potential problem.
Uuidgen usually solves this with a system-wide global lock that prevents the same timestamp from being used twice; with Java, the best Jug can guarantee is that there is at most one Jug instance per JVM, and other JVMs may have their own copies. (Note: in theory it would be possible to add native locking support on platforms that have locking functionality... but then it might be just as easy to use the native uuidgen functionality instead.)

Note, though, that with the random- and name-based methods, multiple instances of Jug are not a problem: name-based methods base uniqueness on the name, not on timing, and the random-based method relies on the quality of the random number generator. In the latter case it all depends on how random one considers SecureRandom to be.

3. What is the fastest method to use for generating UUIDs?

It depends on your system, the random number generators used and so on, but here are some quick test results from my workstation (Ultra-60, dual 450 MHz UltraSPARC II; JDK 1.3.1, default client JIT), measured with the Jug command-line tool by generating 1000 UUIDs of each type:

  • Time-based: 0.03 msec / UUID
  • Random-based: 0.08 msec / UUID
  • Name-based: 0.18 msec / UUID
  • TagURI, no date: 0.18 msec / UUID
  • TagURI, with date: 0.43 msec / UUID

Creating datestamps for tag URIs (a new Calendar instance for each URI) seems to slow the last entry down significantly. Note also that the names and namespaces for the last three methods were relatively short, so the 'real' numbers might be a bit worse for them too (especially since generating the separate names adds cost; for this test, the third and fourth entries used the same namespace + name for every UUID, which is not very realistic).

So, with default settings the time-based algorithm seems to be the fastest, followed by the random-number-based one. The name-based algorithms are slower, probably due to the associated MD5 hashing cost. (As a side note, on my 800 MHz AMD system at home, times were about half of those presented above.)
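For rough comparisons on your own machine, a timing loop in the spirit of the test above can be sketched as follows. It is not Jug's command-line tool: as an assumption, it times the JDK's own java.util.UUID generators (available since Java 1.5; the sketch itself needs Java 8 for lambdas), so its numbers are not directly comparable to the list. The JDK has no time-based generator, so that entry has no counterpart here. The class name UuidTiming is hypothetical.

```java
import java.util.UUID;
import java.util.function.Supplier;

/** Hypothetical timing helper; times JDK generators, not Jug. */
final class UuidTiming {
    // Results are written here so the JIT cannot discard the generation work.
    private static volatile UUID sink;

    /** Returns the average wall-clock time per generated UUID, in milliseconds. */
    static double msecPerUuid(int count, Supplier<UUID> generator) {
        if (count <= 0) throw new IllegalArgumentException("count must be positive");
        long start = System.nanoTime();
        for (int i = 0; i < count; i++) {
            sink = generator.get();
        }
        long elapsedNanos = System.nanoTime() - start;
        return elapsedNanos / 1_000_000.0 / count;
    }
}
```

For example, `UuidTiming.msecPerUuid(1000, UUID::randomUUID)` times the random-based method, and passing a lambda around `UUID.nameUUIDFromBytes(...)` times MD5 name-based generation. As with the original measurements, 1000 iterations is short enough that JIT warm-up can dominate, so treat single runs with caution.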

Finally, if performance really is very important to you, there is a further complication with the time-based algorithm: Java's system clock has a maximum resolution of 1 millisecond (at least before Java 1.5, which also offers a higher-resolution timer on some platforms), instead of the 100 ns required by the UUID specification. Jug solves this with an additional counter, but the downside is that each separate Java 'time slice' (the period during which the system clock returns the same timestamp) can produce at most 10,000 UUIDs, since 1 ms contains 10,000 intervals of 100 ns. If the JDK on the platform does advance in 1 ms ticks, this is enough for up to 10 million UUIDs per second, but on many platforms the resolution is coarser (on Windows it used to be 55 ms, meaning a maximum rate of about 180,000 UUIDs per second).
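The counter trick can be sketched like this. TimestampSketch is a hypothetical class, not Jug's actual implementation: a millisecond reading is scaled to 100 ns units, offset to the UUID epoch of 15 October 1582, and the low-order units are filled from a counter that resets whenever the clock advances. The 10,001st request within one clock reading fails, which is the 10,000-per-slice limit described above; a real generator would wait for the clock to tick instead.

```java
/**
 * Hypothetical sketch (not Jug's code): extends a millisecond clock to the
 * 100 ns resolution of RFC 4122 time-based UUIDs with a per-tick counter.
 */
final class TimestampSketch {
    // 100 ns intervals between the UUID epoch (1582-10-15) and the Unix epoch (1970-01-01).
    static final long UUID_EPOCH_OFFSET = 0x01B21DD213814000L;
    // One millisecond contains 10,000 intervals of 100 ns.
    static final int UNITS_PER_MILLI = 10000;

    private long lastMillis = Long.MIN_VALUE;
    private int counter;

    /** Returns a timestamp in 100 ns units since 1582-10-15 for the given clock reading. */
    synchronized long nextTimestamp(long nowMillis) {
        if (nowMillis > lastMillis) {
            // Clock advanced: start a new time slice.
            lastMillis = nowMillis;
            counter = 0;
        } else if (++counter >= UNITS_PER_MILLI) {
            // Same (or earlier) reading and all 10,000 slots are used up.
            // A real generator would wait for the clock to advance instead.
            throw new IllegalStateException("time slice exhausted");
        }
        // An earlier reading is treated as part of the last slice, so values never repeat.
        return lastMillis * UNITS_PER_MILLI + UUID_EPOCH_OFFSET + counter;
    }
}
```

In use, the caller would pass `System.currentTimeMillis()`; taking the reading as a parameter keeps the sketch deterministic and easy to test.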

... all of which means that if you need to generate more than, say, ten thousand UUIDs per second, you may need to look at native implementations.
Then again, with a system like that you often aren't using Java in the first place.

4. Which one should I use, assuming performance is not important?

If you can access the Ethernet card address, it might be a good idea to use the time-based algorithm, provided you will only be generating UUIDs from a single JVM (and won't be using other UUID tools at the same time). In that case uniqueness is pretty much guaranteed, and the algorithm is fast as well.

One potential drawback: if you consider giving out your Ethernet address a security problem (which in theory it could be, although there probably aren't any major immediate risks), this method is not for you, since the Ethernet address is stored as-is in the last 6 bytes of the UUID. (This could be partially solved by hashing the Ethernet address, but the standard doesn't mention this solution, so it is not implemented yet.)
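To see why the address is exposed, here is a sketch of how the RFC 4122 version 1 fields are packed into 128 bits. TimeBasedLayout and its methods are hypothetical helpers, not Jug API, and the MAC address in any usage is made up. The 48-bit node is copied verbatim into the low bits, so anyone holding the UUID can read it back, for example with java.util.UUID.node().

```java
import java.util.UUID;

/** Hypothetical helper: packs RFC 4122 version 1 fields into a UUID. */
final class TimeBasedLayout {
    /** timestamp: 60-bit count of 100 ns units since 1582-10-15; clockSeq: 14 bits; node: 48 bits. */
    static UUID build(long timestamp, int clockSeq, long node) {
        long timeLow = timestamp & 0xFFFFFFFFL;          // bits 0..31
        long timeMid = (timestamp >>> 32) & 0xFFFFL;     // bits 32..47
        long timeHi = (timestamp >>> 48) & 0x0FFFL;      // bits 48..59
        long msb = (timeLow << 32) | (timeMid << 16) | 0x1000L | timeHi; // version 1
        long lsb = ((long) ((clockSeq & 0x3FFF) | 0x8000) << 48)          // variant bits 10
                | (node & 0xFFFFFFFFFFFFL);                               // MAC address, as-is
        return new UUID(msb, lsb);
    }

    /** Converts a 6-byte MAC address to the 48-bit node value (big-endian). */
    static long nodeFrom(byte[] mac) {
        long node = 0;
        for (byte b : mac) {
            node = (node << 8) | (b & 0xFF);
        }
        return node;
    }
}
```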

If there will be multiple UUID generators (different JVMs, or native uuidgen), the random-based method may be the best option.
It should be reasonably safe to use, provided the JDK's default SecureRandom is implemented as well as it should be.
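A random-based (version 4) UUID needs little more than 16 bytes from SecureRandom with the version and variant bits overwritten, leaving 122 random bits. The following is a minimal sketch with a hypothetical class name; the JDK's java.util.UUID.randomUUID() (Java 1.5 and later) does essentially the same thing.

```java
import java.security.SecureRandom;
import java.util.UUID;

/** Hypothetical sketch of RFC 4122 version 4 (random-based) generation. */
final class RandomUuidSketch {
    private static final SecureRandom RNG = new SecureRandom();

    static UUID next() {
        byte[] b = new byte[16];
        RNG.nextBytes(b);
        b[6] = (byte) ((b[6] & 0x0F) | 0x40); // version 4 in the high nibble of byte 6
        b[8] = (byte) ((b[8] & 0x3F) | 0x80); // variant bits 10xxxxxx in byte 8
        long msb = 0;
        long lsb = 0;
        for (int i = 0; i < 8; i++) msb = (msb << 8) | (b[i] & 0xFF);
        for (int i = 8; i < 16; i++) lsb = (lsb << 8) | (b[i] & 0xFF);
        return new UUID(msb, lsb);
    }
}
```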

Finally, if it is easy to generate unique names on your system (say, a URL combined with a sequence number guaranteed to be unique), and especially if these 'human-readable' identifiers (such as tag URIs) are used elsewhere anyway, it may be a good idea to use one of the name-based algorithms.
It is easy to generate UUIDs from tag URIs, so the one-way conversion can be done on the fly.
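Name-based generation is a one-way hash: MD5 over the 16 bytes of a namespace UUID followed by the name, with the version (3) and variant bits overwritten. The sketch below assumes the RFC 4122 predefined URL namespace for tag URIs and UTF-8 encoding of the name; which namespace Jug itself uses for tag URIs is not stated here, and the class name NameBasedSketch is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.UUID;

/** Hypothetical sketch of RFC 4122 version 3 (MD5 name-based) generation. */
final class NameBasedSketch {
    // RFC 4122 predefined namespace for URLs (assumed here for tag URIs).
    static final UUID NAMESPACE_URL = UUID.fromString("6ba7b811-9dad-11d1-80b4-00c04fd430c8");

    static UUID fromName(UUID namespace, String name) {
        MessageDigest md5;
        try {
            md5 = MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is required on every Java platform", e);
        }
        md5.update(toBytes(namespace));                    // namespace first...
        md5.update(name.getBytes(StandardCharsets.UTF_8)); // ...then the name
        byte[] h = md5.digest();
        h[6] = (byte) ((h[6] & 0x0F) | 0x30); // version 3
        h[8] = (byte) ((h[8] & 0x3F) | 0x80); // RFC 4122 variant
        long msb = 0;
        long lsb = 0;
        for (int i = 0; i < 8; i++) msb = (msb << 8) | (h[i] & 0xFF);
        for (int i = 8; i < 16; i++) lsb = (lsb << 8) | (h[i] & 0xFF);
        return new UUID(msb, lsb);
    }

    /** Big-endian (network order) bytes of a UUID. */
    static byte[] toBytes(UUID u) {
        byte[] out = new byte[16];
        long msb = u.getMostSignificantBits();
        long lsb = u.getLeastSignificantBits();
        for (int i = 0; i < 8; i++) {
            out[i] = (byte) (msb >>> (56 - 8 * i));
            out[8 + i] = (byte) (lsb >>> (56 - 8 * i));
        }
        return out;
    }
}
```

For example, `fromName(NAMESPACE_URL, "tag:example.com,2005:item-1")` (a made-up tag URI) returns the same UUID on every call and every machine, which is what makes on-the-fly conversion possible.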

5. How can I obtain the Ethernet MAC address of the machine JUG runs on?

Before version 1.0, your options were limited to using native tools and passing the address to JUG, or using dummy, randomly generated broadcast addresses.
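For the dummy-address option, RFC 4122 (section 4.5) describes a random node ID with the multicast bit (the least significant bit of the first octet) set; that bit is never set in IEEE 802 addresses obtained from network cards, so such an ID cannot clash with a real card's address. A minimal sketch, with a hypothetical class name:

```java
import java.security.SecureRandom;

/** Hypothetical sketch: random 48-bit node ID per RFC 4122 section 4.5. */
final class DummyNodeSketch {
    static byte[] randomNode(SecureRandom rng) {
        byte[] node = new byte[6];
        rng.nextBytes(node);
        // Set the unicast/multicast bit (least significant bit of the first octet);
        // it is never set in IEEE 802 addresses assigned to network cards.
        node[0] |= 0x01;
        return node;
    }
}
```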

However, beginning with version 1.0, there is limited support for C/JNI-based native access to interface addresses.

To obtain the MAC address of the primary interface, just call:

EthernetAddress primary = NativeInterfaces.getPrimaryInterface();

(Note that if there is a problem loading the JNI library, an Error is thrown.)

To test whether you can use the JNI code, you can also invoke the class org.safehaus.uuid.NativeInterfaces directly: its main() method will try to access the Ethernet address of the primary interface.

Currently there are binary library files for Linux/x86, Windows 32/x86 (i.e. 98, ME, NT, 2K, XP), Solaris/Sparc and Mac OS X.
Help with compiling or developing more versions would be greatly appreciated. In some cases existing native code might be usable as-is; for example, BSD Unixes might be able to use the Mac OS X code after recompilation.
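On platforms without a native binary, and with Java 6 or later, the JDK itself can report hardware addresses through java.net.NetworkInterface.getHardwareAddress(), with no JNI involved. This is an alternative outside Jug, sketched with a hypothetical class name. It simply returns the first non-loopback, non-virtual interface with a 6-byte address, which is not necessarily the interface Jug would consider primary, and it may find nothing at all in sandboxes or containers.

```java
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.Collections;
import java.util.Enumeration;
import java.util.Optional;

/** Hypothetical helper using the JDK (Java 6+) instead of JNI. */
final class JdkMacLookup {
    /** First non-loopback, non-virtual interface with a 6-byte hardware address, if any. */
    static Optional<byte[]> firstHardwareAddress() {
        try {
            Enumeration<NetworkInterface> all = NetworkInterface.getNetworkInterfaces();
            if (all == null) return Optional.empty(); // older JDKs may return null when none exist
            for (NetworkInterface ni : Collections.list(all)) {
                if (ni.isLoopback() || ni.isVirtual()) continue;
                byte[] mac = ni.getHardwareAddress(); // null if unavailable or not permitted
                if (mac != null && mac.length == 6) return Optional.of(mac);
            }
        } catch (SocketException e) {
            // Treat I/O errors as "no address found".
        }
        return Optional.empty();
    }
}
```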

(1.0.2): It is now possible to load the native code either with the 'standard' library loading method (which relies on the Java system property 'java.library.path' to locate libraries) or with application-specific loading from any given directory (by default 'jug-native' in the current directory). The default is still the application-specific method; to enable standard loading, call NativeInterfaces.setUseStdLibDir().
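The two loading strategies correspond to two JDK calls: System.loadLibrary(name), which searches java.library.path, and System.load(path), which needs an absolute file path. The sketch below shows how an application-specific directory such as 'jug-native' can be combined with System.mapLibraryName() to build that path. The class name and any library name passed in are hypothetical; this is not Jug's loader code.

```java
import java.io.File;

/** Hypothetical sketch of the two native-library loading strategies. */
final class NativeLoaderSketch {
    /** Absolute path an app-specific loader would try, e.g. jug-native/libfoo.so on Linux. */
    static File appSpecificPath(File dir, String libName) {
        // mapLibraryName adds the platform prefix/suffix (lib*.so, *.dll, lib*.dylib, ...).
        return new File(dir, System.mapLibraryName(libName)).getAbsoluteFile();
    }

    static void load(String libName, boolean useStdLibDir, File appDir) {
        if (useStdLibDir) {
            System.loadLibrary(libName); // searches java.library.path
        } else {
            System.load(appSpecificPath(appDir, libName).getPath()); // requires an absolute path
        }
    }
}
```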

6. Is there a way to synchronize UUIDs produced by JUG instances running on separate JVMs?

By default (and always with pre-2.0 Jug), Jug does not try to prevent multiple instances from running in separate JVMs. The reason is that JVMs offer no generic mechanism that lets instances in separate JVMs (or even in separate class loaders!) communicate easily.

Starting with 2.0, there is a file-locking based synchronization mechanism that can be used to synchronize access, so that basically only one instance can ever run.
See the next section for details on how to use this feature.
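The general idea of file-lock based exclusion can be sketched with java.nio.channels.FileChannel.tryLock(), available since Java 1.4. SingleInstanceGuard is a hypothetical class, not Jug's 2.0 mechanism: whichever process obtains the OS-level lock on a shared file first gets a guard, and other processes get null until it is released. File locks are held per process, so the sketch is meant for one guard per JVM; within a single JVM an overlapping tryLock() throws OverlappingFileLockException instead of returning null.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.channels.FileLock;

/** Hypothetical sketch: process-wide exclusion through an OS-level file lock. */
final class SingleInstanceGuard implements AutoCloseable {
    private final RandomAccessFile file;
    private final FileLock lock;

    private SingleInstanceGuard(RandomAccessFile file, FileLock lock) {
        this.file = file;
        this.lock = lock;
    }

    /** Returns a guard, or null if another process currently holds the lock. */
    static SingleInstanceGuard tryAcquire(File lockFile) throws IOException {
        RandomAccessFile raf = new RandomAccessFile(lockFile, "rw");
        FileLock lock;
        try {
            lock = raf.getChannel().tryLock();
        } catch (IOException | RuntimeException e) {
            raf.close(); // includes OverlappingFileLockException within one JVM
            throw e;
        }
        if (lock == null) {
            raf.close(); // another process holds the lock
            return null;
        }
        return new SingleInstanceGuard(raf, lock);
    }

    @Override
    public void close() throws IOException {
        lock.release();
        file.close();
    }
}
```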

7. What about cases where the system reboots and the system time is set to an earlier timestamp?

By default (and always with pre-2.0 Jug), Jug has no way of knowing that the system time has gone backwards between the last run and a new startup. Although Jug does keep track of used timestamps while it is running (to prevent problems when a system administrator moves the system time backwards), there is no mechanism to guard against changes made while Jug was not running.
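One way to detect a clock that went backwards while the generator was down is to persist the last used timestamp and compare it at startup; RFC 4122 (section 4.1.5) says the clock sequence must then be changed, and may simply be incremented if its previous value is known. The sketch below is hypothetical (class name and file format are assumptions, not Jug's 2.0 implementation). Timestamps are treated as opaque longs, for example 100 ns units, and a real generator would also save state periodically while running.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical sketch: detects a clock that went backwards across restarts. */
final class PersistedClockState {
    private final Path file;

    PersistedClockState(Path file) {
        this.file = file;
    }

    /** Call at startup; returns the 14-bit clock sequence to use from now on. */
    int startup(long nowTimestamp, int initialClockSeq) throws IOException {
        int clockSeq = initialClockSeq & 0x3FFF; // e.g. a random value on the very first run
        if (Files.exists(file)) {
            String[] parts = new String(Files.readAllBytes(file), StandardCharsets.UTF_8)
                    .trim().split(" ");
            long lastTimestamp = Long.parseLong(parts[0]);
            clockSeq = Integer.parseInt(parts[1]);
            if (nowTimestamp <= lastTimestamp) {
                // Clock is at or behind the last used value: change the clock sequence.
                clockSeq = (clockSeq + 1) & 0x3FFF;
            }
        }
        save(nowTimestamp, clockSeq);
        return clockSeq;
    }

    /** Records the most recently used timestamp and clock sequence. */
    void save(long timestamp, int clockSeq) throws IOException {
        Files.write(file, (timestamp + " " + clockSeq).getBytes(StandardCharsets.UTF_8));
    }
}
```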

Starting with 2.0, there is a file-locking based synchronization mechanism that can be used to synchronize access, so that basically only one instance can ever run.

To enable this feature, you need to:

TO BE WRITTEN
