Physical Units

When I first wrote this note, perhaps 4 or 5 years ago, there didn't seem to be a lot of interest in standardizing physical units, except among the scientific community. One example is the package called "Java Addition to Default Environment" (described in JScience - Java Tools and Libraries for the Advancement of Sciences), developed by J-M. Dautelle. This is mostly run-time, but it now has a package of "measurements" that is strongly typed. In fact there seems to be a growing awareness that a run-time facility, although it can be very comprehensive and provides a lot of flexibility, does not protect you against some potentially catastrophic errors: your system may run beautifully for years because it seldom executes a particular path in the code, and then crash unexpectedly in the middle of the night! From a practical point of view this does not seem adequate. Compile-time checking, on the other hand, runs into various language limitations, and it is hard to form a consensus that accommodates all the various requirements. Over the last few years, a large number of different packages have been developed, typically within individual industries, in response to their particular needs, but it appears that (speaking as of today - August 2010!) we have not made great progress in agreeing on an industry-wide standard.

A colleague, Denis Garneau, recently retired from IBM, and I have been trying for some time now to come up with a type-safe implementation of physical units in Java. We have come up with some ideas, based partly on our experience with business data types, which I describe below. This preliminary work came to the attention of Werner Keil of Creative Arts and Technologies, who is active in the Java standards activity, but so far it is not clear how our work should be integrated with the packages that Werner and his colleagues are developing.

There is no doubt that the run-time approach avoids a number of problems, some of which will be described in what follows. The JScience project provided a large number of units and sophisticated dimensional analysis - mostly at run-time. The units supported by this package were grouped into two classes: SI and Non-SI. SI also contains symbols for the 20 SI prefixes (positive and negative powers of 10). There are apparently other approaches to units being worked on in various places - NASA says they now have their own, which they may share with the Java community at some time in the future - but standardization of the units of measure is still very much a work in progress. NASA was probably strongly motivated by the loss of the Mars Lander, alluded to in my web page entitled "Smart Data".

A related problem with either type of facility is that it cannot support all the units in use worldwide, so we need a way for an application to add its own unit definitions. It seems to make sense to partition symbols by dimension - at least the non-SI symbols. It is reasonable to assume that the SI symbols are pretty much standardized (except for the spelling issue for meters/metres, liters/litres, etc.), but the non-SI measures make for a pretty unwieldy collection. As J-M. Dautelle points out, the American and British measures are often different, e.g. ton, gallon, cup, and the two sides of the Atlantic are supposed to be using the same language! So, it seems as though we should group the non-SI units by dimension and language. However, my hope is that we can gradually extend the concepts described below - one dimension at a time! - without running into an unmanageable combinatorial complexity! We'll see!

Distance

Let's start with a basic dimension: distance. The approach I propose applies to just about all physical units - and of course so do the problems. Anyway, to get us started, I suggest two classes relating to the distance dimension: DistanceQuantity and DistanceUnit. These in turn extend two abstract classes: DimensionQuantity and AbstractUnit, respectively. These names are a bit different from earlier names given in this web page - the new ones are due to Werner Keil, and will hopefully maintain compatibility with his standardization work.

DimensionQuantity is an abstract class with the following instance variables:

  double scalar;                     // value in reference units                           
  double units;                      // value in units (Unit unit)
  Unit unit;                         // reference to an object of class Unit

We will hold a physical dimension in two forms (for performance reasons): a "reference" unit, into which all values of that dimension will be converted by their constructors, and an amount in the unit in which the data was originally created (on the assumption that the user will probably want to see it in that form at some point). E.g. if the user entered a distance as "6 inches", s/he may want to see it in that form again, not 15.24 cm. A facility will also be provided to convert quantities to other units of the same dimension.

AbstractUnit is an abstract class with three instance variables:


  String name;                       // e.g. "Angstrom"
  double multFactor;                 // e.g. 1E-10
  double addFactor = 0.0;            // only used for temperatures

In our example, DistanceQuantity extends DimensionQuantity, which provides a number of general methods, used to build dimension-specific methods.
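To make the relationship between the two abstract classes concrete, here is a minimal sketch. The class and field names follow the text above; the constructors, the toReference helper and the convention "reference = value * multFactor + addFactor" are assumptions made purely for illustration, and the unit field is typed as AbstractUnit here for simplicity.

```java
// Sketch only: names follow the text; constructors and toReference are assumptions.
abstract class AbstractUnit {
    final String name;       // e.g. "Angstrom"
    final double multFactor; // multiple of the reference unit, e.g. 1E-10
    final double addFactor;  // offset, only non-zero for temperatures

    AbstractUnit(String name, double multFactor, double addFactor) {
        this.name = name;
        this.multFactor = multFactor;
        this.addFactor = addFactor;
    }

    // Assumed convention: reference = value * multFactor + addFactor
    double toReference(double value) {
        return value * multFactor + addFactor;
    }
}

abstract class DimensionQuantity {
    final double scalar;     // value in reference units
    final double units;      // value in the unit supplied by the caller
    final AbstractUnit unit; // the unit supplied by the caller

    DimensionQuantity(double units, AbstractUnit unit) {
        this.units = units;
        this.unit = unit;
        this.scalar = unit.toReference(units); // converted once, at construction
    }
}

class DistanceUnit extends AbstractUnit {
    static final DistanceUnit m = new DistanceUnit("m", 1.0); // reference unit
    static final DistanceUnit in = new DistanceUnit("in", 0.0254);

    DistanceUnit(String name, double multFactor) {
        super(name, multFactor, 0.0);
    }
}

class DistanceQuantity extends DimensionQuantity {
    DistanceQuantity(double units, DistanceUnit unit) {
        super(units, unit);
    }
}
```

With these definitions, new DistanceQuantity(6.0, DistanceUnit.in) keeps 6.0 in its units field, so it can be shown again as "6 inches", while its scalar field holds approximately 0.1524 (metres).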

DistanceUnit contains a number of useful distance units, defined as public static, plus constructors for building new ones. Each such definition gives a public name for the unit, and specifies how to convert it to the reference unit. Here is a possible set of distance units:

  public static final DistanceUnit m = new DistanceUnit("m",1.0);// this is the reference Unit
  public static final DistanceUnit REF_UNIT = m; // reference Unit - this allows REF_UNIT to be used in a number of contexts
  public static final DistanceUnit in = new DistanceUnit("in",0.0254);
  public static final DistanceUnit ft = new DistanceUnit("ft",0.3048);  // must be defined before verst, which refers to it
  public static final DistanceUnit km = new DistanceUnit("km",1.0e+3);
  public static final DistanceUnit mile = new DistanceUnit("mile",1609.344);
  public static final DistanceUnit verst = new DistanceUnit("verst", ft, 3500.0);  // perhaps this should be in a Russian section?
  public static final DistanceUnit angstrom = new DistanceUnit("angstrom",1.0e-10);
  public static final DistanceUnit AU = new DistanceUnit("AU",1.5e+11);  // approximate value
  public static final DistanceUnit parsec = new DistanceUnit("parsec",3.08e+16);  // approximate value

The DistanceUnit class has (at least) two constructors:


Constructors
public DistanceUnit (String, double) public DistanceUnit constructor specifying display String and multiple of reference unit
public DistanceUnit (String, DistanceUnit, double) public DistanceUnit constructor specifying display String, DistanceUnit and multiple of that unit
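As a rough illustration of how the two constructors could relate to each other, the second one can simply delegate to the first. This is only a sketch: the field names come from the text, the constructor bodies are my assumption, and the inheritance from AbstractUnit is left out to keep it short.

```java
// Sketch only: constructor bodies are assumptions; AbstractUnit is omitted for brevity.
class DistanceUnit {
    final String name;       // display string
    final double multFactor; // multiple of the reference unit (metres)

    static final DistanceUnit m = new DistanceUnit("m", 1.0);
    static final DistanceUnit ft = new DistanceUnit("ft", 0.3048);
    // A verst is 3500 feet, so it is defined relative to ft rather than to metres.
    static final DistanceUnit verst = new DistanceUnit("verst", ft, 3500.0);

    // Display string and multiple of the reference unit
    DistanceUnit(String name, double multFactor) {
        this.name = name;
        this.multFactor = multFactor;
    }

    // Display string, an existing unit, and a multiple of that unit
    DistanceUnit(String name, DistanceUnit base, double multiple) {
        this(name, base.multFactor * multiple);
    }
}
```

One thing to watch with this style is that static fields are initialized in textual order, so a unit such as ft has to appear before any unit that is defined in terms of it.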

Here you will see the standardization problem. My colleague points out that the SI units have standard abbreviations, and these could certainly be used as variable names, except for the cases where the units are derived. However, many measures in common use are not only not standard, but also language-dependent. For instance, we cannot make "foot" a world-wide standard as the corresponding French term is "pied". Some symbols could be written using Unicode characters, as these are acceptable within Java identifiers, but this is somewhat unwieldy if these terms are going to be in common use. For instance, angstrom, shown in the example, is more correctly \u00C5ngstr\u00F6m, which represents "Ångström". I suggest that a practical compromise is to use the DistanceUnit class, to define "locally defined" units for an application - some of these might eventually themselves become de facto standards...

Note that the above definitions provide both a Java variable name, and a String for input/output, which would presumably support a wider choice of characters. The latter, however, introduces the question of "localization" - since I suspect there are more languages than unit names, we should instead have a HashMap for each language, where we can look up the string representing the unit in a particular language.
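One possible shape for such a per-language lookup is sketched below. The class UnitNames, its method names and the use of the unit's variable name as the lookup key are all hypothetical; nothing like this exists in the packages mentioned above as far as I know.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: UnitNames and its methods are hypothetical names.
class UnitNames {
    // language code -> (unit key -> display string in that language)
    private static final Map<String, Map<String, String>> NAMES = new HashMap<>();

    static {
        register("en", "ft", "foot");
        register("fr", "ft", "pied");
    }

    static void register(String language, String unitKey, String display) {
        NAMES.computeIfAbsent(language, k -> new HashMap<>()).put(unitKey, display);
    }

    // Falls back to the unit key itself when no translation is registered.
    static String lookup(String language, String unitKey) {
        Map<String, String> forLanguage = NAMES.get(language);
        if (forLanguage == null) {
            return unitKey;
        }
        return forLanguage.getOrDefault(unitKey, unitKey);
    }
}
```

Here UnitNames.lookup("fr", "ft") gives "pied", while a language with no entries simply gets the key back, so an application can add translations gradually.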

There are also the intertwined questions of grammar and syntax: in English, it is customary to use the singular for one unit, and plural (generally ending in "s") for zero or more than one. Where should this information be held - along with any exceptions, such as "foot" ("feet"), which naturally does not take the "s"?

Here is a preliminary list of possible constructors and methods for the DistanceQuantity class - as in all OO applications, I'm sure real-life applications will see the need for additional ones. Among the constructors you will notice one that has no values defined - this is defined as non-public, so it cannot be used outside the class.

Constructors for DistanceQuantity:


Constructors
public DistanceQuantity (double, DistanceUnit) public DistanceQuantity constructor with value initialized from double and DistanceUnit
DistanceQuantity () non-public DistanceQuantity constructor with no values set

Initial (and probably partial) set of methods (all public):


returns	Methods
DistanceQuantity	add (DistanceQuantity) add one distance to another, giving result in reference units
DistanceQuantity	subtract (DistanceQuantity) subtract one distance from another, giving result in reference units
boolean	eq (DistanceQuantity) return true if one distance is equal to another
boolean	ne (DistanceQuantity) return true if one distance is not equal to another
boolean	gt (DistanceQuantity) return true if one distance is greater than another
boolean	ge (DistanceQuantity) return true if one distance is greater than or equal to another
boolean	lt (DistanceQuantity) return true if one distance is less than another
boolean	le (DistanceQuantity) return true if one distance is less than or equal to another
DistanceQuantity	multiply (double) multiply a distance by a scalar, giving a distance
DistanceQuantity	divide (double) divide a distance by a scalar, giving a distance
AreaQuantity	multiply (DistanceQuantity) multiply one distance by another, giving an area
VolumeQuantity	multiply (AreaQuantity) multiply a distance by an area, giving a volume
SpeedQuantity	divide (TimeInterval) divide a distance by a time interval, giving a speed
TimeInterval	divide (SpeedQuantity) divide a distance by a speed, giving a time interval
DistanceQuantity	convert (DistanceUnit) convert distance to another unit
String	showInUnits (DistanceUnit, int) show distance in specified unit, and specified number of decimal places (rounded if necessary)

This may not be a complete set of methods, but it should be enough to give you a flavour of what I'm driving at. Notice that these methods seem to fall into groups:

the first eight and the last two belong to a "family" of methods that all involve a single dimension
the next two involve multiplication or division of the dimension by a scalar (double floating-point in our case - more scalar formats could be added)
the remainder are "mixed dimension" methods
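A few representatives of the single-dimension and scalar families might be implemented along the following lines. The method names are taken from the table above; the bodies, and the cut-down DistanceUnit used to support them, are only my assumptions about one way of doing it.

```java
import java.util.Locale;

// Sketch only: method names follow the table; the bodies are assumptions.
class DistanceUnit {
    final String name;
    final double multFactor; // multiple of the reference unit (metres)

    static final DistanceUnit m = new DistanceUnit("m", 1.0);
    static final DistanceUnit REF_UNIT = m;
    static final DistanceUnit in = new DistanceUnit("in", 0.0254);

    DistanceUnit(String name, double multFactor) {
        this.name = name;
        this.multFactor = multFactor;
    }
}

class DistanceQuantity {
    final double scalar;     // value in reference units
    final double units;      // value in the original unit
    final DistanceUnit unit;

    DistanceQuantity(double units, DistanceUnit unit) {
        this.units = units;
        this.unit = unit;
        this.scalar = units * unit.multFactor;
    }

    // Single-dimension family: the result is given in reference units.
    DistanceQuantity add(DistanceQuantity other) {
        return new DistanceQuantity(scalar + other.scalar, DistanceUnit.REF_UNIT);
    }

    // Comparisons always use the reference-unit values.
    boolean gt(DistanceQuantity other) {
        return scalar > other.scalar;
    }

    // Scalar family: the result stays in the original unit.
    DistanceQuantity multiply(double factor) {
        return new DistanceQuantity(units * factor, unit);
    }

    DistanceQuantity convert(DistanceUnit target) {
        return new DistanceQuantity(scalar / target.multFactor, target);
    }

    // Locale.ROOT keeps the decimal point independent of the platform locale.
    String showInUnits(DistanceUnit target, int decimals) {
        return String.format(Locale.ROOT, "%." + decimals + "f %s",
                scalar / target.multFactor, target.name);
    }
}
```

Because every quantity carries its value in reference units, add and the comparison methods never need to know which units their operands were created in.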

Here is an example of how such units might be referenced. And my distances are totally arbitrary! Of course, in real life, the "trip leg" information would be read from a database.

  // Assumes java.util.LinkedList has been imported, and that TripLeg is an
  // application class (not shown) whose getDist() returns a DistanceQuantity.
  final DistanceUnit klik = new DistanceUnit("kilometre", DistanceUnit.km, 1.0);

  LinkedList<TripLeg> trip = new LinkedList<TripLeg>();
  trip.add(new TripLeg("YKK", "NYC", new DistanceQuantity(1200.2, klik)));
  trip.add(new TripLeg("NYC", "LAX", new DistanceQuantity(5000.0, DistanceUnit.km)));

  // Accumulate the total distance over all legs of the trip
  DistanceQuantity totDist = new DistanceQuantity(0, klik);
  for (TripLeg t : trip) {
    totDist = totDist.add(t.getDist());
  }

  System.out.println(totDist.showInUnits(klik, 2));

Notice that we have used the DistanceUnit class to define a "local usage" unit called a "klik", defined as 1.0 kilometres.

Relating Centimetres to Square Centimetres

An earlier attempt at getting the units right when, say, multiplying distance by distance to get area used the reference units to effect the right relationship (each dimension has its own reference unit). But then I realized that this is somewhat dangerous, as the combinatorics may get out of hand, so in the implementation of such mixed dimension operations, I convert the values to a relevant unit first. E.g. if distances are converted to centimetres before being multiplied, you can be sure that the resulting area will be in square centimetres. The result can then be converted to the desired unit for further processing.
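The "convert to a relevant unit first" idea can be sketched as follows. AreaUnit, its cm2 constant and the helper valueIn are hypothetical names introduced for this illustration only; the classes are cut down to the minimum needed to show the multiplication.

```java
// Sketch only: AreaUnit, cm2 and valueIn are hypothetical names.
class DistanceUnit {
    final String name;
    final double multFactor; // multiple of the reference unit (metres)

    static final DistanceUnit cm = new DistanceUnit("cm", 0.01);
    static final DistanceUnit in = new DistanceUnit("in", 0.0254);

    DistanceUnit(String name, double multFactor) {
        this.name = name;
        this.multFactor = multFactor;
    }
}

class AreaUnit {
    final String name;
    final double multFactor; // multiple of the reference unit (square metres)

    static final AreaUnit cm2 = new AreaUnit("cm2", 1.0e-4);

    AreaUnit(String name, double multFactor) {
        this.name = name;
        this.multFactor = multFactor;
    }
}

class AreaQuantity {
    final double scalar; // value in square metres
    final double units;  // value in the given unit
    final AreaUnit unit;

    AreaQuantity(double units, AreaUnit unit) {
        this.units = units;
        this.unit = unit;
        this.scalar = units * unit.multFactor;
    }
}

class DistanceQuantity {
    final double scalar; // value in metres

    DistanceQuantity(double units, DistanceUnit unit) {
        this.scalar = units * unit.multFactor;
    }

    double valueIn(DistanceUnit target) {
        return scalar / target.multFactor;
    }

    // Both operands are expressed in centimetres first, so the product
    // is known to be in square centimetres.
    AreaQuantity multiply(DistanceQuantity other) {
        double a = valueIn(DistanceUnit.cm);
        double b = other.valueIn(DistanceUnit.cm);
        return new AreaQuantity(a * b, AreaUnit.cm2);
    }
}
```

For instance, 2 inches times 3 inches is computed as 5.08 cm x 7.62 cm, giving about 38.7096 square centimetres, which can then be converted to whatever area unit is wanted.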

Mass and Weight

Now, we can set up a similar system for "mass", defining grams, kilos, pounds, etc., but some really interesting problems start to show up. Let us define both mass and weight as separate classes, which makes sense from a scientific point of view. In a store, I measure the weight of a bag of tomatoes, rather than its mass, but mass really means the amount of matter in the bag of tomatoes, whereas weight merely represents the attraction that the Earth exerts on said bag. In common parlance, we use the same units for both, except for units explicitly introduced for scientific reasons such as the "slug" (14.59 kg mass), but they have very different dimensions ("weight" is a force, and its dimensions are "mass" times "acceleration" (md/t²)), so I feel it makes sense to define separate classes for them. And you can indeed define a set of units and methods for both classes, without having them interfere at all.

The converse argument is that, on the surface of the Earth, where we measure most of the things we deal with in everyday life, most people do not distinguish between these two dimensions. In fact, it might be rather hard to insist that nails, potatoes, cement, etc. should all be measured in terms of mass, rather than weight, if both classes are provided in a system of units of measurement. Rocket science will tend to involve mass, rather than weight, as weight is meaningless in outer space - which probably explains why NASA decided it had to have its own units.

So, ideally, we need some classes that only distinguish between mass and weight when you absolutely have to! In the "Units of Measure" project (UOMo) being worked on by Werner Keil and his colleagues, I see that there is a Mass class, but no Weight class. If they felt a need for a Weight class, they would probably treat it as a compound dimension (md/t²), which I believe is supported by their project - I am pretty sure JScience does support this.

However, I thought I would add a class for "weight", in case the users of these classes ever need to send parcels from the Earth to the Moon, or vice versa, and have to figure out the postage. At first I thought we should just add a bunch of methods connecting these dimensions, e.g.


returns	Methods
WeightQuantity	multiply (AccelerationQuantity) multiply a mass by an acceleration, giving a weight

However, this results in a logical inconsistency. Since in this case we would want 1 kilo mass to convert to 1 kilo weight, we would have to force the acceleration to be 1 - i.e. 1 gravity. In fact, we all know that the relationship depends on what planet we are on, or, more generally, what gravitational field we are in. So let us also define a class called GFactor, with values such as "nullG" (0.0), "earthG" (1.0), "moonG" (0.17), etc. Now we can define a more pragmatic relationship between mass and weight, defining a method in the Mass class, as follows:


returns	Methods
WeightQuantity	calcWeight (GFactor) determine the weight for this mass, given a G factor

and its converse (in the WeightQuantity class):


returns	Methods
MassQuantity	calcMass (GFactor) determine the mass for this weight given a G factor

In fact, we can keep Acceleration, with the same dimensions as GFactor, but with a different reference unit - Acceleration would use 1 metre/sec², while GFactor would use 1 "G", which is defined to be 9.80665 metres/sec².
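Here is a small sketch of how GFactor, calcWeight and calcMass might fit together. The class and method names come from the text; holding mass in kilograms and weight in "kilograms weight" (the weight of 1 kg of mass at 1 G), and refusing to compute a mass at zero G, are assumptions of mine.

```java
// Sketch only: names follow the text; the working units are assumptions.
class GFactor {
    static final double STANDARD_G = 9.80665; // metres/sec² corresponding to 1 G

    static final GFactor nullG = new GFactor(0.0);
    static final GFactor earthG = new GFactor(1.0);
    static final GFactor moonG = new GFactor(0.17);

    final double g; // gravitational field strength in units of G

    GFactor(double g) {
        this.g = g;
    }

    // Same dimension as Acceleration, but expressed in its reference unit.
    double toMetresPerSecondSquared() {
        return g * STANDARD_G;
    }
}

class MassQuantity {
    final double kilograms;

    MassQuantity(double kilograms) {
        this.kilograms = kilograms;
    }

    // 1 kilo mass gives 1 kilo weight at earthG, less on the Moon, none at nullG.
    WeightQuantity calcWeight(GFactor gFactor) {
        return new WeightQuantity(kilograms * gFactor.g);
    }
}

class WeightQuantity {
    final double kilogramsWeight;

    WeightQuantity(double kilogramsWeight) {
        this.kilogramsWeight = kilogramsWeight;
    }

    MassQuantity calcMass(GFactor gFactor) {
        if (gFactor.g == 0.0) {
            throw new ArithmeticException("mass cannot be derived from weight at zero G");
        }
        return new MassQuantity(kilogramsWeight / gFactor.g);
    }
}
```

So a 10 kilo parcel weighs 10 kilos on Earth and about 1.7 kilos on the Moon, and calcMass with the same GFactor takes you back to the original 10 kilos of mass.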

SI Prefixes

The Unit-API project implements the standard SI prefixes as methods of a utility class called MetricPrefix, ranging from YOTTA to YOCTO - personally I feel the use of methods leads to rather awkward constructions, even though the prefixes are capitalized, e.g. DistanceQuantity q = new DistanceQuantity(200, GIGA(DistanceUnit.km));

I have experimentally set up a class called SIPrefix, containing a single field, which is the multiplier value. Individual prefixes can then be defined as static instances of the class SIPrefix. This allows what I consider to be a cleaner syntax:

DistanceUnit hectom = new DistanceUnit("hectometre", DistanceUnit.km, SIPrefix.deci);
DistanceQuantity q = new DistanceQuantity(200, hectom);

Here the prefix "deci" essentially means 0.1, and the base unit is the kilometre (1000 metres), so 0.1 x 1000 = 100 metres, i.e. one hectometre. I am proposing this as a way of avoiding the use of "prefix" methods, which bothers me - very subjective, I agree! And I do understand that the method technique is probably already in widespread use - so it's probably too late!
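For what it's worth, the SIPrefix idea could look something like the sketch below. Only the class name SIPrefix, its single multiplier field and the prefix "deci" come from the text; the other prefix constants and the extra DistanceUnit constructor taking an SIPrefix are assumptions, added so that the syntax shown above would compile.

```java
// Sketch only: the constructor overload taking an SIPrefix is an assumption.
class SIPrefix {
    static final SIPrefix kilo = new SIPrefix(1.0e3);
    static final SIPrefix hecto = new SIPrefix(1.0e2);
    static final SIPrefix deci = new SIPrefix(1.0e-1);
    static final SIPrefix milli = new SIPrefix(1.0e-3);

    final double multiplier; // the single field: the power of 10 for this prefix

    SIPrefix(double multiplier) {
        this.multiplier = multiplier;
    }
}

class DistanceUnit {
    final String name;
    final double multFactor; // multiple of the reference unit (metres)

    static final DistanceUnit m = new DistanceUnit("m", 1.0);
    static final DistanceUnit km = new DistanceUnit("km", 1.0e+3);

    DistanceUnit(String name, double multFactor) {
        this.name = name;
        this.multFactor = multFactor;
    }

    // Display string, an existing unit, and an SI prefix applied to that unit
    DistanceUnit(String name, DistanceUnit base, SIPrefix prefix) {
        this(name, base.multFactor * prefix.multiplier);
    }
}
```

With this in place, a hectometre can be built either from DistanceUnit.km with SIPrefix.deci or, perhaps more naturally, from DistanceUnit.m with SIPrefix.hecto; both give a unit of about 100 metres.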

Temperature

This uses the addFactor value defined in the abstract class AbstractUnit. I am not sure if other dimensions require this, but temperatures certainly do. This factor provides an offset to be used when converting temperatures to the reference unit (Celsius or Centigrade).
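A sketch of how addFactor might be used is given below. The class TemperatureUnit and the convention "reference = value * multFactor + addFactor" are assumptions; other conventions (for instance adding the offset before multiplying) would work equally well, with different constants.

```java
// Sketch only: TemperatureUnit and the conversion convention are assumptions.
class TemperatureUnit {
    final String name;
    final double multFactor; // scale relative to the reference unit (Celsius)
    final double addFactor;  // offset applied after scaling

    static final TemperatureUnit celsius = new TemperatureUnit("C", 1.0, 0.0);
    static final TemperatureUnit kelvin = new TemperatureUnit("K", 1.0, -273.15);
    // C = (F - 32) * 5/9 = F * 5/9 - 160/9
    static final TemperatureUnit fahrenheit =
            new TemperatureUnit("F", 5.0 / 9.0, -160.0 / 9.0);

    TemperatureUnit(String name, double multFactor, double addFactor) {
        this.name = name;
        this.multFactor = multFactor;
        this.addFactor = addFactor;
    }

    // Convert a value in this unit to the reference unit (Celsius)
    double toReference(double value) {
        return value * multFactor + addFactor;
    }

    // Convert a value in the reference unit (Celsius) back to this unit
    double fromReference(double reference) {
        return (reference - addFactor) / multFactor;
    }
}
```

Under this convention 212 degrees Fahrenheit converts to 100 Celsius, and 0 Celsius converts back to 273.15 Kelvin. Note that an offset of this kind only makes sense for temperature readings; a temperature difference would need the multFactor alone.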

I will try to add more as I figure it out!