Value units in XML Files:
- Units for Quantities:
Four possible methods for handling the units of quantitative values in XML
come to mind:
1. Do not notate units in the XML, and allow only one unit to be used
that is specified in separate documentation.
Example: altitude="30000"
2. Allow (or require) units to be specified with each value.
Example: <length> 1.2 miles </length>
<param altitude="30000 ft"
length="23 km"
height="1509 meters" />
(Hereafter dispensing with angle-bracket (<,>) parts ...)
3. Define separate tag and/or attribute names for each combination of quantity and
possible unit:
Example: altitude_ft="30000"
altitude_meters="9144"
altitude_miles="5.68"
altitude_km="9.144"
length_miles="1.2"
length_ft="6336"
length_km="1.93"
length_meters="1931"
length_yards="2112"
length_inches=".."
length_cm=".."
...
4. Allow only one unit to be used, and specify it as part of the tag name.
Example: altitude_ft="30000"
Next, let's weigh these methods against some of the primary principles
of XML: a file should be self-contained, recording enough information
for both humans and computers to decode and process it consistently
and unambiguously, while still allowing much freedom of organization
and content.
Comparative Pros and Cons:
Method 1:
Pro:
Simpler to develop code for, since there is no dynamic unit detection or conversion.
Con:
Units are implied, so they can easily be misunderstood or
misinterpreted, leading to serious and possibly undetected errors. It is
also harder for anyone to find out what the units are, since they have to
look elsewhere for documentation, which may be out of date, lost, or
unavailable later or at other sites.
This is basically the "old way" of recording computer data.
Method 2:
Pro:
This method is natural for humans to read, write, and interpret.
It corresponds to how values are noted in general literature, on dashboards,
in verbal communications, and in scientific and technical documentation.
The units are clear and contained within the XML, so they can be correctly
interpreted without separate documentation. Being self-contained, the
units are clearly communicated with the values to anyone at any time, even
if the documentation is separated or lost. The potential for
misinterpretation is minimized.
A single keyword is used for each distinct quantity, and many units
can be conveniently used for all quantities. This permits a high degree
of code re-use, is easy for computers to decode, and easy for programmers
to write and maintain. It is consistent with the modern object-oriented
concept of polymorphism. For example, the SI (Systeme International)
system of units is built on 7 base units. Therefore a small
number of unit converters can be used to convert units for hundreds or
thousands of quantities. Consider that a given file may have hundreds
or thousands of distance or power measurements.
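As a minimal sketch of this re-use, assuming Python and its standard
xml.etree.ElementTree module, one length converter can serve every
length-valued attribute. The names LENGTH_TO_METERS and to_meters are
hypothetical, chosen only for illustration, and the set of accepted unit
spellings is an assumption:

```python
import xml.etree.ElementTree as ET

# Hypothetical conversion table: factor from each accepted unit to meters.
LENGTH_TO_METERS = {
    "m": 1.0, "meters": 1.0,
    "km": 1000.0,
    "cm": 0.01,
    "mm": 0.001,
    "ft": 0.3048, "feet": 0.3048,
    "miles": 1609.344,
    "yards": 0.9144,
    "inches": 0.0254,
}


def to_meters(text):
    """Convert a string such as '30000 ft' to a float number of meters."""
    number, unit = text.split()
    return float(number) * LENGTH_TO_METERS[unit]


# The same converter handles every length-valued attribute.
elem = ET.fromstring(
    '<param altitude="30000 ft" length="23 km" height="1509 meters" />'
)
lengths_m = {name: to_meters(value) for name, value in elem.attrib.items()}
```

Adding a new length quantity requires no new conversion code, and adding a
new unit requires only one new table entry.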
It also accommodates a great degree of flexibility in the units that
can be accepted. For example, one supplier of data may have all their data in
English units, while another has all theirs in metric, and each might
prefer to keep it that way. It may be difficult or impractical to get
multiple organizations to agree on units. Already existing data is what
it is, unless separately converted.
Con:
Requiring units may seem "wordy". Allowing default units reintroduces the
potential for miscommunication, as in method 1.
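One way to avoid the default-unit problem is to have the reader refuse any
value that lacks a unit instead of guessing one. The following is only a
sketch in Python. The function name parse_quantity and the
"<number> <unit>" convention are assumptions made for illustration:

```python
def parse_quantity(text):
    """Split '30000 ft' into (30000.0, 'ft'); reject values with no unit."""
    parts = text.split()
    if len(parts) != 2:
        raise ValueError(f"expected '<number> <unit>', got {text!r}")
    number, unit = parts
    return float(number), unit


print(parse_quantity("30000 ft"))

try:
    parse_quantity("30000")  # no unit: refused rather than guessed
except ValueError as err:
    print("rejected:", err)
```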
Method 3:
Pro:
This method minimizes the potential for miscommunication and
records all units within the single self-contained file, just as
method 2 above does.
Con:
It produces an explosion in the number of keyword tags needed,
and/or limits the units that can be used. It does not exploit re-use
of conversion code. Every quantity-unit combination has to be documented,
coded, interpreted, and converted separately.
For example, consider a system containing 100 length/distance quantities
and 20 power quantities. Length can be specified in 8 units:
km, meters, cm, mm, miles, yards, feet, inches. Power can be
specified in 6 units: dB, watts, mW, uW, kW, MW.
So a total of 920 keyword tags would be required (8 * 100 + 6 * 20).
But method 2 would require only 120 quantity names and two unit converters,
which would be simpler to specify in XML as well as to develop code for.
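The count can be checked with a few lines of Python. This is only a sketch
of the arithmetic in the example; the variable names are illustrative:

```python
# Unit lists as in the example above.
length_units = ["km", "meters", "cm", "mm", "miles", "yards", "feet", "inches"]
power_units = ["dB", "watts", "mW", "uW", "kW", "MW"]

n_length_quantities = 100
n_power_quantities = 20

# Method 3: one keyword per quantity-unit combination.
method3_keywords = (n_length_quantities * len(length_units)
                    + n_power_quantities * len(power_units))

# Method 2: one keyword per quantity, plus one converter per kind of quantity.
method2_keywords = n_length_quantities + n_power_quantities
method2_converters = 2  # one for length, one for power

print(method3_keywords, method2_keywords, method2_converters)  # 920 120 2
```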
Method 4:
Pro:
No confusion: the unit is stated where used, as in methods (2) and (3) above.
Con:
Only one unit can be accepted for each quantity, so this method
lacks flexibility. It forces data suppliers to convert their units,
which may require processing database files, or writing custom export conversion
code for each measurement requiring conversion.
Summary:
All methods have pros and cons. If communicating data correctly, with
greater flexibility and the easiest interpretation by the widest audience,
is weighted more heavily than other concerns such as minimizing file size,
then methods (2) and (3) appear better than methods (1) and (4).
Of methods (2) and (3), method (2) has the same advantages as (3), while
having fewer serious disadvantages. Therefore, method (2) is generally recommended,
as long as default units are discouraged.