XML Parse Library

Project: sourceforge.net/projects/xmlparselib
December 9, 2016

XML-Parse library is a lightweight set of re-usable functions for general-purpose parsing, checking, and creating xml files. It supports stream-oriented, SAX, or DOM parsing styles, and includes an optional xsd schema validator and graphical schema generator. It accepts all valid XML and includes checking for validity. The library has minimal dependencies and is totally self-contained. It is written in C, is fast and memory-efficient, and is simple to use. The primary core functions have been posted, and additional advanced XML-related utilities will be added. Released under the MIT License.

Files: xml_parse_lib.h, xml_parse_lib.c, and examples.

The XML-Parse library contains functions for parsing and/or creating xml files in a variety of ways. You should use whichever set makes sense for your needs. The functions support the following alternative ways of working with XML files:

  1. Read whole xml-files into a tokenized tree-structure in memory, and then operate on, traverse, access, or further decode values out of the tree. Your custom application code is usually required to access and operate on the tokenized-values.
  2. Read xml-files, parsing and interpreting them as they are being read. Your custom application-specific code can be interspersed with the re-usable parsing calls to interpret, convert, operate on, or store values immediately as the input stream is read, instead of storing them in an intermediate tokenized-tree structure. This method reduces time and memory requirements, and supports streaming operations.
  3. Build xml-tree structures from data in your application with convenient reusable routines, and/or modify values in read-in trees.
  4. Write valid xml-files automatically from xml-trees that were constructed or read into memory by your application.
  5. Check xml-trees against an arbitrary xml schema definition (XSD).


High-Level Functions:
  • Xml_Read_File - Reads any XML file and parses it into a token-tree of character strings. Expands xml-escapes on read-in during population of the xml-tree.
    Formal:
            Xml_object *Xml_Read_File( char *fname );

  • Xml_Write_File - Writes an XML tree structure out to a file in XML format. Intercepts disallowed characters and replaces them with xml-escape symbols on write-out.
    Formal:
            void Xml_Write_File( char *fname, Xml_object *object );

An example of using these functions to make a general-purpose xml-parser that will read any xml-file can be downloaded from: test_general_parser.c.


Medium-Level Functions:

First a note about terminology. Consider the following XML example:

  <tag1 attrib11="value11"> 
    <tag2> my contents2 </tag2>
  </tag1>
In this example, there are two tag-elements: tag1 and tag2. The element tag1 has one attribute, called attrib11; all attributes are name-value pairs. The second tag-element (tag2) is contained within the first, because it occurs before tag1's closing tag (</tag1>), and is therefore considered a child of the first tag-element. The second tag-element has contents equal to "my contents2".

  • Functions for creating new XML token-trees:
    • xml_tree_init() - Start a new xml-token-tree.
      Formal:
              struct xml_private_tree *xml_tree_init();

    • xml_tree_add_tag - Add a new tag element. The first tag-element in a tree is called the root-tag. By xml-rules, there should only be one root tag. Other tags should be children, grand-children, etc., of the root tag. Tags added in sequence to one another become siblings, all being children of the most recent parent.
      Formal:
              void xml_tree_add_tag( struct xml_private_tree *xml_tree, char *tagname );

    • xml_tree_add_attribute - Add attributes to a tag-element.
      Formal:
              void xml_tree_add_attribute( struct xml_private_tree *xml_tree, char *name, char *value );

    • xml_tree_add_contents - Set the contents for a tag-element.
      Formal:
              void xml_tree_add_contents( struct xml_private_tree *xml_tree, char *contents );

    • xml_tree_begin_children - Cause future tags to be children of the previous tag.
      Formal:
              void xml_tree_begin_children( struct xml_private_tree *xml_tree );

    • xml_tree_end_children - Must be called some time after a matching xml_tree_begin_children call. Causes future tags to be siblings of the previous tag, not children.
      Formal:
              void xml_tree_end_children( struct xml_private_tree *xml_tree );

    • xml_tree_cleanup - Releases all working variables for a given tree.
      Formal:
              Xml_object *xml_tree_cleanup( struct xml_private_tree **xml_tree );

    An example of using these functions to construct an arbitrary xml-file can be downloaded from: test_general_xml_generator.c.

  • Functions for traversing existing XML token-trees:
    • xml_tree_start_traverse - Call this function once when starting a tree-traversal, before any of the following traversal calls (usually after xml-token-tree read-in or construction), to set the starting position at the root-tag of the xml tree. It returns the root-tag name and contents. The maxlen parameter sets the limit on string-size that can be returned. Supplied string containers must be pre-allocated by the caller.
      Formal:
              void xml_tree_start_traverse( struct xml_private_tree *xml_tree, Xml_object *roottag, char *tag, char *contents, int maxlen );

    • xml_tree_get_next_tag - Advances to the next tag at the current level (the next sibling), if any, copies its tag-name and contents, and returns true; otherwise returns zero. The maxlen parameter sets the limit on string-size that can be returned. Supplied string containers must be pre-allocated by the caller.
      Formal:
              int xml_tree_get_next_tag( struct xml_private_tree *xml_tree, char *tag, char *contents, int maxlen );

    • xml_tree_get_next_attribute - Gets the next attribute name-value pair of the current tag-element and returns 1; returns zero when no more attributes remain in the current tag.
      Formal:
              int xml_tree_get_next_attribute( struct xml_private_tree *xml_tree, char *name, char *value, int maxlen );

    • xml_tree_descend_to_child - Descends to the child-level of the current tag and returns 1, otherwise returns zero if the current tag has no child-tag-elements. If it returns 1, then the next tag returned will be the first child, then its siblings, and so on.
      Formal:
              int xml_tree_descend_to_child( struct xml_private_tree **xml_tree, char *tag, char *contents, int maxlen );

    • xml_tree_ascend() - Call after the last child at a given level has been processed. Ascends to the parent level.
      Formal:
              void xml_tree_ascend( struct xml_private_tree **xml_tree );

    An example of using these functions to traverse an arbitrary xml-token-tree can be downloaded from: test_xml_token_traverser.c.


Low-Level Functions:
  • xml_parse( file, tag, content, maxlen, linenumber ) - Gets next tag, attributes, and contents from an xml file. This is the core routine on which all xml file-reading methods are based. It can be called by applications that wish to parse and interpret values/contents during read-in. See the source-code of Xml_Read_File() for an example of how to use it. The xml_parse routine expands any xml-escaped symbols found in the content-string.
    Inputs:
    • File pointer to read from.
    • Maximum string lengths to return (buffer-sizes).
    Outputs:
    • Tag character string. (Text between next <...> brackets.)
    • Contents character string. (Text after > ... up to next < bracket.)
    Formal:
            void xml_parse( FILE *fileptr, char *tag, char *content, int maxlen, int *lnn );

  • xml_grab_tag_name( tag, name, maxlen ) - Copies the tag-name from the front of the tag-string, then shortens the tag-string by removing the name. Use after calling xml_parse, which gets the next tag-string from a file.
    Inputs:
    • Tag string, as returned by xml_parse().
    • Maximum string length to return (buffer-size).
    Outputs:
    • Tag name character string.
    Formal:
            void xml_grab_tag_name( char *tag, char *name, int maxlen );

  • xml_grab_attrib( tag, name, value, maxlen ) - Grabs the next name-value pair within an xml-tag-string, if any. Use it on the remaining tag-string after calling xml_parse and xml_grab_tag_name, calling it repeatedly to grab each name = "value" attribute pair until none remain. If the tag is closed by "/", then the last name returned will be "/" and the value will be empty. This routine expands any xml-escaped symbols in the value-string before returning.
    Inputs:
    • Remaining tag string.
    • Maximum string lengths to return (buffer-sizes).
    Outputs:
    • Attribute-Name character string.
    • Attribute-Value character string.
    Formal:
            void xml_grab_attrib( char *tag, char *name, char *value, int maxlen );

Unit Converters:
  • accept_distance - Accepts units of meter, mm, cm, km, feet, ft, foot, yrd(s), yard(s), Mile(s). Units are case insensitive. Default unit is assumed "meter". Returns value in meters. Returns true on success, zero on failure. Allows attribute or content values to be expressed as, for example: "3.2 cm", "0.9 feet"; yet all are converted to consistent meter units. See discussion on specifying unit values in xml files.
    Formal:
            int accept_distance( char *wrd, float *dist );

  • accept_time - Accepts units of sec[onds], min[ute(s)], hr(s), hour(s), day(s), week(s), month(s), year(s). Units are case insensitive. Default unit is assumed "seconds". Returns value in seconds. Returns true on success, zero on failure.
    Formal:
            int accept_time( char *wrd, float *t );

  • accept_frequency - Accepts units of khz, mhz, ghz, thz. Units are case insensitive. Default unit is assumed Hz. Returns value in Hz. Returns true on success, zero on failure.
    Formal:
           int accept_frequency( char *wrd, float *freq );

  • accept_bool - Accepts Y[es], N[o], T[rue], F[alse], On, Off. (Portions of keyword shown in square brackets are optional.) Values are case insensitive. Returns true or false, accordingly.
    Formal:
            int accept_bool( char *word );

  • accept_temperature - Accepts units of "C[elsius]", "F[ahrenheit]", "K[elvin]", or "degrees C[elsius]", "degrees F[ahrenheit]", or "degrees K[elvin]". Units are case insensitive. Default unit is assumed "Celsius". Returns value in Celsius. Returns true on success, zero on failure.
    Formal:
            int accept_temperature( char *wrd, float *t );

  • accept_power - Accepts units of W, Watt, Watts, Kw, Mw, Gw. Units are case insensitive. Default unit is assumed Watts. Returns value in Watts. Returns true on success, zero on failure.
    Formal:
           

  • accept_dbvalue - Accepts units of dB. Units are case insensitive. The default unit is assumed to be linear (not dB). Returns the value in linear units. If the value is in dB, the conversion to linear depends on whether an Energy or Power quantity is specified in the units argument by "E" or "P". Returns true on success, zero on failure.
    Formal:
            int accept_dbvalue( char *wrd, float *value, char units );




Examples - (These files are included in the download package.)

  1. General XML Streaming Reader Shell
  2. XML Test-file Generator
  3. General XML Automatic Parsing
  4. General XML Traversal




