Elements for Columns

Define the variables for the SAS data set.

Syntax

COLUMN name="name" retain="NO | YES" class="ORDINAL | FILENAME | FILEPATH"
TYPE
DATATYPE
DEFAULT
ENUM
FORMAT width="w" ndec="d"
INFORMAT width="w" ndec="d"
DESCRIPTION
LENGTH
PATH syntax="type"
INCREMENT-PATH syntax="type" beginend="BEGIN | END"
RESET-PATH syntax="type" beginend="BEGIN | END"
DECREMENT-PATH syntax="type" beginend="BEGIN | END"

Elements

COLUMN name="name" retain="NO | YES" class="ORDINAL | FILENAME | FILEPATH"
is an element that contains a variable definition. For example, <COLUMN name="Title">.
name="name"
specifies the name for the variable. The name must be a valid SAS name, which can be up to 32 characters.
Requirement:The name= attribute is required.
retain="NO | YES"
is an optional attribute that determines the contents of the input buffer at the beginning of each observation.
NO
sets the value for the beginning of each observation either to MISSING or to the value of the DEFAULT element if specified.
YES
keeps the current value until it is replaced by a new, nonmissing value. Specifying YES is much like the RETAIN statement in DATA step processing. It forces the retention of processed values after an observation is written to the output SAS data set.
class="ORDINAL | FILENAME | FILEPATH" XMLV2 Only
is an optional attribute that determines the type of variable.
ORDINAL
specifies that the variable is a numeric counter variable that keeps track of the number of times the location path, which is specified by the INCREMENT-PATH element or the DECREMENT-PATH element, is encountered. (This is similar to the _N_ automatic variable in DATA step processing.) The counter variable increments or decrements its count by 1 each time the location path is encountered. Counter variables can be useful for identifying individual occurrences of like-named data elements or for counting observations.
Restriction:When exporting an XML document, variables with class="ORDINAL" are not included in the output XML document.
Requirements:You must use the INCREMENT-PATH element or the DECREMENT-PATH element. The PATH element is not allowed.

The TYPE element must specify the SAS data type as numeric, and the DATATYPE element must specify the type of data as integer.

FILENAME
generates a character variable that contains the filename and extension of the input document. This is useful when the libref that you assign for the XML engine is associated with the physical location of a SAS library and you need to determine which file contains a particular value.
Requirement:The TYPE element must specify the SAS data type as character, and the DATATYPE element must specify the type of data as string.
FILEPATH
generates a character variable that contains the pathname, filename, and extension of the input document. This is useful when the libref that you assign for the XML engine is associated with the physical location of a SAS library and you need to determine which file contains a particular observation.
Requirement:The TYPE element must specify the SAS data type as character, and the DATATYPE element must specify the type of data as string.
Requirement:At least one COLUMN element is required.
Interaction:COLUMN can contain one or more of the following elements that describe the variable attributes: DATATYPE, DEFAULT, ENUM, FORMAT, INFORMAT, DESCRIPTION, LENGTH, TYPE, PATH, INCREMENT-PATH, DECREMENT-PATH, and RESET-PATH.
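As a minimal sketch of how the retain= attribute might be used, the following fragment defines one column that keeps its value across observations and one that does not. The column names and LENGTH values are assumptions made for illustration; the location paths reuse the /NHL structure from the PATH examples in this section.

```xml
<!-- Sketch only: column names and LENGTH values are illustrative assumptions. -->

<!-- retain="YES": the conference value is kept for every observation
     until a new, nonmissing value replaces it. -->
<COLUMN name="CONFERENCE" retain="YES">
   <PATH> /NHL/CONFERENCE </PATH>
   <TYPE> character </TYPE>
   <DATATYPE> string </DATATYPE>
   <LENGTH> 10 </LENGTH>
</COLUMN>

<!-- retain="NO" (spelled out here): the value starts as MISSING
     at the beginning of each observation. -->
<COLUMN name="TEAM" retain="NO">
   <PATH> /NHL/CONFERENCE/DIVISION/TEAM </PATH>
   <TYPE> character </TYPE>
   <DATATYPE> string </DATATYPE>
   <LENGTH> 30 </LENGTH>
</COLUMN>
```

A typical reason to specify YES is that a parent-level element occurs once in the document while its child elements produce many observations.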
TYPE
specifies the SAS data type (character or numeric) for the variable, which is how SAS stores the data. For example, <TYPE> numeric </TYPE> specifies that the SAS data type for the variable is numeric.
Requirement:The TYPE element is required.
Tips:To assign a floating-point type, use
<DATATYPE> double </DATATYPE>
<TYPE> numeric </TYPE>

To apply output formatting in SAS, use the FORMAT element.

To control data type conversion on input, use the INFORMAT element. For example, <INFORMAT> datetime </INFORMAT>.

DATATYPE
specifies the type of data being read from the XML document for the variable. For example, <DATATYPE> string </DATATYPE> specifies that the data contains alphanumeric characters.
The type of data specification can be
string
specifies that the data contains alphanumeric characters and does not contain numbers used for calculations.
integer
specifies that the data contains whole numbers used for calculations.
double
specifies that the data contains floating-point numbers.
datetime
specifies that the input represents a valid datetime value, which is either
  • in the form of the XML specification ISO 8601 format. The default form is: yyyy-mm-ddThh:mm:ss.ffffff.
  • in a form for which a SAS informat (either supplied by SAS or user-written) properly translates the input into a valid SAS datetime value. See also the INFORMAT element.
date
specifies that the input represents a valid date value, which is either
  • in the form of the XML specification ISO 8601 format. The default form is: yyyy-mm-dd.
  • in a form for which a SAS informat (either supplied by SAS or user-written) properly translates the input into a valid SAS date value. See also the INFORMAT element.
time
specifies that the input represents a valid time value, which is either
  • in the form of the XML specification ISO 8601 format. The default form is: hh:mm:ss.ffffff.
  • in a form for which a SAS informat (either supplied by SAS or user-written) properly translates the input into a valid SAS time value. See also the INFORMAT element.
Restriction:The values for previous versions of XMLMap syntax are not accepted by versions 1.9 and 2.1.
Requirement:The DATATYPE element is required.
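The distinction between TYPE (how SAS stores the value) and DATATYPE (what the XML document contains) can be sketched with a datetime column. The column name and the /SCHEDULE/GAME/START path are hypothetical, and E8601DT is used here only as one plausible ISO 8601 format and informat name; check SAS Formats and Informats: Reference before relying on it.

```xml
<!-- Sketch only: the column name and the location path are hypothetical. -->
<COLUMN name="game_start">
   <PATH> /SCHEDULE/GAME/START </PATH>
   <!-- SAS stores datetime values as numbers -->
   <TYPE> numeric </TYPE>
   <!-- The XML document supplies a value such as yyyy-mm-ddThh:mm:ss -->
   <DATATYPE> datetime </DATATYPE>
   <!-- Assumed ISO 8601 format and informat; no period in the name,
        width given as an attribute -->
   <FORMAT width="19"> E8601DT </FORMAT>
   <INFORMAT width="19"> E8601DT </INFORMAT>
</COLUMN>
```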
DEFAULT
is an optional element that specifies a default value for a missing value for the variable. Use the DEFAULT element to assign a nonmissing value to missing data. For example, <DEFAULT> single </DEFAULT> assigns the value single when a missing value occurs.
Default:By default, the XML engine sets a missing value to MISSING.
ENUM
is an optional element that contains a list of valid values for the variable. The ENUM element can contain one or more VALUE elements to list the values. By using ENUM, values in the XML document are verified against the list of values. If a value is not valid, it is either set to MISSING (by default) or set to the value specified by the DEFAULT element. Note that a value specified for DEFAULT must be one of the ENUM values in order to be valid.
<COLUMN name="filing_status">
   .
   .
   .
   <DEFAULT> single </DEFAULT>
   .
   .
   .
   <ENUM>
      <VALUE> single </VALUE>
      <VALUE> married filing joint return </VALUE>
      <VALUE> married filing separate return </VALUE>
      <VALUE> head of household </VALUE>
      <VALUE> qualifying widow(er) </VALUE>
   </ENUM>
</COLUMN>
FORMAT width="w" ndec="d"
is an optional element that specifies a SAS format for the variable. A format name can be up to 31 characters for a character format and 32 characters for a numeric format. A SAS format is an instruction that SAS uses to write values. You use formats to control the written appearance of values. Do not include a period (.) as part of the format name. Specify the width and the number of decimal places as attributes (width= and ndec=), not as part of the format name.
For a list of the SAS formats, including the ISO 8601 SAS formats, see SAS Formats and Informats: Reference.
width="w"
is an optional attribute that specifies a format width, which for most formats is the number of columns in the output data.
ndec="d"
is an optional attribute that specifies a decimal scaling factor for numeric formats.
Here are some examples:
<FORMAT> E8601DA </FORMAT>
<FORMAT width="8"> best </FORMAT>
<FORMAT width="8" ndec="2"> dollar </FORMAT>
INFORMAT width="w" ndec="d"
is an optional element that specifies a SAS informat for the variable. An informat name can be up to 30 characters for a character informat and 31 characters for a numeric informat. A SAS informat is an instruction that SAS uses to read values into a variable (that is, to store the values). Do not include a period (.) as part of the informat name. Specify the width and the number of decimal places as attributes (width= and ndec=), not as part of the informat name.
For a list of the SAS informats, including the ISO 8601 SAS informats, see SAS Formats and Informats: Reference.
width="w"
is an optional attribute that specifies an informat width, which for most informats is the number of columns in the input data.
ndec="d"
is an optional attribute that specifies a decimal scaling factor for numeric informats. SAS divides the input data by 10 to the power of this value.
Here are some examples:
<INFORMAT> E8601DA </INFORMAT>
<INFORMAT width="8"> best </INFORMAT>
<INFORMAT width="8" ndec="2"> dollar </INFORMAT>
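The scaling that ndec= applies on input can be worked through on a sample value. This sketch simply applies the rule stated for ndec= (divide by 10 to the power of d); the input value is invented for illustration.

```xml
<!-- Sketch only: the input value 12345 is invented for illustration. -->
<!-- With ndec="2", the input is divided by 10**2 = 100,
     so an input field of 12345 would be stored as 123.45
     under the rule stated for ndec=. -->
<INFORMAT width="8" ndec="2"> dollar </INFORMAT>
```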
DESCRIPTION
is an optional element that specifies a description for the variable, which can be up to 256 characters. The description is assigned as the variable label, as in the following example.
<DESCRIPTION> Story link </DESCRIPTION>
LENGTH
specifies the maximum field storage length from the XML data for a character variable. The value is the number of bytes used to store each of the variable's values in the SAS data set, and it can be 1 to 32,767. During input, at most that many characters are read from the XML document and transferred to the observation buffer. For example, <LENGTH> 200 </LENGTH>.
Restriction:LENGTH is not valid for numeric data.
Requirement:For data that is defined as a STRING data type, the LENGTH element is required.
Tip:You can use LENGTH to truncate a long field.
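A character column that relies on LENGTH might look like the following sketch. The column name and location path are hypothetical; the TYPE, DATATYPE, and LENGTH values follow the requirements stated for string data.

```xml
<!-- Sketch only: the column name and location path are hypothetical. -->
<COLUMN name="summary">
   <PATH> /LEVEL/ITEM/SUMMARY </PATH>
   <TYPE> character </TYPE>
   <DATATYPE> string </DATATYPE>
   <!-- Required for string data; a longer field is truncated to 200 -->
   <LENGTH> 200 </LENGTH>
</COLUMN>
```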
PATH syntax="type"
specifies a location path that tells the XML engine where in the XML document to locate and access a specific tag for the current variable. In addition, the location path tells the XML engine to perform a function, which is determined by the location path form, to retrieve the value for the variable. The XPath forms that are supported allow elements and attributes to be individually included in the generated SAS data set.
syntax="type"
is an optional attribute that specifies the type of syntax used in the location path. The syntax is valid XPath construction in compliance with the W3C specifications. The XPath form supported by the XML engine allows elements and attributes to be individually included in the generated SAS data set.
Default:XPath
Requirements:The value must be XPath or XPathENR.

If an XML namespace is defined with the NAMESPACES element, you must specify the type of syntax as XPathENR (XPath with Embedded Namespace Reference). This is because the syntax is different from the XPath specification. For example, syntax="XPathENR".

To specify the PATH location path, use one of the following forms:
CAUTION:
These forms are the only XPath forms that the XML engine supports.
If you use any other valid W3C form, the results will be unpredictable.
element-form
selects PCDATA (parsed character data) from a named element. The following element forms enable you to select from a named element, conditionally select from a named element based on a specific attribute value, or conditionally select from a named element based on a specific occurrence of the element using the position function:
<PATH> /LEVEL/ITEM </PATH>
<PATH> /LEVEL/ITEM[@attr="value"] </PATH>
<PATH> /LEVEL/ITEM[position()=n]|[n] </PATH>
The following examples illustrate the element forms. For more information about the examples, see Specifying a Location Path on the PATH Element.
  • The following location path tells the XML engine to scan the XML markup until it finds the CONFERENCE element. The XML engine retrieves the value between the <CONFERENCE> start tag and the </CONFERENCE> end tag.
    <PATH> /NHL/CONFERENCE </PATH>
  • The following location path tells the XML engine to scan the XML markup until it finds the TEAM element where the value of the founded= attribute is 1993. The XML engine retrieves the value between the <TEAM> start tag and the </TEAM> end tag.
    <PATH> /NHL/CONFERENCE/DIVISION/TEAM[@founded="1993"] </PATH>
  • The following location path uses the position function to tell the XML engine to scan the XML markup until it finds the fifth occurrence of the TEAM element. The XML engine retrieves the value between the <TEAM> start tag and the </TEAM> end tag.
    <PATH> /NHL/CONFERENCE/DIVISION/TEAM[position()=5] </PATH>
    You can use the following shorter version for the position function:
    <PATH> /NHL/CONFERENCE/DIVISION/TEAM[5] </PATH>
attribute-form
selects values from an attribute. The following attribute forms enable you to select from a specific attribute or conditionally select from a specific attribute based on the value of another attribute:
<PATH> /LEVEL/ITEM/@attr </PATH>
<PATH> /LEVEL/ITEM/@attr[@attr2="value"] </PATH>
The following examples illustrate the attribute forms. For more information about the examples, see Specifying a Location Path on the PATH Element.
  • The following location path tells the XML engine to scan the XML markup until it finds the TEAM element. The XML engine retrieves the value from the abbrev= attribute.
    <PATH syntax="XPath"> /NHL/CONFERENCE/DIVISION/TEAM/@abbrev </PATH>
  • The following location path tells the XML engine to scan the XML markup until it finds the TEAM element. The XML engine retrieves the value from the founded= attribute where the value of the abbrev= attribute is ATL. The two attributes must be for the same element.
    <PATH> /NHL/CONFERENCE/DIVISION/TEAM/@founded[@abbrev="ATL"] </PATH>
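To see the element and attribute forms side by side, the following sketch places three location paths in complete COLUMN definitions and notes what each would select from a single input element. The sample TEAM element, the column names, and the LENGTH values are assumptions for illustration only.

```xml
<!-- Sketch only. Assume the input document contains this hypothetical element
     at /NHL/CONFERENCE/DIVISION/TEAM:
        <TEAM founded="1993" abbrev="FLA"> Panthers </TEAM>              -->

<!-- Element form with an attribute condition: selects the PCDATA
     (Panthers) of the TEAM whose founded= attribute is 1993. -->
<COLUMN name="team_1993">
   <PATH> /NHL/CONFERENCE/DIVISION/TEAM[@founded="1993"] </PATH>
   <TYPE> character </TYPE>
   <DATATYPE> string </DATATYPE>
   <LENGTH> 30 </LENGTH>
</COLUMN>

<!-- Attribute form: selects the value of the abbrev= attribute (FLA). -->
<COLUMN name="abbrev">
   <PATH> /NHL/CONFERENCE/DIVISION/TEAM/@abbrev </PATH>
   <TYPE> character </TYPE>
   <DATATYPE> string </DATATYPE>
   <LENGTH> 3 </LENGTH>
</COLUMN>

<!-- Attribute form with a condition on a second attribute of the same
     element: selects founded= (1993) where abbrev= is FLA. -->
<COLUMN name="founded_fla">
   <PATH> /NHL/CONFERENCE/DIVISION/TEAM/@founded[@abbrev="FLA"] </PATH>
   <TYPE> numeric </TYPE>
   <DATATYPE> integer </DATATYPE>
</COLUMN>
```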
Requirements:Whether the PATH element is required or allowed is determined by the class="ORDINAL" attribute for the COLUMN element. If the class="ORDINAL" attribute is not specified, which is the default, PATH is required and INCREMENT-PATH, DECREMENT-PATH, and RESET-PATH are not allowed. If the class="ORDINAL" attribute is specified, PATH is not allowed, INCREMENT-PATH or DECREMENT-PATH is required, and RESET-PATH is optional.

If an XML namespace is defined with the NAMESPACES element, you must include the identification number in the location path preceding the element that is being defined. The identification number is enclosed in braces. For example, <PATH syntax="XPathENR">/Table/Hurricane/{1}Month</PATH>. See Including Namespace Elements in an XMLMap.

The XPath construction is a formal specification that puts a path description similar to UNIX on each element of the XML structure. XPath syntax is case sensitive. For example, if an element tag name is uppercase, it must be uppercase in the location path. If it is lowercase, it must be lowercase in the location path. All location paths must begin with the root-enclosing element (denoted by a slash '/'), or with the "any parent" variant (denoted by double slashes '//'). Other W3C documented forms are not currently supported.
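When a namespace is involved, the PATH example above could sit in a COLUMN definition such as the following sketch. It assumes that a NAMESPACES element elsewhere in the XMLMap defines the namespace with identification number 1; the TYPE, DATATYPE, and LENGTH values are guesses about the data, not taken from a real document.

```xml
<!-- Sketch only: assumes a NAMESPACES element elsewhere in the XMLMap
     defines the namespace with identification number 1. -->
<COLUMN name="Month">
   <!-- XPathENR is required because the path embeds a namespace reference -->
   <PATH syntax="XPathENR">/Table/Hurricane/{1}Month</PATH>
   <!-- Assumed: the month is read as text -->
   <TYPE> character </TYPE>
   <DATATYPE> string </DATATYPE>
   <LENGTH> 9 </LENGTH>
</COLUMN>
```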

INCREMENT-PATH syntax="type" beginend="BEGIN | END"
specifies a location path for a counter variable, which is established by specifying the COLUMN element attribute class="ORDINAL". The location path tells the XML engine where in the input data to increment the accumulated value for the counter variable by 1.
syntax="type"
is an optional attribute that specifies the type of syntax in the location path. The syntax is valid XPath construction in compliance with the W3C specifications. The XPath form supported by the XML engine allows elements and attributes to be individually included in the generated SAS data set. For example, syntax="XPath".
Default:XPath
Requirements:The value must be XPath or XPathENR.

If an XML namespace is defined with the NAMESPACES element, you must specify the type of syntax as XPathENR (XPath with Embedded Namespace Reference). This is because the syntax is different from the XPath specification. For example, syntax="XPathENR".

beginend="BEGIN | END"
is an optional attribute that specifies whether processing stops when the element start tag is encountered (BEGIN) or when the element end tag is encountered (END).
Default:BEGIN
Requirements:If an XML namespace is defined with the NAMESPACES element, you must include the identification number in the location path preceding the element that is being defined. The identification number is enclosed in braces. For example, <INCREMENT-PATH syntax="XPathENR">/Table/Hurricane/{1}Month</INCREMENT-PATH>.

The XPath construction is a formal specification that puts a path description similar to UNIX on each element of the XML structure. Note that XPath syntax is case sensitive. For example, if an element tag name is uppercase, it must be uppercase in the location path. If it is lowercase, it must be lowercase. All location paths must begin with the root-enclosing element (denoted by a slash '/') or with the "any parent" variant (denoted by double slashes '//'). Other W3C documented forms are not currently supported.

If the variable is not a counter variable, PATH is required and INCREMENT-PATH, DECREMENT-PATH, and RESET-PATH are not allowed. If the variable is a counter variable, PATH is not allowed and either INCREMENT-PATH or DECREMENT-PATH is required.
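A counter variable that follows these requirements might be defined as in the sketch below. The column name is hypothetical, and the location path reuses the /NHL structure from the PATH examples.

```xml
<!-- Sketch only: the column name is hypothetical. -->
<COLUMN name="team_number" class="ORDINAL">
   <!-- Add 1 each time a TEAM start tag is encountered; PATH is not allowed -->
   <INCREMENT-PATH beginend="BEGIN"> /NHL/CONFERENCE/DIVISION/TEAM </INCREMENT-PATH>
   <!-- Counter variables must be numeric integers -->
   <TYPE> numeric </TYPE>
   <DATATYPE> integer </DATATYPE>
</COLUMN>
```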

RESET-PATH syntax="type" beginend="BEGIN | END"
specifies a location path for a counter variable, which is established by specifying the COLUMN element attribute class="ORDINAL". The location path tells the XML engine where in the XML document to reset the accumulated value for the counter variable to zero.
syntax="type"
is an optional attribute that specifies the type of syntax in the location path. The syntax is valid XPath construction in compliance with the W3C specifications. The XPath form supported by the XML engine allows elements and attributes to be individually included in the generated SAS data set. For example, syntax="XPath".
Default:XPath
Requirements:The value must be XPath or XPathENR.

If an XML namespace is defined with the NAMESPACES element, you must specify the type of syntax as XPathENR (XPath with Embedded Namespace Reference). This is because the syntax is different from the XPath specification. For example, syntax="XPathENR".

beginend="BEGIN | END"
is an optional attribute that specifies whether processing stops when the element start tag is encountered (BEGIN) or when the element end tag is encountered (END).
Default:BEGIN
Requirements:If the variable is not a counter variable, RESET-PATH is not allowed. If the variable is a counter variable, RESET-PATH is optional.

If an XML namespace is defined with the NAMESPACES element, you must include the identification number in the location path preceding the element that is being defined. The identification number is enclosed in braces. For example, <RESET-PATH syntax="XPathENR">/Table/Hurricane/{1}Month</RESET-PATH>.

The XPath construction is a formal specification that puts a path description similar to UNIX on each element of the XML structure. Note that XPath syntax is case sensitive. For example, if an element tag name is uppercase, it must be uppercase in the location path. If it is lowercase, it must be lowercase. All location paths must begin with the root-enclosing element (denoted by a slash '/') or with the "any parent" variant (denoted by double slashes '//'). Other W3C documented forms are not currently supported.
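RESET-PATH is easiest to read next to the INCREMENT-PATH that it resets. In this sketch the column name is hypothetical and the paths reuse the /NHL structure from the PATH examples.

```xml
<!-- Sketch only: the column name is hypothetical. -->
<COLUMN name="team_in_division" class="ORDINAL">
   <!-- Add 1 at each TEAM start tag -->
   <INCREMENT-PATH beginend="BEGIN"> /NHL/CONFERENCE/DIVISION/TEAM </INCREMENT-PATH>
   <!-- Set the counter back to zero at each DIVISION start tag -->
   <RESET-PATH beginend="BEGIN"> /NHL/CONFERENCE/DIVISION </RESET-PATH>
   <TYPE> numeric </TYPE>
   <DATATYPE> integer </DATATYPE>
</COLUMN>
```

With a definition along these lines, the count would be expected to restart within each DIVISION instead of running across the whole document.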

DECREMENT-PATH syntax="type" beginend="BEGIN | END"
specifies a location path for a counter variable, which is established by specifying the COLUMN element attribute class="ORDINAL". The location path tells the XML engine where in the input data to decrement the accumulated value for the counter variable by 1.
syntax="type"
is an optional attribute that specifies the type of syntax in the location path. The syntax is valid XPath construction in compliance with the W3C specifications. The XPath form supported by the XML engine allows elements and attributes to be individually included in the generated SAS data set. For example, syntax="XPath".
Default:XPath
Requirements:The value must be XPath or XPathENR.

If an XML namespace is defined with the NAMESPACES element, you must specify the type of syntax as XPathENR (XPath with Embedded Namespace Reference). This is because the syntax is different from the XPath specification. For example, syntax="XPathENR".

beginend="BEGIN | END"
is an optional attribute that specifies whether processing stops when the element start tag is encountered (BEGIN) or when the element end tag is encountered (END).
Default:BEGIN
Requirements:If the variable is not a counter variable, DECREMENT-PATH is not allowed. If the variable is a counter variable, either DECREMENT-PATH or INCREMENT-PATH is required.

If an XML namespace is defined with the NAMESPACES element, you must include the identification number in the location path preceding the element that is being defined. The identification number is enclosed in braces. For example, <DECREMENT-PATH syntax="XPathENR">/Table/Hurricane/{1}Month</DECREMENT-PATH>.

The XPath construction is a formal specification that puts a path description similar to UNIX on each element of the XML structure. XPath syntax is case sensitive. For example, if an element tag name is uppercase, it must be uppercase in the location path. If it is lowercase, it must be lowercase in the location path. All location paths must begin with the root-enclosing element (denoted by a slash '/'), or with the "any parent" variant (denoted by double slashes '//'). Other W3C documented forms are not currently supported.
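For completeness, a counter that uses DECREMENT-PATH in place of INCREMENT-PATH could be sketched as follows. The column name is hypothetical, and the fragment makes no claim about the counter's starting value, which this section does not state.

```xml
<!-- Sketch only: the column name is hypothetical. -->
<COLUMN name="team_countdown" class="ORDINAL">
   <!-- Subtract 1 each time a TEAM end tag is encountered -->
   <DECREMENT-PATH beginend="END"> /NHL/CONFERENCE/DIVISION/TEAM </DECREMENT-PATH>
   <TYPE> numeric </TYPE>
   <DATATYPE> integer </DATATYPE>
</COLUMN>
```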