X11workbench Toolkit  1.0
XML-specific Text Utilities

Specialized text utility functions for parsing XML data.

Typedefs

typedef struct tagCHXMLEntry CHXMLEntry
 Descriptor for parsed XML entry.
 

Functions

static const char * InternalParseXML (CHXMLEntry **ppOrigin, int *pcbOrigin, CHXMLEntry **ppCur, char **ppData, int *pcbData, char **ppCurData, const char *ppXMLData, const char *pXMLDataEnd)
 Internal function used by CHParseXML() to recurse through levels of XML.

CHXMLEntry * CHParseXML (const char *pXMLData, int cbLength)
 Parses XML data, returning a WBAlloc'd array of CHXMLEntry structures followed by the string data.

void CHDebugDumpXML (CHXMLEntry *pEntry)
 Writes a debug dump of parsed XML contents using WBDebugPrint().

char * CHParseXMLTagContents (const char *pTagContents, int cbLength)
 Parses contents of a single XML tag, returning as WBAlloc'd string list similar to environment strings.

const char * CHFindNextXMLTag (const char *pTagContents, int cbLength, int nNestingFlags)
 Parses XML content to find the next tag, skipping comments along the way.

const char * CHFindEndOfXMLTag (const char *pTagContents, int cbLength)
 Parses contents of an XML tag to find the end of it.

const char * CHFindEndOfXMLSection (const char *pTagContents, int cbLength, char cEndChar, int bUseQuotes)
 Parses XML text for the end of a 'section', typically ending in '>' ')' or ']'.
 

Detailed Description

Specialized text utility functions for parsing XML data.

Typedef Documentation

◆ CHXMLEntry

Descriptor for parsed XML entry.

This structure describes an XML entry. It is expected to be a part of an array of such structures, in which the next element will be referenced by 'iNextIndex', and the first contents referenced by 'iContentsIndex', with respect to the beginning of that array.

You can construct an array of CHXMLEntry structures from XML data using CHParseXML().

This structure deliberately uses integer offsets instead of pointers. The offsets are relative to the starting pointer of the CHXMLEntry array, so you'll always need to pass the index along with the origin pointer to refer to a particular element. The basic reason for doing this is to allow the array to be relocatable without any pointer fixups.

If you want to create a C++ object to wrap it, you could return pointers by using 'GetXXX()' methods, particularly those that can operate on const objects. Then you would store the origin pointer along with the index inside of the C++ object, and calculate the correct addresses in the 'GetXXX()' methods as required by the application.

typedef struct tagCHXMLEntry
{
  int iNextIndex;      // 0-based index for next item at this level; <= 0 for none. 0 marks "end of list" for top level
  int iContainer;      // 0-based index for container; <= 0 for none.
  int iContentsIndex;  // 0-based first array index for 'contents' for this entry; <= 0 for none
  int nLabelOffset;    // BYTE offset to label (zero-byte-terminated) string (from beginning of array)
                       // for this entry; <= 0 for 'no label'
  int nDataOffset;     // BYTE offset to data (zero-byte-terminated) string (from beginning of array)
                       // for the entry data; <= 0 for 'no data'
} CHXMLEntry;
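
The offset scheme described above can be sketched in plain C. This is only an illustration under stated assumptions: 'DemoXMLEntry' is a hypothetical local mirror of the documented fields (real code would include conf_help.h and use CHXMLEntry), and 'DemoResolve()', 'DemoGetLabel()', 'DemoGetData()' and 'DemoRelocateCheck()' are made-up helper names, not toolkit functions. The block is built by hand rather than by CHParseXML().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical mirror of the documented layout so the sketch compiles alone;
   real code would include conf_help.h and use CHXMLEntry directly. */
typedef struct tagDemoXMLEntry
{
  int iNextIndex;
  int iContainer;
  int iContentsIndex;
  int nLabelOffset;
  int nDataOffset;
} DemoXMLEntry;

/* Resolve a byte offset against the origin pointer; offsets <= 0 mean "none". */
static const char *DemoResolve(const DemoXMLEntry *pOrigin, int nOffset)
{
  assert(pOrigin != NULL);
  return nOffset > 0 ? (const char *)pOrigin + nOffset : NULL;
}

static const char *DemoGetLabel(const DemoXMLEntry *pOrigin, int iIndex)
{
  assert(pOrigin != NULL && iIndex >= 0);
  return DemoResolve(pOrigin, pOrigin[iIndex].nLabelOffset);
}

static const char *DemoGetData(const DemoXMLEntry *pOrigin, int iIndex)
{
  assert(pOrigin != NULL && iIndex >= 0);
  return DemoResolve(pOrigin, pOrigin[iIndex].nDataOffset);
}

/* Builds a one-entry block by hand, copies it elsewhere, wipes the original,
   and reads the strings back through the copy. Returns 1 when the copy
   resolves correctly without any pointer fixups, 0 otherwise. */
static int DemoRelocateCheck(void)
{
  struct DemoBlock { DemoXMLEntry aEntries[1]; char szText[16]; } blk, copy;
  const DemoXMLEntry *pOrigin;
  const char *pLabel, *pData;

  memset(&blk, 0, sizeof(blk));
  memcpy(blk.szText, "name\0value", 11);  /* two zero-terminated strings */
  blk.aEntries[0].nLabelOffset = (int)offsetof(struct DemoBlock, szText);
  blk.aEntries[0].nDataOffset  = (int)offsetof(struct DemoBlock, szText) + 5;

  memcpy(&copy, &blk, sizeof(blk));       /* "relocate" the whole block */
  memset(&blk, 0, sizeof(blk));           /* the original no longer matters */

  pOrigin = (const DemoXMLEntry *)&copy;
  pLabel = DemoGetLabel(pOrigin, 0);
  pData  = DemoGetData(pOrigin, 0);

  return pLabel && pData && !strcmp(pLabel, "name") && !strcmp(pData, "value");
}
```

Because every reference is an index or a byte offset from the origin, a plain memcpy() of the block is enough to move it. A C++ wrapper of the kind suggested above would store the origin pointer and index and perform the same arithmetic in its 'GetXXX()' methods.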

Function Documentation

◆ CHDebugDumpXML()

void CHDebugDumpXML ( CHXMLEntry *  pEntry )

Writes a debug dump of parsed XML contents using WBDebugPrint().

Parameters
pEntry  A pointer to an XML data entry as returned by CHParseXML()

Call this function to do a 'debug dump' of XML contents using the debug I/O function 'WBDebugPrint()'.

Header File: conf_help.h

◆ CHFindEndOfXMLSection()

const char* CHFindEndOfXMLSection ( const char *  pTagContents,
int  cbLength,
char  cEndChar,
int  bUseQuotes 
)

Parses XML text for the end of a 'section', typically ending in '>' ')' or ']'.

Parameters
pTagContents  A pointer to the string position just past the tag name
cbLength      The (maximum) length of the XML data to parse
cEndChar      The ASCII character that the section ends with, typically '>' ')' or ']'
bUseQuotes    A flag that is non-zero to ignore content within quoted strings, zero to ignore quote marks
Returns
A pointer to the 'cEndChar' at the end of the XML section, or 'one byte past the end' if not found.

Generic XML parsing. Parse the XML to find the end of the section, which can be a tag or a block of text that is delimited using '[]' or '()' or a character of your own choosing. The returned pointer will either be the end of the string, or a pointer to the ending character 'cEndChar'. The end of the string is defined by either a 0-byte terminator, or 'cbLength' bytes.

According to the XML spec, a '<' or '>' can ONLY be considered as part of a tag if it's outside of a 'CDATA' section, quoted string, or a comment as defined in section 2.4 of the XML spec. Additionally, special characters can appear inside of a quoted string that's part of a tag. A 'CDATA' section is marked with '<![CDATA[' and ends with ']]>' outside of '( )' nesting. Values within tags include the use of '[' and ']' to allow embedding tags within a tag, and so this function will consider the 'nesting' to find the end of the section as marked by 'cEndChar'.
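
The effect of 'bUseQuotes' can be illustrated with a much-simplified stand-in. This is a sketch only, not the toolkit's implementation: 'DemoFindSectionEnd()' is a hypothetical name, and it handles only quoted strings and the two end-of-string conditions described above. It does not track '[ ]' or '( )' nesting, comments, or CDATA sections.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in: scan for cEndChar, stopping at a 0 byte or after
   cbLength bytes. When bUseQuotes is non-zero, anything inside '...' or
   "..." is skipped. Returns a pointer to cEndChar, or to the end of the
   scanned text when cEndChar was not found. */
static const char *DemoFindSectionEnd(const char *pText, int cbLength,
                                      char cEndChar, int bUseQuotes)
{
  const char *p, *pEnd;
  char cQuote = 0;  /* the quote character currently open, or 0 for none */

  assert(pText != NULL && cbLength >= 0);

  p = pText;
  pEnd = pText + cbLength;

  while (p < pEnd && *p)
  {
    if (cQuote)
    {
      if (*p == cQuote)
        cQuote = 0;              /* closing quote */
    }
    else if (bUseQuotes && (*p == '"' || *p == '\''))
    {
      cQuote = *p;               /* opening quote */
    }
    else if (*p == cEndChar)
    {
      return p;                  /* found the end of the section */
    }

    p++;
  }

  return p;                      /* not found: end of the scanned text */
}
```

With the text ` a="x>y" b='z'>rest` and 'cEndChar' set to '>', a non-zero 'bUseQuotes' skips the '>' inside the quoted value and stops at the one that closes the tag, while a zero 'bUseQuotes' stops at the first '>' it sees.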

See also
https://www.w3.org/TR/REC-xml/ for additional information on XML

Header File: conf_help.h

Definition at line 2941 of file conf_help.c.

◆ CHFindEndOfXMLTag()

const char* CHFindEndOfXMLTag ( const char *  pTagContents,
int  cbLength 
)

Parses contents of an XML tag to find the end of it.

Parameters
pTagContents  A pointer to the string position just past the tag name
cbLength      The (maximum) length of the XML data to parse
Returns
A pointer to the '>' at the end of the XML tag, or 'one byte past the end' if no end-of-tag found.

Generic XML tag parsing. Parse the tag to find its end. The returned pointer will either be the end of the string, or a pointer to the ending '>'. The end of the string is defined by either a 0-byte terminator, or 'cbLength' bytes.

According to the XML spec, a '<' or '>' can ONLY be considered as part of a tag if it's outside of a 'CDATA' section, quoted string, or a comment as defined in section 2.4 of the XML spec. Additionally, special characters can appear inside of a quoted string that's part of a tag. A 'CDATA' section is marked with '<![CDATA[' and ends with ']]>' outside of '( )' nesting. Values within tags include the use of '[' and ']' to allow embedding tags within a tag, and so this function will consider the 'nesting' to find the end of the tag.

See also
https://www.w3.org/TR/REC-xml/ for additional information on XML

Header File: conf_help.h

Definition at line 3077 of file conf_help.c.

◆ CHFindNextXMLTag()

const char* CHFindNextXMLTag ( const char *  pTagContents,
int  cbLength,
int  nNestingFlags 
)

Parses XML content to find the next tag, skipping comments along the way.

Parameters
pTagContents   A pointer to the string position just past the tag name
cbLength       The (maximum) length of the XML data to parse
nNestingFlags  A bit flag indicating whether or not this tag is nested within another tag
Returns
A pointer to the '<' at the beginning of the XML tag, or 'one byte past the end' if no tag found. Returns NULL on error.

Generic XML tag parsing. Parse XML text to find the next tag. The returned pointer will either be the end of the string, or a pointer to the '<'. The end of the string is defined by either a 0-byte terminator, or 'cbLength' bytes.

NOTE: this function assumes that you are parsing outside of a tag. If you are parsing within a tag, use 'CHFindEndOfXMLTag()' to find the end of the tag first, then use THIS function to search for the next tag beyond that point.

Values of 'nNestingFlags' can be as follows:

CHPARSEXML_DEFAULT  0   Default behavior (just look for '<')
CHPARSEXML_PAREN    1   Stop on detection of '(' or ')'
CHPARSEXML_BRACKET  2   Stop on detection of '[' or ']'

According to the XML spec, a '<' or '>' can ONLY be considered as part of a tag if it's outside of a 'CDATA' section, quoted string, or a comment as defined in section 2.4 of the XML spec. Additionally, special characters can appear inside of a quoted string that's part of a tag. A 'CDATA' section is marked with '<![CDATA[' and ends with ']]>' outside of '( )' nesting. Values within tags include the use of '[' and ']' to allow embedding tags within a tag, and so this function will consider the 'nesting' to find the next tag.

See also
https://www.w3.org/TR/REC-xml/ for additional information on XML

Header File: conf_help.h

Definition at line 2889 of file conf_help.c.

◆ CHParseXML()

CHXMLEntry* CHParseXML ( const char *  pXMLData,
int  cbLength 
)

Parses XML data, returning a WBAlloc'd array of CHXMLEntry structures followed by the string data.

Parameters
pXMLData  A pointer to the XML data as a const char *
cbLength  The length of the XML data (can include a terminating 0 byte)
Returns
A WBAlloc()'d array of CHXMLEntry structures, followed by the actual XML data, or NULL on error.

Generic XML parsing. Parse the XML data, returning a WBAlloc()'d array of CHXMLEntry structures. The last structure in the list will have '-1' as its iNextIndex value. The actual XML data related to the parsed information follows the array, and the string offsets in the CHXMLEntry structures are relative to the beginning of the array (in bytes). The caller must free any non-NULL pointer returned by this function, using WBFree().
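
Walking such an array might look like the sketch below. It is illustrative only: 'DemoXMLEntry' is a hypothetical local mirror of the CHXMLEntry fields, 'DemoWalk()' is a made-up helper, and the sketch assumes the first top-level entry sits at index 0. It treats any iNextIndex or iContentsIndex value <= 0 as "none", which covers the '-1' end marker mentioned above.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical mirror of the documented CHXMLEntry fields, so that the
   sketch compiles without conf_help.h. */
typedef struct tagDemoXMLEntry
{
  int iNextIndex;
  int iContainer;
  int iContentsIndex;
  int nLabelOffset;
  int nDataOffset;
} DemoXMLEntry;

/* Prints every entry at this level, starting with iIndex, and recurses into
   the contents of each one. Returns the number of entries visited. */
static int DemoWalk(const DemoXMLEntry *pOrigin, int iIndex, int nDepth)
{
  int nVisited = 0;

  assert(pOrigin != NULL && iIndex >= 0);

  for (;;)
  {
    const DemoXMLEntry *pE = pOrigin + iIndex;

    /* string offsets are in bytes from the start of the array */
    const char *pLabel = pE->nLabelOffset > 0
                       ? (const char *)pOrigin + pE->nLabelOffset : "";
    const char *pData  = pE->nDataOffset > 0
                       ? (const char *)pOrigin + pE->nDataOffset : "";

    printf("%*s%s [%s]\n", nDepth * 2, "", pLabel, pData);
    nVisited++;

    if (pE->iContentsIndex > 0)  /* descend into embedded content */
      nVisited += DemoWalk(pOrigin, pE->iContentsIndex, nDepth + 1);

    if (pE->iNextIndex <= 0)     /* no further entries at this level */
      break;

    iIndex = pE->iNextIndex;
  }

  return nVisited;
}
```

With the real API, the origin pointer would be the value returned by CHParseXML(), cast as needed, and the caller would pass it to WBFree() once the walk is done.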

XML data contents can take 2 basic forms. One of them stores name/value pairs within a tag

<example name="value" name2="value2" />

Another stores name/value pairs as embedded content, NOT within the tag

<example><name>value</name><name2>value2</name2></example>

Both formats are recognized and parsed equivalently with this function. In fact, they can be combined.

Note that if there is no closing tag or self-closing '/' mark, subsequent tags will be parsed as if they are embedded content, without any error messages for missing closing tags. This function does not check whether or not all tags have been properly closed. Instead, if it finds the end of the XML text before all closing tags are found, it "closes" them all and returns without an error.

This function attempts to comply with the XML standard as best it can, with respect to the use of 'CDATA', and substitution of '&amp;', '&gt;', and others. '<![CDATA[ ]]>' sections will be parsed somewhat differently, as they will never contain things that should be parsed anyway, and the entire section will be stored as if it were embedded data [basically if you want to accept these, parse them yourself, and all that goes with it]. Typically CDATA will be used within <script> tags for HTML, as one example.

Data that is surrounded by tags will have a single '=' prepended to it, and it will be de-quoted following the '='. Multiple '=' and other special characters can appear in the text data. '&amp;', '&lt;', and '&gt;' will be translated in the data value so that the corresponding characters can be included in the value.

Additionally, outside of CDATA, a "raw" '<' or '&' can NOT even appear in a quoted string. That is part of the XML spec, and HTML is treated the same way.

See also
https://www.w3.org/TR/REC-xml/ for additional information on XML

Header File: conf_help.h

Definition at line 2465 of file conf_help.c.

◆ CHParseXMLTagContents()

char* CHParseXMLTagContents ( const char *  pTagContents,
int  cbLength 
)

Parses contents of a single XML tag, returning as WBAlloc'd string list similar to environment strings.

Parameters
pTagContents  A pointer to the string position just past the tag name
cbLength      The length of the tag contents up to the trailing '>'. Preceding characters (such as '-->' or '/>') will be ignored, as well as the trailing '>'
Returns
A WBAlloc'd string list, similar in format to 'environ'; i.e. "VALUE=xxxx xxxx xxxx\0" (or an embedded XML section), with the possibility of embedded quotes (neither doubled nor '\'-escaped), each entry ending in a zero byte. The end of the list is marked with an additional '\0'. Returns NULL on error.

Generic XML tag parsing. Parse the tag to 'just past the tag name', find the ending '>', and pass that length as 'cbLength'. The caller must free any non-NULL pointer returned by this function, using WBFree().
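
The returned list format (zero-terminated "NAME=value" strings, with one extra zero byte marking the end) can be consumed with a short loop. The sketch below runs against a hand-written list rather than real parser output; 'DemoCountItems()' and 'DemoFindValue()' are hypothetical helper names, not part of the toolkit.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Counts the strings in a list where each string ends in a zero byte and
   the list itself ends with one additional zero byte. */
static int DemoCountItems(const char *pList)
{
  int nItems = 0;

  assert(pList != NULL);

  while (*pList)
  {
    nItems++;
    pList += strlen(pList) + 1;  /* step over this string and its 0 byte */
  }

  return nItems;
}

/* Looks up szName in the list; returns a pointer to the text following
   "szName=", or NULL when there is no such item. */
static const char *DemoFindValue(const char *pList, const char *szName)
{
  size_t cbName;

  assert(pList != NULL && szName != NULL);

  cbName = strlen(szName);

  while (*pList)
  {
    if (!strncmp(pList, szName, cbName) && pList[cbName] == '=')
      return pList + cbName + 1;

    pList += strlen(pList) + 1;
  }

  return NULL;
}
```

For example, given the list "name=value\0name2=value2\0\0", DemoFindValue() with "name2" yields a pointer to "value2". With the real function, the same loop would run over the pointer returned by CHParseXMLTagContents(), followed by WBFree() on that pointer.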

Header File: conf_help.h

Definition at line 2600 of file conf_help.c.

◆ InternalParseXML()

static const char* InternalParseXML ( CHXMLEntry **  ppOrigin,
int *  pcbOrigin,
CHXMLEntry **  ppCur,
char **  ppData,
int *  pcbData,
char **  ppCurData,
const char *  ppXMLData,
const char *  pXMLDataEnd 
)

Internal function used by CHParseXML() to recurse through levels of XML.

Parameters
ppOrigin  A pointer to the 'origin' pointer for CHXMLEntry array

Internal function for use by CHParseXML, to recurse levels of XML

Definition at line 2411 of file conf_help.c.