1. GLS Library String and Character Termination

The GLS library functions are intended to be used in many different contexts. In particular, some APIs that programmers use along with the GLS library assume that all character strings are terminated with a null character; others assume that each string consists of a pointer and a length that gives the number of bytes in the string. The GLS library is intended to be used with both.

Therefore, each GLS library function that takes a string argument lets you pass either a null-terminated string or a string whose end is determined by a separate length argument.

Multi-Byte Character String Termination

Each multi-byte character string that is passed to a GLS library function is represented by two arguments,
  ..., mbs, mbs_byte_length, ...
If mbs_byte_length is the value IFX_GL_NULL then the function will assume that mbs is a null-terminated string; otherwise the function assumes that mbs_byte_length is the number of bytes in the multi-byte character string. The null-terminator of a multi-byte string consists of one byte whose value is zero.

Multi-byte character strings which are not null-terminated are called length-terminated multi-byte strings and can contain null characters, but these null characters do not indicate the end of the string.

If mbs_byte_length is neither IFX_GL_NULL nor greater than or equal to zero, then the function gives the IFX_GL_PARAMERR error.
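
The rules above can be sketched as a small length-resolution helper. This is only an illustration of the convention, not GLS library code: DEMO_NULL, DEMO_PARAMERR, their values, and demo_mbs_bytes() are hypothetical stand-ins, because this document does not state the actual values of IFX_GL_NULL or IFX_GL_PARAMERR.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins: the real values of IFX_GL_NULL and
 * IFX_GL_PARAMERR are not given in this document. */
#define DEMO_NULL     (-1)
#define DEMO_PARAMERR (-2)

/* Return the number of bytes the string occupies, or DEMO_PARAMERR
 * if the length argument is neither DEMO_NULL nor non-negative. */
static long demo_mbs_bytes(const unsigned char *mbs, long mbs_byte_length)
{
    assert(mbs != NULL);
    if (mbs_byte_length == DEMO_NULL)
        return (long) strlen((const char *) mbs); /* stop at first zero byte */
    if (mbs_byte_length < 0)
        return DEMO_PARAMERR;
    return mbs_byte_length; /* embedded zero bytes are ordinary data */
}
```

For a buffer holding the bytes 'a', 'b', 0, 'c', 'd', passing DEMO_NULL resolves to 2 bytes, while passing 5 treats all five bytes, including the embedded zero, as part of the string.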

Multi-Byte Character Termination

Many GLS library functions operate on just one multi-byte character. Each multi-byte character that is passed to a GLS library function is represented by two arguments,
  ..., mb, mb_byte_limit, ...
If mb_byte_limit is IFX_GL_NO_LIMIT then the function will read as many bytes as necessary from mb to form a complete character; otherwise, it will not read more than mb_byte_limit bytes from mb when trying to form a complete character.

1. If mb is a character in a null-terminated multi-byte string, then mb_byte_limit must be equal to IFX_GL_NO_LIMIT. For example, if mbs points to a string of multi-byte characters that are null terminated,

  for ( mb = mbs; *mb != '\0'; mb += bytes )
    {
    if ( (bytes = ifx_gl_mblen(mb, IFX_GL_NO_LIMIT)) == -1 )
      {
      /* handle error */
      break;   /* bytes is -1, so do not advance mb by it */
      }
    }

2. If mb is a character in a multi-byte string which is not null-terminated or a character in a buffer by itself, then mb_byte_limit must equal the number of bytes between where mb points and the end of the buffer which holds the string or character. For example, if mbs points to a string of multi-byte characters that are not null terminated and mbs_bytes is the number of bytes in that string,

  for ( mb = mbs; mbs_bytes > 0; mb += bytes, mbs_bytes -= bytes )
    {
    if ( (bytes = ifx_gl_mblen(mb, mbs_bytes)) == -1 )
      {
      /* handle error */
      break;   /* bytes is -1, so do not advance mb by it */
      }
    }

or if mb points to one multi-byte character and mb_bytes is the number of bytes in the buffer that holds the character,

  if ( (bytes = ifx_gl_mblen(mb, mb_bytes)) == -1 )
    {
    /* handle error */
    }

If the function cannot determine whether mb is a valid multi-byte character because it would need to read more than mb_byte_limit bytes from mb, or if mb_byte_limit is less than or equal to zero, then the function gives the IFX_GL_EINVAL error.
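
To make the byte-limit behavior concrete, here is a minimal sketch of a bounded character-length function. It is not the GLS implementation: it assumes UTF-8 as the encoding (GLS locales may use others), does only simplified validation, and returns the error kind directly instead of returning -1 and recording an error number. The names demo_mblen, DEMO_NO_LIMIT, DEMO_EILSEQ, and DEMO_EINVAL and their values are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_NO_LIMIT (-1) /* hypothetical stand-in for IFX_GL_NO_LIMIT */
#define DEMO_EILSEQ   (-2) /* bytes can never form a character */
#define DEMO_EINVAL   (-3) /* character may be valid, but the limit cut it off */

/* Byte length of the UTF-8 character at mb, reading at most limit bytes. */
static int demo_mblen(const unsigned char *mb, int limit)
{
    int need, i;

    assert(mb != NULL);
    if (limit != DEMO_NO_LIMIT && limit <= 0)
        return DEMO_EINVAL;

    if (mb[0] < 0x80)                need = 1;
    else if ((mb[0] & 0xE0) == 0xC0) need = 2;
    else if ((mb[0] & 0xF0) == 0xE0) need = 3;
    else if ((mb[0] & 0xF8) == 0xF0) need = 4;
    else return DEMO_EILSEQ;         /* not a valid lead byte */

    for (i = 1; i < need; i++) {
        if (limit != DEMO_NO_LIMIT && i >= limit)
            return DEMO_EINVAL;      /* would have to read past the limit */
        if ((mb[i] & 0xC0) != 0x80)
            return DEMO_EILSEQ;      /* bad continuation byte */
    }
    return need;
}
```

With the three-byte sequence E2 82 AC, a limit of 3 yields 3, while a limit of 2 yields DEMO_EINVAL because the character cannot be completed within the limit. A lone continuation byte such as A9 yields DEMO_EILSEQ regardless of the limit.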

Wide-Character String Termination

Each wide-character string that is passed to a GLS library function is represented by two arguments,
  ..., wcs, wcs_char_length, ...
If wcs_char_length is the value IFX_GL_NULL then the function will assume that wcs is a null-terminated string; otherwise the function assumes that wcs_char_length is the number of characters in the wide-character string. The null-terminator of a wide-character string consists of one gl_wchar_t whose value is zero.

Wide-character strings which are not null-terminated are called length-terminated wide-character strings and can contain null characters, but these null characters do not indicate the end of the string.

If wcs_char_length is neither IFX_GL_NULL nor greater than or equal to zero, then the function gives the IFX_GL_PARAMERR error.
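
The key difference from the multi-byte case is that the length counts characters (array elements), not bytes, and that the terminator is a whole element. A minimal sketch, assuming a hypothetical demo_wchar_t in place of gl_wchar_t (whose actual type is not given here) and hypothetical DEMO_NULL and DEMO_PARAMERR values:

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int demo_wchar_t; /* hypothetical stand-in for gl_wchar_t */
#define DEMO_NULL     (-1)         /* hypothetical stand-in for IFX_GL_NULL */
#define DEMO_PARAMERR (-2)         /* hypothetical stand-in for IFX_GL_PARAMERR */

/* Number of wide characters in wcs, or DEMO_PARAMERR for a bad length. */
static long demo_wcs_chars(const demo_wchar_t *wcs, long wcs_char_length)
{
    long n = 0;

    assert(wcs != NULL);
    if (wcs_char_length == DEMO_NULL) {
        while (wcs[n] != 0) /* terminator is one whole element equal to zero */
            n++;
        return n;
    }
    return (wcs_char_length < 0) ? DEMO_PARAMERR : wcs_char_length;
}

/* Bytes occupied by n wide characters, excluding any terminator. */
static size_t demo_wcs_bytes(long n)
{
    assert(n >= 0);
    return (size_t) n * sizeof(demo_wchar_t);
}
```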

2. GLS Library Memory Allocation

Memory Allocation by GLS Library Functions

No GLS library function allocates memory that remains after the function returns. If a function allocates memory, this memory is only for temporary purposes and is freed before the function returns. Therefore, the caller of each function must allocate any memory needed by the function.

Memory Allocation by GLS Library Callers

Multi-Byte Character String Allocation

Since the number of array elements in a multi-byte character string does NOT equal the number of characters in the string, the allocation of a multi-byte character string is NOT the same as the "old" single-byte method. For example, to statically allocate 20 multi-byte characters use,

gl_mchar_t mbs[20*IFX_GL_MB_MAX];

To dynamically allocate 20 multi-byte characters use,

gl_mchar_t *mbs = (gl_mchar_t *) malloc(20*IFX_GL_MB_MAX);

or to dynamically allocate a more precise estimate use,

gl_mchar_t *mbs = (gl_mchar_t *) malloc(20*ifx_gl_mb_loc_max());

To statically allocate 20 multi-byte characters plus a null-terminator use (note that the null-terminator only requires one byte),

gl_mchar_t mbs[20*IFX_GL_MB_MAX+1];

To dynamically allocate 20 multi-byte characters plus a null-terminator use,

gl_mchar_t *mbs = (gl_mchar_t *) malloc(20*IFX_GL_MB_MAX+1);

or to dynamically allocate a more precise estimate use,

gl_mchar_t *mbs = (gl_mchar_t *) malloc(20*ifx_gl_mb_loc_max()+1);
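
The size arithmetic above can be wrapped in a helper that also guards against overflow when the character count is not a small constant. This is a sketch, not part of the GLS library: demo_mbs_size() and demo_mbs_alloc() are hypothetical names, and the caller supplies the per-character maximum (in real code, IFX_GL_MB_MAX or the result of ifx_gl_mb_loc_max()).

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Bytes needed for nchars multi-byte characters of at most max_bytes each,
 * plus one byte if a null terminator is wanted. Returns 0 if the product
 * would overflow size_t (or if nothing at all is needed). */
static size_t demo_mbs_size(size_t nchars, size_t max_bytes, int with_null)
{
    size_t extra = with_null ? 1 : 0;

    assert(max_bytes > 0);
    if (nchars > (SIZE_MAX - extra) / max_bytes)
        return 0;
    return nchars * max_bytes + extra;
}

/* Allocate the buffer; returns NULL on overflow or allocation failure. */
static unsigned char *demo_mbs_alloc(size_t nchars, size_t max_bytes,
                                     int with_null)
{
    size_t size = demo_mbs_size(nchars, max_bytes, with_null);
    return size ? malloc(size) : NULL;
}
```

Assuming a maximum of 4 bytes per character purely for illustration, demo_mbs_size(20, 4, 1) gives 81: 80 bytes for the characters and one for the terminator.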

Wide-Character String Allocation

Since the number of array elements in a wide-character string equals the number of characters in the string, the static allocation of a wide-character string looks the same as the "old" single-byte method. For example, to statically allocate 20 wide-characters use,

gl_wchar_t wcs[20];

To dynamically allocate 20 wide-characters use,

gl_wchar_t *wcs = (gl_wchar_t *) malloc(20*sizeof(gl_wchar_t));

To statically allocate 20 wide-characters plus a null-terminator use (note that the null-terminator requires the space allocated for an entire wide-character),

gl_wchar_t wcs[21];

To dynamically allocate 20 wide-characters plus a null-terminator use,

gl_wchar_t *wcs = (gl_wchar_t *) malloc(21*sizeof(gl_wchar_t));

3. Keeping Multi-Byte Strings Consistent

Truncating Long Multi-Byte Strings

Sometimes the caller of GLS library functions will need to truncate a long character string so that it fits into a smaller buffer. Truncating a string that consists of just single-byte characters is easy. This is because truncating at an arbitrary byte location in the string will still result in a complete character string, albeit shorter.

However, truncating a string that can contain even one multi-byte character is difficult. This is because truncating at an arbitrary byte location in the string can result in truncating a multi-byte character in its middle such that the truncated string ends with the first 1, 2 or 3 bytes of a character without the character's remaining bytes.

If such a situation occurs, then subsequent traversal of the truncated string could result in reading beyond the end of the buffer.

Therefore, all GLS library functions which traverse one multi-byte character or traverse length-terminated multi-byte character strings give a special error, IFX_GL_EINVAL, if they detect that an otherwise valid character has been truncated.

If it is known that no truncation occurred to the string, then IFX_GL_EINVAL can be considered the same as IFX_GL_EILSEQ. However, if it is possible that truncation has occurred, then IFX_GL_EINVAL indicates to the caller that they need to further truncate the string so that the last byte of the string is the last byte of the last character in the string.

Depending upon your application, you may either end up making the truncated string even shorter than originally intended or you may have to replace the first 1, 2, or 3 bytes of the truncated character with a padding character that is appropriate for your application.

Even though the GLS library functions can be used to detect this situation after it has occurred, it is much better to use them to avoid the situation.
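
One way to avoid the situation is to walk the string character by character and stop before the first character that would not fit entirely. The sketch below illustrates the idea under stated assumptions: demo_mblen() is a hypothetical UTF-8 stand-in for a bounded ifx_gl_mblen() call (it returns -1 for both illegal and incomplete characters), and demo_truncate_len() is a hypothetical helper, not a GLS library function.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a bounded ifx_gl_mblen(): UTF-8 length of the
 * character at mb, or -1 if it is illegal or does not fit in limit bytes. */
static int demo_mblen(const unsigned char *mb, size_t limit)
{
    int need, i;

    if (limit == 0)                  return -1;
    if (mb[0] < 0x80)                need = 1;
    else if ((mb[0] & 0xE0) == 0xC0) need = 2;
    else if ((mb[0] & 0xF0) == 0xE0) need = 3;
    else if ((mb[0] & 0xF8) == 0xF0) need = 4;
    else return -1;
    if ((size_t) need > limit)       return -1;
    for (i = 1; i < need; i++)
        if ((mb[i] & 0xC0) != 0x80)  return -1;
    return need;
}

/* Largest byte count <= cap that ends on a character boundary of src. */
static size_t demo_truncate_len(const unsigned char *src, size_t src_bytes,
                                size_t cap)
{
    size_t used = 0;

    assert(src != NULL);
    while (used < src_bytes) {
        int n = demo_mblen(src + used, src_bytes - used);
        if (n < 0 || used + (size_t) n > cap)
            break; /* illegal input, or the next character would not fit */
        used += (size_t) n;
    }
    return used;
}
```

For a 6-byte string made of a 1-byte, a 2-byte, and a 3-byte character, a 5-byte buffer receives only the first 3 bytes; the caller then copies that many bytes and pads or terminates as the application requires.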

Fragmenting Long Multi-Byte Strings

Sometimes the caller of GLS library functions will need to fragment a long character string into two or more non-adjacent buffers to meet the memory management requirements of their component. Fragmenting a string that consists of just single-byte characters is easy. This is because fragmenting at arbitrary byte locations in the string will still result in the fragments being consistent character strings.

However, fragmenting a string that can contain even one multi-byte character is difficult. This is because fragmenting at arbitrary byte locations in the string can result in fragmenting a multi-byte character in its middle such that one fragment ends with the first 1, 2 or 3 bytes of a character and the next fragment starts with the remaining bytes.

If the only thing you ever will do with these fragments is to concatenate them back together to form one string, then no special processing needs to be done. However, if you traverse the fragments as multi-byte strings, this can result in reading beyond the end of one fragment or finding an illegal character at the beginning of another.

Therefore, all GLS library functions which traverse one multi-byte character or traverse length-terminated multi-byte character strings give a special error, IFX_GL_EINVAL, if they detect that an otherwise valid character has been truncated at the end of a fragment. It is impossible to detect that the beginning of a fragment contains the remaining bytes of the last character in the previous fragment without looking at the previous fragment first. This is because the last 1, 2 or 3 bytes of a multi-byte character may look exactly like a valid character.

If it is known that no fragmentation occurred to the string, then IFX_GL_EINVAL can be considered the same as IFX_GL_EILSEQ. However, if it is possible that fragmentation has occurred, then IFX_GL_EINVAL indicates to the caller that they need to fragment the string so that the last byte of each fragment is the last byte of the last character in the fragment and so that the first byte of each fragment is the first byte of the first character in the fragment.

Depending upon your application, you may either end up making a fragment even shorter than originally intended or you may have to replace the first 1, 2, or 3 bytes of the fragmented character with a padding character that is appropriate for your application and shift these bytes to the beginning of the next fragment.

Even though the GLS library functions can be used to detect this situation after it has occurred, it is much better to use them to avoid the situation.
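
Fragmenting on character boundaries can be sketched the same way: fill each fragment with whole characters only and start the next fragment at the following character. As before, this is an illustration under assumptions rather than GLS library code: demo_mblen() is a hypothetical UTF-8 stand-in for a bounded ifx_gl_mblen() call, and demo_fragment() is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a bounded ifx_gl_mblen(): UTF-8 length of the
 * character at mb, or -1 if it is illegal or does not fit in limit bytes. */
static int demo_mblen(const unsigned char *mb, size_t limit)
{
    int need, i;

    if (limit == 0)                  return -1;
    if (mb[0] < 0x80)                need = 1;
    else if ((mb[0] & 0xE0) == 0xC0) need = 2;
    else if ((mb[0] & 0xF0) == 0xE0) need = 3;
    else if ((mb[0] & 0xF8) == 0xF0) need = 4;
    else return -1;
    if ((size_t) need > limit)       return -1;
    for (i = 1; i < need; i++)
        if ((mb[i] & 0xC0) != 0x80)  return -1;
    return need;
}

/* Split src into fragments of at most frag_cap bytes, each starting and
 * ending on a character boundary. Stores each fragment's byte length in
 * lens[] and returns the fragment count, or -1 on illegal input, a
 * character wider than frag_cap, or more than max_frags fragments. */
static int demo_fragment(const unsigned char *src, size_t src_bytes,
                         size_t frag_cap, size_t *lens, int max_frags)
{
    size_t pos = 0;
    int count = 0;

    assert(src != NULL && lens != NULL);
    while (pos < src_bytes) {
        size_t len = 0;
        while (pos + len < src_bytes) {
            int n = demo_mblen(src + pos + len, src_bytes - pos - len);
            if (n < 0)
                return -1;
            if (len + (size_t) n > frag_cap)
                break; /* next character starts the next fragment */
            len += (size_t) n;
        }
        if (len == 0 || count == max_frags)
            return -1;
        lens[count++] = len;
        pos += len;
    }
    return count;
}
```

A 7-byte string made of characters that are 1, 2, 3, and 1 bytes long, split with a 3-byte fragment capacity, yields fragments of 3, 3, and 1 bytes. With a 2-byte capacity the split fails, because the 3-byte character cannot fit in any fragment.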

ACKNOWLEDGEMENT

Portions of this description were derived from the X/Open CAE Specification: "System Interfaces and Headers, Issue 4"; X/Open Document Number: C202; ISBN: 1-872630-47-2; Published by X/Open Company Ltd., U.K.