The CHAR signature
The CHAR signature defines a type char of characters and provides basic operations and predicates on values of that type. Characters are linearly ordered, and an encoding maps them into a contiguous range of non-negative integers in a way that preserves that ordering.
There are two structures matching the CHAR signature. The Char structure defines a superset of the usual ASCII characters and locale-independent operations on them. For this structure, Char.maxOrd = 255.
The optional WideChar structure defines wide characters, which are represented by a fixed number of 8-bit words (bytes). If the WideChar structure is provided, it is distinct from the Char structure.
Synopsis
signature CHAR
structure Char : CHAR
structure WideChar : CHAR
Interface
eqtype char
eqtype string
val minChar : char
val maxChar : char
val maxOrd : int
val ord : char -> int
val chr : int -> char
val succ : char -> char
val pred : char -> char
val < : (char * char) -> bool
val <= : (char * char) -> bool
val > : (char * char) -> bool
val >= : (char * char) -> bool
val compare : (char * char) -> order
val contains : string -> char -> bool
val notContains : string -> char -> bool
val toLower : char -> char
val toUpper : char -> char
val isAlpha : char -> bool
val isAlphaNum : char -> bool
val isAscii : char -> bool
val isCntrl : char -> bool
val isDigit : char -> bool
val isGraph : char -> bool
val isHexDigit : char -> bool
val isLower : char -> bool
val isPrint : char -> bool
val isSpace : char -> bool
val isPunct : char -> bool
val isUpper : char -> bool
val fromString : String.string -> char option
val scan : (Char.char, 'a) StringCvt.reader -> 'a -> (char * 'a) option
val toString : char -> String.string
val fromCString : String.string -> char option
val toCString : char -> String.string
Description
-
eqtype char
-
-
eqtype string
-
-
minChar
-
is the least character in the ordering. It always equals chr 0.
-
maxChar
-
is the greatest character in the ordering.
-
maxOrd
-
is the greatest character code; equals ord maxChar.
-
ord c
chr i
-
return the integer code of the character c and the character whose code is i, respectively. The function chr raises Chr if i < 0 or i > maxOrd. When chr is restricted to the interval [0, maxOrd], these two functions denote the character encoding function and its inverse.
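The round trip between ord and chr, and the Chr exception at the boundaries, can be illustrated with a minimal sketch. The helper chrOpt is hypothetical (it is not part of CHAR) and only wraps the exception in an option.

```sml
(* ord and chr are mutually inverse on the interval [0, Char.maxOrd]. *)
val code = Char.ord #"A"     (* 65, since Char extends ASCII *)
val back = Char.chr code     (* #"A" *)

(* chrOpt is a hypothetical wrapper, not part of CHAR:
   it converts the Chr exception into NONE. *)
fun chrOpt (i : int) : char option =
  SOME (Char.chr i) handle Chr => NONE

val inRange  = chrOpt Char.maxOrd        (* SOME Char.maxChar *)
val tooLarge = chrOpt (Char.maxOrd + 1)  (* NONE *)
val negative = chrOpt ~1                 (* NONE *)
```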
-
succ c
-
returns the character immediately following c in the ordering, or raises Chr if c = maxChar. When defined, succ c is equivalent to chr(ord c + 1).
-
pred c
-
returns the character immediately preceding c, or raises Chr if c = minChar. When defined, pred c is equivalent to chr(ord c - 1).
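As a small illustration of succ and pred, the sketch below enumerates an inclusive character range. The function charRange is a hypothetical helper, not part of CHAR; it stops before calling succ on the upper bound so that a range ending at maxChar does not raise Chr.

```sml
val b = Char.succ #"a"   (* #"b" *)
val a = Char.pred #"b"   (* #"a" *)

(* charRange is a hypothetical helper: all characters from lo to hi,
   inclusive. It never applies succ to hi itself. *)
fun charRange (lo : char, hi : char) : char list =
  if lo > hi then []
  else if lo = hi then [lo]
  else lo :: charRange (Char.succ lo, hi)

val digits = String.implode (charRange (#"0", #"9"))

(* succ is undefined at the top of the ordering and raises Chr. *)
val overflow = (Char.succ Char.maxChar; false) handle Chr => true
```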
-
c < d
c <= d
c > d
c >= d
-
compare characters in the character ordering. Note that the functions ord and chr preserve orderings.
-
compare (c, d)
-
returns LESS, EQUAL, or GREATER, according as c precedes, equals, or follows d in the character ordering.
-
contains s c
-
returns true if character c occurs in the string s; otherwise false.
Implementation note: In some implementations, the partial application of contains to s may build a table, which is used by the resulting function to decide whether a given character is in the string or not. Hence it may be expensive to compute val p = contains s, but fast to compute p c for any given character c.
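The usage pattern suggested by the implementation note is to apply contains to the string once and reuse the resulting predicate. A minimal sketch follows; the names isVowel and countVowels are hypothetical, and whether a table is actually built depends on the implementation.

```sml
(* Partial application happens once; an implementation may build
   its lookup table at this point. *)
val isVowel = Char.contains "aeiouAEIOU"

(* countVowels is a hypothetical helper that reuses the predicate
   for every character of its argument. *)
fun countVowels (s : string) : int =
  List.length (List.filter isVowel (String.explode s))

val n = countVowels "Standard ML"   (* 2 *)
```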
-
notContains s c
-
returns true if character c does not occur in the string s; false otherwise. Equivalent to not(contains s c).
Implementation note: As with contains, notContains may be implemented via table lookup.
-
toLower c
toUpper c
-
returns the lowercase (respectively, uppercase) letter corresponding to c if c is a letter; otherwise returns c.
-
isAlpha c
-
returns true if c is a letter (lowercase or uppercase).
-
isAlphaNum c
-
returns true if c is alphanumeric (a letter or a decimal digit).
-
isAscii c
-
returns true if c is a (seven-bit) ASCII character, i.e., 0 <= ord c <= 127. Note that this function is independent of locale.
-
isCntrl c
-
returns true if c is a control character. Equivalent to not o isPrint.
-
isDigit c
-
returns true if c is a decimal digit (0-9).
-
isGraph c
-
returns true if c is a graphical character, that is, it is printable and not a whitespace character.
-
isHexDigit c
-
returns true if c is a hexadecimal digit (0-9, a-f, A-F).
-
isLower c
-
returns true if c is a lowercase letter.
-
isPrint c
-
returns true if c is a printable character (space or visible), i.e., not a control character.
-
isSpace c
-
returns true if c is a whitespace character (space, newline, tab, carriage return, vertical tab, formfeed).
-
isPunct c
-
returns true if c is a punctuation character: graphical but not alphanumeric.
-
isUpper c
-
returns true if c is an uppercase letter.
-
fromString s
scan getc strm
-
scan a character (including space) or an SML escape sequence representing a character from the prefix of a string or a character stream. After a successful conversion, fromString ignores any additional characters in s. If no conversion is possible, e.g., if the first character is non-printable (i.e., not in the ASCII range 0x20-0x7E) or starts an illegal escape sequence, NONE is returned.
The allowable escape sequences are:
\a        Alert (ASCII 0x07)
\b        Backspace (ASCII 0x08)
\t        Horizontal tab (ASCII 0x09)
\n        Linefeed or newline (ASCII 0x0A)
\v        Vertical tab (ASCII 0x0B)
\f        Form feed (ASCII 0x0C)
\r        Carriage return (ASCII 0x0D)
\\        Backslash
\"        Double quote
\^c       A control character whose encoding is C - 64, where C is the encoding of the character c, with C in the range [64,95].
\ddd      The character whose encoding is the number ddd, three decimal digits denoting an integer in the range [0,255].
\uxxxx    The character whose encoding is the number xxxx, four hexadecimal digits denoting an integer in the ordinal range of the alphabet.
\f...f\   This sequence is ignored, where f...f stands for a sequence of one or more formatting characters.
In the escape sequences involving decimal or hexadecimal digits, the sequence of digits is taken to be the longest sequence of such characters. If the resulting value cannot be represented in the character set, NONE is returned.
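A few conversions may make the scanning rules concrete. This is a sketch against the Char structure; note that each SML string literal below doubles the backslash, so "\\n" is the two-character string consisting of a backslash and the letter n. The use of Substring.getc as the character reader for scan is one possible choice, not the only one.

```sml
val plain   = Char.fromString "abc"     (* SOME #"a"; the rest is ignored *)
val newline = Char.fromString "\\n"     (* SOME #"\n" *)
val ctrlA   = Char.fromString "\\^A"    (* SOME #"\001" *)
val decimal = Char.fromString "\\065"   (* SOME #"A": decimal 65 *)
val bad     = Char.fromString "\\q"     (* NONE: illegal escape sequence *)
val empty   = Char.fromString ""        (* NONE: nothing to scan *)

(* scan works over any character stream; here the stream is a substring.
   The unconsumed remainder is returned alongside the character. *)
val scanned = Char.scan Substring.getc (Substring.full "\\txyz")
val rest =
  case scanned of
      SOME (c, ss) => SOME (c, Substring.string ss)
    | NONE => NONE
```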
-
toString c
-
returns a printable string representation of the character, using, if necessary, SML escape sequences. Printable characters, except for #"\\" and #"\"", are left unchanged. Backslash #"\\" becomes "\\\\"; double quote #"\"" becomes "\\\"". The common control characters are converted to two-character escape sequences:
Alert (ASCII 0x07)                 "\\a"
Backspace (ASCII 0x08)             "\\b"
Horizontal tab (ASCII 0x09)        "\\t"
Linefeed or newline (ASCII 0x0A)   "\\n"
Vertical tab (ASCII 0x0B)          "\\v"
Form feed (ASCII 0x0C)             "\\f"
Carriage return (ASCII 0x0D)       "\\r"
The remaining characters whose codes are less than 32 are represented by three-character strings in "control character" notation, e.g., #"\000" maps to "\\^@", #"\001" maps to "\\^A", etc. All other characters (i.e., those whose codes are 127 or greater) are mapped to four-character strings of the form "\\ddd", where ddd are the three decimal digits corresponding to a character's code.
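The mapping can be checked on one representative from each class of character. The sketch below assumes the Char structure; roundTrips is a hypothetical helper that checks fromString recovers the original character from the output of toString.

```sml
val s1 = Char.toString #"a"      (* "a": printable, unchanged *)
val s2 = Char.toString #"\n"     (* "\\n": backslash followed by n *)
val s3 = Char.toString #"\\"     (* "\\\\": two backslashes *)
val s4 = Char.toString #"\""     (* "\\\"": backslash, double quote *)
val s5 = Char.toString #"\000"   (* "\\^@": control character notation *)
val s6 = Char.toString #"\127"   (* "\\127": three decimal digits *)

(* roundTrips is a hypothetical helper: fromString should invert toString. *)
fun roundTrips (c : char) : bool =
  Char.fromString (Char.toString c) = SOME c

val sampleOk = List.all roundTrips [#"a", #"\n", #"\\", #"\""]
```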
-
fromCString s
-
scans a character (including space) or a C escape sequence representing a character from the prefix of a string. After a successful conversion, fromCString ignores any additional characters in s. If no conversion is possible, e.g., if the first character is non-printable (i.e., not in the ASCII range 0x20-0x7E) or starts an illegal escape sequence, NONE is returned.
The allowable escape sequences are given below (cf. Section 6.1.3.4 of the ISO C standard ISO/IEC 9899:1990).
\a        Alert (ASCII 0x07)
\b        Backspace (ASCII 0x08)
\t        Horizontal tab (ASCII 0x09)
\n        Linefeed or newline (ASCII 0x0A)
\v        Vertical tab (ASCII 0x0B)
\f        Form feed (ASCII 0x0C)
\r        Carriage return (ASCII 0x0D)
\?        Question mark
\\        Backslash
\"        Double quote
\'        Single quote
\^c       A control character whose encoding is C - 64, where C is the encoding of the character c, with C in the range [64,95].
\ddd      The character whose encoding is the number ddd, where ddd consists of one to three octal digits.
\uxxxx    The character whose encoding is the number xxxx, where xxxx is a sequence of hexadecimal digits.
In the escape sequences involving octal or hexadecimal digits, the sequence of digits is taken to be the longest sequence of such characters. If the resulting value cannot be represented in the character set, NONE is returned.
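The difference from fromString shows up most clearly in the numeric escapes: the same three digits are read as octal by fromCString and as decimal by fromString. A minimal sketch, assuming the Char structure (backslashes are doubled in the SML literals):

```sml
val q   = Char.fromCString "\\?"     (* SOME #"?" *)
val sq  = Char.fromCString "\\'"     (* SOME #"'" *)
val nl  = Char.fromCString "\\n"     (* SOME #"\n" *)

(* Octal 101 is 65, the code of #"A". *)
val oct = Char.fromCString "\\101"   (* SOME #"A" *)

(* For contrast, fromString reads the same digits as decimal 101. *)
val dec = Char.fromString "\\101"    (* SOME #"e" *)
```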
-
toCString c
-
returns a printable string corresponding to c, with non-printable characters replaced by C escape sequences. Specifically, printable characters, except for #"\\", #"\"", #"?" and #"'", are left unchanged. Backslash #"\\" becomes "\\\\"; double quote #"\"" becomes "\\\""; question mark #"?" becomes "\\?"; single quote #"'" becomes "\\'". The common control characters are converted to two-character escape sequences:
Alert (ASCII 0x07)                 "\\a"
Backspace (ASCII 0x08)             "\\b"
Horizontal tab (ASCII 0x09)        "\\t"
Linefeed or newline (ASCII 0x0A)   "\\n"
Vertical tab (ASCII 0x0B)          "\\v"
Form feed (ASCII 0x0C)             "\\f"
Carriage return (ASCII 0x0D)       "\\r"
All other characters are represented by one to three octal digits, corresponding to a character's code, preceded by a backslash.
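A short sketch of the C-style escaping follows. The function escapeC is a hypothetical helper built from String.translate; it simply applies toCString to every character of a string.

```sml
val c1 = Char.toCString #"?"    (* "\\?" *)
val c2 = Char.toCString #"'"    (* "\\'" *)
val c3 = Char.toCString #"\t"   (* "\\t" *)
val c4 = Char.toCString #"x"    (* "x": printable, unchanged *)

(* escapeC is a hypothetical helper: escape a whole string
   character by character. *)
fun escapeC (s : string) : string =
  String.translate Char.toCString s

val lit = escapeC "it's\n"      (* "it\\'s\\n" *)
```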
Discussion
In WideChar, the functions toLower, toUpper, isAlpha, ..., isUpper are locale-dependent. In Char, these functions are locale-independent, with the following semantics:
isUpper c    | true if #"A" <= c andalso c <= #"Z"
isLower c    | true if #"a" <= c andalso c <= #"z"
isDigit c    | true if #"0" <= c andalso c <= #"9"
isAlpha c    | true if isUpper c orelse isLower c
isAlphaNum c | true if isAlpha c orelse isDigit c
isHexDigit c | true if isDigit c orelse (#"a" <= c andalso c <= #"f") orelse (#"A" <= c andalso c <= #"F")
isGraph c    | true if #"!" <= c andalso c <= #"~"
isPrint c    | true if isGraph c orelse c = #" "
isPunct c    | true if isGraph c andalso not (isAlphaNum c)
isCntrl c    | true if not (isPrint c)
isSpace c    | true if (#"\t" <= c andalso c <= #"\r") orelse c = #" "
isAscii c    | true if 0 <= ord c andalso ord c <= 127
toLower c    | chr (ord c + 32) if isUpper c; otherwise, c
toUpper c    | chr (ord c - 32) if isLower c; otherwise, c
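The semantics in the table can be transcribed directly and compared with the library functions over the whole character range. The primed names and the helper agree are hypothetical; the sketch assumes a conforming Char structure, in which all three comparisons should hold.

```sml
(* Direct transcriptions of three table rows (primed names are hypothetical).
   isSpace' treats tab through carriage return, plus the space character,
   as whitespace. *)
fun isSpace' (c : char) : bool =
  (#"\t" <= c andalso c <= #"\r") orelse c = #" "

fun isPunct' (c : char) : bool =
  Char.isGraph c andalso not (Char.isAlphaNum c)

fun toLower' (c : char) : char =
  if Char.isUpper c then Char.chr (Char.ord c + 32) else c

(* Every character of the Char structure, from minChar to maxChar. *)
val allChars = List.tabulate (Char.maxOrd + 1, Char.chr)

(* agree is a hypothetical helper: do f and g coincide on every character? *)
fun agree (f, g) = List.all (fn c => f c = g c) allChars

val okSpace = agree (isSpace', Char.isSpace)
val okPunct = agree (isPunct', Char.isPunct)
val okLower = agree (toLower', Char.toLower)
```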
See Also
Locale, MultiByte, STRING