Markup languages are typically defined in terms of UCS or Unicode characters. That is, a document consists, at its most fundamental level of abstraction, of a sequence of characters, which are abstract units that exist independently of any encoding. Ideally, when the characters of a document are encoded as a sequence of bits for storage or transmission over a network, the encoding used will be one that can represent every character in the document, if not every character in Unicode, directly as a particular bit sequence.

Sometimes, though, for reasons of convenience or because of technical limitations, documents are stored in an encoding that cannot represent some characters directly. For example, the widely used encodings based on ISO 8859 can represent at most 256 unique characters, one 8-bit byte each. In practice, a document is rarely allowed to use more than one encoding internally, so the onus is usually on the markup language to give authors a way to express unencodable characters in terms of encodable ones. This is generally done through some kind of "escaping" mechanism. The SGML-based markup languages let document authors use special sequences of characters from the ASCII range (the first 128 code points of Unicode) to represent, or reference, any Unicode character, regardless of whether that character is directly available in the document's encoding.
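As a rough illustration of this escaping idea, the Python sketch below encodes text into ISO 8859-1 (Latin-1) and lets Python's standard "xmlcharrefreplace" error handler substitute a decimal numeric character reference for every character that encoding lacks. The choice of Python, of Latin-1 as the target encoding, and the helper name encode_with_ncrs are assumptions made for the example.

```python
# Sketch: escape characters that an 8-bit ISO 8859-1 (Latin-1) encoding cannot hold.
# Python's built-in "xmlcharrefreplace" error handler replaces each unencodable
# character with a decimal numeric character reference such as &#931;.


def encode_with_ncrs(text: str, encoding: str = "iso-8859-1") -> bytes:
    """Encode text, escaping characters the target encoding lacks as NCRs (hypothetical helper)."""
    return text.encode(encoding, errors="xmlcharrefreplace")


if __name__ == "__main__":
    # ß (U+00DF) and Æ (U+00C6) exist in Latin-1 and become single bytes;
    # Σ (U+03A3) does not, so it is written as an NCR made of ASCII characters.
    print(encode_with_ncrs("Straße, Æ, Σ"))  # b'Stra\xdfe, \xc6, &#931;'
```

Only the characters the chosen encoding cannot represent are escaped; everything else is stored directly, which is the trade-off the paragraph above describes.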
A numeric character reference (NCR) is a common markup construct used in SGML and SGML-derived markup languages such as HTML and XML. It consists of a short sequence of characters that, in turn, represents a single character. Since WebSGML, XML and HTML 4, the code points of the Universal Character Set (UCS) of Unicode are used. NCRs are typically used to represent characters that are not directly encodable in a particular document (for example, international characters that do not fit in the 8-bit character set being used) or that have special syntactic meaning in the language. When the document is interpreted by a markup-aware reader, each NCR is treated as if it were the character it represents.

In SGML, HTML, and XML, each of the following characters can be written with a decimal or a hexadecimal numeric character reference:

U+00DF ß LATIN SMALL LETTER SHARP S: &#223; or &#xDF;
U+00C6 Æ LATIN CAPITAL LETTER AE: &#198; or &#xC6;
U+03A3 Σ GREEK CAPITAL LETTER SIGMA: &#931; or &#x3A3;

Leading zeros are permitted (for example &#0931; or &#x03A3;), and hexadecimal digits may be written in upper or lower case. Every printable ASCII character (U+0020 through U+007E) can be referenced the same way; for example, the capital letter A is &#65; or &#x41;.
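The Python sketch below shows both directions for the examples above: producing the decimal and hexadecimal NCRs for a character, and a minimal regex-based resolver that replaces each NCR with the character it represents, roughly as a markup-aware reader would. The helper names ncr_forms and resolve_ncrs are hypothetical. The resolver handles only numeric references (not named entities such as &amp;) and does no validation beyond leaving values above U+10FFFF untouched; for real HTML input, the standard library's html.unescape is the safer choice.

```python
import html
import re

# Matches a decimal NCR (&#931;) or a hexadecimal one (&#x3A3;).
# HTML accepts "x" or "X" here; XML allows only a lowercase "x".
_NCR_PATTERN = re.compile(r"&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));")


def ncr_forms(ch: str) -> list[str]:
    """Return decimal, hexadecimal, and zero-padded hexadecimal NCRs for one character."""
    code_point = ord(ch)
    return [f"&#{code_point};", f"&#x{code_point:X};", f"&#x{code_point:04X};"]


def resolve_ncrs(text: str) -> str:
    """Replace each numeric character reference with the character it represents."""

    def replace(match: re.Match) -> str:
        hex_digits, dec_digits = match.group(1), match.group(2)
        code_point = int(hex_digits, 16) if hex_digits is not None else int(dec_digits)
        if code_point > 0x10FFFF:
            return match.group(0)  # outside Unicode: leave the reference as written
        return chr(code_point)

    return _NCR_PATTERN.sub(replace, text)


if __name__ == "__main__":
    for ch in "ßÆΣ":
        print(ch, ncr_forms(ch))
    print(resolve_ncrs("&#931; &#x3A3; &#x3a3; &#0931;"))  # Σ Σ Σ Σ
    # The standard library resolves the same references.
    print(html.unescape("&#931; &#x3A3; &#x3a3; &#0931;"))  # Σ Σ Σ Σ
```

All four spellings of Sigma resolve to the same character, which is the sense in which the different NCR forms are interchangeable.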