
Which character encoding should I use for my content, and how do I apply it to my content?

Content is composed of a sequence of characters. Characters represent letters of the alphabet, punctuation, etc. But content is stored in a computer as a sequence of bytes, which are numeric values. Sometimes more than one byte is used to represent a single character. Like codes used in espionage, the way that the sequence of bytes is converted to characters depends on what key was used to encode the text. In this context, that key is called a character encoding.

This article offers simple advice on which character encoding to use for your content, and how to apply it, i.e. how to actually produce a document in that encoding. If you need to better understand what characters and character encodings are, see the article Character encodings for beginners.

Quick answer

Choose UTF-8 for all content and consider converting any content in legacy encodings to UTF-8.

If you really can't use a Unicode encoding, check that there is wide browser support for the page encoding that you have selected, and that the encoding is not on the list of encodings to be avoided according to recent specifications.

Check whether your choice will be affected by HTTP server-side settings.

In addition to declaring the encoding of the document inside the document and/or on the server, you need to save the text in that encoding to apply it to your content.

Developers also need to ensure that the various parts of the system can communicate with each other.

Details

Applying an encoding to your content

Content authors should declare the character encoding of their pages using one of the methods described in Declaring character encodings in HTML.

However, it is important to understand that just declaring an encoding inside a document or on the server won't actually change the bytes; you need to save the text in that encoding to apply it to your content. (The declaration just helps the browser interpret the sequences of bytes in which the text is stored.)

A Unicode-based encoding such as UTF-8 can support many languages and can accommodate pages and forms in any mixture of those languages. Its use also eliminates the need for server-side logic to individually determine the character encoding for each page served or each incoming form submission. This significantly reduces the complexity of dealing with a multilingual site or application.

A Unicode encoding also allows many more languages to be mixed on a single page than any other choice of encoding.

Support for a given encoding, even a Unicode encoding, does not necessarily imply that a user agent will correctly display the text. Numerous scripts, such as Arabic and Indic, require additional rules to transform the character sequence in memory to an appropriate sequence of font glyphs for display.

Any barriers to using Unicode are very low these days. In fact, in January 2012 Google reported that over 60% of the Web in their sample of several billion pages was now using UTF-8. Add to that the figure for ASCII-only web pages (since ASCII is a subset of UTF-8), and the figure rises to around 80%.

There are three different Unicode character encodings: UTF-8, UTF-16 and UTF-32. Of these three, only UTF-8 should be used for Web content.
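
As a small illustration of how the same characters become different bytes depending on the encoding, and of why text has to be saved in an encoding rather than merely declared to be in it, the following Python sketch may help. The sample string, the variable names and the helper `convert_to_utf8` are illustrative assumptions, not something the article prescribes.

```python
# A minimal sketch: one short string, several encodings.
# "caf\u00e9" is "cafe" with a precomposed e-acute (U+00E9) as its last character.
text = "caf\u00e9"

# The same four characters are stored as different byte sequences.
utf8 = text.encode("utf-8")       # e-acute takes two bytes here
utf16 = text.encode("utf-16-le")  # two bytes per character for this text
utf32 = text.encode("utf-32-le")  # four bytes per character

print(len(text), len(utf8), len(utf16), len(utf32))

# Reading UTF-8 bytes with the wrong "key" does not fail here,
# it silently produces the wrong characters.
garbled = utf8.decode("windows-1252")
print(garbled)


def convert_to_utf8(data: bytes, source_encoding: str) -> bytes:
    """Hypothetical helper: re-encode legacy bytes as UTF-8.

    Assumes the caller knows the real source encoding; declaring
    UTF-8 without this step would leave the bytes unchanged.
    """
    return data.decode(source_encoding).encode("utf-8")


# The same word as it might be stored in a legacy ISO-8859-1 file.
legacy_bytes = b"caf\xe9"
print(convert_to_utf8(legacy_bytes, "iso-8859-1") == utf8)
```

The point of the last step is that conversion works on the stored bytes: the text is decoded using the encoding it was really saved in, then written out again as UTF-8.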
