Conversion Tools: Unicode
AutoUniConv is an automatic Unicode converter.
You do not have to know the input's charset - AutoUniConv automatically
identifies the charset and converts it to Unicode afterwards. The most
common Unicode Transformation Format schemes (UTF-8, UTF-16, UTF-32) are
supported.
AutoUniConv is a C/C++ library with an easy-to-use interface and no
additional software dependencies.
license: commercial
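How AutoUniConv identifies a charset is not described here. Purely as an illustration of the detect-then-convert idea, the sketch below checks for a byte-order mark, then tries strict UTF-8, and finally falls back to an assumed legacy charset. The function name and the Latin-1 fallback are hypothetical; real detectors typically rely on statistical analysis instead.

```python
import codecs

# UTF-32 is tested first because its little-endian BOM begins with the UTF-16 one.
BOMS = (
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
)

def guess_and_decode(data: bytes, fallback: str = "latin-1"):
    """Return (encoding, text); 'fallback' is an assumption, not a detection."""
    for bom, enc in BOMS:
        if data.startswith(bom):
            # The utf-16/utf-32 codecs read the BOM and strip it.
            return enc, data.decode(enc)
    try:
        return "utf-8", data.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8: assume the legacy single-byte charset.
        return fallback, data.decode(fallback)
```

Once the text is decoded, it can be written out in any UTF scheme with, for example, text.encode("utf-16").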
convmv
can convert a single filename, a directory tree or all files on a filesystem to a different encoding. It only converts the encoding of filenames, not file contents. A special feature of convmv is that it also takes care of symlinks: the encoding of the symlink's target will be converted if the symlink itself is being converted.
It can also convert directories that are already partially UTF-8 encoded to UTF-8.
license: GPL
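The core of such a filename conversion can be sketched in a few lines. This is not convmv's implementation: the function name and the default encodings are assumptions, and skipping names that already decode as UTF-8 is only one simple way to cope with a partially converted tree.

```python
def convert_name(name: bytes, src: str = "latin-1", dst: str = "utf-8") -> bytes:
    """Re-encode one filename given as raw bytes.

    Names that already decode cleanly in the target encoding (pure ASCII,
    or converted earlier) are returned unchanged.
    """
    try:
        name.decode(dst)
        return name
    except UnicodeDecodeError:
        return name.decode(src).encode(dst)
```

A full tool would walk the tree with byte paths and rename each entry whose converted name differs. Note that a legacy name whose bytes happen to form valid UTF-8 is skipped by this heuristic.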
uni2ascii
converts UTF-8 Unicode to any of a variety of 7-bit ASCII
equivalents: hexadecimal and decimal HTML numeric character entities,
\u-escapes, standard hexadecimal, and raw hexadecimal. Such ASCII
equivalents are useful when including Unicode text in program source, when
entering text into Web programs that can handle the Unicode character set
but are not 8-bit safe, and when debugging.
license: GNU General Public License (GPL)
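Three of the ASCII equivalents named above can be illustrated as follows. This is a minimal sketch, not uni2ascii's code: the function name and the style keys are invented for the example, and the \u form shown assumes characters in the Basic Multilingual Plane.

```python
FORMATS = {
    "html-hex": "&#x{:04X};",   # hexadecimal HTML numeric character entity
    "html-dec": "&#{};",        # decimal HTML numeric character entity
    "u-escape": "\\u{:04X}",    # \u-escape as used in program source
}

def to_ascii(text: str, style: str = "html-hex") -> str:
    """Replace every non-ASCII character with a 7-bit ASCII equivalent."""
    fmt = FORMATS[style]
    return "".join(ch if ord(ch) < 128 else fmt.format(ord(ch)) for ch in text)
```

For UTF-8 input read from a file, decode the bytes first, for example to_ascii(data.decode("utf-8")).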
See Linux-Magazin 9/2000 p. 136ff. (in German).
#!/usr/bin/perl -piw
no warnings utf8; # switch off UTF-8 warnings (works around a bug)
tr/\x80-\xff//CU; # conversion from Latin-1 (8859-1) to UTF-8
from Linux-Magazin 9/2000 p. 136ff.
#!/usr/bin/perl -piw
tr/\0-\xff//UC; # conversion from UTF-8 to Latin-1 (8859-1)
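The C and U modifiers of tr/// were a Perl 5.6 feature that later Perl releases no longer accept, so the two scripts above will not run on a current Perl. The same two conversions can be sketched in Python; the function names are chosen for this example only.

```python
def latin1_to_utf8(data: bytes) -> bytes:
    """Latin-1 (ISO 8859-1) bytes to UTF-8 bytes; every byte value is valid input."""
    return data.decode("latin-1").encode("utf-8")

def utf8_to_latin1(data: bytes) -> bytes:
    """UTF-8 bytes to Latin-1 bytes.

    Raises UnicodeEncodeError if the text contains a character above U+00FF,
    because such characters have no Latin-1 representation.
    """
    return data.decode("utf-8").encode("latin-1")
```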
Chinese Big5 -> Unicode online tools.
It is a Perl program; online GB (Simplified Chinese) and JP -> Unicode conversion is planned for later.
platform: Web-based
license: free to use online
contributor <help_AT_hitstar.com>
GNU Recode
2utf
Translates various charsets to UTF-8
platform: Linux
utf2any translates a file encoded in UTF-7 or UTF-8
(Unicode) into any 7- or 8-bit text format.
Currently, mapping tables are supplied for
LaTeX, HTML, iso-8859-1 and iso-8859-15. These
tables don't provide a complete mapping, but
they can be easily extended to personal needs.
ftp.dante.de tex-archive/support
platform: Linux, Unix, MS-DOS
license: GPL
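The idea of an extensible mapping table can be sketched as below. The table contents, the names, and the choice to pass unmapped characters through unchanged are assumptions for illustration; utf2any's own table format and behaviour are not shown here.

```python
# Hypothetical, deliberately incomplete table from characters to LaTeX markup.
LATEX_TABLE = {
    "ä": '\\"a',
    "ö": '\\"o',
    "ü": '\\"u',
    "ß": "{\\ss}",
    "é": "\\'e",
}

def utf_to_latex(data: bytes, encoding: str = "utf-8", table=LATEX_TABLE) -> str:
    """Decode UTF-8 (or UTF-7) input and replace each mapped character."""
    text = data.decode(encoding)
    # Characters missing from the table are left as they are.
    return text.translate(str.maketrans(table))
```

Extending the table for personal needs amounts to passing a larger dictionary, for example {**LATEX_TABLE, "ñ": "\\~n"}.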
The
Unicode to UTF-8 Converter
takes Unicode values (in hexadecimal) and
encodes them as UTF-8, optionally displaying the resulting character and/or
the Unicode description thereof.
license: GNU General Public License (GPL)
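What such a converter does can be reproduced with Python's standard library. This is a minimal sketch with an invented function name, not the tool's own code.

```python
import unicodedata

def describe(hex_value: str):
    """Return (UTF-8 bytes as hex, character, Unicode name) for a hex code point."""
    ch = chr(int(hex_value, 16))
    # Surrogate code points (D800-DFFF) cannot be encoded and raise UnicodeEncodeError.
    utf8 = ch.encode("utf-8")
    hex_bytes = " ".join(f"{b:02x}" for b in utf8)
    # Code points without an assigned name get a placeholder.
    return hex_bytes, ch, unicodedata.name(ch, "<unnamed>")
```

For example, describe("20AC") returns ("e2 82 ac", "€", "EURO SIGN").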
hutrans
converts plain text into UTF-8 Unicode encoding. The original file
should contain HTML-style tags for any non-ASCII character. This program is a
complement to the functionality of uhtrans, and should be typically used along
with it.
platform: Linux
license: BSD type
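The exact tag syntax hutrans expects is not given here. Assuming the tags are ordinary HTML character references such as &eacute; or &#8364;, the conversion can be sketched as follows; the function name is hypothetical.

```python
import html

def entities_to_utf8(ascii_text: str) -> bytes:
    """Expand HTML character references and encode the result as UTF-8."""
    return html.unescape(ascii_text).encode("utf-8")
```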
ptrans
converts UTF-8 Unicode files into
plain text. Along with other programs found on the same web page, it completes
a suite of i18n tools allowing you to convert text files from any character
encoding to any other character encoding, and to and from UTF-8 Unicode encoding.
platform: Linux
license: BSD type
Letter Database
offers online conversion of languages, character sets, names, etc.
Convert Character Set
is meant to convert text strings between
different character set encodings. It features conversion between single
byte character sets, from single byte to multi-byte character sets
(UTF-8), and from multi-byte to single byte. All conversion output can be
saved with numeric entities (browser character set independent). The main
requirement is that a character has to be in both character sets, or it
will return an error.
license: Freeware
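The behaviour described above (strict conversion that fails on characters missing from the target set, with optional numeric-entity output) can be sketched like this. The function and parameter names are assumptions for the example, not the tool's interface.

```python
def convert_charset(data: bytes, src: str, dst: str, entities: bool = False) -> bytes:
    """Convert bytes from charset src to charset dst.

    Raises UnicodeEncodeError if a character does not exist in dst.
    With entities=True the result is pure ASCII, with every non-ASCII
    character written as a decimal numeric entity.
    """
    text = data.decode(src)
    converted = text.encode(dst)  # strict: fails on unmappable characters
    if entities:
        return text.encode("ascii", "xmlcharrefreplace")
    return converted
```

For instance, converting the Windows-1252 byte 0x80 (the euro sign) to Latin-1 raises an error, because Latin-1 has no euro sign.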