SYNOPSIS
unihist ([option flags])
DESCRIPTION
unihist generates a histogram of the characters in its input, which
must be encoded in UTF-8 Unicode. By default, for each character it
prints the frequency of the character as a percentage of the total, the
absolute number of occurrences in the input, the UTF-32 code in
hexadecimal, and, if the character is displayable, the glyph itself as
UTF-8 Unicode. Command line flags allow unwanted information to be
suppressed. In particular, suppressing the percentages and counts
yields a list of the unique characters in the input.
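As a rough illustration of the default fields described above, the
following Python sketch computes a percentage, a count, a code point,
and a glyph for each distinct character of a UTF-8 byte string. It is
not unihist's implementation: the helper name char_histogram, the field
order, and the number formats used when printing are assumptions made
for this example only.

```python
from collections import Counter


def char_histogram(data: bytes):
    """Return (percentage, count, code point, glyph) rows ordered by code point.

    Hypothetical helper for illustration; not part of unihist.
    """
    text = data.decode("utf-8")  # raises UnicodeDecodeError on invalid UTF-8
    counts = Counter(text)
    total = sum(counts.values())
    rows = []
    for ch in sorted(counts):  # single characters sort by code point
        n = counts[ch]
        # Leave the glyph empty for characters that are not displayable.
        glyph = ch if ch.isprintable() else ""
        rows.append((100.0 * n / total, n, ord(ch), glyph))
    return rows


if __name__ == "__main__":
    for pct, n, code, glyph in char_histogram("abba\n".encode("utf-8")):
        # Assumed layout: percentage, count, hexadecimal code, glyph.
        print(f"{pct:7.3f} {n:6d} 0x{code:06X} {glyph}")
```

Dropping the first two fields of each row leaves one entry per distinct
character, which corresponds to the list of unique characters mentioned
above.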
Output is produced ordered by character code. To sort it in descending
order of frequency, pipe the output into the command:
sort -k1 -n -r
By default, unihist handles all of Unicode. To reduce memory usage and
increase speed, it may be compiled so as to handle only the Basic
Multilingual Plane (plane 0) by defining BMPONLY.
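To make the size of that restriction concrete, the Python sketch below
shows which code points belong to plane 0. Plane 0 covers 0x0000 through
0xFFFF (65,536 code points), while all of Unicode runs through 0x10FFFF
(1,114,112 code points). How unihist stores its counts is not stated
here, so the sketch only classifies code points; the names plane and
in_bmp are hypothetical.

```python
BMP_MAX = 0xFFFF        # last code point of plane 0
UNICODE_MAX = 0x10FFFF  # last code point of Unicode


def plane(code_point: int) -> int:
    """Return the Unicode plane number (0 to 16) of a code point."""
    if not 0 <= code_point <= UNICODE_MAX:
        raise ValueError("not a Unicode code point")
    return code_point >> 16  # each plane spans 0x10000 code points


def in_bmp(ch: str) -> bool:
    """True if the single character ch lies in plane 0."""
    return plane(ord(ch)) == 0


if __name__ == "__main__":
    for ch in ("A", "\u4e2d", "\U0001F600"):
        print(f"U+{ord(ch):04X} plane {plane(ord(ch))} in BMP: {in_bmp(ch)}")
```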
COMMAND LINE FLAGS
-c Suppress printing of counts and percentages.
-g Suppress printing of glyphs.
-h Print usage information.
-u Suppress printing of the Unicode code as text.
-v Print version information.
SEE ALSO
uniname(1)
REFERENCES
Unicode Standard, version 5.0
AUTHOR
Bill Poser
billposer@alum.mit.edu
LICENSE
GNU General Public License