*************************************************************************

There's a problem when two abbreviations overlap: the number of
characters saved by using them will not be what was expected. Right now
this is handled, somewhat, by doing a dry run of the replacements, noting
any that happen fewer than 2 times, and eliminating them before doing it
for real.

A better solution might be to find all abbreviations that overlap and,
for every set of N conflicting ones, test all 2^N-1 non-empty
combinations of which ones to use, then use the best one. Ugh.

*************************************************************************

Add various options for lossy compression:
  - Ignore differences in whitespace inside replacements
  - Ignore differences in capitalization

*************************************************************************

Add ability to define a blacklist on the command line?

Maybe not: undoing an abbreviation you don't like is easy to do with sed
(at least if you haven't also asked for lossy compression).

*************************************************************************

Replacement character improvements:
  - Add more, and improve the mnemonic algorithm.
  - Try to ensure that the ones that are used are in common fonts.

*************************************************************************

Allow for compressing files that already have UTF-8 characters.

*************************************************************************
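
Sketch for the overlapping-abbreviations item at the top of this file.
This is only an illustration of the "try every subset" idea, not the
tool's actual code. The names compressed_length and best_subset are
made up, and the cost model is an assumption: each abbreviation used is
charged len(phrase) + len(symbol) characters for its dictionary entry.
Abbreviations are applied in the order given, so an earlier one can
consume text that a later, overlapping one would have matched.

```python
from itertools import combinations


def compressed_length(text, abbrevs):
    """Dry run: apply (phrase, symbol) pairs in order and return the total size.

    Assumed cost model: every abbreviation used also costs
    len(phrase) + len(symbol) characters for its dictionary entry.
    """
    overhead = 0
    for phrase, symbol in abbrevs:
        text = text.replace(phrase, symbol)
        overhead += len(phrase) + len(symbol)
    return len(text) + overhead


def best_subset(text, conflicting):
    """Try all 2^N - 1 non-empty subsets of the conflicting abbreviations.

    Returns (subset, size). The empty subset (no abbreviations at all)
    is the baseline, so a subset is only chosen if it actually saves space.
    """
    best = ((), len(text))
    for r in range(1, len(conflicting) + 1):
        for subset in combinations(conflicting, r):
            size = compressed_length(text, subset)
            if size < best[1]:
                best = (subset, size)
    return best


if __name__ == "__main__":
    # "abc" and "bca" overlap in this text; "#" and "@" stand in for
    # the real replacement characters.
    subset, size = best_subset("abcabcabcabc", [("abc", "#"), ("bca", "@")])
    print(subset, size)
```

The search is exponential in N, so it only makes sense when run
separately on each small group of mutually conflicting abbreviations,
not on the whole abbreviation list at once.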