Documentation on the workings of PhysCalc ----------------------------------------- Physcalc is divided into the following modules: PhysMain - Includes the main() function, the command parser, and the help routines. PhysNode - Routines for allocating, deallocating, and manipulating nodes. PhysConv - Routines for manipulating dimensioned numbers and doing unit conversions. PhysSolv - Routines for parsing and evaluating expressions. PhysOper - Functions which do the actual math; PhysCalc's built in operators and functions. PhysMLib - Standard routines lifted from some of my other programs, put here for completeness. The Origins and Evolution of PhysCalc Initially, PhysCalc was two unrelated programs: a program to do unit conversion, and a program to evaluate simple arithmetical expressions using the standard rules of precedence. The program to do unit conversion, called Convert, was a translation and enhancement of a BASIC program from Byte magazine (Vol 10/No 3 [March 85] by David Kahn). It was fixed at six fundamental dimensions, and came with many of the units found in PhysCalc. Before being integrated into PhysCalc, I added the feature of having it read in the unit data from a file, instead of just having it be static data inside the program. The program to evaluate expressions (SOLVE.C) was my own creation, as part of my effort to understand the nature of compilers. All it did was take expressions involving parenthesis, addition, subtraction, multiplication, and division, and computed the result using standard algebraic precedence of operations. After some more tinkering, I created a mechanism whereby instead of recursively calling a new function for each operator, function, or level of precedence, I simply had one routine which used a table of information and a stack to do most of the recursive-descent parsing. Somewhere along the way, I had the notion to combine the two programs together to compute arithmetic using dimensioned quantities. The SOLVE program already had a generic NUMBER type which could be redefined to be a dimensioned number. I soon added user-defined variables and functions, plus a help feature. The program initially fell naturally into three main components: main, solve, and convert. I divided the program up into modules mostly because they were too much to conveniently handle in one large text file. The Main module included the main() function, and the functions for the commands LET, DEFINE and HELP, and some miscellaneous functions. The Solve module contained the routines to parse and evaluate expressions - there was no distinction made between parsing and evaluating. The next step was to make the distinction between parsing and evaluating, which required an intermediate tree structure to hold the information. I put routines to do the support work for this into another file, Nodes. After the routines to parse and evaluate (in file Solve) became too big and numerous, I decided to split off the built-in operator functions and function functions, and the associated parsing data, into another file, Oper. When I decided to make this an entirely stand-alone program that could be packed up, sent off, and compiled elsewhere, I put the few functions I was using from my private library into another separate small module, MLib. At this point, the program was pretty powerfull and flexible, but the source code was slowly turning into mush as I added features. I had one large header file (PhysCalc.h) which contained all the declarations, includes, and structures common to all the modules, the result of which 90% of my functions were being declared for use by all modules, and the program was still essentially one large heavily-interconnected set of functions with no true modularity to my grouping of functions. So I concentrated on trying to really modularize the program. I explicitly declared functions to be LOCAL, IMPORT, or EXPORT, and did some rearranging to make as many functions LOCAL as possible. I tried to narrow down the scope of as many things as reasonably possible. I divided up the header file in two - one for structure declarations, includes, and constants, and the other for function and variable declarations. Internal program structure I have tried to separate the source code into several semi-separate chunks for different aspects of the program: main program and command handling, parsing and evaluating, built in functions and operators, definitions and declarations, unit conversion and handling, node handling. If major modifications are needed in the future, I will reorganize it along object-oriented lines. Expressions are parsed using a flexible recursive descent parser. The program does not use a parse table, but has prefix/infix/postfix and priority information provided for each operator, and so does not need separate routines to parse each operator or level of precedence. Function parse() in PHYSSOLV.C contains the guts of this mechanism. Expressions are parsed into a tree. Each node is either a kind of number (integer, real, fractional, dimensional), a string identifying a variable, or a list of sub-nodes. Depending on the node type and where the node is being used, the sub-nodes represent operands for a built in operator, a function name and parameters, or the information associated with a user defined variable or function. Upon evaluation, the tree is traveled and all variables are replaced with their associated values and functions are expanded and evaluated. If the program encounters an unknown, the expression tree at that point is left unevaluated. Documentation for the structures declared in PHYSCALC.H ------------------------------------------------------- typedef struct d_num { /* Structure for holding a dimensional number */ double num; /* scalar value */ int di[ MAXDIM ]; /* dimensions */ } DNUM; This is the structure for a dimensioned number, containing the scalar value, "num", and the dimensions "di". "di" is an array of integers, one for each possible fundamental dimension. The integer corresponding to a dimension indicates the power of that dimension (squared, cubed, ect.). A power of zero indicates that the dimensioned number does not have that dimension. "DNUM" is a shorthand reference to structure d_num. struct fractstruct { /* Fraction structure */ int numer,denom; }; Struct fractstruct represents a fraction. "numer" and "denom" contain the values of the numerator and denominator, respectively, of the fraction. The above two structures are analagous to the regular C numeric data types int, double, and so on. At the time I created them, I did not know much about object oriented programming and I did not worry about how they were used and accessed. If I wrote them again, I would limit access to the members through a few functions and macros to increase modularity and encapsulation. typedef struct nodestruct { /* Holds multiple var types */ int type; union { char str[1]; /* String */ double dbl; /* double number */ long lng; /* long integer */ struct fractstruct frt; /* Fraction */ struct d_num dmn; struct nodestruct *list[1]; /* List of ptrs to nodes */ } *data; /* Pointer to the above union */ } NODE,*NODEP; /* sizeof(NODE) should be 4 bytes */ This structure is the foundation of PhysCalc, allowing functions to pass around and work on data for which the type cannot be known at compile time. "NODE" and "NODEP" are shorthand references to this structure. The compile time sizeof() this node is correct, but for "*data", it is misleading, since at run time only as much room as is needed by a particular node is allocated, and the string and list types are of variable size. In a way, this is two structures, a node, and the data it points to. Since there should be a one to one correspondence between a node and it's data, it is conceptually one big structure. It could probably (and perhaps should) have been defined as one piece, without the extra pointer to the data. It ended up this way because it enable changing the node's type without risking changing the pointer to it. Nodes are the building blocks of expressions. Nodes can point to other nodes, or contain data. A node represents an arbitrarily complex expression. Nodes for functions or operators contain pointers to other nodes which contain the function name, and the expressions to be passed as parameters. "type" contains the type of the node - which way union "data" is to be interpreted. Type can be set to SNODE, INODE, FNODE, DNODE, NNODE, or LNODE+i, indicating "str", "lng", "frt", "dbl", "dmn", and "list" respectively. "*data" then points to the actual contents of the node. Types INODE and DNODE indicate usage of components "lng" and "dbl" respectively, and these are just plain numbers (long or double). Types FNODE and NNODE are for "frt" and "dmn", fractions or dimensional numbers. These are structures themselves instead of basic types. Type SNODE is for string nodes. The data pointer points to a NULL terminated array of characters. A string can be any size, but requires at least one character (the NULL), so "str" is declared to be one character. I would have declared component "str" without a size, but then sizeof() won't work. A string node is usually interpreted as referring to a variable, named in the string. Type LNODE indicates that the node is a function. The contents is a NULL-terminated list of pointers to nodes. The first pointer should be to a string node containing the name of the function, and the rest of the pointers are to expressions for each parameter. Types greater than LNODE are reserved for built-in operators. Operator number n has node type (LNODE + n). The list of pointers should contain one or two expressions (depending on the kind of operator). Module PhysNode.C contains the code for creating, destroying, copying, and manipulating nodes. For example, for node n, the expression (3 + f(x)) would be represented: ÉÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍ» ÉÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍ» º Type = (LNODE + 1) º Úĺ type = INODE º º data->list[0] = ÄÄÄÄÄ×ÄÄÄÙ º data->lng = 3 º º data->list[1] = ÄÄÄÄÄ×Ä¿ ÈÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍͼ º data->list[2] = NULL º ³ ÉÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍ» ÈÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍͼ ³ Úĺ type = SNODE º ÉÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍ» ³ ³ º data->str = "f" º º type = LNODE ºÙ ³ ÈÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍͼ º data->list[0] = ÄÄÄÄÄ×ÄÄÄÙ ÉÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍ» º data->list[1] = ÄÄÄÄÄ×ÄÄÄÄĺ Type = SNODE º º data->list[2] = NULL º º data->str = "x" º ÈÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍͼ ÈÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍͼ extern struct dimstruct { /* Holds a dimension definition */ struct dimstruct *next; /* ptr to next in list */ char *name; /* dimension name */ int dimension[ MAXDIM ]; /* dimensionality */ }; This is the structure for the storage of dimension information. The information is stored in a linked list, each element containing the name of the dimension an it's associated dimensions. "next" is the pointer to the next dimension in the list. "name" points to the name of the dimension. "dimension" is an array of integers representing the dimensions, like in structure d_num. extern struct varstruct { /* Holds a user-variable */ struct varstruct *next; char *name; NODEP value; int rflag; /* flag to prevent infinite recursion */ }; Variables are contained in a linked list. "next" points to the next variable in the list. "name" points to the name of the variaable. "value" points to a node containing an expression associated with the variable. "rflag" is a flag to prevent recursion when evaluating variables. If "rflag" is TRUE, the variable is left unevaluated because the variable is already being evaluated. extern struct ufstruct { /* Holds a user-function */ struct ufstruct *next; char *name; NODEP dummy; /* List-node of dummy strings */ NODEP value; }; User defined functions are also held in a linked list, where "next" points to the next function definition in the list. "name" points to the name of the function. "dummy" points to a special node (type LNODE) which is not a function, but a list of pointers to string nodes. These string nodes are the parameters to the function. "value" points to the function expression. Any variables in that expression matching one of the strings in "dummy" are replaced with the corresponding expression passed to the function upon evaluation. NOTE: PhysCalc was originally written to convert everything to uppercase and then work with it. It has been retrofitted to not convert filenames. Hopefully, the retrofit did not cause other case-sensitivity problems.