Jud Cole is the President of Blink, Inc and has been programming on microcomputers since 1979 in many languages including Assembler, C, Pascal, PL/I, dBase and CA-Clipper. After working as a consultant programmer in the early 1980s he worked at IBM for three years in their PC division doing training and support on their entire range of PC products. Following this he wrote contract database applications interfacing CA-Clipper databases to external devices such as magnetic card readers, vehicle tachographs and mechanised storage systems. During 1989 and 1990 he developed BLINKER, the first dynamic overlay linker. Since then he has been enhancing and promoting BLINKER and speaking at user groups and conferences.
The aim of this section is to impart a greater understanding of how a CA-Clipper application works internally, with a view to writing more compact and efficient CA-Clipper applications.
This section describes how the CA-Clipper 5.x Virtual Memory Manager uses conventional memory, expanded memory and disk space to store both data and CA-Clipper code. We will examine the PUBLIC, PRIVATE, LOCAL and STATIC variable classes, storage of memory variable values of different types, and how the dynamic paging system manages CA-Clipper 5.x code at application run time.
The level of expertise of the reader is expected to be medium to high, assuming an in-depth knowledge of CA-Clipper programming, and a good knowledge of PCs, networks and programming techniques in general.
Terms and definitions

Conventional memory is the memory which exists on all PCs and compatibles, and typically consists of 512 kb or 640 kb of memory on the motherboard. Due to the architecture of the early IBM PCs, the maximum amount of conventional memory is usually limited to 640 kb, although certain memory managers can provide another one or two hundred kb on some machines. In theory, the maximum conventional memory on an 8088 / 8086 processor is determined by the 1 Mb address space.
Expanded memory is memory which is also accessible on all PC compatibles, although programs which are to use it have to be explicitly written to do so. Expanded memory may be provided in the form of hardware or software emulation, and is defined by the Expanded Memory Specification (EMS); later versions of the specification allow up to 32 MB. It is managed in pages, typically 16 kb in size, which may be brought into an area of conventional memory to be accessed by a program.
The currently executing program will request a particular page of expanded memory from the expanded memory manager, and will provide an address in conventional memory at which to place the page. On return from the manager, the data contained in the requested page can be read or written to as if it were permanently resident in conventional memory.
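The request-and-map cycle described above can be modelled in miniature. This is a toy simulation only: real EMS remaps pages via the INT 67h services of the expanded memory manager rather than copying data, and the `EmsManager` and `map_page` names here are invented for illustration.

```python
# Toy model of the EMS page-mapping cycle. Real EMS remaps hardware
# pages via INT 67h instead of copying; names here are invented.

PAGE_SIZE = 16 * 1024  # EMS pages are typically 16 kb

class EmsManager:
    def __init__(self, total_pages):
        # "Expanded" storage lives outside the program's address space.
        self.expanded = [bytearray(PAGE_SIZE) for _ in range(total_pages)]
        # The page frame is a window in conventional memory.
        self.page_frame = bytearray(PAGE_SIZE)
        self.mapped = None

    def map_page(self, logical_page):
        # Save the currently mapped page back, then bring in the new one.
        if self.mapped is not None:
            self.expanded[self.mapped][:] = self.page_frame
        self.page_frame[:] = self.expanded[logical_page]
        self.mapped = logical_page
        # The program now reads and writes this as ordinary memory.
        return self.page_frame

ems = EmsManager(total_pages=4)
ems.map_page(2)[:5] = b"hello"   # write through the page frame
ems.map_page(0)                  # map another page; page 2 is saved
assert bytes(ems.map_page(2)[:5]) == b"hello"
```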
Extended memory is memory accessible by the 80286 and later processors, and exists outside of the 1 Mb range of conventional memory. These processors can access extended memory directly when running in one of their enhanced modes. When running in an 8086 emulation mode, however, a programming interface to extended memory is necessary, and a number of these have been specified. The most widely used of these interfaces is known as the XMS specification.
Software memory managers will often manage extended memory and provide both EMS and XMS programming interfaces to it for maximum versatility.
Virtual memory is a technique which has been used for many years to enable programmers to write programs requiring more memory than is directly available on the destination machine. The technique provides a simple interface to memory for storing and retrieving code and data, whilst hiding the fact that the information may be stored on one or more alternative devices until it is needed again.
The virtual memory manager, which may be implemented in hardware, software, or a combination of the two, monitors the frequency and duration of usage of the information, and decides where to keep each piece of information for maximum overall performance of the system. Typically, the least recently used information will be saved out to slower devices, while the more recently or more often used information will be kept in fast, real memory.
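The least-recently-used policy described above can be sketched in a few lines. The `LruSwapper` and `touch` names are invented; the sketch only demonstrates how touching a segment keeps it in fast real memory while the least recently used segment is moved out to slower storage.

```python
from collections import OrderedDict

# Minimal sketch of an LRU swapping policy: a fixed number of "real
# memory" slots front a larger backing store (EMS or disk). All names
# are invented for illustration.

class LruSwapper:
    def __init__(self, real_slots):
        self.real_slots = real_slots
        self.in_memory = OrderedDict()   # segment id -> data, LRU first
        self.swapped_out = {}

    def touch(self, seg_id, data=None):
        if seg_id in self.in_memory:
            self.in_memory.move_to_end(seg_id)        # most recently used
        else:
            if len(self.in_memory) >= self.real_slots:
                victim, vdata = self.in_memory.popitem(last=False)
                self.swapped_out[victim] = vdata      # save LRU segment
            # Swap the segment back in, or create it with new data.
            self.in_memory[seg_id] = self.swapped_out.pop(seg_id, data)
        return self.in_memory[seg_id]

vmm = LruSwapper(real_slots=2)
vmm.touch("a", b"A"); vmm.touch("b", b"B")
vmm.touch("c", b"C")                  # "a" was least recently used
assert "a" in vmm.swapped_out
assert vmm.touch("a") == b"A"         # swapped back in; "b" goes out
assert "b" in vmm.swapped_out
```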
CA-Clipper's virtual memory manager
CA-Clipper 5.x contains its own virtual memory manager, known as the VMM, to manage the data and CA-Clipper code belonging to the application. By default, the CA-Clipper VMM allocates all available conventional memory and up to 8 MB of expanded memory for this purpose. In addition, if all the available memory has been used up, the VMM will swap out data, but not code, to a temporary file on disk.
CA-Clipper currently makes no direct use of extended memory, so if the application will be running on a 386 PC or above, then obtaining a memory manager such as QEMM, 386MAX or the one supplied with MS DOS 6.0 will be a good investment.
Once a CA-Clipper application has loaded into memory and started executing, the application allocates the remaining real memory according to parameters set with the CLIPPER environment variable or the // command line options. The format of these is the same, and consists of the //, the letter or group of letters denoting the area, e.g. E for EMS, a ':' and a number indicating the size in Kb to be used for that area.
The parameters controlling allocation of memory are X:nnn and E:nnn. The X parameter specifies how much conventional memory to eXclude from use by CA-Clipper, and takes a value from 0 to 256 kb. The E parameter specifies how much expanded memory to allocate to the VMM, and takes a value from 0 to 8192 kb.
For example:

TEST //E:1000
which would limit CA-Clipper to using 1 Mb of expanded memory.
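The parameter format can be illustrated with a small parser. The parameter names, defaults and limits are taken from the text; the parsing code itself is an invented sketch, not CA-Clipper's actual run-time logic.

```python
# Hypothetical sketch of parsing //X:nnn, //E:nnn and similar settings.
# Defaults and limits follow the text; the code itself is illustrative.

DEFAULTS = {"X": 0, "E": 8192, "SWAPK": 8192}
LIMITS = {"X": 256, "E": 8192}

def parse_clipper_params(args):
    settings = dict(DEFAULTS)
    for arg in args:
        if not arg.startswith("//") or ":" not in arg:
            continue
        name, _, value = arg[2:].partition(":")
        name = name.upper()
        if name in settings and value.isdigit():
            value = int(value)
            if name in LIMITS:
                value = min(value, LIMITS[name])  # clamp to documented range
            settings[name] = value
    return settings

assert parse_clipper_params(["//E:1000"])["E"] == 1000   # 1 Mb of EMS
assert parse_clipper_params(["//X:64"])["X"] == 64
assert parse_clipper_params([])["E"] == 8192             # default: up to 8 MB
```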
It is worth noting in passing that by default all available EMS up to a maximum of 8 MB (or the E value, if one is specified) is allocated to the VMM in one block at the start of the application. This means it is not available to any other part of the system until the application terminates and the memory is freed. In addition, the application could run out of conventional memory if there is too much EMS available to it, since a table whose size is proportional to the amount of EMS used is allocated in conventional memory. Depending on the amount of data manipulated by the application, a suitable maximum value may be 1000 to 2000, representing 1 - 2 Mb of expanded memory.
The other CA-Clipper parameters relevant to the VMM are the SWAPPATH:'path' and SWAPK:nnn parameters. If the application's conventional memory and EMS memory is fully utilised then the VMM will create a temporary swap file in the directory indicated by the SWAPPATH parameter, or in the current directory if no SWAPPATH is specified. This disk file will be used to store the least recently used data owned by the VMM, and will gradually increase in size until either the application terminates and the file is deleted, or the size limit set by the SWAPK parameter is reached. The default size limit for the swap file if no SWAPK parameter is specified is 8 MB.
Virtual memory as managed by the VMM is allocated in segments, each of which may contain from 1K to 64K of data. When memory is allocated, the VMM returns not a pointer to real memory but a form of segment number identifying the segment, in much the same way as DOS returns a handle when a file is opened. Whenever the data within the segment is needed, a request is made to the VMM to return the current location of the segment in real memory where it can be read or written to.
Initially all the segments will be located in real memory, and because each segment is movable, real memory can be organised efficiently by filling up the gaps as segments are freed. Once real memory fills up, the VMM will swap out least recently used segments to EMS if it is available, or disk if not, to make room for new segments. If those segments are used at a later stage in the program, the VMM will swap out other segments to make room and bring the original segments back in, in the same way as an overlay manager manipulates code overlays.
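Because callers hold segment numbers rather than pointers, the manager is free to move segments and close the gaps left when segments are freed. The sketch below demonstrates that idea; the `SegmentStore` class and its methods are invented names, not CA-Clipper's actual internals.

```python
# Sketch of handle-based allocation: callers hold segment numbers, so
# the manager can move segments to compact memory. Names are invented.

class SegmentStore:
    def __init__(self):
        self.memory = bytearray()   # "real memory", kept contiguous
        self.table = {}             # handle -> (offset, size)
        self.next_handle = 1

    def alloc(self, size):
        handle, self.next_handle = self.next_handle, self.next_handle + 1
        self.table[handle] = (len(self.memory), size)
        self.memory.extend(bytes(size))
        return handle               # a segment number, not a pointer

    def free(self, handle):
        off, size = self.table.pop(handle)
        del self.memory[off:off + size]         # compact: close the gap
        for h, (o, s) in self.table.items():
            if o > off:
                self.table[h] = (o - size, s)   # later segments move down

    def write(self, handle, data):
        off, _ = self.table[handle]             # look up current location
        self.memory[off:off + len(data)] = data

    def read(self, handle):
        off, size = self.table[handle]
        return bytes(self.memory[off:off + size])

store = SegmentStore()
a, b = store.alloc(4), store.alloc(4)
store.write(b, b"DATA")
store.free(a)                  # b moves down; its handle is unchanged
assert store.read(b) == b"DATA"
assert store.table[b] == (0, 4)
```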
CA-Clipper 5.x also contains a special type of memory manager designed to manage complex data values such as character strings and arrays. The CA-Clipper 5.x object memory is called the Segmented Virtual Object Store (SVOS). SVOS uses virtual memory managed by the VMM to store data values, including character strings, arrays, and dynamically created (macro-compiled) code blocks.
SVOS provides two important functions beyond the basic capabilities offered by the VMM: memory compaction and garbage collection.
Memory compaction consists of automatically compacting stored values on an ongoing basis. This eliminates fragmentation of the virtual memory and reduces swapping, since each segment can be fully utilised before requesting further segments.
Some CA-Clipper 5.x values (e.g., arrays) may be referred to by several program variables or array elements at the same time. The garbage collection routine automatically reclaims space occupied by values which are no longer accessible through any variable or array. By default, this occurs in background when CA-Clipper is in an idle state, e.g. waiting for keyboard input.
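The reachability rule behind garbage collection can be demonstrated in miniature. The real collector's internals are not documented here; this invented sketch only shows that a value stays alive while any variable can still reach it, directly or through an array element.

```python
# Illustrative sketch of garbage collection by reachability: values are
# live if reachable from a variable, possibly via nested arrays.

heap = {}          # value id -> stored data (strings, arrays of ids, ...)
variables = {}     # variable name -> value id

def reachable(roots):
    seen, stack = set(), list(roots)
    while stack:
        vid = stack.pop()
        if vid in seen:
            continue
        seen.add(vid)
        if isinstance(heap[vid], list):     # an array: follow its elements
            stack.extend(heap[vid])
    return seen

def collect():
    live = reachable(variables.values())
    for vid in list(heap):
        if vid not in live:
            del heap[vid]                   # reclaim unreachable value

heap[1] = "hello"
heap[2] = [1]                  # an array whose element refers to value 1
variables["a"] = 2
variables["b"] = 1
del variables["b"]
collect()
assert set(heap) == {1, 2}     # value 1 still reachable through the array
del variables["a"]
collect()
assert heap == {}
```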
The real memory remaining to the VMM is set up as a swap area to bring swapped out pages of data into memory for use in the CA-Clipper program. When a RUN command is performed, as much of the top of the swap space as possible is freed and returned to DOS to be combined with the X area, and then the RUN command is issued. In this way more memory is freed up for RUN commands than would have been with Summer '87, although the exact amount will depend on the size and usage of the lower end of the swap space.
CA-Clipper's symbol table
The CA-Clipper language, and the dBase language on which CA-Clipper was originally based, is a dynamic language with a number of very powerful constructs. These allow and cause certain functions normally performed by the compiler to be postponed until run time, such as setting the type of variables and using macros to create new variables not known at compile time.
Because of this dynamic nature, at run time CA-Clipper requires more information about variables and procedures than traditional lower level languages such as Pascal, C and Modula 2. Some of this information is available at compile and link time, such as the name of the variable, but some of it, such as its type, will only be available once the application has started executing.
For these reasons, each CA-Clipper .OBJ file is created with a symbol table of 16 bytes per symbol, and all code in the .OBJ file refers to that symbol table. At run time the symbol table entry is used to point to the control information and value or code for the symbol. The symbol table is created in its entirety in the root of the application, and can grow to upwards of 64 kb, so it can significantly affect the amount of conventional memory required by the application. This is why even 100% overlayed applications grow when code is added.
CA-Clipper 5.x introduced static and local variables to the language to encourage better and more efficient coding practices. Another important benefit is that these classes of variables do not require a symbol table entry as they cannot be accessed via macros. Changing as many PUBLIC and PRIVATE variables as possible to LOCAL or STATIC variables can therefore significantly reduce the amount of conventional memory required.
The major linkers now available remove the duplicate symbols from the symbol tables in the various .OBJ and .LIB files at link time, creating one large consolidated symbol table. This process, known as symbol table compression, can significantly reduce the run time memory requirement of the .EXE, leaving more memory for the application's data and overlays. All the duplicates are removed except the symbols belonging to procedures declared as static, since these are local to each .OBJ and will have different code associated with each occurrence of the symbol.
It is worth noting that prior to link time symbol table compression, the only way to reduce the number of duplicate symbols was to minimise the number of .OBJ files, but this is no longer necessary.
When compiled, each CA-Clipper procedure or function in the .OBJ file has a separate unit known as a segment, which consists of a small Assembly language header and a string of tokens. The header simply consists of pointers to the CA-Clipper symbol table and the tokenised code, and a call to the CLIPPER.LIB procedure __PLANKTON. The tokenised code represents calls to functions within the CA-Clipper library and parameters to those functions. At application run time, when the procedure or function is called, the __PLANKTON procedure processes these tokens sequentially and performs the appropriate library calls with the parameters held in the tokens. Each token is usually only one byte long, with parameters varying in length, e.g. a real number will take up 8 bytes and a character string will be stored as the length followed by the string. Tokens may also refer to symbols in the symbol table described above, rather than referring to absolute locations, so each reference to a variable will consist of a two byte symbol number.
For example, in the code :
FUNCTION T
A = B + C
we would have a symbol table containing :
T, A, B, C
and the tokenised code would consist of (in simplified terms) :
Take symbol 2 (B)
Take symbol 3 (C)
Add them together
Store result in symbol 1 (A)
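The processing loop performed by __PLANKTON can be caricatured as a tiny interpreter. The opcode values, layout and names below are invented; only the general shape (one-byte tokens, two-byte symbol numbers, sequential dispatch) follows the description above.

```python
# Invented caricature of the token dispatcher: one-byte opcodes with
# two-byte symbol numbers, processed sequentially against a symbol table.

PUSH_SYMBOL, ADD, STORE_SYMBOL = 0x01, 0x02, 0x03

symbols = ["T", "A", "B", "C"]          # the symbol table
values = {"B": 2, "C": 3}               # run-time values of the symbols

def run(tokens):
    stack, pc = [], 0
    while pc < len(tokens):
        op = tokens[pc]; pc += 1
        if op == PUSH_SYMBOL:
            sym = int.from_bytes(tokens[pc:pc + 2], "little"); pc += 2
            stack.append(values[symbols[sym]])
        elif op == ADD:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == STORE_SYMBOL:
            sym = int.from_bytes(tokens[pc:pc + 2], "little"); pc += 2
            values[symbols[sym]] = stack.pop()

# A = B + C, with A, B, C as symbol numbers 1, 2, 3:
code = bytes([PUSH_SYMBOL, 2, 0, PUSH_SYMBOL, 3, 0, ADD, STORE_SYMBOL, 1, 0])
run(code)
assert values["A"] == 5
```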
This tokenised approach has a number of advantages over true compiled code, with only a negligible cost in performance. The code produced is very compact, for example taking only three bytes for a procedure call, as opposed to five for a direct call. It is also very self contained. All external references go via the symbol table, so operations such as incremental linking are made significantly easier. This approach also makes it possible to use the dynamic paging system described below for faster overlayed applications with lower memory requirements.
The size overhead of a simple CA-Clipper compiled .EXE is actually made up of the runtime routines from the CLIPPER.LIB which are called by the processing of the tokens. The apparently large size of even a "Hello world" type program is due to the potential for macro operations, which could execute just about any CA-Clipper command from even a two line program.
Instances of variables and their values
In conventional languages the scope, size and type of a named variable is known at compile time, so the exact amount of space can be reserved for it in memory at run time. This memory will always be used to store the value, no matter how often the value is changed.
The remaining memory above the program's .EXE image is usually known as the heap and is managed by a heap manager, which will allocate blocks of memory of varying size to the program as and when requested. Space for data allocated dynamically at run time, for constructs such as linked lists or buffers, whose sizes are not known at compile or link time, will be allocated from and returned to this heap.
With CA-Clipper, determination of the type and size of all variables and the scope of public and private variables is left until run time, so a more complicated mechanism for storing the values of variables is required.
CA-Clipper 5.x offers several different storage classes for program variables, depending on how they are declared and used in the program. LOCAL and STATIC variables are stored in a dedicated area of real memory, as described below. PRIVATE and PUBLIC variables, known as MEMVAR variables, are created and destroyed dynamically while a program is running, and are stored in VM segments.
For performance reasons, these segments remain locked in real memory during most operations except memory intensive operations and RUN commands. Each MEMVAR uses 20 bytes in a VM segment, so converting PRIVATE and PUBLIC variables to LOCAL and STATIC variables can reduce memory requirements for some applications.
At run time, each instance of a variable is allocated a value, which is represented internally as a data structure called a VALUE. The contents and format of a VALUE differ depending on the type of data it represents. Simple data, such as integers, are stored directly into the VALUE. Larger items, or data of variable length such as strings or arrays, have a "reference" to the string or array stored in the VALUE, and the actual data is stored elsewhere. Internally, CA-Clipper is organized as a stack based machine which uses an area of memory called the Eval Stack to contain temporary variables such as function parameters, intermediate results of expressions and local variables. The Eval Stack is simply a contiguous group of VALUEs that are accessed as a stack, in the same way as the processor stack is used by C programs.
For example, in a CA-Clipper function call, parameters are pushed onto the Eval Stack before the function is executed. The function operates on the top-most items in the Eval Stack and produces a result. After the function completes, the parameter values are popped from the Eval Stack and replaced with the function result.
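The calling convention just described can be sketched as follows. The structure is invented for illustration; only the push-parameters, consume, replace-with-result sequence follows the text.

```python
# Sketch of an Eval Stack call: parameters are pushed, the function
# consumes the top-most entries, and they are replaced by the result.

eval_stack = []

def call(func, argc):
    args = eval_stack[-argc:]          # top-most entries are the parameters
    del eval_stack[-argc:]             # pop the parameters...
    eval_stack.append(func(*args))     # ...and push the single result

eval_stack.append(7)                   # push parameters
eval_stack.append(5)
call(max, 2)
assert eval_stack == [7]               # parameters replaced by the result
```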
Each entry in the Eval Stack, i.e. each VALUE, occupies 14 bytes, and for complex data types such as character strings, arrays and code blocks there will be an additional memory requirement handled by the VMM where the actual value is stored.
The Eval Stack is allocated from the default data segment, defined as the start of the group DGROUP, when the program starts executing, so initialisation will fail if DGROUP is too full. This is not usually a problem with pure CA-Clipper applications, but if a number of third party libraries are linked in to the application it may possibly fill up unless they have avoided storing data in DGROUP. The number of kb remaining in DGROUP for CA-Clipper's use can be examined by executing the program with the //INFO parameter; the amount of conventional and expanded memory available will be displayed at the same time.
LOCAL variables are the simplest variables, and are allocated as locations within the Eval Stack to store their VALUEs. To manipulate a LOCAL variable, the system simply copies the variable's VALUE from one position in the Eval Stack to another.
Local variables are visible only within the current procedure or function, and are created automatically each time the procedure in which they were declared begins executing. When that procedure terminates through a return, all its LOCALs are removed from the Eval Stack and any associated VMM memory freed up.
STATIC variables are similar to LOCAL variables, but have a duration of the lifetime of the application. Because of their permanence, they are allocated as fixed locations at one end of the Eval Stack, but are manipulated in the same way as LOCAL variables simply by copying their VALUEs.
This means that every STATIC variable in the system also requires 14 bytes on the Eval Stack in DGROUP, which is another reason for C and ASM programmers to avoid storing data in DGROUP.
PRIVATE and PUBLIC variables are more complex than LOCAL or STATIC variables because in addition to an associated VALUE they also have a name which may be referred to during execution of the program via a macro or its equivalent. MEMVAR variables are allocated locations for their VALUEs in dedicated VM segments and these locations are stored with their names in the symbol table.
When a MEMVAR is manipulated, the symbol table entry is used to point to the VALUE which can then be placed on the Eval Stack in the normal way. FIELD variables differ from the other storage classes because they have no memory location at all, since their values are stored in a database record buffer. To manipulate a FIELD, the system generates a request to the file's database driver, which then creates an appropriate VALUE to be manipulated on the Eval Stack.
An array VALUE contains a reference to the array rather than an actual value, so when an array is assigned to a variable, the system simply overwrites the variable's VALUE with a new VALUE containing a reference to the array. The array itself is simply a group of VALUEs stored in virtual memory, where each element of the array is a VALUE. Any VALUE can contain another reference, so multidimensional arrays are created by having each element refer to another array rather than have an absolute value. When values are assigned to array elements, the VALUE for that element is updated. When an array is assigned to another variable, only a copy of the VALUE referring to the array is made, and the array data itself is not duplicated.
A character string VALUE contains a reference to the character data, which is stored elsewhere in the VM. As with arrays, assigning a character value to a variable simply overwrites the variable's VALUE with a new VALUE containing a reference to the character data.
In a similar way to arrays, assigning a character value from one variable to another simply duplicates the VALUE (i.e., the reference to the data). The character data itself is not duplicated.
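Python's own assignment semantics happen to mirror the behaviour described above, so the point can be demonstrated directly: assigning an array copies only the reference, not the data.

```python
# Reference semantics: assignment copies the reference, not the data.

a = ["x", "y", "z"]        # like a CA-Clipper array: a VALUE holding a reference
b = a                      # copies the reference only
b[0] = "changed"
assert a[0] == "changed"   # both variables refer to the same array

c = list(a)                # an explicit element-by-element copy
c[1] = "other"             # (compare ACLONE() in CA-Clipper 5.x)
assert a[1] == "y"         # the original is unaffected
```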
This reference-based memory management technique is the same for strings, arrays, and code blocks. CA-Clipper's garbage collector monitors references to objects, and when there are no longer any references to a particular piece of data, the space occupied by that data is automatically reclaimed.
During program execution, when a macro is evaluated to the name of a variable or procedure, the symbol table is searched to find the requested name. Once the name is found, CA-Clipper follows the pointer in the symbol table to the VALUE where all the general information about the symbol is actually stored. The VALUE will indicate whether the procedure or variable being referenced has been defined, and CA-Clipper checks this before continuing any further. If it is undefined and is not a variable being created, CA-Clipper immediately returns an appropriate error - "undefined function" for procedures or functions, and "variable does not exist" for variables. If the procedure or function has been defined correctly, then the VALUE will contain a pointer to the program code to execute for that procedure, and control can be transferred to the procedure.
The remaining case of creating a new variable is handled by adding a new entry to the end of the symbol table. This new entry will have the name of the variable filled in, along with a pointer to a VALUE for the symbol, and will be used from then on to refer to the variable.
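The lookup-or-create behaviour described in these two paragraphs can be sketched as follows. The table layout and function name are invented; the sketch only shows the linear search by name, the "undefined" checks, and the creation of a new entry at the end of the table.

```python
# Invented sketch of macro-time symbol lookup: linear search by name,
# with creation of a new entry when a macro names a new variable.

symbol_table = [
    {"name": "MAIN", "kind": "proc", "defined": True},
    {"name": "TOTAL", "kind": "var", "defined": True},
]

def macro_lookup(name, creating=False):
    for entry in symbol_table:          # linear scan: one reason macros are slow
        if entry["name"] == name:
            if not entry["defined"] and not creating:
                raise NameError("variable does not exist")
            return entry
    if creating:
        entry = {"name": name, "kind": "var", "defined": True}
        symbol_table.append(entry)      # new entry added at the end
        return entry
    raise NameError("undefined function")

assert macro_lookup("TOTAL")["kind"] == "var"
macro_lookup("NEWVAR", creating=True)   # a macro creating a new variable
assert symbol_table[-1]["name"] == "NEWVAR"
```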
Both Summer '87 and CA-Clipper 5.x provide other mechanisms to avoid the creation of these dynamically named variables in the majority of circumstances, such as using an array of elements to store the values, or using code blocks in 5.x. These alternative mechanisms should be used wherever possible, if only because macro operations are inherently very slow, as each name in the symbol table has to be checked until a match is found before execution can continue.
If the use of a macro cannot be avoided, but the name to be created will be one of a known set, then these names should be mentioned explicitly somewhere in one of the programs. The code does not ever have to be executed, but just using the names causes them to be added to the symbol table at compile time, thus avoiding the above situation.
Code blocks are represented internally as strings of tokenised code, in the same way as normal procedures and functions. When a code block is assigned to a variable at run time, a pointer to the tokens making up the code block is stored in the variable, along with information pertaining to the currently active procedure.
Because the code block consists of normal tokens, it will include references to the symbol table, so the equivalent symbol table must be available when the code block is actually evaluated. This is one of the reasons why it will prove difficult (but not impossible) to save code blocks in a database from one application and restore and evaluate them at a later time in the same or another application.
CA-Clipper's dynamic paging system

When linked with BLINKER or .RTLink, CA-Clipper 5.x performs its own form of dynamic overlaying of compiled CA-Clipper code, which results in extremely fast, memory efficient execution of the code.
During linking all CA-Clipper modules are broken down into pages of 1 kb in size. These pages are stored either in the executable file or in separate overlay files. The manipulation of overlays in these 1 kb pages removes the effect the size of compiled functions or modules has on the memory required to load the overlay. Large modules are broken into multiple pages, and small functions are grouped together in a single page.
At execution time, CA-Clipper 5.x's dynamic overlay manager loads pages based on information embedded in the .EXE by the linker. The dynamic pages are loaded into VM (Virtual Memory) segments, allowing the VMM to manage the overlay pages on a competitive basis with other uses of memory such as the application data.
The paging architecture allows the system to discard low-use sections of code even if the code is still active, and reload it only when control returns to that piece of code. Code pages which are being heavily used are maintained in memory by the VMM's LRU swapping policy.
When possible, the VMM will place dynamic overlay pages in expanded memory, reducing overlay reads. Overlay pages are never written to the VMM disk swap file, however. If a VM segment containing an overlay page is to be removed from memory altogether, it is simply discarded. If it is needed subsequently, it is re-read from the overlay file. In addition to virtual memory, the dynamic overlay manager uses a dedicated area of real memory to cache the most active dynamic overlay pages.

This page mechanism is made possible by the nature of the CA-Clipper code. As explained before, it is not actually code but a series of tokens which are processed at run time. This means that the __PLANKTON procedure from CLIPPER.LIB which is processing the tokens can detect when it has reached the end of a page and request the next one to be loaded. All CA-Clipper code is therefore overlayable, so there are no restrictions on which CA-Clipper .OBJs can be placed in the overlay area. It should be noted that linkers which use the dynamic paging mechanism of CA-Clipper 5.x automatically overlay ALL CA-Clipper code unless directed otherwise.
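The discard-only policy for code pages can be sketched in a few lines. Because code pages are read-only, an evicted page is never written back anywhere; it is simply dropped and re-read from the overlay file on the next use. The class and method names below are invented for illustration.

```python
# Invented sketch of a discard-only overlay page cache: evicted code
# pages are dropped, never written back, and re-read on demand.

class OverlayPageCache:
    def __init__(self, overlay_file, capacity):
        self.overlay_file = overlay_file   # page number -> page bytes
        self.capacity = capacity
        self.cache = {}                    # the dedicated real-memory cache
        self.reads = 0                     # overlay reads from disk

    def get_page(self, number):
        if number not in self.cache:
            if len(self.cache) >= self.capacity:
                # Discard a page: no write-back is ever needed.
                self.cache.pop(next(iter(self.cache)))
            self.cache[number] = self.overlay_file[number]
            self.reads += 1                # re-read from the overlay file
        return self.cache[number]

overlay = {0: b"page0", 1: b"page1", 2: b"page2"}
cache = OverlayPageCache(overlay, capacity=2)
cache.get_page(0); cache.get_page(1)
assert cache.get_page(0) == b"page0" and cache.reads == 2   # cache hit
cache.get_page(2)              # evicts a page by simply discarding it
assert cache.reads == 3
```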