It is the thesis of this work that many computer software systems being built today are similar and should be built out of reusable software components.
The appropriate use of software components is investigated by analogy to the classical engineering question of whether to build an object out of custom-made parts or standard parts and assemblies. The same analogy is used to explain some of the problems with previous work on reusable software. The result of reasoning with the engineering analogy is that the reuse of software results only from the reuse of analysis, design, and code; rather than just the reuse of code.
The concept of domain analysis is introduced to describe the activity of identifying the objects and operations of a class of similar systems in a particular problem domain. A domain analysis is represented by a domain-specific language, a prettyprinter, source-to-source transformations, and software components.
A domain's software components map statements from the domain into other domains which are used to model the objects and operations of the domain. Software components represent implementation choices. The components are managed using a module interconnection language to insure usage constraints.
The source-to-source transformations represent domain-specific optimizations, independent of any implementation, which are used to optimize statements in the domain. The transformations are useful primarily when the domain is used as a modeling domain. A method of automatically producing metarules for a set of transformations is described. The metarules remove the burden of having to suggest individual transformations from the user.
A formal model of the usage constraints and modeling possibilities of a set of domains is presented. It is shown that the reusability question ("Can a particular domain-specific program be refined into an executable language using a given a set of domains?") can be answered using the formal model.
Experiments using a prototype system, Draco 1.0, which embodies the concepts described, are presented and the results discussed. The largest example results in approximately 20 pages of source code and uses eight modeling domains. Each object and operation in the resulting program may be explained by the system in terms of the program specification.
Related work in the areas of automatic programming, program generation, programming languages, software engineering, and transformation systems is presented.
Finally, some future work in this area is outlined.
Each year more than $50,000,000,000 are spent on software production and evolution in the United States [Lehman80]. The traditional term of "maintenance" for all work on a software system after it is initially constructed is misleading. We prefer the term "evolution" after [Lehman80, Scacchi80, vanHorn80] to denote the repair, adaptation, and enhancement of a software system. This huge sum is spent on something which cannot be seen, felt, touched, tasted or heard in the conventional sense. The intangible nature of software has caused much of the problem in its production. There is no sense feedback in the production of software. Over the past years, the problem of software production has been growing rapidly with the increased size of software systems. Today "personal computers" threaten to be able to hold the largest software systems built. Unless techniques to create software increase dramatically in productivity, the future of computing will be very large software systems barely being able to use a fraction of the computing power of extremely large computers.
By "software crisis," we mean that there is a demand for quality software which cannot be met with present methods of software construction. The judgement as to whether the software is needed or whether more software is better is not made here. Some of the points which have brought about the software crisis are listed below:
This dissertation describes a software production technique based on the concept of parts and assemblies. The concept has been very successful in the production of standardized objects such as computer hardware. It is the goal of this work to increase software construction productivity as a partial answer to the software crisis.
The beginning of the software crisis was heralded by the failure of some very large software systems to meet their analysis goals and delivery dates in the 1960's. These systems failed regardless of the amount of money and manpower allocated to the projects. These failures led to a conference on the problem of software construction which marked the beginning of software engineering [Buxton76]. Studies of the process of software construction have identified the phases that a software project goes through and these phases have been combined into a model called the software lifecycle.
If we view the lifetime of a software system as consisting of the phases requirements analysis, design, coding, integration and testing, and evolution, then typical costs of the different phases [Brooks74, Boehm73] excluding requirements analysis are shown in figure 1. This early view of the lifecycle serves our purpose here but it is important to note that more recent views of the lifecycle [Kerola81, Scacchi80] are more sensitive to the needs of the organization requesting the system, the dynamics of the organization building the system, and the information processing abilities of the people developing the system.
9% design
6% coding
15% integration and testing
70% evolution
It is difficult to test high-impact tools for software production for three reasons. One reason is that the tools are used in a complex social setting where not all the users are motivated by a desire for high software production. A second reason is that producing software is very expensive and the data collection required is an added expense to an already expensive process. The third difficulty in testing high-impact tools is that there are no really good system quality metrics with which to judge the resulting system built using the tool. It is difficult to judge the worth of the resulting system to the organization which desired it. Many requests for "maintenance" on a completed system may mean either that the system was built poorly with many errors or that it was built well enough that the users see enhancements which could make a good system even better.
The software production technique described in this dissertation is, in our view, a high-impact tool which inherits the difficulties of testing mentioned above. We have not attempted to statistically verify an increase in software productivity or judge the "goodness" of the systems resulting from the use of the tool. Such a study should be a requirement before any technique is ever used in production.
The idea of using standard parts and forming them into assemblies is a very old idea.
"Eli Whitney of New Haven Conn. received an order for a large number of guns in 1789. Instead of hiring a large number of individual gunsmiths, he designed interchangeable parts and made jigs and fixtures so that relatively unskilled people could make the parts. He missed his delivery schedule, but he is credited with having invented the parts-and-assemblies approach with re-usable parts. The principles and techniques of the parts-and-assemblies approach have since become well known, and automated support for the documentation exists throughout industry." [Edwards74]The parts-and-assemblies approach has been used extensively in engineering and is one of the techniques which has enabled computer hardware engineers to increase the power and capacity of computers in a short time. Henry Ford combined the idea of parts and assemblies with the idea of an assembly line to make model-T Fords. It is important here to understand that the parts-and-assemblies idea does not infer the use of assembly lines.
There are two basic approaches to building things. The craftsman approach relies on a highly skilled craftsman to build an object from raw materials. The raw materials are fashioned into custom parts and fitted together to form custom assemblies. The parts-and-assemblies approach relies on already built standard parts and standard assemblies of parts to be combined to form the object. Each of the approaches has its good and bad points.
With the craftsman approach, the custom parts and assemblies are tailored to the specific problem at hand. These custom parts represent a very efficient implementation; probably better than could be built from standard parts. Given the time, a craftsman always builds a better object than one constructed from standard parts. By "better" here we mean more responsive to the goals of construction [Alexander64]. The craftsman approach has its drawbacks in that craftsmen are expensive to employ and hard to find. Any system built by a craftsman is a custom system and will require custom maintenance. This means that the maintenance must be done by a craftsman who must shape new custom parts to fit with the old custom parts in an object.
The parts-and-assemblies approach offers cheaper construction costs since the object is built from pre-built standard parts. An assembly is a structure of standard parts which cooperate to perform a single function. The use of standard parts and assemblies will supply some knowledge about the failure modes and limits of the parts. This information is unavailable with custom parts. Use of standard parts also creates a language for discussion of future objects and extensions to objects currently under construction. The parts-and-assemblies approach has its drawbacks in that the design of useful standard parts and assemblies is very expensive work and requires craftsman experience. Also, once a set of standard parts is created it may not suffice to construct all the objects desired.
From a different viewpoint, an assembly is a part.
"We understand complex things by systematically breaking them into successively simpler parts and understanding how these parts fit together locally. Thus, we have different levels of understanding, and each of these levels corresponds to an abstraction of the detail at the level it is composed from. For example, at one level of abstraction, we deal with an integer without considering whether it is represented in binary notation or two's complement, etc., while at deeper levels this representation may be important. At more abstract levels the precise value of the integer is not important except as it relates to other data." [Knuth74]Thus, an assembly at a different level of abstraction becomes a part. This idea will become important later when we discuss the problems encountered by previous work on software parts.
From the discussion of the pros and cons of the craftsman and the parts-and-assemblies approaches, it is apparent that the parts-and-assemblies approach is appropriate only to those situations where many similar objects are to be built. Otherwise, the cost of producing the standard parts by a craftsman is much greater than the cost saved by using standard parts. If an object to be built is a one-of-a-kind custom object it should be built by a craftsman; otherwise it should be determined if the parts-and-assemblies approach could be cost effective.
Historically, software construction has taken the craftsman approach. In the early days of computing, the software systems were one-of-a-kind and the craftsman approach was the natural approach. Today quite a few software systems being built by the craftsman approach are similar. In particular, the construction of system software (text editors, assemblers, compilers, etc.), business data processing systems (inventory, accounting, billing, etc.), and simple process control systems are all areas where many thousands of similar systems exist. It is not at all clear that the constructors of these systems are craftsmen. In fact, with the rapidly increasing numbers of analysts and programmers [Lientz80] it is doubtful that the constructors of these systems are craftsmen. In our view, the high cost of custom software systems has never been clearly represented since the use of standard parts and assemblies to build low-cost systems has not been an alternative.
Historically, hardware construction has taken the parts-and-assemblies approach. Even though early computers were one-of-a-kind, the parts-and-assemblies approach was the natural choice since hardware failures were a major problem and the approach is an excellent technique for organizing maintenance. The machines were maintained by replacing assemblies and studying the failure modes of the parts and assemblies. This same maintenance technique is still in use today.
In the next chapter we shall discuss the problems of using the parts-and-assemblies concept in the construction of software. Also, under the assumption that many software systems being constructed today are similar, we shall outline a method for constructing software using parts and assemblies and advocate its use in the construction of similar systems.
The purpose of this dissertation is to apply the parts-and-assemblies concept to software construction. A software component is analogous to a part. From our discussion in Chapter 1, this means that a component can be viewed as either a part or an assembly depending on the level of abstraction of the view. A particular component usually changes from a part to an assembly of subparts as the level of abstraction is decreased. The duality of a component is a very important concept. The failure to deal with this dual view caused some problems with earlier work on reusable software.
The major problem with earlier work on reusable software is the representation of the software to be reused. In program libraries the programs to be reused are represented by an external reference name which can be resolved by a linkage editor. While a functional description of each program is usually given in a reference manual for the library, the documentation for a library program seldom gives the actual code or discusses the implementation decisions. The lack of this information prohibits a potential user of a library program from viewing it as anything other than a part. If the user can treat a library program as an isolated part in his developing system then the program library will be successful. Mathematical function libraries fit well into this context.
Usually, however, a user wishes to change or extend the function and implementation of a program to be reused. These modifications require a view of the program as an assembly of subparts and a part of many assemblies. To decrease the level of abstraction of a library program to view it as an assembly of subparts requires information about the theory of operation of the program and implementation decisions made in constructing the program. To increase the level of abstraction of a library program to view it as part of a collection of assemblies requires information about interconnections between programs in the library and implementation decisions defining common structures. None of this information is explicit in a simple program library. The burden is placed on the user of the library to extract this information.
The view of software components as isolated parts also plagued early work on reusable code modules [Corbin71, Corwin72]. The software components to be reused in this work are code modules hundreds of source lines long. With the code available a knowledgeable human user could form an abstraction of a given code module by examining it, but this is difficult work requiring vast amounts of knowledge from many domains. The problem of understanding a code module is exacerbated by the large size of the code modules. The large size is required to help a potential user of a reusable code module set organize a program using a small number of module names. If the average code size of a reusable code module is small, then there will be too many code module names for the user to organize. If the average code size is large, then the code modules will turn out to be too inflexible to be used in a wide range of systems without human examination and tailoring in each use. As with program libraries, the burden of organizing a specific program is placed on the user because even though the structure between the reusable code modules is more easily discerned than in program libraries, it is not completely explicit.
To avoid the problems encountered with program libraries and reusable code modules we will use the computer to handle a huge number of module names. Each name represents a small flexible software component described at a level of abstraction above programming language source code which will allow us to view the component as an assembly of subparts and a part of assemblies.
In general it seems that the key to reusable software is to reuse analysis and design; not code. In code, the structure of parts which make up the code has been removed and it is not divisible back into parts without extra knowledge. Thus, code can only be viewed as a part. The analysis and design representations of a program make the structure and definition of parts used in the program explicit. Thus, analysis and design is capable of representing both the part view and assembly view while code can only represent the part view. In this chapter a method will be presented which extends the reusable parts theme into all phases of the software lifecycle rather than just the coding phase.
It has been a common practice to name new computer languages after stars. Since the system described in this dissertation is a mechanism which manipulates special-purpose languages it seems only fitting to name it after a structure of stars, a galaxy. Draco, Latin for dragon, is a dwarf elliptical galaxy in our local group of galaxies which is dominated by the large spiral galaxies Milky Way and Andromeda. Draco is a small nearby companion of the Milky Way. At 1.2E+5 solar masses and 68 kiloparsecs from Earth its small size and close distance to home is well suited to the current system, Draco 1.0, which is a small prototype.
The Draco system addresses itself to the routine production of many systems which are similar to each other. The goal of this work is to be able to build large, understandable, maintainable, documented systems which represent an error-free implementation of the user's needs and desires. The particular approach the Draco system takes is the extension of the reusable parts-and-assemblies theme into the analysis and design phases of software construction.
Draco is an interactive system which enables a user to guide the refinement of a problem stated in a high-level, problem-domain-specific language into an efficient, low-level executable program. As the user guides the refinement of his problem he may make individual modeling and implementation choices or rely on tactics (which he defines) to give guidelines for semi-automatic refinement. Draco supplies mechanisms to enable the definition of problem domains as special-purpose, high-level languages and manipulate statements in these languages into an executable form. The notations of these languages are the notations of the problem domain. The user is not asked to learn a new, all-purpose language. When the user interacts with the system it uses the language of the domain. The user specifies his problem in terms of the objects and operations of the problem domain.
If an organization were interested in building many customized systems in a particular application area, say systems for aiding travel agents, they would go out to travel agent offices and study the activities of travel agents. A model of the general activity of being a travel agent would be formed and the objects and operations of the activities identified. At this point, the analyst of the domain of travel agent systems would decide which general activities of a travel agent are appropriate to be included in travel agent systems.
The decision of which activities to include and which to exclude is crucial and will limit the range of systems which can be built from the model later. If the model is too general it will be harder to specify a particular simple travel agent system. If the model is too narrow the model will not cover enough systems to make its construction worthwhile.
Once the analyst has decided on an appropriate model of travel agent activities, he specifies this model to the Draco system in terms of a special-purpose language specific to the domain of travel agents and their notations and actions.
The idea here is not to force all travel agents into the same mold by expecting them all to use the same system. If the model of the domain of travel agents is not general enough to cover the peculiarities which separate one travel agent's actions from another, then the model will fail.
The domain of travel agent systems is specified to Draco by giving its external-form syntax, guidelines for printing things in a pleasing manner (prettyprinter), simplifying relations between the objects and operations, and semantics in terms of other domains already known to Draco. Initially, Draco contains domains which represent conventional, executable computer languages.
Once the travel agent domain has been specified, systems analysts trying to describe a system for a particular travel agent may use the model language as a guide. The use of domain-specific language as a guide by a systems analyst is the reuse of analysis.
Once the specification of a particular travel agent system is cast in the high-level language specific to travel agent systems, Draco will allow the user to make modeling, representation, and control-flow choices for the objects and operations specific to the travel agent system at hand. The selection between implementation possibilities for a domain-specific language is the reuse of design.
Design choices refine the travel agent system into other modeling domains and the simplifying relations of these modeling domains may then be applied. At any one time in the refinement, the different parts of the developing program are usually modeled with many different modeling domains. The simplifying relations are source-to-source program transformations. The individual design choices have conditions on their usage and make assertions about the resulting program model if they are used. If the conditions and assertions ever come into conflict, then the refinement must be backed up to a point of no conflict. The use of strategies based on a formal model to aid in guiding the process of refinement is discussed in Chapter 6.
Eventually, the travel agent system is refined into an executable language and it is output by the system. Along with this final program is a refinement history of the choices made at each point in the refinement. This refinement history can explain every statement in the final program at different levels of abstraction all the way back to the original statement in the high-level travel agent domain. The refinement history is a top-down description of the final program. The process which produces this history is not a top-down process. The refinement history states which components were used in the construction of a particular system. If a component is found to be in error in one system, then the refinement histories of other systems may predict failures in those systems which used the faulty component.
The primary result of this work is the ability to build models of a class of systems and use these models to create member systems of the class in a reliable and timely way. New models are built upon old models to minimize the effort in creating a new model. The programs produced from these models are very efficient with the major optimizations done in the intermediate modeling languages.
A side-effect of this work is that it provides a mechanism for specifying computer science algorithms and representations in such a way that one need not know the implementation details of an algorithm or representation to use it.
To elaborate the brief discussion above, four major themes dominate the way Draco operates: the analysis of a complete problem area or domain (domain analysis), the formulation of a model of the domain into a special-purpose, high-level language (domain language), the use of software components to implement the domain languages, and the use of source-to-source program transformations to specialize the components for their use in a specific system.
The Draco mechanism is a general mechanism which can create (from human analysis) and manipulate (with human guidance) a library of domains. The domains are separate from the mechanism.
Domain analysis differs from systems analysis in that it is not concerned with the specific actions in a specific system. It is instead concerned with what actions and objects occur in all systems in an application area (problem domain). This may require the development of a general model of the objects in the domain, such as a model which can describe the layout of the documents used. Domain analysis describes a range of systems and is very expensive to perform. It is analogous to designing standard parts and standard assemblies for constructing objects and operations in a domain. Domain analysis requires a craftsman with experience in the problem domain.
A Draco domain captures an analysis of a problem domain. The objects in the domain language represent the objects in the domain and the operations in the domain language represent the actions in the domain. This approach follows earlier definitions of a problem domain:
"A model of the problem domain must be built and it must characterize the relevant relationships between entities in the problem domain and the actions in that domain." [Balzer73]It is our view that all languages used in computing capture the analysis of some problem domain. Many people bemoan the features of FORTRAN; but it is still a good language for doing collimated output of calculations, the type of computing high-energy physics has done for many years. This is not to say that FORTRAN is a good analysis of the domain of high-energy physics calculation, but it did find its niche [Wegner78b]. Domains are tailored to fit into a niche as defined by the uses in which man is interested in using computers.
Domain languages usually differ radically in form from standard general-purpose computer languages. Appendix III presents some examples of domain language statements. Most of the examples are tabular forms since these seem to be easy to read. A decision table, document format, and ANOVA table are all good examples of possible constructs for domain languages.
As discussed on page, software components are analogous to both parts and assemblies. A software component describes the semantics of an object or operation in a problem domain. There is a software component for each object and operation in every domain.
Once a software component has been used successfully in many systems, it is usually considered to be reliable. A software component's small size and knowledge about various implementations makes it flexible to use and produces a wide range of possible implementations of the final program. The top-down representation (refinement history) of a particular program is organized around the software components used to model the developing program.
The use of components, which is discussed in Chapter 4, does not always result in a program with a block structure chart in the form of a tree. Usually, as with programs written by human programmers, the block structure chart of the resulting program is a graph as shown in figure 36. An example component for a low-level executable domain language is shown in figure 27.
The source-to-source program transformations [Kibler78] used by Draco strip away the generality in the components. This makes general components practical. The transformations also smooth together components, removing inefficiencies in the modeling domain. This makes small components practical. Since single-function, general components are essential to the parts-and-assemblies approach, the transformations make component-built systems efficient and practical.
A transformation differs from an implementation of a component (a refinement) in that transformations are valid for all implementations of the objects and operations they manipulate. Refinements can make implementation decisions which are limitations on the possible refinements for other components of the domain. In general, transformations relate statements in one problem domain to that same problem domain, while components relate statements in one problem domain to statements in other problem domains. Some source-to-source program transformations for a low-level executable language are shown in figure 11.
This section presents an SADT(TM) model of the use of Draco to produce software. SADT (System
Analysis and Design Technique) is a registered trademark of
SofTech Inc. and has been successfully used to model both
software systems and social systems [Connor80, Ross77]. Its ability to
model both kinds of systems is important here since the
parts-and-assemblies concept on which Draco is based requires social
modeling to show how a craftsman gains enough experience to create a
problem domain for Draco's use. For those readers unfamiliar with SADT,
Appendix I presents a brief introduction to the technique.

The viewpoint or emphasis in the model is showing how the productivity of
the organization may be increased by reusing the analysis and design of
one system to construct another system. From our discussion in Chapter 1
this is only worth-while in problem areas where there is a demand for many
similar systems (A0C2). In particular we wish to show how an organization
might acquire the information to reuse analysis and design while it
produces systems in a conventional manner.

Meanwhile, the production arm of the organization accepts system
requirements for new systems (A0.3I1) and builds working software systems
(A0.3) either using Draco or a conventional method. The result of this
construction of software systems is either craftsman experience building a
custom system in some domain or experience with a Draco domain (A0.3O2).
This experience is used to help the organization establish or revise a
Draco domain (A0.2I2).


Next, a prettyprinter is created (A22.2) which can prettyprint the internal form of the domain back into the external form (domain language).
The third phase in the construction of a domain is the creation of a transformation library for the domain (A22.3) which is prettyprinted into a catalog of transformations for the domain.
The fourth and final phase in the construction of a domain is the creation of a component for each object and operation in the domain (A22.4). Each component contains many refinements which specify the meaning of the component in terms of other domains known to Draco (A22.4I1). As each refinement of a component is put into the domain component library, it is annotated with transformations of interest from the transformation library (A22.4I1) of the domain in which the refinement is written.
Feedback on problems with the definition of a domain is given through the
use of the domain (A22O1).


Once the program has been parsed into the domain internal form (A32.2O1), it is transformed and refined by a system specialist using the TFMREF subsystem into the source code of an executable target language (A32.3). The resulting software system is tested (A32.4) and is either acceptable (A32.4O1) or unacceptable. The refinement record (A32.3O1) of an acceptable system is retained.
The two types of unacceptable systems are those which seem to meet the
requirements but use too much resource (A32.3I2) or those which do not
meet their requirements (A32.1I2). An unacceptable system from a resource
standpoint may benefit from a new implementation. An unacceptable system
from a requirements standpoint requires revision of the domain language
program.
>

Once the problem has been refined into an executable language, the program (A323O2) is prettyprinted to a file.
From the previous model of the use of Draco in an organization to produce software, four major human roles with Draco are apparent.
From the model, we can see that the basic cycle of operation in producing an executable program with Draco is to cast the problem in a domain language, parse the domain language into the domain's internal form, optimize the internal form using transformations, refine the internal form using software components, and iterate transformation and refinement through layers of modeling domains.
The specification of the objects used in the Draco cycle of refinement is discussed in the next section.
A problem domain is a collection of objects and operations, but to specify a problem domain to Draco a few other things must be included. In particular, a domain language parser, a domain language prettyprinter, source-to-source transformations for the domain, and components for the domain must be specified to create a useable Draco domain.
A domain language parser takes the external form (syntax) of domain A, ext[A], and turns this into the internal form of domain A, int[A]. The domain language (the external form) should, if possible, use the notations and forms of the problem domain. The internal form of a domain is a tree with a prefix keyword in each node of the tree to state the purpose of that node. This is similar but not the same as a parse tree in that the prefix keywords are not nonterminal symbols in the grammar. All the manipulations of Draco are performed on this internal form.
In Draco 1.0 the syntax of the domain language is specified in a BNF style with tree-constructing operations included as actions. This scheme of parser description is taken from the META series of metacompilers [Schorre64]. The parser generator generates LL(1) class parsers from these descriptions.
A domain language prettyprinter takes int[A] and produces ext[A]. This activity is essential since Draco must communicate its actions and results in a form people can understand. The external form produced should be pleasing to the eye and produce useful groupings and indentations.
The details of specifying source-to-source transformations are dealt with in Chapter 3. The source-to-source transformations transform parts of the internal form of one domain into the internal form of that same domain, i.e., int[A] into int[A].
The transformations capture rules of exchange relating the objects and operations in the domain. These rules of exchange are independent of the implementations of the domain objects and operations. Each transformation is named and given a characterizing number which relates the importance of performing this transformation in the estimation of the domain designer.
The components for a domain relate the internal form of the domain to the internal form of other domain domains, i.e., int[A] to int[A,B,...,Z]. The details of specifying and using components are discussed in Chapter 4.
The components specify the semantics of the objects and operations in the domain. They do this by relating the objects and operations in one domain to the objects and operations in other (possibly the same) domains. There is a component for each object and operation in a domain. Each component contains many refinements each of which is a possible refinement for the object or operation in the domain which the component represents. Each refinement represents an implementation decision which may preclude the use of other refinements in other components. As an example, a string manipulation domain may support a string implementation as a singly-linked list of characters. This implementation would preclude a move string-pointer operation refinement which can back up in a string.
The details of domain specification may be found in the manual for Draco 1.0 [Neighbors80b]. In the following two chapters we will discuss the specification and use of transformations and components in more detail.
The source-to-source transformations used by Draco relate the objects and operations of a domain by specifying rules of exchange between statements in the domain. These rules of exchange are independent of any implementation decisions which may be made for the domain objects and operations.
Draco uses these transformations to customize the use of a component to its use in a specific system. Once a component is placed into a system, the transformations use the surrounding context information to smooth the component into the context and remove any unused generality.
Program transformations are productions with a left-hand side (LHS), a right-hand side (RHS), and enabling conditions [Standish76a]. The LHS is a pattern which is matched against the program. The enabling conditions are predicates on the parts of the program which are matched by the LHS. If the enabling conditions are true then the RHS is substituted for the LHS in the program. Since transformations are performed on a representation of the source code of a program, they represent optimizations independent of any particular machine.
By "source-to-source program transformations" we mean that the LHS is a pattern on the text, or source code, of the program and that the RHS is also a pattern of source code. In source-to-function transformations, the RHS is a function on the matched part of the program and the result of that function is substituted for the LHS.
In general, source-to-source transformations are not as expressively powerful as source-to-function transformations but their use is prompted by one important reason, the ability to understand what the transformations do. To understand a source-to-source transformation, the user must understand the language being transformed, the language of the transformation pattern matcher, and the language of the enabling conditions. The pattern language and the enabling condition language are usually very simple. To understand a source-to-function transformation, the user must further understand the language of the RHS function. This language is usually very complex and not at all the kind of thing a transformation user, who is concerned about the program and not about the transformation system, cares about learning.
In Draco, the source-to-source transformations should be intelligible to the domain builders, the domain users, and the Draco system specialists whose roles are defined on page. From now on we shall use transformations to denote Draco source-to-source transformations. Transformations are created by the domain builders. The simplicity of source-to-source transformations seems to increase their accuracy and make them more attractive to users.
Practically every transformation has enabling conditions if we wish to
insure strict semantic equivalence. Usually the full enabling conditions
are not checked. As an example the transformation
?X+0 => ?X
may have enabling conditions in that the "+" add operator may change the
type of ?X in some languages or normalize the representation of ?X in some
machines. By the same token the transformation,
(?B+?C)*?A <=> (?B*?A)+(?C*?A)
which requires the conventional enabling condition that ?A is side-effect
free [Standish76a], may alter the behavior of the program. All the
arithmetic operators on computers have side effects based on their range
of number representation. For any particular machine there are values for
?A, ?B, and ?C which can cause an arithmetic underflow or overflow on one
side of the transformation and not on the other. These kind of enabling
conditions are seldom checked since they would prevent most
transformations from operating and are not machine independent. In
general, the transformations are "probabilistic," checking for enabling
conditions which are usually violated. These include predicates on the
range and domain of variables in the program fragments under consideration
[Standish76a].
In Draco, transformations are specified as rules of exchange on the internal form of a domain language which is a tree with a keyword in each node to identify its function (prefix internal form). Thus, the LHS of a transformation is a statement in a prefix internal-form tree-pattern language.
Since the prefix internal form contains identifying keywords, a very fast, simple pattern matcher may be built using the keyword as a left-hand anchor in the matching. We can view the LHS as a tree template which is applied only to nodes in the internal form tree with the same prefix keyword as the root of the LHS pattern. Four types of objects may appear in the LHS pattern after the prefix keyword.
Metarules are rules which relate one rule to another rule. In the context of transformations as production rules, metarules relate the possible action of one transformation to the possible actions of other transformations [Kibler77]. Since the transformations used by Draco are source-to-source, we can automatically produce metarules for these transformations.
In the following discussion, LHS(t) and RHS(t) denote the right-hand and left-hand sides of transformation t, and a\b denotes whether pattern a matches pattern b. The details of the "\" metamatching operator are given in Appendix II. Metarules for a set of transformations, T, are created by the algorithm in figure 9. For each transformation t, algorithm 9 produces an UPLIST(t) and an AGENDA(t,n) for each node n in RHS(t).
ALGORITHM: Metarules(T)
INPUT: a set of transformations T
OUTPUT: for each t in T, priority queue UPLIST(t) and for each node n in RHS(t) priority queue AGENDA(t,n)
The UPLIST for a transformation t lists all the transformations whose LHS contains RHS(t). Thus if the transformation t were applied, the UPLIST(t) lists which transformations might apply at some internal form subtree which encloses the substituted RHS(t). The priority, DEPTH(n[j]), associated with each transformation given in UPLIST states where the transformation should be attempted as the number of tree levels above the node which was just transformed.
From steps 1 and 4 of algorithm 9, it is easy to see that the complexity of creating the metarules for n transformations is O(n^2) in terms of the "\" metamatching operator. Since this operator is expensive, and the number of transformations for a domain can easily range to 2000 [Standish76a], the transformations for a domain are grouped into a library and new transformations are incrementally added. The complexity of adding a new transformation to an existing library of n-1 transformations is O(n). All of the existing metarules still remain, just some new information is added to them.
As will be shown when we discuss the management of the transformations, the ability to generate metarules when the library is formed saves large amounts of searching when the individual transformations are used.
The designers of early transformation systems [ARSEC79] struggled with the problem of how to name each transformation in a large set of transformations so that the user could remember the names. The Draco system deals with this problem in two ways. First, the class feature in the definition of transformations allows one transformation to stand for many transformations depending on the size of the classes involved. Secondly, the metarules virtually eliminate the naming problem by having the transformations refer to each other by name. If a user knows where he wishes to perform a transformation then the metarules will have suggested only those transformations which could apply at that locale.
The number of names the user must recognize, not remember, is reduced to the transformations suggested for each locale. The metarule suggestions, coupled with the ability to display the text of a transformation from the catalog of transformations for each domain, eliminates the naming problem.
101 - up procedural transformations
not source-to-source
don't trace, don't ask user
11 - 12 always do this transformation
9 - 10 convert to canonical form
7 - 8 operator arrangement
5 - 6 flow statement arrangement
3 - 4 program segment arrangement
1 - 2 reverse canonical form
0 very seldom done, keys procedures
The application codes were designed for lookahead in the transformation process but this turned out to be largely unnecessary in the specialization of components discussed on page. They do turn out to be a convenient means for specifying actions to be taken by the transformation mechanism (i.e., convert to canonical form).
The example transformations in this section are on an ALGOL-like language rather than a domain language with which the reader would be totally unfamiliar.
5/3/79 19:18:18 SIMAL.TLB
<BOP> = {ASSIGN,EXP,DIV,IDIV,MPY,SUB,ADD,
NOTEQ,EQUAL,GTR,LESS,GTREQ,LESSEQ,AND,OR}
<REL> = {NOTEQ,EQUAL,GTR,LESS,GTREQ,LESSEQ}
<BOP>EMPX: 12 *EMPTY*<bop>?X => *UNDEFINED*
<BOP>IFELSEX: 4 (IF ?P THEN ?S1
ELSE ?S2)<bop>?X =>
(IF ?P THEN (?S1)<bop>?X
ELSE (?S2)<bop>?X)
<BOP>IFX: 4 (IF ?P THEN ?S1)<bop>?X =>
(IF ?P THEN (?S1)<bop>?X)
<REL>S0: 10 ?A-?B<rel>0 => ?A<rel>?B
ADDX0: 12 ?X+0 => ?X
EQUALMAMB: 12 -?A=-?B => ?A=?B
EXPX2: 9 ?X^2 => ?X*?X
FORXX: 11 FOR ?W:=?X STEP ?Y TO ?X DO
?Z => [[?W:=?X;
?Z]]
IFELSENOT: 12 IF ~?P THEN ?S1
ELSE ?S2 => IF ?P THEN ?S2
ELSE ?S1
LABELIFX: 10 ?X:
IF ?P THEN [[?S;
GOTO ?X]] =>
?X:
WHILE ?P DO ?S
MINUSSUBAB: 9 -(?A-?B) => (?B-?A)
PARPAR: 12 ((?X)) => (?X)
SEMICLXIF: 10 ?X:
?S;
IF ~?Y THEN GOTO ?X => ?X:
REPEAT ?S
UNTIL ?Y
When a domain language program is parsed into the domain's internal form, an agenda is established for each node in the internal form tree. If requested, the PARSE subsystem of Draco 1.0 will suggest transformations for each node in the program. Only transformations which will succeed in matching their LHS's are suggested by placing them in order of application code on the agenda of the node.
The transformation mechanism allows the application of transformations within a selected locale in an instance of a domain. Currently, the locale is selected by the user, but during optimization it really should be selected by analysis tools as discussed on page. The locale serves to focus the attention of the transformation mechanism to a small part of the program at a time. Within the locale the user may apply individual transformations to specific points in the program. The transformation suggestions on the agenda at any particular point in the internal form tree may be displayed by the user.
The individual application of transformations is a very tedious process. Alternatively, the user may request the transformation mechanism to apply transformations in the locale with some range of application code under some application policy with or without user approval of each transformation. Some transformation application policies and their meanings are given below.
As transformations are performed, the metarules for those transformations suggest other transformations. In particular, the RHS of a transformation already has agendas built into its tree form from the metarule creation. When a RHS is instantiated and substituted into the internal form tree, its agendas suggest transformations. Also, when a transformation is applied, its UPLIST is interpreted to add transformations on the agendas of nodes higher in the internal form tree than the node transformed.
Thus, the initial suggestion of transformations during parsing which could apply, coupled with the transformation mechanism's interpretation of the metarules, starts a chain-reaction which keeps all transformations which could apply to the program on the agendas of the internal form of the program. Since transformations are only put on an agenda if their LHS's match, LHS patterns are not attempted all over the program. This reduction in search in the application of a transformation to a locale makes the transformation mechanism very efficient in operation. This high efficiency will become important when "procedural" transformations are introduced in the next section.
With the use of metarules and the best-in-locale transformation application policy, some transformations which were previously considered procedural in nature may be implemented by a small set of source-to-source transformations in a comfortable way. These transformations introduce non-printing semantic markers into the internal form and rely on the metarules for their propagation through the internal form. The effect of the transformations and metarules is to create a Markov algorithm which runs on the body of the program being developed.
BEGIN LOCAL A;
.
.
GOTO LABEL1;
.
.
LABEL1: GOTO LABEL2;
.
.
IF predicate GOTO LABEL1;
.
.
END













Because of the application codes, if there are %GCE2 markers in the tree then one of them is the locus of the next transformation. If there are no %GCE2 markers, then a %GCE1 marker moves up the tree and produces a new %GCE2 marker (GCE-UP1, GCE-UP2) or removes the %GCE1 marker when it encounters the scope of the label (GCE-SCOPE). At any time there is only one %GCE1 marker in the tree.
To avoid leaving semantic markers in the internal form, the transformations with application codes greater than 99 enlarge the locale if they are placed on an internal form agenda outside the locale. The resulting program is shown in figure 26.
BEGIN LOCAL A;
.
.
GOTO LABEL2;
.
.
GOTO LABEL2;
.
.
IF predicate GOTO LABEL2;
.
.
END
These procedural transformation sets are hard to understand and violate the ease of understanding motivation for source-to-source transformations, but they do serve to demonstrate a natural source-to-source technique for implementing some procedural transformations without having to learn an alien language capable of manipulating the internal form trees and writing RHS procedures.
Program transformation can be viewed as a game of perfect information, like chess. The program represents the current board position while the transformations which apply represent the legal moves. The goal of the transformation process is to achieve an optimal program under some criteria. Assuming we had an evaluation function on the program in terms of the criteria of optimization we could use lookahead with the evaluation function to suggest the next transformation to apply, much as the chess playing programs do today. The difficulty is in building the evaluation function which can determine the "goodness" of a given program under some general criteria. One approach to the evaluation function is to assign a "goodness" to each transformation. The application codes of transformations represents this approach. The increase in program "goodness" from applying a series of transformations is a function of the "goodness" of the individual transformations.
The ability to look ahead with Draco transformations is not very important since the transformations are used to specialize components. The degree of specialization could be improved by lookahead but the overwhelming majority of the work in specialization is in removing program fragments which represent unused generality. The relationship between components and transformations, discussed in Chapter 4, initiates transformation sequences to remove most unused generality without lookahead.
An alternative approach for transformation planning is to specify a goal in terms of the program and find some sequence of transformations which achieves the goal. This approach, currently under investigation by Fickas [Fickas80], avoids the huge search space of transformations encountered by lookahead, but it must deal with the problem of suggesting worth-while goals.
On page, the need to perform transformations on the correct level of abstraction is discussed. The transformations for a domain should only deal with the objects and operations of the domain and not anticipate or infer knowledge from other domains which map into or out of the domain.
In this chapter we have discussed how transformations are defined and used for specialization by Draco. The next chapter, which discusses software components, will investigate in detail the relationship between components and transformations. In particular, the theme of removing the responsibility for transformation suggestion will be carried over into components by automatically annotating the components with transformations to be considered.
Components provide the semantics for the domains specified to Draco. Each component represents possible implementations for an object or operation of a domain in terms of other domains known to Draco.
Each component must provide a semantics for the object or operation it represents which is consistent with the transformations of that object or operation in the domain. If, for example, a component represents the insertion of an element in a list, then the result of the operation should be a list. The internal actions of the list insertion component may break the input list structure into a structure which is not a list, but the result of the operation must be a list.
The concept of "granularity of meaning" is introduced here because earlier work in components attempted to prove that a component always upheld the structure of the object being manipulated. As an example, the properties of a list might be axiomatized and used in an attempt to show that a list insertion upheld all the axioms of a list. For most implementations of the insertion operation on a list, the axioms are not upheld since the insertion requires a temporary breakup of the structure of the list in a way which violates the axioms of a list.
In this work we assume that a well-defined component upholds the axioms of its input and output types only with respect to the external environment of the component (i.e., statements in the domain language in which the object or operation is defined).
An example component for exponentiation is shown in figure 27. The component provides the semantics for EXP internal form nodes for the language SIMAL which is not a domain-specific language, but will be used in examples so that the reader will not have to learn a domain-specific language at this point.
COMPONENT: EXP(A,B)
PURPOSE: exponentiation, raise A to the Bth power
IOSPEC: A a number, B a number / a number
DECISION:The binary shift method is O(ln2(B)) while
the Taylor expansion is an adjustable number
of terms. Note the different conditions for
each method.
REFINEMENT: binary shift method
CONDITIONS: B an integer greater than 0
BACKGROUND: see Knuth's Art of ... Vol. 2,
pg. 399, Algorithm A
INSTANTIATION: FUNCTION,INLINE
RESOURCES: none
CODE: SIMAL.BLOCK
[[ POWER:=B ; NUMBER:=A ; ANSWER:=1 ;
WHILE POWER>0 DO
[[ IF POWER.AND.1 # 0
THEN ANSWER:=ANSWER*NUMBER ;
POWER:=POWER//2 ;
NUMBER:=NUMBER*NUMBER ]] ;
RETURN ANSWER ]]
END REFINEMENT
REFINEMENT: Taylor expansion
CONDITIONS: A greater than 0
BACKGROUND: see VNR Math Encyclopedia, pg. 490
INSTANTIATION: FUNCTION,INLINE
ASSERTIONS: none
ADJUSTMENTS: TERMS[20] - number of terms,
error is approximately (B*ln(A))^TERMS/TERMS!
CODE: SIMAL.BLOCK
[[ SUM:=1 ; TOP:=B*LN(A) ; TERM:=1 ;
FOR I:=1 TO TERMS DO
[[ TERM:=(TOP/I)*TERM ;
SUM:=SUM+TERM ]] ;
RETURN SUM ]]
END REFINEMENT
END COMPONENT
A prose description of what the component does is given by the PURPOSE field. If the component takes objects as arguments and/or produces objects, then the type of these objects in terms of the objects in the domain is given in the IOSPEC field of the component. The DECISION field presents a prose description of the possible refinements of the component and the considerations involved in choosing between the alternatives.
Finally, there is a set of refinements of the component which represent a possible implementation of the component in terms of the objects and operations of other domains.
The first REFINEMENT in the set of refinements is the default refinement. In the absence of any other information, Draco will attempt to use this refinement first. Each REFINEMENT has a name and a BACKGROUND which is a prose description of the method the refinement implements and reference to where more information about the method may be found.
The CONDITIONS field of a refinement lists conditions which must be true before the component may be used. There are basically two kinds of conditions: conditions on the domain objects on which the component operates and conditions on previously made implementation decisions. The conditions on the domain objects are local to the locale where the component will be used. The conditions on the implementation decisions are global to the domain instance being refined. The ASSERTIONS field of a refinement makes assertions about the implementation decisions the component makes if it is used. The assertions are the opposites of the conditions on implementation decisions. The management of assertions and conditions is discussed in more detail on page.
The RESOURCES field of a refinement states what other components will be required to perform initialization if the refinement is chosen. The resource components are program parts which are executed before the resulting program begins execution (initialization phase) and they create information resources for the refinements used in the program.
An example use of a resource is a refinement for cosine which interpolates a table of cosines during execution. The table must be built during the initialization phase and the name of the table must be passed to the interpolation refinement of the component cosine. This is achieved by building a refinement which interpolates tables and requires a resource component which builds interpolation tables.
The ADJUSTMENTS field of a refinement states fine tuning settings for a refinement, the meaning of the adjustment, and a default setting. An example adjustment term might adjust the accuracy of a refinement or limit the amount of time spent in calculating in the refinement.
The GLOBAL field lists all names used in the refinement which are not to be renamed. The primary use of a GLOBAL definition is to define variable names which are reserved by a domain and cannot be renamed. The SNOBOL variable &ANCHOR is an example global. GLOBAL definitions should seldom be used and are always suspect. They seem to stem from a poor analysis of a domain. Labels which are defined in the refinement are defined in the LABELS field of the refinement.
The way a refinement may be inserted into the internal form tree during refinement is governed by the INSTANTIATION field of the refinement. The three modes of instantiation are INLINE, FUNCTION, and PARTIAL. More than one instantiation may be given for a refinement with the first one listed being the default instantiation. INLINE instantiation means the refinement is substituted directly into the internal form tree with all variables used in the refinement renamed (including labels) except for the arguments and those declared global. FUNCTION instantiation substitutes a call for the component in the internal form tree and defines a function using the refinement for the body. A new function is defined only if the same function from the same domain has not already been defined. PARTIAL instantiation substitutes a call for the component in the internal form tree with some of the arguments already evaluated in the body of the function defined. Limitations are placed on the partially evaluated forms allowed. When a function is defined the defining domain, component name, and a version number are used to differentiate between functions of the same name in different domains and FUNCTION and PARTIAL versions of the same function in the same domain.
The final field of a refinement is either a DIRECTIVE to Draco or the internal form of a domain. The internal form of a domain may be described either in a parenthesized tree notation with the INTERNAL:domain directive or it may be specified in the external form (domain language) of the domain with the CODE:domain.nonterminal directive. The CODE directive causes the parser for the specified domain to be read in and started to recognize the given nonterminal symbol. A DIRECTIVE to Draco is one of the following alternatives: view the component as a function definition by the user program, view the component as a function call, defer from refining this component, and remove the node which invoked this component from the internal form tree. The Draco DIRECTIVEs are used when a domain language is defined which allows function definitions, functions calls, and such things as refinements for comments which remove them from the program since they are saved in the refinement history.
Not all the component and refinement fields are required for each component definition. Basically the only required fields are COMPONENT, REFINEMENT, INSTANTIATION and CODE.
Components are placed into libraries in much the same way and for much the same reason that transformations are placed into libraries. The processing of a single component for inclusion in the component library of a domain is very expensive. For each refinement in the component, the parser for the domain(s) in which the refinement is written must be loaded to parse the external form into internal form. Once the code for the refinement is in internal form, the agendas of the internal form are annotated with transformations of interest from the transformation library of the target domain. These transformation suggestions are made in much the same way that transformation suggestions are made when a domain language program is parsed as discussed on page. The transformation suggestions will point out things of interest when the refinement is used. Thus, Draco supports a component library construction facility where a group of components may be replaced or added without disturbing the other components in the library.
This section discusses how the fields of a component are used in the refinement process to choose an implementation for the operation of object the component represents. Not all of these actions are accommodated in the current prototype system Draco 1.0. The differences between this narration and the prototype are given on page.
First the IOSPEC conditions on the component should be verified by examining the internal form or refinement history of the surrounding internal form of the node to be refined. Restrictions on the legal internal forms accepted by the domain language parser might make this step easier.
Next a REFINEMENT is chosen and the refinement CONDITIONS are checked. If an implementation decision condition is violated then the refinement may not be used. Local conditions on the domain objects are formed into surrounding code for the refinement body. The hope is that transformations for the domain will be able to remove this surrounding code by "proving" the conditions correct and removing the code.
The user is then asked about any ADJUSTMENTS for the refinement. If the user supplies no adjustments then the default adjustments are used.
The refinement body is now instantiated into the internal form according to the users wishes for INSTANTIATION and the allowed instantiations for the refinement. The body is instantiated with minimal renaming to avoid naming conflicts. If the refinement is instantiated as a function and a function already exists then the already defined function is used.
Once the refinement is inserted, any necessary RESOURCES are added to the initialization phase of the developing program. These resources are usually high-level program fragments which also have to be refined.
Finally the ASSERTIONS for the refinement are made in the scope of the domain instance. The assertions are a kind of lock and key mechanism with the conditions of other refinements. When two domain instances are merged into a single instance of a same or other domain, then the assertions are checked for consistency. This places the overly strong restriction that all objects in a domain of the same type have the same implementation. More experience with domains could probably remove this restriction. If the asserted conditions conflict, then the refinement of the program must be backed up. A model for avoiding conflicting assertions is given on page FORMALMODEL.
The model for the use of a component is very close to the actions of a module interconnection language (MIL). In fact it seems that a MIL is a natural way to organize the components of a particular domain. This similarity is discussed on page.
The refinement mechanism of Draco 1.0 applies the component library of a domain to a locale within an instance of the domain in the internal form tree for the program being refined. The locale is bounded by a domain instance which is a part of the internal form tree in the internal form of a particular domain. Refinements are made in one domain at a time on an instance of the domain. The locale mechanism is important for refinements in that the "inner loop" of the program should be refined first to pick efficient implementations. These implementation decisions will affect the choices outside of the inner loop through the assertion and condition mechanism of the components.
The Draco 1.0 refinement mechanism applies the components to the locale internal form tree using application policies similar to transformation application policies. In general, top-down application is the best policy to avoid conflicting conditions which would require a backup of the refinement.
From the previous discussion about the selection of a refinement for a component and the user interaction necessary to make a choice, it is evident that the user needs some mechanism to keep Draco from asking too many questions. The user needs the ability to specify guidelines for answering the questions and these guidelines are called "tactics."
The TACTICS subsystem of Draco 1.0 allows the user to interactively define tactics which answer refinement questions for the refinement mechanism. The subsystem also allows the user to read and write tactics from storage. A standard set of tactics is already available. When the refinement mechanism requires a user response, it first applies the tactics to see if one of them provides an answer.
DEFINE HEAD.*ENTRY* = COMPONENT,LOC 3;
DEFINE SPACE.*ENTRY*=[ALL<DIRECTIVE>,USE],
[ALL<AVAILABLE FUNCTION>,
USE FUNCTION],
[ALL<FUNCTION INSTANTIATION>,
USE FUNCTION],
USE DEFAULT;
DEFINE *CMD*.SUMMARY = "Summary:",COMPONENT,
PURPOSE,IOSPEC,DECISION,
[ALL,REFINEMENT,CONDITIONS,
BACKGROUND,ASSERTIONS,
RESOURCES,INSTANTIATION,
ADJUSTMENTS,DOMAIN];
EXIT
The SPACE rule checks all refinements to see if one is a Draco directive and if so, it uses it. Otherwise if there is a function which already implements the component, then the internal form node is replaced with a call to the function. Otherwise, if there is a refinement which can be instantiated as a function, then it attempts to use that refinement as a function. If all else fails, then it attempts to use the default refinement with the default instantiation. If none of the tactics is successful in producing a refinement, then the refinement user interface is invoked and the user may inquire as to the problem and make a refinement choice.
The *CMD* rules are rules which may be invoked by the refinement user interface. Thus, they are user-defined commands which may inquire about the state of the program under refinement and attempt to make refinement choices just as tactics would. The SUMMARY command prints out the fields of the component and all its refinements for the user's information and would be used if the user were required to specify a refinement.
The refinement user interface could be used for applying refinements one at a time but this would be very tedious work, similar to applying transformations one at a time. In general early versions of a high-level domain-specific program are refined by the default tactics, which use the usually easy and uncomplicated default refinements, to obtain a first implementation to see if the system implements the user's desires. Once a good domain-specific program is settled upon, the more sophisticated refinements and transformations may be used to refine the program for efficiency.

In Chapter 6, a formal model of the interdependencies of the domains which represent Draco's knowledge base is presented. Strategies based on this model should make it easier for a user to avoid knowing the details of the relationships between domains.
This chapter presents some results from using Draco in the construction of programs. To save the reader from having to understand a special-purpose domain language, the examples in this chapter are in the SIMAL language which is a simple infix Algol-like language. This language is not a domain language in the sense we have been discussing and is used here only for exposition purposes.
This chapter discusses an example which refines a quadratic equation solver from SIMAL into LISP. The three domains used in this refinement are organized as shown in figure 30. The DRACO domain shown in figure 30 creates functions, creates function calls, enforces component conditions, and eliminates scoping rules through renaming. It is the model of functions which Draco 1.0 uses.


In this section we will be discussing the refinement of the SIMAL program given in figure 32. The program represents a simple program for solving for the roots of a quadratic equation. The example is deceptively simple. The refinement of the SIMAL program into its equivalent LISP form must deal with the following problems:
.PROGRAM QUADRATIC
$QUADRATIC
[[ LOCAL A,B,C,ROOT1,ROOT2;
LOOP:
PRINT("QUADRATIC EQUATION SOLVER");
PRINT("INPUT A,B,C PARAMETERS ");
A:=READNUM;
IF A=0 THEN RETURN;
B:=READNUM;
C:=READNUM;
ROOT1:=(-B+SQRT(B^2-4*A*C))/(2*A);
ROOT2:=(-B-SQRT(B^2-4*A*C))/(2*A);
PRINT("THE ROOTS ARE: ",ROOT1," AND ",ROOT2);
GOTO LOOP ]]
$
.END
The first set of tactics used to refine the example are the "SS" tactics.
These direct the refinement mechanism to construct a function for each
component which can be made into a function. If a function for a component
already exists then a call to that function replaces the use of the
component. These tactics are designed to create "small and slow" programs
and are shown in figure 33.
DEFINE SS.*ENTRY* = LOC 2,
[ALL<DIRECTIVE>,USE],
[ALL<FUNCTION INSTANTIATION>,
USE FUNCTION],
[ALL<INLINE INSTANTIATION>,USE INLINE];
The second set of tactics used to refine the example are the "LF" tactics
which direct the refinement mechanism to instantiate a component inline if
possible. Otherwise the component is made into a function. The "LF"
tactics are designed to create "large and fast" programs and are shown in
figure 34.
DEFINE LF.*ENTRY* = LOC 2,
[ALL<DIRECTIVE>,USE],
[ALL<INLINE INSTANTIATION>,USE INLINE],
[ALL<FUNCTION INSTANTIATION>,
USE FUNCTION];
Both tactics are much simpler than is typically used in the refinement of programs. Tactics usually examine the assertions, conditions, possible instantiations, and target domain in their operation.
The second factor influencing the refinement of the example program was
whether or not the use of the transformation ?X^2 => ?X*?X was
allowed in the SIMAL domain. The transformation requires that ?X be
side-effect free [Standish76a]. If the transformation was allowed, it
was automatically suggested in all the components which could use it and
every time the transformation could be applied it was applied.
Table 1 names the programs produced under the circumstances outlined above. From table 1 we can see that the "SS" tactics met part of their objective in that the code space for the programs refined using them is smaller. The code size is the size of the static program for interpretive UCI LISP measured in 36-bit machine words. All measures of memory size are of this form.
transformation used
no yes
SS QUADSS QUADTSS
551 words 451 words
tactics 9 functions 6 functions
used
LF QUADLF QUADTLF
807 words 595 words
2 functions 2 functions
The block structure charts of the resulting programs is given in figures 36 through 35. The structure charts show that a single transformation can be very powerful in removing the need for the exponentiation routine and its support routines. The NUMBER routine shown in some of the structure charts arises from the need to maintain a consistent model of SIMAL numbers in LISP.



The runtime characteristics of the resulting programs was investigated by
running twenty test cases of the same random data through each program and
measuring CPU and memory use. Figure 38 gives the CPU usage of
all the programs for each test case while 39 gives the
cumulative CPU usage for each program as it ran the test cases.


Similarly, figures 40 and 41 give the memory use for
each test case and cumulative memory use for each test case respectively.
The variations in the amount of time and memory needed to run each test
case come from the SQRT and EXP routines which are iterative
approximations (Newton-Raphson and Binary Shift Method, see figure
27) and dependent on the input data.


The runtime characteristics show that the programs refined with the "LF" tactics were larger and faster than their counterparts refined with the "SS" tactics. The difference in tactics, however, was completely dominated by whether or not the transformation was used. The programs which were transformed before refinement were faster and used less memory than those which were not transformed. This simple example demonstrates the importance of performing transformations at the correct level of abstraction as discussed on page.
Figure 39 shows that the QUADTLF implementation was just barely the fastest, beating QUADTSS, even though figure 40 shows that it requires twice as much running space as QUADTSS and its code space is larger. Without transformation, QUADLF is clearly faster than QUADSS and requires only about 20% more running space.
Which implementation is the "best" depends on the time-space tradeoffs in each specific case. The "LF" refined programs were presented at a disadvantage here in that the addition of more transformations would benefit them more since they are embeded inline and the transformations could make use of the surrounding context.
This chapter has presented a simple example which refined a 10 line Algol-like program into approximately 80 lines of LISP. This is clearly not the goal of this work, but it does serve to demonstrate some of the complex interactions which take place between the components, the tactics, and the transformations during refinement. The simple example did not even touch upon the issue of using alternate refinements for a component in that the given tactics always used the default refinement.
Only the ideas of transformations, components, and tactics are presented here. The details of the different definitions allowable in Draco 1.0 are found in the manual for the system [Neighbors80b].
The example in Appendix III uses many domains, more complex tactics, and large transformation libraries. There may be as many as 100 components for a domain each with two or three possible refinements. The transformation libraries may include 2000 or more transformations as the encoding of [Standish76a] for SIMAL does. The tactics may check many features in the context of refinement. The resulting programs may be 10-20 pages long. All of these facts make the transformation and refinement process a very complex operation. The next chapter introduces a formal model of this complex process which may serve as a basis for refinement strategies.
This chapter presents some models and ideas which arose from the use of Draco in the construction of programs. In particular, the nature of source-to-source program transformation, a formal model of the knowledge in Draco domains, and styles of domain organization are discussed.
Experience with source-to-source transformations as used by Draco has
shown that it is important to perform transformations at the appropriate
level of refinement. Continuing the example from Chapter 5, we can
consider the interaction between the SIMAL transformation EXPX2:
?X^2 => ?X*?X and the component for exponentiation shown in figure
27 which has two possible refinements, binary shift method and
Taylor expansion. Given an exponentiation in SIMAL there are three
options: use the transformation; use the binary shift method refinement;
or use the Taylor expansion refinement. For a specific case, of course,
fewer of the options may apply. The possible actions are shown in figure
42.

In the scenarios shown in figure 42 we are attempting to
refine the SIMAL fragment y^2 into a (*TIMES y y) in LISP. As shown, the
application of the EXPX2 transformation followed by the straightforward
refinement of a SIMAL multiply into LISP is the simplest approach.
The refinement of the exponentiation into the binary shift method makes the problem harder but still possible. The POWER could be propagated by transformation, the WHILE loop "unrolled," the AND functions solved, the ANSWER propagated through the unrolled loops, the dead variables eliminated, and the [[...]] block structure removed. Sophisticated and powerful transformations could reduce the binary shift method to a simple multiply.
The use of the Taylor expansion refinement makes the problem unsolvable by general transformations. Of course, a single transformation specific to this particular problem could be defined, and one always exists; but the number of specialized transformations which must exist to do even small problems makes this approach unreasonable. A set of general transformations cannot transform the Taylor expansion into the equivalent multiply because the expansion is an approximation of the multiply. If the transformations are equivalence preserving they shouldn't transform an approximation of a number into the number.
It is attractive to build some specialized knowledge into the system which can deal with problems like the approximation given above. The specialized knowledge would be used to recognize that a specific problem exists and be used to solve the problem. It is the author's opinion that this approach is misguided. The object of the refinement is an exponentiation, not an expansion. An expansion is an implementation detail. The role of knowledge sources in program understanding is discussed on page.
Optimizations of an object or operation must take place on that object or operation and not a refinement of it. This means that for programs constructed by Draco, optimization cannot be regarded as an "after the coding is done" operation. It should most definitely be regarded as an "after the specification is acceptable" operation.
The example of using a transformation at the right level of abstraction which was used here is very simple. The same problem, however, is encountered in more complex settings. As an example in the Augmented Transition Network (ATN) domain, there is a transformation set which removes unnecessary arcs from the transition network. A powerful general set of LISP transformations would have little chance of achieving the effect of this ATN transformation set on the LISP program which results from an ATN description. This is because the LISP transformations deal with LISP primitives and not the states and arcs of an ATN description with which the ATN transformations operate.
Two conditions can cause an optimization to remain undiscovered by source-to-source transformation at the wrong level of abstraction. First, the information necessary to perform the transformation could have been spread out by implementing refinements. Second, the transformations are attempted on an implementation (or model) of the original objects and operations which is not exactly equivalent to the original objects and operations.
To fully understand the capabilities of Draco we must build and reason with a formal model of the technique.
A major goal of the formal model developed in this section is to be able to answer the reusability questions [Freeman76a] outlined below.
The formal model is also of use in answering the deadlock question during refinement. A deadlock during refinement occurs when two refinement decisions, say the implementation of a data structure common to two separately refined program parts, are inconsistent. This means that the refinement of the program must be backed up to a point where the deadlock did not exist. The detection of this deadlock should be possible from the formal model. The deadlock problem is a subproblem of the reusability questions and would be useful during interactive sessions with Draco.
Finally, the formal model should serve as a basis for the development of refinement strategies. It is expected that for all but toy problems the complexity of answering the reusability or deadlock questions would be prohibitively expensive. The formal model can still serve as a planning space for refinement strategies whose goal is to produce a good program under certain criteria with minimal backup during refinement. The ability to look forward during refinement separates the refinement strategies from the refinement tactics described in Chapter 4.
The formal model of the knowledge in Draco is based on a Petri net [Peterson77, Peterson78]. Following the definition of Agerwala [Agerwala79], a Petri net is a bipartite, directed graph N=(T,P,A) where
T={t1,t2,...,tn} a set of transitions
P={p1,p2,...,pm} a set of places
The union of T and P represent the nodes of the graph N which are
connected by a set of directed arcs A. A marked Petri net C=(T,P,A,M)
further specifies a mapping M:P->I where the set I assigns the
number of tokens in each place in the net. In Petri net diagrams, places
are represented by circles, transitions by bars, and tokens by black dots.
Typically, places model conditions while transitions model actions.

Figure 43 performs mutual exclusion because initially there is only one token in p3. Both t1 and t2 are enabled but only one may fire since there is only one token in p3. The choice of which transition fires is completely arbitrary. Thus, after a single transition firing, either p1 contains a token or p2 contains a token but both cannot contain a token. The procedures modeled to be in execution by the existence of a token on p1 or p2 never run simultaneously; they are mutually excluded. This form of Petri net modeling has been used extensively in operating systems theory to model the use of resources.
The knowledge in the domains known to Draco can be viewed as a Petri net
where the places represent the components in the Draco domains. The
transitions represent the action of performing a refinement or a
transformation. The arcs which connect the places and transitions
represent the ability to perform a refinement or transformation. Figure
44 represents a part of the net which models the transformation
and refinements discussed in the example of Chapter 5.

The dotted lines in figure 44 represent domain boundaries for the components of the two domains, SIMAL and LISP, involved in the example. Note that the transformation EXPX2 does not cross a domain boundary since it specifies a rule of exchange between statements in a single domain. Similarly, the transitions which represent individual refinement possibilities for a component always cross domain boundaries even if some or all of the resulting output places are in the same domain as the place of the component being refined. A refinement, of course, may refine a component into more than one domain at once.
The Petri net model discussed above provides a model of the
interconnections between components known to Draco through the
transformations and the different refinement alternatives for each
component. It does not model the information in a particular high-level
domain-specific program. The information specific to a particular problem
is modeled by a marking of the Petri net. For each node represented in the
internal form of the domain-specific high-level program a token is placed
on the Petri net representing the knowledge in Draco which represents that
node's semantics. The concept is illustrated in figure 45 for
the simple SIMAL statement X^2+5.

Each node in the internal form tree has a pointer to the token in the marked net which represents the use of a particular component. When a node is refined it uses the knowledge in the associated component. Tokens which represent nodes in other locations of the internal form tree are not disturbed. Only a transformation applied at a particular node may change the token representation of a subtree of the internal form. Refinements only refine a single node.
A formal definition of level of refinement and level of abstraction may be given with respect to the Petri net model of knowledge in Draco.
The level of refinement of a component in a specific problem is the number of refinement transitions which the token which represents that component has traversed since it was initially placed on the net.
The level of abstraction of a component in a specific problem with respect to a target domain is the minimum number of refinement transitions the token which represents that component must traverse in order to occupy a place in the target domain.
In this section we will show that the first two reusability questions and the refinement deadlock question are decidable, and thus can be answered. We will also show that the computational complexity of answering these questions for any practical case is extremely high. It is unknown if the third reusability question is decidable.
The discussion of this section will use a version of the formal model
which models only the use of components during refinement and ignores the
existence of transformations. Figure 46 presents a part of
the formal model which represents the existence of a refinement with
modeling conditions and modeling assertions.

The places in figure 46 represent the existence of some condition, either the use of some component in the program under development or the assertion of some modeling condition. Thus, each possible modeling condition, like the use of singly-linked lists as a representation for strings, is modeled by the existence of a place in the formal model.
For a refinement to be used (i.e. the transition to fire) all the conditions must be indicated by the presence of a token and the component must be used in the developing program, indicated by the presence of a token in the component's place. When a refinement is used it places a token back on each of the condition places which enabled its use, indicating that each modeling decision is still in effect. Furthermore a token is placed on the places representing any modeling assertions made by the refinement. Of course tokens are also placed on the places representing the components used in the refinement.
To answer the reusability questions for a specific problem (Petri net
marking) with respect to a specific target domain some modification of the
net must be performed. First, the places which represent any modeling
decisions from all refinement assertions are individually connected
through a single-input, single-output transition to a newly defined place
we shall call the distinguished place. Second, all the places representing
the use of a component in the target domain are also connected to the
distinguished place through a single-input, single-output transition. The
distinguished place has the structure shown in figure 47.

Once we have modified the knowledge model net as described above, the first reusability question can be cast as the Petri net reachability problem. The reachability problem for Petri nets is as follows: given an initial marking of the net, is there a possible sequence of transitions which will produce a second specified marking of the net. The first reusability question is as follows: given the marking of the knowledge net model from some domain-specific, high-level program and a target domain, is there a sequence of transitions (refinements) such that only one token exists in the distinguished place and all other places are empty. The second reusability question is answered by the sequence of transitions specified to answer the first question.
The Petri net reachability problem has been shown to be decidable [Sacerdote77] and has been given lower bounds in time and space complexity [Lipton76]. Lipton has shown that the reachability problem will require at least an exponential (2^cn) amount of storage space and an exponential amount of time. The exponent (n) is the number of places and their interconnections to transitions. For the reusability questions, the number of places and interconnections is related to the number of components and modeling decisions and could give rise to exponents well over 100 for a single domain. The general reachability algorithm will not be practically applicable. The first two reusability questions are decidable, however.
Some hope still remains for an algorithm which can automatically refine a given domain-specific program in that general Petri nets may be a far too general model where a specific model of less power as discussed by Hack [Hack75] may have lower complexity bounds.
The inclusion of the general transformation mechanism discussed in chapter 3 into the formal model would render the reusability questions undecidable. The transformation mechanism allows the definition of Markov algorithms which are equivalent to Turing machines in computation power. Answering the reusability questions for an arbitrary set of transformations becomes equivalent to answering the halting problem for Turing machines, which is undecidable.
In describing the domains used in the examples of Chapter 5 it was useful to show the relationships between the domains using a directed graph as shown in figures 30 and 31. These graphs point out important considerations for someone interested in developing a set of domains to generate a particular kind of system.
Some domains, such as the TASK domain which provides parallel execution and the Draco domain which provides a model of functions, are domains close to computer science and exist mainly to be built upon. Other domains, such as the ATN domain, are more specialized and used as models by fewer domains. This suggests that one model of domain organization is to have a base domain which specifies a model of the resulting programs. All domains eventually map into this base domain. Computer science modeling domains surround this base domain supplying such things as data structures, control structures, and mathematical routines. On top of the modeling domains would rest the more application-oriented domains. One would expect the reuse of the components in a domain to increase the closer the domain is to the base domain.
There are several attractive candidates for the base domain including languages and computer architecture models. ADA, COBOL, LISP, and the UCSD Pascal P-machine are all languages which would be attractive base domains.
A model of machine architecture for a von Neumann machine is presented by Frazer [Frazer77a] in his work on code generators. Given an ISP description [Bell71] of a machine Frazer's system automatically builds a code generator for a simple von Neumann machine model dependent language for the described machine. The use of this language as the base domain could be one approach to the portability of high-level domain-specific programs between von Neumann machines. A model of a parallel dataflow machine is represented by the ID language [Arvind78]. In both cases, the description languages model the gross architecture of a partic