IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS. VOL. 21, NO. 5, SEPTEMBERIOCTOBER 1991 1231
Regular Correspondence
A Self- organizing Binary Decision Tree for
Incrementally Defined Rule- Based Systems
Tony R. Martinez and Douglas M. Campbell
Abstract- This paper presents an adaptive self- organizing concurrent
system ( ASOCS) model for massively parallel processing of incrementally
defined rule systems in such areas as adaptive logic, robotics, logical
inference, and dynamic control. An ASOCS is an adaptive network
composed of many simple computing elements operating asynchronously
and in parallel. This paper focuses on adaptive algorithm 3 ( AA3) and
details its architecture and learning algorithm. It has advantages over
previous ASOCS models in simplicity, implementability, and cost. An
ASOCS can operate in either a data processing mode or a learning mode.
During the data processing mode, an ASOCS acts as a parallel hardware
circuit. In learning mode, rules expressed as Boolean conjunctions are
incrementally presented to the ASOCS. All ASOCS learning algorithms
incorporate a new rule in a distributed fashion in a short, bounded time.
I. INTRODUCTION
This paper presents an adaptive self- organizing concurrent sys-tem
( ASOCS) architecture [ 6], [ 7], [ 101 that guarantees learning
for Boolean rule- based systems in bounded time. This particular
ASOCS uses adaptive algorithm 3 and has significant simplicity,
implementability, and cost advantages over previous ASOCS models
[ SI, [ 9]. Target applications include rule and example based systems
for logical inference, robotics, adaptive logic, fault- recovery, and
real- time dynamic control.
The search for fast and robust computation has increased research
in highly parallel systems with both traditional 121, [ 4] and connec-tionist
[ 5], [ 14] views. Researchers of massively parallel systems seek
speed both during processing and learning. However, programming
and updating massively parallel systems incur tremendous overhead
and complexity.
The goal of ASOCS is to train ( program) a parallel digital
network to solve problems defined by rule- based propositional logic.
A system is trained ( programmed) through the incremental input of
rules expressed as conjunctions of Boolean variables. Real world
applications using rule- based propositional logic are forthcoming [ 3].
An ASOCS is an adaptive network of many simple computing
elements operating in a parallel, asynchronous fashion. ASOCS can
operate in both data processing and data learning modes.
During data processing the system acts as a parallel hardware
circuit; it asynchronously maps input data to output data in
O( max( d, log r t ) ) time, where ( 1 is the maximum depth ( longest path)
of the network, and r1 is the number of network nodes, as is typical
for hardware circuits.
During learning the system reconfigures itself in a distributed
manner to accommodate new ( and perhaps conflicting) rules. ASOCS
potential comes from its ability to guarantee adaptation in (>( log
( n ) ) time for any new rule. Through its learning process the system
discovers the critical variables; it uses these to generalize and classify
large input spaces.
Manuscript received October 12, 1989; revised April 12, 1991.2
The authors are with the Computer Science Department, Brigharn Young
IEEE Log Number 9101213.
University, Provo, UT 84602.
The majority of ASOCS research on adaptive algorithms has
focused on adaptive algorithm 1, adaptive algorithm 2, and adaptive
algorithm 3. Details for AA1 can be found in [ 8]; details for AA2 can
be found in [ 9]. These three algorithms vary dramatically, although
AA3 shares some similarity to AA2.
ASOCS arose from reexamining perceptron [ 121 related ideas.
The basic building block, however, is that of digital programmable
nodes, an idea spawned by the notion of a universal logic module
( ULM) [ 161. Verstraete [ 151 sought methods of programming fixed
ULM structures to solve arbitrary Boolean mappings. ASOCS departs
from these efforts by using a nonpassive network that adapts in a
self- organizing fashion [ 6], [ lo]. This technique has led to models
promising parallel inference, high- speed adaptation, and internal
consistency control. Proof of concept VLSI fabrication of ASOCS
devices has been completed [ l] and other implementations are
currently underway. Formal proof of AA3 is found in [ 17].
Although ASOCS research was initiated with a neural network
emphasis, the proposed mechanisms differ extensively from standard
neural network paradigms. The authors make no claim to be modeling
neural functionality with this model. Rather, distributed, parallel, and
self- organizing paradigms are used in order to attain an improved
computational mechanism, offering speed, fault tolerance, anci ease
of use.
The outline of the paper follows. Section I1 defines the mechanism
of knowledge input. Section I11 described AA3 during processing
mode. Section IV describes AA3 during learning mode. Section V
works in detail a concrete example of AA3 learning. Section VI
extends the model to multiple outputs. Section VI1 discusses the
advantages of ASOCS. Sections VI11 and IX overview simulation
results, comparisons with AA1 and AA2, and current research efforts.
11. KNOWLEDGE INPUT
The atomic knowledge element 1' the instance. Each instance is
a ( partial) function from a set of Boolean variables to a Boolean
variable. Thus each instance is a propositional production rule.
Following are examples of instances:
I) m. 4- B * c
11) .4- BC * 4
111) w. 4- B * - C,
Instance I forces C to become true whenever A and B are false.
Instance I1 forces .- I to become false whenever A and C are true and
B is false. Instance 111 forces C to become false whenever A and B
are false. Instances I and 111 are inconsistent with each other; when
- 4 and B are false, instance I tries to set C to true while instance
Ill tries to set C to false. When the variables of an instance are not
matched, the instance says nothing about the output of the function.
( An implementation may set the output variable to true, to false, or
even to don't know.) Instances are incomplete and partial functions
by definition.
The union of a set of instances defines another partial function.
It is a ( partial) function defined by a set of rules, a functional rule
base. A set of instances is called an instance set. An instance with a
nonnegated variable on its right- hand side is called apositive instance;
an instance with a negated variable on its right- hand side is called a
negative instance. Thus each instance has either negative or positive
polarity.
0018- 9472/ 91$ 01.00 0 1991 IEEE
1232 IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS, VOL. 21, NO. 5, SEPTEMBERIOCTOBER 1991
A Boolean variable occurring in one instance and occurring in its
complemented form in another instance is said to be a discriminant
variable for the two instances.
An instance set S is consistent if S does not contain any two
instances S and I’ where S is a positive instance, I- is a negative
instance and 1) they have the same right- hand variable and 2) there
is a set of Boolean values which can simultaneously match the left-hand
sides of X and I-. We require an instance set to be well defined
( consistent). Thus, no two instances have a common set of Boolean
values for which the function produces inconsistent values.
Two fundamental types of inconsistencies exist. The first type is ob-vious;
the same variables define the output inconsistently as in - 4C +
B and *- IC - B. The second type is more subtle. Although the
rules AD + B and C + - B do not have identical left- hand vari-ables,
when A. D. and C are all true, B becomes both true and false.
Fortunately, there is a basic computational tool for inconsistency.
Its proof is omitted.
Lemma: Two instances are inconsistent if and only if they are of
opposite polarities and have no discriminant variable.
I) AB + C
11) BC + - C
111) - BC 3 - C.
For example, Instance I above is inconsistent with I1 because there
is no discriminant variable. Instance I is consistent with 111 because of
the discriminant variable B. Instances I1 and 111 cannot be inconsistent
because they are of the same polarity.
In the ASOCS model, instances are input incrementally. Let - TI
be a new instance and S be an instance set. Either - 1- 1is consistent
with S or else there is at least one instance of S with which - 1- 1
is inconsistent.
An - TI may contain new information or be redundant to informa-tion
already contained in the instance set.
By definition, a - TI is redundant with respect to an instance set S,
if the partial function defined by the S I is already contained in the
partial function defined by S. For example, let the instance set be
D* C
B - C
and let the new instance be AB 3 C. Clearly, the fact that AB forces
C is already contained by the second instance of the original system.
Since the new instance adds no information, parsimony suggests that
we delete it.
By definition, an - VI contains new information with respect to an
instance set S if the partial function defined by the 1- 1 extends the
partial function defined by S. For example, let the instance set S be
- 4 * c
B* C
and let the new instance be --- I - B + C. In this case, the new
instance tells us to set C‘ to true when A and B are false, an extension
of the partial function defined by S.
The principle of parsimony may also apply to new information.
For example, let the instance set be
. AB + C
‘ 4- c + c
B* C
and let the S I be A + C. The new instance contains new in-formation;
it tells us what to do when - 4 is true and B is false,
as well as what to do when - 4 is true and C is false. But more
than that, the original instances AB =+ C and - 4 - C + C are
now redundant as they are but special cases of A + C. If we can
detect such redundancies quickly, then parsimony suggests they be
deleted. However, perhaps more important than parsimony, is that
by removing “ don’t care” information, the system can find critical
features that can aid in generalizing to a good output when the system
receives input for which no training has taken place.
If a new instance is inconsistent with an instance set, then we
give precedence to the newer instance and remove the contradicted
portion of the old instance.
Theorem: Let S1 + Z be a new instance and S2 + - Z be an
old instance. Suppose there is no discriminant variable for the new
and old instance. If S1 is a subset of S2, then every part of the old
instance contradicts the new instance.
Proof: Since S2 is a subset of S1, every set of variables that
realizes S2 ( and forces Z to be false by the old instance) extends
to variables that realize S1 ( which forces Z to be true by the new
instance). Therefore, every part of the partial function defined by
S1 + Z is a rewrite of S2 - 2, that is, the new instance con-tradicts
everything for which the old rule stood. Q. E. D.
Theorem: Let S1 + 2 be a new instance and S2 + - Z be an
old instance. Suppose there is no discriminant variable for the new
and old instance. Suppose S1 is not a subset of S2. Let S3 be those
variables in S1 that are not in S2. Then the part of the old instance
that is not contradicted by the new instance is given by the following
set of instances (- 152 + - Z: I is in S3).
Proof: Since S1 is not a subset of S2, there is a variable in S1
that is not in S2. Thus, the set S3 is not empty. Let I belong to S3.
The instance 5 2 3 - Z is the union of the two rules: IS2 + - Z
and - 1.52 + - Z. Since I belongs to S1 and since there is no
discriminant variable for S1 and S2, we can find Boolean values for
S1, I , and S2 so that all hold, forcing Z to be set inconsistently.
On the other hand, S1 + Z and - IS2 * - Z now share the
discriminant variable I and are therefore not inconsistent. That is,
we may save the - IS2 + - 2 part of the old instance for each
variable in S3. This concludes the proof of the theorem. Q. E. D.
For example, consider the new instance .- ICD + - Z confront-ing
the old instance - 4Ll + Z. Since - 4CD is not a subset of AB we
form the variables in the new instance that are not in the old instance,
namely C and D. The theorem says to replace AB + Z by the pair
of instances - CAB + Z and - D.- IB + Z.
We may therefore take any consistent instance set and any new
instance and produce a new instance set that contains as much of
the old instance set as can be saved with the new instance being
given precedence. The number of instances in the new instance set
may be greater than, less than, or equal to the number of instances
in the original instance set ( depending on the amount of redundancy
or contradiction). In the new instance set, all instances have equal
priority and order is again inconsequential.
Instances may come from human intervention or automated mech-anisms.
Typical learning has more general instances ( those with
fewer antecedent variables) entered first with refinement through more
specific instances ( those with more antecedent variables).
In the AA3 ASOCS implementation of the next section, the
system maintains consistency in a manner invisible to the user.
By dynamically modifying the instance set, the system discovers
which variables are critical in making decisions. This leads to natural
algorithmic generalization through critical variables when the system
receives novel inputs. In the AA3 ASOCS model, the system does
not explicitly store the instance set; instead, the system stores the
information implicitly in a distributed fashion.
111. THE ASOCS AA3 MODEL
In this section we show how the AA3 ASOCS dynamic parallel
IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS, VOL. 21, NO. 5, SEPTEMBERIOCTOBER 1991 1233
Binder
Fig. 1
Binder
4A3 architecture.
Error ( inconsistent) Output Error ( no match)
Fig. 3. Collector structure.
Left Parent Right Parent
Fig. 4. Example network configuration
Vanable Node
Child Child
Fig. 2. AA3 network node
network models a consistent instance set. We say that an AA3 ASOCS
model fulfills a consistent instance set if whenever an input forces the
instance set to logically output Z ( or - Z), then the AA3 dynamic
parallel network physically outputs the corresponding Z ( or - Z).
Fig. 1 gives the architecture of the AA3 ASOCS model. This
section only discusses those parts of the architecture needed for
execution. In Section IV we discuss how the model knows how to
represent itself ( how it learns).
During execution mode, only the logic network is active. The input
data flows asynchronously through the network with only propagation
delays. The feedback path allows the system to use an output variable
as an input variable. ( If a clocked register is added at the output
binder, then the system is similar to a dynamically adaptable finite
state machine.)
Each node within the logic network has the structure of Fig. 2.
During execution mode, only the dyadic AND gate is active. Each
node has two inputs called children: the node child and the variable
child.
The node child receives its input from a node; the node child’s
value represents a conjunction of other variables.
The variable child is connected directly to an input variable. The
conjunction of the node child and variable child is available for direct
output as well as for further processing by another node. If a node
outputs to another node, then it must output to exactly two nodes:
its leftparent, and its right parent. Such parent nodes are said to be
siblings with respect to each other.
Each node control unit has memory for a 3- state polarity flag
with value: P. D+. or D-. The symbol P signifies the node is a
Primitive node ( Pnode for short). D+ signifies the node is a positive
Discriminant node ( positive Dnode); D- signifies the node is a
negative Discriminant node ( negative Dnode).
Each node also contains a variable list, which is the total set of
variables over which the node does a logical conjunction.
The overall structure of the AA3 network is that of a binary
decision tree as in Fig. 4. A node is a Dnode if and only if it is
a ( top) leaf in this structure ( has no parents). A node is a Pnode if
and only if it is an internal node in the tree ( has parents).
The output of all positive Dnodes is sent to the positive collector;
the output of all negative Dnodes is sent to the negative collector.
Each collector performs a logical OR of its received values. Since
the overall network is a binary decision tree, exactly one Dnode will
be active for any input. A three node structure handles the collector
outputs in Fig. 3. The middle node is the output node; it outputs Z if
the positive collector is active ( a positive Dnode is active); it outputs
- 2 if the negative collector is active ( a negative Dnode is active).
If both positive and negative collector are inactive, or both collectors
are active, then the network has an error. An error will not happen in
the model as described, but could occur if there is a hardware fault
in a physical implementation of the model.
The bottom node of the network is the root node; it is has no inputs,
its logic gate always outputs true, and it is initially a negative Dnode.
The maximum depth of the network is equal to the number of bound
input and feedback variables plus the root node.
Fig. 4 illustrates a possible AA3 configuration. Each node repre-sents
a conjunction of a set of variables. The symbol at the top of the
node is the polarity and node indicator. The 16 possible outputs of the
network in Fig. 4 are give in Table I. Note that although an instance
1234 IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS, VOL. 21, NO. 5, SEPTEMBERIOCTOBER 1991
TABLE I
NETWORKF UNCTIORNE PRESENTATION
A B C D output
0 0 0 0 1
0 0 0 1 1
0 0 1 0 1
0 0 1 1 1
0 1 0 0 0
0 1 0 1 0
0 1 1 0 1
0 1 1 1 1
1 0 0 0 0
1 0 0 1 0
1 0 1 0 0
1 0 1 1 1
1 1 0 0 0
1 1 0 1 0
1 1 1 0 0
1 1 1 1 1
set represents a partial function, an actual AA3 network represents a
total function on those inputs that are part of the network.
IV. THE AA3 LEARNINAGLG ORITHM
In this section we first discuss the architecture that supports the
learning algorithm and then the learning algorithm itself.
A. Architecture
When the system receives a new instance that contains a new input
variable, then the input binder of Fig. 1 allocates an input line for
the new variable.
When the system receives a new instance the adaption unit of Fig. 1
broadcasts to the logic network the variables and polarity of the new
instance. This information allows all nodes within the network to
work cooperatively and in a distributed fashion.
The node of Fig. 2 contains a control unit able to execute the
learning and deletion algorithm. The control unit is also able to send
messages to its node child and to its siblings ( if they exist).
We emphasize that the network does not store the original instance
set. Indeed, as the example in Section V shows, it is usually logically
impossible to reconstruct the instance set from the network.
B. The Learning Algorithm
We describe the AA3 learning algorithm that tells how a consistent
network reconfigures itself when faced with a new instance. At the
completion of the AA3 learning algorithm ( and at the completion of
its deletion algorithm) the network is still consistent and represents the
functionality of the previous network coupled with new information
from the new instance.
Each node has a type ( Dnode or Pnode), polarity, and variable list,
labeled as node. type, node. polarity and node. variables respectively.
Node. polarity is null for a Pnode.
When the system receives a new instance, the AU broadcasts to
each node the instance. variables and the instance. polarity of the
new instance. Each node then independently follows the following
learning algorithm.
( 01) if ( node. type = D) and
( 02)
( 03)
( 04)
( instance. polarity < > node. polarity) and
there is no discriminant variable between
instance. variables and node. variables then
if ( node. variables is a proper superset of
instance. variab1es) then
( 05) polarity- inversion( n0de)
else begin
( 06)
( 07) DVA ( node, S)
( 08) Self- Deletion ()
S := instance. variables - node. variables
end;
end
C. Remarks
Line ( 01)-( 03): Only a Dnode that has opposite polarity to the
new instance and that cannot be discriminated from the new instance
ever “ learns” ( is modified).
Line ( 04)-( 05); If a node has the same or more variables than the
new instance, then the new instance directly contradicts the node for
all active states. The contradiction will disappear if the node changes
its polarity to that of the new instance. Polarity inversion flips the
polarity of the node and redirects the node’s output to the opposite
collector.
Line ( 06): Since instance. variables is always non- empty and since
it failed line ( 04), some instance. variable is not a node. variable.
Therefore, the set S is not empty. For each variable in S, the
new instance contradicts the node for some state. Therefore, for
each such variable we add two nodes to the network to resolve this
contradiction, We do this recursively through the procedure DVA.
Line ( 07): Informally, the recursive procedure DVA takes a variable
T from S, wires in two new parent nodes for S, deletes 1 ~ from S,
changes the old node to a Pnode, and recursively calls itself for the
concordant parent node if there still are some variables left. More
formally,
Procedure DVA ( node, 5);
( 09) Allocate ( node 1); Allocate ( node 2);
( 10) Set 1. to an element of S;
( 1 1) nodel. polarity := node. polarity.
nodel. child := complement of I
nodel. type := D;
( 12) node2. polarity := instance. polarity,
node2. child := 1.;
node2. type := D;
( 13) node. type := P;
( 14) if IS1 > 1, then DVA ( node2, S - 1.);
D. Remarkr
Line ( 09): Node1 and node2 are the new parent nodes.
Line ( IO): The order of variables chosen is inconsequential
Line ( 11- 12): The new parents become Dnodes of opposite
polarity, each differing in node. variables only by the discriminant
variable I - .
Line ( 13): Since node is no longer a Dnode, change its type to
a Pnode.
Line ( 24): If S has more than one element, then recursively call
DVA for node2 and S - I-.
We give an example of DVA. Consider the positive Dnode of Fig. 5
with variables - 4- B confronting the new instance ,1- BCD * - Z.
The node executes the learning algorithm. The node is a Dnode,
has opposite polarity to the new instance, and has no discriminant
variable with respect to the new instance. Since A N B is not a
superset of - 4- BC D, the set of instance. variab1es- node. variables is
{ C. D}. D1-- 4 is called twice. Assuming the discriminant variable C
is chosen first, Fig. 6 shows the modification after one DVA recursion,
and Fig. 7 the final reconfiguration after both.
All DVA modifications take place independently and in parallel.
IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS, VOL. 21, NO. 5, SEPTEMBERIOCTOBER 1991 1235
A
Fig. 5. Initial positive Dnode.
t T
-€ C
Fig. 6. After one DVA. b\ ,/$
c
- c C
Fig. 7. After complete DVA
T T
Fig. 8. Initial network section.
D
Fig. 9. Final network section after self- deletion.
The final step of the algorithm is self- deletion. Each Dnode has
a unique sibling. At the end of the learning algorithm, each node
independently executes the following deletion algorithm.
( 15) Self- Deletion
( 16) If ( node. type = D) and
( 17) ( node. polarity = sibling. polarity) then begin
( 18) Inform- child( node)
( 19) Self- Delete( node);
Self- Delete( sib1ing).
End;
End;
Lines ( 15)-( 19). Sibling Dnodes that have the same polarity are
superfluous; if their child node fires, then one of the two nodes must
fire with their common polarity. Therefore, delete both nodes and
make their child node a Dnode of the same polarity as the parents.
This procedure is then recursively initiated by the new Dnode with
its sibling.
For example, assume initially the section of network shown in
Fig. 8.
Assume that the new instance - A- BE =+ 2 is broadcast. Nodes
1 and 6 are the only discordant Dnodes. Node 6 is discriminated by
variable B. Node 1 is superset so it does polarity inversion, becoming
a positive Dnode. Nodes 1 and 2 are both positive Dnodes and they
self- delete, causing node 3 to become a positive Dnode. At this point
Fig. 10. Initial network configuration.
nodes 3 and 4 are both positive Dnodes and they self- delete causing
node 5 to become a positive Dnode. Since the sibling of node 5 ( node
6) is a negative Dnode, no more self- deletion occurs in this section
of the network. The final network section after these modifications
is shown in Fig. 9.
Like DVA, self- deletion occurs independently and in parallel
throughout the network.
v . ILLUSTRATION OF THE DISTRIBUTEMD ECHANISM
In this section we illustrate each aspect of AA3 with an example.
The example demonstrates that the original instance set may be
distributed throughout the system and that the original instance set
need not be recoverable from the network.
We start the network in the null state of Fig. 10. We describe the
evolution of the network as seven instances are input.
Instunce I - A- BC =+ - 2 ( negative): The root node begins as
a negative Dnode. Since the initial network has no positive Dnodes,
no matching occurs. The network remains as given in Fig. 10.
Instance I1 - - ABDE =+ - Z ( negative): Since Fig. 10 has no
positive Dnodes, the network remains unchanged.
Instance Ill - BDE =+ - Z ( negative): Since there still are no
positive Dnodes in Fig. 10, the network remains unchanged.
Instance N - ABC =+ Z ( positive): Since the root node has
no variables, there is no discriminant variable for the root node and
the new instance. Since ABC is a superset of the variables of the
1236 IEEE TRANSACTIONS ON SYSTEMS. MAN. AND CYBERNETICS, VOL. 21, NO. 5, SEPTEMBERIOCTOBER 1991
Fig. 11. Modified network
t t
Fig. 12. Modified network
root node, a DVA is done for each of the variables A. B. C. If the
DVA is done in alphabetical order, then Fig. 11 is produced. Since
no self deletion is possible, the final network configuration remains
as in Fig. 11.
Instance V - -- 4D + 2 ( positive): The variables - AD are
compared with the variables of the three negative Dnodes ( 2), ( 4),
( 6). - 4 is a discriminant variable for node ( 4) and for node ( 6). Node
( 2) and the - TI do not share a discriminant variable. The variable
list --- ofI n ode ( 2) is a subset of the variable list -,- ID of the - 1- 1.
Therefore, DVA is done at node 2 with the variable D. Fig. 12 gives
the network. Since no self- deletion is possible the network remains
as in Fig. 12.
- 4- B- C are compared with the variables of the positive Dnodes ( 5)
and ( 8). C is a discriminant variable for node ( 5); is a discriminant
variable for node ( 8). Since there are no positive Dnode matches,
the network remains unchanged. The S I is already fulfilled by the
network.
Instance VI/ - B + Z ( positive): The variable B is compared
with the variables of the negative Dnodes ( 4), ( 6), and ( 7). B is a
discriminant variable for node ( 4). Neither nodes ( 6) nor node ( 7)
have a discriminant variable for B 3 2. The variables of node ( 6),
AB - C, are a superset of the 3- 1. Node ( 6) therefore undergoes
polarity inversion. Since the variables of node ( 7) are not a superset
Instance VI - .- I - B - C + - 2 ( negative): The variables
- A
A
Fig. 13. After polarity inversion and DVA
? r
of the SI, DVA must be done. DVA is done for each variable in
B 3 2 that does not appear in A- D, namely for B alone. Fig. 13
gives the network. Self- deletion is now possible. Nodes ( 5) and ( 6)
are siblings of the same ( positive) polarity; they delete themselves;
their child, node ( 3), becomes a positive Dnode as in Fig. 14. Self-deletion
continues. Nodes ( 3) and ( 4) are siblings, but have opposite
polarity. Nodes ( 9) and ( 10) are siblings, but have opposite polarity.
There are no other siblings; the self- deletion stops. The final network
is that of Fig. 14.
Note that the network fulfills the instance set with a distributed
mechanism. Indeed, no single node is responsible for the final instance
B 3 2. The responsibility is spread across the three positive Dnodes
( 3), ( 8), and ( 9). One of these nodes is active whenever B is true,
depending on the value of the other inputs, thus fulfilling the instance.
VI. MULTIPLEO UTPUTS
Having discussed the architecture and learning and deletion algo-rithm
for a single output, we extend to multiple outputs. We first
discuss the changes to the architecture and then the changes to the
algorithms.
Architecturally, a node has an additional polarity flag for each
output variable for which it is a Dnode. Each output variable has
its own positive and negative collector. A node could be a positive
IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS, VOL. 21, NO. 5, SEPTEMBERIOCTOBER 1991 1237
Dnode for one variable, a negative Dnode for another variable, and
a Pnode for a third variable.
During any learning cycle, the AU sends three items: the polarity
of the instance, its variables, and the output variable name. Each node
must check whether it is a Dnode for the output variable of the new
instance. During DVA, a new Dnode must also set the corresponding
polarity flag for the current output variable. If a node is being used for
other variables, then when told to self- delete it simply sets its current
output variable flag to nil. If a node is no longer being used for other
variables, then when told to self- delete it can do a true node deletion.
The real power and efficiency of the system becomes evident with
multiple outputs; sharing of network nodes can be done with multiple
outputs, leading to more efficient use of hardware.
VII. FREEDOMFR OM EXPONENTIACLO MPLEXln
The majority of current connectionist models assume an initial
fixed number of nodes and a static interconnect where the dynamic
aspect of the network is in the weights on links between nodes. The
ASOCS model assumes a dynamic structure where the number of
nodes may increase or decrease and links between nodes are also
dynamic. ( Efficient implementation technologies to allow dynamic
structure are mentioned below.)
The main advantage of a dynamic structure is the potential for the
model to solve complex functions without necessity of exponential
space or time demands. For example, assume a static binary decision
tree ( BDT) that is universal for 20 input variables. By universal it
is meant that it will be able to solve72any of the possible Boolean
functions of 20 variables. There are 22 functions of r i variables and
in this case the BDT would have to have 2” or 1 000 000 nodes in
order to guarantee representation of an arbitrary function of 20 inputs.
As can be seen, the number of nodes grows exponentially. It appears
that any static neural network model that guarantees arbitrary function
learning will require an exponential amount of space whether it be in
number of nodes, number of links, resolution of weights, etc. Use of a
dynamically structured neural network allows models that guarantee
learning of complex functions without requiring exponential space.
There are random functions that appear to always require expo-nential
demands, such as recognizing white noise on a raster screen.
However, these types of applications are expressly outside of the ap-plication
domain targeted by both artificial and natural neural systems
[ l l ] . The application space for which neural networks and ASOCS
are targeted feature input/ output mappings where generalization and
minimization can take place, thus allowing a parsimonious network
solution. However, since the specific mappings are not known U
priori, it requires a dynamic network structure in order to take
advantage of this occurrence.
For example, in representing a Boolean function of 20 variables
with a BDT, some branches of the tree may be of length 20, while
the majority, when minimized, may be much shorter. The dynamic
structure accommodates those parts of the network requiring the full
complexity, while requiring only sufficient nodes for the rest of the
network.
There are a number of ways to implement a logically dynamic
network. One model of AA3 uses small BDT’s, with a maximum
depth of say 10. Complex connections exceeding 10 variables pass
through a dynamic router into another 10- depth AA3 module. Extra
modules are thus only used where required. Another class of mech-anisms
uses a logically independent network with a single network
node representing a Dnode, with the nodes connected to a broadcast
topology [ 13], i. e., tree, mesh, optical, etc. This allows dynamic
logical structure while maintaining parsimony of node usage at an
implementation level.
VIII. SIMULATIOANN D COMPARISOWNI TH
Two OTHER ALOGRITHMS
Software simulation of AA3 indicates that each instance generates
two nodes on the average [ 6] for a single variable output. As discussed
in Section VI, this statistic should improve with multiple variable
outputs.
AA3 improves AA1 [ 8] in two ways.
1) Adaption unirfunctionality: In AA1 the adaption unit must store
and maintain a consistent instance set. In AA3 the adaption unit
merely broadcasts the instance to the network; the network handles
storage and consistency in a distributed manner.
2) Memory requirements: In AA1 the memory requirement per
node is proportional to the product of the instance set size and the
number of output variables. In AA3 a single variable list is stored
at each node, and 2- bits are required for each output variable that a
Dnode defines.
Although AA3 is similar to AA2 [ 6], [ l l ] , it improves AA2 in
two significant ways.
1) Simpler learning algorithm: AA2 does comparison with all
nodes, building a node that exactly matches each new instance. AA3
matches only with discordant Dnodes, and requires only that the
network can fulfill the instance set.
2) Ease of implementation: AA2 requires dynamic interconnect
between network nodes. AA3 uses a fixed interconnect model, with
improved potential for VLSI design.
IX. CONCLUSION
This paper introduces adaptive algorithm 3 ( AA3) for adaptive
1) guaranteed mapping of arbitrary Boolean functions,
2) bounded linear learning time ( logarithmic with the number of
nodes)
3) stability of previously learned patterns,
4) ability to extract critical features from a large environmental
input.
self- organizing concurrent systems. It features the following:
AA3 implements an ASOCS model with:
5) negligible memory requirements,
6) distributed storage and maintenance of the knowledge base,
7) a simple learning algorithm,
8) a fixed interconnect.
New models that allow simpler implementation and a more robust
instance mechanisms are now being proposed. In order to give the
model expanded utility in real- world applications, current research
thrusts include:
extension of ASOCS data and functional primitives to higher
order structures ( including analog and multistate variables),
investigation and enhancements of the ASOCS feedback mech-anism
to allow temporal dynamics and sequential algorithms,
new models with improved speed, programmability, and fault
tolerance,
improved generalization mechanisms for handling inputs for
which no explicit training has taken place.
REFERENCES
[ I ] J. Chang and J. J. Vidal, “ Inferencing in hardware,” in Proc. MCC- Univ.
Res. Symp., Austin, TX, July 1987.
121 L. S. Haynes, R. Lau, D. Siewiorek, and D. Mizell. “ A survey of highly
parallel computing,” Computer, vol. 15, no. 1, pp. 9- 24, 1982.
[ 3] J. J. Helly, W. V. Bates, and S. Kelem, “ A representational basis for
the development of a distributed expert system for space shuttle flight
control,” NASA Tech. Memo. 58258, May 1984.
IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS, VOL. 21, NO. 5, SEPTEMBEWOCTOBER 1991
K. Hwang, “ Advanced parallel processing with supercomputer architec-tures,”
Proc. IEEE, vol. 75, pp. 1348- 1379, 1987.
T. Kohonen, Self- Organization and Associative Memory. New York:
Springer- Verlag, 1984.
T. R. Martinez, “ Adaptive self- organizing logic networks,” Ph. D. dis-sertation,
Tech. Rep. CSD 860093, Univ. Calif., Los Angeles, May
1986.
- , “ Adaptive self- organizing concurrent systems,” in Progress in
Neural Networks, vol. 1, 0. Omidvar, Ed. Nonvood, NJ: Ablex, ,
T. R. Martinez and J. J. Vidal, “ Adaptive parallel logic networks,” J.
Parallel and Distributed Computing, vol. 5, no. 1, pp. 26- 58, 1988.
T. R. Martinez and D. G. Campbell, “ A self- adjusting dynamic logic
module,” J. Parallel and Distributed Computing, vol. 11, no. 4, pp.
T. R. Martinez, “ Self- organizing binary decision trees: Towards VLSI
connectionist computing,” Int. J. Computer- Aided VZSI Design, to be
published.
- , “ Neural network applicability: Classifying the problem space,”
in Proc. LASTED Int. Symp. Expert Systems and Neural Networks, pp.
4144, Aug. 1989.
F. Rosenblatt, Principles of Neurodynamics. Washington, DC: Spartan
Books, 1962.
G. Rudolph and T. R. Martinez, “ DNA Towards an implementation of
ASOCS,” in Proc. IASTED Int. Symp. Expert Syst. and Neural Networks,
pp. 12- 15, Aug. 1989.
D. Rumelhart and J. McClelland, Parallel Distributed Processing:
Explorations in the Microstructure of Cognition, vol. I. Cambridge,
MA MIT Press, 1986.
R. A. Verstraete, “ Assignment of functional responsibility in percep-trons,”
Ph. D. dissertation, Comput. Sci. Dept., Univ. Calif., Los Ange-les,,
June 1986.
S. S. Yau and C. K. Tang, “ Universal logic circuits and their modular
realizations,” in AFIPS Con$ Proc., vol. 32, pp. 297- 305, 1968.
C. Barker and T. R. Martinez, “ Proof of correctness for ASCOS AA3
networks,” to be published.
ch. 5, pp. 105- 126, 1990.
303- 313, 1991.
the “ actual” but unknown probability distribution over the set of
events ( states, conditions) in terms of which the problem is defined.
However, the probabilistic information available for decisions under
risk and uncertainty may also be characterized in this way. For
decision under risk, the set li is a singleton:
Ii = { p } .
For decision under uncertainty, I< is the entire simplex of probability
distributions over the event space of the problem. This perspective
facilitates a unified approach to decision making using various forms
of probabilistic information.
Here, decision problems are considered with the following com-ponents:
a set of possible acts A = { a1 .... . am}, a set of pos-sible
states ( or events) S = { SI,. . . , s,,}. and a utility function
U : S x A + R, where S is the Cartesian product of a ( finite)
set of finite variable domains:
S = dom( l/,) = x, cc, dom( v).
Probabilities p ( s J ) are determined only to the extent to which they
may be inferred from a given collection of marginal probability
distributions defined on Cartesian products of the domains of subsets
of some set V of variables such that, usually, V,
In general, the marginals determine a nonsingleton proper subset
I< of Pv, . the set of all possible distributions over S ( unconstrained
by compatibility with any given set of marginals), any member of
which could be the actual distribution. Thus, the type of decision
problem under consideration is one of decision making under partial
uncertainty proper, although it may in special cases ( due either to the
structure over which probabilities are available and its relation to the
set S or to the quantities involved) reduce to one of decision making
V.
Decisions with Probabilities over Finite Product Spaces
Michael Pittarelli
Abstract- Techniques for decision making with probabilities over finite
product spaces are discussed. In general, the type of decision problem
generated by the available probabilistic information is one of decision
under partial uncertainty: the probability distribution over the event
space for the problem is determined only to the extent that it is contained
in a convex polyhedron of distributions. The structure of this set makes
computationally feasible the application of any of the various criteria
for decision making with indeterminate probabilities that have appeared
in the literature. Algorithms are developed for economically reducing the
size of sets guaranteed to contain the unknown distribution over the event
space for a given problem, thereby improving the quality of the decision
made using any criterion.
under risk or under uncertainty.
The restriction that the set S of relevant events be a Cartesian
product of finite variable domains is not as severe as it might at first
I. INTRODUCTION
Over the past 30 years, many authors have found it useful to
classify decision problems into three types: decision under risk,
decision under uncertainty, and decision under partial uncertainty.
Partial uncertainty is usually characterized as involving knowledge of
a set of distributions li such that any element p of li is potentially
Manuscript received April 1, 1990; revised April 12, 1991.
The author is with the Department of Computer and Information Science,
State University of New York Institute of Technology, Utica, NY 13504- 3050.
IEEE Log Number 9101212.
seem. The variables v E V,, may represent attributes in terms of
which a collection of entities is classified, as in the relational model
of data [ 12] or in statistical analysis of categorical variables [ 5].
The collections of marginals available for a given decision problem
are themselves treated as probabilistic databases [ Z] here. They are
manipulated via an algebra analogous to the relational algebra for
relational databases in order to reduce the size of the set li of
distributions over the set r/, compatible with the given marginals,
while at the same time keeping the computational expense reasonably
low. The effect of the reduction in size of K is to increase the
likelihood that a single best action ( relative to many different types of
criteria) will be identified, thereby overcoming to the extent possible
what may be viewed as the “ imperfection,” relative to the given
decision problem, of the available data [ 14].
A simple algebra of probability over finite product spaces is
presented in Section 11.
11. NOTATIONA ND ALGEBRA
A set of variables I.’ = { V I . . . . . v k } over which a probability
distribution p is defined is referred to as the scheme for p.
A collection P = { pl , . . . , p,} of probability distributions over
schemes of finite variables will be referred to as a ( probabilistic)
database [ 2]. The database P = { p . p,} has the strucfure
X = { VI,. . . , IT,,,} w, here V, is the scheme for distribution p,. Let
dom( X) = dom( V1 U . . . U V,). ( As in relational database theory,
some arbitrary but fixed linear ordering on the variables is assumed.)
0018- 9472/ 91$ 01.00 0 1991 IEEE