
NEURAL NETWORK

Neural Network:

A neural network is a system composed of many simple processing elements operating in parallel whose function is determined by network structure, connection strengths, and the processing performed at computing elements or nodes.

A neural network is a massively parallel-distributed processor that has a natural propensity for storing experiential knowledge and making it available for use. It resembles the brain in two respects:

1. Knowledge is acquired by the network through a learning process.

2. Interneuron connection strengths known as synaptic weights are used to store the knowledge.

ANNs have been applied to an increasing number of real-world problems of considerable complexity. Their most important advantage is in solving problems that are too complex for conventional technologies - problems that do not have an algorithmic solution or for which an algorithmic solution is too complex to be found. In general, because of their abstraction from the biological brain, ANNs are well suited to problems that people are good at solving, but for which computers are not. These problems include pattern recognition and forecasting (which requires the recognition of trends in data).

Introduction:

Artificial neural networks (ANNs) are computational paradigms which implement simplified models of their biological counterparts, biological neural networks. Biological neural networks are the local assemblages of neurons and their dendritic connections that form the (human) brain.

ANNs are characterized by

* Local processing in artificial neurons (or processing elements, PEs),

* Massively parallel processing, implemented by a rich pattern of connections between PEs,

* The ability to acquire knowledge via learning from experience,

* Knowledge storage in distributed memory, the synaptic PE connections.

The attempt to implement neural networks for brain-like computations such as pattern recognition, decision making, motor control and many others was made possible by the advent of large-scale computers in the late 1950s. Indeed, ANNs can be viewed as a major new approach to computational methodology since the introduction of digital computers.

Although the initial intent of ANNs was to explore and reproduce human information processing tasks such as speech, vision, and knowledge processing, ANNs also demonstrated their superior capability for classification and function approximation problems. This has great potential for solving complex problems such as systems control, data compression, optimization problems, pattern recognition, and system identification.

Artificial neural networks were originally developed as tools for the exploration and reproduction of human information processing tasks such as speech, vision, olfaction, touch, knowledge processing and motor control. Today, most research is directed towards the development of artificial neural networks for applications such as data compression, optimization, pattern matching, system modeling, function approximation, and control. One of the application areas to which we apply artificial neural networks is flight control. Artificial neural networks give control systems a variety of advanced capabilities. We are currently developing a neural network control system for a waverider-shaped vehicle called LOFLYTE™. This 23-foot vehicle will demonstrate the control system at subsonic speeds. A successful flight will pave the way for supersonic and hypersonic versions of the vehicle.

Since artificial neural networks are highly parallel systems, conventional computers are unsuited to neural network algorithms. Special-purpose computational hardware has been constructed to implement artificial neural networks efficiently. Accurate Automation has developed a Neural Network Processor (NNP®). This hardware will allow us to run even the most complex neural networks in real time. The NNP® is capable of multiprocessor operation in Multiple-Instruction-Multiple-Data (MIMD) fashion. It is the most advanced digital neural network hardware in existence. Each NNP® system is capable of implementing 8K neurons with 32K interconnections per processor. The computational capability of a single processor is 140M connections (8-bit multiply-accumulates) per second at 35 MHz. An 8-processor NNP® would be capable of over one billion connections per second. The NNP® architecture is extremely flexible, and any neuron is capable of interconnecting with any other neuron in the system. The NNP® is implemented on both VME and PC-compatible cards.


Historical background:

The study of the human brain is thousands of years old. With the advent of modern electronics, it was only natural to try to harness this thinking process. The first step toward artificial neural networks came in 1943 when Warren McCulloch, a neurophysiologist, and a young mathematician, Walter Pitts, wrote a paper on how neurons might work. They modeled a simple neural network with electrical circuits.

Reinforcing this concept of neurons and how they work was a book written by Donald Hebb. The Organization of Behavior was written in 1949. It pointed out that neural pathways are strengthened each time that they are used.

As computers advanced into their infancy in the 1950s, it became possible to begin to model the rudiments of these theories concerning human thought. Nathaniel Rochester of the IBM research laboratories led the first effort to simulate a neural network. That first attempt failed, but later attempts were successful. It was during this time that traditional computing began to flower and, as it did, the emphasis in computing left neural research in the background.

Yet, throughout this time, advocates of "thinking machines" continued to argue their cases. In 1956 the Dartmouth Summer Research Project on Artificial Intelligence provided a boost to both artificial intelligence and neural networks. One of the outcomes of this process was to stimulate research in both the intelligent side, AI, as it is known throughout the industry, and in the much lower level neural processing part of the brain.

In the years following the Dartmouth Project, John von Neumann suggested imitating simple neuron functions by using telegraph relays or vacuum tubes. Also, Frank Rosenblatt, a neurobiologist at Cornell, began work on the Perceptron. He was intrigued with the operation of the eye of a fly. Much of the processing which tells a fly to flee is done in its eye. The Perceptron, which resulted from this research, was built in hardware and is the oldest neural network still in use today. A single-layer perceptron was found to be useful in classifying a continuous-valued set of inputs into one of two classes. The perceptron computes a weighted sum of the inputs, subtracts a threshold, and passes one of two possible values out as the result. Unfortunately, the perceptron is limited, as was proven during the "disillusioned years" by Marvin Minsky and Seymour Papert's 1969 book Perceptrons.

In 1959, Bernard Widrow and Marcian Hoff of Stanford developed models they called ADALINE and MADALINE. These models were named for their use of Multiple ADAptive LINear Elements. MADALINE was the first neural network to be applied to a real-world problem: an adaptive filter that eliminates echoes on phone lines. This neural network is still in commercial use.

Unfortunately, these earlier successes caused people to exaggerate the potential of neural networks, particularly in light of the limitations of the electronics then available. This excessive hype, which flowed out of the academic and technical worlds, infected the general literature of the time. Disappointment set in as promises went unfulfilled. A fear also set in as writers began to ponder what effect "thinking machines" would have on man. Asimov's series on robots revealed the effects on man's morals and values when machines were capable of doing all of mankind's work. Other writers created more sinister computers, such as HAL from the movie 2001.

These fears, combined with unfulfilled, outrageous claims, caused respected voices to critique the neural network research. The result was to halt much of the funding. This period of stunted growth lasted through 1981.

In 1982 several events caused a renewed interest. John Hopfield of Caltech presented a paper to the National Academy of Sciences. Hopfield's approach was not simply to model brains but to create useful devices. With clarity and mathematical analysis, he showed how such networks could work and what they could do. Yet Hopfield's biggest asset was his charisma. He was articulate, likeable, and a champion of a dormant technology.

At the same time, another event occurred. A conference was held in Kyoto, Japan. This conference was the US-Japan Joint Conference on Cooperative/Competitive Neural Networks. Japan subsequently announced their Fifth Generation effort. US periodicals picked up that story, generating a worry that the US could be left behind. Soon funding was flowing once again.

By 1985 the American Institute of Physics began what has become an annual meeting - Neural Networks for Computing. By 1987, the Institute of Electrical and Electronics Engineers' (IEEE) first International Conference on Neural Networks drew more than 1,800 attendees.

By 1989, at the Neural Networks for Defense meeting, Bernard Widrow told his audience that they were engaged in World War IV ("World War III never happened"), where the battlefields are world trade and manufacturing. The 1990 US Department of Defense Small Business Innovation Research Program named 16 topics that specifically targeted neural networks, with an additional 13 mentioning the possible use of neural networks.

Today, discussions of neural networks are occurring everywhere. Their promise seems very bright, as nature itself is the proof that this kind of thing works. Yet its future, indeed the very key to the whole technology, lies in hardware development. Currently most neural network development is simply proving that the principle works. This research is developing neural networks that, due to processing limitations, take weeks to learn. To take these prototypes out of the lab and put them into use requires specialized chips. Companies are working on three types of neurochips - digital, analog, and optical. Some companies are working on creating a "silicon compiler" to generate a neural network Application Specific Integrated Circuit (ASIC). These ASICs and neuron-like digital chips appear to be the wave of the near future. Ultimately, optical chips look very promising, yet it may be years before optical chips see the light of day in commercial applications.


What is a neural network?

Neural Networks are a different paradigm for computing:

· Von Neumann machines are based on the processing/memory abstraction of human information processing.

· Neural networks are based on the parallel architecture of animal brains.

Neural networks are a form of multiprocessor computer system, with

· Simple processing elements

· A high degree of interconnection

· Simple scalar messages

· Adaptive interaction between elements

A biological neuron may have as many as 10,000 different inputs, and may send its output (the presence or absence of a short-duration spike) to many other neurons. Neurons are wired up in a 3-dimensional pattern.

Real brains, however, are orders of magnitude more complex than any artificial neural network so far considered.

A neural network is a type of artificial intelligence that attempts to imitate the way a human brain works. Rather than using a digital model, in which all computations manipulate zeros and ones, a neural network works by creating connections between processing elements, the computer equivalent of neurons. The organization and weights of the connections determine the output.

Neural networks are particularly effective for predicting events when the networks have a large database of prior examples to draw on. Strictly speaking, a neural network implies a non-digital computer, but neural networks can be simulated on digital computers.

An Artificial Neural Network (ANN) is an information-processing paradigm that is inspired by the way biological nervous systems, such as the brain, process information. The key element of this paradigm is the novel structure of the information processing system. It is composed of a large number of highly interconnected processing elements (neurones) working in unison to solve specific problems.

ANNs, like people, learn by example. These networks have the capacity to learn, memorize and create relationships amongst data. An ANN is configured for a specific application, such as pattern recognition or data classification, through a learning process. Learning in biological systems involves adjustments to the synaptic connections that exist between the neurones. This is true of ANNs as well.

There are many different types of ANN but some are more popular than others. The most widely used ANN is known as the Back Propagation ANN. This type of ANN is excellent at prediction and classification tasks. Another is the Kohonen or Self Organizing Map, which is excellent at finding relationships amongst complex sets of data.

Why we need Neural Networks:

Why would anyone want a `new' sort of computer?

What are (everyday) computer systems good at...and not so good at?

Good at:

· Fast arithmetic

· Doing precisely what the programmer programs them to do

Not so good at:

· Interacting with noisy data or data from the environment

· Massive parallelism

· Fault tolerance

· Adapting to circumstances

Where can neural network systems help?

· Where we can't formulate an algorithmic solution.

· Where we can get lots of examples of the behavior we require.

· Where we need to pick out the structure from existing data.

Why use neural networks?

Neural networks, with their remarkable ability to derive meaning from complicated or imprecise data, can be used to extract patterns and detect trends that are too complex to be noticed by either humans or other computer techniques. A trained neural network can be thought of as an "expert" in the category of information it has been given to analyze. This expert can then be used to provide projections given new situations of interest and answer "what if" questions.

Other advantages include:

1. Adaptive learning: An ability to learn how to do tasks based on the data given for training or initial experience.

2. Self-Organization: An ANN can create its own organization or representation of the information it receives during learning time.

3. Real Time Operation: ANN computations may be carried out in parallel, and special hardware devices are being designed and manufactured which take advantage of this capability.

4. Fault Tolerance via Redundant Information Coding: Partial destruction of a network leads to a corresponding degradation of performance. However, some network capabilities may be retained even with major network damage.

Why Are ANNs Better?

1. They deal with the non-linearity in the world in which we live.

2. They handle noisy or missing data.

3. They create their own relationship amongst information - no equations!

4. They can work with large numbers of variables or parameters.

5. They provide general solutions with good predictive accuracy.


From Human Neurons to Artificial Neurons:

How Does the Human Brain Learn?

Much is still unknown about how the brain trains itself to process information, so theories abound. In the human brain, a typical neuron collects signals from others through a host of fine structures called dendrites. The neuron sends out spikes of electrical activity through a long, thin strand known as an axon, which splits into thousands of branches. At the end of each branch, a structure called a synapse converts the activity from the axon into electrical effects that inhibit or excite activity in the connected neurons. When a neuron receives excitatory input that is sufficiently large compared with its inhibitory input, it sends a spike of electrical activity down its axon. Learning occurs by changing the effectiveness of the synapses so that the influence of one neuron on another changes.

Correspondence between biological and artificial terminology:

Biological                 Artificial

Neuron                     Node / Unit / Cell / Neurode

Synapse                    Connection / Edge / Line

Synaptic efficiency        Connection strength / Weight

Firing frequency           Node output


The Structure of the Nervous System:



For our purpose, it will be sufficient to know that the nervous system consists of neurons, which are connected to each other in a rather complex way. Each neuron can be thought of as a node and the interconnections between them are edges.

Such a structure is called a directed graph. Further, each edge has a weight associated with it, which represents how strongly the two neurons it connects can interact. The larger the weight, the stronger the interaction: a stronger signal can pass through the edge.

Functioning of the Nervous System:

The nature of the interconnection between two neurons can be such that one neuron either stimulates or inhibits the other. An interaction can take place only if there is an edge between the two neurons. If neuron A is connected to neuron B, as below, with a weight w, then




if A is stimulated sufficiently, it sends a signal to B. The signal depends on the weight w, and whether it is stimulating or inhibiting depends on whether w is positive or negative. If sufficiently strong signals are sent, B may become stimulated.

Note that A will send a signal only if it is stimulated sufficiently, that is, if its stimulation is more than its threshold. Also if it sends a signal, it will send it to all nodes to which it is connected. The threshold for different neurons may be different. If many neurons send signals to A, the combined stimulus may be more than the threshold.

Next if B is stimulated sufficiently, it may trigger a signal to all neurons to which it is connected.

Depending on the complexity of the structure, the overall functioning may be very complex, but the functioning of individual neurons is as simple as this. Because of this, we may dare to try to simulate it using software or even special-purpose hardware.

Major Components of an Artificial Neuron:

This section describes the seven major components that make up an artificial neuron. These components are valid whether the neuron is used for input, output, or is in one of the hidden layers.

Component 1. Weighting Factors: A neuron usually receives many simultaneous inputs. Each input has its own relative weight, which gives the input the impact that it needs on the processing element's summation function. These weights perform the same type of function as do the varying synaptic strengths of biological neurons. In both cases, some inputs are made more important than others so that they have a greater effect on the processing element as they combine to produce a neural response.

Weights are adaptive coefficients within the network that determine the intensity of the input signal as registered by the artificial neuron. They are a measure of an input's connection strength. These strengths can be modified in response to various training sets and according to a network's specific topology or through its learning rules.

Component 2. Summation Function: The first step in a processing element's operation is to compute the weighted sum of all of the inputs. Mathematically, the inputs and the corresponding weights are vectors which can be represented as (i1, i2 . . . in) and (w1, w2 . . . wn). The total input signal is the dot, or inner, product of these two vectors. This simplistic summation function is found by multiplying each component of the i vector by the corresponding component of the w vector and then adding up all the products. Input1 = i1 * w1, input2 = i2 * w2, etc., are added as input1 + input2 + . . . + inputn. The result is a single number, not a multi-element vector.
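
To make this concrete, here is a minimal Python sketch of the summation function; the input and weight values are invented for the example:

    # Summation function: the dot (inner) product of the input vector
    # and the weight vector (values here are purely illustrative).
    inputs  = [0.5, -1.0, 2.0]          # (i1, i2, i3)
    weights = [0.8,  0.2, -0.5]         # (w1, w2, w3)

    # input1 + input2 + ... + inputn, where inputk = ik * wk
    net_input = sum(i * w for i, w in zip(inputs, weights))
    print(net_input)   # 0.5*0.8 + (-1.0)*0.2 + 2.0*(-0.5) = -0.8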

Geometrically, the inner product of two vectors can be considered a measure of their similarity. If the vectors point in the same direction, the inner product is maximum; if the vectors point in opposite direction (180 degrees out of phase), their inner product is minimum.

The summation function can be more complex than just the simple input and weight sum of products. The input and weighting coefficients can be combined in many different ways before passing on to the transfer function. In addition to a simple product summing, the summation function can select the minimum, maximum, majority, product, or several normalizing algorithms. The specific algorithm for combining neural inputs is determined by the chosen network architecture and paradigm.

Some summation functions have an additional process applied to the result before it is passed on to the transfer function. This process is sometimes called the activation function. The purpose of utilizing an activation function is to allow the summation output to vary with respect to time. Activation functions currently are pretty much confined to research. Most of the current network implementations use an "identity" activation function, which is equivalent to not having one. Additionally, such a function is likely to be a component of the network as a whole rather than of each individual processing element component.

Component 3. Transfer Function: The result of the summation function, almost always the weighted sum, is transformed to a working output through an algorithmic process known as the transfer function. In the transfer function the summation total can be compared with some threshold to determine the neural output. If the sum is greater than the threshold value, the processing element generates a signal. If the sum of the input and weight products is less than the threshold, no signal (or some inhibitory signal) is generated. Both types of response are significant.

The threshold, or transfer function, is generally non-linear. Linear (straight-line) functions are limited because the output is simply proportional to the input. Linear functions are not very useful. That was the problem in the earliest network models as noted in Minsky and Papert's book Perceptrons.

The transfer function could be something as simple as depending upon whether the result of the summation function is positive or negative. The network could output zero and one, one and minus one, or other numeric combinations. The transfer function would then be a "hard limiter" or step function.

Another type of transfer function, the threshold or ramping function, could mirror the input within a given range and still act as a hard limiter outside that range. It is a linear function that has been clipped to minimum and maximum values, making it non-linear. Yet another option would be a sigmoid or S-shaped curve. That curve approaches a minimum and maximum value at the asymptotes. It is common for this curve to be called a sigmoid when it ranges between 0 and 1, and a hyperbolic tangent when it ranges between -1 and 1. Mathematically, the exciting feature of these curves is that both the function and its derivatives are continuous. This option works fairly well and is often the transfer function of choice. Other transfer functions are dedicated to specific network architectures and will be discussed later.
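
As an illustration, the transfer functions described above can be sketched in a few lines of Python (the function and parameter names are ours, not from any particular library):

    import math

    def hard_limiter(x, threshold=0.0):
        # Step function: 1 if the sum exceeds the threshold, else 0.
        return 1.0 if x > threshold else 0.0

    def ramp(x, lo=-1.0, hi=1.0):
        # Linear within [lo, hi], clipped (hard-limited) outside it.
        return max(lo, min(hi, x))

    def sigmoid(x):
        # S-shaped curve ranging between 0 and 1.
        return 1.0 / (1.0 + math.exp(-x))

    def tanh(x):
        # Hyperbolic tangent: S-shaped curve ranging between -1 and 1.
        return math.tanh(x)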

Prior to applying the transfer function, uniformly distributed random noise may be added. The source and amount of this noise is determined by the learning mode of a given network paradigm. This noise is normally referred to as "temperature" of the artificial neurons. The name, temperature, is derived from the physical phenomenon that as people become too hot or cold their ability to think is affected. Electronically, this process is simulated by adding noise. Indeed, by adding different levels of noise to the summation result, more brain-like transfer functions are realized. To more closely mimic nature's characteristics, some experimenters are using a gaussian noise source. Gaussian noise is similar to uniformly distributed noise except that the distribution of random numbers within the temperature range is along a bell curve. The use of temperature is an ongoing research area and is not being applied to many engineering applications.

NASA has announced a network topology that uses what it calls a temperature coefficient in a new feed-forward, back-propagation learning function. But this temperature coefficient is a global term, which is applied to the gain of the transfer function. It should not be confused with the more common term, temperature, which is simply noise being added to individual neurons. In contrast, the global temperature coefficient allows the transfer function to have a learning variable much like the synaptic input weights. This concept is claimed to create a network with a significantly faster (by several orders of magnitude) learning rate that provides more accurate results than other feed-forward, back-propagation networks.

Component 4. Scaling and Limiting: After the processing element's transfer function, the result can pass through additional processes that scale and limit. Scaling simply multiplies the transfer value by a scale factor and then adds an offset. Limiting is the mechanism which ensures that the scaled result does not exceed an upper or lower bound. This limiting is in addition to the hard limits that the original transfer function may have performed.
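
A minimal sketch of this scale-then-clamp step, with illustrative parameter names (none of these come from a specific library):

    def scale_and_limit(x, scale=1.0, offset=0.0, lower=-1.0, upper=1.0):
        # Scale: multiply the transfer value by a factor and add an offset.
        y = scale * x + offset
        # Limit: clamp the result to the allowed bounds.
        return max(lower, min(upper, y))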

This type of scaling and limiting is mainly used in topologies to test biological neuron models, such as James Anderson's brain-state-in-the-box.

Component 5. Output Function (Competition): Each processing element is allowed one output signal, which it may output to hundreds of other neurons. This is just like the biological neuron, where there are many inputs and only one output action. Normally, the output is directly equivalent to the transfer function's result. Some network topologies, however, modify the transfer result to incorporate competition among neighboring processing elements. Neurons are allowed to compete with each other, inhibiting processing elements unless they have great strength. Competition can occur at one or both of two levels. First, competition determines which artificial neuron will be active, or provides an output. Second, competitive inputs help determine which processing element will participate in the learning or adaptation process.

Component 6. Error Function and Back-Propagated Value: In most learning networks the difference between the current output and the desired output is calculated. This raw error is then transformed by the error function to match the particular network architecture. The most basic architectures use this error directly; some square the error while retaining its sign, some cube the error, and other paradigms modify the raw error to fit their specific purposes. The artificial neuron's error is then typically propagated into the learning function of another processing element. This error term is sometimes called the current error.

The current error is typically propagated backwards to a previous layer. Yet, this back-propagated value can be either the current error, the current error scaled in some manner (often by the derivative of the transfer function), or some other desired output depending on the network type. Normally, this back-propagated value, after being scaled by the learning function, is multiplied against each of the incoming connection weights to modify them before the next learning cycle.

Component 7. Learning Function: The purpose of the learning function is to modify the variable connection weights on the inputs of each processing element according to some neural based algorithm. This process of changing the weights of the input connections to achieve some desired result could also be called the adaptation function, as well as the learning mode. There are two types of learning: supervised and unsupervised. Supervised learning requires a teacher. The teacher may be a training set of data or an observer who grades the performance of the network results. Either way, having a teacher is learning by reinforcement. When there is no external teacher, the system must organize itself by some internal criteria designed into the network. This is learning by doing.

Structure of an Artificial Neuron:

A simple neuron:

An artificial neuron is a device with many inputs and one output. The neuron has two modes of operation; the training mode and the using mode. In the training mode, the neuron can be trained to fire (or not), for particular input patterns. In the using mode, when a taught input pattern is detected at the input, its associated output becomes the current output. If the input pattern does not belong in the taught list of input patterns, the firing rule is used to determine whether to fire or not.

A simple neuron

Firing rules

The firing rule is an important concept in neural networks and accounts for their high flexibility. A firing rule determines how one calculates whether a neuron should fire for any input pattern. It relates to all the input patterns, not only the ones on which the node was trained.

A simple firing rule can be implemented by using the Hamming distance technique. The rule goes as follows:

Take a collection of training patterns for a node, some of which cause it to fire (the 1-taught set of patterns) and others which prevent it from doing so (the 0-taught set). The patterns not in the collection then cause the node to fire if, on comparison, they have more input elements in common with the 'nearest' pattern in the 1-taught set than with the 'nearest' pattern in the 0-taught set. If there is a tie, the pattern remains in the undefined state.

For example, a 3-input neuron is taught to output 1 when the input (X1, X2 and X3) is 111 or 101 and to output 0 when the input is 000 or 001. Then, before applying the firing rule, the truth table is;

X1: 0 0 0 0 1 1 1 1

X2: 0 0 1 1 0 0 1 1

X3: 0 1 0 1 0 1 0 1

OUT: 0 0 0/1 0/1 0/1 1 0/1 1

As an example of the way the firing rule is applied, take the pattern 010. It differs from 000 in 1 element, from 001 in 2 elements, from 101 in 3 elements and from 111 in 2 elements. Therefore, the 'nearest' pattern is 000, which belongs to the 0-taught set. Thus the firing rule requires that the neuron should not fire when the input is 010. On the other hand, 011 is equally distant from two taught patterns that have different outputs, and thus the output stays undefined (0/1).

By applying the firing rule to every column, the following truth table is obtained;

X1: 0 0 0 0 1 1 1 1

X2: 0 0 1 1 0 0 1 1

X3: 0 1 0 1 0 1 0 1

OUT: 0 0 0 0/1 0/1 1 1 1

The difference between the two truth tables is called the generalization of the neuron. Therefore the firing rule gives the neuron a sense of similarity and enables it to respond 'sensibly' to patterns not seen during training.
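
The firing rule above is straightforward to implement. The following Python sketch (function names are ours) reproduces the generalized truth table for the 3-input example:

    def hamming(a, b):
        # Number of positions in which two equal-length patterns differ.
        return sum(x != y for x, y in zip(a, b))

    def firing_rule(pattern, taught_1, taught_0):
        # Fire ("1") if closer to the 1-taught set, don't fire ("0") if
        # closer to the 0-taught set, undefined ("0/1") on a tie.
        d1 = min(hamming(pattern, t) for t in taught_1)
        d0 = min(hamming(pattern, t) for t in taught_0)
        if d1 < d0:
            return "1"
        if d0 < d1:
            return "0"
        return "0/1"

    taught_1 = ["111", "101"]   # patterns taught to fire
    taught_0 = ["000", "001"]   # patterns taught not to fire
    for p in ["000", "001", "010", "011", "100", "101", "110", "111"]:
        print(p, firing_rule(p, taught_1, taught_0))
    # Prints 0 0 0 0/1 0/1 1 1 1, matching the generalized table above.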

Pattern Recognition - an example

An important application of neural networks is pattern recognition. Pattern recognition can be implemented by using a feed-forward neural network that has been trained accordingly. During training, the network is trained to associate outputs with input patterns. When the network is used, it identifies the input pattern and tries to output the associated output pattern. The power of neural networks comes to life when a pattern that has no associated output is given as an input. In this case, the network gives the output that corresponds to the taught input pattern that is least different from the given pattern.

For example:

The network of figure 1 is trained to recognize the patterns T and H. The associated patterns are all black and all white respectively as shown below.

If we represent black squares with 0 and white squares with 1 then the truth tables for the 3 neurones after generalization are;

X11: 0 0 0 0 1 1 1 1

X12: 0 0 1 1 0 0 1 1

X13: 0 1 0 1 0 1 0 1

OUT: 0 0 1 1 0 0 1 1

Top neuron

X21: 0 0 0 0 1 1 1 1

X22: 0 0 1 1 0 0 1 1

X23: 0 1 0 1 0 1 0 1

OUT: 1 0/1 1 0/1 0/1 0 0/1 0

Middle neuron

X31: 0 0 0 0 1 1 1 1

X32: 0 0 1 1 0 0 1 1

X33: 0 1 0 1 0 1 0 1

OUT: 1 0 1 1 0 0 1 0

Bottom neuron

From the tables, it can be seen that the following associations can be extracted:

In this case, it is obvious that the output should be all blacks since the input pattern is almost the same as the 'T' pattern.

Here also, it is obvious that the output should be all whites since the input pattern is almost the same as the 'H' pattern.


Here, the top row is 2 errors away from the T and 3 from an H. So the top output is black. The middle row is 1 error away from both T and H so the output is random. The bottom row is 1 error away from T and 2 away from H. Therefore the output is black. The total output of the network is still in favor of the T shape.

A more complicated neuron:

The previous neuron doesn't do anything that conventional computers don't do already. A more sophisticated neuron is the McCulloch and Pitts model (MCP). The difference from the previous model is that the inputs are 'weighted': the effect that each input has on decision-making depends on the weight of the particular input. The weight of an input is a number which, when multiplied with the input, gives the weighted input. These weighted inputs are then added together and, if they exceed a pre-set threshold value, the neuron fires. In any other case the neuron does not fire.

An MCP neuron

In mathematical terms, the neuron fires if and only if:

X1W1 + X2W2 + X3W3 + ... > T

The addition of input weights and of the threshold makes this neuron a very flexible and powerful one. The MCP neuron has the ability to adapt to a particular situation by changing its weights and/or threshold. Various algorithms exist that cause the neuron to 'adapt'; the most used ones are the Delta Rule and back-error propagation. The former is used in single-layer networks, and the latter extends it to multi-layer feed-forward networks.
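
A minimal Python sketch of an MCP neuron, with invented example weights:

    def mcp_neuron(inputs, weights, threshold):
        # McCulloch-Pitts neuron: fire (1) if and only if the weighted
        # sum of the inputs exceeds the threshold.
        net = sum(x * w for x, w in zip(inputs, weights))
        return 1 if net > threshold else 0

    # Two stimulating inputs and one inhibiting input (weights invented).
    print(mcp_neuron([1, 1, 0], weights=[0.7, 0.6, -1.0], threshold=1.0))  # 1
    print(mcp_neuron([1, 1, 1], weights=[0.7, 0.6, -1.0], threshold=1.0))  # 0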





Architecture of neural networks

Feed-forward networks:

Feed-forward ANNs allow signals to travel one way only; from input to output. There is no feedback (loops) i.e. the output of any layer does not affect that same layer. Feed-forward ANNs tend to be straightforward networks that associate inputs with outputs. They are extensively used in pattern recognition. This type of organization is also referred to as bottom-up or top-down.

Feedback networks:

Feedback networks can have signals traveling in both directions by introducing loops in the network. Feedback networks are very powerful and can get extremely complicated. Feedback networks are dynamic; their 'state' is changing continuously until they reach an equilibrium point. They remain at the equilibrium point until the input changes and a new equilibrium needs to be found. Feedback architectures are also referred to as interactive or recurrent, although the latter term is often used to denote feedback connections in single-layer organizations.


The Learning Process

The memorization of patterns and the subsequent response of the network can be categorized into two general paradigms:

Þ Associative mapping, in which the network learns to produce a particular pattern on the set of output units whenever another particular pattern is applied on the set of input units. The associative mapping can generally be broken down into two mechanisms:

Þ Auto-association: an input pattern is associated with itself, and the states of input and output units coincide. This is used to provide pattern completion, i.e. to produce a pattern whenever a portion of it or a distorted pattern is presented. In the second case, the network actually stores pairs of patterns, building an association between two sets of patterns.

Þ Hetero-association: related to two recall mechanisms:

Nearest-neighbor recall, where the output pattern produced corresponds to the stored input pattern closest to the pattern presented, and

Interpolative recall, where the output pattern is a similarity-dependent interpolation of the stored patterns corresponding to the pattern presented. Yet another paradigm, which is a variant of associative mapping, is classification, i.e. when there is a fixed set of categories into which the input patterns are to be classified.

Þ Regularity detection, in which units learn to respond to particular properties of the input patterns. Whereas in associative mapping the network stores the relationships among patterns, in regularity detection the response of each unit has a particular 'meaning'. This type of learning mechanism is essential for feature discovery and knowledge representation.

Every neural network possesses knowledge, which is contained in the values of the connection weights. Modifying the knowledge stored in the network as a function of experience implies a learning rule for changing the values of the weights.

Information is stored in the weight matrix W of a neural network. Learning is the determination of the weights. Following the way learning is performed, we can distinguish two major categories of neural networks:

Fixed networks in which the weights cannot be changed, i.e. dW/dt=0. In such networks, the weights are fixed a priori according to the problem to solve.

Adaptive networks which are able to change their weights, i.e. dW/dt ≠ 0.

Types of learning:

All learning methods used for adaptive neural networks can be classified into two major categories: supervised and unsupervised.

Once a network has been structured for a particular application, that network is ready to be trained. To start this process the initial weights are chosen randomly. Then, the training, or learning, begins.

There are two approaches to training - supervised and unsupervised. Supervised training involves a mechanism of providing the network with the desired output either by manually "grading" the network's performance or by providing the desired outputs with the inputs. Unsupervised training is where the network has to make sense of the inputs without outside help.

The vast bulk of networks utilize supervised training. Unsupervised training is used to perform some initial characterization on inputs. However, in the full-blown sense of being truly self-learning, it is still just a shining promise that is not fully understood, does not completely work, and thus is relegated to the lab.

Supervised Learning:

The vast majority of artificial neural network solutions have been trained with supervision. In this mode, the actual output of a neural network is compared to the desired output. Weights, which are usually randomly set to begin with, are then adjusted by the network so that the next iteration, or cycle, will produce a closer match between the desired and the actual output. The learning method tries to minimize the current errors of all processing elements. This global error reduction is created over time by continuously modifying the input weights until acceptable network accuracy is reached.

With supervised learning, the artificial neural network must be trained before it becomes useful. Training consists of presenting input and output data to the network. This data is often referred to as the training set. That is, for each input set provided to the system, the corresponding desired output set is provided as well. In most applications, actual data must be used. This training phase can consume a lot of time. In prototype systems, with inadequate processing power, learning can take weeks. This training is considered complete when the neural network reaches a user-defined performance level. This level signifies that the network has achieved the desired statistical accuracy as it produces the required outputs for a given sequence of inputs. When no further learning is necessary, the weights are typically frozen for the application. Some network types allow continual training, at a much slower rate, while in operation. This helps a network to adapt to gradually changing conditions.
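
Schematically, a supervised training session might look like the Python sketch below. The `network` object and its `forward`, `adjust_weights`, and `freeze` methods are hypothetical placeholders, since the text leaves the concrete update rule abstract (specific rules such as the Delta Rule appear later):

    def train(network, training_set, target_error, max_cycles):
        for cycle in range(max_cycles):
            total_error = 0.0
            for inputs, desired in training_set:
                actual = network.forward(inputs)       # actual output
                error = desired - actual               # compare to desired
                network.adjust_weights(error)          # closer match next cycle
                total_error += error ** 2
            if total_error <= target_error:            # desired accuracy reached
                break
        network.freeze()                               # weights frozen for use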

Training sets need to be fairly large to contain all the needed information if the network is to learn the features and relationships that are important. Not only do the sets have to be large but also the training sessions must include a wide variety of data. If the network is trained just one example at a time, all the weights set so meticulously for one fact could be drastically altered in learning the next fact. The previous facts could be forgotten in learning something new. As a result, the system has to learn everything together, finding the best weight settings for the total set of facts. For example, in teaching a system to recognize pixel patterns for the ten digits, if there were twenty examples of each digit, all the examples of the digit seven should not be presented at the same time.

How the input and output data is represented, or encoded, is a major component of successfully instructing a network. Artificial networks deal only with numeric input data. Therefore, the raw data must often be converted from the external environment. Additionally, it is usually necessary to scale the data, or normalize it, to the network's paradigm. This pre-processing of real-world stimuli, be they from cameras or sensors, into machine-readable format is already common for standard computers. Many conditioning techniques which directly apply to artificial neural network implementations are readily available. It is then up to the network designer to find the best data format and matching network architecture for a given application.

After a supervised network performs well on the training data, it is important to see what it can do with data it has not seen before. If a system does not give reasonable outputs for this test set, the training period is not over. Indeed, this testing is critical to ensure that the network has not simply memorized a given set of data but has learned the general patterns involved within an application.

Unsupervised Learning:

Unsupervised learning is the great promise of the future. It shouts that computers could someday learn on their own in a true robotic sense. Currently, this learning method is limited to networks known as self-organizing maps. These kinds of networks are not in widespread use. They are basically an academic novelty. Yet, they have shown they can provide a solution in a few instances, proving that their promise is not groundless. They have been proven to be more effective than many algorithmic techniques for numerical aerodynamic flow calculations. They are also being used in the lab where they are split into a front-end network that recognizes short, phoneme-like fragments of speech, which are then passed on to a back-end network. The second artificial network recognizes these strings of fragments as words.

This promising field of unsupervised learning is sometimes called self-supervised learning. These networks use no external influences to adjust their weights. Instead, they internally monitor their performance. These networks look for regularities or trends in the input signals and make adaptations according to the function of the network. Even without being told whether it's right or wrong, the network still must have some information about how to organize itself. This information is built into the network topology and learning rules.

An unsupervised learning algorithm might emphasize cooperation among clusters of processing elements. In such a scheme, the clusters would work together. If some external input activated any node in the cluster, the cluster's activity as a whole could be increased. Likewise, if external input to nodes in the cluster was decreased, that could have an inhibitory effect on the entire cluster.

Competition between processing elements could also form a basis for learning. Training of competitive clusters could amplify the responses of specific groups to specific stimuli. As such, it would associate those groups with each other and with a specific appropriate response. Normally, when competition for learning is in effect, only the weights belonging to the winning processing element will be updated.

At the present state of the art, unsupervised learning is not well understood and is still the subject of research. This research is currently of interest to the government because military situations often do not have a data set available to train a network until a conflict arises.

Learning Rates:

The rate at which ANNs learn depends upon several controllable factors. In selecting the approach there are many trade-offs to consider. Obviously, a slower rate means a lot more time is spent in accomplishing the off-line learning to produce an adequately trained system. With the faster learning rates, however, the network may not be able to make the fine discriminations possible with a system that learns more slowly. Researchers are working on producing the best of both worlds.

Generally, several factors besides time have to be considered when discussing the off-line training task, which is often described as "tiresome." Network complexity, size, paradigm selection, architecture, type of learning rule or rules employed, and desired accuracy must all be considered. These factors play a significant role in determining how long it will take to train a network. Changing any one of these factors may either extend the training time to an unreasonable length or even result in an unacceptable accuracy.

Most learning functions have some provision for a learning rate, or learning constant. Usually this term is positive and between zero and one. If the learning rate is greater than one, it is easy for the learning algorithm to overshoot in correcting the weights, and the network will oscillate. Small values of the learning rate will not correct the current error as quickly, but if small steps are taken in correcting errors, there is a good chance of arriving at the best minimum convergence.
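
The effect of the learning rate can be seen in a toy one-weight example (entirely illustrative):

    # Learn a single weight w so that w * x matches a target output,
    # taking steps scaled by the learning rate.
    def update_weight(w, error, x, rate):
        return w + rate * error * x

    x, target = 1.0, 0.5
    for rate in (0.1, 2.5):                 # modest rate vs. excessive rate
        w = 0.0
        for _ in range(10):
            w = update_weight(w, target - w * x, x, rate)
        print(rate, round(w, 3))
    # rate 0.1 creeps steadily toward 0.5;
    # rate 2.5 overshoots and oscillates ever further from it.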

Learning Laws:

Many learning laws are in common use. Most of these laws are some sort of variation of the best known and oldest learning law, Hebb's Rule. Research into different learning functions continues as new ideas routinely show up in trade publications. Some researchers have the modeling of biological learning as their main objective. Others are experimenting with adaptations of their perceptions of how nature handles learning. Either way, man's understanding of how neural processing actually works is very limited. Learning is certainly more complex than the simplifications represented by the learning laws currently developed. A few of the major laws are presented as examples.

Hebb's Rule: The first, and undoubtedly the best known, learning rule was introduced by Donald Hebb. The description appeared in his book The Organization of Behavior in 1949. His basic rule is: If a neuron receives an input from another neuron, and if both are highly active (mathematically have the same sign), the weight between the neurons should be strengthened.

Hopfield Law: It is similar to Hebb's rule with the exception that it specifies the magnitude of the strengthening or weakening. It states, "if the desired output and the input are both active or both inactive, increment the connection weight by the learning rate, otherwise decrement the weight by the learning rate."

The Delta Rule: This rule is a further variation of Hebb's Rule. It is one of the most commonly used. This rule is based on the simple idea of continuously modifying the strengths of the input connections to reduce the difference (the delta) between the desired output value and the actual output of a processing element. This rule changes the synaptic weights in the way that minimizes the mean squared error of the network. This rule is also referred to as the Widrow-Hoff Learning Rule and the Least Mean Square (LMS) learning rule.

The way that the Delta Rule works is that the delta error in the output layer is transformed by the derivative of the transfer function and is then used in the previous neural layer to adjust input connection weights. In other words, this error is back-propagated into previous layers one layer at a time. The process of back-propagating the network errors continues until the first layer is reached. The network type called feed-forward back-propagation derives its name from this method of computing the error term.

When using the Delta Rule, it is important to ensure that the input data set is well randomized. A well-ordered or structured presentation of the training set can lead to a network which cannot converge to the desired accuracy. If that happens, then the network is incapable of learning the problem.
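
A minimal sketch of one Delta Rule update for a single processing element with a sigmoid transfer function, assuming the usual Widrow-Hoff formulation (multi-layer back-propagation repeats this idea layer by layer):

    import math

    def sigmoid(x):
        return 1.0 / (1.0 + math.exp(-x))

    def delta_rule_step(weights, inputs, desired, rate):
        net = sum(w * x for w, x in zip(weights, inputs))
        actual = sigmoid(net)
        delta = desired - actual                  # the "delta" error
        # Scale the error by the derivative of the transfer function,
        # then adjust each weight in proportion to its input.
        grad = delta * actual * (1.0 - actual)    # sigmoid derivative term
        return [w + rate * grad * x for w, x in zip(weights, inputs)]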

The Gradient Descent Rule: This rule is similar to the Delta Rule in that the derivative of the transfer function is still used to modify the delta error before it is applied to the connection weights. Here, however, an additional proportional constant tied to the learning rate is appended to the final modifying factor acting upon the weight. This rule is commonly used, even though it converges to a point of stability very slowly.

It has been shown that different learning rates for different layers of a network help the learning process converge faster. In these tests, the learning rates for the layers close to the output were set lower than those for the layers near the input. This is especially important for applications where the input data is not derived from a strong underlying model.

Kohonen's Learning Law: This procedure, developed by Teuvo Kohonen, was inspired by learning in biological systems. In this procedure, the processing elements compete for the opportunity to learn, or update their weights. The processing element with the largest output is declared the winner and has the capability of inhibiting its competitors as well as exciting its neighbors. Only the winner is permitted an output, and only the winner plus its neighbors are allowed to adjust their connection weights.

Further, the size of the neighborhood can vary during the training period. The usual paradigm is to start with a larger definition of the neighborhood, and narrow in as the training process proceeds. Because the winning element is defined as the one that has the closest match to the input pattern, Kohonen networks model the distribution of the inputs. This is good for statistical or topological modeling of the data and is sometimes referred to as self-organizing maps or self-organizing topologies.
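
A sketch of one Kohonen-style learning step, assuming a one-dimensional neighborhood; all names are illustrative:

    # The winner is the unit whose weight vector best matches the input;
    # the winner and its neighbors move their weights toward the input.
    def kohonen_step(weight_vectors, x, rate, neighborhood):
        def dist(w):
            return sum((wi - xi) ** 2 for wi, xi in zip(w, x))
        winner = min(range(len(weight_vectors)),
                     key=lambda j: dist(weight_vectors[j]))
        for j in range(len(weight_vectors)):
            if abs(j - winner) <= neighborhood:   # winner plus its neighbors
                weight_vectors[j] = [wi + rate * (xi - wi)
                                     for wi, xi in zip(weight_vectors[j], x)]
        return winner

In practice, both `rate` and `neighborhood` would be started large and narrowed as training proceeds, as described above.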

Several Neural Network Models:

Hopfield Network:

John Hopfield first presented his crossbar associative network in 1982 at the National Academy of Sciences. In honor of Hopfield's success and his championing of neural networks in general, this network paradigm is usually referred to as a Hopfield network. Primary applications for this sort of network have included associative, or content-addressable, memories and a whole set of optimization problems, such as finding the combinatorial best route for a traveling salesman.

A Hopfield network has the following interesting features:

· Distributed Representation-A memory is stored as a pattern of activation across a set of processing elements. Furthermore, memories can be superimposed on one another; different memories are represented by different patterns over the same set of processing elements.

· Distributed, Asynchronous Control-Each processing element makes decisions based only on its own local situation. All these local actions add up to a global solution.

· Content Addressable Memory-A number of patterns can be stored in a network. To retrieve a pattern, we need only specify a portion of it. The network automatically finds the closest match.

· Fault Tolerance-If a few processing elements misbehave or fail completely, the neural network will still function properly.

A simple Hopfield Network:

Processing elements are always in one of two states, active or inactive. In this figure, units colored black are active and units colored white are inactive. Units are connected to each other with weighted connections. A positive weight indicates that the two units tend to activate each other. A negative connection allows an active unit to deactivate a neighboring unit.

The network operates as follows. A random unit is chosen. If any of its neighbors are active, the unit computes the sum of the weights on the connections to those active neighbors. If the sum is positive, the unit becomes active; otherwise it becomes inactive. Then another random unit is chosen and the process repeats until the network reaches a stable state, i.e. until no more units can change state. This process is called parallel relaxation.

The neural network shown in figure-1 has four distinct stable states, which are shown in figure-2. Given any initial state, the network will necessarily settle into one of these four states.
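
Parallel relaxation is straightforward to simulate. The sketch below assumes symmetric weights with weights[i][i] = 0 and states coded as 1 (active) or 0 (inactive):

    import random

    def relax(weights, state):
        n = len(state)
        changed = True
        while changed:                            # repeat until stable
            changed = False
            for i in random.sample(range(n), n):  # visit units in random order
                # Sum of weights on connections to active neighbors.
                s = sum(weights[i][j] for j in range(n) if state[j] == 1)
                new = 1 if s > 0 else 0           # positive sum -> active
                if new != state[i]:
                    state[i] = new
                    changed = True
        return state                              # a stable state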





The network can be used as a content-addressable memory by setting the activities of the units to correspond to a partial pattern. To retrieve a pattern, we need only supply a portion of it. The network will then settle into the stable state that best matches the partial pattern.

Parallel relaxation is nothing more than search: it is useful to think of the various states of the network as forming a search space. A randomly chosen state will ultimately transform itself into one of the local minima, namely the nearest stable state. This is how we get content-addressable behavior. We also get error-correcting behavior. Suppose we read the description "gray, large, fish, eats plankton." We imagine a whale, even though we know that a whale is a mammal, not a fish. Even if the initial state contains inconsistencies, a Hopfield network will settle into the solution that violates the fewest constraints offered by the inputs.

Now suppose a unit occasionally fails, say by becoming active or inactive when it should not. This causes no major problem: surrounding units quickly set it straight again. It would take the unlikely concerted effort of many errant units to push the network into a wrong stable state. In networks of thousands or more highly interconnected units, such fault tolerance is even more apparent.

The Hopfield network has two major limitations when used as a content addressable memory:

First, the number of patterns that can be stored and accurately recalled is severely limited. If too many patterns are stored, the network may converge to a novel spurious pattern different from all programmed patterns. Or, it may not converge at all. The storage capacity limit for the network is approximately fifteen percent of the number of processing elements in the Hopfield layer.

The second limitation of the paradigm is that the Hopfield layer may become unstable if the stored patterns are too similar. Here, an example pattern is considered unstable if it is applied at time zero and the network converges to some other pattern from the training set. This problem can be minimized by modifying the pattern set to be more orthogonal.

The Perceptron:

This is a very simple model and consists of a single `trainable' neuron. Trainable means that its threshold and input weights are modifiable. Inputs are presented to the neuron and each input has a desired output (determined by us). If the neuron doesn't give the desired output, then it has made a mistake. To rectify this, its threshold and/or input weights must be changed. How this change is to be calculated is determined by the learning algorithm.

The output of the perceptron is constrained to boolean values - (true, false), (1, 0), (1, -1) or whatever. This is not a limitation because if the output of the perceptron were to be the input for something else, then the output edge could be made to have a weight. Then the output would be dependent on this weight.



The Perceptron looks like -

· x1, x2... xn are inputs. These could be real numbers or boolean values depending on the problem.

· y is the output and is boolean.

· w1, w2,..., wn are weights of the edges and are real valued.

· T is the threshold and is real valued.

The output y is 1 if the net input which is

w1 x1 + w2 x2 + ... + wn xn

is greater than the threshold T. Otherwise the output is zero.

The idea is that we should be able to train this perceptron to respond to certain inputs with certain desired outputs. After the training period, it should be able to give reasonable outputs for any kind of input. If it wasn't trained for that input, then it should try to find the best possible output depending on how it was trained.

So during the training period we will present the perceptron with inputs one at a time and see what output it gives. If the output is wrong, we will tell it that it has made a mistake. It should then change its weights and/or threshold properly to avoid making the same mistake later.

Note that the model of the perceptron normally given is slightly different from the one pictured here. Usually, the inputs are not directly fed to the trainable neuron but are modified by some "preprocessing units". These units could be arbitrarily complex, meaning that they could modify the inputs in any way. These units have been deliberately eliminated from our picture, because it would be helpful to know what can be achieved by just a single trainable neuron, without all its "powerful friends".

To understand the kinds of things that can be done using a perceptron, we shall look at a rather simple example of its use - computing the logical operations "and", "or" and "not" of some given boolean variables.

Computing "and": There are n inputs, each either a 0 or 1. To compute the logical "and" of these n inputs, the output should be 1 if and only if all the inputs are 1. This can easily be achieved by setting the threshold of the perceptron to n. The weights of all edges are 1. The net input can be n only if all the inputs are active.

Computing "or": It is also simple to see that if the threshold is set to 1, then the output will be 1 if at least one input is active. The perceptron in this case acts as the logical "or".

Computing "not": The logical "not" is a little tricky, but can be done. In this case, there is only one boolean input. Let the weight of the edge be -1, so that the input, which is either 0 or 1, becomes 0 or -1. Set the threshold to 0. If the input is 0, the net input is 0, the threshold is reached, and the output is 1. If the input is 1, the net input is -1, the threshold is not reached, and the output is 0.
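Using the perceptron sketch above, these three gates are just different settings of the same unit (a three-input example, chosen for illustration):

    # AND of n = 3 inputs: all weights 1, threshold n
    print(perceptron([1, 1, 1], 3, [1, 1, 1]))   # 1
    print(perceptron([1, 1, 1], 3, [1, 0, 1]))   # 0

    # OR of 3 inputs: all weights 1, threshold 1
    print(perceptron([1, 1, 1], 1, [0, 1, 0]))   # 1

    # NOT of one input: weight -1, threshold 0
    print(perceptron([-1], 0, [0]))              # 1
    print(perceptron([-1], 0, [1]))              # 0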

The XOR Problem:

There are problems which cannot be solved by any perceptron. In fact, there are more such problems than problems which can be solved using perceptrons. The most often quoted example is the XOR problem - build a perceptron which takes 2 boolean inputs and outputs the XOR of them. What we want is a perceptron which will output 1 if the two inputs are different and 0 otherwise.

Input | Desired Output
------|---------------
 0 0  |       0
 0 1  |       1
 1 0  |       1
 1 1  |       0

Consider the following perceptron as an attempt to solve the problem –

· If the inputs are both 0, then the net input is 0, which is less than the threshold (0.5). So the output is 0 - the desired output.

· If one of the inputs is 0 and the other is 1, then the net input is 1. This is above the threshold, and so the output 1 is obtained - again as desired.

· But the given perceptron fails for the last case: with both inputs 1, the net input is 2, which is again above the threshold, so the output is 1 instead of the desired 0. To see that no perceptron at all can be built to solve the problem, try to build one yourself.

Pattern Recognition Terminology:

The inputs that we have been referring to, of the form (x1, x2, ..., xn), are also called patterns. If a perceptron gives the correct, desired output for some pattern, then we say that the perceptron recognizes that pattern. We also say that the perceptron correctly classifies that pattern.

Since a pattern by our definition is just a sequence of numbers, it could represent anything -- a picture, a song, a poem... anything that you can have in a computer file. We could then have a perceptron which could learn such inputs and classify them, e.g. a neat picture or a scribbling, a good or a bad song, etc. All we have to do is present the perceptron with some examples -- give it some songs and tell it whether each one is good or bad. (It could then go all over the internet, searching for songs which you may like.) Sounds incredible? At least that's the way it is supposed to work. But it may not. The problem is that the set of patterns which you want the perceptron to learn might be something like the XOR problem. Then no perceptron can be made to recognize your taste. However, there may be some other kind of neural network which can.

Linearly Separable Patterns and Some Linear Algebra:

If a set of patterns can be correctly classified by some perceptron, then such a set of patterns is said to be linearly separable. The term "linear" is used because the perceptron is a linear device: the net input is a linear function of the individual inputs, and the decision is made by comparing this linear function with the threshold. Linear means that there are no squared (x^2) or cubed (x^3), etc. terms in the formula.

A pattern (x1, x2, ..., xn) is a point in an n-dimensional space. This is an extension of the idea that (x, y) is a point in 2 dimensions and (x, y, z) is a point in 3 dimensions. The utility of such a wider notion of an n-dimensional space is that there are many concepts which are independent of dimension. Such concepts carry over to higher dimensions even though we can think only of their 2- or 3-dimensional counterparts. For example, if the distance to a point (x, y) in 2 dimensions is r, then

r^2 = x^2 + y^2

Since the distance to a point (x, y, z) in 3 dimensions is also defined similarly, it is natural to define the distance to a point (x1, x2, ..., xn) in n dimensions as

r^2 = x1^2 + x2^2 + ... + xn^2

r is called the norm (actually the Euclidean norm) of the point (x1, x2, ..., xn).

Similarly, a straight line in 2D is given by -

ax + by = c

In 3D, a plane is given by -

ax + by + cz = d

When we generalize this, we get an object called a hyperplane -

w1x1 + w2x2 + ... + wnxn = T

Notice something familiar? This is the net input to a perceptron. All points (patterns) for which the net input is greater than T belong to one class (they give the same output). All the other points belong to the other class.

We now have a geometrical interpretation of the perceptron. A perceptron with weights w1, w2, ..., wn and threshold T can be represented by the above hyperplane. All points on one side of the hyperplane belong to one class. The hyperplane (perceptron) divides the set of all points (patterns) into 2 classes.

Now we can see why the XOR problem cannot have a solution. Here there are 2 inputs, hence 2 dimensions (luckily, so we can draw the picture). The points that we want to classify are (0,0) and (1,1) in one class, and (0,1) and (1,0) in the other class.



Clearly we cannot classify the points (crosses on one side, circles on other) using a straight line. Hence no perceptron exists which can solve the XOR problem.
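A quick empirical check makes this concrete (it is a demonstration, not a proof, and the grid bounds are arbitrary illustrative choices): sweeping a coarse grid of weights and thresholds, no single linear threshold unit gets all four XOR cases right.

    import itertools

    xor_cases = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

    def classifies_all(w1, w2, t):
        return all((1 if w1 * x1 + w2 * x2 >= t else 0) == target
                   for (x1, x2), target in xor_cases)

    grid = [i / 10 for i in range(-30, 31)]      # -3.0 to 3.0 in steps of 0.1
    found = any(classifies_all(w1, w2, t)
                for w1, w2, t in itertools.product(grid, grid, grid))
    print(found)   # False - no line separates the two XOR classes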

Minsky and Papert gave a solution to this XOR problem using perceptrons:

If we could draw an elliptical decision surface, we could encircle the two '1' outputs in the XOR space. However, perceptrons are incapable of modeling such surfaces.

Another idea is to employ two separate line-drawing stages. One line isolates the point (1,1), and another line divides the other three points into two categories: (1,0) and (0,1) on one side, and (0,0) on the other.



Using this idea we can construct a multilayer perceptron to solve the problem.



Here the output of the first perceptron serves as an input of the second, through a large negatively weighted connection. If the first perceptron has input (1,1), that is x1 = 1 and x2 = 1, it sends a massive inhibitory pulse to the second perceptron, causing that unit to output 0 regardless of its other inputs. If either of the inputs is 0, the second perceptron gets no inhibition from the first perceptron, and it outputs 1 if either of the inputs is 1.

The perceptron-learning algorithm can correctly adjust weights between inputs and outputs, but it cannot adjust weights between perceptrons. In this figure the inhibitory weight of -9.0 was hand-coded, not learned, and this is the limitation of this solution.
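A sketch of this hand-wired two-perceptron solution, reusing the perceptron function from earlier (the -9.0 inhibitory weight comes from the text; the two thresholds are assumptions, since the figure is not reproduced here):

    def xor(x1, x2):
        # First perceptron: detects the (1,1) case (weights 1, 1; threshold 2).
        h = perceptron([1, 1], 2, [x1, x2])
        # Second perceptron: OR of the two inputs, plus the massive -9.0
        # inhibitory connection from the first perceptron.
        return perceptron([1, 1, -9.0], 1, [x1, x2, h])

    for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
        print(a, b, "->", xor(a, b))   # 0, 1, 1, 0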

Perceptron Learning Algorithms:

During the training period, a series of inputs is presented to the perceptron, each of the form (x1, x2, ..., xn). For each such input, there is a desired output - either 0 or 1. The actual output is determined by the net input, which is w1 x1 + w2 x2 + ... + wn xn. If the net input is below the threshold, then the output is 0; otherwise the output is 1. If the perceptron gives a wrong (undesired) output, then one of two things could have happened -

1. The desired output is 0, but the net input is above threshold. So the actual output becomes 1.

In such a case we should decrease the weights. But by how much? The perceptron-learning algorithm says that the decrease in the weight of an edge should be directly proportional to the input through that edge. So, new weight of edge i = old weight - c·xi

There are several algorithms depending on what c is. For now, think of it as a constant.

The idea here is that if the input through some edge was very high, then that edge must have contributed to most of the error. So we reduce the weight of that edge more (i.e. proportional to the input along that edge).

2. The other case when the perceptron makes a mistake is when the desired output is 1, but the net input is below threshold.

Now we should increase the weights. Using the same intuition, the increase in the weight of an edge should be proportional to the input through that edge. So, new weight of edge i = old weight + c·xi

What about c? If c is actually a constant, then the algorithm is called the "fixed increment rule". In this case, the perceptron may not correct its mistake immediately. That is, when we change the weights because of a mistake, the new weights don't guarantee that the same mistake will not be repeated. This can happen if c is very small. However, by repeated application of the same input, the weights will change slightly each time, until that mistake is avoided.

We could also choose c in such a way that the perceptron will certainly avoid the most recent mistake the next time it is presented with the same input. This is called the "absolute correction rule". The problem with this approach is that by learning one input, the perceptron might "forget" a previously learnt input. For example, if one input leads to an increase in some weight and another input decreases it, then such a problem may arise.
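A minimal sketch of the fixed increment rule in Python (the threshold update is an assumption - the text describes only the weight changes, and treating the threshold as a trainable bias is one common way to handle it):

    def train_perceptron(samples, n, c=0.1, epochs=100):
        # samples: list of (inputs, target) pairs with targets 0 or 1.
        w = [0.0] * n
        t = 0.0
        for _ in range(epochs):
            mistakes = 0
            for x, target in samples:
                net = sum(wi * xi for wi, xi in zip(w, x))
                y = 1 if net >= t else 0
                if y != target:
                    mistakes += 1
                    sign = 1 if target == 1 else -1
                    # Fixed increment rule: change each weight by c times
                    # the input through that edge.
                    w = [wi + sign * c * xi for wi, xi in zip(w, x)]
                    t -= sign * c      # assumed: threshold trained as a bias
            if mistakes == 0:          # every pattern classified correctly
                break
        return w, t

    # Logical OR is linearly separable, so training converges:
    or_samples = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
    print(train_perceptron(or_samples, 2))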

Boltzmann Machine:

The binary Boltzmann machine is very similar to the binary Hopfield network, with the addition of three features:

  • Stochastic activation function: the state a unit is in is probabilistically related to its energy gap. The bigger the energy gap between its current state and the opposite state, the more likely the unit is to flip states.
  • Temperature and simulated annealing: the probability that a unit is on is computed according to a sigmoid function of its total weighted summed input, divided by T. If T is large, the network behaves very randomly. T is gradually reduced, and at each value of T all the units' states are updated. Eventually, at the lowest T, units behave less randomly and more like binary threshold units.
  • Contrastive Hebbian learning: a Boltzmann machine is trained in two phases, "clamped" and "unclamped". It can be trained in either supervised or unsupervised mode. Only the supervised mode is described here; this type of training proceeds as follows, for each training pattern:
    1. Clamped Phase: The input units' states are clamped to (set and not permitted to change from) the training pattern, and the output units' states are clamped to the target vector. All other units' states are initialized randomly, and are then permitted to update until they reach "equilibrium" (simulated annealing). Then Hebbian learning is applied.
    2. Unclamped Phase: The input units' states are clamped to the training pattern. All other units' states (both hidden and output) are initialized randomly, and are then permitted to update until they reach "equilibrium". Then anti-Hebbian learning (Hebbian learning with a negative sign) is applied.

The above two-phase learning rule must be applied for each training pattern, and for a great many iterations through the whole training set. Eventually, the output units' states should become identical in the clamped and unclamped phases, and so the two learning rules exactly cancel one another. Thus, at the point when the network is always producing the correct responses, the learning procedure naturally converges and all weight updates approach zero.
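A minimal sketch of the stochastic activation function and annealing schedule described above (the temperature schedule, sweep count and weight layout are illustrative assumptions, and the two learning phases are omitted for brevity):

    import math, random

    def update_unit(states, W, i, T):
        # Probability that unit i is on: sigmoid of its total weighted
        # summed input divided by the temperature T.
        net = sum(W[i][j] * states[j] for j in range(len(states)) if j != i)
        p_on = 1.0 / (1.0 + math.exp(-net / T))
        states[i] = 1 if random.random() < p_on else 0

    def anneal(states, W, schedule=(10.0, 5.0, 2.0, 1.0, 0.5), sweeps=20):
        # Simulated annealing: at each temperature, update every unit's
        # state several times; as T falls, the units behave less randomly
        # and more like deterministic binary threshold units.
        for T in schedule:
            for _ in range(sweeps):
                for i in range(len(states)):
                    update_unit(states, W, i, T)
        return states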

· The stochasticity enables the Boltzmann machine to overcome the problem of getting stuck in local energy minima, while the contrastive Hebb rule allows the network to be trained with hidden units and thus overcomes the capacity limitations of the Hopfield network. However, in practice, learning in the Boltzmann machine is hopelessly slow.

A classic demonstration shows the Boltzmann machine performing figure-ground segregation. This network was hard-wired (i.e. hand-selected weights, no learning). Some units were designated as "edge units", each with a particular direction and orientation, while others were designated as "figure-ground units". At each image location there was a full set of edge units in every possible orientation (horizontal or vertical) and direction (left, right, up or down), and a full set of figure/ground units (one of each). The weights could be excitatory or inhibitory, and represented particular constraints amongst the figure/ground and edge units. For example, an edge unit at one location would inhibit the edge unit of the same orientation but opposite direction at the same location. Another example: a vertical, rightward-pointing edge unit would excite a figure unit at the next image location to the right and inhibit a ground unit at that location, and would inhibit the figure unit to the left and excite the ground unit to the left. The entire network was initialized with the appropriate edge units turned on and all other units off, and then all units were randomly flipped with some small probability so that the input was noisy. Unit states were then updated using simulated annealing. The network was shown to be able to fill in continuous regions and label them as either figure or ground. The region could be non-convex (e.g. the letter C). The network could also fill in non-continuous edges, exhibiting "illusory contours".

The Back propagation Network:

Introduction:

The back propagation network is the most popular network in use today and is used in about 90% of all applications.

The algorithm for the back propagation network was first discovered in the early eighties, independently by David E. Rumelhart of the University of California and David B. Parker of Stanford University. However, it was only in 1986, when the algorithm was rediscovered and popularized by Rumelhart, Geoffrey E. Hinton and Ronald J. Williams, that it came into wide use. They demonstrated that a back propagation network with the addition of a hidden layer could learn complex mappings such as hand-written digits.

It was subsequently found that the algorithm could be applied to most non-linear applications where extensive training data was available. Examples of such uses range from the prediction of currency movements in money markets to the identification of pre-cancerous cells in the human body.

The back propagation network is a good example of a fully associative network, usually (although not always) fully connected, with the outputs distributed or localized depending on the application.

Description of the three layer network:

Input layer - The input layer is used to feed the network with data from the outside world. The number of input neurons required depends on the application and on the number of data units in a training or test pattern. For example, if a test pattern on an 8x8 grid is to be fed into a network, then the number of input neurons needed is 64. The input values to a network usually range from zero to one, and when input to the network they are propagated to the next layer.

Hidden layer - The hidden layer of neurons receives inputs from the previous layer (in this case the input layer) and, depending on the weighted sum of its inputs, produces an output. The hidden layer is mainly required to overcome the hard learning problem described in the previous section. The number of hidden neurons required depends on the number of training patterns presented to the network during the training phase. A commonly cited bound based on binary information theory gives the minimum number of hidden neurons required:

2^n >= m, i.e. n >= log2(m)

where n = the number of hidden units and m = the number of patterns.

For more complex problems two or more hidden layers may be required to solve the problem.

Output layer - The output layer is used to collate the resultant outputs from the previous layers and, depending on the weights and biases, produces an output. The output of the network may be either distributed or localized depending on the application. The output values usually range from zero to one.
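To make the three-layer structure concrete, here is a minimal back propagation sketch in Python/NumPy (the layer sizes, learning rate, iteration count and seed are arbitrary illustrative choices, and convergence on this toy XOR task depends on the random initialization):

    import numpy as np

    rng = np.random.default_rng(0)

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    # Three layers: 2 input units, 3 hidden units, 1 output unit.
    W1 = rng.normal(size=(2, 3))
    b1 = np.zeros(3)
    W2 = rng.normal(size=(3, 1))
    b2 = np.zeros(1)

    # Toy training set: the XOR patterns from the perceptron section.
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([[0], [1], [1], [0]], dtype=float)

    lr = 1.0
    for _ in range(5000):
        # Forward pass: input layer -> hidden layer -> output layer.
        h = sigmoid(X @ W1 + b1)
        out = sigmoid(h @ W2 + b2)
        # Backward pass: propagate the output error back through the layers.
        d_out = (out - y) * out * (1 - out)
        d_h = (d_out @ W2.T) * h * (1 - h)
        W2 -= lr * h.T @ d_out
        b2 -= lr * d_out.sum(axis=0)
        W1 -= lr * X.T @ d_h
        b1 -= lr * d_h.sum(axis=0)

    print(out.round(2))   # with a favorable initialization, approaches 0, 1, 1, 0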


Versus:

Neural networks versus conventional computers:

Neural networks take a different approach to problem solving from that of conventional computers. Conventional computers use an algorithmic approach, i.e. the computer follows a set of instructions in order to solve a problem. Unless the specific steps that the computer needs to follow are known, the computer cannot solve the problem. That restricts the problem-solving capability of conventional computers to problems that we already understand and know how to solve. But computers would be so much more useful if they could do things that we don't exactly know how to do.

Neural networks process information in a similar way to the human brain. The network is composed of a large number of highly interconnected processing elements (neurones) working in parallel to solve a specific problem. Neural networks learn by example; they cannot be programmed to perform a specific task. The examples must be selected carefully, otherwise useful time is wasted or, even worse, the network might function incorrectly. The disadvantage is that because the network finds out how to solve the problem by itself, its operation can be unpredictable.

On the other hand, conventional computers use a cognitive approach to problem solving; the way the problem is to be solved must be known and stated in small, unambiguous instructions. These instructions are then converted into a high-level language program and then into machine code that the computer can understand. These machines are totally predictable; if anything goes wrong, it is due to a software or hardware fault.

Neural networks and conventional algorithmic computers are not in competition but complement each other. There are tasks that are more suited to an algorithmic approach, like arithmetic operations, and tasks that are more suited to neural networks. Moreover, a large number of tasks require systems that use a combination of the two approaches (normally a conventional computer is used to supervise the neural network) in order to perform at maximum efficiency. Neural networks do not perform miracles, but if used sensibly they can produce some amazing results.

How Neural Networks Differ from Traditional Computing and Expert Systems:

Neural networks offer a different way to analyze data, and to recognize patterns within that data, than traditional computing methods. However, they are not a solution for all computing problems. Traditional computing methods work well for problems that can be well characterized. Balancing checkbooks, keeping ledgers, and keeping tabs on inventory are well defined and do not require the special characteristics of neural networks. The table below identifies the basic differences between the two computing approaches.

Traditional computers are ideal for many applications. They can process data, track inventories, network results, and protect equipment. These applications do not need the special characteristics of neural networks.

Expert systems are an extension of traditional computing and are sometimes called the fifth generation of computing. (First generation computing used switches and wires. The second generation occurred because of the development of the transistor. The third generation involved solid-state technology, the use of integrated circuits, and higher level languages like COBOL, Fortran, and "C". End-user tools, "code generators," are known as the fourth generation.) The fifth generation involves artificial intelligence.

CHARACTERISTICS      TRADITIONAL COMPUTING             ARTIFICIAL NEURAL
                     (including Expert Systems)        NETWORKS

Processing style     Sequential                        Parallel

Functions            Logical (left-brained), via       Gestalt (right-brained), via
                     rules, concepts, calculations     images, pictures, controls

Learning method      By rules (didactically)           By example (Socratically)

Applications         Accounting, word processing,      Sensor processing, speech
                     math, inventory, digital          recognition, pattern
                     communications                    recognition, text recognition

Table: Comparison of Computing Approaches.

Typically, an expert system consists of two parts, an inference engine and a knowledge base. The inference engine is generic: it handles the user interface, external files, program access, and scheduling. The knowledge base contains the information that is specific to a particular problem, and allows an expert to define the rules that govern a process. This expert does not have to understand traditional programming; that person simply has to understand both what he wants a computer to do and how the mechanism of the expert system shell works. It is this shell, part of the inference engine, that actually tells the computer how to implement the expert's desires. This implementation occurs by the expert system generating the computer's programming itself; it does that through "programming" of its own. This programming is needed to establish the rules for a particular application. This method of establishing rules is also complex and does require a detail-oriented person.

Efforts to make expert systems general have run into a number of problems. As the complexity of the system increases, the system simply demands too much computing power and becomes too slow. Expert systems have been found to be feasible only when narrowly confined.

Artificial neural networks offer a completely different approach to problem solving and they are sometimes called the sixth generation of computing. They try to provide a tool that both programs itself and learns on its own. Neural networks are structured to provide the capability to solve problems without the benefits of an expert and without the need of programming. They can seek patterns in data that no one knows are there.

A comparison of artificial intelligence's expert systems and neural networks is contained in the table below.

Characteristics          Von Neumann Architecture         Artificial Neural
                         Used for Expert Systems          Networks

Processors               VLSI (traditional                Variety of technologies;
                         processors)                      hardware development
                                                          is ongoing

Memory and processing    Separate                         The same

Processing approach      Processes the problem one        Multiple rules,
                         rule at a time; sequential       simultaneously

Connections              Externally programmable          Dynamically
                                                          self-programming

Self-learning            Only algorithmic                 Continuously adaptable
                         parameters modified

Fault tolerance          None without special             Significant, in the very
                         processors                       nature of the
                                                          interconnected neurons

Neurobiology in design   None                             Moderate

Programming              Through a rule-based shell;      Self-programming, but the
                         complicated                      network must be set
                                                          up properly

Ability to be fast       Requires big processors          Requires multiple
                                                          custom-built chips

Table: Comparisons of Expert Systems and Neural Networks.

Expert systems have enjoyed significant successes. However, artificial intelligence has encountered problems in areas such as vision, continuous speech recognition and synthesis, and machine learning. Artificial intelligence is also hostage to the speed of the processor that it runs on; ultimately, it is restricted to the theoretical limit of a single processor. Artificial intelligence is further burdened by the fact that experts don't always speak in rules. Yet, despite the advantages of neural networks over both expert systems and more traditional computing in these specific areas, neural nets are not complete solutions. They offer a capability that is not ironclad in the way a debugged accounting system is. They learn, and as such they do continue to make "mistakes". Furthermore, even when a network has been developed, there is no way to ensure that it is the optimal network.

Neural systems do exact their own demands. They do require their implementer to meet a number of conditions. These conditions include:

· A data set that includes the information that can characterize the problem.

· An adequately sized data set to both train and test the network.

· An understanding of the basic nature of the problem to be solved, so that basic first-cut decisions on creating the network can be made. These decisions include the activation and transfer functions, and the learning methods.

· An understanding of the development tools.

· Adequate processing power (some applications demand real-time processing that exceeds what is available from standard sequential hardware; the development of hardware is the key to the future of neural networks).

Once these conditions are met, neural networks offer the opportunity of solving problems in an arena where traditional processors lack both the processing power and a step-by-step methodology. A number of very complicated problems cannot be solved in traditional computing environments. For example, speech is something that all people can easily parse and understand. Without the massively parallel processing power of a neural network, this process is virtually impossible for a computer. Image recognition is another task that a human can easily do but which stymies even the biggest of computers. A person can recognize a plane as it turns, flies overhead, and disappears into a dot, whereas a traditional computer might try to compare the changing images to a number of very different stored patterns.

This new way of computing requires skills beyond traditional computing. It is a natural evolution. Initially, computing was only hardware, and engineers made it work. Then there were software specialists - programmers, systems engineers, database specialists, and designers. Now there are also neural architects. This new professional needs to be skilled differently from his predecessors. For instance, he will need to know statistics in order to choose and evaluate training and testing situations. This skill of making neural networks work is one that will stress the logical thinking of current software engineers.

Neural networks offer a unique way to solve some problems while making their own demands. The biggest demand is that the process is not simply logic; it involves an empirical skill, an intuitive feel for how a network might be created. Now that there is a general understanding of artificial neural networks, it is appropriate to explore them in greater detail. But before jumping into the various networks, a more complete understanding of the inner workings of a neural network is needed. As stated earlier, artificial neural networks are a large class of parallel processing architectures which are useful in specific types of complex problems. These architectures should not be confused with common parallel processing configurations, which apply many sequential processing units to standard computing topologies. Instead, neural networks are radically different from conventional Von Neumann computers in that they crudely mimic the fundamental properties of the human brain.

Why it is Difficult to Model a Brain-like Neural Network

We have seen that the functioning of individual neurons is quite simple. Then why is it difficult to achieve our goal of combining the abilities of computers and humans?

The difficulty arises because of the following -

1. It is difficult to find out which neurons should be connected to which. This is the problem of determining the neural network structure. Further, the interconnections in the brain are constantly changing. The initial interconnections seem to be largely governed by genetic factors.

2. The weights on the edges and thresholds in the nodes are constantly changing. This problem has been the subject of much research and has been solved to a large extent. The approach has been as follows -

Given some input, if the neural network makes an error, then it can be determined exactly which neurons were active before the error. Then we can change the weights and thresholds appropriately to reduce this error.

For this approach to work, the neural network must "know" that it has made a mistake. In real life, the mistake usually becomes obvious only after a long time. This situation is more difficult to handle since we may not know which input led to the error.

Also notice that this problem can be considered a generalization of the previous problem of determining the neural network structure. If this problem is solved, that one is also solved, because if the weight between two neurons is zero then it is as good as the two neurons not being connected at all. So if we can figure out the weights properly, then the structure becomes known. But there may be better methods of determining the structure.

3. The functioning of individual neurons may not be so simple after all. For example, remember that if a neuron receives signals from many neighboring neurons, the combined stimulus may exceed its threshold. Actually, the neuron need not receive all signals at exactly the same time, but must receive them all in a short time-interval.

It is usually assumed that such details will not affect the functioning of the simulated neural network much. But maybe they will.

Another example of deviation from normal functioning is that some edges can transmit signals in both directions. Actually, all edges can transmit in both directions, but usually they transmit in only one direction, from one neuron to another.


Applications of neural networks:

... Or are they just a solution in search of a problem?

The applications of neural networks are almost limitless, but they fall into a few simple categories.

Neural networks have broad applicability to real world business problems. In fact, they have already been successfully applied in many industries. Since neural networks are best at identifying patterns or trends in data, they are well suited for prediction or forecasting needs including:

Sales forecasting

Industrial process control

Customer research

Data validation

Risk management

Target marketing

But to give some more specific examples, ANNs are also used in the following specific paradigms: recognition of speakers in communications; diagnosis of hepatitis; recovery of telecommunications from faulty software; interpretation of multi-meaning Chinese words; undersea mine detection; texture analysis; three-dimensional object recognition; hand-written word recognition; and facial recognition.

Neural networks in medicine:

Artificial neural networks (ANNs) are currently a 'hot' research area in medicine, and it is believed that they will receive extensive application to biomedical systems in the next few years. At the moment, the research is mostly on modeling parts of the human body and recognizing diseases from various scans (e.g. cardiograms, CAT scans, ultrasonic scans, etc.).

Neural networks are ideal for recognizing diseases using scans, since there is no need to provide a specific algorithm for how to identify the disease. Neural networks learn by example, so the details of how to recognize the disease are not needed. What is needed is a set of examples that are representative of all the variations of the disease. The quantity of examples is not as important as the quality. The examples need to be selected very carefully if the system is to perform reliably and efficiently.

· Modeling and Diagnosing the Cardiovascular System:

Neural networks are used experimentally to model the human cardiovascular system. Diagnosis can be achieved by building a model of the cardiovascular system of an individual and comparing it with real-time physiological measurements taken from the patient. If this routine is carried out regularly, potentially harmful medical conditions can be detected at an early stage, thus making the process of combating the disease much easier.

Another reason that justifies the use of ANN technology is the ability of ANNs to provide sensor fusion, which is the combining of values from several different sensors. Sensor fusion enables the ANNs to learn complex relationships among the individual sensor values, which would otherwise be lost if the values were individually analyzed. In medical modeling and diagnosis, this implies that even though each sensor in a set may be sensitive only to a specific physiological variable, ANNs are capable of detecting complex medical conditions by fusing the data from the individual biomedical sensors.

· Electronic noses:

ANNs are used experimentally to implement electronic noses. Electronic noses have several potential applications in telemedicine, the practice of medicine over long distances via a communication link. The electronic nose would identify odors in the remote surgical environment. These identified odors would then be electronically transmitted to another site, where an odor generation system would recreate them. Because the sense of smell can be an important sense to the surgeon, telesmell would enhance telepresent surgery.

· Instant Physician:

An application developed in the mid-1980s called the "instant physician" trained an autoassociative memory neural network to store a large number of medical records, each of which includes information on symptoms, diagnosis, and treatment for a particular case. After training, the net can be presented with input consisting of a set of symptoms; it will then find the full stored pattern that represents the "best" diagnosis and treatment.

Neural Networks in business:

Business is a diverse field with several general areas of specialization, such as accounting or financial analysis. Almost any neural network application would fit into one such business area.

There is some potential for using neural networks for business purposes, including resource allocation and scheduling. There is also a strong potential for using neural networks for database mining, that is, searching for patterns implicit within the explicitly stored information in databases. Most of the funded work in this area is classified as proprietary, so it is not possible to report on the full extent of the work going on. Most of it applies neural networks, such as the Hopfield-Tank network, to optimization and scheduling.

· Marketing:

There is a marketing application which has been integrated with a neural network system. The Airline Marketing Tactician (a trademark abbreviated as AMT) is a computer system made up of various intelligent technologies, including expert systems. A feedforward neural network is integrated with the AMT and was trained using back-propagation to assist the marketing control of airline seat allocations. The adaptive neural approach was amenable to rule expression. Additionally, the application's environment changed rapidly and constantly, which required a continuously adaptive solution. The system is used to monitor and recommend booking advice for each departure. Such information has a direct impact on the profitability of an airline and can provide a technological advantage for users of the system.

While it is significant that neural networks have been applied to this problem, it is also important to see that this intelligent technology can be integrated with expert systems and other approaches to make a functional system. Neural networks were used to discover the influence of undefined interactions among the various variables. While these interactions were not defined, they were used by the neural system to develop useful conclusions. It is also noteworthy that neural networks can influence the bottom line.

· Credit Evaluation:

The HNC Company, founded by Robert Hecht-Nielsen, has developed several neural network applications. One of them is the Credit Scoring system, which increased the profitability of the existing model by up to 27%. The HNC neural systems were also applied to mortgage screening. A neural network automated mortgage insurance underwriting system was developed by the Nestor Company. This system was trained with 5048 applications, of which 2597 were certified.

In investment analysis:

Neural networks are used to attempt to predict the movement of stocks, currencies, etc., from previous data. Here they are replacing earlier, simpler linear models.

In signature analysis:

Neural networks serve as a mechanism for comparing signatures made (e.g. in a bank) with those stored. This was one of the first large-scale applications of neural networks in the USA, and also one of the first to use a neural network chip.

In process control:

There are clearly applications to be made here: most processes cannot be determined as computable algorithms. Newcastle University's Chemical Engineering Department is working with industrial partners (such as Zeneca and BP) in this area.

In monitoring:

Networks have been used to monitor

· The state of aircraft engines. By monitoring vibration levels and sound, early warning of engine problems can be given.

· British Rail has also been testing a similar application monitoring diesel engines.

Pen PCs:

PCs where one can write on a tablet, and the writing is recognized and translated into (ASCII) text.

White goods and toys

As neural network chips become available, the possibility of simple, cheap systems which have learned to recognize simple entities (e.g. walls looming, or simple commands like Go or Stop) may lead to their incorporation in toys, washing machines, etc. Already the Japanese are using a related technology, fuzzy logic, in this way. There is considerable interest in the combination of fuzzy and neural technologies.

Connectionist Speech:

Speech recognition is a difficult perceptual task. Connectionist networks have been applied to a number of problems in speech recognition. For example, a three-layer back propagation network can be trained to discriminate between different vowel sounds: the network is trained to output one of ten vowels, given a pair of frequencies taken from the speech waveform. The decision surfaces created by back propagation learning are notably nonlinear.

Speech production, the problem of translating text into speech rather than vice versa, has also been attacked with neural networks. Speech production is easier than speech recognition, and high performance programs are available. NETtalk, a network that learns to pronounce English text, was one of the first systems to demonstrate that connectionist methods could handle real-world tasks.

Linguists have long studied the rules governing the translation of text into speech units called phonemes. For example, the letter "x" is usually pronounced with a "ks" sound, as in "box" and "axe". A traditional approach to the problem would be to write all these rules down and use a production system to apply them. Unfortunately, most of the rules have exceptions (consider "xylophone"), and these exceptions must also be programmed in. Also, the rules may interact with one another in unpleasant, unforeseen ways. A connectionist approach is simply to present a network with words and their pronunciations, and hope that the network will discover the regularities and remember the exceptions. NETtalk succeeds fairly well at this task with a back propagation network.

We can think of NETtalk as an exercise in "extensional programming". There exists some complex relationship between text and speech, and we program that relationship into the computer by showing it examples from the real world. Contrast this with traditional, "intensional programming", in which we write rules or specialized algorithms without reference to any particular examples. In the former case, we hope that the network generalizes to translate new words correctly; in the latter case, we hope that the algorithm is general enough to handle whatever words it receives. Extensional programming is a powerful technique because it drastically cuts down on knowledge acquisition time, a major bottleneck in the construction of AI systems. However, current learning methods are not adequate for the extensional programming of very complex tasks, such as the translation of English sentences into Japanese.

Connectionist Vision:

Humans achieve significant visual prowess with limited visual hardware. Only the center of the retina maintains good spatial resolution; as a result, we must constantly shift our attention among various points of interest. Each snapshot lasts only about two hundred milliseconds. Since individual neural firing rates usually lie in the millisecond range, each scene must be interpreted in about a hundred computational steps. To compound the problem, each interpretation must be rapidly integrated with previous interpretations to enable the construction of a stable three-dimensional model of the world. These severe timing constraints strongly suggest that human vision is highly parallel. Connectionism offers many methods for studying both the engineering and biological aspects of massively parallel vision.

Parallel relaxation plays an important role in connectionist vision systems. Recall our discussion of parallel relaxation search in Hopfield networks and Boltzmann machines. In a typical system, some neural units receive their initial activation levels from a video camera, and these activations are then iteratively modified based on the influences of nearby units. One use for relaxation is detecting edges. If many units think they are located on an edge border, they can override any dissenters. The relaxation process settles on the most likely set of edges in the scene. While traditional vision programs running on serial computing engines must reason about which regions of a scene require edge detection processing, the connectionist approach simply assumes massively parallel machinery.

Visual interpretation also requires the integration of many constraint sources. For example, if two adjacent areas in the scene have the same color and texture, then they are probably part of the same object. If these constraints can be encoded in a network structure, then parallel relaxation is an attractive technique for combining them. Because relaxation treats constraints as "soft" (i.e., it will violate one constraint if necessary to satisfy the others), it achieves a global best-fit interpretation even in the presence of local ambiguity or noise.


Are there any limits to Neural Networks?

The major issues of concern today are the scalability problem, testing, verification, and integration of neural network systems into the modern environment. Neural network programs sometimes become unstable when applied to larger problems. The defense, nuclear and space industries are concerned about the issue of testing and verification. The mathematical theories used to guarantee the performance of an applied neural network are still under development. The solution for the time being may be to train and test these intelligent systems much as we do for humans. Also there are some more practical problems like:

· The operational problem encountered when attempting to simulate the parallelism of neural networks: since the majority of neural networks are simulated on sequential machines, processing time increases very rapidly as the size of the problem expands.
Solution: implement neural networks directly in hardware, though such hardware still needs a great deal of development.

· Inability to explain any results that they obtain: networks function as "black boxes" whose rules of operation are completely unknown.




Conclusion:

The computing world has a lot to gain from neural networks. Their ability to learn by example makes them very flexible and powerful. Furthermore, there is no need to devise an algorithm in order to perform a specific task, i.e. there is no need to understand the internal mechanisms of that task. Neural networks are also very well suited to real-time systems because of their fast response and computational times, which are due to their parallel architecture.

Neural networks also contribute to other areas of research such as neurology and psychology. They are regularly used to model parts of living organisms and to investigate the internal mechanisms of the brain.

Perhaps the most exciting aspect of neural networks is the possibility that some day 'conscious' networks might be produced. A number of scientists argue that consciousness is a 'mechanical' property and that 'conscious' neural networks are a realistic possibility.

Finally, I would like to state that even though neural networks have huge potential, we will only get the best out of them when they are integrated with conventional computing, AI, fuzzy logic and related subjects.


The Future:

Because gazing into the future is somewhat like gazing into a crystal ball, it is better to quote some "predictions". Each prediction rests on some sort of evidence or established trend which, with extrapolation, clearly takes us into a new realm.

Prediction 1: Neural networks will facilitate user-specific systems for education, information processing, and entertainment. "Alternative realities", produced by comprehensive environments, are attractive in terms of their potential for systems control, education, and entertainment. This is not just a far-out research trend, but something which is becoming an increasing part of our daily existence, as witnessed by the growing interest in comprehensive "entertainment centers" in each home.
This "programming" would require feedback from the user in order to be effective, but simple, "passive" sensors (e.g. fingertip sensors, gloves, or wristbands to detect pulse, blood pressure, skin ionization, and other variables) could provide effective feedback into a neural control system, which could learn to correlate these signals with a person's response state.

Prediction 2: Neural networks, integrated with other artificial intelligence technologies, methods for the direct culture of nervous tissue, and other exotic technologies such as genetic engineering, will allow us to develop radical and exotic life-forms, whether man, machine, or hybrid.

Prediction 3: Neural networks will allow us to explore new realms of human capability, realms previously available only with extensive training and personal discipline. A specific state of consciously induced, neurophysiologically observable awareness would be necessary in order to facilitate a man-machine system interface.




