James Ting-Ho Lo
Department of Mathematics and Statistics
University of Maryland Baltimore County
1000 Hilltop Circle
Baltimore, MD 21250-0001
My research interests are concentrated in two clusters:
Recurrent deep learning machines and the synthetic approach to designing dynamical systems:
The conventional analytic approach suffers from the following limitations:
Computational models of the brain and brain-like learning machines:
There have been a large number of findings in neuroscience. Integrating these findings into a computational model of the brain is necessary for understanding the brain and for developing brain-like learning machines. Since the elementary computation of the brain is performed by biological neural networks (BNNs), a first step in constructing a computational model of the brain is to construct one of BNNs. Based on a computational model of BNNs, computational models of the visual, auditory, somatosensory, and somatomotor systems are to be developed. Ultimately, these systems, together with models of other parts of the brain, will be connected so as to deliver high-level cognitive functions such as decision making, prediction, creation, and other human behaviors.
Deconvexification for data fitting The risk-averting error criterion used in the convexification method for avoiding nonglobal local minima in training neural networks and estimating nonlinear regression models causes computer register overflow when its risk-sensitivity index is large. To eliminate this difficulty, a normalized risk-averting error criterion is used instead. It is proven that the number of nonglobal local minima decreases to zero as the risk-sensitivity index goes to infinity. Starting with a very large risk-sensitivity index and gradually decreasing it to zero has been effective in finding a global minimum in all the numerical examples worked out so far.
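As an illustrative sketch (my own minimal rendering, not code from the cited work, and assuming the criterion has the usual normalized exponential form), the normalized criterion can be evaluated stably with the log-sum-exp device even for very large risk-sensitivity indices; the deconvexification schedule then simply decreases the index toward zero:

```python
import numpy as np

def nrae(errors, lam):
    """Normalized risk-averting error: (1/lam) * log(mean(exp(lam * e_i**2))).

    The log-sum-exp arrangement avoids the register overflow that the
    unnormalized criterion suffers when lam is large."""
    z = lam * np.square(np.asarray(errors, dtype=float))
    m = z.max()
    return (m + np.log(np.mean(np.exp(z - m)))) / lam

# Deconvexification schedule: start with a very large risk-sensitivity
# index (near-convex landscape) and decrease it toward the plain MSE.
fit_errors = np.array([0.1, 0.5, 2.0])   # hypothetical residuals of a model fit
for lam in (1e4, 1e2, 1.0, 1e-3):
    c = nrae(fit_errors, lam)            # lam -> inf: approaches max error^2 (4.0)
                                         # lam -> 0:  approaches mean error^2 (1.42)
```

In an actual run, each value of the index would be held fixed while the model is (partially) minimized before the index is decreased again.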
Recurrent deep learning machines A research thrust in recent years is the development of deep learning machines. The advantages of learning machines with a deep architecture over those with a shallow architecture were analyzed by Y. Bengio and Y. LeCun. However, feedback structures are conspicuously missing in the existing deep learning machines. Since neural networks with a feedback structure, called recurrent neural networks, are necessary for universally approximating dynamical systems, and provide better detection/recognition performance even for static patterns, developing recurrent deep learning machines is an important step beyond the current research thrust. Because of the additional training difficulty caused by feedback structures, the newly developed deconvexification method is expected to be of great value in the development of recurrent deep learning machines.
A cortex-like learning machine Starting with a functional model and a low-order model of biological neural networks, a cortex-like learning machine, called clustering interpreting probabilistic associative memory (CIPAM), has been derived. CIPAM has the following advantages:
A low-order model of biological neural networks Motivated by a functional model of biological neural networks (BNNs), a low-order model (LOM) of the same has been obtained. It is a recurrent hierarchical network of processing units (PUs), each comprising models of axonal/dendritic trees (for encoding inputs to the processing unit); synapses (for storing covariances between axonal/dendritic codes and labels of said inputs); spiking and nonspiking somas (for retrieving/generating labels); unsupervised/supervised learning mechanisms; and a maximal generalization scheme. The PUs are linked by feedback connections with different lengths.
To the best of my knowledge, LOM is the only biologically plausible model of BNNs that provides logically coherent answers to the following questions:
A functional model of biological neural networks Derivation of a functional model of biological neural networks, called temporal hierarchical probabilistic associative memory (THPAM), is guided by the following four neurobiological postulates:
The construction of a functional model of biological neural networks based on all four postulates has broken the barriers confining multilayer perceptrons and associative memories. A first contribution lies in each of the following features, which such existing models as the recurrent multilayer perceptron and associative memories do not have: 1. a recurrent multilayer network learning by a Hebbian-type rule; 2. fully automated unsupervised and supervised Hebbian learning mechanisms (involving no differentiation, error backpropagation, optimization, iteration, cycling repeatedly through all learning data, or waiting for asymptotic behavior to emerge); 3. dendritic trees encoding inputs to neurons; 4. neurons communicating with spike trains carrying subjective probability distributions or membership functions; 5. masking matrices facilitating recognition of corrupted, distorted, and occluded patterns; and 6. feedbacks with different delay durations for fully utilizing temporally and spatially associated information.
Convexification for data fitting, and robust processing The neural network approach has been plagued by the local minima problem in training. To solve this problem, a new type of risk-averting error criterion was discovered. On the one hand, the convexity region of the criterion expands as its risk-sensitivity index increases; on the other hand, as the index goes to zero, the criterion converges to the standard mean squared error. These properties suggest minimizing the standard mean squared error in two phases, convexification followed by deconvexification, to avoid poor local minima. This two-phase method can be used together with any local or "global" optimization technique and is expected to effectively solve the local minima problem not only in training neural networks but also in estimating nonlinear regression models and in other kinds of nonlinear data fitting.
As its sensitivity index goes to infinity, the risk-averting error criterion approaches a minimax criterion. This suggests that the risk-averting error criterion provides a continuous spectrum of robustness. Depending on the application, the degree of robustness can be selected by setting an appropriate value of the sensitivity index.
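To make the spectrum concrete, a small sketch of my own (with hypothetical error values): the gradient of the risk-averting criterion weights each error term in proportion to exp(lam * e_i^2), so the index dials the emphasis continuously between uniform (MSE-like) and worst-case (minimax-like):

```python
import numpy as np

def error_emphasis(errors, lam):
    """Relative weight the risk-averting criterion places on each error term
    (proportional to exp(lam * e_i**2), normalized to sum to one)."""
    z = lam * np.square(np.asarray(errors, dtype=float))
    z -= z.max()               # stabilize the exponentials
    w = np.exp(z)
    return w / w.sum()

e = np.array([0.5, 1.0, 3.0])
uniform = error_emphasis(e, 0.0)   # all weights 1/3: behaves like MSE
robust = error_emphasis(e, 5.0)    # nearly all weight on the worst error
```

Intermediate values of the index give intermediate degrees of robustness, which is the continuous spectrum referred to above.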
Adaptive system identification Two new paradigms of adaptive processing have been developed for adaptive system identification: (1) adaptive feedforward and recurrent neural networks with long- and short-term memories, and (2) accommodative neural networks (i.e., adaptive recurrent neural networks with fixed weights). The former adjust only their linear weights online, and the latter do not even need online adjustment for adaptation. These represent perhaps the only two effective, systematic, and general approaches to adaptive processing for system identification.
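A minimal sketch of the first paradigm (the feature map and the unknown system below are hypothetical stand-ins of my own): the nonlinear layer is frozen as long-term memory, while LMS adapts only the linear output weights online:

```python
import numpy as np

rng = np.random.default_rng(0)

# Long-term memory: a fixed nonlinear layer, assumed trained offline.
def hidden(x):
    return np.tanh(np.array([x, 2.0 * x, x - 1.0]))  # stand-in feature map

w = np.zeros(3)   # short-term memory: linear weights, the only part adapted
mu = 0.1          # LMS step size

for t in range(2000):
    x = rng.uniform(-1.0, 1.0)
    target = 0.5 * np.tanh(x) - 0.2 * np.tanh(x - 1.0)  # unknown system output
    phi = hidden(x)
    w += mu * (target - w @ phi) * phi                  # LMS update, linear weights only
```

Because only linear weights are adjusted, the online adaptation inherits the speed and stability guarantees of classical adaptive linear filtering.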
Overcoming the compactness limitation of neural networks A neural filter is obtained by fitting a recurrent neural network (RNN), such as a recurrent multilayer perceptron, to signal and measurement data. If the ranges of the signal and measurement expand over time, as in financial time series prediction, satellite orbit determination, aircraft/ship navigation, and target tracking, or are large relative to the filtering resolution or accuracy required, then the size of the RNN and the training data set must be large. The larger the RNN and the training data set are, the more difficult it is to train the RNN on the training data set. Furthermore, the time periods over which the training data are collected, by computer simulation or actual experiment, are necessarily of finite length. If the measurement and signal processes grow beyond these time periods, the neural network trained on the training data usually diverges.
To eliminate this difficulty, called the compactness limitation, we propose the use of dynamical range transformers. There are two types of dynamical range transformer, called dynamical range reducers and extenders, which are preprocessors and postprocessors, respectively, of an RNN. A dynamical range reducer dynamically transforms a component of an exogenous input process (e.g., a measurement process) and sends the resulting process to the input terminals of an RNN so as to reduce the valid input range or approximation capability required of the RNN. Conversely, a dynamical range extender dynamically transforms the output of an output node of an RNN so as to reduce the valid output range or approximation capability required of the RNN. The purpose of both is to ease the RNN size and training data requirements and thereby lessen the training difficulty. The fundamental range transforming requirement in using a dynamical range transformer (i.e., a dynamical range reducer or extender) is the existence of a recursive filter, comprising an RNN and the dynamical range transformer, that approximates the optimal filter to any accuracy, provided that the RNN with the selected architecture is sufficiently large. A recursive filter comprising an RNN and dynamical range transformers is called a recursive neural filter.
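A toy sketch of the idea (with a differencing reducer and accumulating extender of my own choosing, and an identity placeholder standing in for the trained RNN): the RNN only ever sees bounded one-step increments, yet the recursive neural filter tracks a ramp that would leave any compact training range:

```python
import numpy as np

class DifferencingReducer:
    """Hypothetical dynamical range reducer: feeds the RNN the one-step
    increment of the measurement instead of its (possibly unbounded) value."""
    def __init__(self):
        self.prev = 0.0
    def __call__(self, y):
        d = y - self.prev
        self.prev = y
        return d

class AccumulatingExtender:
    """Hypothetical dynamical range extender: integrates the RNN's bounded
    per-step output back into the full-range estimate."""
    def __init__(self):
        self.est = 0.0
    def __call__(self, delta):
        self.est += delta
        return self.est

# Toy recursive neural filter: reducer -> "RNN" -> extender. The identity
# lambda is a placeholder for a trained recurrent network.
reducer, extender = DifferencingReducer(), AccumulatingExtender()
rnn = lambda u: u
estimates = [extender(rnn(reducer(y))) for y in np.arange(0.0, 100.0, 1.0)]
```

Note that every input to the placeholder RNN lies in a small fixed interval, even though the measurement itself grows without bound; that is precisely the relief the range transformers provide.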
Synthetic approach to optimal filtering The long-standing notorious problem of nonlinear filtering (e.g., prediction, estimation, smoothing) was solved in its most general setting in 1992 by a synthetic (or neural network) approach. R. E. Kalman said in his 1998 email to me: "I read your patents and paper. I am absolutely amazed." The synthetic approach has the following advantages: 1. No such assumption as the Markov property, linearity of signal or measurement process, Gaussian distribution, or additive measurement noise is necessary. 2. It applies, even if a mathematical model of the signal and measurement processes is not available. 3. The resultant neural filter has the minimum error variance for the given structure. 4. The neural filter converges to the minimum-variance filter as the number of hidden neurons increases. 5. Much like the Kalman filter, the neural filter requires no Monte Carlo simulation online and is well suited for real-time processing.
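In miniature, the synthetic approach fits a recursive filter to simulated signal/measurement data rather than deriving the filter analytically. The sketch below is my own toy, with a one-parameter recursive filter standing in for the RNN and an assumed linear-Gaussian model used only to generate the training trajectories; the filter parameter is chosen purely by minimizing the empirical estimation error on the simulated data:

```python
import numpy as np

rng = np.random.default_rng(1)
a, sw, sv = 0.9, 0.5, 1.0          # assumed signal dynamics and noise levels

def simulate(T):
    """Generate one signal/measurement trajectory for training."""
    x = np.zeros(T); y = np.zeros(T)
    for t in range(1, T):
        x[t] = a * x[t-1] + sw * rng.standard_normal()
        y[t] = x[t] + sv * rng.standard_normal()
    return x, y

def filter_mse(k, x, y):
    """Empirical error of a one-parameter recursive filter (RNN stand-in)."""
    est, se = 0.0, 0.0
    for t in range(1, len(y)):
        est = (1 - k) * a * est + k * y[t]
        se += (est - x[t]) ** 2
    return se / (len(y) - 1)

# "Training": pick the parameter minimizing empirical error on simulated data.
x, y = simulate(20000)
ks = np.linspace(0.05, 0.95, 19)
best_k = ks[np.argmin([filter_mse(k, x, y) for k in ks])]
```

No Riccati equation or density recursion is solved anywhere; the simulated data alone shape the filter, which is the essence of the synthetic approach.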
1. Developing recurrent deep learning machines.
Deep learning machines (DLMs) are known to have more efficient architectures, to require less training data, and to represent functions or classifiers more effectively than shallow learning machines. Effective deep architectures, such as those of convolutional nets, and training methods, such as the intriguing greedy layer-wise training strategy for Boltzmann machines, have been developed in the recent research thrust on DLMs.
However, feedback structures are conspicuously missing in the research thrust. Feedbacks to a computing node bring current or past information contained in neighboring or larger receptive fields of other computing nodes to said computing node for forming better local representations or features. Such information is required in processing dynamical data (i.e., sequential data with recursive dynamics) and can enhance processing accuracy and generalization for both sequential data (with or without recursive dynamics) and static data.
Deep learning machines with a feedback structure are called recurrent deep learning machines (RDLMs). Extending DLMs such as convolutional nets and Boltzmann machines to RDLMs such as recurrent convolutional nets and recurrent Boltzmann machines is an important and challenging line of research now underway. Due to the large number of nonglobal local minima on the training error landscape, training DLMs, and especially RDLMs, is difficult. The deconvexification method, which overcomes the well-known local-minimum problem, is expected to play an important role in the development of RDLMs.
2. Developing a systematic general approach to designing adaptive or robust dynamical systems for system control and filtering in uncertain or dynamically changing environments.
Because of their practical importance, robustness and adaptiveness are two fundamental issues that have been extensively studied in control and filtering for more than 30 years. Two simple, effective, systematic, and general approaches to adaptive processing have been developed for system identification. One approach employs a UAODS with long- and short-term memories, the former being determined in the lab before deployment and the latter adjusted online by one of the fast adaptive linear filter algorithms (e.g., LMS and RLS) developed over more than 40 years. The other uses a UAODS (with fixed weights) without online weight adjustment, which is called an accommodative processor.
For robust processing, we use the novel normalized risk-averting error (NRAE) criterion, which emphasizes larger errors in an exponential manner and thereby induces robust performance. As opposed to the H-infinity or minimax criterion, which is often too pessimistic, the NRAE criterion allows us to select any desired degree of robustness by setting an appropriate risk-sensitivity index.
3. Developing faithful computational models of the brain and brain-like learning machines.
A low-order model (LOM) of biological neural networks was recently reported. LOM is a network of biologically plausible models of dendritic/axonal nodes and trees, spiking/nonspiking somas, unsupervised/supervised covariance/accumulation learning mechanisms, feedback connections with various time delays, and a scheme for maximal generalization. These component models were motivated and necessitated by the requirements that LOM learn and retrieve easily, and that it cluster, detect, and recognize multiple/hierarchical corrupted, distorted, and occluded temporal and spatial patterns.
Current and future work includes developing mechanisms for motion detection and attention selection for LOM; extending LOM to low-order models of the visual, auditory, somatosensory, somatomotor, and other systems; and integrating all these models into a computational model of the brain.
In addition, LOM is ready for application to the detection, clustering, and recognition of such spatial patterns as handwriting, faces, targets, explosives, and weapons (in baggage and containers), and such temporal patterns as speech, text, video, and financial data.