First, let’s discuss encoding the input pixels. A robust, simple method of presenting information to the neural network helps keep the network architecture simple. We need color and luminance information, and one straightforward way to provide both is RGB values. Next we need each pixel’s location within the visible frame. The conventional encoding is row/column coordinates, but that is not ideal for many vision applications where resolving objects and features is the goal. An improved approach is to provide direction and distance information referenced from the center pixel of the image (or from the center pixel of the area of interest being evaluated). The direction is then an angle of +/-180 degrees, normalized to 0 to 1, with 0.5 representing straight up. The distance is the Euclidean distance from the center, in pixels, normalized so that the farthest pixel in the area has a distance of 1. Each pixel is then represented by five inputs: R, G, B, DIR, DIST.
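The five-input encoding above can be sketched as follows. This is a minimal illustration, not a reference implementation; the function name and the exact angle convention (0.5 mapped to straight up via atan2) are assumptions chosen to match the description.

```python
import numpy as np

def encode_pixels(image):
    """Encode an RGB image (H, W, 3) as per-pixel 5-vectors: R, G, B, DIR, DIST.

    DIR is the angle from the image center, mapped from [-180, +180] degrees
    to [0, 1] with 0.5 pointing straight up (the center pixel itself has an
    undefined direction and gets 0.5 by convention). DIST is the Euclidean
    distance from the center in pixels, normalized so the farthest pixel is 1.
    """
    h, w, _ = image.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    ys, xs = np.mgrid[0:h, 0:w]
    dy, dx = ys - cy, xs - cx                  # pixel offsets from center
    # atan2 with -dy so "up" (toward row 0) is angle 0
    angle = np.degrees(np.arctan2(dx, -dy))    # -180..+180 degrees
    direction = (angle + 180.0) / 360.0        # 0..1, 0.5 = straight up
    dist = np.hypot(dx, dy)
    dist = dist / dist.max()                   # farthest pixel -> 1
    rgb = image.astype(np.float64) / 255.0     # normalize color channels
    return np.dstack([rgb, direction, dist])   # (H, W, 5) per-pixel inputs
```

With this layout, each spatial position carries its own direction and distance channels, so the network never has to infer location from raw row/column indices.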
Object Planar Orientation. If we want a neural network capable of recognizing three-dimensional objects, the network must be able to understand the relative orientation of the object. The first step is giving the network the capacity to resolve the two-dimensional orientation of an object. Humans rely on two sensory inputs for this function: visual information, supplemented by sensory inputs from the inner ear, which provide gravitational/inertial direction information to the brain. Accordingly, in vehicular vision systems, feeding accelerometer or gyroscope readings into the neural network can enhance object recognition and reduce recognition time to some extent. Using visual cues alone to discern object orientation requires features of the neural network we will explore further as we proceed, but the pixel input scheme outlined above is well suited to adapting a neural network to such tasks.
Neural Network Architecture. Studies of biological neural systems, together with recent research in deep learning, have provided new insights for improving the performance of artificial neural networks. One thing we have learned is that biological neural networks are not simple layered structures; they are far more complex, functional, and adaptive than previous and current artificial network topologies. The reasons for this will become more apparent as we proceed. Let’s begin with a brief discussion of ResNet networks in order to illustrate a concept. ResNet was developed principally to improve the training of deep neural networks (networks with many layers), but that was only one of the advantages gained from these and similar topologies. From these networks we also began to recognize the gains in recognition and accuracy offered by adding synapses which skip layers, and statistically we could begin to appreciate the many benefits of synapses which span portions of the network. While some attributed the dramatic improvements solely to easier training, it can be shown that the mere existence of the layer-bypass synapses, in itself, improves network performance, intelligence, and adaptability. One effect these skip synapses offer is what we will call cognitive scaling: they allow the neural network to combine lower-level features with higher-level learned features to more accurately derive and extract actual object definitions.

Neural Network Object Resolution Enhancements. For advanced artificial “vision”, ideally the neural network would incorporate a sort of artificial lens, so that the machine can direct its highest-resolution image recognition capabilities toward the area of interest within its field of view.
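The layer-bypass synapses discussed above can be sketched as a minimal residual block. This is an illustrative NumPy sketch of the general skip-connection idea, not the ResNet authors’ implementation; the function names and weight shapes are assumptions.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    """One residual block: the skip connection adds the input back onto the
    transformed signal, so lower-level features pass through unchanged and
    can be combined with higher-level learned features downstream."""
    out = relu(w1 @ x)    # first learned transformation
    out = w2 @ out        # second learned transformation
    return relu(out + x)  # skip connection: bypass the two layers
```

Note that if the learned weights are zero, the block reduces to the identity on non-negative inputs, which is one intuition for why such blocks are easy to train in very deep stacks.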
This lensing capability is a recognition-efficiency enhancement similar to our own ocular attributes, wherein we can direct our focus (eyes) toward objects of interest. Because the center-of-focus region of the eye has a very high density of cones, we obtain high-resolution recognition through our ability to shift our focus to areas (objects) of interest. Adding such a feature to artificial vision systems enhances recognition efficiently, and allows the system to recognize and locate more than one object in an image with greater ease and accuracy. One way to implement this lensing effect is to keep the center of the area of interest at native pixel resolution and to reduce resolution with distance from that center. This form of preprocessing reduces the requisite size of the neural network, sometimes quite dramatically, which in turn reduces hardware requirements and processing time while maintaining high efficiency, speed, and excellent accuracy.
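One simple way to realize this center-weighted resolution falloff is sketched below. This is a hypothetical illustration of the idea, not the production preprocessing step; the ring scheme, the power-of-two coarsening, and the function name are all assumptions.

```python
import numpy as np

def foveate(image, num_rings=3):
    """Foveated preprocessing sketch: keep native resolution at the center of
    the area of interest and progressively coarsen pixels with distance.

    Pixels are binned into concentric rings by distance from the center.
    Ring 0 (innermost) keeps native resolution; pixels in ring k are replaced
    by a subsampled copy coarsened by a factor of 2**k (each pixel takes the
    top-left value of its block), reducing peripheral detail."""
    h, w = image.shape[:2]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    ys, xs = np.mgrid[0:h, 0:w]
    dist = np.hypot(ys - cy, xs - cx)
    # assign each pixel to a ring 0..num_rings-1 by normalized distance
    ring = np.minimum((num_rings * dist / (dist.max() + 1e-9)).astype(int),
                      num_rings - 1)
    out = image.astype(np.float64).copy()
    for k in range(1, num_rings):
        block = 2 ** k
        coarse_y = (ys // block) * block   # snap to coarse grid
        coarse_x = (xs // block) * block
        mask = ring == k
        out[mask] = image[coarse_y[mask], coarse_x[mask]]
    return out
```

The coarsened periphery carries far fewer distinct values, so a downstream network (or the five-input encoding above applied to the result) needs fewer effective inputs to cover the same field of view.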
There is good news for the many applications which use pretrained neural networks for image recognition. Our company, Akins Enterprises, has developed a new way to create application-specific pretrained hardware without requiring computationally intensive recursive matrix multiplications. Using our unique intellectual property, we can fabricate a demonstration module which will prove this new technology for use in many applications. These modules consume very little power and operate at exceptionally high frame rates, making them well suited to many military and commercial image recognition applications and machine vision systems. For serious inquiries please contact chipakins@akinsenterprises.com.