Inhibition-augmented trainable COSFIRE filters for keypoint detection and object recognition

The shape and meaning of an object can radically change with the addition of one or more contour parts. For instance, a T-junction can become a crossover. We extend the COSFIRE trainable filter approach which uses a positive prototype pattern for configuration by adding a set of negative prototype patterns. The configured filter responds to patterns that are similar to the positive prototype but not to any of the negative prototypes. The configuration of such a filter comprises selecting given channels of a bank of Gabor filters that provide excitatory or inhibitory input and determining certain blur and shift parameters. We compute the response of such a filter as the excitatory input minus a fraction of the maximum of inhibitory inputs. We use three applications to demonstrate the effectiveness of inhibition: the exclusive detection of vascular bifurcations (i.e., without crossovers) in retinal fundus images (DRIVE data set), the recognition of architectural and electrical symbols (GREC’11 data set) and the recognition of handwritten digits (MNIST data set).


Introduction
Recently, a novel trainable filter for object recognition was proposed in [5]. It is called combination of shifted filter responses, or COSFIRE for brevity. A COSFIRE filter is configured to be selective for a given local pattern by extracting from that pattern characteristic properties of contour parts (such as orientation) and their geometrical arrangement. COSFIRE filters were demonstrated to be effective for the detection of local patterns (keypoints) and the recognition of objects, and they achieve very good performance in various applications [4,6,8,19,47,49,50]. They were also used in a multilayer hierarchical approach [6].
Figure 1 shows some examples where a COSFIRE filter of the type proposed in [5] may, however, not perform very well. COSFIRE filters that are configured to be selective for the patterns shown in the top row of Fig. 1 also give strong responses to the images in the bottom row of Fig. 1. This is because all contour parts of a pattern in the top row are present, in the preferred arrangements, in the corresponding image in the bottom row. The presence of additional contour parts, such as the diagonal bar in Fig. 1a (bottom) or the extra stroke in Fig. 1b (bottom), has no influence on the response of the filter.
The COSFIRE method [5] was inspired by a specific type of shape-selective neuron in area V4 of the visual cortex. This method, however, relies on contour parts that provide only excitatory inputs. This means that every involved contour part detector contributes to enhancing the response of a COSFIRE filter.
There is neurophysiological evidence, however, that neurons in different layers of the visual cortex also receive inhibitory inputs [21]. For instance, neurons in the lateral geniculate nucleus (LGN) have center-surround receptive fields, which have been modeled by difference-of-Gaussians (DoG) operators. A center-on DoG has an excitatory central region with an inhibitory surround. Similarly, simple cells in area V1, whose properties provided the inspiration for Gabor filters [16,23], derivative of Gaussians [18] and CORF [3,7] filters, have receptive fields that consist of inhibitory and excitatory regions. Non-classical receptive field inhibition in orientation-selective visual neurons provided the inspiration for surround inhibition in orientation-selective filters [42]. It has been shown to improve contour detection by suppressing responses to textured regions. Moreover, shape-selective neurons of the type studied in [13], located in the posterior inferotemporal cortex, respond to complex shapes that are formed by a number of convex and concave curvatures with a certain geometrical arrangement. The presence of some specific curvature elements can inhibit the response of such a neuron. Figure 2 shows the response of a TEO neuron, studied in [13], which is excited by the encircled curvatures A, B and C, but is inhibited by the dashed encircled curvature D. The bar plots indicate the responses to the stimuli.
Inhibition is an important phenomenon in the brain. It is thought to increase the selectivity of neurons [46]. It also facilitates sparseness in the representation of information, which may result in an increase in the storage capacity and a higher number of patterns that can be discriminated [45]. End-stopped cells [12,22] in area V1 of the visual cortex are another example of neurons whose responses are shaped by inhibition.
Fig. 2 Selectivity of a shape-selective neuron in the posterior inferotemporal cortex [13]. a The curvatures marked with circles evoke excitation of the concerned cell, while b the curvature marked with a dashed circle inhibits the activation of the cell. The bars specify the strength of the response
In this work, we add inhibition to COSFIRE filters in order to increase their discrimination ability. The inhibition that we propose is learned in an automatic configuration process. We configure an inhibition-augmented COSFIRE filter by using two different types of prototype patterns, namely one positive pattern and one or more negative patterns, in order to extract excitatory and inhibitory contour parts, respectively. Such a filter can effectively detect patterns that are equivalent or similar to the positive prototype, but does not respond to the negative prototype(s).
The proposed inhibition-augmented filters can be used in keypoint detection and object recognition. A large body of work has been done in these areas, and many methods have been proposed [9,10,15,20,24,28,34-39,56,59]. The Hessian detector [10] and the Harris detector [20], for instance, detect points of interest and are invariant to rotation but not so much to scale. Scale invariance of these two operators can be achieved by applying them in a Laplacian of Gaussian scale space [34], resulting in the so-called Hessian-Laplace and Harris-Laplace detectors [36]. A point of interest can be described by local keypoint descriptors, such as the scale-invariant feature transform (SIFT) [35], the histogram of oriented gradients (HOG) [15], the image descriptor GIST [39] and the gradient location and orientation histogram (GLOH) [37]. Other keypoint descriptors include the speeded up robust features (SURF) [9], which is akin to SIFT but faster as it makes efficient use of integral images [56], the texture-based local binary patterns (LBP) [38], textons [24,59] and the biologically inspired local descriptor (BILD) [60], as well as the rotation invariant feature transform (RIFT) descriptor [28]. None of these methods employs inhibition.
Multiple keypoints can be used to represent bigger and more complex patterns, such as complete objects or scenes. In [32], a bag-of-visual-words approach was proposed to describe an image or a region of interest with a histogram of prototypical keypoints. This method is improved by using spatial pyramids [29] or a random sample consensus algorithm [25]. Other object recognition approaches use hierarchical representations of objects, which have been inspired by the visual processing in the brain. These include the HMAX model [44], the object representation by parts proposed in [17], neural networks [26] and the deep learning approach [30].
These methods respond to positive examples, but may also give strong responses to objects that contain additional contour parts. For instance, detectors that are trained with the examples shown in the top row of Fig. 1 will give strong responses to objects that are equivalent or similar to those examples. They will, however, also give strong responses to objects that are equivalent or similar to the ones in the bottom row of Fig. 1. Therefore, it is difficult for these methods to discriminate the pairs of patterns shown in Fig. 1a-f.
The rest of the paper is organized as follows. In Sect. 2, we explain how an inhibition-augmented filter is configured by given positive and negative prototype patterns. In Sect. 3, we demonstrate the effectiveness of the proposed approach in three applications. In Sect. 4, we discuss some aspects of the proposed method, and finally, we draw conclusions in Sect. 5.

Overview
Figure 3a shows an input image containing a rectangle with a vertical line inside it. Let us consider the two local patterns encircled by a solid and a dashed line, which are shown enlarged in Fig. 3b, c, respectively. The two solid ellipses in Fig. 3b, c surround a line segment that is present in both patterns, while the dashed ellipse surrounds a line segment that is only present in Fig. 3c. We use these two patterns to configure an inhibition-augmented filter that will respond to the pattern shown in Fig. 3b, a line ending, but not to the pattern shown in Fig. 3c, a continuous line.
We consider the line ending and the continuous line shown in Fig. 3b, c as a positive and a negative prototype, respectively. A positive prototype is a local pattern to which the inhibition-augmented filter to be configured should respond, while a negative prototype is a local pattern to which it should not respond. We use the positive and the negative prototypes to configure two COSFIRE filters with the method proposed in [5]. Next, we look for and identify pairs of contour parts with identical properties in the two filters. In Fig. 3, we use a solid ellipse to indicate that the corresponding contour part is an excitatory feature. We use a dashed ellipse to indicate the contour part that is only present in the negative prototype, and therefore, we consider it as an inhibitory feature.
Fig. 3 (caption, partial) The solid ellipses represent line segments that are present in both prototypes, while the dashed ellipse represents a line segment which is only present in the negative prototype
The response of the inhibition-augmented filter is the difference between the excitatory input and a fraction of the maximum of the inhibitory inputs. The resulting filter will only respond to patterns that are identical with or similar to the positive prototype, but will not respond to images similar to any of the negative prototypes. This design decision is inspired by the function of a type of shape-selective neuron in the posterior inferotemporal cortex.
In the next subsections, we elaborate further on the configuration steps mentioned above.

Gabor filters
The proposed inhibition-augmented filter uses as input the responses of Gabor filters. We denote by $g_{\lambda,\theta}(x, y)$ the response of a Gabor filter, which has a preferred wavelength λ and orientation θ, to a given input image at location (x, y). We threshold the responses of Gabor filters at a given fraction $t_1$ ($0 \le t_1 \le 1$) of the maximum response across all combinations of values (λ, θ) and all positions (x, y) in the image. We denote these thresholded response images by $|g_{\lambda,\theta}(x, y)|_{t_1}$. Figure 4a shows the intensity map of a Gabor function with a wavelength λ = 6 and an orientation θ = 0. Figure 4b, c shows the corresponding thresholded response images $|g_{6,0}(x, y)|_{t_1 = 0.2}$ of this Gabor filter to the input images in Fig. 3b, c, respectively. Such a filter has other parameters, including spatial aspect ratio, bandwidth and phase offset, on which we do not elaborate further here. We refer the interested reader to [5,27,41] for technical details and to an online implementation (footnote 1).
Figure caption: The ellipses illustrate the wavelengths and orientations of the selected Gabor filters, and their positions indicate the locations at which the responses of these Gabor filters are taken with respect to the center. The blobs represent the blurring functions that are used to provide some spatial tolerance to these positions
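The thresholding step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `threshold_responses`, the dictionary layout keyed by (λ, θ) and the use of "greater than or equal" at the cutoff are assumptions made for the example.

```python
import numpy as np

def threshold_responses(responses, t1):
    """Zero every Gabor response below a fraction t1 of the global maximum.

    responses: dict mapping (wavelength, orientation) -> 2D response array
               (hypothetical layout chosen for this sketch).
    t1:        fraction in [0, 1].
    """
    # The maximum is taken across all (lambda, theta) channels and all positions.
    global_max = max(float(r.max()) for r in responses.values())
    cutoff = t1 * global_max
    # Assumption: values equal to the cutoff are kept.
    return {key: np.where(r >= cutoff, r, 0.0) for key, r in responses.items()}
```

Because the cutoff is shared by all channels, a channel whose responses are weak everywhere can be suppressed entirely, which keeps only the dominant orientations and wavelengths at each location.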

Configuration of an inhibition-augmented filter
The configuration of an inhibition-augmented filter involves two steps.
In the first step, we configure two separate COSFIRE filters with the method proposed in [5] to be selective for the specified positive and negative prototypes that are shown in Fig. 3b, c, respectively. Figure 5a, b shows the corresponding superimposed thresholded responses of a bank of Gabor filters ($\theta \in \{0, \pi/8, \ldots, 7\pi/8\}$ and $\lambda \in \{4, 4\sqrt{2}, 6, 6\sqrt{2}\}$) to the positive and negative prototypes. In this example, for the configuration of a COSFIRE filter with a given prototype, we consider the Gabor responses along two concentric circles with radii $\rho \in \{5, 14\}$ pixels around the specified point of interest. In Fig. 5c, d we illustrate the structures of the resulting selected filters. The size and orientation of an ellipse represent the preferred wavelength λ and orientation θ of a Gabor filter that provides input to the COSFIRE filter. The position of its center indicates the location at which we take the concerned Gabor filter response.
1 http://matlabserver.cs.rug.nl
We specify a COSFIRE filter by a set of 4-tuples $(\lambda, \theta, \rho, \phi)$, in which each tuple represents a Gabor filter with wavelength λ and orientation θ together with the polar position (ρ, φ) at which its response has to be taken. We denote by $P_f$ and $N_f$ the two COSFIRE filters configured with the patterns shown in Fig. 3b, c, respectively.
In the second step, we form a new set $S_f$ by selecting tuples from the sets $P_f$ and $N_f$ as follows. We include all tuples from the set $P_f$ in the new set $S_f$ and add a new parameter δ = +1 to indicate that the corresponding Gabor responses of such tuples provide excitatory input to the inhibition-augmented filter. We define a dissimilarity function, which we denote by $d(P_f^i, N_f^j)$, of the distance between the locations indicated by the ith tuple in the set $P_f$ and the jth tuple in the set $N_f$:
$$ d(P_f^i, N_f^j) = \begin{cases} 0 & \text{if } D\big((\rho_i, \phi_i), (\rho_j, \phi_j)\big) \le \zeta \\ 1 & \text{otherwise} \end{cases} \tag{1} $$
where D is the Euclidean distance between the polar coordinates $(\rho_i, \phi_i)$ of tuple i in the positive set $P_f$ and the polar coordinates $(\rho_j, \phi_j)$ of tuple j in the negative set $N_f$. ζ is the threshold, and we provide further details on the selection of its value in Sect. 2.5. We compute the pairwise dissimilarity values between one tuple $N_f^j$ from $N_f$ and all tuples from $P_f$. If $N_f^j$ is dissimilar to all tuples in $P_f$, we include it in the new set $S_f$ and add a tag δ = −1, which indicates that the corresponding Gabor response provides an inhibitory input. We repeat the above procedure for each tuple in set $N_f$. With this process, we ensure that a line segment that is present in both the positive and the negative prototypes in roughly the same position gives an excitatory input. On the other hand, a line segment that is only present in the negative prototype, i.e., one that does not overlap with a line segment in the positive prototype, provides an inhibitory input.
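The selection of excitatory and inhibitory tuples in the second step can be sketched as follows. This is a minimal sketch under stated assumptions: the tuple layout `(lam, theta, rho, phi)`, the function names and the example values are hypothetical, and the Euclidean distance D is computed after converting the polar positions to Cartesian coordinates, which is one reading of the description above.

```python
import math

def polar_distance(rho_i, phi_i, rho_j, phi_j):
    """Euclidean distance between two points given in polar coordinates."""
    xi, yi = rho_i * math.cos(phi_i), rho_i * math.sin(phi_i)
    xj, yj = rho_j * math.cos(phi_j), rho_j * math.sin(phi_j)
    return math.hypot(xi - xj, yi - yj)

def dissimilarity(p, n, zeta):
    """Return 0 if the two tuple positions are within zeta of each other, else 1."""
    return 0 if polar_distance(p[2], p[3], n[2], n[3]) <= zeta else 1

def build_inhibition_augmented_set(P, N, zeta):
    """Merge positive tuples (delta = +1) with unmatched negative tuples (delta = -1)."""
    S = [(*p, +1) for p in P]
    for n in N:
        # A negative tuple becomes inhibitory only if it is dissimilar to every positive tuple.
        if all(dissimilarity(p, n, zeta) == 1 for p in P):
            S.append((*n, -1))
    return S

# Hypothetical line-ending example: two segments below the center are shared,
# two segments above the center occur only in the negative prototype.
P = [(6, math.pi / 2, 5, 3 * math.pi / 2), (6, math.pi / 2, 14, 3 * math.pi / 2)]
N = P + [(6, math.pi / 2, 5, math.pi / 2), (6, math.pi / 2, 14, math.pi / 2)]
S = build_inhibition_augmented_set(P, N, zeta=3.0)
```

In this toy configuration the two shared tuples are tagged as excitatory and the two tuples that exist only in the negative prototype are tagged as inhibitory.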
For the above example, we include the two tuples in set $P_f$, which are illustrated by the two ellipses in Fig. 5c, in the new set $S_f$. We add to each of these two tuples a tag δ = +1 to indicate that they provide excitatory input to the inhibition-augmented filter. These two tuples are also present in set $N_f$. Then, we include in $S_f$ the other two tuples from $N_f$, indicated by the two ellipses at the top of Fig. 5d, with a tag δ = −1, as we do not find any matches for them in $P_f$. For the above example, this method therefore results in a set $S_f$ of four tuples, two excitatory and two inhibitory. In the illustration of the structure of the resulting inhibition-augmented filter that is represented by the set $S_f$, the red ellipses indicate Gabor filters that provide excitatory input, and the blue ellipses indicate Gabor filters that provide inhibitory input to the inhibition-augmented filter at hand.

Configuration with multiple negative prototypes
In the above example, we configured an inhibition-augmented filter to be selective for line endings by using one positive and one negative prototype pattern. In practice, however, a positive pattern may be contained within multiple other patterns, and thus, we may need multiple negative examples.
Figure 7a-c shows an example of three similar Chinese characters that have completely different meanings and are translated into English as "big," "dog" and "extremely," respectively. The character in Fig. 7a is also present in Fig. 7b, c, but accompanied by additional strokes. Next, we demonstrate how we configure an inhibition-augmented filter with more than one negative prototype pattern. Here, we use the image in Fig. 7a as our positive pattern of interest, from which we extract contour parts that provide excitatory input to the resulting filter. The images in Fig. 7b, c are used as negative prototype patterns, from which we determine inhibitory contour parts.
First, we configure a filter $P_f$ for the positive prototype pattern in Fig. 7a as proposed in [5], which results in only excitatory inputs. For this example, we consider three values of the radius ρ ($\rho \in \{0, 15, 33\}$) and we apply a bank of Gabor filters with four wavelengths ($\lambda \in \{8, 8\sqrt{2}, 16\}$) and eight orientations ($\theta \in \{\pi i/8 \mid i = 0, \ldots, 7\}$). Then, we use the procedure proposed in [5] to apply the filter $P_f$ to both negative prototype patterns in Fig. 7b, c. For each negative pattern, we determine the location at which the filter $P_f$ achieves its maximum response. We take the patterns from Fig. 7b, c that surround these locations and use them to configure two COSFIRE filters, which we denote by $N_{f_1}$ and $N_{f_2}$, respectively. Finally, we form a new set $S_{\text{big}}$ by selecting appropriate tuples from $P_f$, $N_{f_1}$ and $N_{f_2}$ as follows. We include all tuples from set $P_f$ in the new set $S_{\text{big}}$ with a tag δ = +1 and compute the dissimilarity values between the locations of the tuples in $N_{f_i}$ (here i = 1, 2) and those in set $P_f$ by the method described in Sect. 2.3. The tuples in $N_{f_1}$ and $N_{f_2}$ that are not similar to any of the tuples in $P_f$ are added to $S_{\text{big}}$ and marked as inhibitory parts with corresponding tags. Figure 7d shows the resulting structure of the inhibition-augmented filter $S_{\text{big}}$, in which the red ellipses indicate the tuples that provide excitatory input to the inhibition-augmented filter, while the blue and green ellipses indicate the tuples that provide inhibitory input.

Application of an inhibition-augmented COSFIRE filter
In the following, we first explain how we blur and shift the responses of the involved Gabor filters, and then, we describe the functions that we use to compute the collective excitatory input, the various collections of inhibitory inputs and the ultimate filter output.

Blurring and shifting Gabor filter responses
We blur the Gabor filter responses in order to allow for some tolerance in the positions at which their responses are taken. We define the blurring operation as the weighted maximum of local Gabor filter responses. For weighting, we use a Gaussian function $G_\sigma(x, y)$, the standard deviation σ of which is a linear function of the distance ρ from the center of the COSFIRE filter:
$$ \sigma = \sigma_0 + \alpha\rho \tag{2} $$
where $\sigma_0$ and α are constants. The choice of the linear function in Eq. 2 is motivated in more detail in [5]. For α > 0, the tolerance to the positions of the considered contour parts increases with an increasing distance ρ from the center of the concerned COSFIRE filter. We use values of α between 0 and 2, depending on the application. Then, we shift all blurred Gabor filter responses so that they meet at the support center of the inhibition-augmented filter. This is achieved by shifting the blurred responses of a Gabor filter $(\lambda_i, \theta_i)$ by a distance $\rho_i$ in the direction opposite to $\phi_i$. In polar coordinates, the shift vector is specified by $(\rho_i, \phi_i + \pi)$. In Cartesian coordinates, it is $(\Delta x_i, \Delta y_i)$ where $\Delta x_i = -\rho_i \cos\phi_i$ and $\Delta y_i = -\rho_i \sin\phi_i$. We denote by $s_{\lambda_i,\theta_i,\rho_i,\phi_i,\delta_i}(x, y)$ the blurred and shifted thresholded response of a Gabor filter in position (x, y) that is specified by the ith tuple $(\lambda_i, \theta_i, \rho_i, \phi_i, \delta_i)$ in the set $S_f$:
$$ s_{\lambda_i,\theta_i,\rho_i,\phi_i,\delta_i}(x, y) = \max_{x', y'} \left\{ \left| g_{\lambda_i,\theta_i}(x - \Delta x_i - x', \, y - \Delta y_i - y') \right|_{t_1} G_\sigma(x', y') \right\} $$
where $-3\sigma \le x', y' \le 3\sigma$.
In order to prevent interference of inhibitory and excitatory parts of the filter, we restrict ζ (in Eq. 1) to be three times the maximum standard deviation of any pair of tuples in $P_f$ and $N_f$.
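The blur-and-shift operation for a single tuple can be sketched as follows. This is an illustrative sketch only: the function name `blur_and_shift` is hypothetical, the Gaussian weighting is assumed to be unnormalized (peak value 1), shifts are rounded to whole pixels, and image rows are assumed to correspond to increasing y.

```python
import math
import numpy as np

def blur_and_shift(g, rho, phi, sigma0, alpha):
    """Weighted-maximum blur followed by a shift of (-rho*cos(phi), -rho*sin(phi)).

    g: 2D array with a thresholded Gabor response (rows = y, columns = x).
    """
    sigma = sigma0 + alpha * rho              # linear growth of tolerance with rho
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    radius = int(math.ceil(3 * sigma))        # support of the weighting: -3s..3s
    dx = int(round(-rho * math.cos(phi)))     # shift vector, rounded to pixels
    dy = int(round(-rho * math.sin(phi)))
    h, w = g.shape
    pad = radius + max(abs(dx), abs(dy))
    padded = np.pad(g.astype(float), pad)     # zero padding outside the image
    out = np.zeros((h, w), dtype=float)
    for v in range(-radius, radius + 1):
        for u in range(-radius, radius + 1):
            weight = math.exp(-(u * u + v * v) / (2.0 * sigma ** 2))
            # out(x, y) = max over (u, v) of g(x - dx - u, y - dy - v) * G(u, v)
            ys, xs = pad - dy - v, pad - dx - u
            out = np.maximum(out, weight * padded[ys:ys + h, xs:xs + w])
    return out

# A single response 5 pixels to the right of (x=10, y=10) is moved to the center.
g = np.zeros((21, 21))
g[10, 15] = 1.0
s = blur_and_shift(g, rho=5, phi=0.0, sigma0=1.0, alpha=0.1)
```

After the shift, the responses of all contour parts of a matching pattern coincide at the filter center, so they can be combined pointwise.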

Response of an inhibition-augmented COSFIRE filter
We denote by $r_{S_f}(x, y)$ the response of an inhibition-augmented COSFIRE filter, which we define as the difference between the excitatory response $r_{S_f^+}(x, y)$ and a fraction of the maximum of the inhibitory responses $r_{S_f^{-j}}(x, y)$:
$$ r_{S_f}(x, y) = \left| r_{S_f^+}(x, y) - \eta \max_j \left\{ r_{S_f^{-j}}(x, y) \right\} \right|_{t_3} $$
where η is a coefficient that we call inhibition factor and $|\cdot|_{t_3}$ stands for thresholding the response at a fraction $t_3$ of its maximum across all image coordinates (x, y).
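The combination of excitatory and inhibitory responses can be sketched as follows. This is a minimal sketch: the function name, the argument layout (one excitatory map and a list of inhibitory maps, one per negative prototype) and the handling of an all-nonpositive result are assumptions made for the example.

```python
import numpy as np

def inhibited_response(r_exc, r_inh_list, eta, t3):
    """Excitatory response minus eta times the pointwise maximum inhibitory response."""
    if r_inh_list:
        inhibition = np.max(np.stack(r_inh_list), axis=0)
    else:
        inhibition = np.zeros_like(r_exc)      # no negative prototypes: plain COSFIRE
    diff = r_exc - eta * inhibition
    peak = diff.max()
    if peak <= 0:                              # assumption: nothing survives inhibition
        return np.zeros_like(diff)
    # Thresholding at a fraction t3 of the maximum also removes negative values.
    return np.where(diff >= t3 * peak, diff, 0.0)

r_exc = np.array([[0.8, 0.8]])
r_inh = [np.array([[0.0, 0.3]]), np.array([[0.0, 0.1]])]
out = inhibited_response(r_exc, r_inh, eta=2.0, t3=0.29)
```

In the toy input both locations have the same excitatory response, but only the location without inhibitory evidence keeps a nonzero output.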
We denote by $r_{S_f^+}$ and $r_{S_f^{-j}}$ the weighted geometric means of all the blurred and shifted responses of the Gabor filters $s_{\lambda_i,\theta_i,\rho_i,\phi_i,\delta_i}(x, y)$ that correspond to the contour parts described by $S_f^+$ and $S_f^{-j}$. For a subset $S \in \{S_f^+, S_f^{-j}\}$:
$$ r_{S}(x, y) = \left| \left( \prod_{i=1}^{|S|} \left( s_{\lambda_i,\theta_i,\rho_i,\phi_i,\delta_i}(x, y) \right)^{\omega_i} \right)^{1 / \sum_{i=1}^{|S|} \omega_i} \right|_{t_2}, \qquad \omega_i = \exp\left( -\frac{\rho_i^2}{2\sigma'^2} \right) $$
where $|\cdot|_{t_2}$ stands for thresholding the response at a fraction $t_2$ of its maximum across all image coordinates (x, y). For $1/\sigma' = 0$, the computation of the COSFIRE filter becomes equivalent to the standard geometric mean. We refer the interested reader to [5] for more details. Figure 8 shows an illustration of the application of an inhibition-augmented filter that is selective for vertical line endings pointing upwards. Figure 8d shows the output of this filter, and the positions of the strongest local output are marked by crosses in the input image. In this example, this filter only responds strongly at the locations where the positive pattern is present.
Figure 9a shows a data set of line endings with different line widths and orientations. We applied the same configured inhibition-augmented filter to the stimuli in this data set, and the responses of this filter are rendered by a gray level shading of the features (Fig. 9b). The maximum response is reached for the feature that was used as a positive prototype in the configuration process, while the filter also reacts, with less than the maximum response, to line endings that differ slightly in scale and orientation. This example illustrates the selectivity and the generalization ability of the proposed filter. Moreover, in Fig. 10d-f we show the response images of the filter $S_{\text{big}}$, which we configured in Sect. 2.4, to the corresponding patterns in Fig. 10a-c. The configured inhibition-augmented filter correctly responds only to the pattern shown in Fig. 10a but not to the ones in Fig. 10b, c.
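A weighted geometric mean of blurred and shifted responses can be sketched as follows. The weighting $\omega_i = \exp(-\rho_i^2 / 2\sigma'^2)$ used here follows the formulation of the original COSFIRE method [5] and should be read as an assumption of this sketch; the function name and the convention that `sigma_prime=None` means equal weights are hypothetical.

```python
import numpy as np

def weighted_geometric_mean(s_list, rhos, sigma_prime=None, t2=0.0):
    """Combine blurred and shifted responses by a weighted geometric mean.

    s_list:      list of 2D arrays, one per tuple.
    rhos:        distance rho_i of each tuple from the filter center.
    sigma_prime: weighting parameter; None gives equal weights, i.e. the
                 standard geometric mean (the case 1/sigma' = 0).
    t2:          threshold as a fraction of the maximum of the result.
    """
    stack = np.stack([np.asarray(s, dtype=float) for s in s_list])
    rhos = np.asarray(rhos, dtype=float)
    if sigma_prime is None:
        weights = np.ones_like(rhos)
    else:
        weights = np.exp(-rhos ** 2 / (2.0 * sigma_prime ** 2))
    product = np.prod(stack ** weights[:, None, None], axis=0)
    mean = product ** (1.0 / weights.sum())
    peak = mean.max()
    if peak <= 0:
        return np.zeros_like(mean)
    return np.where(mean >= t2 * peak, mean, 0.0)

s1 = np.array([[4.0, 1.0]])
s2 = np.array([[9.0, 0.0]])
plain = weighted_geometric_mean([s1, s2], rhos=[0, 10])
weighted = weighted_geometric_mean([s1, s2], rhos=[0, 10], sigma_prime=10.0)
```

The multiplicative combination means that the output is zero wherever any single contour part is missing, which is what makes the filter respond only when all parts are present.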

Tolerance to geometric transformations
The proposed inhibition-augmented filter can achieve tolerance to scale, rotation and reflection by parameter manipulation similar to that proposed for the original COSFIRE filters [5] (Fig. 9c, d). We do not elaborate on these aspects here, and we refer the reader to [5] for a thorough explanation.

Applications
In the following, we demonstrate the effectiveness of the proposed inhibition-augmented filter in three practical applications: the detection of vascular bifurcations in retinal fundus images, the recognition of architectural and electrical symbols and the recognition of handwritten digits.

Detection of retinal vascular bifurcations
The retina contains cues to the health status of a person. For instance, its vascular geometrical structure can reflect the risk of some cardiovascular diseases such as hypertension [53] and atherosclerosis [14]. The identification of vascular bifurcations is one of the basic steps in such analysis. For a thorough review on retinal fundus image analysis, we refer to [1,40]. Figure 11 shows an example of a retinal fundus image and its segmentation into blood vessels and background, both of which are taken from the DRIVE data set [48]. It contains 109 blood vessel features (81 bifurcations marked by red circles and 28 crossovers marked by blue squares). A bifurcation-selective filter configured by the basic COSFIRE approach [5] also gives a response to crossovers and therefore cannot be used to exclusively detect bifurcations. The existing methods that are used to distinguish bifurcations from crossovers preprocess the binary retinal fundus images by morphological operators, such as thinning. Then, they typically apply template matching or connected component labeling, which do not work very well in complicated situations, e.g., two bifurcations that are close to each other can be detected as a crossover. An overview of these methods can be found in [2,11,52]. In the following, we illustrate how the inhibition-augmented filters that we propose can be configured to detect only vascular bifurcations in retinal fundus images.
First, we select a bifurcation prototype from a given retinal fundus image and use it as a positive example to configure a COSFIRE filter $P_{f_1}$ that is composed of excitatory vessel segments. For the configuration of this filter, we use three values of the distance ρ ($\rho \in \{0, 5, 10\}$), threshold values $t_1 = 0.2$ and $t_2 = 0.45$, and a bank of symmetric Gabor filters with eight orientations ($\theta \in \{\pi i/8 \mid i = 0, \ldots, 7\}$) and five wavelengths (λ ∈ {4(2…}). Figure 12b, e show an enlarged prototype and the corresponding filter structure, respectively. Then, we apply the configured filter $P_{f_1}$ to all 20 training retinal fundus images (with filenames from 21_manual1.gif to 40_manual1.gif) without tolerance to rotation, scale and reflection transformations. We consider the points that characterize crossover patterns and evoke sufficiently strong responses (more than a fraction ε of the maximum response to the positive pattern, here ε = 0.2). We then use these patterns as negative prototypes. Figure 12a, c show two of the negative prototypes, and the structures of the resulting COSFIRE filters are shown in Fig. 12d, f. We generate an inhibition-augmented filter $S_{f_1}$ by the method proposed in Sect. 2.4. Figure 12d-i shows how two groups of inhibitory line segments are automatically selected by the proposed configuration procedure.
We repeat the above procedure by applying the filter $P_{f_1}$ in reflection- and rotation-tolerant mode in order to find more negative patterns. Finally, the filter $S_{f_1}$ contains 19 groups of inhibitory tuples.
The values of the inhibition factor η and the threshold $t_3$ are determined as follows. We apply the filter $S_{f_1}$ to the 20 training retinal fundus images and perform a grid search to estimate the best pair of parameters η and $t_3$. For η, we consider the range of values [0, 5], and for $t_3$ we consider the range [0, 1], both in steps of 0.01. For each combination of these two parameters, we calculate the precision P and recall R. Among the combinations for which the precision P is at least 90 %, the corresponding harmonic mean (2PR/(P + R)) reaches a maximum at an inhibition factor η = 2 and threshold $t_3 = 0.29$. Here, the filter $S_{f_1}$ detects 30 bifurcations and achieves 100 % precision. For the remaining bifurcations that are not detected by $S_{f_1}$, we perform the following steps. We randomly select one of the undetected bifurcations and use it as a new positive prototype. Then, we use the same procedure as described above to find the inhibitory parts of the new filter $S_{f_2}$ as well as the corresponding inhibition factor η and threshold value $t_3$. The prototype pattern $f_2$ is shown in Fig. 13. By applying the filters $S_{f_1}$ and $S_{f_2}$ ($\eta(S_{f_2}) = 1.80$, $t_3(S_{f_2}) = 0.37$) together, we correctly detect 42 bifurcations and no crossovers. We continue increasing the number of filters by using vascular features that are not detected by the previously configured filters. For this given retinal fundus image, we achieve 95 % recall and 100 % precision with only four filters (Fig. 13). Table 1 reports the values of the parameters η and $t_3$ that were determined with the grid search method described above.
In order to evaluate the performance of the proposed approach, we apply the four inhibition-augmented filters to the 20 test retinal fundus images in the DRIVE data set. We perform two experiments with the four filters, the first one using the fine-tuned inhibition factors η and the other one with η = 0. We change the value of the threshold parameter $t_3(S_{f_i})$ to compute the precision P and recall R. For each filter, we alter the threshold value $t_3(S_{f_i})$ by the same offset value (ranging between −0.2 and 0.2 in steps of 0.01), which results in the P-R plots shown in Fig. 14. For the same value of recall, the precision of the inhibition-augmented method is substantially higher than that of the method without inhibition.
Fig. 14 Precision-recall plots of the inhibition-augmented method and original COSFIRE method, indicated by the dashed and solid line, respectively
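The grid search over the inhibition factor η and the threshold $t_3$ can be sketched as follows. This is a minimal sketch: `evaluate` is a hypothetical callback that applies a filter with the given parameters to the training images and returns counts of true positives, false positives and false negatives; the toy counts below are invented solely to exercise the code and are not results from the paper.

```python
def precision_recall(tp, fp, fn):
    """Precision and recall from detection counts (0.0 when undefined)."""
    precision = tp / (tp + fp) if tp + fp > 0 else 0.0
    recall = tp / (tp + fn) if tp + fn > 0 else 0.0
    return precision, recall

def harmonic_mean(p, r):
    return 2 * p * r / (p + r) if p + r > 0 else 0.0

def grid_search(evaluate, etas, t3s, min_precision=0.9):
    """Return (score, eta, t3) with the highest harmonic mean subject to a precision floor."""
    best = None
    for eta in etas:
        for t3 in t3s:
            p, r = precision_recall(*evaluate(eta, t3))
            if p < min_precision:
                continue                       # discard settings below the precision floor
            score = harmonic_mean(p, r)
            if best is None or score > best[0]:
                best = (score, eta, t3)
    return best

# Toy stand-in for applying a filter to training images (made-up counts).
toy_counts = {(0, 0.2): (40, 20, 41), (2, 0.2): (35, 5, 46), (2, 0.3): (30, 0, 51)}

def toy_evaluate(eta, t3):
    return toy_counts.get((eta, t3), (0, 0, 81))

best = grid_search(toy_evaluate, etas=[0, 2], t3s=[0.2, 0.3])
```

For the ranges described above, `etas` would contain 501 values and `t3s` 101 values, so the callback would be invoked for every one of the resulting combinations.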

Recognition of architectural and electrical symbols
Recognition of hand-drawn or scanned architectural and electrical symbols is an important application for the automatic conversion to a digital representation, which can then be stored efficiently or processed by auto CAD systems [43,51,55,58]. In the following, we illustrate how the inhibition-augmented filter that we propose is effective for such an application. We evaluate the proposed approach on the Graphics Recognition Contest (GREC'11) data set [54]. The GREC'11 data set contains 150 different symbol classes, in which the images are of size 256 × 256 pixels. This data set consists of three different sets of images, namely SetA, SetB and SetC. SetA contains 2500 images from 50 symbol classes, SetB comprises 5000 images from 100 classes, and SetC consists of 7500 images from 150 classes. The three data sets contain examples with different scale, rotation and various levels of noise degradation.
In the following, we explain how the proposed inhibition-augmented filters are configured to be exclusively selective for specific symbol classes. Figure 15 shows two examples of symbol images from the GREC'11 data set. All contour parts of the symbol in Fig. 15a are contained in the symbol in Fig. 15b.
For configuration, we perform the following steps. First, we consider a model symbol, such as the one in Fig. 15a, as a positive prototype pattern to configure a COSFIRE filter without inhibition. Figure 16a shows the structure of the resulting filter. Then, we apply the configured filter in rotation- and scale-tolerant mode to all the other 149 model images. We threshold the responses at a given fraction ε (ε = 0.3) of the maximal filter response to the positive pattern used for configuration. The other symbol images that evoke strong responses of the filter are considered as negative prototype patterns. For instance, the symbol shown in Fig. 15b is one negative prototype for the pattern in Fig. 15a. The COSFIRE filter structure that corresponds to the pattern in Fig. 15b is shown in Fig. 16b. Next, we compare the structures shown in Fig. 16a, b to identify contour parts to be used for inhibition. In Fig. 16c, we show the structure of the resulting inhibition-augmented filter, in which red and blue ellipses and blobs indicate Gabor responses that provide, respectively, positive and negative inputs to the filter. In this implementation, we consider a bank of Gabor filters with eight orientations ($\theta \in \{\pi i/8 \mid i = 0, \ldots, 7\}$) and two wavelengths ($\lambda \in \{10, 18\}$). We use the empirically determined threshold values $t_1 = 0.2$ and $t_2 = 0.5$. For the blurring function, we use a fixed standard deviation σ = 4. In order to make sure that we extract information from all the line segments of a given prototype, we first use a large set of ρ values, and then, we remove redundant tuples from the filters as follows. We compute the pairwise dissimilarity proposed in Sect. 2.3 with parameter ζ equal to three times the maximum standard deviation of any pair of tuples and delete one tuple from each pair whose dissimilarity value is 0. In this way, the corresponding blurring maps of tuples do not overlap each other.
In order to determine the optimal value of the inhibition factor η for such an inhibition-augmented filter, we perform the following steps.First, we apply the filter to all 150 model symbol images with different values of inhibition factor η in a range between 0 and 10 in interval of 0.1.Then, for each inhibition factor, we calculate the harmonic mean of the precision2 and recall3 of this filter.Figure 17 shows the harmonic mean of the concerned filter with different values of inhibition factor.The optimal inhibition factor (η = 7.1) is the minimum value of η that achieves the highest harmonic mean.In Fig. 17, we indicate this point by a star marker.
We perform the same procedure on the remaining 149 symbols. We apply the resulting 150 inhibition-augmented filters to the 150 symbol images. Figure 18a, b shows matrices (of size 150 × 150) obtained using the COSFIRE filters without inhibition (η = 0) and the inhibition-augmented COSFIRE filters, respectively. The value of the element (i, j) in each matrix is the maximum response of the filter configured by symbol i to symbol image j. For each filter, we compute the precision and recall. The average precision entations. We do not preprocess the images that have a mean value less than 90 % of the maximum since most of them do not lose part of their contour segments.
We apply the 150 inhibition-augmented filters to each preprocessed image by using the proposed method in rotation- and scaling-tolerant mode with parameters ψ = {πi/32 | i = 0, 1, ..., 31} and v = {0.5, 0.6, ..., 2.5}. A given image is assigned to the class of the positive prototype symbol that was used to configure the inhibition-augmented filter achieving the maximum response. In Table 2, we compare the results that we achieve with the existing methods on the three data sets. The proposed approach achieves the best results on all data sets.
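The classification rule above amounts to taking the arg-max over the filter responses. The following sketch illustrates it together with the two parameter grids from the text. The names `rotations`, `scales` and `classify`, as well as the symbol labels and response values, are hypothetical. How the rotated and scaled filter versions are combined into one response per filter is not shown here.

```python
import math

# Parameter grids as stated in the text: 32 rotations and 21 scale factors.
rotations = [math.pi * i / 32 for i in range(32)]
scales = [round(0.5 + 0.1 * k, 1) for k in range(21)]  # 0.5, 0.6, ..., 2.5


def classify(max_responses, prototype_labels):
    """Return the label of the filter with the maximum response.

    max_responses[i] is the response of filter i to the test image and
    prototype_labels[i] is the class of the positive prototype that
    configured filter i. Ties are resolved in favour of the first filter.
    """
    if len(max_responses) != len(prototype_labels):
        raise ValueError("one label per filter is required")
    best = max(range(len(max_responses)), key=lambda i: max_responses[i])
    return prototype_labels[best]


# Hypothetical responses of three filters to one test image.
label = classify([0.1, 0.7, 0.3], ["door", "window", "diode"])
print(label)
```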

Recognition of handwritten digits
Handwritten digit recognition is an important application in optical character recognition (OCR) systems. Various benchmark data sets and approaches have been proposed, a review of which is given in [33].
In this application, we use the MNIST data set [31] to evaluate the performance of our approach. The data set contains 60,000 training and 10,000 test digit images in gray scale of size 28 × 28 pixels. For configuration, we randomly select 20 training images from each digit class. We select a random location as the point of interest to configure a COSFIRE filter in each image from the same digit class. The local pattern around such a point should provide at least four tuples to the resulting filter; otherwise, we select another random location. Then, we apply this filter to all the other 180 training images from different digit classes in order to identify negative prototypes. We use the method described in Sect. 2 to configure an inhibition-augmented filter. We repeat the above process for all the 200 training digit images and configure 200 inhibition-augmented filters. In this application, we use a bank of antisymmetric Gabor filters with 16

For the application process, we apply these 200 inhibition-augmented filters to the 60,000 training images by using the proposed method. We take the maximum response of each filter to these digit images and generate a matrix of size 60,000 × 200.
Next, we apply a wrapper method for feature selection using support vector machines (SVMs) with a linear kernel. We iteratively add the result of one filter, the one that best improves the sevenfold cross-validation accuracy, and stop the process when no further improvement is achieved. This process results in 108 filters when the 200 inhibition-augmented COSFIRE filters are applied (η = 1) and 111 filters when the 200 original COSFIRE filters are applied (η = 0). Then, we use the inhibition-augmented and non-inhibition-augmented training vectors with the selected features to train two multi-class SVMs.
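The greedy wrapper procedure can be sketched generically as follows. This is an illustration rather than the setup used in the experiments: the scoring function is passed in as a callable, and the toy score below replaces the sevenfold cross-validation accuracy of a linear SVM. The names `forward_select` and `toy_score` and the gain values are assumptions.

```python
def forward_select(n_features, score):
    """Greedy forward selection.

    score(subset) returns a quality measure for a list of feature
    indices (in the text: cross-validation accuracy of a linear SVM).
    In each round the feature that most improves the score is added.
    The loop stops when no remaining feature improves the score.
    """
    selected = []
    best_score = float("-inf")
    remaining = set(range(n_features))
    while remaining:
        candidate, candidate_score = None, best_score
        for feature in sorted(remaining):
            trial_score = score(selected + [feature])
            if trial_score > candidate_score:
                candidate, candidate_score = feature, trial_score
        if candidate is None:  # no feature improves the score
            break
        selected.append(candidate)
        remaining.remove(candidate)
        best_score = candidate_score
    return selected, best_score


# Hypothetical per-feature accuracy gains in percentage points.
GAINS = [0, 30, 50, 0, 10]


def toy_score(subset):
    """Toy accuracy in percent: a base rate plus additive gains."""
    return 5 + sum(GAINS[f] for f in subset)


selected, accuracy = forward_select(len(GAINS), toy_score)
print(selected, accuracy)
```

With these toy gains, features 2, 1 and 4 are added in that order, and features 0 and 3 are never selected because they do not improve the score.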
The plots in Fig. 19 show the recognition rates as a function of the number of selected filters. The method with inhibition achieves a recognition rate of 98.77 % with 108 filters, while the method without inhibition achieves 98.66 % with 111 filters. The inhibition-augmented training vectors of 108 dimensions have 753,019 (11.62 %) zero elements, which is substantially more than the 277,641 (4.17 %) zero elements in the non-inhibition-augmented vectors of 111 dimensions. In this application, the proposed inhibition-augmented COSFIRE filters achieve a better recognition rate with fewer filters and with a much sparser representation.
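The sparsity percentages quoted above can be checked with a short calculation, assuming that each of the 60,000 training images contributes one vector with one element per selected filter. The function name `zero_percentage` is chosen for this example only.

```python
def zero_percentage(n_zero, n_vectors, n_dims):
    """Percentage of zero elements in n_vectors vectors of n_dims each."""
    return 100.0 * n_zero / (n_vectors * n_dims)


# 60,000 training vectors; 108 selected filters with inhibition,
# 111 selected filters without inhibition (numbers from the text).
with_inhibition = zero_percentage(753_019, 60_000, 108)
without_inhibition = zero_percentage(277_641, 60_000, 111)

print(f"with inhibition:    {with_inhibition:.2f} %")
print(f"without inhibition: {without_inhibition:.2f} %")
```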

Discussion
We proposed an inhibition-augmented COSFIRE approach which uses a positive prototype and a set of negative prototypes to configure a filter. The negative prototypes can be either manually specified by a user or automatically discovered by the system. For instance, the negative prototype shown in Fig. 3, a complete line, is selected by the user. For more complex situations, such as the recognition of symbols and handwritten digits, it is more practical to use an automated process. To discover negative prototypes, we first apply the COSFIRE filter that is configured by a positive prototype pattern to all the other pattern images. The ones that evoke strong responses from the filter are taken as negative prototype patterns.
The response of an inhibition-augmented filter is defined as the difference between the excitatory input and a fraction of the maximum of the inhibitory inputs. The strength of the inhibition can be adjusted by changing the value of the parameter η. In the detection of vascular bifurcations and the symbol recognition applications, we determine an optimal value of η for each filter as the one that yields the maximum harmonic mean on the training images. For the handwritten digit recognition application, we set the same η value for all filters so that none of them gives a response to any of the negative patterns.
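The response rule described above can be written as a small function. This is a minimal sketch: the function name and the input values are hypothetical, and clamping negative differences to zero is an assumption of this example rather than something stated in this section.

```python
def inhibition_augmented_response(excitatory, inhibitory, eta):
    """Excitatory input minus eta times the strongest inhibitory input.

    excitatory: response of the excitatory part of the filter.
    inhibitory: responses of the inhibitory parts, one per negative
                prototype (an empty list means no inhibition).
    eta:        inhibition factor; eta = 0 gives a filter without inhibition.
    Negative results are clamped to zero (assumption of this sketch).
    """
    strongest = max(inhibitory) if inhibitory else 0.0
    return max(0.0, excitatory - eta * strongest)


# Hypothetical inputs: one excitatory and two inhibitory responses.
no_inhibition = inhibition_augmented_response(0.8, [0.1, 0.4], eta=0.0)
moderate = inhibition_augmented_response(0.8, [0.1, 0.4], eta=1.0)
strong = inhibition_augmented_response(0.8, [0.1, 0.4], eta=3.0)
print(no_inhibition, moderate, strong)
```

Using the maximum over the inhibitory inputs means that a single well-matching negative prototype is enough to suppress the response, regardless of how many negative prototypes were used for configuration.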
In neurophysiology, there is an ongoing debate about what kind of neural coding the brain uses to encode the representation of objects. The two extremes in the debate are the grandmother cell theory (i.e., only one specific cell fires for a given pattern) and population coding (i.e., a number of neurons fire for a given pattern with different rates). In the recognition of architectural and electrical symbols, the proposed inhibition-augmented COSFIRE filters work in a way that is similar to the grandmother cell theory, while in the recognition of handwritten digits, they work in a way that is similar to population coding. Both applications demonstrate that the inhibition mechanism facilitates sparseness in the representation of information.
The computational cost of the configuration of a COSFIRE filter with inhibition depends on the number of negative prototype patterns and the bank of Gabor filters it uses. An inhibition-augmented filter is configured in less than one second for one positive and one negative prototype pattern of size 512 × 512 pixels and a bank of Gabor filters of eight orientations and five wavelengths. The computational cost of the application of an inhibition-augmented filter is proportional to the cost of computing the excitatory and inhibitory responses and their blurring and shifting operations. For the detection of vascular bifurcations, a retinal fundus image of size 564 × 584 pixels is processed in less than 20 s by four rotation- and reflection-tolerant inhibition-augmented filters. For the recognition of architectural and electrical symbols, a symbol image of size 256 × 256 pixels is processed in less than 30 s by 150 inhibition-augmented filters without any rotation or scaling tolerance. For the third application, a handwritten digit image of size 28 × 28 pixels is described by 200 inhibition-augmented COSFIRE filters without any rotation or scaling tolerance in less than 5 s.
We used a sequential implementation in MATLAB for all experiments, which were run on the same standard 3 GHz processor.
There are various possible directions for future research. One direction is to apply the proposed inhibition-augmented filters in other object localization and recognition tasks, as well as in image classification. Another direction is to investigate a learning algorithm to determine the output function by assigning different weights to inhibitory and excitatory contour parts.

Conclusions
The proposed inhibition-augmented filters are versatile trainable keypoint and object detectors as they can be trained with any given positive and negative prototype patterns. We demonstrated the effectiveness of the method with three applications: detection of vascular bifurcations (i.e., without crossovers) in retinal fundus images (DRIVE data set), recognition of architectural and electrical symbols (GREC'11 data set) and the recognition of handwritten digits (MNIST data set). The inclusion of the inhibition mechanism improves the discrimination properties and the performance of COSFIRE filters.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
Ms. Jiapan Guo received the BSc degree (2009) from China Agricultural University, Beijing, China, and the MSc degree (2012) in Signal Processing from Xidian University, Xi'an, China. She is currently a PhD candidate at the Johann Bernoulli Institute for Mathematics and Computer Science, University of Groningen, the Netherlands. Her current research interest is brain-inspired computer vision, including computational visual models of feature and object recognition.
Mr. Chenyu Shi received the BSc degree (2009) with honor of Technique Specialist Student in Electronic Information Engineering from China Agricultural University (CAU). In 2010, he was selected to join the China-Germany Public Private Partnership project in Bonn University and graduated in Electrification and Automation at CAU in 2011. Currently, he is a PhD candidate at the Johann Bernoulli Institute for Mathematics and Computer Science, University of Groningen, the Netherlands. His current research interests are in the field of brain-inspired machine vision, which includes computational models of the visual system with applications to contour detection, feature and shape recognition.
Dr. George Azzopardi received the BSc degree with honors (first class) in Computer Science from Goldsmiths University of London in 2006, and two years later, he received the MSc with distinction in Advanced Methods of Computer Science from Queen Mary University of London. In 2013, he received a PhD cum laude from the University of Groningen in visual pattern recognition. Currently, he is Academic Resident (Lecturer) in the Intelligent Computer Systems department of the University of Malta. His current research is in the field of brain-inspired trainable pattern recognition with applications to contour detection, segmentation, feature and shape recognition, as well as predictive modelling of time series data. He was general co-chair of the 16th international CAIP (Computer Analysis of Images and Patterns) conference that was held in Malta in 2015.
Prof. Nicolai Petkov received the Dr.sc.techn. degree in Computer Engineering (Informationstechnik) from Dresden University of Technology, Dresden, Germany. Since 1991, he has been professor of computer science and head of the Intelligent Systems group. He is the author of two monographs and coauthor of another book on parallel computing, holds four patents and has authored and co-authored over 100 scientific papers. His current research is in pattern recognition, machine learning, data analytics and brain-inspired computing, with applications in health care, finance, surveillance, manufacturing, robotics, animal breeding, etc. He is a member of the editorial boards of several journals.

Fig. 1 Examples of pairs of patterns that COSFIRE filters of the type proposed in [5] are not able to distinguish. a Two traffic signs which have opposite messages: permission and prohibition of turning right. b Two Chinese characters that are translated into English as "big" and "dog." c Two music notes: quarter and eighth. d Two electrical symbols: normal and light-emitting diodes. e Bifurcations and crossovers in retinal fundus images. A COSFIRE filter that is trained to detect the upper pattern in a pair of two will give a strong response to the lower pattern too

methods require many training examples to configure models of objects of interest. When such detectors and descriptors are trained, only positive examples are considered without the inclusion of inhibition mechanisms. The resulting detectors and descriptors can detect objects that are similar

Fig. 3 a Synthetic input image (of size 300 × 300 pixels). The solid circle indicates a positive prototype of interest (a line ending) while the dashed circle indicates a negative prototype of interest (a continuous line segment). The images in (b, c) show enlargements of the selected positive and negative prototype patterns, respectively. The gray crosses in (b, c) indicate the center positions of interest, and the ellipses illustrate the orientation and location of the contour parts in the neighborhoods. The solid ellipses represent line segments that are present in both prototypes, while the dashed ellipse represents a line segment which is only present in the negative prototype

Fig. 4 a Intensity map (of size 21 × 21 pixels) of a symmetric Gabor function with wavelength λ = 6 and orientation θ = 0. Light and dark regions correspond to positive and negative values of the Gabor function, respectively. b, c The thresholded (at t_1 = 0.2) Gabor response images (of size 30 × 30 pixels) to Fig. 3b, c, respectively

Fig. 6 Structure of an inhibition-augmented filter. The four ellipses indicate the responses of four Gabor filters with the parameter values specified by the set S_f. The two red ellipses at the bottom represent the excitatory input to this inhibition-augmented filter, while the two blue ellipses at the top represent the inhibitory input (color figure online)

Fig. 8 Illustration of the intermediate computations performed in an inhibition-augmented filter that is selective to vertical line endings pointing upwards. a We first convolve the input image (of size 300 × 300 pixels) with a Gabor filter which has a wavelength λ = 6 and an orientation θ = 0. The three enframed inlay images illustrate (top two) the enlarged positive and negative prototype patterns and (bottom) the structure of the resulting inhibition-augmented filter. The red ellipses represent the preferred wavelengths and orientations of the Gabor filters that provide excitatory input to the concerned filter S_f, while the blue ellipses represent the channels of the Gabor filters that provide inhibitory

Fig. 9 a A systematically designed data set of line endings that vary in orientation (in intervals of π/8) as well as in scale (the line width ranges from 1 pixel to 5 pixels). The enframed feature is the same one shown in Fig. 3b, which is used as a positive prototype for configuring an

Fig. 11 Example of a retinal fundus image from the DRIVE data set. a Original image (of size 564 × 584 pixels) with filename 21_training.tif. b Binary segmentation of vessels and background (also from DRIVE). The red circles surround vessel bifurcations, blue squares surround crossovers, and this labeling is part of the current work (color figure online)

Fig. 12 Examples of positive and negative prototype patterns. b A positive prototype pattern, which is the feature of interest. a, c Negative prototype patterns. d-f The structure of the filters that are selective for the features in (a-c). g, h Two inhibition-augmented filters configured by one positive and one negative prototype. i An inhibition-augmented filter configured by one positive and two negative prototypes. The tuples that are indicated by the red ellipses come from the positive pattern in (b), and the tuples that are indicated by the blue and green ellipses come from the negative patterns in (a, c), respectively (color figure online)

Fig. 13 A set of four bifurcations (f_1, ..., f_4) taken from the DRIVE data set. These four bifurcations are extracted from the binary retinal fundus image shown in Fig. 11b with filename 21_manual1.gif

Fig. 15 Example of a a symbol that is contained within b another symbol

Fig. 19 Recognition rates on the MNIST data set as a function of the number of filters for the methods with and without inhibition

Table 1
Optimal values of η