Vision transformer
The Transformer, a self-attention architecture, has become the de facto standard for NLP tasks, and the Vision Transformer (ViT) built on it has achieved good results when trained on large-scale datasets29. The overall framework of ViT is shown in Fig. 1. ViT reshapes the 2D image \(x\in {R}^{H\times W\times C}\) into a sequence of flattened 2D patches \({x}_{p}\in {R}^{N\times ({P}^{2}\cdot C)}\), where (H, W) is the original image resolution, C is the number of channels, (P, P) is the resolution of each image patch, and \(N=HW/{P}^{2}\) is the number of patches (i.e., the effective input sequence length). The Transformer uses a constant vector size D in all its layers, so ViT flattens each patch and maps it to D dimensions using a trainable linear projection (Eq. (1)), whose output is called the patch embeddings. Position embeddings are added to the patch embeddings to retain location information, and the result is used as input to the Transformer encoder. For classification, an extra learnable “class token” is added to the sequence, following standard practice.
Fig. 1 The overall framework of ViT
The Transformer encoder consists of alternating layers of multi-head self-attention (MSA, Eq. (2), built on the attention of Eq. (5)) and MLP blocks (Eq. (3)). Layer norm (LN) is applied before each block and a residual connection after each block. The MLP contains two layers with a GELU nonlinearity.
$${Z}_{0}=\left[{x}_{class};{x}_{p}^{1}E;{x}_{p}^{2}E;\dots ;{x}_{p}^{N}E\right]+{E}_{pos},\quad E\in {R}^{\left({P}^{2}\cdot C\right)\times D},\quad {E}_{pos}\in {R}^{\left(N+1\right)\times D}$$
(1)
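As an illustration of Eq. (1), the following NumPy sketch splits an image into patches, projects them, prepends the class token and adds the position embeddings. The function name `patch_embed`, the toy sizes and the random weights are assumptions made for this example only; they are not taken from the paper or from any library.

```python
import numpy as np

def patch_embed(x, P, E, x_class, E_pos):
    """Build Z_0 of Eq. (1) from an image x of shape (H, W, C)."""
    H, W, C = x.shape
    assert H % P == 0 and W % P == 0, "P must divide H and W"
    # Cut the image into an (H/P, W/P) grid of P x P x C patches.
    grid = x.reshape(H // P, P, W // P, P, C).transpose(0, 2, 1, 3, 4)
    x_p = grid.reshape(-1, P * P * C)        # (N, P^2 * C), N = HW / P^2
    tokens = x_p @ E                         # patch embeddings, (N, D)
    z0 = np.concatenate([x_class[None, :], tokens], axis=0)  # (N + 1, D)
    return z0 + E_pos                        # add position embeddings

# Toy sizes (hypothetical): an 8 x 8 RGB image, 4 x 4 patches, D = 16.
rng = np.random.default_rng(0)
H, W, C, P, D = 8, 8, 3, 4, 16
N = H * W // P**2
x = rng.normal(size=(H, W, C))
E = rng.normal(size=(P * P * C, D))
x_class = rng.normal(size=D)
E_pos = rng.normal(size=(N + 1, D))

z0 = patch_embed(x, P, E, x_class, E_pos)
print(z0.shape)  # (5, 16): N = 4 patches plus one class token
```

In a trained model E, the class token and the position embeddings are learned; here they are random only to make the shapes concrete.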
$${Z}^{\prime}_{\ell}=MSA\left(LN\left({Z}_{\ell-1}\right)\right)+{Z}_{\ell-1},\quad \ell=1\dots L$$
(2)
$${Z}_{\ell}=MLP\left(LN\left({Z}^{\prime}_{\ell}\right)\right)+{Z}^{\prime}_{\ell},\quad \ell=1\dots L$$
(3)
$$\:y=LN\left({Z}_{L}^{0}\right)$$
(4)
$$\:A=Attention(Q,K,V)=softmax\left(\frac{Q{K}^{T}}{\sqrt{{d}_{k}}}\right)V$$
(5)
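The encoder equations can be made concrete with a small NumPy sketch of Eq. (5) and of one pre-norm block following Eqs. (2) and (3). This is a simplified illustration under stated assumptions: a single attention head instead of full MSA, layer norm without learnable scale and shift, no biases, the tanh approximation of GELU, and random weights. All function and variable names are hypothetical.

```python
import numpy as np

def softmax(s):
    s = s - s.max(axis=-1, keepdims=True)    # subtract row max for stability
    e = np.exp(s)
    return e / e.sum(axis=-1, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention of Eq. (5)."""
    d_k = Q.shape[-1]
    return softmax(Q @ K.T / np.sqrt(d_k)) @ V

def layer_norm(z, eps=1e-6):
    mu = z.mean(axis=-1, keepdims=True)
    var = z.var(axis=-1, keepdims=True)
    return (z - mu) / np.sqrt(var + eps)

def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def encoder_block(z, Wq, Wk, Wv, W1, W2):
    h = layer_norm(z)
    z_prime = attention(h @ Wq, h @ Wk, h @ Wv) + z        # Eq. (2), one head
    return gelu(layer_norm(z_prime) @ W1) @ W2 + z_prime   # Eq. (3), 2-layer MLP

rng = np.random.default_rng(0)
n_tokens, D = 5, 8
z = rng.normal(size=(n_tokens, D))
Wq, Wk, Wv = (rng.normal(size=(D, D)) for _ in range(3))
W1, W2 = rng.normal(size=(D, 4 * D)), rng.normal(size=(4 * D, D))
z_out = encoder_block(z, Wq, Wk, Wv, W1, W2)
print(z_out.shape)  # (5, 8): the block preserves the sequence shape
```

Because both sub-layers are residual, the output keeps the input shape, which is what allows L such blocks to be stacked.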
Concept-cognitive computing system
Concept-cognitive learning (CCL), first proposed by Mi et al.30,31, is an interdisciplinary technique that combines formal concept analysis, machine learning, granular computing and dynamic learning, and it addresses concept learning and knowledge processing in a dynamic environment. Recent works in CCL include concurrent classification models (e.g., C3LM32) and dynamic classification models (e.g., C3S33 and gC3S34). The concept-cognitive computing (CCC) system is a dynamic concept learning framework. It focuses on how to learn new concepts from different types of data in dynamic environments, and it is usually described through three aspects of knowledge: storage, learning, and updating. It offers good interpretability as well as good classification performance. For ease of understanding, a set of formal symbols and definitions is briefly described below. For details of the derivations, please refer to the work of Mi et al.33.
Definition 1
Let \((G,M,\stackrel{\sim}{I})\) be a fuzzy formal context. Then for \(X\subseteq G\) and \(\stackrel{\sim}{B}\in {L}^{M}\), the operator \({(\cdot )}^{*}\) can be defined as follows:$${X}^{*}\left(a\right)=\underset{x\in X}{\bigwedge }\stackrel{\sim}{I}(x,a),\ a\in M,\quad {\stackrel{\sim}{B}}^{*}=\left\{x\in G\mid \forall a\in M,\ \stackrel{\sim}{B}\left(a\right)\le \stackrel{\sim}{I}\left(x,a\right)\right\}.$$
Here, we call a pair \((X,\stackrel{\sim}{B})\) a fuzzy concept if \({X}^{*}=\stackrel{\sim}{B}\) and \({\stackrel{\sim}{B}}^{*}=X\). X and \(\stackrel{\sim}{B}\) are known as the extent and intent of the concept \((X,\stackrel{\sim}{B})\), respectively. G is an object set, M is an attribute set, and \(\stackrel{\sim}{I}\) is a fuzzy relation between G and M, i.e., \(\stackrel{\sim}{I}:G\times M\to \left[0,1\right]\). For a universe of discourse M, we denote the set of all fuzzy sets on M by \({L}^{M}\).
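A small Python sketch may help make the two operators of Definition 1 concrete. It takes the infimum as the minimum over a finite object set, and it assumes the convention that the infimum over an empty set is 1. The toy context, object names and function names are hypothetical.

```python
def extent_to_intent(X, I, M):
    """X*: for each attribute a, the minimum membership I(x, a) over x in X."""
    # Assumption: the infimum over an empty object set is taken to be 1.0.
    return {a: min((I[x][a] for x in X), default=1.0) for a in M}

def intent_to_extent(B, I, G, M):
    """B*: all objects whose memberships dominate B on every attribute."""
    return {x for x in G if all(B[a] <= I[x][a] for a in M)}

# Toy fuzzy formal context (G, M, I) with I: G x M -> [0, 1].
I = {
    "x1": {"a": 0.8, "b": 0.4},
    "x2": {"a": 0.6, "b": 0.9},
    "x3": {"a": 0.2, "b": 0.5},
}
G, M = set(I), ["a", "b"]

X = {"x1", "x2"}
B = extent_to_intent(X, I, M)           # {'a': 0.6, 'b': 0.4}
X_back = intent_to_extent(B, I, G, M)   # {'x1', 'x2'}
print(B, X_back == X)
```

Since X* = B and B* = X in this toy context, the pair (X, B) is a fuzzy concept in the sense of Definition 1.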
Definition 2
Let a fuzzy-crisp formal decision context be given. Then for any \(x\in G\), the two pairs \(({\stackrel{\sim}{\mathcal{H}}}^{c}{\stackrel{\sim}{\mathcal{F}}}^{c}(x),{\stackrel{\sim}{\mathcal{F}}}^{c}(x))\) and \(({\mathcal{H}}^{d}{\mathcal{F}}^{d}(x),{\mathcal{F}}^{d}(x))\) are called a fuzzy conditional granular concept (fuzzy concept) and a classical decision granular concept (classical concept), respectively. Furthermore, the sets of all fuzzy concepts and classical concepts are denoted as follows:$${\mathcal{G}}_{{\stackrel{\sim}{\mathcal{F}}}^{c}{\stackrel{\sim}{\mathcal{H}}}^{c}}=\left\{\left({\stackrel{\sim}{\mathcal{H}}}^{c}{\stackrel{\sim}{\mathcal{F}}}^{c}(x),{\stackrel{\sim}{\mathcal{F}}}^{c}(x)\right)\mid x\in G\right\},\quad {\mathcal{G}}_{{\mathcal{F}}^{d}{\mathcal{H}}^{d}}=\left\{\left({\mathcal{H}}^{d}{\mathcal{F}}^{d}(x),{\mathcal{F}}^{d}(x)\right)\mid x\in G\right\},$$ where \({\mathcal{G}}_{{\stackrel{\sim}{\mathcal{F}}}^{c}{\stackrel{\sim}{\mathcal{H}}}^{c}}\) and \({\mathcal{G}}_{{\mathcal{F}}^{d}{\mathcal{H}}^{d}}\) are called the fuzzy conditional concept space and the classical decision concept space, respectively. For convenience, a fuzzy conditional concept space is also called a concept space in what follows when no confusion exists.
Definition 3
Let \({\mathcal{G}}_{{\stackrel{\sim}{\mathcal{F}}}^{c}{\stackrel{\sim}{\mathcal{H}}}^{c}}\) be a concept space. For any fuzzy concept and its sub-concept in this space, with extents \({X}_{i}\), \({X}_{j}\) and intents \({\stackrel{\sim}{B}}_{i}\), \({\stackrel{\sim}{B}}_{j}\), the object-oriented fuzzy concept similarity (object-oriented FCS) is defined as$${\theta }^{o}={C}^{O}\left({X}_{j},{X}_{i}\right)=\frac{|{X}_{j}\cap {X}_{i}|}{|{X}_{j}\cup {X}_{i}|},$$ and, similarly, the attribute-oriented fuzzy concept similarity (attribute-oriented FCS) is defined as$${\theta }^{a}={C}^{A}\left({\stackrel{\sim}{B}}_{j},{\stackrel{\sim}{B}}_{i}\right)={\Vert {\stackrel{\sim}{B}}_{j}-{\stackrel{\sim}{B}}_{i}\Vert }_{2}^{2}.$$
Let \(\:{\mathcal{G}}_{{\stackrel{\sim}{\mathcal{F}}}^{c}{\stackrel{\sim}{\mathcal{H}}}^{c}}^{{S}_{{\uplambda\:}}}\) be a subconcept space of \(\:{\mathcal{G}}_{{\stackrel{\sim}{\mathcal{F}}}^{c}{\stackrel{\sim}{\mathcal{H}}}^{c}}\) and λ be an object-oriented FCS threshold. When the specific conditions are met, \(\:{\mathcal{G}}_{{\stackrel{\sim}{\mathcal{F}}}^{c}{\stackrel{\sim}{\mathcal{H}}}^{c}}^{{S}_{{\uplambda\:}}}\) is known as an object-oriented conceptual cluster of the concept space.
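The two measures of Definition 3 are simple to compute. The sketch below represents extents as Python sets and intents as equal-length sequences of membership degrees. The function names are hypothetical, and returning 0 for two empty extents is an assumption made here to avoid division by zero.

```python
def object_fcs(X_j, X_i):
    """Object-oriented FCS: |X_j ∩ X_i| / |X_j ∪ X_i| (Jaccard index)."""
    union = X_j | X_i
    # Assumption: two empty extents are given similarity 0.
    return len(X_j & X_i) / len(union) if union else 0.0

def attribute_fcs(B_j, B_i):
    """Attribute-oriented FCS: squared Euclidean norm of B_j - B_i."""
    return sum((bj - bi) ** 2 for bj, bi in zip(B_j, B_i))

theta_o = object_fcs({1, 2}, {1, 2, 3})            # 2 shared objects out of 3
theta_a = attribute_fcs([0.5, 0.5], [0.5, 1.0])    # (0.5 - 1.0)^2 = 0.25
print(theta_o, theta_a)
```

Note the different orientations: a larger object-oriented value means more similar extents, whereas the attribute-oriented value is a squared distance, so a smaller value means closer intents.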
Definition 4
Given an object-oriented FCS threshold λ, let \({\mathcal{G}}_{{\stackrel{\sim}{\mathcal{F}}}^{c}{\stackrel{\sim}{\mathcal{H}}}^{c}}^{{S}_{\lambda ,1}},{\mathcal{G}}_{{\stackrel{\sim}{\mathcal{F}}}^{c}{\stackrel{\sim}{\mathcal{H}}}^{c}}^{{S}_{\lambda ,2}},\dots ,{\mathcal{G}}_{{\stackrel{\sim}{\mathcal{F}}}^{c}{\stackrel{\sim}{\mathcal{H}}}^{c}}^{{S}_{\lambda ,m}}\) be a partition of the concept space \({\mathcal{G}}_{{\stackrel{\sim}{\mathcal{F}}}^{c}{\stackrel{\sim}{\mathcal{H}}}^{c}}\). Then we can define a new concept space as$${\mathcal{G}}_{{\stackrel{\sim}{\mathcal{F}}}^{c}{\stackrel{\sim}{\mathcal{H}}}^{c}}^{{S}_{\lambda ,*}}=\bigcup _{i=1}^{m}{\mathcal{G}}_{{\stackrel{\sim}{\mathcal{F}}}^{c}{\stackrel{\sim}{\mathcal{H}}}^{c}}^{{S}_{\lambda ,i}}=\bigcup _{i=1}^{m}({X}_{{S}_{\lambda ,i}},{\stackrel{\sim}{B}}_{{S}_{\lambda ,i}}).$$
For brevity, the concept space \(\:{\mathcal{G}}_{{\stackrel{\sim}{\mathcal{F}}}^{c}{\stackrel{\sim}{\mathcal{H}}}^{c}}^{{S}_{{\uplambda\:},\text{*}}}\) is rewritten as \(\:{\mathcal{G}}^{{S}_{{\uplambda\:},\text{*}}}\) by omitting the suffix \(\:{\stackrel{\sim}{\mathcal{F}}}^{c}{\stackrel{\sim}{\mathcal{H}}}^{c}\) when no confusion exists.
Let \(G=\{{x}_{1},{x}_{2},\dots ,{x}_{m}\}\) be a set of instances and \(\mathcal{Y}=\{1,2,\dots ,l\}\) be a label space. Note that an instance x is also called an object x in formal concept analysis (FCA). According to the label information, there exists a partition of the instances into l clusters \({\mathcal{C}}_{1},{\mathcal{C}}_{2},\dots ,{\mathcal{C}}_{l}\) that cover all the instances, namely \({\mathcal{C}}_{1}\cup {\mathcal{C}}_{2}\cup \dots \cup {\mathcal{C}}_{l}=G\), where \({\mathcal{C}}_{i}\cap {\mathcal{C}}_{j}=\varnothing \ (\forall i\ne j)\). Meanwhile, given an object-oriented FCS threshold \(\lambda =\lambda \left(i\right)\ (i\in \{1,2,\dots ,n\})\), the corresponding fuzzy conceptual clusters are denoted by \({\mathcal{G}}_{1}^{{S}_{\lambda \left(i\right),*}},{\mathcal{G}}_{2}^{{S}_{\lambda \left(i\right),*}},\dots ,{\mathcal{G}}_{l}^{{S}_{\lambda \left(i\right),*}}\). Furthermore, we denote the set of all fuzzy conceptual clusters by \({\mathcal{C}}^{{S}_{\lambda \left(i\right)}}\), namely \({\mathcal{C}}^{{S}_{\lambda \left(i\right)}}=\{{\mathcal{G}}_{1}^{{S}_{\lambda \left(i\right),*}},{\mathcal{G}}_{2}^{{S}_{\lambda \left(i\right),*}},\dots ,{\mathcal{G}}_{l}^{{S}_{\lambda \left(i\right),*}}\}\).
Knowledge storage
The parameter \(\lambda \left(i\right)\) is of great importance in constructing the concept space. A novel knowledge storage mechanism, the falling space, is investigated here. A sample space can be expressed as \(\Omega =\{\lambda \left(1\right),\lambda \left(2\right),\dots ,\lambda \left(n\right)\}\), where \(\lambda \left(i\right)\) is an elementary event. Meanwhile, let \(\mathcal{G}=\{{\mathcal{C}}^{{S}_{\lambda \left(1\right)}},{\mathcal{C}}^{{S}_{\lambda \left(2\right)}},\dots ,{\mathcal{C}}^{{S}_{\lambda \left(n\right)}}\}\) be the set of all concept spaces, and denote its power set by \(\mathcal{P}\left(\mathcal{G}\right)\).
Definition 5
Let \(\mathcal{A}\) and \(\widehat{\mathcal{B}}\) be two σ-algebras on the basic space \(\Omega \) and the power set \(\mathcal{P}\left(\mathcal{G}\right)\), respectively. For any \(\lambda \left(i\right)\in \Omega \) and \(\mathcal{C}\in \widehat{\mathcal{B}}\), we can define a concept measurable mapping as$$\xi :\Omega \to \mathcal{P}\left(\mathcal{G}\right),\quad {\xi }^{-1}\left(\mathcal{C}\right)=\left\{\lambda \left(i\right)\mid \xi \left(\lambda \left(i\right)\right)\in \mathcal{C}\right\}\in \mathcal{A}.$$
Here, given a universe of discourse \(\:\mathcal{G}\), we call \(\:(\mathcal{P}\left(\mathcal{G}\right),\widehat{\mathcal{B}})\) a supermeasurable structure on \(\:\mathcal{G}\), and ξ is known as a random set on \(\:\mathcal{G}\).
Definition 6
Suppose ξ is a random set on the universe of discourse \(\mathcal{G}\). For any \({\mathcal{C}}^{{S}_{\lambda \left(i\right)}}\in \mathcal{G}\), the concept falling with respect to ξ can be defined as$${\mu }_{\xi }\left({\mathcal{C}}^{{S}_{\lambda \left(i\right)}}\right)=P\left\{\lambda \left(i\right)\mid \xi \left(\lambda \left(i\right)\right)\ni {\mathcal{C}}^{{S}_{\lambda \left(i\right)}}\right\},$$ where \(\xi \left(\lambda \left(i\right)\right)\) is called a concept falling space with reference to ξ.
Given a concept space \({\mathcal{C}}^{{S}_{\lambda \left(i\right)}}\), suppose the probability of the concept falling \({\mu }_{\xi }\left({\mathcal{C}}^{{S}_{\lambda \left(i\right)}}\right)\) is associated with the size of the concept falling space \(\xi \left(\lambda \left(i\right)\right)\). Then we have$${\mu }_{\xi }\left({\mathcal{C}}^{{S}_{\lambda \left(i\right)}}\right)=P\left\{\lambda \left(i\right)\mid \xi \left(\lambda \left(i\right)\right)\ni {\mathcal{C}}^{{S}_{\lambda \left(i\right)}}\right\}=\frac{1}{\left|\xi \left(\lambda \left(i\right)\right)\right|}.$$
This means that differences between concept fallings \({\mu }_{\xi }\left({\mathcal{C}}^{{S}_{\lambda \left(i\right)}}\right)\) can be represented by their probabilities. Further, given a clue \(\lambda \left(i\right)\), we can obtain its corresponding concept \({\mathcal{C}}^{{S}_{\lambda \left(i\right)}}\); if \({\mu }_{\xi }\left({\mathcal{C}}^{{S}_{\lambda \left(i\right)}}\right)=1\), then we have the concept space \({\mathcal{C}}^{{S}_{\lambda \left(i\right)}}\). Similarly, if \({\mu }_{\xi }\left({\mathcal{C}}^{{S}_{\lambda \left(i\right)}}\right)=1/n\), then we obtain \(\xi \left(\lambda \left(i\right)\right)=\{{\mathcal{C}}^{{S}_{\lambda \left(1\right)}},{\mathcal{C}}^{{S}_{\lambda \left(2\right)}},\dots ,{\mathcal{C}}^{{S}_{\lambda \left(n\right)}}\}\). Note that, for any \(\lambda \left(i\right)\), we only consider the scenario of \({\mu }_{\xi }\left({\mathcal{C}}^{{S}_{\lambda \left(i\right)}}\right)=1\).
Dynamic concept learning
According to Definition 4, given a parameter \(\lambda \left(i\right)\), we can obtain its corresponding concept space \({\mathcal{G}}_{k}^{{S}_{\lambda \left(i\right),*}}\) with reference to class label k. Suppose \({D}_{t}\) and \({D}_{t+1}\) are two different sequential data chunks in a data stream S, and each learning step processes only one data chunk. Then, given a data chunk \({D}_{t}\), the concept space \({\mathcal{G}}_{k,t}^{{S}_{\lambda \left(i\right),*}}\) can also be obtained with respect to class label k at the t-th stage.
Definition 7
Given any real concept \((X,\stackrel{\sim}{B})\in {\mathcal{G}}_{k,t}^{{S}_{\lambda \left(i\right),*}}\) and a concept similarity threshold α, the α-concept neighborhood of the concept \((X,\stackrel{\sim}{B})\) can be defined as follows:$${N}_{\alpha ,t}\left(X,\stackrel{\sim}{B}\right)=\left\{\left({X}_{1},{\stackrel{\sim}{B}}_{1}\right)\in {\mathcal{G}}_{k,t}^{{S}_{\lambda \left(i\right),*}}\mid sim\left(({X}_{1},{\stackrel{\sim}{B}}_{1}),(X,\stackrel{\sim}{B})\right)\ge \alpha \right\}=\left\{\left({X}_{1},{\stackrel{\sim}{B}}_{1}\right)\in {\mathcal{G}}_{k,t}^{{S}_{\lambda \left(i\right),*}}\mid sim\left({\stackrel{\sim}{B}}_{1},\stackrel{\sim}{B}\right)\ge \alpha \right\},$$ where sim(·) is a similarity function; cosine similarity is adopted in this paper. Here, \((X,\stackrel{\sim}{B})\) is also regarded as an instance with M-dimensional features.
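Definition 7 can be sketched in a few lines of Python. The sketch takes sim as the cosine similarity between intents (an assumption made here so that larger values mean closer intents, which is what the threshold test sim ≥ α requires) and represents each concept as an (extent, intent) pair. The names are hypothetical.

```python
import math

def cosine_sim(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Assumption: a zero intent vector has similarity 0 to everything.
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def alpha_neighborhood(center_intent, concepts, alpha):
    """All concepts whose intent has similarity >= alpha to center_intent."""
    return [(X, B) for X, B in concepts if cosine_sim(B, center_intent) >= alpha]

concepts = [
    ({"x1"}, (1.0, 0.0)),
    ({"x2"}, (0.9, 0.1)),
    ({"x3"}, (0.0, 1.0)),
]
nbhd = alpha_neighborhood((1.0, 0.0), concepts, alpha=0.9)
print([sorted(X) for X, _ in nbhd])  # [['x1'], ['x2']]
```

Only the intents enter the comparison, matching the second form of the set in Definition 7.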
Definition 8
Let \({N}_{\alpha ,t}(X,\stackrel{\sim}{B})\) be an α-concept neighborhood of a concept \((X,\stackrel{\sim}{B})\), and \(\stackrel{\sim}{B}\left({a}_{j}\right)\) be the j-th dimension of the intent of the concept. For real concepts \(({X}_{i},{\stackrel{\sim}{B}}_{i})\in {N}_{\alpha ,t}(X,\stackrel{\sim}{B})\ (i=1,2,\dots ,q)\), a pair \(({X}^{\diamond },{\stackrel{\sim}{B}}^{\diamond })\) with respect to the α-concept neighborhood can be defined as$${X}^{\diamond }=\bigcup\limits_{i=1}^{q}{X}_{i},\quad {\stackrel{\sim}{B}}^{\diamond }=\left\{{\stackrel{\sim}{B}}^{\diamond }\left({a}_{1}\right),{\stackrel{\sim}{B}}^{\diamond }\left({a}_{2}\right),\dots ,{\stackrel{\sim}{B}}^{\diamond }\left({a}_{M}\right)\right\},$$ where \({\stackrel{\sim}{B}}^{\diamond }\left({a}_{j}\right)=(1/q)\sum _{i=1}^{q}{\stackrel{\sim}{B}}_{i}\left({a}_{j}\right)\). Here, we say that the pair \(({X}^{\diamond },{\stackrel{\sim}{B}}^{\diamond })\) is a virtual concept induced by the α-concept neighborhood \({N}_{\alpha ,t}(X,\stackrel{\sim}{B})\), and \({X}^{\diamond }\) and \({\stackrel{\sim}{B}}^{\diamond }\) are known as the extent and intent of the virtual concept \(({X}^{\diamond },{\stackrel{\sim}{B}}^{\diamond })\). Hereinafter, real concepts and virtual concepts are both called concepts when there is no risk of confusion.
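The construction in Definition 8 amounts to a union of extents and a per-attribute mean of intents. A minimal Python sketch, assuming each concept is stored as an (extent, intent) pair with intents of equal length and that the neighborhood is non-empty (the function name is hypothetical):

```python
def virtual_concept(neighborhood):
    """Merge q real concepts into one virtual concept.

    The extent is the union of the extents; each intent coordinate is the
    mean of the corresponding coordinates over the q concepts.
    """
    q = len(neighborhood)
    extent = set().union(*(X for X, _ in neighborhood))
    n_attrs = len(neighborhood[0][1])
    intent = tuple(sum(B[j] for _, B in neighborhood) / q for j in range(n_attrs))
    return extent, intent

neighborhood = [
    ({"x1"}, (0.25, 0.5)),
    ({"x2", "x3"}, (0.75, 1.0)),
]
X_virtual, B_virtual = virtual_concept(neighborhood)
print(sorted(X_virtual), B_virtual)  # ['x1', 'x2', 'x3'] (0.5, 0.75)
```

The virtual concept summarizes q concepts with a single pair, which is the source of both the speed-up and the loss of detail in this representation.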
The virtual concept is an extremely abstract representation of a concept space. This strategy decreases computation time, but it degrades performance significantly because the representation is so abstract. Therefore, a strategy based on the local α-concept neighborhood (denoted by \({LN}_{\alpha ,t}(o,*)\)) is adopted for the concept space \({\mathcal{G}}_{k,t}^{{S}_{\lambda \left(i\right),*}}\). Namely, given a threshold \(\epsilon \in [1,\left|{\mathcal{G}}_{k,t}^{{S}_{\lambda \left(i\right),*}}\right|]\), \(\epsilon \) real concepts selected from \({\mathcal{G}}_{k,t}^{{S}_{\lambda \left(i\right),*}}\) can be used for constructing a new virtual concept. Here, a concept space obtained by means of the local α-concept neighborhood is called a compressed concept space, and is denoted by \({\mathcal{G}}_{k,t}^{{S}_{\lambda \left(i\right),\diamond }}\).
Let \({\mathcal{G}}_{k,t}^{{S}_{\lambda \left(i\right),\diamond }}\) be a concept space in the t-th stage, and initialize \({\mathcal{G}}_{k,t}^{{S}_{\lambda \left(i\right),\diamond }}=\varnothing \). The process of obtaining the compressed concept space (POCCS) \({\mathcal{G}}_{k,t}^{{S}_{\lambda \left(i\right),\diamond }}\) can be described as follows:
Step 1. Select a concept \((X,\stackrel{\sim}{B})\) from the concept space \({\mathcal{G}}_{k,t}^{{S}_{\lambda \left(i\right),*}}\). If it is a virtual concept, it can be added into \({\mathcal{G}}_{k,t}^{{S}_{\lambda \left(i\right),\diamond }}\) directly; otherwise, go to Step 2.
Step 2. Given a concept neighborhood range \(\epsilon \) and a real concept \((X,\stackrel{\sim}{B})\in {\mathcal{G}}_{k,t}^{{S}_{\lambda \left(i\right),*}}\): for any real concept \(({X}_{i},{\stackrel{\sim}{B}}_{i})\), if \(sim({\stackrel{\sim}{B}}_{i},\stackrel{\sim}{B})\ge \alpha \), then the concept \(({X}_{i},{\stackrel{\sim}{B}}_{i})\) is added into \({LN}_{\alpha ,t}(X,\stackrel{\sim}{B})\); otherwise, it is added into \({\mathcal{G}}_{k,t}^{{S}_{\lambda \left(i\right),\diamond }}\) directly.
Step 3. If \(\left|{LN}_{\alpha ,t}(X,\stackrel{\sim}{B})\right|=\epsilon \), the local α-concept neighborhood \({LN}_{\alpha ,t}(X,\stackrel{\sim}{B})\) is expressed as a virtual concept, which is then also added into \({\mathcal{G}}_{k,t}^{{S}_{\lambda \left(i\right),\diamond }}\).
Step 4. Select an unvisited concept from the concept space \({\mathcal{G}}_{k,t}^{{S}_{\lambda \left(i\right),*}}\), and repeat Steps 1–3 until all the concepts in \({\mathcal{G}}_{k,t}^{{S}_{\lambda \left(i\right),*}}\) are traversed.
Formally, let \({\mathcal{G}}_{k,t}^{{S}_{\lambda \left(i\right),*}}\) be the concept space in the t-th stage. The size of its compressed concept space satisfies$$\left|{\mathcal{G}}_{k,t}^{{S}_{\lambda \left(i\right),\diamond }}\right|=\eta \left|{\mathcal{G}}_{k,t}^{{S}_{\lambda \left(i\right),*}}\right|,$$ where \(\eta \in \left(0,1\right]\) is known as the compression ratio with respect to the original concept space.
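The POCCS steps above leave some details open, so the following Python sketch shows only one possible reading, not the authors' implementation. Its assumptions: each concept is a triple (extent, intent, is_virtual); sim is cosine similarity; the selected real concept counts as a member of its own local neighborhood; collection stops once ε members are found; a neighborhood that never reaches ε members is kept as real concepts; and dissimilar concepts stay unvisited so that they are selected later by Step 4. All names are hypothetical.

```python
import math

def cosine_sim(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def merge(members):
    """Virtual concept: union of extents, per-attribute mean of intents."""
    q = len(members)
    extent = set().union(*(X for X, _, _ in members))
    n_attrs = len(members[0][1])
    intent = tuple(sum(B[j] for _, B, _ in members) / q for j in range(n_attrs))
    return extent, intent, True

def compress(concepts, alpha, eps):
    compressed, visited = [], set()
    for i, (X, B, is_virtual) in enumerate(concepts):   # Step 4: traverse all
        if i in visited:
            continue
        visited.add(i)
        if is_virtual:                                  # Step 1
            compressed.append((X, B, True))
            continue
        local = [i]                                     # Step 2
        for j, (_, B_j, virt_j) in enumerate(concepts):
            if len(local) == eps:
                break
            if j not in visited and not virt_j and cosine_sim(B_j, B) >= alpha:
                local.append(j)
                visited.add(j)
        if len(local) == eps:                           # Step 3
            compressed.append(merge([concepts[j] for j in local]))
        else:                                           # too few neighbors
            compressed.extend(concepts[j] for j in local)
    return compressed

space = [
    ({"x1"}, (1.0, 0.0), False),
    ({"x2"}, (1.0, 0.0), False),
    ({"x3"}, (0.0, 1.0), False),
]
compressed = compress(space, alpha=0.9, eps=2)
eta = len(compressed) / len(space)   # compression ratio, here 2/3
print(len(compressed))  # 2
```

In this toy run the first two concepts merge into one virtual concept and the third is kept as is, so three concepts are compressed to two.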
Definition 9
Let \({\mathcal{G}}_{k,t}^{{S}_{\lambda \left(i\right),\diamond }}\) be the concept space in the t-th stage. For any newly input object \({x}_{r}\), we can get its corresponding concept \({C}_{r}\) according to Definition 2. Then we denote$$sim({C}_{r},{\mathcal{G}}_{k,t}^{{S}_{\lambda \left(i\right),\diamond }})={\left\{sim({C}_{r},({X}_{j},{\stackrel{\sim}{B}}_{j}))\right\}}_{j=1}^{m}={\left\{sim({\stackrel{\sim}{\mathcal{F}}}^{c}\left({x}_{r}\right),{\stackrel{\sim}{B}}_{j})\right\}}_{j=1}^{m}={\left\{{\theta }_{k,j}^{\alpha }\right\}}_{j=1}^{m},$$ where \(\left({X}_{j},{\stackrel{\sim}{B}}_{j}\right)\in {\mathcal{G}}_{k,t}^{{S}_{\lambda \left(i\right),\diamond }}\) and \(m=\left|{\mathcal{G}}_{k,t}^{{S}_{\lambda \left(i\right),\diamond }}\right|\).
Let \({\widehat{\theta }}_{k,j}^{\alpha }=\underset{j\in \mathcal{J}}{\text{max}}\{{\theta }_{k,j}^{\alpha }\}\), where j indexes the j-th concept \(\left({X}_{j},{\stackrel{\sim}{B}}_{j}\right)\) and \(\mathcal{J}=\left\{1,2,\dots ,m\right\}\). Then, considering the whole compressed space (denoted by \({\mathcal{C}}^{{S}_{\lambda \left(i\right),\diamond }}\)), the corresponding maximum class vector \(({\widehat{\theta }}_{1,j}^{\alpha },{\widehat{\theta }}_{2,j}^{\alpha },\dots ,{\widehat{\theta }}_{l,j}^{\alpha }{)}^{T}\) can be obtained, and the class with the maximum value is output as the final prediction of the system. Namely,$$\hat{k}=\mathop{argmax}\limits_{k\in \mathcal{Y}}\{{\hat{\theta }}_{k,j}^{\alpha }\},$$
(6)
where \(\mathcal{Y}=\left\{1,2,\dots ,l\right\}\). That is, the instance (or object) \({x}_{r}\) is classified into the \(\widehat{k}\)-th class.
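The prediction rule can be sketched as a two-stage maximum: for each class, take the best similarity between the new intent and the concepts of that class's compressed space, then return the class with the largest such value as in Eq. (6). The sketch assumes cosine similarity for sim and a dictionary from class labels to lists of intents; the names and toy values are hypothetical.

```python
import math

def cosine_sim(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def predict(intent_r, spaces):
    """spaces maps each class label k to the intents of its compressed space."""
    # Per-class best similarity (theta-hat for class k); empty classes skipped.
    best = {
        k: max(cosine_sim(intent_r, B) for B in intents)
        for k, intents in spaces.items() if intents
    }
    k_hat = max(best, key=best.get)     # Eq. (6): argmax over class labels
    return k_hat, best

spaces = {
    1: [(1.0, 0.0), (0.8, 0.2)],
    2: [(0.0, 1.0)],
}
k_hat_a, _ = predict((0.9, 0.1), spaces)
k_hat_b, _ = predict((0.1, 0.9), spaces)
print(k_hat_a, k_hat_b)  # 1 2
```

Only the single most similar concept per class influences the decision, so the rule behaves like a nearest-concept classifier over the compressed spaces.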
Furthermore, given the concept space \({C}_{t+1}\) with reference to a data chunk \({D}_{t+1}\), we further denote$$Sim\_Chunk({C}_{t+1},{\mathcal{G}}_{k,t}^{{S}_{\lambda \left(i\right),\diamond }})={\{{\widehat{\theta }}_{k,j}^{\alpha }\}}_{j=1}^{p},$$ where \(p=|{C}_{t+1}|\).
Let \({C}_{t+1}\) be the concept space regarding a data chunk \({D}_{t+1}\) in the (t + 1)-th stage, and \({C}_{t+1}^{1},{C}_{t+1}^{2},\dots ,{C}_{t+1}^{d}\) be a partition of \({C}_{t+1}\). For the concept space \({\mathcal{G}}_{k,t}^{{S}_{\lambda \left(i\right),\diamond }}\), we then have$$Sim\_Chunk({C}_{t+1},{\mathcal{G}}_{k,t}^{{S}_{\lambda \left(i\right),\diamond }})=\bigcup _{r=1}^{d}{\Theta }_{r},$$ where \({\Theta }_{r}=Sim\_Chunk({C}_{t+1}^{r},{\mathcal{G}}_{k,t}^{{S}_{\lambda \left(i\right),\diamond }})\).
Given a sample \(({x}_{r},{y}_{r})\), its prediction value \(\widehat{k}\) can be obtained by Eq. (6), which means that the instance \({x}_{r}\) should be classified into the \(\widehat{k}\)-th class.
Updating concept spaces
To update concept spaces, new instances should be integrated into the existing concept spaces to achieve dynamic learning. Given a concept \(({\stackrel{\sim}{\mathcal{H}}}^{c}{\stackrel{\sim}{\mathcal{F}}}^{c}\left({x}_{j}\right),{\stackrel{\sim}{\mathcal{F}}}^{c}\left({x}_{j}\right))\), its intent can be rewritten as \({\stackrel{\sim}{\mathcal{F}}}^{c}({x}_{j},a)\ (\forall a\in M)\).
The Process of Updating Concept Space (PUCS). Given a threshold λ(i), let \({\mathcal{G}}_{j-1}^{{S}_{\lambda \left(i\right)}}\) be the concept space in the (j-1)-th period. For any concept \(({X}_{j-1},{\stackrel{\sim}{B}}_{j-1})\in {\mathcal{G}}_{j-1}^{{S}_{\lambda \left(i\right)}}\), the process of updating the concept space can be described as follows:
If \({\stackrel{\sim}{B}}_{j-1}\left(a\right)\le {\stackrel{\sim}{\mathcal{F}}}^{c}({x}_{j},a)\ (\forall a\in M)\), then \(({X}_{j},{\stackrel{\sim}{B}}_{j})=({X}_{j-1}\cup \{{x}_{j}\},{\stackrel{\sim}{B}}_{j-1})\); if \({\stackrel{\sim}{B}}_{j-1}\left(a\right)>{\stackrel{\sim}{\mathcal{F}}}^{c}({x}_{j},a)\ (\forall a\in M)\), then \(({X}_{j},{\stackrel{\sim}{B}}_{j})=({X}_{j-1}\cup \{{x}_{j}\},{\stackrel{\sim}{\mathcal{F}}}^{c}({x}_{j},a))\); otherwise \({\mathcal{G}}_{j}^{{S}_{\lambda \left(i\right)}}\leftarrow (\{{x}_{j}\},{\stackrel{\sim}{\mathcal{F}}}^{c}({x}_{j},a))\).
In PUCS, if there exists an ordering relation between a newly formed concept and any concept from \({\mathcal{G}}_{j-1}^{{S}_{\lambda \left(i\right)}}\), the concept space \({\mathcal{G}}_{j-1}^{{S}_{\lambda \left(i\right)}}\) will be updated.
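The three PUCS cases can be sketched as follows. This is one possible reading under stated assumptions, not the authors' implementation: each concept is an (extent, intent) pair, the new object's intent is compared with every existing concept coordinate by coordinate, and a new singleton concept is appended once only when no existing concept is comparable with the new intent. The function name and toy values are hypothetical.

```python
def update_space(space, x_new, F_new):
    """Integrate a new object x_new with intent F_new into a concept space."""
    F_new = tuple(F_new)
    updated, comparable = [], False
    for X, B in space:
        if all(b <= f for b, f in zip(B, F_new)):
            # Existing intent is dominated: extend the extent, keep the intent.
            updated.append((X | {x_new}, B))
            comparable = True
        elif all(b > f for b, f in zip(B, F_new)):
            # Existing intent strictly dominates: extend the extent and
            # replace the intent with that of the new object.
            updated.append((X | {x_new}, F_new))
            comparable = True
        else:
            updated.append((X, B))      # incomparable: leave unchanged
    if not comparable:
        # Assumption: add the singleton concept once if nothing was comparable.
        updated.append(({x_new}, F_new))
    return updated

space = [({"x1"}, (0.2, 0.3)), ({"x2"}, (0.9, 0.9))]
after_x3 = update_space(space, "x3", (0.5, 0.6))    # comparable with both
after_x4 = update_space(space, "x4", (0.1, 0.95))   # comparable with neither
print(len(after_x3), len(after_x4))  # 2 3
```

The function returns a new list rather than modifying the space in place, so the space of period j-1 remains available for comparison.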