A generic self-learning emotional framework for machines

Application of the framework on a practical case study

Here we detail the exact step-by-step procedures followed to obtain the results described in ‘Results’ in the classic RL environment chosen, which can be applied to other RL setups with little modification.

Learning emotions from experience

Pre-training of a conventional RL agent. For simplicity, the offline learning approach was chosen, in which the emotional model is trained on experiences collected by an already competent non-emotional agent. The open-source library OpenAI’s Spinning Up88, compatible with OpenAI’s Gym71, was chosen because its modular and well-documented implementation of RL algorithms facilitated the extensions required for the experiments.

The chosen method, actor-critic PPO (Proximal Policy Optimization)89, from the policy-gradient family, is broadly used for its stability during training, avoiding overly large policy updates. Training learns both a policy \(\pi\) (the actor) and a value function \(v\) (the critic).

The non-emotional agent was trained to solve the task (average episode reward \(\ge\) 200 over 100 consecutive episodes), with these features:

Agent: Actor-critic PPO model, artificial neural network architecture: (64, 64), activation function: rectified linear unit (ReLU), seed: 10;

Hyperparameters: gamma: 0.99, lambda: 0.97, policy learning rate: 0.0003, state-value function learning rate: 0.001, target Kullback-Leibler (KL): 0.01.
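The gamma and lambda hyperparameters above correspond to discounting and generalized advantage estimation (GAE), as commonly used in actor-critic PPO. As an illustrative sketch (not Spinning Up’s internal code; function and variable names are our own), the advantage computation over one finished episode can be written as:

```python
import numpy as np

def gae_advantages(rewards, values, gamma=0.99, lam=0.97):
    """Generalized Advantage Estimation over one finished episode.

    rewards: array of shape (T,); values: array of shape (T + 1,),
    where values[T] is the bootstrap value (0 for a terminal state).
    """
    rewards = np.asarray(rewards, dtype=float)
    values = np.asarray(values, dtype=float)
    T = len(rewards)
    # One-step temporal-difference residuals
    deltas = rewards + gamma * values[1:] - values[:-1]
    advantages = np.zeros(T)
    running = 0.0
    # Backward recursion: A_t = delta_t + gamma * lam * A_{t+1}
    for t in reversed(range(T)):
        running = deltas[t] + gamma * lam * running
        advantages[t] = running
    return advantages
```

With lam = 0 this reduces to the plain temporal difference; with lam = 1 it becomes the full discounted return minus the value baseline.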

The architectures explored involved combinations of 1, 2 or 3 hidden layers; 32, 64 or 128 neurons per layer; ReLU/tanh activation functions; and varied seeds.

Dataset generation

Selection of input values and emotional window. The trained agent was run on unseen scenarios to obtain a representative dataset of 60 episodes as MTS with stepwise values for a broad set of potential variables (reward, state-value, temporal difference, average reward, exponential moving average reward, and cumulative reward).

Upon review of the recorded episode dynamics and MTS, an Order III target mapping was chosen, for which only reward and state-value were required (see discussion of alternatives in the ‘Theoretical framework’, ‘Emotional orders’). (We anticipate that this mapping may perform well in a large variety of setups for its potential to capture a broad range of short-term, fundamental emotions with moderate, addressable complexity.) A tentative value for the emotional window was set at 20 steps, expected to suffice to capture instantaneous emotions, and later corroborated by results.

The resulting dataset contained 20,220 20-step-long sequences from the recorded MTS (training/test split = 16,281/3,939). The two variables (reward, state-value) were z-score normalized based on training-set statistics, thus establishing their average values as homeostatic references.

Training of the emotional model from dataset sequences. A 1D convolutional autoencoder, a type of deep autoencoder (DAE)72 whose architecture is suitable for time series, was chosen for the task of representation learning (as explained in ‘Supplementary information’, ‘Model architecture’).
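The dataset construction described above (20-step windows sliced from each episode’s two-variable MTS, z-score normalized with training-set statistics only) can be sketched as follows; function and variable names are illustrative assumptions, not the exact experimental script:

```python
import numpy as np

def build_sequences(episodes, window=20):
    """Slice each episode's (T, 2) multivariate time series of
    (reward, state-value) into overlapping window-length sequences."""
    sequences = []
    for ep in episodes:
        ep = np.asarray(ep, dtype=float)
        for start in range(len(ep) - window + 1):
            sequences.append(ep[start:start + window])
    return np.stack(sequences)  # shape: (N, window, 2)

def zscore_fit(train):
    # Per-variable mean/std from the training split only, so the
    # averages act as homeostatic reference points.
    mean = train.mean(axis=(0, 1))
    std = train.std(axis=(0, 1))
    return mean, std

def zscore_apply(data, mean, std):
    return (data - mean) / std
```

Fitting the normalization on the training split and merely applying it to the test split avoids leaking test statistics into the homeostatic references.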
The model was trained and tested on the dataset in an unsupervised manner to reproduce 20-step \(\times\) 2-value normalized MTS sequences by learning their representation in a low-dimensional latent space90.

We used the Keras/TensorFlow library91, defining an encoder and a decoder with these features: encoding_dim = 5, l1_filters = 32, l1_kernel_size = 5, l1_strides = 2, l2_filters = 16, l2_kernel_size = 5, l2_strides = 2, padding = ‘same’, activation = ‘relu’. The separate encoder, used for emotion elicitation, consisted of 3,333 learned parameters. The training took 29 epochs with batch_size = 10 and validation_split = 0.1.

Various architectures and parameters were tested, including different encoding dimensions. Among these, an encoding dimension of 5 provided the best balance between reproduction root mean squared error (RMSE) and compression ratio. For this configuration, the RMSE values were 2.97119 for rewards and 4.32781 for state-values, with a compression ratio of 5:40. Although an encoding dimension of 10 resulted in lower RMSE values (2.48047 for rewards and 4.21228 for state-values), it produced a less favorable compression ratio of 10:40.

As noted, the model was not designed for input regeneration or denoising, but rather to capture high-level trends and magnitudes within the observed sequences. For context, the reward time series ranged over [− 100, 100], with a mean of 0.67 (standard deviation: 5.60), while the state-value time series spanned [− 12.37, 109.51], with a mean of 62.41 (standard deviation: 18.99).

Elicitation of emotions

The trained emotional encoder was used to encode the ongoing sequences of the latest observed values from the full dataset, obtaining their 20,220 latent representations in the 5-dimensional latent emotional space. This was the emotional spectrum data used for interpretation. (The integration of the emotional encoder within the extended RL actor-critic architecture introduced in Fig. 1c was not addressed in this experiment.)

Interpretation of the learned emotions

Clustering of the emotional spectrum. To identify the distinct dynamic patterns in the captured emotional spectrum, a probabilistic Gaussian mixture model was trained on the learned latent space. This method models data as a mixture of a number of Gaussian distributions, capturing its covariance structure in clusters of uneven spatial extents, which suited the nature of the problem.

For the clustering of the latent space, we trained a probabilistic Gaussian mixture model from the Scikit-learn library92 and found the most promising clustering distributions to consist of 7 or 8 clusters, with minimal BIC (Bayesian Information Criterion) scores and sufficient differentiation, although somewhat dependent on the initial random seed. The final choice, made after practical experimentation on real sequences, settled on 8 components with covariance type = ‘full’ (assigning to each component its own general covariance matrix). This approach exhibited satisfactory performance, thereby obviating the need for automation. The eight resulting classes, along with the average multivariate sequence representing their corresponding cluster centroids, are shown in Fig. 3a.

Selection and validation of the interpretability mapping. To map the eight resulting clusters to familiar emotion terms, the LOVE 2:5×5 mapping was initially tried (corresponding to the Order III emotional spectrum learned, with two values: reward and state-value; Fig. 2c), and then extended to LOVE 2:5×6 for a more accurate denomination of emotions 3 and 7 (Fig. 3b). This allowed the distinction between anger and fear based on the individual’s appraisal of its certainty and control over future outcomes (with anger associated with a decrease to average, and fear with a decrease to negative70).

The final terms for the eight emotions learned in this use case, as well as for the full set of thirty, were theoretically validated as described in ‘Theoretical validation of LOVE profile terms’, verified in live simulations and, finally, experimentally contrasted with external references (see ‘Experimental validation of the learned emotions with humans’).

Attribution of emotion terms. Based on the LOVE 2:5×6 mapping, the eight patterns were analytically associated with the following best-matching emotion terms (see Fig. 3a,b; note that the y-axis of the eight learned patterns is in logarithmic scale):

Cluster 0: Distress (reward: below-average values; expectation: negative, well below \(-\sigma\) values). The high variance of reward reflects very uneven values, which was deemed subjectively negative due to the well-studied loss-aversion principle (the pain of a loss is felt by individuals roughly twice as intensely as the pleasure of an equivalent gain93).

Cluster 1: Optimism (reward: average values; expectation: positive values around \(+\sigma\)).

Cluster 2: Neutral/slight concern (reward: average values; expectation: below-average values). The most frequent emotion in this always uncertain environment (29.8% of the samples) falls closer to neutral than to concern, but the low expectation pattern justifies the compound naming.

Cluster 3: Satisfaction (reward: increased values; expectation: decreased-to-average values). The least frequent emotion, triggered upon reception of a significant reward, with expectations decreasing accordingly.

Cluster 4: High optimism (reward: average values; expectation: well above \(+\sigma\) values). Technically, both clusters 1 and 4 match Optimism, but expectations in cluster 4 significantly exceed \(+\sigma\).

Cluster 5: Concern (reward: average values; expectation: negative, well below \(-\sigma\) values).

Cluster 6: Excitement (reward: average values; expectation: increased values).

Cluster 7: Fear (reward: average values; expectation: decreased to negative values).
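The cluster-count selection described earlier (probabilistic Gaussian mixture with full covariance, chosen by minimal BIC) can be sketched as follows on the latent vectors; this is an illustrative sketch with assumed function names, not the exact experimental script:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def select_gmm_by_bic(latents, k_range=range(2, 11), seed=0):
    """Fit full-covariance Gaussian mixtures for several component
    counts and keep the one with the minimal BIC score."""
    best_model, best_bic = None, np.inf
    for k in k_range:
        gmm = GaussianMixture(n_components=k, covariance_type="full",
                              random_state=seed).fit(latents)
        bic = gmm.bic(latents)
        if bic < best_bic:
            best_model, best_bic = gmm, bic
    return best_model, best_bic
```

As in the experiment, the BIC-optimal count can vary with the random seed, so the final choice may still merit manual review on real sequences.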

Finally, the stepwise probabilities predicted by the emotional encoder were smoothed for more stable cluster attributions and easier external interpretation: a moving average of 5 steps on values and a minimum probability of 0.9 as the reclassification threshold (or, alternatively, a minimum number of 10 consecutive attributions).

Visualization. To visualize the learned emotional space in both 2D and 3D, we used t-SNE (t-distributed Stochastic Neighbor Embedding) (see Fig. 3c for 2D and Section ‘Data availability’ for a 3D animation). For the 2D representation, the Scikit-learn library92 was employed with the following parameters: seed = 90, n_components = 2, perplexity = 200, init = ‘pca’, and n_iter = 2000.

Although not technically required by the methodology, the use of colors to differentiate the classes learned by the emotional encoder was instrumental in the final selection of the autoencoder.

Theoretical validation of LOVE profile terms

The principles described by the theoretical framework provide an initial foundation to associate LOVE profiles (idealized patterns of the latest values) with the best possible emotion terms in human language (see ‘Interpretation of the learned emotions’). However, given the difficulty of the task, their coherence was validated and refined both theoretically and experimentally. For the former, a sequence-coherence test was run, following this methodology:

1. Attribute an initial term to each of the 30 profiles, based on said theoretical principles.

2. Run offline simulations of event-guided, plausible emotional sequences where each profile is a state:

- Start from a stable state (for example, neutral).

- Try different sequences involving positive/negative evolutions of rewards and state-value (or expectations), with brief explanatory narratives.

- Discard beyond-scope transitions (such as positive \(\rightarrow\) increased, or negative \(\rightarrow\) decreased).

- Avoid too many abrupt transitions (positive \(\rightarrow\) negative, negative \(\rightarrow\) positive).

- End in stable or already visited states.

3. Review the resulting term sequences and repeat step 2 until fully natural transitions are obtained in all cases, leaving no pattern unused (ideally a few times).

For step 2, the sequence-coherence test was iterated and refined over thirty-eight simulated emotional sequences with full profile coverage, such as:
Neutral \(\rightarrow\) (An opportunity arises…) \(\rightarrow\) Excitement \(\rightarrow\) (and seems to hold.) \(\rightarrow\) Optimism \(\rightarrow\) (Suddenly the opportunity vanishes…) \(\rightarrow\) Anger \(\rightarrow\) (and we are back to normal.) \(\rightarrow\) END.
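The sequence-coherence test above can be sketched as a transition check over profile states; the states and allowed transitions below are a small illustrative subset of our own devising, not the full 30-profile set:

```python
# Hypothetical subset of profile states and plausible transitions,
# used to check that a simulated emotional sequence only follows
# allowed steps and ends in a stable or already visited state.
TRANSITIONS = {
    "Neutral": {"Excitement", "Concern"},
    "Excitement": {"Optimism", "Anger"},
    "Optimism": {"Satisfaction", "Anger", "Fear"},
    "Anger": {"Neutral", "Concern"},
    "Concern": {"Neutral", "Fear"},
    "Fear": {"Concern", "Distress"},
    "Distress": {"Concern"},
    "Satisfaction": {"Neutral"},
}
STABLE = {"Neutral"}

def coherent(sequence):
    """True if every step is an allowed transition and the sequence
    ends in a stable state or revisits an earlier one."""
    for a, b in zip(sequence, sequence[1:]):
        if b not in TRANSITIONS.get(a, set()):
            return False
    return sequence[-1] in STABLE or sequence[-1] in sequence[:-1]
```

For instance, the example sequence above (Neutral, Excitement, Optimism, Anger, Neutral) passes this check, while a direct Neutral-to-Satisfaction jump would be rejected under these assumed transitions.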
For additional clarity, a flow chart illustrating some examples of simulated emotional sequences is included in ‘Extended data’ under this same title.

Experimental validation of learned emotions with humans

The following methodology was used for the emotional attribution survey and its ensuing mapping to psychology literature references.

Emotional attribution test with humans

Dataset. A representative list of 3–6 s long sequences was automatically selected from the dataset in which one specific learned emotion clearly prevailed over the others (six for each of the eight learned emotions, totalling 48 sequences at different stages of the landing maneuver). This guaranteed equal representation of all eight emotions (which presented some difficulty in the case of cluster 3, satisfaction, the least frequent emotion, often smoothed out at sequence end by the probability smoothing applied).

To reduce the effect of fatigue on raters, the test was randomly split into two evenly distributed lists of 24 videos (A and B), the sequence order was further randomized in two versions of each, and raters were assigned alternating versions (A1, A2, B1, B2, A1, etc.).

The tests with Lang’s SAM manikin. All participants were adult volunteers and native Spanish speakers, recruited from diverse academic and professional backgrounds. A subset of 26 participants received university credits in acknowledgment of their involvement. The study was conducted online, in Spanish, and after registering and giving legal consent, participants were shown an introduction explaining the dynamics of the study, the information shown during the sequences, the scoring system, and the mission of the agent trying to land on a lunar base, described as a life-or-death task.

Once familiarized with the agent’s task, Lang’s Self-Assessment Manikin (SAM)74, a test extensively applied in psychological experiments and market research, was introduced to the participants. They were allowed to try it on two practice videos, characterizing a successful and a failed landing, whose respective results were discarded.

Finally, the subjects proceeded to the test, and for each of the 24 videos, reproduced on a separate screen, they were asked to describe the emotions they would associate with the state of the pilot at the end of each sequence using SAM. The test had no time limit, the videos could be played as many times as desired, and ratings could be reviewed before the final submission.

SAM is a pictorial assessment technique that directly measures emotional responses on three main dimensions: pleasure, arousal and dominance75, associated with a person’s affective reaction to a wide variety of stimuli, typically by rating each dimension from 1 to 9 on a Likert scale. Some SAM tests incorporate a collection of words positioned at the relevant end of each Semantic Differential scale to anchor each dimension for the subject94. These original terms95 were predominantly translated into Spanish from the version by Gurbindo96, supplemented with nuanced contributions derived from the French version by Detandt97. The final terminology and set-up used are shown in ‘Supplementary information’.

Statistical significance. The study engaged raters aged between 18 and 64 (n = 96), with a fairly even split of 53 males and 43 females. The majority of participants (89) held a university degree, providing a pool of individuals with varied educational backgrounds for comprehensive statistical analysis.

Test reliability. To assess the reliability of the ratings, the ICC2k statistic was used (Intraclass Correlation Coefficient, two-way random effects model, absolute agreement).
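The ICC2k statistic can be computed from a two-way ANOVA decomposition of the subjects × raters matrix; below is a minimal numpy sketch following the standard Shrout–Fleiss formulation (function name is our own, not a library API):

```python
import numpy as np

def icc2k(ratings):
    """ICC(2,k): two-way random effects, absolute agreement,
    average of k raters. `ratings` has shape (n subjects, k raters)."""
    ratings = np.asarray(ratings, dtype=float)
    n, k = ratings.shape
    grand = ratings.mean()
    row_means = ratings.mean(axis=1)
    col_means = ratings.mean(axis=0)
    # Mean squares: between subjects, between raters, residual
    msr = k * np.sum((row_means - grand) ** 2) / (n - 1)
    msc = n * np.sum((col_means - grand) ** 2) / (k - 1)
    sse = (np.sum((ratings - grand) ** 2)
           - k * np.sum((row_means - grand) ** 2)
           - n * np.sum((col_means - grand) ** 2))
    mse = sse / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (msc - mse) / n)
```

When all raters agree exactly, the residual and rater mean squares vanish and the statistic equals 1; systematic rater offsets lower it, since ICC(2,k) measures absolute agreement rather than mere consistency.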
The pleasure and dominance dimensions obtained ‘excellent’ correlation rates according to the orientative criteria of Koo (2016)98 (greater than 0.90) and Cicchetti (1994)99 (greater than 0.75), while arousal achieved ‘good’ per the former (between 0.75 and 0.90) and ‘excellent’ per the latter (greater than 0.75) (see Table in ‘Extended data’).

PAD values attributed to the 48 videos. Each of the 48 sequences was assigned a pleasure, arousal and dominance (PAD) triad of values, obtained as an average of the registered values, as shown in Fig. 6a. Tables with the obtained values and Pearson correlation values are included in ‘Extended data’.

PAD values attributed to the eight learned emotions. Similarly, each of the eight emotions was assigned a PAD triad of values as an aggregation over all raters of its six corresponding videos. The results can be visualized in Fig. 6b, and their values and correlations in ‘Extended data’.

Distinguishability of the learned emotions. To validate the distinguishability of the eight emotions from their assigned PAD values, the robust Hotelling’s T-squared statistical test was used, comparing the distribution of each pair of multivariate samples.

Mapping versus documented experimental accounts

The PAD values of each learned emotion, obtained from human ratings, were compared to selected pivotal experimental findings in the psychology literature. From a diverse array of studies and reports, priority was given to those offering well-documented values for the three referential dimensions, showcasing the highest significance and impact within their field. The references chosen, all of them detailing mean and standard deviation for all three dimensions, were:

1. Russell-Mehrabian (1977) [RM]75: 151 emotional states. The terms seem to have been selected by the authors.

2. Bradley-Lang (1999) [BL]76: Affective Norms for English Words (ANEW), including 1,034 terms. The list contains very heterogeneous terms (like abduction, abortion, absurd, abundance, etc.) along with actual emotions.

3. Redondo (2007) [RE]77: Spanish ANEW; 1,034 Spanish words corresponding to the original ANEW with newly obtained PAD values.

4. Landowska (2018) [LA]78: ANEW-MEHR; 112 words selected from Russell and Mehrabian’s list with the PAD values from ANEW.

5. Scott (2019) [SC]79: Glasgow Norms, including 5,553 words. The list contains very heterogeneous terms (like abattoir, abbey, abbreviate, abdicate, etc.) along with actual emotions.

The task required overcoming a number of difficulties: a high degree of discrepancy in the terms included, heterogeneity of the emotional scopes, abundance of non-emotion terms, and other arbitrary peculiarities.

Arbitrariness: the terminologies chosen by the authors, far from conforming to a shared, standard set of emotions, often seemed inconsistently contrived (such as weary with responsibility, quietly indignant, proud and lonely, snobbish and lonely, in RM), or nuanced (for instance, angry and angry but detached; hostile and hostile but controlled in RM). All these terms were kept, despite producing somewhat heterogeneous top-match lists.

Heterogeneity of the emotional scopes: All authors seamlessly mingled complex affects (like social, moral, self-conscious) with primary, instantaneous or more basic emotions (like fright, anxious, euphoria). For the purpose of tagging this agent’s task (short life-or-death landing maneuvers), we did not map the emotions associated with social relationships, moral judgements or self-conscious reflections, along with a few overly redundant or vague ones and one case of bodily needs (including guilty, kind, repentant, lonely, hungry in RM and LA; unfaithful, loyal, insolent, admired in BL and RE; dignity, paranoid, emotional or achievement and achieved, frightened and fright in SC).

Non-emotions: The scope of some references was not limited to emotion terms, including all sorts of concepts (such as butter, cemetery, chair in BL and RE; abdominal, apple (fruit) or musketeers in SC), which were not used for emotional interpretation.

The terminology for the three dimensions has also historically differed among authors (namely, pleasure/valence, arousal/activation, dominance/control); however, we found the more traditional pleasure-arousal-dominance model easier to articulate and comprehend for non-expert participants.

As for the methodologies followed by the authors to obtain the PAD values, these also differed in format and participant profile, which probably contributed to the variance found in the reported values. For example, the different values reported for ‘angry’ (within the range [− 1, 1]) are as follows:

RM: (− 0.510, 0.590, 0.250)

BL, LA: (− 0.538, 0.543, 0.138)

RE: (− 0.700, 0.403, − 0.290)

SC: (− 0.652, 0.227, 0.105)

Finally, although all authors report standard deviations and numbers of samples, the lack of covariance matrices limited the applicability of standard statistical tests for comparing two distributions, such as Hotelling’s T-squared. To address this, we applied three different methods to map the PAD distributions obtained from our test for each emotion (sample 1) against reported PAD mean and standard deviation values (sample 2), often with differing results across tables:

Method 1: Hotelling T-squared test, assuming three independent variables in sample 2 (diagonal covariance matrix).

Method 2: Hotelling T-squared test, assuming sample 2 had the same covariance matrix as sample 1.

Method 3: Euclidean distance, comparing only the means.
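Method 1 can be sketched as follows, treating the reference (sample 2) as three independent variables whose covariance is the diagonal built from its reported standard deviations; this is a hedged numpy/scipy sketch with assumed names, not the exact analysis code:

```python
import numpy as np
from scipy.stats import f as f_dist

def hotelling_diag(sample1, mean2, sd2, n2):
    """Two-sample Hotelling T-squared where sample 2 is known only
    through its per-dimension mean and standard deviation (Method 1:
    diagonal covariance matrix for sample 2)."""
    x1 = np.asarray(sample1, dtype=float)
    n1, p = x1.shape
    mean1 = x1.mean(axis=0)
    s1 = np.cov(x1, rowvar=False)
    s2 = np.diag(np.asarray(sd2, dtype=float) ** 2)
    # Pooled covariance of the two samples
    pooled = ((n1 - 1) * s1 + (n2 - 1) * s2) / (n1 + n2 - 2)
    diff = mean1 - np.asarray(mean2, dtype=float)
    t2 = (n1 * n2) / (n1 + n2) * diff @ np.linalg.solve(pooled, diff)
    # F transformation for the p-value
    f_stat = (n1 + n2 - p - 1) / (p * (n1 + n2 - 2)) * t2
    p_value = f_dist.sf(f_stat, p, n1 + n2 - p - 1)
    return t2, p_value
```

Method 2 would replace `s2` with a copy of sample 1’s covariance, and Method 3 reduces to the Euclidean norm of `diff`.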

To obtain the final mapping of each learned emotion, drawing inspiration from the ensemble-of-models concept, we independently mapped it against each reference table, identifying its three top matches, and then merged the top matches across the five references into a semantic collage (see details in ‘Extended data’).
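As an illustration of this final step (Method 3 ranking within each reference table, then the merge across tables), with hypothetical term/PAD values chosen purely for demonstration:

```python
import numpy as np

def top_matches(query_pad, reference, k=3):
    """Rank reference terms by Euclidean distance of their PAD
    triads to the query (Method 3) and return the k closest."""
    terms = sorted(reference, key=lambda t: np.linalg.norm(
        np.asarray(reference[t]) - np.asarray(query_pad)))
    return terms[:k]

def semantic_collage(query_pad, references, k=3):
    """Merge the top matches from each reference table into one set."""
    collage = set()
    for table in references:
        collage.update(top_matches(query_pad, table, k))
    return collage
```

For example, a learned emotion with a strongly negative-pleasure, high-arousal, low-dominance PAD triad would collect fear-like terms from each table into its collage.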
