# multimediate22_baselines_internal

Baseline implementations for MultiMediate'22.
In the following, we introduce each baseline and report its performance.


# Backchannel Detection

## Method

Our baseline approach for backchannel detection uses features extracted with OpenFace [1] and OpenPose [2].
We extract features from the last second of the 10-second input window and aggregate each feature channel over this second by computing the mean and/or the mean of absolute differences between adjacent frames ("mean delta").  
From OpenFace, we use the mean and mean delta of the action unit (AU) intensity estimates, along with the mean delta of the gaze and head pose features (gaze_0_x, gaze_0_y, gaze_0_z, gaze_1_x, gaze_1_y, gaze_1_z, pose_Tx, pose_Ty, pose_Tz, pose_Rx, pose_Ry, pose_Rz, as named in the OpenFace output format).  
From the OpenPose pose estimates, we compute the mean delta of the upper-body and head angles used in [3].
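
The "mean" and "mean delta" aggregation can be illustrated with a minimal NumPy sketch. It assumes the per-frame OpenFace or OpenPose values of one input window are already in an array of shape (frames, channels) and that the video frame rate is known; the function name and the 25 fps used below are illustrative assumptions, not taken from the baseline code.

```python
import numpy as np


def mean_and_mean_delta(frames, fps):
    """Aggregate per-frame features over the last second of an input window.

    frames: array of shape (num_frames, num_channels), one row per video frame.
    fps: video frame rate (assumed known).
    Returns (mean, mean_delta), each of shape (num_channels,), where mean_delta
    is the mean absolute difference between adjacent frames.
    """
    frames = np.asarray(frames, dtype=float)
    n = int(round(fps))      # number of frames covering one second
    last = frames[-n:]       # keep only the final second of the window
    mean = last.mean(axis=0)
    mean_delta = np.abs(np.diff(last, axis=0)).mean(axis=0)
    return mean, mean_delta
```

For a 10-second window at an assumed 25 fps (250 frames), only the final 25 frames enter the statistics. Following the description above, AU intensity channels would contribute both outputs, while gaze and head pose channels contribute only the mean delta.
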
We furthermore extract the extended Geneva Minimalistic Acoustic Parameter Set (eGeMAPS) [4], which consists of 88 acoustic parameters, from the last second of the input window.

We train an SVM with an RBF kernel and tune its C parameter via cross-validation on the training set.
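
The classifier setup can be sketched with scikit-learn as follows. This is an illustration rather than the repository's training script: the function name, the C grid, the number of folds, and the feature standardization step are assumptions.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def train_backchannel_svm(X_train, y_train, c_grid=(0.01, 0.1, 1, 10, 100), folds=5):
    """Fit an RBF-kernel SVM, selecting C by k-fold cross-validation.

    Hypothetical helper: the C grid, fold count, and standardization are
    illustrative assumptions, not the settings behind the reported numbers.
    """
    # Standardize features, then classify with an RBF-kernel SVM.
    pipeline = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
    search = GridSearchCV(
        pipeline,
        param_grid={"svc__C": list(c_grid)},  # "svc" is the pipeline step name
        cv=folds,
        scoring="accuracy",
    )
    search.fit(X_train, y_train)  # refits the best C on all training data
    return search
```

The returned `GridSearchCV` object exposes the selected value as `best_params_["svc__C"]` and can be used directly for prediction.
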


## Results

We report results on the validation and test sets.
The "Head" feature set includes all features based on OpenFace, the "Pose" feature set all features based on OpenPose, and the "Voice" feature set the eGeMAPS features.

| Featureset           | ACC (val)  | ACC (test) |
| -------------------- | ---------- | ---------- |
| Head                 | 0.621      | NA         |
| Head (AUs only)      | 0.591      | NA         |
| Head (Head Pose only)| 0.636      | NA         |
| Head (Gaze only)     | 0.622      | NA         |
| Pose                 | 0.531      | NA         |
| Voice                | 0.567      | NA         |
| Head + Pose          | 0.639      | 0.596      |
| Head + Pose + Voice  | 0.636      | 0.592      |
| -------------------- | ---------- | ---------- |
| Random Baseline      | 0.500      | 0.500      |


# Agreement Estimation

## Method

For agreement estimation, we use the same features and training procedure as for backchannel detection.
The only difference is that we train a support vector regressor (SVR) instead of a classifier.
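
Swapping the classifier for a regressor amounts to replacing `SVC` with `SVR` and scoring the cross-validation by MSE. A minimal scikit-learn sketch; the function name, C grid, fold count, and standardization step are assumptions, not the baseline's actual settings.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR


def train_agreement_svr(X_train, y_train, c_grid=(0.01, 0.1, 1, 10), folds=5):
    """Fit an RBF-kernel SVR, selecting C by cross-validated MSE (hypothetical helper)."""
    pipeline = make_pipeline(StandardScaler(), SVR(kernel="rbf"))
    search = GridSearchCV(
        pipeline,
        param_grid={"svr__C": list(c_grid)},
        cv=folds,
        # scikit-learn maximizes scores, so MSE is exposed as its negative.
        scoring="neg_mean_squared_error",
    )
    search.fit(X_train, y_train)
    return search
```

The cross-validated MSE of the selected model is `-search.best_score_`. Note that scikit-learn's `SVR` ignores errors smaller than `epsilon` (default 0.1), so when targets lie on a small scale this parameter may also be worth tuning.
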

## Results

We report the mean squared error (MSE) on the validation and test sets; lower is better.
The mean baseline always predicts the mean agreement score of the training set; its row reports the MSE of this constant prediction.
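
The mean baseline can be computed in a few lines. A plain-Python sketch; the function name is hypothetical:

```python
def mean_baseline_mse(train_labels, eval_labels):
    """MSE obtained by predicting the training-set mean for every evaluation sample."""
    mean = sum(train_labels) / len(train_labels)
    return sum((y - mean) ** 2 for y in eval_labels) / len(eval_labels)
```

For example, with training labels 0.2, 0.4 and 0.6 (mean 0.4) and evaluation labels 0.3 and 0.5, both squared errors are 0.01, so the baseline MSE is approximately 0.01 (up to floating-point rounding).
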
The agreement estimation task is hard: on the validation set, only feature sets that include OpenFace head pose or gaze features improve over the mean baseline.

| Featureset           | MSE (val)  | MSE (test) |
| -------------------- | ---------- | ---------- |
| Head                 | 0.079      | NA         |
| Head (AUs only)      | 0.085      | NA         |
| Head (Head Pose only)| 0.075      | 0.061      |
| Head (Gaze only)     | 0.078      | NA         |
| Pose                 | 0.086      | NA         |
| Voice                | 0.085      | NA         |
| Head + Pose          | 0.079      | NA         |
| Head + Pose + Voice  | 0.079      | 0.064      |
| -------------------- | ---------- | ---------- |
| Mean Baseline        | 0.085      | 0.066      |


# Eye Contact Detection

## Method

Our baseline approach to eye contact detection is an SVM with an RBF kernel on top of features extracted with OpenFace. All features are extracted from the last frame of the input window.
The gaze features are (following the OpenFace output specification): gaze_0_x, gaze_0_y, gaze_0_z, gaze_1_x, gaze_1_y, gaze_1_z.  
The head pose features are: pose_Tx, pose_Ty, pose_Tz, pose_Rx, pose_Ry, pose_Rz.  
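
Selecting these features from the last frame can be sketched with pandas. The sketch assumes the OpenFace output for one input window has already been loaded into a DataFrame (for example with `pd.read_csv`); the column-stripping step guards against header names with leading spaces, which OpenFace CSV files may contain. The helper and constant names are hypothetical.

```python
import pandas as pd

# Column names as listed above (OpenFace output format).
GAZE_COLUMNS = ["gaze_0_x", "gaze_0_y", "gaze_0_z", "gaze_1_x", "gaze_1_y", "gaze_1_z"]
POSE_COLUMNS = ["pose_Tx", "pose_Ty", "pose_Tz", "pose_Rx", "pose_Ry", "pose_Rz"]


def last_frame_features(openface, use_gaze=True, use_pose=True):
    """Return the gaze and/or head pose values of the window's last frame."""
    # Strip whitespace from header names, e.g. " gaze_0_x" -> "gaze_0_x".
    openface = openface.rename(columns=str.strip)
    columns = (GAZE_COLUMNS if use_gaze else []) + (POSE_COLUMNS if use_pose else [])
    return openface[columns].iloc[-1].to_numpy(dtype=float)
```

The `use_gaze` and `use_pose` flags mirror the "Gaze", "Headpose", and "Gaze + Headpose" feature sets in the results table.
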

We train a separate SVM for each seating position and tune its C parameter via cross-validation on the training set.
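
Training one model per seating position can be sketched as follows: the samples are split by seat, each subset gets its own RBF-kernel SVM with C chosen by cross-validation, and at prediction time each sample is routed to the model for its seat. The function names, the C grid, and the fold count are illustrative assumptions.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC


def train_per_seat(X, y, seats, c_grid=(0.1, 1, 10), folds=3):
    """Fit one RBF-kernel SVM per seating position, tuning C separately for each.

    Hypothetical helper; the C grid and fold count are illustrative assumptions.
    """
    X, y, seats = np.asarray(X), np.asarray(y), np.asarray(seats)
    models = {}
    for seat in np.unique(seats):
        mask = seats == seat  # samples recorded at this seating position
        search = GridSearchCV(SVC(kernel="rbf"), {"C": list(c_grid)}, cv=folds)
        search.fit(X[mask], y[mask])
        models[seat] = search.best_estimator_  # refit on all of this seat's data
    return models


def predict_per_seat(models, X, seats):
    """Route each sample to the model trained for its seating position."""
    X = np.asarray(X)
    return np.array([
        models[seat].predict(x.reshape(1, -1))[0]  # KeyError for unseen seats
        for x, seat in zip(X, np.asarray(seats))
    ])
```

Because each seat has a fixed geometry relative to the other participants, a per-seat model only has to learn the gaze and head pose patterns for that one viewpoint.
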

## Results

We report results on the validation and test sets.

| Featureset        | ACC (val)  | ACC (test) |
| ----------------- | ---------- | ---------- |
| Gaze + Headpose   | 0.64       | 0.576      |
| Gaze              | 0.61       | NA         |
| Headpose          | 0.62       | NA         |
| ----------------- | ---------- | ---------- |
| Most likely class | 0.33       | 0.26       |
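
The "Most likely class" row corresponds to always predicting a single class. A plain-Python sketch, assuming that class is the most frequent label in the training set (the function name is hypothetical):

```python
from collections import Counter


def most_likely_class_accuracy(train_labels, eval_labels):
    """Accuracy of always predicting the most frequent training label."""
    majority, _ = Counter(train_labels).most_common(1)[0]
    return sum(label == majority for label in eval_labels) / len(eval_labels)
```

For example, with training labels [1, 1, 2, 3] the predicted class is 1, so evaluation labels [1, 2, 3] yield an accuracy of 1/3.
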


# References

[1] Baltrusaitis, T., Zadeh, A., Lim, Y. C., & Morency, L. P. (2018). OpenFace 2.0: Facial behavior analysis toolkit. In 2018 13th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2018) (pp. 59-66). IEEE.  
[2] Cao, Z., Simon, T., Wei, S. E., & Sheikh, Y. (2017). Realtime multi-person 2D pose estimation using part affinity fields. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 7291-7299).  
[3] Beyan, C., Katsageorgiou, V. M., & Murino, V. (2017). Moving as a leader: Detecting emergent leadership in small groups using body pose. In Proceedings of the 25th ACM International Conference on Multimedia (pp. 1425-1433).  
[4] Eyben, F., Scherer, K. R., Schuller, B. W., Sundberg, J., André, E., Busso, C., ... & Truong, K. P. (2015). The Geneva minimalistic acoustic parameter set (GeMAPS) for voice research and affective computing. IEEE Transactions on Affective Computing, 7(2), 190-202.


# Running the code

Required dependencies: numpy, pandas, scikit-learn, joblib

In `backchannel_config.py` and `eye_contact_config.py`, you need to set the paths to three directories:  
`LABEL_DIR`: the directory containing the label files.  
`FEATURE_DIR`: the directory containing the OpenPose and OpenFace estimates (as supplied by us for download with the dataset).  
`INTERMEDIATE_RESULTS_DIR`: a directory where models and training results are saved.  
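
For illustration, the relevant part of a config file might look as follows, assuming the three settings are plain module-level string constants. The paths are placeholders, and any other settings in these files are not shown.

```python
# Excerpt of backchannel_config.py (eye_contact_config.py is set up the same way).
# Placeholder paths: replace them with the locations on your machine.
LABEL_DIR = "/path/to/multimediate22/labels"
FEATURE_DIR = "/path/to/multimediate22/features"
INTERMEDIATE_RESULTS_DIR = "/path/to/multimediate22/intermediate_results"
```
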

Please contact me in case you have any issues running the code or reproducing the results!


# License

Copyright 2022 Philipp Müller

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.