fixed additional language mistakes in evaluation

hannes.kuchelmeister
2020-05-07 12:47:52 +02:00
parent 26655d68ac
commit eb4a543884


@@ -219,29 +219,28 @@ This section gives an overview on the hypotheses tested during data analysis. Ea
\begin{itshape}
\label{hyp:Evaluation:HeterogenousBiggerSatisfactionIncrease} More heterogeneous groups see a bigger increase in satisfaction than less heterogeneous groups when switching from a decision of a dictator to a decision made by the recommender.
\end{itshape} \medskip \\*
The assumption is made that in more heterogeneous groups the satisfaction with the dictator's decision is less. Therefore, there is a higher possible increase in satisfaction. A homogenous group that already satisfies all group members with the dictator's decision cannot see an increase in satisfaction, therefore, the assumption is made that with a higher number of people dissatisfied or neutral with the dictator's decision, more people will be be lifted into satisfaction and the increase in satisfaction will be bigger. However a group that has contradicting interest actually might not be able to reach high satisfaction levels.
The assumption is that in more heterogeneous groups the satisfaction with the dictator's decision is lower, which leaves more room for an increase in satisfaction. A homogeneous group in which the dictator's decision already satisfies all group members cannot see an increase in satisfaction. Therefore, the more people are dissatisfied or neutral with the dictator's decision, the more people can be lifted into satisfaction and the bigger the increase in satisfaction will be. However, a group that has divergent interests might not actually be able to reach high levels of satisfaction.
\end{hypothesis}
\begin{hypothesis}
\begin{itshape}
\label{hyp:Evaluation:StoreSizeBetterResults} A higher number of stored finished configurations results in a higher number of satisfied and a lower number of dissatisfied group members when the recommender is used to make the group decision.
\end{itshape} \medskip \\*
This hypothesis is born by the fact that having a bigger pool of configurations to choose from increases the chances of having a good recommendation. This of course requires the assumption that aggregation strategies that pick recommendations pick configurations that also fare better in the chosen satisfaction metric. If that is not the case this hypothesis should not hold.
This hypothesis is based on the fact that being able to choose from a bigger pool of configurations increases the chances of arriving at a good recommendation. This of course requires the assumption that the aggregation strategies pick configurations that also fare better in the chosen satisfaction metric. If this is not the case, the hypothesis cannot be sustained.
\end{hypothesis}
\begin{hypothesis}
\begin{itshape}
\label{hyp:Evaluation:AggregationStrategies} Multiplication and best average aggregation strategies perform better than least misery across the board.
\label{hyp:Evaluation:AggregationStrategies} The multiplication and best average aggregation strategies perform better than the least misery aggregation strategy.
\end{itshape} \medskip \\
Best average and multiplication are strategies that are performing best in some of the, by \citeauthor{Masthoff2015} \cite[~ 755f]{Masthoff2015}, listed online experiments. Therefore, it is reasonable to assume that they perform well here, too. Least misery was listed in some studies as performing worst. Therefore, there is an expectation of it faring less good than other group aggregation strategies.
Best average and multiplication are strategies that perform best in some of the online experiments listed by \citeauthor{Masthoff2015} \cite[~ 755f]{Masthoff2015}. Therefore, it is reasonable to assume that they perform well here, too. Least misery was listed in some studies as performing worst. Accordingly, it is expected to fare worse than other group aggregation strategies.
\end{hypothesis}
\section{Results}
\label{sec:Evaluation:Findings}
\subsection{Threshold Center Selection}
In this section the goal is to find a $tc$ parameter for the analysis. This is needed to reduce dimensionality of data that has to be looked at and to get results of value. Therefore, all parameters except $tc$ will be fixed. The preference aggregation strategy looked at is multiplication because this strategy shows good results across the board when briefly looking at the generated data. The configuration database is used with all possible solutions (which is 148 in total). This results in a bigger visible effect in terms of satisfaction and dissatisfaction change as the recommender has access to all possible configurations and also provides more solid and predictable results. \autoref{fig:Evaluation:tcChange} shows the satisfaction change based on choice of $tc$. Of note is that the maxima of satisfaction change precedes the minima of dissatisfaction change for all group types. Maxima and minima occur at different tc values depending on the group type. Heterogeneous groups peek earliest while homogenous groups only show a peek towards the maximum $tc$ value. Changes in dissatisfaction are minimal even with $tc$ close to its maximum value for homogeneous groups. \autoref{fig:Evaluation:tcCount} shows the amount of group members satisfied and dissatisfied with the dictator's decision. The number of satisfied people decreases with an increasing $tc$ and its downward movement accelerates. The dissatisfaction curve shows a similar trend in reverse. Here the number of dissatisfied group members increases with an increase in $tc$. The curve accelerates its growth analogous to the acceleration of the satisfaction curve. The behaviour of heterogeneous groups and random groups is similar but the curve for heterogeneous groups shows less satisfaction and more dissatisfaction for a given tc. Also both curves have a negative satisfaction change when $tc$ reaches a certain height. 
Homogeneous groups only have satisfied group members for most $tc$ values but they decrease rapidly for values greater than $85$. Dissatisfied group members are at zero for the whole value range of $tc$ except a very slight upward tick at the end that is barely noticeable.
This section aims to find a $tc$ parameter for the analysis. This is required to reduce the amount of data that has to be looked at and to get valuable results. For this purpose all parameters except $tc$ are fixed. The preference aggregation strategy looked at is multiplication because this strategy shows good results across the board when briefly looking at the generated data. The configuration database is used with all possible solutions (148 in total). This results in a bigger visible effect in terms of satisfaction and dissatisfaction change, as the recommender has access to all possible configurations, and also provides more solid and predictable results. \autoref{fig:Evaluation:tcChange} shows the satisfaction change based on the choice of $tc$. Of note is that the maximum of satisfaction change precedes the minimum of dissatisfaction change for all group types. Maxima and minima occur at different $tc$ values depending on the group type. Heterogeneous groups peak earliest, while homogeneous groups only show a peak towards the maximum $tc$ value. For homogeneous groups, changes in dissatisfaction are minimal even with $tc$ close to its maximum value. \autoref{fig:Evaluation:tcCount} shows the number of group members satisfied and dissatisfied with the dictator's decision. The number of satisfied people decreases with an increasing $tc$ and its downward movement accelerates. The dissatisfaction curve shows a similar trend in reverse. Here the number of dissatisfied group members increases with an increase in $tc$. The curve accelerates its growth analogously to the acceleration of the satisfaction curve. The behaviour of heterogeneous groups and random groups is similar, but the curve for heterogeneous groups shows less satisfaction and more dissatisfaction for a given $tc$. Also, both curves have a negative satisfaction change when $tc$ reaches a certain height.
Homogeneous groups only have satisfied group members for most $tc$ values but they decrease rapidly for values greater than $85$. Dissatisfied group members are at zero for the whole value range of $tc$ except a very slight upward tick at the end that is barely noticeable.
\begin{figure}
\centering
@@ -250,7 +249,7 @@ In this section the goal is to find a $tc$ parameter for the analysis. This is n
\label{fig:Evaluation:tcChange}
\end{figure}
\autoref{hyp:Evaluation:MaximumMinimum} states that the highest satisfaction change is expected at places where the overall satisfaction with the dictator's decision is close to two. However, the data shows a slightly different result. This hypothesis does not hold true. When looking at the data we see peeks in satisfaction change when values are equal to $2.81, 2.51$ and $3$ (heterogeneous, random, homogenous). Therefore, the expectation does not hold up. Most likely this happens because at lower satisfaction numbers with the dictator's decision the threshold for satisfaction is set too high which causes the group compromise to classify less group members as satisfied. Moreover, valleys for dissatisfaction change are also not at the expected value of \textit{two}. They are instead at $1.19, 1.49, 0.04$ (heterogeneous, random, homogenous). Here the valleys are lower than expected. However, the data from homogenous groups seems to be cut of. Therefore, a judgement for homogenous groups is difficult and with slightly less heterogeneous groups this graph should show bigger effects.
\autoref{hyp:Evaluation:MaximumMinimum} states that the highest satisfaction change is expected where the overall satisfaction with the dictator's decision is close to two. However, the data shows a slightly different result, and the hypothesis does not hold true. The peaks in satisfaction change occur where the satisfaction with the dictator's decision equals $2.81$, $2.51$ and $3$ (heterogeneous, random, homogeneous). Most likely this happens because at lower satisfaction numbers with the dictator's decision the threshold for satisfaction is set too high, which causes the group compromise to classify fewer group members as satisfied. Moreover, the valleys for dissatisfaction change are also not at the expected value of \textit{two}. They are instead at $1.19$, $1.49$ and $0.04$ (heterogeneous, random, homogeneous), which is lower than expected. However, the data from homogeneous groups seems to be cut off. Therefore, a judgement for homogeneous groups is difficult, and with slightly less heterogeneous groups this graph should show bigger effects.
\begin{figure}
\centering
@@ -259,11 +258,11 @@ In this section the goal is to find a $tc$ parameter for the analysis. This is n
\label{fig:Evaluation:tcCount}
\end{figure}
The predicted trend that a higher $tc$ results in a lower satisfaction and a higher dissatisfaction with the dictator's decision, as predicted by \autoref{hyp:Evaluation:HigherTcLessSatisfied}, can be clearly seen in \autoref{fig:Evaluation:tcCount} and has been described in this section already. This means for the evaluation that the behaviour of the recommender is predictable and suggests that used metrics are modelling behaviour expected in reality.
The trend that a higher $tc$ results in lower satisfaction and higher dissatisfaction with the dictator's decision, as predicted by \autoref{hyp:Evaluation:HigherTcLessSatisfied}, can be clearly seen in \autoref{fig:Evaluation:tcCount} and has already been described in this section. For the evaluation this means that the behaviour of the recommender is predictable, and it suggests that the metrics used model behaviour expected in reality.
\autoref{hyp:Evaluation:OnlyOneSatisfied} predicts that the satisfaction with the individual decision eventually reaches one and that no one is satisfied with the group recommender's decision. This means the satisfaction change should reach minus one. \autoref{fig:Evaluation:tcCount} shows a downward trend that comes close to one for heterogeneous and random groups. Therefore, the trend suggests that the hypothesis holds with regard to heterogeneous and random groups but as the drop for homogenous groups just reaches below $2.8$ suggesting that the hypothesis does not hold for homogenous groups. Also, satisfaction change in heterogeneous groups reaches close to minus one but this value is neither reached by random groups, nor by homogenous groups. The hypothesis therefore holds true only for heterogeneous groups. A likely cause why it does not seem to hold true for random or homogenous groups is that as the highest tc value still includes multiple configurations and a recommended configuration keeps some group members satisfied for some of the time. Also possibly for random groups another group member than the dictator could be satisfied with the group decision.
\autoref{hyp:Evaluation:OnlyOneSatisfied} predicts that the satisfaction with the individual decision eventually reaches one and that no one is satisfied with the group recommender's decision. This means the satisfaction change should decrease to minus one. \autoref{fig:Evaluation:tcCount} shows a downward trend that comes close to one for heterogeneous and random groups. The trend therefore suggests that the hypothesis holds true for heterogeneous and random groups, but the curve for homogeneous groups only drops to just below $2.8$, which suggests that the hypothesis does not hold for homogeneous groups. Also, the satisfaction change in heterogeneous groups decreases close to minus one, while this value is fully reached neither by random groups nor by homogeneous groups. The hypothesis therefore holds true only for heterogeneous groups. A likely reason why it does not seem to hold true for random or homogeneous groups is that the highest $tc$ value still includes multiple configurations, so that a recommended configuration keeps some group members satisfied some of the time. For random groups it is also possible that a group member other than the dictator is satisfied with the group decision.
During a group decision it is better to make one less person dissatisfied opposed to one more person satisfied. Therefore, this thesis uses $tc$ values that are closer to the minima of dissatisfaction change than to the maxima of satisfaction change. The minima for heterogeneous groups is at $tc = 70\%$, therefore, this is the chosen value for evaluation of the remaining hypotheses. This is needed because otherwise analysis would be infeasible due to the parameter space being too large. For random groups the minima of dissatisfaction change can be found at $tc = 85\%$ which is the value used for all following analysis of random groups. For homogenous group dissatisfaction change is decreasing until the highest possible value of $tc$ is reached. Because of that $tc = 94\%$ is used for analysis.
During a group decision it is better to make one less person dissatisfied than to make one more person satisfied. Therefore, this thesis uses $tc$ values that are closer to the minimum of dissatisfaction change than to the maximum of satisfaction change. The minimum for heterogeneous groups is at $tc = 70\%$, so this is the value chosen for the evaluation of the remaining hypotheses. Fixing the value is needed because otherwise the analysis would be infeasible due to the size of the parameter space. For random groups the minimum of dissatisfaction change can be found at $tc = 85\%$, which is the value used for all following analyses of random groups. For homogeneous groups the dissatisfaction change decreases until the highest possible value of $tc$ is reached. Because of that, $tc = 94\%$ is used for their analysis.
\subsection{Recommender Performance Analysis}