diff --git a/30_Thesis/sections/60_evaluation.tex b/30_Thesis/sections/60_evaluation.tex
index f94bc75..83a66ae 100644
--- a/30_Thesis/sections/60_evaluation.tex
+++ b/30_Thesis/sections/60_evaluation.tex
@@ -179,29 +179,57 @@ When evaluating a subset of stored finished configurations it is important to av
 \section{Hypotheses}
 \label{sec:Evaluation:Hypotheses}
-Understanding data is made easier by first posing hypotheses. This section gives an overview over the hypothesis used during data analysis.
+This section gives an overview of the hypotheses used during data analysis. Each hypothesis is stated first, followed by its explanation.
-\begin{enumerate}[font={\bfseries},label={H\arabic*}]
- \item \label{hyp:Evaluation:MaximumMinimum} Highest improvements with group recommendation are when the amount of people satisfied with the dictator's decision is slightly lower than two. Respectively that holds true for dissatisfaction.
- \item \label{hyp:Evaluation:HigherTcLessSatisfied} A higher $tc$ value results in less satisfied people and more unsatisfied people with regard to the dictator's decision.
- \item \label{hyp:Evaluation:OnlyOneSatisfied} There exists a $tc$ value which causes only one person to be satisfied with the dictator's decision and no one is satisfied with the group recommender's decision.
- \item \label{hyp:Evaluation:HomogenousMoreSatisfied} Homogeneous groups have more satisfied members with the recommender's decision but also with the dictator's decision compared to heterogeneous groups.
- \item \label{hyp:Evaluation:HeterogenousBiggerSatisfactionIncrease} More heterogeneous groups see a bigger satisfaction increase than less heterogeneous groups when switching from the decision of a dictator to a decision made by the recommender.
- \item \label{hyp:Evaluation:StoreSizeBetterResults} A higher amount of stored finished configurations results in a higher amount of satisfied and a lower amount of dissatisfied group member.
- \item \label{hyp:Evaluation:AggregationStrategies} Multiplication and best average aggregation strategies perform better than least misery. %
-\end{enumerate}
+\begin{hypothesis}
+ \begin{itshape}
+ \label{hyp:Evaluation:MaximumMinimum} The highest improvements from group recommendation occur when the number of people satisfied with the dictator's decision is slightly lower than two. The same holds true, respectively, for dissatisfaction.
+ \end{itshape} \medskip \\*
+ This expectation rests on the assumption that, in a real situation, a group of four with slightly fewer than two satisfied members on average (under a dictator's decision) has enough room for improvement that potentially three group members can be satisfied after using the recommender, meaning that at least one more person is satisfied with the compromise. In some groups it might even be possible to lift the last person from dissatisfaction towards a neutral attitude. A higher base satisfaction is assumed to reduce the chance of satisfying an additional group member.
+\end{hypothesis}
-These hypotheses require some explanation about the reasoning behind them.
-\begin{description}
- \item[\hyporef{hyp:Evaluation:MaximumMinimum}] This expectation is made because the assumption is made that in a real situation a group of four with having a few less than two staisfied members on average (with a dictator's decision) has enough room for improvement so that potentially three group members can be satisfied after the use of the recommender. Meaning that at least one more person is satisfied with the compromise. Potentially in some groups it might even be possible to then lift the last person from dissatisfaction towards a neutral attitude. A higher base satisfaction is assumed to reduce the possibility to make an additional group member satisfied.
- \item[\hyporef{hyp:Evaluation:HigherTcLessSatisfied}] A higher $tc$ value causes a person to be unsatisfied with a higher amount of configurations. Also it causes a person to be satisfied with less configurations. Therefore recommending a random configuration causes the chance of making an individual satisfied sink while increasing the chance of that person being unsatisfied. Already the change in probability leads to the assumption that this should be seen with non random recommendations too.
- \item[\hyporef{hyp:Evaluation:OnlyOneSatisfied}] A $tc$ value that reaches a high enough level eventually should make only the dictator herself satisfied with the dictator's decision. The bar for satisfaction lies so high that any group recommendation will cause the dictator to also be not satisfied or at least neutral with the group decision. This can be understood as that in a group where nobody is willing to compromise everyone is only satisfied with one's own decision. Having two members with identical interest of course results in this effect not being present but this is expected to be rare for a group size of four.
- \item[\hyporef{hyp:Evaluation:HomogenousMoreSatisfied}] As the interest in homogenous groups are more aligned there is an expectation that the overall hapiness levels for more homogenous groups is higher. If the base level is higher already it is likely that even just a slight increase lifts recommendations for homogenous groups to satisfaction levels not reachable by heterogeneous groups.
- \item[\hyporef{hyp:Evaluation:HeterogenousBiggerSatisfactionIncrease}] The assumption is made that in more heterogeneous groups the satisfaction with the dictator's decision is less. Therefore there is a higher possible increase in satisfaction. A homogenous group that already satisfies all group members with the dictator's decision cannot see an increase in satisfaction therefore the assumption is made, that with a higher amount of people dissatisfied and not satisfied with the dictator's decision, there will be more people that can be lifted into satisfaction and therefore the increase will be bigger. However a group that has contradicting interest actually might not be able to reach high satisfaction levels.
- \item[\hyporef{hyp:Evaluation:StoreSizeBetterResults}] This hypothesis is born by the fact that having a bigger pool of configurations to choose from increases the chances of having a good recommendation. This of course requires the assumption that aggregation strategies that pick recommendations pick configurations that also fare better in the chosen satisfaction metric. If that is not the case this hypothesis should not hold.
- \item[\hyporef{hyp:Evaluation:AggregationStrategies}] Best average and multiplication are strategies that are performing best in some of the, by \citeauthor{Masthoff2015} \cite[p. 755f]{Masthoff2015}, listed online experiments. Therefore it is reasonable to assume that they perform well here too. Least misery was listed in some studies as performing worst. Therefore there is an expectation of it faring less good than other group aggregation strategies.
-\end{description}
+\begin{hypothesis}
+ \begin{itshape}
+ \label{hyp:Evaluation:HigherTcLessSatisfied} A higher $tc$ value results in fewer satisfied people and more unsatisfied people with regard to the dictator's decision.
+ \end{itshape} \medskip \\*
+ A higher $tc$ value causes a person to be unsatisfied with a larger number of configurations and satisfied with fewer configurations. Therefore, recommending a random configuration lowers the chance of satisfying an individual while raising the chance of that person being unsatisfied. This change in probability alone leads to the assumption that the same effect should be seen with non-random recommendations too.
+\end{hypothesis}
+
+\begin{hypothesis}
+ \begin{itshape}
+ \label{hyp:Evaluation:OnlyOneSatisfied} There exists a $tc$ value which causes only one person to be satisfied with the dictator's decision and no one to be satisfied with the group recommender's decision.
+ \end{itshape} \medskip \\*
+ A sufficiently high $tc$ value should eventually make only the dictator herself satisfied with the dictator's decision. The bar for satisfaction then lies so high that any group recommendation leaves the dictator, too, not satisfied or at best neutral with the group decision. This can be understood as follows: in a group where nobody is willing to compromise, everyone is satisfied only with their own decision. If two members have identical interests this effect is of course absent, but this is expected to be rare for a group size of four.
+\end{hypothesis}
+
+\begin{hypothesis}
+ \begin{itshape}
+ \label{hyp:Evaluation:HomogenousMoreSatisfied} Homogeneous groups have more members satisfied with the recommender's decision, and also with the dictator's decision, than heterogeneous groups.
+ \end{itshape} \medskip \\*
+ As the interests in homogeneous groups are more aligned, the overall happiness levels of more homogeneous groups are expected to be higher. If the base level is already higher, it is likely that even a slight increase lifts recommendations for homogeneous groups to satisfaction levels not reachable by heterogeneous groups.
+\end{hypothesis}
+
+\begin{hypothesis}
+ \begin{itshape}
+ \label{hyp:Evaluation:HeterogenousBiggerSatisfactionIncrease} More heterogeneous groups see a bigger satisfaction increase than less heterogeneous groups when switching from the decision of a dictator to a decision made by the recommender.
+ \end{itshape} \medskip \\*
+ The assumption is made that in more heterogeneous groups the satisfaction with the dictator's decision is lower. Therefore, there is a higher possible increase in satisfaction. A homogeneous group that already satisfies all group members with the dictator's decision cannot see an increase in satisfaction. The assumption is therefore that, with more people dissatisfied or not satisfied with the dictator's decision, more people can be lifted into satisfaction and the increase will be bigger. However, a group with contradicting interests might not actually be able to reach high satisfaction levels.
+\end{hypothesis}
+
+\begin{hypothesis}
+ \begin{itshape}
+ \label{hyp:Evaluation:StoreSizeBetterResults} A higher number of stored finished configurations results in a higher number of satisfied and a lower number of dissatisfied group members.
+ \end{itshape} \medskip \\*
+ This hypothesis is based on the fact that a bigger pool of configurations to choose from increases the chance of finding a good recommendation. This of course requires the assumption that the aggregation strategies pick configurations that also fare better in the chosen satisfaction metric. If that is not the case, this hypothesis should not hold.
+\end{hypothesis}
+
+\begin{hypothesis}
+ \begin{itshape}
+ \label{hyp:Evaluation:AggregationStrategies} Multiplication and best average aggregation strategies perform better than least misery across the board.
+ \end{itshape} \medskip \\*
+ Best average and multiplication are the strategies that perform best in some of the online experiments listed by \citeauthor{Masthoff2015} \cite[p. 755f]{Masthoff2015}. Therefore, it is reasonable to assume that they perform well here too. Least misery was listed in some studies as performing worst. It is therefore expected to fare worse than the other group aggregation strategies.
+\end{hypothesis}
 \section{Findings}
 \label{sec:Evaluation:Findings}
@@ -217,7 +245,7 @@ To get an understanding of the data all parameters except the $tc$ will be fixed
 \label{fig:Evaluation:tcChange}
 \end{figure}
-\hyporef{hyp:Evaluation:MaximumMinimum} states that the highest satisfaction change is expected at places where the overall satisfaction with the dictator's decision is one. However the data shows a slightly different result. This hypothesis does not hold true. When looking at the data we see peeks in satisfaction change when values are equal to $2.81, 2.51$ and $3$ (heterogeneous, random, homogenous). Therefore the expectation does not hold up. Moreover, valleys for dissatisfaction change are also not at the expected value of \textit{two}. They are instead at $1.19, 1.49, 0.04$ (heterogeneous, random, homogenous). Here the valleys are lower than expected. However the data from homogenous groups seems to be cut of. Therefore, it is not possible to say if there would be a potentially bigger decrease with a use case with more possible solutions.
+\autoref{hyp:Evaluation:MaximumMinimum} states that the highest satisfaction change is expected where the overall satisfaction with the dictator's decision is close to two. However, the data shows a slightly different result, and the hypothesis does not hold. The data shows peaks in satisfaction change at values of $2.81, 2.51$ and $3$ (heterogeneous, random, homogeneous). Moreover, the valleys for dissatisfaction change are also not at the expected value of \textit{two}. They are instead at $1.19, 1.49, 0.04$ (heterogeneous, random, homogeneous). Here the valleys are lower than expected. However, the data from homogeneous groups seems to be cut off. Therefore, it is not possible to say whether there would be a bigger decrease in a use case with more possible solutions.
 \begin{figure}
 \centering
@@ -226,9 +254,9 @@ To get an understanding of the data all parameters except the $tc$ will be fixed
 \label{fig:Evaluation:tcCount}
 \end{figure}
-The predicted trend that a higher $tc$ results in a lower satisfaction and a higher dissatisfaction, with the dictator's decision, as predicted by \hyporef{hyp:Evaluation:HigherTcLessSatisfied} can be clearly seen in \autoref{fig:Evaluation:tcCount} and has been described in this section already.
+The trend predicted by \autoref{hyp:Evaluation:HigherTcLessSatisfied}, that a higher $tc$ results in lower satisfaction and higher dissatisfaction with the dictator's decision, can be clearly seen in \autoref{fig:Evaluation:tcCount} and has already been described in this section.
-\hyporef{hyp:Evaluation:OnlyOneSatisfied} predicts that the satisfaction with the individual decision eventually reaches one and that no one is satisfied with the group recommender decision. This means the satisfaction change should reach minus one. \autoref{fig:Evaluation:tcCount} shows a downward trend that comes close to one for heterogeneous and random groups. Therefore, the trend suggests that the hypothesis holds in regards to heterogeneous and random groups but as the drop for homogenous groups just reaches below $2.8$ suggesting that the hypothesis does not hold for homogenous groups. Also, satisfaction change in heterogeneous groups reaches close to minus one but this value is neither reached by random groups, nor by homogenous groups. The hypothesis therefore should not be seen as confirmed in that regard as well and further investigation is needed.
+\autoref{hyp:Evaluation:OnlyOneSatisfied} predicts that the satisfaction with the individual decision eventually reaches one and that no one is satisfied with the group recommender decision. This means the satisfaction change should reach minus one. \autoref{fig:Evaluation:tcCount} shows a downward trend that comes close to one for heterogeneous and random groups. The trend therefore suggests that the hypothesis holds for heterogeneous and random groups, but the drop for homogeneous groups only just reaches below $2.8$, suggesting that the hypothesis does not hold for homogeneous groups. Also, the satisfaction change in heterogeneous groups comes close to minus one, but this value is reached neither by random groups nor by homogeneous groups. The hypothesis should therefore not be seen as confirmed in that regard either, and further investigation is needed.
 During a group decision it is better to make one less person dissatisfied opposed to one more person satisfied. Therefore, this thesis uses $tc$ values that are closer to the minima of dissatisfaction change than to the maxima of satisfaction change. The minima for heterogeneous groups is at $tc = 70\%$ therefore this is the chosen value for the evaluation of other aspects. For random groups the minima of dissatisfaction change can be found at $tc = 85\%$ which is the value used for all following analysis of random groups. For homogenous group dissatisfaction change is decreasing until the highest possible value of $tc$ is reached. Because of that $tc = 94\%$ is used for analysis.
@@ -263,13 +291,13 @@ Random groups have less overall satisfaction with $tc = 85\%$ as seen in \autore
 \end{figure}
 After description of the data now the focus shifts to the hypotheses left that have not been evaluated.
-\hyporef{hyp:Evaluation:HomogenousMoreSatisfied} states that homogenous groups have more satisfied member's with regards to the dictator's and the group recommender's decision. \autoref{fig:Evaluation:tcCount} shows that this holds true for dictator's decision as for every instance satisfaction in homogeneous groups is higher than that of other groups. However \autoref{fig:Evaluation:HeteroSatisfactionTotal}, \autoref{fig:Appendix:HomoSatisfactionTotal} and \autoref{fig:Appendix:RandomSatisfactionTotal} show that for satisfaction with the recommender's decision this does not hold when looking at $tc$ values where the recommender performs best for each segment. In those places the homogenous group only reaches the highest amount of satisfaction when the recommender has access to all stored configurations. With a decreasing number of stored configurations both random groups and heterogeneous groups perform better. It is important to note, when the same $tc$ values are used homogenous groups have a higher amount of satisfied people across the board.
+\autoref{hyp:Evaluation:HomogenousMoreSatisfied} states that homogeneous groups have more satisfied members with regard to both the dictator's and the group recommender's decision. \autoref{fig:Evaluation:tcCount} shows that this holds true for the dictator's decision, as in every instance satisfaction in homogeneous groups is higher than that of other groups. However, \autoref{fig:Evaluation:HeteroSatisfactionTotal}, \autoref{fig:Appendix:HomoSatisfactionTotal} and \autoref{fig:Appendix:RandomSatisfactionTotal} show that for satisfaction with the recommender's decision this does not hold when looking at the $tc$ values where the recommender performs best for each segment. There, the homogeneous group only reaches the highest amount of satisfaction when the recommender has access to all stored configurations. With a decreasing number of stored configurations both random groups and heterogeneous groups perform better. It is important to note that, when the same $tc$ values are used, homogeneous groups have a higher number of satisfied people across the board.
-\hyporef{hyp:Evaluation:HeterogenousBiggerSatisfactionIncrease} states that the increase in satisfaction should be bigger for more heterogeneous groups. However \autoref{fig:Evaluation:HeteroSatisfactionIncrease}, \autoref{fig:Appendix:HomoSatisfactionIncrease} and \autoref{fig:Appendix:RandomSatisfactionIncrease} show this to be not true. The recommendations for heterogeneous groups indeed cause a larger change in satisfaction compared to homogeneous groups but random groups cause a positive change of higher magnitude. Also the decrease in dissatisfaction is higher among random groups.
+\autoref{hyp:Evaluation:HeterogenousBiggerSatisfactionIncrease} states that the increase in satisfaction should be bigger for more heterogeneous groups. However, \autoref{fig:Evaluation:HeteroSatisfactionIncrease}, \autoref{fig:Appendix:HomoSatisfactionIncrease} and \autoref{fig:Appendix:RandomSatisfactionIncrease} show this not to be true. The recommendations for heterogeneous groups indeed cause a larger change in satisfaction compared to homogeneous groups, but random groups see a positive change of higher magnitude. Also, the decrease in dissatisfaction is higher among random groups.
-The data shows that having a larger configuration database causes the amount of satisfied group members to be greater than recommendation's using a smaller database. With dissatisfaction the same is seen in inverse. A larger configuration database causes the number of dissatisfied group members to drop compared to a small database. However in some runs there have been instances of least misery that have seen a slight drop. This can be seen in \autoref{fig:Evaluation:HeteroSatisfactionIncrease} when comparing $74$ and $148$ as number of stored configurations. Why this happens is not entirely clear but a cause of that might be that least misery just takes into account the worst performing group member of the group. Therefore it is possible that there is a second slightly worse solution, when comparing least misery scores, which actually has a slight advantage in terms of dissatisfaction. Having this second best configuration can cause it to be in the second database partition therefore resulting in less dissatisfaction on average. \hyporef{hyp:Evaluation:StoreSizeBetterResults} therefore is supported by the data but it does not fully hold up when looking at least misery.
+The data shows that a larger configuration database yields more satisfied group members than recommendations using a smaller database. With dissatisfaction the inverse is seen: a larger configuration database causes the number of dissatisfied group members to drop compared to a small database. However, in some runs there have been instances of least misery that have seen a slight drop. This can be seen in \autoref{fig:Evaluation:HeteroSatisfactionIncrease} when comparing $74$ and $148$ as the number of stored configurations. Why this happens is not entirely clear, but a possible cause is that least misery only takes into account the worst-performing member of the group. It is therefore possible that there is a second, slightly worse solution in terms of least misery score that actually has a slight advantage in terms of dissatisfaction. This second-best configuration may lie in the second database partition, therefore resulting in less dissatisfaction on average. \autoref{hyp:Evaluation:StoreSizeBetterResults} is therefore supported by the data, but it does not fully hold up when looking at least misery.
-\hyporef{hyp:Evaluation:AggregationStrategies} states least misery performs worse than multiplication. For a change in satisfaction this can be seen across the board however for dissatisfaction change this is not true everywhere. \autoref{fig:Evaluation:HeteroSatisfactionIncrease} shows that least misery performs better than best average in terms of dissatisfaction reduction. However in other cases it performs visibly worse. Also of note is multiplication performs best across the board. This supports the findings by \citeauthor{Masthoff2015} \cite[p. 755f]{Masthoff2015} and also shows that the satisfaction model does show some similar results to online evaluations.
+\autoref{hyp:Evaluation:AggregationStrategies} states that least misery performs worse than multiplication. For the change in satisfaction this can be seen across the board; for dissatisfaction change, however, this is not true everywhere. \autoref{fig:Evaluation:HeteroSatisfactionIncrease} shows that least misery performs better than best average in terms of dissatisfaction reduction. However, in other cases it performs visibly worse. Also of note is that multiplication performs best across the board. This supports the findings by \citeauthor{Masthoff2015} \cite[p. 755f]{Masthoff2015} and also shows that the satisfaction model shows some results similar to online evaluations.
 To go back to \autoref{sec:Evaluation:Questions} this section has shown that for random and heterogeneous groups the recommender performs better than a dictator. The average satisfaction depends on the chosen parameters but for the chosen value range average satisfaction with the recommender decision lies above two and can reach close to three satisfied group members for a high number of stored configurations and for some group types. The amount of stored finished configurations plays an important role in performance but with a fraction of stored configurations the recommender still yields good results.
diff --git a/30_Thesis/thesis.tex b/30_Thesis/thesis.tex
index 755404d..bcb8766 100644
--- a/30_Thesis/thesis.tex
+++ b/30_Thesis/thesis.tex
@@ -79,7 +79,7 @@
 %% --------------------------------
 %% | Ref Hypothesis |
 %% --------------------------------
-\newcommand{\hyporef}[1]{\hyperref[#1]{Hypothesis \ref*{#1}}}
+\newcommand{\hypothesisautorefname}{Hypothesis}
 %% --------------------------------
 %% | PDF Comments |
@@ -112,6 +112,12 @@
 %% | Generating Frames |
 %% ---------------------
 \usepackage[framemethod=TikZ]{mdframed}
+\mdtheorem[
+ linecolor=gray!60,
+ linewidth=1pt,
+ frametitlebackgroundcolor=gray!20,
+ frametitlefont=\sffamily\bfseries\color{black},
+]{hypothesis}{Hypothesis}[section]
 %% --------------------------------
 %% | PDF Comments |