From d3e2add2659782d734f4132fd26bee5cb570f359 Mon Sep 17 00:00:00 2001 From: "hannes.kuchelmeister" Date: Thu, 7 May 2020 16:31:29 +0200 Subject: [PATCH] finish fixing language mistakes in evaluation chapter --- 30_Thesis/sections/60_evaluation.tex | 36 ++++++++++++++-------------- 1 file changed, 18 insertions(+), 18 deletions(-) diff --git a/30_Thesis/sections/60_evaluation.tex b/30_Thesis/sections/60_evaluation.tex index 7962312..888b45a 100644 --- a/30_Thesis/sections/60_evaluation.tex +++ b/30_Thesis/sections/60_evaluation.tex @@ -260,54 +260,54 @@ The predicted trend that a higher $tc$ results in a lower satisfaction and a hig \autoref{hyp:Evaluation:OnlyOneSatisfied} predicts that the satisfaction with the individual decision eventually reaches one and that no one is satisfied with the group recommender's decision. This means the satisfaction change should decrease to minus one. \autoref{fig:Evaluation:tcCount} shows a downward trend that comes close to one for heterogeneous and random groups. Therefore, the trend suggests that the hypothesis holds true with regard to heterogeneous and random groups but the drop for homogenous groups just reaches below $2.8$ suggesting that the hypothesis does not hold for homogenous groups. Also, satisfaction change in heterogeneous groups decreases close to minus one while this value is neither fully reached by random groups nor by homogenous groups. The hypothesis therefore holds true only for heterogeneous groups. A likely cause why it does not seem to hold true for random or homogenous groups is that the highest tc value still includes multiple configurations and a recommended configuration keeps some group members satisfied for some of the time. For random groups it may also be possible that a group member other than the dictator could be satisfied with the group decision. -During a group decision it is better to make one less person dissatisfied than to make one more person satisfied. 
Therefore, this thesis uses $tc$ values that are closer to the minima of dissatisfaction change than to the maxima of satisfaction change. The minima for heterogeneous groups is at $tc = 70\%$, therefore, this is the chosen value for evaluation of the remaining hypotheses. This is needed because otherwise analysis would be infeasible due to a too large parameter space. For random groups the minima of dissatisfaction change can be found at $tc = 85\%$ which is the value used for all following analyses of random groups. For homogenous group dissatisfaction change is decreasing until the highest possible value of $tc$ is reached. Because of that $tc = 94\%$ is used for analysis. +During a group decision it is better to make one less person dissatisfied than to make one more person satisfied. Therefore, this thesis uses $tc$ values that are closer to the minima of dissatisfaction change than to the maxima of satisfaction change. The minimum for heterogeneous groups is at $tc = 70\%$; therefore, this is the chosen value for evaluation of the remaining hypotheses. Fixing this value is needed because otherwise analysis would be infeasible due to an overly large parameter space. For random groups the minimum of dissatisfaction change can be found at $tc = 85\%$, which is the value used for all following analyses of random groups. For homogeneous groups, dissatisfaction change decreases until the highest possible value of $tc$ is reached. Because of that, $tc = 94\%$ is used for analysis. 
\subsection{Recommender Performance Analysis} \begin{figure}[p] \centering \includegraphics[width=1\textwidth]{./figures/60_evaluation/heterogeneous_combined__amount-1000__tc-70} - \caption[Satisfaction and Dissatisfaction: Heterogeneous Groups]{The satisfaction and dissatisfaction using the group recommender for heterogeneous groups with $tc = 70$.} + \caption[Satisfaction and Dissatisfaction: Heterogeneous Groups]{Satisfaction and dissatisfaction using the group recommender for heterogeneous groups with $tc = 70$.} \label{fig:Evaluation:HeteroSatisfaction} \end{figure} \begin{figure}[p] \centering \includegraphics[width=1\textwidth]{./figures/60_evaluation/random_combined__amount-1000__tc-85} - \caption[Satisfaction and Dissatisfaction: Random Groups]{The satisfaction and dissatisfaction using the group recommender for random groups with $tc = 85$.} + \caption[Satisfaction and Dissatisfaction: Random Groups]{Satisfaction and dissatisfaction using the group recommender for random groups with $tc = 85$.} \label{fig:Evaluation:RandomSatisfaction} \end{figure} \begin{figure}[p] \centering \includegraphics[width=1\textwidth]{./figures/60_evaluation/homogeneous_combined__amount-1000__tc-94} - \caption[Satisfaction and Dissatisfaction: Homogeneous Groups]{The satisfaction and dissatisfaction using the group recommender for homogeneous groups with $tc = 94$.} + \caption[Satisfaction and Dissatisfaction: Homogeneous Groups]{Satisfaction and dissatisfaction using the group recommender for homogeneous groups with $tc = 94$.} \label{fig:Evaluation:HomoSatisfaction} \end{figure} -This subsection holds fixed parameters of $tc$. It describes the satisfaction change and the total amount of satisfied people with the recommenders decision dependent on the amount of stored configurations. +This subsection fixes parameters of $tc$. 
It describes the satisfaction change and the total amount of satisfied people with the recommender's decision dependent on the amount of stored configurations. -\autoref{fig:Evaluation:HeteroSatisfaction} shows the relationship between the satisfaction and dissatisfaction and the number of stored configurations. The left y-axis shows the change in satisfaction compared to a decision made by a dictator. The right axis shows the average number of group members being satisfied. The left figure shows numbers for satisfaction and the right for dissatisfaction. On the left higher numbers are better and on the right lower ones (with regards to change). There are three graphs each. One for multiplication, one for least misery and one for best average. The graphs for satisfaction are similar to a logarithmic curve. The increase in change of satisfaction decelerates with a higher number of stored configurations. The change in satisfaction is always above zero and a satisfaction increase of more than three quarters of the maximum can already be seen at around 25 stored configurations. Moreover, the curve for multiplication is greater than all other curves for all parameters. Least misery reaches the lowest amount of change across all values. The minimum number of satisfaction change is $0$ for least misery, and $0.1$ for best average and multiplications. The highest number is around $0.3$ for least misery, $0.4$ for best average and $0.5$ for multiplication -When looking at dissatisfaction change the graphs are all in the negative number range. Multiplication reaches the lowest number and best average the highest. The gap between all three functions is less than that of satisfaction increase. And overall the curves are flatter meaning the change with 25 stored configurations already reaches close to five sixth of the minimum value. 
The highest number of satisfaction change is $-0.4$ for all strategies meanwhile the lowest number is around $-0.57$ for least misery, $-0.53$ for best average and $-0.63$ for multiplication. +\autoref{fig:Evaluation:HeteroSatisfaction} shows the relationship between satisfaction and dissatisfaction and the number of stored configurations. The left y-axis shows the change in satisfaction compared to a decision made by a dictator. The right axis shows the average number of satisfied group members. The left figure shows numbers for satisfaction and the right for dissatisfaction. On the left, higher numbers are better and on the right, lower ones (with regard to change). There are three graphs each: one for multiplication, one for least misery and one for best average. The graphs for satisfaction are similar to a logarithmic curve. The increase in change of satisfaction slows with a higher number of stored configurations. The change in satisfaction is always above zero and a satisfaction increase of more than three quarters of the maximum can already be seen at around 25 stored configurations. Moreover, the curve for multiplication is greater than all other curves for all parameters. Least misery reaches the lowest amount of change across all values. The minimum number of satisfaction change is $0$ for least misery, and $0.1$ for best average and multiplication. The highest number is around $0.3$ for least misery, $0.4$ for best average and $0.5$ for multiplication. +When looking at dissatisfaction change the graphs are all in the negative number range. Multiplication reaches the lowest number and best average the highest. The gap between all three functions is less than that of satisfaction increase. Overall, the curves are flatter, indicating that the change with 25 stored configurations already reaches close to five sixths of the minimum value. 
The highest number of satisfaction change is $-0.4$ for all strategies while the lowest number is around $-0.57$ for least misery, $-0.53$ for best average and $-0.63$ for multiplication. -The figures for homogenous (\autoref{fig:Evaluation:HomoSatisfaction}) and random groups (\autoref{fig:Evaluation:RandomSatisfaction}) have a similar shape but their values and slope vary. The satisfaction change for homogenous groups is mostly negative, starting at $-2$, and only reaches a positive level for more than $100$ stored configurations with a value of $0.04$. Multiplication and best average have higher values than least misery here, too. Moreover the dissatisfaction change is always positive with a value range of $[0,1]$, except it slightly falls below zero after more than $75$ configurations are stored. -Random groups as seen in \autoref{fig:Evaluation:RandomSatisfaction} mostly have a positive change in satisfaction. Values range here from $-0.55$ to $0.27$ for least misery, from $-0.27$ and $-0.28$ to $0.74$ for best average and multiplication. The change is higher than the change for heterogeneous groups. Dissatisfaction also changes similarly to heterogeneous groups. Here the values for random groups reach a lower level. They range from $0$ to $-0.59$ for least misery. Multiplication and best average both have as minimum value around $-0.21$ and behave similarly. The range goes down to $-0.84$ for best average and $-0.86$ for multiplication. +The figures for homogenous (\autoref{fig:Evaluation:HomoSatisfaction}) and random groups (\autoref{fig:Evaluation:RandomSatisfaction}) have a similar shape but their values and slopes vary. The satisfaction change for homogenous groups is mostly negative, starting at $-2$, and only reaches a positive level for more than $100$ stored configurations with a value of $0.04$. Multiplication and best average have higher values than least misery here too. 
Moreover the dissatisfaction change is always positive with a value range of $[0,1]$, except for a slight fall below zero after more than $75$ configurations are stored. +Random groups as seen in \autoref{fig:Evaluation:RandomSatisfaction} mostly have a positive change in satisfaction. Here, values range from $-0.55$ to $0.27$ for least misery, from $-0.27$ and $-0.28$ to $0.74$ for best average and multiplication. The change is higher than the change for heterogeneous groups. Dissatisfaction also changes similarly to heterogeneous groups. Here the values for random groups reach a lower level. They range from $0$ to $-0.59$ for least misery. Multiplication and best average both have a minimum value around $-0.21$ and behave similarly. The range goes down to $-0.84$ for best average and $-0.86$ for multiplication. -\autoref{fig:Evaluation:HeteroSatisfaction} also shows the average number of group members satisfied and dissatisfied with the recommender's decision. Satisfaction with the recommender's decision starts at $2.4$ and quickly reaches $2.65$ for least misery and $2.8$ for best average and multiplication. The highest value for multiplication is at $2.89$. Dissatisfaction also quickly plateaus. Here values for different recommenders are closer together. They start at $0.74$ (least misery) to $0.78$ (best average) and go as low as $0.62$ for least misery, $0.66$ for best average and $0.56$ for multiplication. +\autoref{fig:Evaluation:HeteroSatisfaction} also shows the average number of group members satisfied and dissatisfied with the recommender's decision. Satisfaction with the recommender's decision starts at $2.4$ and quickly reaches $2.65$ for least misery and $2.8$ for best average and multiplication. The highest value for multiplication is at $2.89$. Dissatisfaction also quickly plateaus. Here values for different recommenders are closer together. 
They start between $0.74$ (least misery) and $0.78$ (best average) and fall as low as $0.62$ for least misery, $0.66$ for best average and $0.56$ for multiplication. -As shown in \autoref{fig:Evaluation:HomoSatisfaction} when looking at the total numbers the value range for homogenous groups is much larger but the overall shape stays the same. Here satisfaction numbers go from $0.55$ to $2.95$. Least misery performs visibly worse than multiplication and best average reaching only $2.7$. Dissatisfaction values range from $1.21$ to $0.01$ and the values are not really visibly distinguishable besides that in the range $[25,50]$ least misery seems to have the highest number of dissatisfied group members. +When looking at the total numbers as shown in \autoref{fig:Evaluation:HomoSatisfaction} the value range for homogenous groups is much larger but the overall shape stays the same. Here satisfaction numbers go from $0.55$ to $2.95$. Least misery performs visibly worse than multiplication and best average, reaching only $2.7$. Dissatisfaction values range from $1.21$ to $0.01$ and the values are barely distinguishable visually, except in the range $[25,50]$, where least misery seems to have the highest number of dissatisfied group members. -Random groups have less overall satisfaction with $tc = 85\%$ as seen in \autoref{fig:Evaluation:RandomSatisfaction} when looking at the total numbers. Satisfaction numbers start from $1.33$ (least misery), $1.61$ (best average) and $1.6$ (multiplication) and go up to $2.15$ for least misery and $2.62$ for best average and multiplication. The dissatisfaction numbers start at $1.5$ for least misery and $1.27$ for best average and multiplication and level of at $0.9$ (least misery), $0.65$ (best average) and $0.63$ (multiplication). Visibly there is a big difference between least misery and the other two aggregation functions. 
+Random groups have less overall satisfaction with $tc = 85\%$ as seen in \autoref{fig:Evaluation:RandomSatisfaction} when looking at the total numbers. Satisfaction numbers start from $1.33$ (least misery), $1.61$ (best average) and $1.6$ (multiplication) and go up to $2.15$ for least misery and $2.62$ for best average and multiplication. The dissatisfaction numbers start at $1.5$ for least misery and $1.27$ for best average and multiplication and level off at $0.9$ (least misery), $0.65$ (best average) and $0.63$ (multiplication). There is a big difference visible between least misery and the other two aggregation functions. \subsection{Discussion} -After description of the data the remaining hypotheses are discussed. -\autoref{hyp:Evaluation:HomogenousMoreSatisfied} states that homogenous groups have more satisfied member's with regards to the dictator's and the group recommender's decision. \autoref{fig:Evaluation:tcCount} shows that this holds true for the dictator's decision as for every instance satisfaction in homogeneous groups is higher than that of other groups. However, \autoref{fig:Evaluation:HeteroSatisfaction}, \autoref{fig:Evaluation:HomoSatisfaction} and \autoref{fig:Evaluation:RandomSatisfaction} show that for satisfaction with the recommender's decision this does not hold when looking at $tc$ values where the recommender performs best for each segment. In those places the homogenous group only reaches the highest amount of satisfaction when the recommender has access to all stored configurations. With a decreasing number of stored configurations both random groups and heterogeneous groups achieve a higher satisfaction. This likely happens because of the similarity between group members. 
A recommender with imperfect knowledge and a, in size reduced, configuration database gives results that are not good enough and cannot compete with the dictator who always finds the perfect individual match that group members of homogeneous groups are satisfied with. It is important to note, when the same $tc$ values are used homogenous groups have a higher amount of satisfied people across the board. +With the data described, this subsection discusses the remaining hypotheses. +\autoref{hyp:Evaluation:HomogenousMoreSatisfied} states that homogenous groups have more satisfied members with regard to the dictator's and the group recommender's decision. \autoref{fig:Evaluation:tcCount} shows that this holds true for the dictator's decision as for every instance satisfaction in homogeneous groups is higher than that of other groups. However, \autoref{fig:Evaluation:HeteroSatisfaction}, \autoref{fig:Evaluation:HomoSatisfaction} and \autoref{fig:Evaluation:RandomSatisfaction} show that for satisfaction with the recommender's decision this does not hold true when looking at $tc$ values where the recommender performs best for each segment. In these cases the homogenous group only reaches the highest amount of satisfaction when the recommender has access to all stored configurations. With a decreasing number of stored configurations both random groups and heterogeneous groups achieve a higher satisfaction. This is likely to happen due to the similarity between group members. A recommender with imperfect knowledge and a size-reduced configuration database generates results that are not good enough and cannot compete with the dictator who always finds the perfect individual match that group members of homogeneous groups are satisfied with. It is important to note that homogeneous groups show a higher number of satisfied people across the board when the same $tc$ values are used. 
-\autoref{hyp:Evaluation:HeterogenousBiggerSatisfactionIncrease} states that the increase in satisfaction should be bigger for more heterogeneous groups. However, \autoref{fig:Evaluation:HeteroSatisfaction}, \autoref{fig:Evaluation:HomoSatisfaction} and \autoref{fig:Evaluation:RandomSatisfaction} show this to be not true. The recommendations for heterogeneous groups indeed cause a larger change in satisfaction compared to homogeneous groups but random groups cause a positive change of higher magnitude. Also the decrease in dissatisfaction is higher among random groups. This possibly happens due to random groups having interest that are more aligned and their preferences among group members therefore they do not diverge as much, thereby resulting in compromises for the group that can satisfy more individual members. Also the group preferences are still far apart enough to cause enough dissatisfaction and neutrality with the dictator's decision. +\autoref{hyp:Evaluation:HeterogenousBiggerSatisfactionIncrease} states that the increase in satisfaction should be bigger for more heterogeneous groups. However, \autoref{fig:Evaluation:HeteroSatisfaction}, \autoref{fig:Evaluation:HomoSatisfaction} and \autoref{fig:Evaluation:RandomSatisfaction} show this not to be true. The recommendations for heterogeneous groups indeed cause a larger change in satisfaction compared to homogeneous groups but random groups cause a positive change of higher magnitude. Also, the decrease in dissatisfaction is higher among random groups. This possibly happens because random groups have more aligned interests and preferences among group members and, therefore, they do not diverge as much which results in compromises for the group that can satisfy more individual members. Also the group preferences are still far enough apart to cause dissatisfaction and neutrality with the dictator's decision. 
-The data shows that having a larger configuration database causes the amount of satisfied group members to be greater than recommendation's using a smaller database. With dissatisfaction the same is seen in inverse. A larger configuration database causes the number of dissatisfied group members to drop compared to a small database. However, in some runs there have been instances of least misery that have seen a slight drop. This can be seen in \autoref{fig:Evaluation:HeteroSatisfaction} when comparing $74$ and $148$ as number of stored configurations. Why this happens is not entirely clear but a cause of that might be that least misery just takes into account the worst performing group member of the group. Therefore, it is possible that there is a second slightly worse solution, when comparing least misery scores, which actually has a slight advantage in terms of dissatisfaction. Having this second best configuration can cause it to be in the second database partition therefore resulting in less dissatisfaction on average. \autoref{hyp:Evaluation:StoreSizeBetterResults} therefore is supported by the data but it does not fully hold up when looking at least misery. +The data shows that a larger configuration database causes the amount of satisfied group members to be greater than recommendations using a smaller database. With dissatisfaction the same is seen in the inverse. A larger configuration database causes the number of dissatisfied group members to drop compared to a small database. However, in some runs there have been instances of least misery that show a slight drop. This can be seen in \autoref{fig:Evaluation:HeteroSatisfaction} when comparing $74$ and $148$ as the number of stored configurations. Why this happens is not entirely clear but a cause might be that least misery just takes into account the worst performing group member of the group. 
Therefore, it is possible that there is a second slightly worse solution, when comparing least misery scores, which actually has a slight advantage in terms of dissatisfaction. This second best configuration can cause it to be in the second database partition therefore resulting in less dissatisfaction on average. \autoref{hyp:Evaluation:StoreSizeBetterResults} therefore is supported by the data but it does not fully hold true when looking at least misery. -\autoref{hyp:Evaluation:AggregationStrategies} states least misery performs worse than multiplication. For a change in satisfaction this can be seen across the board, however, for dissatisfaction change this is not true everywhere. \autoref{fig:Evaluation:HeteroSatisfaction} shows that least misery performs better than best average in terms of dissatisfaction reduction. This behaviour possibly occurs because an average metric yields the same results for heavily polarised decisions and decisions that everyone feels neutral about. Least misery on the other hand takes only the group member least satisfied with the decision into account therefore this metric performs better. However, in other cases it performs visibly worse. Also of note is multiplication performs best across the board. This supports the findings by \citeauthor{Masthoff2015} \cite[~p. 755f]{Masthoff2015} and also shows that the satisfaction model does show some similar results to online evaluations. +\autoref{hyp:Evaluation:AggregationStrategies} states least misery performs worse than multiplication. For a change in satisfaction this can be seen across the board, however, for the change in dissatisfaction this is not true everywhere. \autoref{fig:Evaluation:HeteroSatisfaction} shows that least misery performs better than best average in terms of dissatisfaction reduction. This behaviour possibly occurs because an average metric yields the same results for heavily polarised decisions and decisions that everyone feels neutral about. 
Least misery on the other hand takes into account only the group member least satisfied with the decision and, therefore, this metric performs better. However, in other cases it performs visibly worse. Also notable is that multiplication performs best across the board. This supports the findings by \citeauthor{Masthoff2015} \cite[~p. 755f]{Masthoff2015} and also shows that the satisfaction model does show some similar results to online evaluations. -To go back to in \autoref{sec:Evaluation:Questions} posed evaluation questions this section has shown that for random and heterogeneous groups the recommender performs better than a dictator. The average satisfaction depends on the chosen parameters but for the chosen value range average satisfaction with the recommender decision lies above two and can reach close to three satisfied group members for a high number of stored configurations and for some group types. The amount of stored finished configurations plays an important role in the recommender's performance but with a fraction of stored configurations the recommender still yields good results. This shows that the recommender provides useful decision support for helping in group decisions. It provides a solid basis for groups and can help their group decision. Most decisions the recommender does improve group satisfaction which shows that the recommender is able to be used to improve group decisions. \ No newline at end of file +To go back to the evaluation questions posed in \autoref{sec:Evaluation:Questions} this section has shown that for random and heterogeneous groups the recommender performs better than a dictator. The average satisfaction depends on the chosen parameters but for the chosen value range average satisfaction with the recommender decision lies above two and can reach close to three satisfied group members for a high number of stored configurations and for some group types. 
The number of stored finished configurations plays an important role in the recommender's performance, but even with a fraction of the stored configurations the recommender still yields good results. This shows that the recommender provides useful support for group decisions. It provides a solid basis for groups and can help their group decision. For most decisions the recommender improves group satisfaction, which shows that it can be used to improve group decisions. \ No newline at end of file
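Reviewer note on the aggregation-strategy discussion: the argument that an average cannot tell a polarised decision from a neutral one, while least misery and multiplication can, may be easier to follow with a small worked example. The following is only a minimal sketch under stated assumptions: the 1 to 10 rating scale, the two candidate configurations and all function names are hypothetical and are not taken from the thesis implementation. The three strategies are used in their textbook form (minimum, arithmetic mean, product of the individual ratings).

```python
from math import prod
from statistics import mean

# Hypothetical individual ratings (scale 1..10) of three group members
# for two candidate configurations. Illustrative data only.
candidates = {
    "polarised": [10, 10, 1],  # two very happy members, one very unhappy
    "neutral": [7, 7, 7],      # everyone moderately happy
}


def least_misery(ratings):
    """Group score is the rating of the least satisfied member."""
    return min(ratings)


def best_average(ratings):
    """Group score is the arithmetic mean of all ratings."""
    return mean(ratings)


def multiplication(ratings):
    """Group score is the product of all ratings."""
    return prod(ratings)


def recommend(candidates, strategy):
    """Return the name of the candidate with the highest group score."""
    return max(candidates, key=lambda name: strategy(candidates[name]))


for strategy in (least_misery, best_average, multiplication):
    scores = {name: strategy(r) for name, r in candidates.items()}
    print(strategy.__name__, scores, "->", recommend(candidates, strategy))
```

In this toy data both candidates have the same average, so best average cannot separate them, whereas least misery and multiplication both prefer the configuration that leaves nobody strongly dissatisfied.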