improve results section

hannes.kuchelmeister
2020-04-07 17:44:43 +02:00
parent 37c8489f53
commit 826f3e9b04


@@ -307,14 +307,13 @@ Random groups have less overall satisfaction with $tc = 85\%$ as seen in \autore
\subsection{Discussion}
After the description of the data, the remaining hypotheses are discussed.
\autoref{hyp:Evaluation:HomogenousMoreSatisfied} states that homogeneous groups have more satisfied members with regard to the dictator's and the group recommender's decision. \autoref{fig:Evaluation:tcCount} shows that this holds true for the dictator's decision, as in every instance satisfaction in homogeneous groups is higher than in the other groups. However, \autoref{fig:Evaluation:HeteroSatisfaction}, \autoref{fig:Evaluation:HomoSatisfaction} and \autoref{fig:Evaluation:RandomSatisfaction} show that this does not hold for satisfaction with the recommender's decision when looking at the $tc$ values where the recommender performs best for each segment. There, the homogeneous group only reaches the highest satisfaction when the recommender has access to all stored configurations. With a decreasing number of stored configurations, both random groups and heterogeneous groups achieve higher satisfaction \todo[]{Interpretation: why is that the case?}. It is important to note that, when the same $tc$ values are used, homogeneous groups have a higher number of satisfied members across the board.
\autoref{hyp:Evaluation:HomogenousMoreSatisfied} states that homogeneous groups have more satisfied members with regard to the dictator's and the group recommender's decision. \autoref{fig:Evaluation:tcCount} shows that this holds true for the dictator's decision, as in every instance satisfaction in homogeneous groups is higher than in the other groups. However, \autoref{fig:Evaluation:HeteroSatisfaction}, \autoref{fig:Evaluation:HomoSatisfaction} and \autoref{fig:Evaluation:RandomSatisfaction} show that this does not hold for satisfaction with the recommender's decision when looking at the $tc$ values where the recommender performs best for each segment. There, the homogeneous group only reaches the highest satisfaction when the recommender has access to all stored configurations. With a decreasing number of stored configurations, both random groups and heterogeneous groups achieve higher satisfaction. This likely happens because of the similarity between group members. A recommender with imperfect knowledge and a configuration database of reduced size gives results that are not good enough to compete with the dictator, who always finds the perfect individual match, which the members of homogeneous groups are satisfied with. It is important to note that, when the same $tc$ values are used, homogeneous groups have a higher number of satisfied members across the board.
\autoref{hyp:Evaluation:HeterogenousBiggerSatisfactionIncrease} states that the increase in satisfaction should be bigger for more heterogeneous groups. However, \autoref{fig:Evaluation:HeteroSatisfaction}, \autoref{fig:Evaluation:HomoSatisfaction} and \autoref{fig:Evaluation:RandomSatisfaction} show this not to be true. The recommendations for heterogeneous groups indeed cause a larger change in satisfaction compared to homogeneous groups, but random groups show a positive change of higher magnitude. The decrease in dissatisfaction is also higher among random groups \todo[]{Interpretation: why is that the case? Why does the hypothesis not hold?}.
\autoref{hyp:Evaluation:HeterogenousBiggerSatisfactionIncrease} states that the increase in satisfaction should be bigger for more heterogeneous groups. However, \autoref{fig:Evaluation:HeteroSatisfaction}, \autoref{fig:Evaluation:HomoSatisfaction} and \autoref{fig:Evaluation:RandomSatisfaction} show this not to be true. The recommendations for heterogeneous groups indeed cause a larger change in satisfaction compared to homogeneous groups, but random groups show a positive change of higher magnitude. The decrease in dissatisfaction is also higher among random groups. This possibly happens because the members of random groups have interests that are more aligned, so their preferences do not diverge as much, which results in compromises that can satisfy more individual members. At the same time, the preferences within the group are still far enough apart to cause sufficient dissatisfaction and neutrality with the dictator's decision.
The data shows that a larger configuration database leads to a greater number of satisfied group members than recommendations using a smaller database. For dissatisfaction the inverse is seen: a larger configuration database causes the number of dissatisfied group members to drop compared to a small database. However, in some runs there have been instances of least misery that have seen a slight drop. This can be seen in \autoref{fig:Evaluation:HeteroSatisfaction} when comparing $74$ and $148$ as the number of stored configurations. Why this happens is not entirely clear, but a possible cause is that least misery only takes into account the worst-performing member of the group. It is therefore possible that there is a second solution, slightly worse when comparing least misery scores, which actually has a slight advantage in terms of dissatisfaction. This second-best configuration may lie in the second database partition, therefore resulting in less dissatisfaction on average. \autoref{hyp:Evaluation:StoreSizeBetterResults} is therefore supported by the data, but it does not fully hold up when looking at least misery.
\autoref{hyp:Evaluation:AggregationStrategies} states that least misery performs worse than multiplication. For the change in satisfaction this can be seen across the board; for the change in dissatisfaction, however, this is not true everywhere. \autoref{fig:Evaluation:HeteroSatisfaction} shows that least misery performs better than best average in terms of dissatisfaction reduction \todo[]{Why might this behaviour occur?}. However, in other cases it performs visibly worse. Also of note is that multiplication performs best across the board. This supports the findings by \citeauthor{Masthoff2015} \cite[p. 755f]{Masthoff2015} and also shows that the satisfaction model shows some results similar to online evaluations.
\autoref{hyp:Evaluation:AggregationStrategies} states that least misery performs worse than multiplication. For the change in satisfaction this can be seen across the board; for the change in dissatisfaction, however, this is not true everywhere. \autoref{fig:Evaluation:HeteroSatisfaction} shows that least misery performs better than best average in terms of dissatisfaction reduction. This behaviour possibly occurs because an average metric yields the same result for heavily polarised decisions and for decisions that everyone feels neutral about. Least misery, on the other hand, only takes into account the group member least satisfied with the decision, and therefore performs better here. However, in other cases it performs visibly worse. Also of note is that multiplication performs best across the board. This supports the findings by \citeauthor{Masthoff2015} \cite[p. 755f]{Masthoff2015} and also shows that the satisfaction model shows some results similar to online evaluations.
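The difference between the aggregation strategies can be illustrated with a small example. The ratings and the scale are hypothetical and not taken from the evaluation data. Assume that four group members rate two configurations on a scale from $1$ to $5$ and that the strategies follow their usual definitions: average takes the mean, least misery the minimum and multiplication the product of the individual ratings. A polarised configuration $a$ with ratings $(5, 5, 1, 1)$ and a neutral configuration $b$ with ratings $(3, 3, 3, 3)$ then receive the following scores, with $a$ in the middle column and $b$ in the right column:
\[
\begin{array}{lll}
\mathrm{average:} & (5 + 5 + 1 + 1) / 4 = 3 & (3 + 3 + 3 + 3) / 4 = 3 \\
\mathrm{least\ misery:} & \min(5, 5, 1, 1) = 1 & \min(3, 3, 3, 3) = 3 \\
\mathrm{multiplication:} & 5 \cdot 5 \cdot 1 \cdot 1 = 25 & 3 \cdot 3 \cdot 3 \cdot 3 = 81
\end{array}
\]
Under these assumptions average cannot distinguish between the two configurations, whereas least misery and multiplication both prefer $b$, the configuration that leaves no member with a very low rating.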
To return to the evaluation questions posed in \autoref{sec:Evaluation:Questions}, this section has shown that for random and heterogeneous groups the recommender performs better than a dictator. The average satisfaction depends on the chosen parameters, but for the chosen value range the average satisfaction with the recommender's decision lies above two and can reach close to three satisfied group members for a high number of stored configurations and for some group types. The number of stored finished configurations plays an important role in performance, but even with a fraction of the stored configurations the recommender still yields good results. \todo[inline]{At this point, tie this back in more detail to your recommender as useful support for decision-making in groups}
To return to the evaluation questions posed in \autoref{sec:Evaluation:Questions}, this section has shown that for random and heterogeneous groups the recommender performs better than a dictator. The average satisfaction depends on the chosen parameters, but for the chosen value range the average satisfaction with the recommender's decision lies above two and can reach close to three satisfied group members for a high number of stored configurations and for some group types. The number of stored finished configurations plays an important role in performance, but even with a fraction of the stored configurations the recommender still yields good results. This shows that the recommender provides useful support and a solid basis for group decisions. Most decisions the recommender makes improve group satisfaction, which shows that it can be used to improve group decisions.