From 225462b4ae4c351391b152db0ee29760020c1db7 Mon Sep 17 00:00:00 2001 From: "hannes.kuchelmeister" Date: Thu, 9 Apr 2020 10:58:28 +0200 Subject: [PATCH] add thesis suggestions and fix errors --- 30_Thesis/sections/60_evaluation.tex | 34 +++++++++++++++------------- 1 file changed, 18 insertions(+), 16 deletions(-) diff --git a/30_Thesis/sections/60_evaluation.tex b/30_Thesis/sections/60_evaluation.tex index 3b9624e..1f84bca 100644 --- a/30_Thesis/sections/60_evaluation.tex +++ b/30_Thesis/sections/60_evaluation.tex @@ -8,6 +8,8 @@ In this chapter the prototype is evaluated in terms of its functionality and its For the evaluation a metric to evaluate by is needed. The proposed metric for usage is that of satisfaction. This metric has been newly created because existing literature did not provide metrics usable for this thesis. Satisfaction is quantified in this thesis by a threshold metric. A user's preference is used to calculate a rating for each possible solution. Each configuration solution gets an individual score determined by the user's preferences. The score is calculated using the average of a user's preference for each characteristic that is part of the configuration. The result allows that a configuration can be compared to all other configurations and ranked according to the percentage of configurations that it beats for a specific user. The threshold metric consists of two parameters. First the threshold center $tc$ and second the satisfaction distance $sd$. The threshold for a person being satisfied is at $tc + sd$ and of a person being dissatisfied is at $tc - sd$. If a recommendation lies in between these two thresholds the person is classified to neither by satisfied nor be unsatisfied with the solution. For this thesis $sd=5\%$ will be used. This choice is guided by the assumption that people switch from satisfied to unsatisfied rather quickly \todo{find a source psychology}. Therefore the parameter considered in this thesis is the $tc$. 
An example is the choice of $tc = 60\%$. This results in a person being satisfied with a recommendation if it is better than at least $65\%$ of all possible finished configurations. Moreover, a person is dissatisfied if the recommendation is not better than $55\%$ of possible finished configurations. A recommendation that is better than at least $55\%$ and not better than $65\%$ of possible solutions is considered neutral by the individual. +\todo{(optional) visualize tc value with an example configuration} + Different $tc$ values allow to model different situations. A situation where there is a low willingness to compromise is modelled by a high $tc$. A contrary situation where a group has a high willingness to compromise is modelled by a low $tc$. A satisfaction and dissatisfaction classification allows groups to be measured by the amount of people that are satisfied and dissatisfied. Moreover, changes in satisfaction and dissatisfaction for different parameters can be compared. A reasonable $tc$ value has to be found for groups otherwise any derived metrics will not show any meaningful results. @@ -15,7 +17,7 @@ A satisfaction and dissatisfaction classification allows groups to be measured b \section{Evaluation Objective} \label{sec:Evaluation:Questions} -This section poses three questions that will be answered during the evaluation. The questions aim is to guide through this chapter. They set the guidelines for this evaluation and where focuses are set. The questions answered during the evaluation are: +This section poses three questions that will be answered during the evaluation. The questions' aim is to guide the reader through this chapter. They set the guidelines for this evaluation and determine where its focus lies. 
The questions answered during the evaluation are: \begin{itemize} \item Main question: How does the satisfaction with a group decision, guided by the recommender, differ from the decision of a single decision maker, the dictator, who does not take the other group member's opinions into account? @@ -23,12 +25,12 @@ This section poses three questions that will be answered during the evaluation. \item How does the amount of stored finished configurations relate to satisfaction with a recommendation? \end{itemize} -The main question is used to understand the usefulness of the recommender and if it gives benefits to groups. The second question is aimed at providing information regarding the data and how satisfaction looks like in group decisions and what factors influence it. Last, a technical question is posed. This question is relevant because it shows technical aspects of the recommender. This is important because other work for using the recommender in other possibly larger use cases depend on performance figures in relation to number of stored configurations. +The main question is used to understand the usefulness of the recommender and whether it benefits groups. The second question aims to provide information about the data, what satisfaction looks like in group decisions, and what factors influence it. Last, a technical question is posed. This question is relevant because it shows technical aspects of the recommender. This is important because other work that applies the recommender to other, possibly larger, use cases depends on performance figures in relation to the number of stored configurations. \section{Use Case} \label{sec:Evaluation:UseCase} -To evaluate the recommender, a use case is needed. In this thesis, a forestry use case is evaluated. This is a use case with four stakeholders. \autoref{fig:Concept:ForestExample} presents the attributes and characteristics of this use case but an extension is needed to fully show the whole use case. 
Namely rules of non valid configurations. The constraints for this use case are listed in \emph{not with} form in \autoref{tab:Evaluation:UseCase}. +To evaluate the recommender, a use case is needed. In this thesis, a forestry use case is evaluated. This is a use case with four stakeholders. \autoref{fig:Concept:ForestExample} presents the attributes and characteristics of this use case but an extension is needed to fully show the whole use case. Namely the rules of non valid configurations are missing. Therefore the constraints for this use case are listed in \emph{not with} form in \autoref{tab:Evaluation:UseCase}. \begin{table} \tiny @@ -52,11 +54,11 @@ To evaluate the recommender, a use case is needed. In this thesis, a forestry us \multirow{3}{*}{\textit{effort}} & low & & & & & & & & & & - & - & - & & & n & n & n & & & & \\ \cline{2-23} & moderate & & & & & & & & & & - & - & - & & & & & & & & n & n \\ \cline{2-23} - & high & & & & & & & & & & - & - & - & & & & & & & & & \\ \hline + & high & & & & & & & & & & - & - & - & & & & & & & & n & n \\ \hline \multirow{3}{*}{\textit{quantity}} & low & & & & & & & & & & & & & - & - & - & n & n & & & & \\ \cline{2-23} - & moderate & & & & & & & & & & & & & - & - & - & n & n & & & n & n \\ \cline{2-23} - & high & & & n & & & n & n & & & n & & & - & - & - & & & & & & \\ \hline + & moderate & & & & & & & & & & & & & - & - & - & n & n & & & & \\ \cline{2-23} + & high & & & n & & & n & n & & & n & & & - & - & - & & & & & n & n \\ \hline \multirow{3}{*}{\textit{price}} & low & & & n & & & n & n & & & n & & & n & n & & - & - & - & & & \\ \cline{2-23} & moderate & & & & & & n & n & & & n & & & n & n & & - & - & - & & & \\ \cline{2-23} @@ -73,7 +75,7 @@ To evaluate the recommender, a use case is needed. In this thesis, a forestry us \end{table} The stakeholders in this use case are: a forest owner, an athlete, an environmentalist, and a consumer. 
The owner sees the forest as an investment, he is interested in a high long term profit. On the other hand the consumer is interested in reasonable wood price as she uses wood for furniture and also for her fireplace. In contrast, the environmentalist is interested in a healthy forest that is not impacted negatively by human activity. Last is the athlete who is interested in good accessibility of the forest and that there is some plant and animal life. -Every group consists of four people whereby they need to try and find a compromise. Diverging preferences make this difficult. All stakeholders have an interest in getting their will but also all parties need the acceptance of others with the decision. None of the stakeholders want to have a decision go exactly their way but end up with protests that arise from the deep dissatisfaction of other groups members. +Every group consists of four people who need to find a compromise. Diverging preferences make this difficult. All stakeholders have an interest in getting their way, but all parties also need the others to accept the decision. It is not in the interest of a stakeholder to have their preferences fully met while ending up with protests that arise from the deep dissatisfaction of other group members. \section{Data Generation} \label{sec:Evaluation:GeneratingGroups} @@ -163,7 +165,7 @@ For the forest use case, the idea is that there are multiple types of user profi \end{center} \end{table} -These user profiles can be used to generate rather homogenous groups but also to create groups that have interests that are more conflicting. The following group types, with four members each are generated: +These user profiles can be used to generate rather homogeneous groups but also to create groups whose interests conflict more strongly. 
The following group types, with four members each, are generated: \begin{itemize} \item random groups (preferences are uniformly random) @@ -171,7 +173,7 @@ These user profiles can be used to generate rather homogenous groups but also to \item homogeneous groups (only one preference profile for all group members which in this evaluation is the forest owner) \end{itemize} -The natural group type for the use case is a heterogeneous group but to widen the evaluation and to see how the recommender performs with different types of groups the two other group types are evaluated, too. Therefore more general statements about the recommender's performance can be made: +The natural group type for the use case is a heterogeneous group but to widen the evaluation and to see how the recommender performs with different types of groups the two other group types are evaluated, too. Therefore more general statements about the recommender's performance can be made. \subsection{The Effect of Stored Finished Configurations} @@ -187,7 +189,7 @@ This section gives an overview over the hypothesis tested during data analysis. \begin{itshape} \label{hyp:Evaluation:MaximumMinimum} Highest improvements with group recommendation are when the amount of people satisfied with the dictator's decision is slightly lower than two and the highest reduction in dissatisfied group members can be seen at around two group members dissatisfied respectively. \end{itshape} \medskip \\* - This expectation is made because the assumption is made that in a real situation a group of four with having a few less than two satisfied members on average (with a dictator's decision) has enough room for improvement so that potentially three group members can be satisfied after the use of the recommender. Meaning that at least one more person is satisfied with the compromise. Potentially in some groups it might even be possible to then lift the last person from dissatisfaction towards a neutral attitude. 
A higher base satisfaction is assumed to reduce the possibility to make an additional group member satisfied. + This stems from the assumption that in a real situation a group of four with slightly fewer than two satisfied members on average (with a dictator's decision) has enough room for improvement that potentially three group members can be satisfied after the use of the recommender. This means that at least one more person is satisfied with the compromise. In some groups it might even be possible to then lift the last person from dissatisfaction towards a neutral attitude. A higher base satisfaction is assumed to reduce the chance of satisfying an additional group member. \end{hypothesis} @@ -209,7 +211,7 @@ This section gives an overview over the hypothesis tested during data analysis. \begin{itshape} \label{hyp:Evaluation:HomogenousMoreSatisfied} Homogeneous groups have more satisfied members with the recommender's decision but also with the dictator's decision compared to heterogeneous groups. \end{itshape} \medskip \\* - As the interest in homogenous groups are more aligned there is an expectation that the overall hapiness levels for more homogenous groups is higher. If the base level is higher already it is likely that even just a slight increase lifts recommendations for homogenous groups to satisfaction levels not reachable by heterogeneous groups. + As the interests in homogeneous groups are more aligned, the overall satisfaction levels for more homogeneous groups are expected to be higher. If the base level is already higher, it is likely that even a slight increase lifts recommendations for homogeneous groups to satisfaction levels not reachable by heterogeneous groups. \end{hypothesis} \begin{hypothesis} @@ -238,7 +240,7 @@ This section gives an overview over the hypothesis tested during data analysis. \subsection{Threshold Center Selection} -In this section the goal is to find a $tc$ parameter for the analysis. 
This is needed to reduce dimensionality of data that has to be looked at and to get results of value. Therefore all parameters except $tc$ will be fixed. The preference aggregation strategy looked at is multiplication because this strategy shows good results across the board when briefly looking at the generated data. The configuration database is used with all possible solutions (which is 148 in total). This results in a bigger visible effect in terms of satisfaction and dissatisfaction change as the recommender has access to all possible configurations and also provides more solid and predictable results. \autoref{fig:Evaluation:tcChange} shows the satisfaction change based on choice of $tc$. Of note is that the maxima of satisfaction change precedes the minima of dissatisfaction change for all group types. Maxima and minima occur at different tc values depending on the group type. Heterogeneous groups peek earliest while homogenous groups only show a peek towards the maximum $tc$ value. Changes in dissatisfaction are minimal even with $tc$ close to its maximum value for homogeneous groups. \autoref{fig:Evaluation:tcCount} shows the amount of group members satisfied and dissatisfied with the dictator's decision. The number of satisfied people decreases with an increasing $tc$ and its downward movement accelerates. The dissatisfaction curve shows a similar trend in reverse. Here the number of dissatisfied group members increases with an increase in $tc$. The curve accelerates its growth analogous to the acceleration of the satisfaction curve. The behaviour of heterogeneous groups and random groups is similar but the curve for heterogeneous groups show less satisfaction and more dissatisfaction for a given tc. Also both curves have a negative satisfaction change when $tc$ reaches a certain height. Homogeneous groups only have happy group members for most $tc$ values but they decrease rapidly for values greater $85$. 
Dissatisfied group members are at zero for the whole value range of $tc$ except a very slight upward tick at the end that is barely noticeable. +In this section the goal is to find a $tc$ parameter for the analysis. This is needed to reduce the dimensionality of the data that has to be looked at and to get results of value. Therefore all parameters except $tc$ will be fixed. The preference aggregation strategy looked at is multiplication because this strategy shows good results across the board when briefly looking at the generated data. The configuration database is used with all possible solutions (which is 148 in total). This results in a bigger visible effect in terms of satisfaction and dissatisfaction change, as the recommender has access to all possible configurations, and also provides more solid and predictable results. \autoref{fig:Evaluation:tcChange} shows the satisfaction change based on the choice of $tc$. Of note is that the maximum of satisfaction change precedes the minimum of dissatisfaction change for all group types. Maxima and minima occur at different $tc$ values depending on the group type. Heterogeneous groups peak earliest while homogeneous groups only show a peak towards the maximum $tc$ value. Changes in dissatisfaction are minimal even with $tc$ close to its maximum value for homogeneous groups. \autoref{fig:Evaluation:tcCount} shows the number of group members satisfied and dissatisfied with the dictator's decision. The number of satisfied people decreases with an increasing $tc$ and its downward movement accelerates. The dissatisfaction curve shows a similar trend in reverse. Here the number of dissatisfied group members increases with an increase in $tc$. The curve accelerates its growth analogous to the acceleration of the satisfaction curve. The behaviour of heterogeneous groups and random groups is similar but the curve for heterogeneous groups shows less satisfaction and more dissatisfaction for a given $tc$. 
Also both curves have a negative satisfaction change when $tc$ reaches a certain height. Homogeneous groups only have satisfied group members for most $tc$ values but their number decreases rapidly for values greater than $85$. Dissatisfied group members are at zero for the whole value range of $tc$ except for a very slight, barely noticeable upward tick at the end. \begin{figure} \centering @@ -258,7 +260,7 @@ In this section the goal is to find a $tc$ parameter for the analysis. This is n The predicted trend that a higher $tc$ results in a lower satisfaction and a higher dissatisfaction with the dictator's decision, as predicted by \autoref{hyp:Evaluation:HigherTcLessSatisfied}, can be clearly seen in \autoref{fig:Evaluation:tcCount} and has been described in this section already. This means for the evaluation that the behaviour of the recommender is predictable and suggests that used metrics are modelling behaviour expected in reality. -\autoref{hyp:Evaluation:OnlyOneSatisfied} predicts that the satisfaction with the individual decision eventually reaches one and that no one is satisfied with the group recommender decision. This means the satisfaction change should reach minus one. \autoref{fig:Evaluation:tcCount} shows a downward trend that comes close to one for heterogeneous and random groups. Therefore, the trend suggests that the hypothesis holds with regard to heterogeneous and random groups but as the drop for homogenous groups just reaches below $2.8$ suggesting that the hypothesis does not hold for homogenous groups. Also, satisfaction change in heterogeneous groups reaches close to minus one but this value is neither reached by random groups, nor by homogenous groups. The hypothesis therefore holds true only for heterogeneous groups. 
A likely cause why it does not seem to hold true for random or homogenous groups is that as the highest tc value still includes multiple configurations and a recommended configuration keeps some group members happy for some of the time. Also possibly for random groups another group member than the dictator could be satisfied with the group decision. +\autoref{hyp:Evaluation:OnlyOneSatisfied} predicts that the satisfaction with the individual decision eventually reaches one and that no one is satisfied with the group recommender's decision. This means the satisfaction change should reach minus one. \autoref{fig:Evaluation:tcCount} shows a downward trend that comes close to one for heterogeneous and random groups. The trend therefore suggests that the hypothesis holds for heterogeneous and random groups, but the curve for homogeneous groups only drops to just below $2.8$, which suggests that the hypothesis does not hold for homogeneous groups. Also, satisfaction change in heterogeneous groups comes close to minus one, but this value is reached neither by random groups nor by homogeneous groups. The hypothesis therefore holds true only for heterogeneous groups. A likely reason why it does not seem to hold true for random or homogeneous groups is that the highest $tc$ value still includes multiple configurations, so a recommended configuration keeps some group members satisfied some of the time. For random groups it is also possible that a group member other than the dictator is satisfied with the group decision. During a group decision it is better to make one less person dissatisfied opposed to one more person satisfied. Therefore, this thesis uses $tc$ values that are closer to the minima of dissatisfaction change than to the maxima of satisfaction change. The minima for heterogeneous groups is at $tc = 70\%$ therefore this is the chosen value for evaluation of the remaining hypotheses. 
This is needed because otherwise analysis would be infeasible due to the parameter space being too large. For random groups the minima of dissatisfaction change can be found at $tc = 85\%$ which is the value used for all following analysis of random groups. For homogenous group dissatisfaction change is decreasing until the highest possible value of $tc$ is reached. Because of that $tc = 94\%$ is used for analysis. @@ -285,9 +287,9 @@ During a group decision it is better to make one less person dissatisfied oppose \label{fig:Evaluation:HomoSatisfaction} \end{figure} -This subsection holds fixed parameters of $tc$. It describes the satisfaction change and the total amount of satisfied people with the recommenders decision dependent on the amount of stored configurations. For clarity reasons not all graphs of the data are included. The missing graphs can be found in the appendix and have references to them. +This subsection keeps $tc$ fixed. It describes the satisfaction change and the total number of people satisfied with the recommender's decision as a function of the number of stored configurations. -\autoref{fig:Evaluation:HeteroSatisfaction} shows the relationship between the satisfaction and dissatisfaction and the number of stored configurations. The left y-axis shows the change in satisfaction compared to a decision made by a dictator. The right axis shows the average number of group members. The left figure shows numbers for satisfaction and the right for dissatisfaction. On the left higher numbers are better and on the right lower ones (in regards to change). There are three graphs each. One for multiplication, one for least misery and one for best average. The graphs for satisfaction are similar to a logarithmic curve. The increase in change of satisfaction decelerates with a higher number of stored configurations. 
The change in satisfaction is always above zero and a satisfaction increase of more than three quarters of the maximum can already be seen at around 25 stored configurations. Moreover, the curve for multiplication is greater than all other curves for all parameters. Least misery reaches the lowest amount of change across all values. The minimum number of satisfaction change is $0$ for least misery, and $0.1$ for best average and multiplications. The highest number is around $0.3$ for least misery, $0.4$ for best average and $0.5$ for multiplication +\autoref{fig:Evaluation:HeteroSatisfaction} shows the relationship between satisfaction and dissatisfaction and the number of stored configurations. The left y-axis shows the change in satisfaction compared to a decision made by a dictator. The right axis shows the average number of group members that are satisfied. The left figure shows numbers for satisfaction and the right for dissatisfaction. On the left higher numbers are better and on the right lower ones (with regard to change). There are three graphs each: one for multiplication, one for least misery and one for best average. The graphs for satisfaction resemble a logarithmic curve. The increase in satisfaction change decelerates with a higher number of stored configurations. The change in satisfaction is always above zero and a satisfaction increase of more than three quarters of the maximum can already be seen at around 25 stored configurations. Moreover, the curve for multiplication lies above all other curves for all parameters. Least misery reaches the lowest amount of change across all values. The minimum satisfaction change is $0$ for least misery and $0.1$ for best average and multiplication. The highest value is around $0.3$ for least misery, $0.4$ for best average and $0.5$ for multiplication. When looking at dissatisfaction change the graphs are all in the negative number range. 
Multiplication reaches the lowest number and best average the highest. The gap between all three functions is less than that of satisfaction increase. And overall the curves are flatter meaning the change with 25 stored configurations already reaches close to five sixth of the minimum value. The highest number of satisfaction change is $-0.4$ for all strategies meanwhile the lowest number is around $-0.57$ for least misery, $-0.53$ for best average and $-0.63$ for multiplication. The figures for homogenous (\autoref{fig:Evaluation:HomoSatisfaction}) and random groups (\autoref{fig:Evaluation:RandomSatisfaction}) have a similar shape but their values and slope vary. The satisfaction change for homogenous groups is mostly negative, starting at $-2$, and only reaches a positive level for more than $100$ stored configurations with a value of $0.04$. Multiplication and best average have higher values than least misery here, too. Moreover the dissatisfaction change is always positive with a value range of $[0,1]$, except it slightly falls below zero after more than $75$ configurations are stored. @@ -302,7 +304,7 @@ Random groups have less overall satisfaction with $tc = 85\%$ as seen in \autore \subsection{Discussion} After description of the data the remaining hypotheses are discussed. -\autoref{hyp:Evaluation:HomogenousMoreSatisfied} states that homogenous groups have more satisfied member's with regards to the dictator's and the group recommender's decision. \autoref{fig:Evaluation:tcCount} shows that this holds true for dictator's decision as for every instance satisfaction in homogeneous groups is higher than that of other groups. However, \autoref{fig:Evaluation:HeteroSatisfaction}, \autoref{fig:Evaluation:HomoSatisfaction} and \autoref{fig:Evaluation:RandomSatisfaction} show that for satisfaction with the recommender's decision this does not hold when looking at $tc$ values where the recommender performs best for each segment. 
In those places the homogenous group only reaches the highest amount of satisfaction when the recommender has access to all stored configurations. With a decreasing number of stored configurations both random groups and heterogeneous groups achieve a higher satisfaction. This likely happens because of similarity between group members. A recommender with imperfect knowledge and a, in size, reduced configuration database gives results that are not good enough and cannot compete with the dictator who always finds the perfect individual match that group member's of homogeneous groups are satisfied with. It is important to note, when the same $tc$ values are used homogenous groups have a higher amount of satisfied people across the board. +\autoref{hyp:Evaluation:HomogenousMoreSatisfied} states that homogeneous groups have more satisfied members with regard to the dictator's and the group recommender's decision. \autoref{fig:Evaluation:tcCount} shows that this holds true for the dictator's decision, as in every instance satisfaction in homogeneous groups is higher than that of other groups. However, \autoref{fig:Evaluation:HeteroSatisfaction}, \autoref{fig:Evaluation:HomoSatisfaction} and \autoref{fig:Evaluation:RandomSatisfaction} show that for satisfaction with the recommender's decision this does not hold when looking at the $tc$ values where the recommender performs best for each segment. There, the homogeneous group only reaches the highest level of satisfaction when the recommender has access to all stored configurations. With a decreasing number of stored configurations both random groups and heterogeneous groups achieve a higher satisfaction. This likely happens because of the similarity between group members. 
A recommender with imperfect knowledge and a reduced configuration database gives results that are not good enough and cannot compete with the dictator, who always finds the perfect individual match that group members of homogeneous groups are satisfied with. It is important to note that when the same $tc$ values are used, homogeneous groups have a higher number of satisfied people across the board. \autoref{hyp:Evaluation:HeterogenousBiggerSatisfactionIncrease} states that the increase in satisfaction should be bigger for more heterogeneous groups. However, \autoref{fig:Evaluation:HeteroSatisfaction}, \autoref{fig:Evaluation:HomoSatisfaction} and \autoref{fig:Evaluation:RandomSatisfaction} show this to be not true. The recommendations for heterogeneous groups indeed cause a larger change in satisfaction compared to homogeneous groups but random groups cause a positive change of higher magnitude. Also the decrease in dissatisfaction is higher among random groups. This possibly happens due to random groups having interest that are more aligned and their preferences among group membes therefore do not diverge as much, therefore resulting in compromises for the group that can satisfy more individual members. Also the group preferences are still far apart enough to cause enough dissatisfaction and neutrality with the dictator's decision.