From 075ef5101583d1a5891846754a95dacc6e4763b1 Mon Sep 17 00:00:00 2001
From: "hannes.kuchelmeister"
Date: Fri, 8 May 2020 19:44:32 +0200
Subject: [PATCH] fix minor mistakes and add some todos
---
 30_Thesis/sections/60_evaluation.tex | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)
diff --git a/30_Thesis/sections/60_evaluation.tex b/30_Thesis/sections/60_evaluation.tex
index 0ca0d3a..117fb33 100644
--- a/30_Thesis/sections/60_evaluation.tex
+++ b/30_Thesis/sections/60_evaluation.tex
@@ -3,6 +3,8 @@
 In this chapter the prototype is evaluated in terms of its functionality and its properties. The evaluation is an offline evaluation with synthetic data. All possible valid configurations are generated for one use case, i.e. all possible valid configurations for the forestry use case. Moreover, groups with explicit preferences and a configuration state (which, e.g. would be the currently existing forest) are generated, too.
+\todo[inline]{anticipate the key message}
+
 \section{Metric}
 \label{sec:Evaluation:Metrics}
@@ -68,7 +70,7 @@ To evaluate the recommender, a use case is needed. In this thesis, a forestry us
 & high & & & & & & & & & n & & n & n & & & n & & & & - & - & - \\ \hline
 \end{tabularx}
- \caption[Forestry Use Case: Constraints]{Constrains in \emph{not with} form for the forestry use case.}
+ \caption[Forestry Use Case: Constraints]{Constraints in \emph{not with} form for the forestry use case.}
 \label{tab:Evaluation:UseCase}
 \end{center}
 \end{table}
@@ -176,12 +178,14 @@ The natural group type for the use case is a heterogeneous group but to widen th
 \subsection{The Effect of Stored Finished Configurations}
-Another important component of the evaluation is the influence of stored finished configurations. When evaluating a subset of stored finished configurations it is important to avoid outliers. This is the reason why a process inspired by \emph{cross validation} \todo{add reference} is used.
The configuration database is randomly ordered and sliced into sub-databases of the needed size. As an example, if the evaluated stored data size is 20, a configuration database containing 100 configurations is split into five sub-databases of size 20. Now the evaluation is carried out for each of the sub-databases and finally the average is determined. This avoids the random picking of a subset which either performs much better than most other possible combinations of databases or which performs much worse. This way the data is more aligned to the \emph{expected value}. \todo{reference}
+Another important component of the evaluation is the influence of stored finished configurations. When evaluating a subset of stored finished configurations it is important to avoid outliers. This is the reason why a process inspired by \emph{cross validation} \todo{add reference} is used. The configuration database is randomly ordered and sliced into sub-databases of the needed size. As an example, if the evaluated stored data size is 20, a configuration database containing 100 configurations is split into five sub-databases of size 20. Now the evaluation is carried out for each of the sub-databases and finally the average is determined. This avoids the random picking of a subset which either performs much better than most other possible combinations of databases or which performs much worse. This way the data is more aligned to the expected value.
 \section{Hypotheses}
 \label{sec:Evaluation:Hypotheses}
+\todo{define dictator}
+
 This section gives an overview of the hypotheses tested during data analysis. Each hypothesis is followed by an explanation of why it is presented. Later sections examine whether each hypothesis holds, which makes it possible to verify whether expectations about the behaviour of the recommender are met.
 \begin{hypothesis}
@@ -290,12 +294,12 @@ This subsection fixes parameters of $tc$.
It describes the satisfaction change a
 \autoref{fig:Evaluation:HeteroSatisfaction} shows how satisfaction and dissatisfaction relate to the number of stored configurations. The left y-axis shows the change in satisfaction compared to a decision made by a dictator. The right y-axis shows the average number of satisfied group members. The left figure shows numbers for satisfaction and the right figure for dissatisfaction. With regard to change, higher numbers are better on the left and lower numbers are better on the right. Each figure contains three graphs: one for multiplication, one for least misery and one for best average. The graphs for satisfaction resemble a logarithmic curve: the increase in satisfaction change slows as the number of stored configurations grows. The change in satisfaction is never negative, and a satisfaction increase of more than three quarters of the maximum can already be seen at around 25 stored configurations. Moreover, the curve for multiplication lies above all other curves for all parameters. Least misery reaches the lowest amount of change across all values. The minimum satisfaction change is $0$ for least misery and $0.1$ for best average and multiplication. The maximum is around $0.3$ for least misery, $0.4$ for best average and $0.5$ for multiplication. When looking at dissatisfaction change, the graphs are all in the negative number range. Multiplication reaches the lowest number and best average the highest. The gap between the three functions is smaller than for the satisfaction increase. Overall the curves are flatter, and the change at 25 stored configurations already reaches close to five sixths of the minimum value. The highest dissatisfaction change is $-0.4$ for all strategies, while the lowest is around $-0.57$ for least misery, $-0.53$ for best average and $-0.63$ for multiplication.
-The figures for homogenous (\autoref{fig:Evaluation:HomoSatisfaction}) and random groups (\autoref{fig:Evaluation:RandomSatisfaction}) have a similar shape but their values and slopes vary. The satisfaction change for homogenous groups is mostly negative, starting at $-2$, and only reaches a positive level for more than $100$ stored configurations with a value of $0.04$. Multiplication and best average have higher values than least misery here too. Moreover the dissatisfaction change is always positive with a value range of $[0,1]$, except for a slight fall below zero after more than $75$ configurations are stored.
+The graphs for homogenous (\autoref{fig:Evaluation:HomoSatisfaction}) and random groups (\autoref{fig:Evaluation:RandomSatisfaction}) have a similar shape but their values and slopes vary. The satisfaction change for homogenous groups is mostly negative, starting at $-2$, and only reaches a positive level for more than $100$ stored configurations with a value of $0.04$. Multiplication and best average have higher values than least misery here too. Moreover the dissatisfaction change is always positive with a value range of $[0,1]$, except for a slight fall below zero after more than $75$ configurations are stored.
 Random groups as seen in \autoref{fig:Evaluation:RandomSatisfaction} mostly have a positive change in satisfaction. Here, values range from $-0.55$ to $0.27$ for least misery, from $-0.27$ and $-0.28$ to $0.74$ for best average and multiplication. The change is higher than the change for heterogeneous groups. Dissatisfaction also changes similarly to heterogeneous groups. Here the values for random groups reach a lower level. They range from $0$ to $-0.59$ for least misery. Multiplication and best average both have a minimum value around $-0.21$ and behave similarly. The range goes down to $-0.84$ for best average and $-0.86$ for multiplication.
 \autoref{fig:Evaluation:HeteroSatisfaction} also shows the average number of group members satisfied and dissatisfied with the recommender's decision. Satisfaction with the recommender's decision starts at $2.4$ and quickly reaches $2.65$ for least misery and $2.8$ for best average and multiplication. The highest value for multiplication is $2.89$. Dissatisfaction likewise quickly plateaus. Here values for different recommenders are closer together. They start between $0.74$ (least misery) and $0.78$ (best average) and fall as low as $0.62$ for least misery, $0.66$ for best average and $0.56$ for multiplication.
-When looking at the total numbers as shown in \autoref{fig:Evaluation:HomoSatisfaction} the value range for homogenous groups is much larger but the overall shape stays the same. Here satisfaction numbers go from $0.55$ to $2.95$. Least misery performs visibly worse than multiplication and best average reaches only $2.7$. Dissatisfaction values range from $1.21$ to $0.01$ and the values are not really visibly distinguishable, except in the range of $[25,50]$. Least misery seems to have the highest number of dissatisfied group members in this range.
+When looking at the total numbers as shown in \autoref{fig:Evaluation:HomoSatisfaction} the value range for homogenous groups is much larger but the overall shape stays the same. Here satisfaction numbers go from $0.55$ to $2.95$. \emph{Least misery} performs visibly worse than \emph{multiplication} and \emph{best average} reaches only $2.7$. Dissatisfaction values range from $1.21$ to $0.01$ and the values are not really visibly distinguishable, except in the range of $[25,50]$. Least misery seems to have the highest number of dissatisfied group members in this range.
 Random groups have lower overall satisfaction with $tc = 85\%$ as seen in \autoref{fig:Evaluation:RandomSatisfaction} when looking at the total numbers.
Satisfaction numbers start from $1.33$ (least misery), $1.61$ (best average) and $1.6$ (multiplication) and go up to $2.15$ for least misery and $2.62$ for best average and multiplication. The dissatisfaction numbers start at $1.5$ for least misery and $1.27$ for best average and multiplication and level off at $0.9$ (least misery), $0.65$ (best average) and $0.63$ (multiplication). There is a big difference visible between least misery and the other two aggregation functions.
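
The sub-database averaging described in the subsection on stored finished configurations can be illustrated with a minimal Python sketch. It is not the thesis implementation: the names `slice_database` and `averaged_metric`, the `seed` parameter and the `evaluate` callback are hypothetical, and the sketch assumes that a remainder smaller than the requested size is discarded, which the text does not specify.

```python
import random
from statistics import mean
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def slice_database(configs: Sequence[T], size: int, seed: int = 0) -> List[List[T]]:
    """Randomly order the database and cut it into disjoint sub-databases."""
    if size <= 0:
        raise ValueError("size must be positive")
    shuffled = list(configs)
    random.Random(seed).shuffle(shuffled)  # seeded so a run can be repeated
    # Assumption: a trailing remainder smaller than `size` is dropped.
    usable = len(shuffled) - len(shuffled) % size
    return [shuffled[i:i + size] for i in range(0, usable, size)]


def averaged_metric(
    configs: Sequence[T],
    size: int,
    evaluate: Callable[[List[T]], float],
    seed: int = 0,
) -> float:
    """Evaluate every sub-database and return the average of the results."""
    sub_databases = slice_database(configs, size, seed)
    if not sub_databases:
        raise ValueError("database is smaller than the requested size")
    return mean([evaluate(sub) for sub in sub_databases])


if __name__ == "__main__":
    database = list(range(100))  # stand-in for 100 stored configurations
    parts = slice_database(database, 20)
    print(len(parts), [len(p) for p in parts])
    # `len` is only a placeholder for a real evaluation of one sub-database.
    print(averaged_metric(database, 20, len))
```

With 100 stored configurations and a size of 20 this yields five disjoint sub-databases, matching the example in the text. Averaging over all slices instead of evaluating a single random subset is what reduces the influence of an unusually good or bad draw.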