
\chapter{Evaluation}
\label{ch:Evaluation}

In this chapter, the prototype is evaluated with respect to its functionality and its properties.

For the evaluation, we generate all valid configurations for one use case, namely the forest use case.

In addition, we generate groups with explicit preferences and a configuration state, which could for example be the currently existing forest.

\section{Group Types During Evaluation}
\label{sec:Evaluation:GroupTypes}

\begin{itemize}
    \item Groups with random preferences.
    \item Groups with grouped preferences, in which each member adheres more or less to one profile (Forest Owner, Athlete, Consumer, Environmentalist).
    \item Groups consisting of only one profile type, which results in a rather homogeneous group.
\end{itemize}

\section{Metrics}
\label{sec:Evaluation:Metrics}

The evaluation uses the following metrics.

\subsection{Random Individual}
To compare the group's result with individual scores, one member of the group is chosen at random. The difference is measured with the mean squared error (MSE), the root mean squared error (RMSE), and similar error metrics.
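
As a sketch, assume a set of configurations $C$, a group score $s_G(c)$, and the score $s_u(c)$ of the randomly chosen member $u$ for a configuration $c$. This notation is introduced here for illustration only. The two metrics can then be written as
\[
    \mathrm{MSE} = \frac{1}{|C|} \sum_{c \in C} \bigl( s_G(c) - s_u(c) \bigr)^2,
    \qquad
    \mathrm{RMSE} = \sqrt{\mathrm{MSE}}.
\]
The RMSE is expressed in the same unit as the scores themselves, which makes it easier to interpret than the MSE.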
% see: https://medium.com/@george.drakos62/how-to-select-the-right-evaluation-metric-for-machine-learning-models-part-1-regrression-metrics-3606e25beae0 or https://en.wikipedia.org/wiki/Error_metric

\subsection{Satisfaction}
To measure overall satisfaction within the group, we propose a threshold metric: a user counts as satisfied if their score is above 60\%, as unsatisfied if it is below 40\%, and as neutral otherwise. Group satisfaction is then measured by the number of satisfied, neutral, and unsatisfied members.
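
Written out, and assuming that the score $s_u$ of a user $u$ is normalised to $[0, 1]$ (the symbols $s_u$ and $\mathrm{sat}$ are notation introduced here for illustration), the classification can be sketched as
\[
    \mathrm{sat}(u) =
    \left\{
    \begin{array}{ll}
        \mbox{satisfied}   & \mbox{if } s_u > 0.6, \\
        \mbox{unsatisfied} & \mbox{if } s_u < 0.4, \\
        \mbox{neutral}     & \mbox{otherwise.}
    \end{array}
    \right.
\]
As a purely illustrative example with made-up numbers, a group of five members with scores $0.8$, $0.65$, $0.5$, $0.45$ and $0.3$ would consist of two satisfied, two neutral, and one unsatisfied member.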

\subsection{Group Score}
The group score metric is simply the score that the recommender has assigned to a group's configuration. This score can be compared with the scores of other configurations.


\section{Questions to Answer During the Evaluation}
\label{sec:Evaluation:Questions}

\begin{itemize}
    \item Main question: How does the group decision differ from the decision of a single decision maker? (random individual)
    \item How much impact does the configuration state have on the recommendation outcome? (random individual, satisfaction)
    \item How many group members are satisfied with the group decision on average? (satisfaction)
    \item Is the recommender fair, i.e., is no user type consistently worse off than the others? (satisfaction)
    \item How does the number of stored finished configurations relate to recommendation quality? (Compare the recommended configuration with the best possible configuration in terms of score, using MSE and similar metrics, and in terms of satisfaction and random individual.)
    \item How much higher is the score of the best configuration compared to the average score? (group score, satisfaction, random individual)
\end{itemize}

\section{Generating Data}
\label{sec:Evaluation:GeneratingGroups}

For the forest use case, we assume that there are multiple types of user profiles. Each profile is represented by a neutral, negative, or positive attitude towards each attribute value. During data generation, each attitude is converted to a preference by drawing from a normal distribution. \autoref{fig:Evaluation:DataGeneration} shows the distributions used to convert a user profile to preferences.
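
As a sketch of this conversion, a preference value $p$ for an attribute value with attitude $a$ can be written as a draw from a normal distribution. The symbols $p$, $\mu_a$ and $\sigma_a$ are notation introduced here for illustration. The parameters are the means and standard deviations of the curves plotted in \autoref{fig:Evaluation:DataGeneration}:
\[
    p \sim \mathcal{N}(\mu_a, \sigma_a^2),
    \qquad
    (\mu_a, \sigma_a) =
    \left\{
    \begin{array}{ll}
        (0.25,\ 0.1)  & \mbox{if $a$ is negative,} \\
        (0.5,\ 0.05)  & \mbox{if $a$ is neutral,} \\
        (0.75,\ 0.1)  & \mbox{if $a$ is positive.}
    \end{array}
    \right.
\]
How sampled values outside $[0, 1]$ are handled is not specified here. One possible assumption is that they are clipped to this interval.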

\pgfplotsset{height=5cm,width=\textwidth,compat=1.8}
\pgfmathdeclarefunction{gauss}{2}{%
  \pgfmathparse{1/(#2*sqrt(2*pi))*exp(-((x-#1)^2)/(2*#2^2))}%
}
\begin{figure}
    \begin{tikzpicture}
        \begin{axis}[
            every axis plot post/.append style={
                mark=none, domain=0:1, samples=50, smooth
            },
            axis x line*=bottom,
            xmin=0,
            xmax=1,
            ymin=0.1,
            xticklabel style={
                /pgf/number format/precision=3,
            },
            xtick={0,0.25, 0.5, 0.75,1},
            hide y axis]
          \addplot [draw=red][very thick] {gauss(0.25,0.1)} node[text=red][above,pos=0.5] {negative};
          \addplot [draw=blue][very thick] {gauss(0.5,0.05)} node[text=blue][above,pos=0.48] {neutral};
          \addplot [draw=green!60!black][very thick] {gauss(0.75,0.1)} node[text=green!60!black][above,pos=0.5] {positive};
        \end{axis}
    \end{tikzpicture}
    \caption{Normal distributions used to convert a negative, neutral, or positive attitude of a user type into a preference.}
    \label{fig:Evaluation:DataGeneration}
\end{figure}

These user profiles can be used to generate rather homogeneous groups, but also groups whose members have more strongly conflicting interests. For completely random and therefore more chaotic groups, preferences are drawn from a uniform distribution instead. The whole process is shown in \autoref{fig:Evaluation:GeneratingDataProcess}.


\begin{figure}
    \centering
    \includegraphics[width=1\textwidth]{./figures/60_evaluation/bpmn_evaluation_input_data_generation.pdf}
    \caption{The process used for generating data for the evaluation.}
    \label{fig:Evaluation:GeneratingDataProcess}
\end{figure}