\chapter{Concept}
\label{ch:Concept}

\section{Foundations Extension}
\label{sec:Concept:FoundationsExtension}

The definitions introduced in \autoref{ch:Foundations} are not sufficient for this thesis. This section adds the definitions that are additionally needed.

\subsection{Configuration State}

A \emph{configuration} $S$ is defined as a set of pairs, each consisting of a variable (\autoref{eq:Foundations:ProductConfiguration:Variables}) and a value from its domain, with
\begin{equation} \label{eq:Foundations:ProductConfiguration:ConfigurationState}
    S = \{ (v_i,\ d) \ |\ v_i \in V \ \land \ d \in \mathfrak{D}(i),\ i=1,\dotsc,m \}.
\end{equation}
In essence, it is a set of variables and their assigned values.

\subsection{Finished Configuration}
To define a \emph{finished configuration}, it is first necessary to define when a configuration is valid. The function $is\_valid$ is therefore defined as
\begin{equation} \label{eq:Foundations:ProductConfiguration:IsValid}
    is\_valid : S \to \{true, false\}; x \mapsto 
    \begin{cases}
        true, & x \in solution\_space \\
        false, & \text{otherwise}
    \end{cases},
\end{equation}
with $solution\_space$ being the solution space of the corresponding constraint satisfaction problem. A \emph{finished configuration} $S_F$ is a valid configuration that assigns a value to every variable:
\begin{equation} \label{eq:Foundations:ProductConfiguration:FinishedConfiguration}
    S_F \subset S,\ \text{where} \ \forall v_i \in V\ (\exists (v_i, d) \in S_F : d \in \mathfrak{D}(i)) \land is\_valid(S_F).
\end{equation}
In practice, a finished configuration describes a product that is ready to be produced. If a car is being configured, for example, the car can be built exactly as specified by the finished configuration.


\subsection{Group-Based Product Configuration}
\label{sec:Foundations:GroupBasedProductConfiguration}

In group-based product configuration, a group of people rather than a single person configures one product, which can be useful for multi-stakeholder decisions. This setting needs mechanisms for describing the preferences of multiple people. Therefore, a set of users $U$ is introduced with
\begin{equation}\label{eq:Foundations:ProductConfiguration:Users}
    U = \{1, \dotsc, n\},
\end{equation}
and a user's \emph{utility function} that maps a domain value to a utility value and is only known to the user
\begin{equation}
    \begin{split}
        u_i(d_j), \qquad \text{where}\ & d_j \in  \mathfrak{D}(j),\\
        & 1 \leq j \leq m, \\
        & 1 \leq i \leq n .
    \end{split}
\end{equation}

\subsection{Group Recommender}

A group recommender system requires additional definitions. The attitude of a user $i$ is represented by their preferences $P_i$, which are directly related to the utility the user gains from a domain value being present in the configuration. The preferences of all users are collected in $P$:
\begin{gather} \label{tab:Foundations:GroupRecommenderSystem:Preferences}
    P = \{ P_1, \dotsc, P_n\},\ \text{where} \\
    P_i = \{(d,\ u_i(d)) \ | \ d \in \mathfrak{D}(j),\ j=1,\dotsc,m \}. \notag
\end{gather}


\section{Requirements}
\label{sec:Concept:Requirements}

\todo[inline]{Are all of these mandatory requirements, or are some optional?

Is there a chapter at the end that evaluates whether the requirements are fulfilled?}

This section lists the requirements that are considered and implemented in this thesis \todo{more precisely: what do the requirements refer to? -> the group recommender}. 

\begin{itemize}
    \item The user interface should be simple and use only three discrete states (like, neutral, dislike). \todo[]{is this consistent with your continuous utilities?}
    \item The recommender should support a continuous value range for preferences.
    \item The recommendation engine can be used without proprietary software.
    \item Give recommendations to a group based on their preferences and the current configuration state.
    \item The system supports multiple users at the same time.
    \item The system should be able to work with other configuration systems.
    \item The system should take the current configuration state into account.
    \item Recommendations should allow different scoring functions.
    \item Recommendations should always be valid solutions.
    \item The system has to respond in a timely manner.
\end{itemize}


\section{Assumptions}
\label{sec:Concept:Assumptions}

Because the resources of a thesis are limited, some assumptions have to be made. They are listed in this section.
\todo[inline]{It would be good to explain the assumptions briefly: why are they made, how do they simplify the work, what effect do they have on the results, and how general are the results?}

\begin{itemize}
    \item Each group configures only one product (solution) at a time.
    \item Features only support single-valued attributes.
    \item Users start configuring only after all group members have joined the system.
    \item The user interface is for demonstration purposes only.
    \item Speed and optimization of the system are not a high priority.
\end{itemize}

\section{User Interaction with the System}
\label{sec:Concept:UserSystemInteraction}

The system has one main use case, which is defined in \autoref{tab:Concept:MainUseCase}. The process is also visualized in \autoref{fig:Concept:ConfigurationProcess}.
\todo[inline]{A more detailed textual description of what the figure shows is still needed here. Also: where does this process definition come from? Was it derived from the requirements and constraints?}

\begin{figure}
    \centering
    \includegraphics[width=1\textwidth]{./figures/40_concept/bpmn_configuration_process_with_continious_recommendation.pdf}
    \caption{A BPMN diagram of the configuration process.}
    \label{fig:Concept:ConfigurationProcess}
\end{figure}

\begin{table}
    \begin{center}
        \begin{tabularx}{\columnwidth}{l|X}
            \multicolumn{2}{c}{Main System Usage} \\
            \hline
            Preconditions   & 
                \begin{itemize}
                    \item The configurator is opened with the same session on each of the group member's machines
                    \item The configuration is in an unfinished state (this state is a consensus state)
                \end{itemize} \\
            \hline
            Postcondition   & 
                \begin{itemize}
                    \item All users have entered their preferences for each attribute explicitly.
                    \item The system gives a recommendation based on all preferences and the unfinished configuration state
                \end{itemize} \\
            \hline
            Basic Flow      & 
                \begin{enumerate}
                    \item A user indicates a preference for an attribute
                    \item The system generates a recommendation (based on preferences and configuration status)
                    \item If not all users have given their preferences go to step 1.
                \end{enumerate} \\
            \hline
        \end{tabularx}
        \caption{A description of the main way users will interact with the system.}
        \label{tab:Concept:MainUseCase}
    \end{center}
\end{table}

\section{Case Study}
\label{sec:Concept:CaseStudy}

The case study used in this thesis is a simplified example from forestry \todo[]{possibly add here: where does the use case come from / from which research project / why is it interesting?}.
The characteristics and attributes used are shown in \autoref{fig:Concept:ForestExample}. The figure also gives example preferences, a configuration state and a finished configuration.

\begin{figure}
    \begin{mdframed}[frametitle={Example for Forest Use Case}, linecolor=black, frametitlerulecolor=black, frametitlebackgroundcolor=gray!5]
        This example considers a small group of users. The use case is a piece of forest; its variables describe, for example, the harvesting activity, which trees to grow and the accessibility for people.
        \begin{align}
            \begin{split}
                V = \{ & \textit{indigenous}, \textit{resilient}, \textit{usable}, \textit{effort}, \textit{quantity}, \textit{price}, \textit{accessibility} \},
            \end{split} \notag \\
            \mathfrak{D}(\textit{indigenous}) =  \{ & \text{low}, \text{moderate}, \text{high}\}, \notag \\
            \mathfrak{D}(\textit{resilient}) = \{ & \text{low}, \text{moderate}, \text{high}\}, \notag \\
            \mathfrak{D}(\textit{usable}) = \{ & \text{low}, \text{moderate}, \text{high}\}, \notag \\
            \mathfrak{D}(\textit{effort}) = \{ & \text{manual}, \text{harvester}, \text{autonomous}\}, \notag \\
            \mathfrak{D}(\textit{quantity}) = \{ & \text{low}, \text{moderate}, \text{high}\}, \notag \\
            \mathfrak{D}(\textit{price}) = \{ & \text{low}, \text{moderate}, \text{high}\}, \notag\\
            \mathfrak{D}(\textit{accessibility}) = \{ & \text{low}, \text{moderate}, \text{high}\},\notag \\
            U = \{ & 1,2\} \notag\\
            P = \{ & P_1, P_2\} \notag\\
            \begin{split}
                P_1 = \{ & (\text{manual}, 0.8), (\text{harvester}, 0.3) \} \\ 
                & \cup \{ (d,0)\ |\ d \in \mathfrak{D}(i),\ i \in V,\ d \notin \{ \text{manual}, \text{harvester}\} \ \} \ 
            \end{split} \notag \\
            P_2 = \{ & (d,0)\ |\ d \in \mathfrak{D}(i),\ i \in V \} \notag \\
            S  =  \{ & (\textit{indigenous}, \text{low}), (\textit{quantity}, \text{moderate}) \} \notag \\
            \begin{split}
            S_F  =  \{ & (\textit{indigenous}, \text{low}), (\textit{resilient}, \text{low}), (\textit{usable},\text{low}), (\textit{effort}, \text{manual}), \\
            & (\textit{quantity}, \text{low}), (\textit{price},\text{high}),(\textit{accessibility}, \text{low}) \} 
            \end{split} \notag
        \end{align}
    \end{mdframed}
    \caption{An example of a forest use case that includes two people.}
    \label{fig:Concept:ForestExample}
\end{figure}


\section{Solution Generation}
\label{sec:Concept:SolutionGeneration}

Given an unfinished configuration and the preferences of all group members, a finished configuration is rated by how well it reflects the configuration state and the preferences. This rating is used to choose the best finished configuration from a list of candidates as the recommendation. The approach is an aggregated preference strategy that ranks candidate items (see \autoref{sec:Foundations:GroupRecommenderSystem}).

\subsection{Generating a Recommendation}

The idea is that a database of complete configurations exists (possibly historic ones from other groups, automatically generated ones, or both).
The recommendation procedure then looks as follows:

\begin{enumerate}
    \item Assign a score to each stored configuration according to \[score_{group}(\overline{configurationState},\ \overline{preferences}, \ configurationInStore).\]
    \item Optional: Filter out configurations that have a score below a certain value using a different scoring function. For example, filter out configurations that cause a certain level of misery.
    \item Choose the configuration with the highest score as the recommendation.
\end{enumerate}
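
The procedure above can be summarised in one expression. The following is a sketch under assumed notation that does not appear elsewhere in this thesis: $C$ denotes the set of stored finished configurations, $score_{filter}$ the optional second scoring function, and $\tau$ its threshold.
\begin{equation*}
    s^{*} = \operatorname*{arg\,max}_{s \in C,\ score_{filter}(\overline{s},\ \overline{p},\ s) \geq \tau} score_{group}(\overline{s},\ \overline{p},\ s)
\end{equation*}
If the filter step is skipped, the constraint on $score_{filter}$ is dropped and the maximum is taken over all of $C$.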

\subsection{Scoring Function}
\label{subsec:Concept:SolutionGeneration:ScoringFunction}

The \emph{group configuration scoring function} takes the preferences and the current configuration state into account. It assigns a score to a finished configuration, using the current configuration state and all user preferences:
\begin{equation}
    score_{group}: S \times P \times S_F \to \mathbb{R}
\end{equation}

An example group configuration scoring function is $score_{group}$ with
\begin{equation}
    score_{group}(\overline{s},\ \overline{p},\ s) = score(\overline{p},\ s) \cdot penalty(\overline{s},\ s)
\end{equation}

This thesis uses multiple scoring functions, among them least misery, average and multiplicative, all of which are implemented by $score$. Average and multiplicative yield good results in the studies presented by \citeauthor{Masthoff2015} \cite{Masthoff2015}. Strategies can also be combined; one example is average without misery. The scoring functions used in this thesis all combine $penalty$ and $score$ by multiplication. Other combination strategies are possible, as is combining multiple scoring functions into one group scoring function. This thesis uses simpler scoring functions that are not combined, which leaves room for improvement.
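
For reference, the three basic strategies can be written out as follows. This is a sketch in assumed notation: $score_{user}(P_i, s)$ is a hypothetical name for the score that user $i$ alone assigns to a finished configuration $s$, and $n$ is the number of users. The definitions follow the common formulation of these strategies.
\begin{align*}
    score_{average}(\overline{p},\ s) &= \frac{1}{n} \sum_{i=1}^{n} score_{user}(P_i,\ s), \\
    score_{least\ misery}(\overline{p},\ s) &= \min_{i \in U} score_{user}(P_i,\ s), \\
    score_{multiplicative}(\overline{p},\ s) &= \prod_{i=1}^{n} score_{user}(P_i,\ s).
\end{align*}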

\subsubsection{Preference Scoring}

\todo[inline]{possibly remove distance from average scoring from thesis}

All of the aggregation functions mentioned in \autoref{subsec:Concept:SolutionGeneration:ScoringFunction} expect one preference per product from each user. In configuration, where a preference exists for every characteristic, a function is needed that combines the preferences of one user into her score for a configuration. Once one score per user has been calculated, the mentioned preference aggregation strategies can be applied.
This thesis proposes two different scoring functions. The first uses the difference between the rating of the selected characteristic and the average rating of all characteristics of the corresponding feature. This approach includes all preferences of a user, meaning that a preference is also seen relative to the other preferences.

As an example a feature could be
\begin{equation}
    F = \text{ClimateResilientTrees},
\end{equation} with characteristics
\begin{equation}
    \mathfrak{D}(F)= \{\text{low}, \text{medium}, \text{high}\},
\end{equation}
preferences
\begin{equation}
    P_1 = \{(\text{low}, 0), (\text{medium},0.6), (\text{high},0.9) \}
\end{equation} 
and the configuration that is supposed to be rated
\begin{equation}
    S_F = \{(\text{ClimateResilientTrees}, \text{high})\}.
\end{equation}
The average rating for the feature $F$ is $(0 + 0.6 + 0.9)/3 = 0.5$. Therefore, the score given by user $1$ is $0.9-0.5 = 0.4$.
A second user with preferences 
\begin{equation}
    P_2 = \{(\text{low}, 0), (\text{medium},0), (\text{high},0.9) \}
\end{equation} 
on the other hand results in a feature score of $0.9-0.3=0.6$, because the average rating is $0.3$. For this user, the characteristic \emph{high} is of higher importance.

As scores should lie in the interval $[0,1]$ and not in $[-1,1]$, a normalisation is applied by adding one and dividing by two. The respective scores are therefore $0.7$ for user one and $0.8$ for user two. A configuration usually consists of more than one feature, so the average over all features is taken to obtain the score one user gives to a configuration. Based on that score, the aggregation functions mentioned in \autoref{subsec:Concept:SolutionGeneration:ScoringFunction} can be used.
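
Written as a formula, the first approach could look as follows. This is a sketch under assumptions: $score_{user}$ and $score_{feature}$ are hypothetical names, $d_j$ denotes the value that the rated configuration $s$ assigns to variable $v_j$, and $s$ is assumed to contain all $m$ variables.
\begin{align*}
    score_{feature}(P_i,\ d_j) &= \frac{1}{2} \left( u_i(d_j) - \frac{1}{|\mathfrak{D}(j)|} \sum_{d \in \mathfrak{D}(j)} u_i(d) + 1 \right), \\
    score_{user}(P_i,\ s) &= \frac{1}{m} \sum_{j=1}^{m} score_{feature}(P_i,\ d_j).
\end{align*}
For user one in the example above this gives $\frac{1}{2}(0.9 - 0.5 + 1) = 0.7$.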

The second, simpler approach takes the user's preference for each characteristic that is part of the configuration and averages these values. This approach is more transparent because the preference of a user translates directly into the score and no weighting is done. A configuration score is therefore simpler to understand and to calculate. If needed, for example to give one group member more influence, relative weighting is still possible by preprocessing the preferences. Feature weights can be added in the same way, so that a user can assign different importance to different features. Other means of weighting ratings are possible as well. For example, the ratings of a group member who has more knowledge in an area can be increased by multiplying them with a factor, or alternatively the preferences of all other users can be decreased.
For the example above, this approach does not yield different feature scores for $P_1$ and $P_2$: both result in a score of $0.9$. There is therefore a more direct link between a user's preference and the score. 
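
Written as a formula, the second approach could look as follows. In this sketch, $score_{user}$ is a hypothetical name for the score one user gives to a rated configuration $s$, and $|s|$ is the number of characteristics in $s$.
\begin{equation*}
    score_{user}(P_i,\ s) = \frac{1}{|s|} \sum_{(v_j,\ d) \in s} u_i(d).
\end{equation*}
For the single-feature example above, $|s| = 1$ and the sum reduces to $u_1(\text{high}) = u_2(\text{high}) = 0.9$.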

The second approach is used in the further chapters of this thesis because it is simple and transparent, which matters especially as trust in a recommendation system is important.

\subsubsection{Configuration Change Penalty}
\label{subsubsec:Concept:SolutionGeneration:ScoringFunction:Penalty}

This thesis proposes a penalty function that gives the proportion of characteristics of the current configuration state that also exist in the configuration to be rated. The value can be tuned to be more or less strict by raising it to a power $\alpha$, thereby allowing less or more deviation from the current configuration state. The penalty function is defined as
\begin{equation}
    \notag \alpha \in \mathbb{R}, \qquad     unchanged(d,\overline{s}, s) = 
    \begin{cases}
      1, & d \in \overline{s} \land d \in s \\
      0, & \text{otherwise}
    \end{cases}
\end{equation}
\begin{equation}
    penalty_{proportion}(\overline{s},\ s) =  \left(\frac{\sum_{d \in \overline{s}} unchanged(d,\overline{s}, s)}{|\overline{s}|}\right)^\alpha.
\end{equation}

By including the current configuration state in the scoring, the scoring function can take into account changes that have already been implemented and might therefore be very costly to revert.
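
As a short worked example of the effect of $\alpha$, assume a configuration state with two characteristics of which one is retained in the rated configuration, so that the fraction is $\frac{1}{2}$. The values of $\alpha$ are chosen for illustration only.
\begin{align*}
    \alpha = 0.5: \quad & \left(\tfrac{1}{2}\right)^{0.5} \approx 0.71, \\
    \alpha = 1: \quad & \left(\tfrac{1}{2}\right)^{1} = 0.5, \\
    \alpha = 2: \quad & \left(\tfrac{1}{2}\right)^{2} = 0.25.
\end{align*}
A larger $\alpha$ therefore lowers the score of deviating configurations more strongly, while a configuration that retains the whole state keeps a penalty factor of $1$ for any $\alpha$.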

\section{Illustration}
\label{sec:Concept:Illustration}

This section gives an example to illustrate how the recommendation works. It uses the example from \autoref{fig:Concept:ForestExample} with extended preferences, which are listed in \autoref{tab:Concept:UseCaseRating}. \autoref{tab:Concept:UseCaseConfigurations} shows the current configuration state $S$, which consists of the characteristic \emph{moderate} for each of the features \textit{indigenous} and \textit{resilient}. $S_{F1}$ to $S_{F4}$ are the stored configurations for this example. The features in focus are \textit{indigenous}, \textit{resilient} and \textit{effort}. In the presented example $S_{F1}$ performs best, and the remainder of this section explains why. $S_{F1}$ is compared to $S_{F2}$ to show the effect of diverging from the configuration state. $S_{F1}$ is compared to $S_{F3}$ to show how differing preferences affect the score. Last, $S_{F1}$ is compared to $S_{F4}$ to show the effect of switching to a better-rated characteristic while diverging from the current state. Each of these configurations differs from $S_{F1}$ in exactly one characteristic. The \emph{average} metric is used as aggregation strategy. The parameter $\alpha$ (see \autoref{subsubsec:Concept:SolutionGeneration:ScoringFunction:Penalty}) is set to 1. A lower $\alpha$ reduces the penalty given to configurations that deviate from the configuration state $S$.

The difference between $S_{F1}$ and $S_{F2}$ is that $S_{F2}$ contains \emph{high} instead of \emph{moderate} for the feature \emph{resilient}. The scores for these two characteristics are the same, as both users have rated them at $0.5$, but as $S_{F2}$ deviates from the configuration state it receives a penalty. There are two characteristics in the configuration state $S$, of which $S_{F2}$ retains one, therefore the penalty is $(\frac{1}{2})^\alpha = (\frac{1}{2})^1 = 0.5$. This means the score of $S_{F2}$ is half that of $S_{F1}$.

The only difference between $S_{F1}$ and $S_{F3}$ is the selection for the feature \emph{effort}: $S_{F1}$ contains the characteristic \emph{manual} and $S_{F3}$ the characteristic \emph{harvester}. The individual score of user one increases, as he prefers \emph{harvester} with $0.8$ over \emph{manual} with $0.6$. The individual score of user two, however, decreases, as her rating changes from $0.8$ for \emph{manual} to $0.3$ for \emph{harvester}. The larger decrease for user two lowers the overall score of $S_{F3}$ compared to $S_{F1}$. The scores of both users are closer together for $S_{F1}$, but this does not necessarily have to be the case: if the preference of user two for \emph{harvester} were $0.6$, both configurations would have the same score. A different user preference aggregation strategy can change that.

Last, $S_{F1}$ and $S_{F4}$ differ in the characteristic chosen for the feature \emph{indigenous}. The switch from \emph{moderate} in $S_{F1}$ to \emph{high} in $S_{F4}$ increases the individual score of user two, because her preference for \emph{moderate} is $0.6$ and for \emph{high} is $0.9$. Yet the change that raises the preference score entails a penalty, as the characteristic \emph{high} is not part of the configuration state. This penalty causes the overall score to drop far below that of $S_{F1}$.
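
The comparisons above can be checked numerically. The following calculation is a sketch that assumes the second preference scoring approach (the plain average of a user's preferences for the chosen characteristics), the \emph{average} aggregation strategy and $\alpha = 1$. In each numerator, the first summand is the sum of the seven preference values of user one and the second summand that of user two, both read from \autoref{tab:Concept:UseCaseRating}. The denominator $14$ is the number of users times the number of features, and the last factor is the penalty.
\begin{align*}
    S_{F1}: &\quad \frac{3.8 + 3.9}{14} \cdot 1 = 0.55, \\
    S_{F2}: &\quad \frac{3.8 + 3.9}{14} \cdot \tfrac{1}{2} = 0.275, \\
    S_{F3}: &\quad \frac{4.0 + 3.4}{14} \cdot 1 \approx 0.529, \\
    S_{F4}: &\quad \frac{3.8 + 4.2}{14} \cdot \tfrac{1}{2} \approx 0.286.
\end{align*}
Under these assumptions $S_{F1}$ receives the highest score, followed by $S_{F3}$, while the two configurations that deviate from the configuration state rank last.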

\begin{table}
    \tiny
    \begin{tabularx}{\columnwidth}{C|C|C|C|C|C|C|C|C|C|C|C|C|C|C|C|C|C|C|C|C|C|}
        & \multicolumn{3}{c|}{\textit{indigenous}} & \multicolumn{3}{c|}{\textit{resilient}} & \multicolumn{3}{c|}{\textit{usable}} & \multicolumn{3}{c|}{\textit{effort}} & \multicolumn{3}{c|}{\textit{quantity}} & \multicolumn{3}{c|}{\textit{price}} & \multicolumn{3}{c|}{\textit{accessibility}} \\
        \rotatebox[origin=c]{90}{\ preferences} & \rotatebox[origin=c]{90}{low} & \rotatebox[origin=c]{90}{moderate} & \rotatebox[origin=c]{90}{high} & \rotatebox[origin=c]{90}{low} & \rotatebox[origin=c]{90}{moderate} & \rotatebox[origin=c]{90}{high} & \rotatebox[origin=c]{90}{low} & \rotatebox[origin=c]{90}{moderate} & \rotatebox[origin=c]{90}{high} & \rotatebox[origin=c]{90}{manual} & \rotatebox[origin=c]{90}{harvester} & \rotatebox[origin=c]{90}{autonomous} & \rotatebox[origin=c]{90}{low} & \rotatebox[origin=c]{90}{moderate} & \rotatebox[origin=c]{90}{high} & \rotatebox[origin=c]{90}{low} & \rotatebox[origin=c]{90}{moderate} & \rotatebox[origin=c]{90}{high} & \rotatebox[origin=c]{90}{low} & \rotatebox[origin=c]{90}{moderate} & \rotatebox[origin=c]{90}{high} \\
        \hline
        $P_1$   & 0.5 & 0.5 & 0.5 & 0.5 & 0.5 & 0.5 & \textbf{0.1} & \textbf{0.7} & \textbf{0.9} & \textbf{0.6} & \textbf{0.8} & \textbf{0.2} & 0.5 & 0.5 & 0.5 & 0.5 & 0.5 & 0.5 & 0.5 & 0.5 & 0.5 \\
        $P_2$   & \textbf{0.1} & \textbf{0.6} & \textbf{0.9} & 0.5 & 0.5 & 0.5 & 0.5 & 0.5 & 0.5 & \textbf{0.8} & \textbf{0.3} & \textbf{0.1} & 0.5 & 0.5 & 0.5 & 0.5 & 0.5 & 0.5 & 0.5 & 0.5 & 0.5 \\
    \end{tabularx}
    \caption{The preferences of both users for the example in this section.}
    \label{tab:Concept:UseCaseRating}
\end{table}

\begin{table}
    \tiny
    \begin{tabularx}{\columnwidth}{C|C|C|C|C|C|C|C|C|C|C|C|C|C|C|C|C|C|C|C|C|C|}
        & \multicolumn{3}{c|}{\textit{indigenous}} & \multicolumn{3}{c|}{\textit{resilient}} & \multicolumn{3}{c|}{\textit{usable}} & \multicolumn{3}{c|}{\textit{effort}} & \multicolumn{3}{c|}{\textit{quantity}} & \multicolumn{3}{c|}{\textit{price}} & \multicolumn{3}{c|}{\textit{accessibility}} \\
        \rotatebox[origin=c]{90}{\ configuration} & \rotatebox[origin=c]{90}{low} & \rotatebox[origin=c]{90}{moderate} & \rotatebox[origin=c]{90}{high} & \rotatebox[origin=c]{90}{low} & \rotatebox[origin=c]{90}{moderate} & \rotatebox[origin=c]{90}{high} & \rotatebox[origin=c]{90}{low} & \rotatebox[origin=c]{90}{moderate} & \rotatebox[origin=c]{90}{high} & \rotatebox[origin=c]{90}{manual} & \rotatebox[origin=c]{90}{harvester} & \rotatebox[origin=c]{90}{autonomous} & \rotatebox[origin=c]{90}{low} & \rotatebox[origin=c]{90}{moderate} & \rotatebox[origin=c]{90}{high} & \rotatebox[origin=c]{90}{low} & \rotatebox[origin=c]{90}{moderate} & \rotatebox[origin=c]{90}{high} & \rotatebox[origin=c]{90}{low} & \rotatebox[origin=c]{90}{moderate} & \rotatebox[origin=c]{90}{high} \\
        \hline
        $S_{\ \ }$         & - & \pmb{\checkmark} & - & - & \pmb{\checkmark} & - & - & - & - & - & - & - & - & - & - & - & - & - & - & - & - \\
        $S_{F1}$    & - & \pmb{\checkmark} & - & - & \pmb{\checkmark} & - & - & \checkmark & - & \pmb{\checkmark} & - & - & - & - & \checkmark & - & \checkmark & - & \checkmark & - & - \\
        $S_{F2}$    & - & \pmb{\checkmark} & - & - & - & \pmb{\checkmark} & - & \checkmark & - & \pmb{\checkmark} & - & - & - & - & \checkmark & - & \checkmark & - & \checkmark & - & - \\
        $S_{F3}$    & - & \pmb{\checkmark} & - & - & \pmb{\checkmark} & - & - & \checkmark & - & - & \pmb{\checkmark} & - & - & - & \checkmark & - & \checkmark & - & \checkmark & - & - \\
        $S_{F4}$    & - & - & \pmb{\checkmark} & - & \pmb{\checkmark} & - & - & \checkmark & - & \pmb{\checkmark} & - & - & - & - & \checkmark & - & \checkmark & - & \checkmark & - & - \\
    \end{tabularx}
    \caption{A table showing the current configuration state $ S $ and the stored finished configurations $ S_{Fi} $.}
    \label{tab:Concept:UseCaseConfigurations}
\end{table}